OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
cs.CV
Release Date: December 3, 2024
Authors: Junyuan Zhang, Qintong Zhang, Bin Wang, Linke Ouyang, Zichen Wen, Ying Li, Ka-Ho Chow, Conghui He, Wentao Zhang

| Method | OCR: Edit Distance | Retrieval: LCS@1 | Retrieval: LCS@5 | Generation: EM | Generation: F1 | Overall: EM@1 | Overall: F1@1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ground Truth | - | 63.53 | 86.22 | 33.54 | 50.19 | 26.42 | 39.77 |
| Pipeline-based OCR | | | | | | | |
| MinerU [32] | 0.2328 | 52.53 | 73.61 | 30.50 | 46.08 | 24.52 | 36.84 |
| Marker [25] | 0.2621 | 56.94 | 78.53 | 30.08 | 46.02 | 23.89 | 36.51 |
| End-to-end OCR | | | | | | | |
| GOT [34] | 0.2884 | 45.80 | 67.06 | 26.36 | 40.62 | 21.51 | 32.69 |
| Nougat [2] | 0.3303 | 44.77 | 61.46 | 24.81 | 37.94 | 20.40 | 30.89 |
| Vision-Language Model for OCR | | | | | | | |
| Qwen2-VL-72B [33] | 0.2564 | 53.16 | 72.97 | 26.72 | 41.23 | 23.45 | 35.91 |
| InternVL2-Llama3-76B [5] | 0.4450 | 42.43 | 57.51 | 20.74 | 32.89 | 20.58 | 31.23 |
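
To make the gap between each OCR system and the ground-truth text easier to read, the overall F1@1 column can be turned into absolute and relative drops. The snippet below is a minimal sketch of that arithmetic, not part of the paper: the scores are copied from the table, while the helper name `f1_gap` and the dictionary layout are hypothetical choices made for illustration.

```python
# Overall F1@1 scores copied from the table; the ground-truth row is the reference.
GROUND_TRUTH_F1 = 39.77

overall_f1 = {
    "MinerU": 36.84,
    "Marker": 36.51,
    "GOT": 32.69,
    "Nougat": 30.89,
    "Qwen2-VL-72B": 35.91,
    "InternVL2-Llama3-76B": 31.23,
}


def f1_gap(scores, reference):
    """Hypothetical helper: map each system to (absolute drop, relative drop in %)."""
    return {
        name: (reference - score, 100.0 * (reference - score) / reference)
        for name, score in scores.items()
    }


gaps = f1_gap(overall_f1, GROUND_TRUTH_F1)

# Print systems from the smallest drop to the largest.
for name, (abs_drop, rel_drop) in sorted(gaps.items(), key=lambda kv: kv[1][0]):
    print(f"{name:<22} -{abs_drop:.2f} F1@1 ({rel_drop:.1f}% relative)")
```

Sorting by the absolute drop puts the pipeline-based systems first and shows that every OCR system listed scores below the ground-truth text on overall F1@1.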