Published Wed Feb 04 2026 08:00:00 GMT+0800 (China Standard Time)
Tags: deep-dive, RAGFlow, Dify, comparison
RAGFlow vs Dify KBs — head-to-head on one complex PDF
A 30-page product manual with tables, a two-column layout, and scanned pages, indexed with both tools and compared on top-1 retrieval accuracy and answer quality.
Test material
A real electronics manual PDF:
- 30 pages
- 12 tables (specs, comparisons, error codes)
- 6 pages of two-column layout
- 2 scanned pages (appendix from older edition)
Test questions (30 total, four shown)
- “What’s the max output power of model X-200?” (table)
- “How do I handle error code E04?” (table)
- “What’s the temperature limit mentioned in the two-column section?” (layout)
- “What safety rules are in appendix B (scanned)?” (OCR)
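A minimal sketch of how a top-k hit could be scored per question category. The `Case` structure, the `gold_chunk_id` field, and the shape of `retrieved` are assumptions for illustration; the post does not describe the actual scoring harness.

```python
from dataclasses import dataclass


@dataclass
class Case:
    question: str
    category: str        # e.g. "table", "layout", "ocr", "plain"
    gold_chunk_id: str   # hypothetical ID of the chunk that contains the answer


def hit_at_k(ranked_ids: list[str], gold_id: str, k: int) -> bool:
    """True if the gold chunk appears among the first k retrieved chunks."""
    return gold_id in ranked_ids[:k]


def accuracy_at_k(cases: list[Case], retrieved: dict[str, list[str]], k: int) -> dict:
    """Return {category: (hits, total)}; `retrieved` maps question -> ranked chunk IDs."""
    hits: dict[str, tuple[int, int]] = {}
    for case in cases:
        got, total = hits.get(case.category, (0, 0))
        ok = hit_at_k(retrieved.get(case.question, []), case.gold_chunk_id, k)
        hits[case.category] = (got + int(ok), total + 1)
    return hits
```

With `k=1` this yields per-category counts of the same form as the results below (hits out of questions in that category).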
Results
| Metric | Dify default | Dify tuned | RAGFlow default | RAGFlow tuned |
|---|---|---|---|---|
| Tables top-1 | 6/12 | 9/12 | 11/12 | 12/12 |
| Two-column top-1 | 2/6 | 4/6 | 6/6 | 6/6 |
| Scanned top-1 | 0/2 | 0/2 | 2/2 | 2/2 |
| Plain paragraphs top-1 | 9/10 | 10/10 | 9/10 | 10/10 |
| Overall | 17/30 | 23/30 | 28/30 | 30/30 |
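The Overall row is the sum of the four category rows. A short check, using only the numbers from the table (the dictionary key names are mine):

```python
# Top-1 hits per category, copied from the results table.
TOTALS = {"tables": 12, "two_column": 6, "scanned": 2, "plain": 10}
HITS = {
    "Dify default":    {"tables": 6,  "two_column": 2, "scanned": 0, "plain": 9},
    "Dify tuned":      {"tables": 9,  "two_column": 4, "scanned": 0, "plain": 10},
    "RAGFlow default": {"tables": 11, "two_column": 6, "scanned": 2, "plain": 9},
    "RAGFlow tuned":   {"tables": 12, "two_column": 6, "scanned": 2, "plain": 10},
}


def overall(config: str) -> tuple[int, int]:
    """Sum category hits and category sizes for one configuration."""
    return sum(HITS[config].values()), sum(TOTALS.values())


for name in HITS:
    got, total = overall(name)
    print(f"{name}: {got}/{total} = {got / total:.0%}")
```

This prints 57%, 77%, 93%, and 100% for the four configurations, in table order.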
Why RAGFlow lands the haymaker on complex docs
- DeepDoc parsing identifies title / paragraph / table / figure layouts before chunking
- Tables preserve structure — header + cells stay 2D; an entire table is one chunk
- Built-in OCR for scans
- Multi-route recall — vector + BM25 + knowledge graph, fused
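To make "fused" concrete, here is a sketch of one common way to merge several ranked lists: reciprocal rank fusion (RRF). This is a swapped-in technique for illustration; the post does not say how RAGFlow combines its routes, and RAGFlow's actual fusion may differ. The route names and chunk IDs are made up.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked chunk-ID lists; each route adds 1 / (k + rank) per chunk."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-route results for one query.
rankings = {
    "vector": ["t3", "p1", "p7"],
    "bm25":   ["t3", "p7", "p2"],
    "graph":  ["p7", "t3"],
}
fused = rrf_fuse(rankings)
```

A chunk that ranks well in several routes (`t3` here) rises above one that tops only a single route, which is the point of multi-route recall: each retriever covers the others' blind spots.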
Why Dify is still fine for plain paragraphs
Dify’s strengths are API consistency and ecosystem. If your KB is 90% Markdown or Notion exports with no complex layout, Dify is sufficient and lighter.
Picking
| Your docs are mostly | Pick |
|---|---|
| Markdown / Notion / web | Dify |
| API reference / simple PDFs | Dify |
| Product manuals with many tables | RAGFlow |
| Scanned / legacy docs | RAGFlow |
| Mixed, max-quality goal | RAGFlow as retrieval backend + Dify as app layer |
Combo architecture (production-recommended)
Dify handles workflow orchestration and multi-LLM routing. RAGFlow does what it does best: complex parsing and multi-route recall.
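A rough sketch of the seam between the two layers: the app layer calls a retrieval backend and normalizes its chunks into a simple record list. The function name, the `search` callable, and every field name (`content`, `similarity`, `document_name`, `records`, `score`, `title`) are assumptions for illustration, not the documented API of either project; check the current Dify and RAGFlow docs before wiring this up.

```python
from typing import Callable

# Hypothetical chunk shape returned by the retrieval backend (RAGFlow in this setup):
# {"content": str, "similarity": float, "document_name": str}
Chunk = dict


def retrieve_for_app(
    query: str,
    search: Callable[[str, int], list[Chunk]],
    top_k: int = 5,
    score_threshold: float = 0.0,
) -> dict:
    """Call the backend, drop low-scoring chunks, and normalize for the app layer."""
    records = [
        {
            "content": chunk["content"],
            "score": chunk["similarity"],
            "title": chunk.get("document_name", ""),
        }
        for chunk in search(query, top_k)
        if chunk["similarity"] >= score_threshold
    ]
    records.sort(key=lambda record: record["score"], reverse=True)
    return {"records": records[:top_k]}
```

Passing the backend in as a plain callable keeps the app layer testable with a stub and means the retrieval engine can be swapped without touching the workflow side.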