Published Wed Feb 25 2026 08:00:00 GMT+0800 (中国标准时间)
deep-diveFastGPTRAG
Tuning FastGPT KBs — from 0.62 to 0.89 accuracy, step by step
FastGPT defaults give ~60% retrieval accuracy. Tuned right it reaches ~90%. Step-by-step measurements for chunking, augmentation, reranking.
Baseline#
- Data: 3,500 e-commerce FAQs + 200 product docs
- Test set: 150 real user questions
- Metric: MRR@5
Starting point — defaults#
bge-m3 embedding, paragraph chunking ~500 chars, top_k=5.
MRR@5 = 0.62 — usable but users complained “answers feel off.”
Step 1 — chunking#
| Strategy | MRR@5 | Delta |
|---|---|---|
| Paragraph (default) | 0.62 | — |
| H2/H3 heading | 0.71 | +9 |
| QA split (FastGPT built-in) | 0.79 | +17 |
| Parent-child (small for retrieval, large for context) | 0.83 | +21 |
Key: FastGPT’s QA-split feature, where an LLM rewrites long passages into Q-A pairs and indexes them separately, is best-in-class.
Step 2 — question augmentation#
FastGPT can auto-generate 2-3 question variants per chunk at index time.
| State | MRR@5 |
|---|---|
| Off | 0.83 |
| On (2 variants/chunk) | 0.86 |
Cost: 3-5× index time, but it’s one-off.
Step 3 — reranker#
Rerank top-5 with bge-reranker-v2-m3.
| Config | MRR@5 | Latency Δ |
|---|---|---|
| No reranker | 0.86 | 0 |
| + bge-reranker-v2-m3 | 0.89 | +150ms |
150ms for +3 points — worth it in nearly every support case.
Step 4 — post-processing#
| Post-process | MRR@5 |
|---|---|
| None | 0.89 |
| Dedup (cosine > 0.95) | 0.89 |
| Enforce ≥1 “answer-typed” chunk in top-k | 0.91 |
Step 5 — prompt last mile#
You are [Brand]'s support assistant. Answer using only the reference material below.
Rules:
1. If insufficient, reply "I don't have that info, handing off to a human."
2. Do not invent numbers or dates.
3. Cite chunk_id alongside answers.
Reference: {{context}}
Question: {{question}}
| Prompt | Faithfulness |
|---|---|
| Default | 0.81 |
| Strict + cite chunk_id | 0.94 |
Cumulative gain#
| Stage | MRR@5 | Faithfulness | Cost |
|---|---|---|---|
| Baseline | 0.62 | 0.78 | — |
| + QA split | 0.79 | 0.81 | Index time +50% |
| + Parent-child | 0.83 | 0.81 | Index time +30% |
| + Question aug. | 0.86 | 0.82 | Index time +200% |
| + Reranker | 0.89 | 0.85 | Query +150ms |
| + Prompt | 0.89 | 0.94 | 0 |
Things that did NOT help#
- Swapping embedding to Conan-embedding (+0.005, noise)
- Bumping context window from 5 to 10 turns (irrelevant for single-shot retrieval)
- Higher LLM temperature for “richer” answers (hurts faithfulness)