Tuning FastGPT KBs — from 0.62 to 0.89 accuracy, step by step

FastGPT defaults give ~60% retrieval accuracy. Tuned right it reaches ~90%. Step-by-step measurements for chunking, augmentation, reranking.

Baseline#

Data: 3,500 e-commerce FAQs + 200 product docs
Test set: 150 real user questions
Metric: MRR@5

Starting point — defaults#

bge-m3 embedding, paragraph chunking ~500 chars, top_k=5.

MRR@5 = 0.62 — usable but users complained “answers feel off.”

Step 1 — chunking#

Strategy	MRR@5	Delta
Paragraph (default)	0.62	—
H2/H3 heading	0.71	+9
QA split (FastGPT built-in)	0.79	+17
Parent-child (small for retrieval, large for context)	0.83	+21

Key: FastGPT’s QA-split feature, where an LLM rewrites long passages into Q-A pairs and indexes them separately, is best-in-class.

Step 2 — question augmentation#

FastGPT can auto-generate 2-3 question variants per chunk at index time.

State	MRR@5
Off	0.83
On (2 variants/chunk)	0.86

Cost: 3-5× index time, but it’s one-off.

Step 3 — reranker#

Rerank top-5 with bge-reranker-v2-m3.

Config	MRR@5	Latency Δ
No reranker	0.86	0
+ bge-reranker-v2-m3	0.89	+150ms

150ms for +3 points — worth it in nearly every support case.

Step 4 — post-processing#

Post-process	MRR@5
None	0.89
Dedup (cosine > 0.95)	0.89
Enforce ≥1 “answer-typed” chunk in top-k	0.91

Step 5 — prompt last mile#

You are [Brand]'s support assistant. Answer using only the reference material below.
Rules:
1. If insufficient, reply "I don't have that info, handing off to a human."
2. Do not invent numbers or dates.
3. Cite chunk_id alongside answers.

Reference: {{context}}
Question: {{question}}

Prompt	Faithfulness
Default	0.81
Strict + cite chunk_id	0.94

Cumulative gain#

Stage	MRR@5	Faithfulness	Cost
Baseline	0.62	0.78	—
+ QA split	0.79	0.81	Index time +50%
+ Parent-child	0.83	0.81	Index time +30%
+ Question aug.	0.86	0.82	Index time +200%
+ Reranker	0.89	0.85	Query +150ms
+ Prompt	0.89	0.94	0

Things that did NOT help#

Swapping embedding to Conan-embedding (+0.005, noise)
Bumping context window from 5 to 10 turns (irrelevant for single-shot retrieval)
Higher LLM temperature for “richer” answers (hurts faithfulness)