flag92 flag92
Blog
Published Wed Feb 25 2026 08:00:00 GMT+0800 (中国标准时间)
deep-diveFastGPTRAG

Tuning FastGPT KBs — from 0.62 to 0.89 accuracy, step by step

FastGPT defaults give ~60% retrieval accuracy. Tuned right it reaches ~90%. Step-by-step measurements for chunking, augmentation, reranking.

Baseline#

  • Data: 3,500 e-commerce FAQs + 200 product docs
  • Test set: 150 real user questions
  • Metric: MRR@5

Starting point — defaults#

bge-m3 embedding, paragraph chunking ~500 chars, top_k=5.

MRR@5 = 0.62 — usable but users complained “answers feel off.”

Step 1 — chunking#

StrategyMRR@5Delta
Paragraph (default)0.62
H2/H3 heading0.71+9
QA split (FastGPT built-in)0.79+17
Parent-child (small for retrieval, large for context)0.83+21

Key: FastGPT’s QA-split feature, where an LLM rewrites long passages into Q-A pairs and indexes them separately, is best-in-class.

Step 2 — question augmentation#

FastGPT can auto-generate 2-3 question variants per chunk at index time.

StateMRR@5
Off0.83
On (2 variants/chunk)0.86

Cost: 3-5× index time, but it’s one-off.

Step 3 — reranker#

Rerank top-5 with bge-reranker-v2-m3.

ConfigMRR@5Latency Δ
No reranker0.860
+ bge-reranker-v2-m30.89+150ms

150ms for +3 points — worth it in nearly every support case.

Step 4 — post-processing#

Post-processMRR@5
None0.89
Dedup (cosine > 0.95)0.89
Enforce ≥1 “answer-typed” chunk in top-k0.91

Step 5 — prompt last mile#

You are [Brand]'s support assistant. Answer using only the reference material below.
Rules:
1. If insufficient, reply "I don't have that info, handing off to a human."
2. Do not invent numbers or dates.
3. Cite chunk_id alongside answers.

Reference: {{context}}
Question: {{question}}
PromptFaithfulness
Default0.81
Strict + cite chunk_id0.94

Cumulative gain#

StageMRR@5FaithfulnessCost
Baseline0.620.78
+ QA split0.790.81Index time +50%
+ Parent-child0.830.81Index time +30%
+ Question aug.0.860.82Index time +200%
+ Reranker0.890.85Query +150ms
+ Prompt0.890.940

Things that did NOT help#

  • Swapping embedding to Conan-embedding (+0.005, noise)
  • Bumping context window from 5 to 10 turns (irrelevant for single-shot retrieval)
  • Higher LLM temperature for “richer” answers (hurts faithfulness)

Search

Press ⌘ K to open