Tag: benchmark

3 posts tagged "benchmark".

Wed Apr 08 2026 08:00:00 GMT+0800 (China Standard Time)

How to evaluate your RAG quality — 5 metrics, 3 toolkits

"Feels better than last week" is not measurement. Five quantitative RAG metrics and three open-source evaluators.

Tue May 12 2026 08:00:00 GMT+0800 (China Standard Time)

2026 Chinese embedding benchmark — bge-m3, Conan, m3e, bce, OpenAI

Same Chinese KB, same real support questions — five embedding models compared on retrieval accuracy, speed and cost.

Fri May 08 2026 08:00:00 GMT+0800 (China Standard Time)

2026 Chinese-support LLM bake-off — Qwen, DeepSeek, GLM, Doubao, ERNIE

Same support prompt and knowledge base: which of the five China-trained LLMs delivers the best AI support? 200 real questions decide.
