Published Wed Apr 29 2026 08:00:00 GMT+0800 (China Standard Time)
ops, KPI, metrics
AI support KPI framework — how to prove the AI actually saved money
Don't measure deflection alone — here's a 4-layer 12-metric framework covering experience, cost and training.
Why deflection alone is wrong
Teams ship AI support and brag “60% deflection.” Leadership likes it for three months, then asks:
- How much of that 60% is “user gave up” versus “actually resolved”?
- Did the human agent headcount actually drop?
- Did CSAT rise or fall?
You need a layered framework.
4 layers, 12 metrics
Layer 1: coverage
| Metric | Definition | Target |
|---|---|---|
| AI touch rate | Conversations AI participated in / total | > 95% |
| First response time (FRT) | User first message → AI first reply | < 3s |
| Average handling time (AHT) | Conversation start → close | Drops month-over-month |
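
The coverage metrics above can be computed from raw conversation timestamps. The following is a minimal sketch, not a reference implementation: the `Conversation` record and its field names are hypothetical, "AI participated" is approximated as "the AI sent at least one reply", and FRT and AHT are reported as plain averages (the table does not say whether the targets apply to a mean or a percentile).

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Conversation:
    # Hypothetical record shape; field names are assumptions, not a real schema.
    started_at: datetime
    closed_at: datetime
    first_user_message_at: datetime
    first_ai_reply_at: Optional[datetime]  # None if the AI never replied


def coverage_metrics(conversations: list[Conversation]) -> dict[str, float]:
    """Compute AI touch rate, average FRT and average AHT for a batch."""
    total = len(conversations)
    if total == 0:
        raise ValueError("need at least one conversation")

    # Assumption: a conversation counts as AI-touched if the AI replied at all.
    touched = [c for c in conversations if c.first_ai_reply_at is not None]

    # FRT: user's first message to the AI's first reply, in seconds.
    frt_seconds = [
        (c.first_ai_reply_at - c.first_user_message_at).total_seconds()
        for c in touched
    ]
    # AHT: conversation start to close, in seconds.
    aht_seconds = [
        (c.closed_at - c.started_at).total_seconds() for c in conversations
    ]

    return {
        "ai_touch_rate": len(touched) / total,
        "avg_frt_seconds": mean(frt_seconds) if frt_seconds else float("nan"),
        "avg_aht_seconds": mean(aht_seconds),
    }
```

If the FRT target is meant as a tail guarantee rather than an average, swapping `mean` for a percentile over `frt_seconds` would be the natural change.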
Layer 2: resolution quality
| Metric | Definition | Target |
|---|---|---|
| AI deflection rate | Conversations with no human handoff and no return visit / total | 45–70% |
| Human handoff rate | AI- or user-initiated handoffs / total | < 35% |
| 24h return rate | Conversations where the same user re-asks the same issue within 24h / total | < 10% |
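
As a rough illustration of how the three resolution metrics relate, here is a small sketch that derives them from per-conversation flags. The `Outcome` type and its field names are hypothetical, and the "no return visit" condition in the deflection definition is assumed to use the same 24h window as the return rate.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical per-conversation flags; names are assumptions.
    handed_off: bool           # AI- or user-initiated handoff to a human
    returned_within_24h: bool  # same user re-asked the same issue within 24h


def resolution_metrics(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute deflection, handoff and 24h return rates for a batch."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("need at least one conversation")

    # Deflected: the AI handled it alone AND the user did not come back.
    deflected = sum(
        1 for o in outcomes if not o.handed_off and not o.returned_within_24h
    )
    handoffs = sum(1 for o in outcomes if o.handed_off)
    returns = sum(1 for o in outcomes if o.returned_within_24h)

    return {
        "deflection_rate": deflected / total,
        "handoff_rate": handoffs / total,
        "return_rate_24h": returns / total,
    }
```

Requiring both conditions for a deflection is what keeps "the user gave up and came back later" out of the numerator.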
Layer 3: experience
| Metric | Definition | Target |
|---|---|---|
| CSAT (AI leg) | 1–5 rating of the AI’s replies | > 4.0 |
| CSAT (human leg) | 1–5 rating of human agents | > 4.5 |
| Sentiment score | LLM-scored negative sentiment | Drops month-over-month |
Layer 4: cost and training
| Metric | Definition | Target |
|---|---|---|
| Cost per conversation | (Token + infra + labor cost) / conversations | Drops month-over-month |
| KB hit rate | Queries where the RAG top-1 result matches ground truth / total queries | > 80% |
| Human-edit feedback loop | Human-edited AI replies fed back into the KB / all human-edited AI replies | > 50% |
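
The cost and KB hit-rate definitions reduce to simple arithmetic. The sketch below shows one way to compute them; the function names, parameters and document IDs are illustrative assumptions, and all costs are assumed to be in the same currency for the same period.

```python
def cost_per_conversation(
    token_cost: float, infra_cost: float, labor_cost: float, conversations: int
) -> float:
    """(Token + infra + labor cost) / conversations, for one period."""
    if conversations <= 0:
        raise ValueError("conversations must be positive")
    return (token_cost + infra_cost + labor_cost) / conversations


def kb_hit_rate(top1_doc_ids: list[str], ground_truth_doc_ids: list[str]) -> float:
    """Share of queries whose top-1 retrieved document matches ground truth.

    Assumes one labeled ground-truth document per evaluated query.
    """
    if not top1_doc_ids or len(top1_doc_ids) != len(ground_truth_doc_ids):
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(
        1 for got, want in zip(top1_doc_ids, ground_truth_doc_ids) if got == want
    )
    return hits / len(top1_doc_ids)
```

For example, with hypothetical monthly figures of 1,200 in token spend, 300 in infrastructure and 4,500 in labor across 10,000 conversations, `cost_per_conversation(1200, 300, 4500, 10_000)` comes out to about 0.6 per conversation. Passing token cost as its own argument also keeps LLM spend visible instead of folded into infrastructure.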
Dashboards
- Grafana for real-time (FRT, QPS, error rate)
- Metabase for weekly (CSAT, deflection, cost trend)
- Notion / Lark sheet for monthly business review
Three common traps
- Faking deflection: counting “user closed in frustration” as “resolved.” Fix: always pair deflection with the 24h return rate.
- Inflated CSAT: prompting for ratings only after positive endings. Fix: show the rating prompt on every closure.
- Hidden LLM cost: finance books LLM tokens under “cloud services,” so nobody reviews them. Fix: book LLM cost as its own line item.
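
The first two traps can be surfaced with a quick sanity report. This is a hedged sketch under assumed data: the `ClosedConversation` type and its fields are hypothetical, and "naive" deflection here means "no human handoff" with no check on whether the user came back.

```python
from dataclasses import dataclass


@dataclass
class ClosedConversation:
    # Hypothetical fields for illustration only.
    handed_off: bool
    returned_within_24h: bool
    rating_prompt_shown: bool


def trap_report(convs: list[ClosedConversation]) -> dict[str, float]:
    """Compare naive vs. return-adjusted deflection and check rating coverage."""
    total = len(convs)
    if total == 0:
        raise ValueError("need at least one conversation")

    no_handoff = [c for c in convs if not c.handed_off]
    came_back = sum(1 for c in no_handoff if c.returned_within_24h)
    prompted = sum(1 for c in convs if c.rating_prompt_shown)

    return {
        # Trap 1: the headline number that ignores return visits.
        "naive_deflection": len(no_handoff) / total,
        # Deflection after removing users who re-asked within 24h.
        "strict_deflection": (len(no_handoff) - came_back) / total,
        # Share of conversations that looked deflected but were not resolved.
        "silent_failure_rate": came_back / total,
        # Trap 2: should be close to 1.0 if every closure is prompted.
        "rating_prompt_coverage": prompted / total,
    }
```

A wide gap between `naive_deflection` and `strict_deflection`, or a `rating_prompt_coverage` well below 1.0, suggests the headline numbers deserve a closer look before they reach a business review.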