Published Fri Apr 24 2026 08:00:00 GMT+0800 (China Standard Time)

case study · education · promo day · load test

Case · An online-edu promo day with 50× traffic — engineering notes

A coding-education company handled 120k conversations in 24 hours during the 618 sale, with AI deflection at 81%. These notes cover how engineering load-tested in advance, rate-limited, and degraded gracefully.

Background#

Business: adult online coding education
Avg monthly conversations: ~25k
618 promo day: ~120k (48× normal)
Team: 12 support agents + 2 engineers
Stack: Chatwoot + Dify + FastGPT + n8n (an instance of the education industry playbook)

The system runs fine under normal load; promo days are where it breaks. Past lessons:

2024 June: support system down for 4 hours
2025 Nov: Dify retrieval latency rose from 200ms to 8s
2025 June: human-agent queue piled up; average response time reached 40 minutes

For the first AI-powered promo day, targets:

AI deflection > 70%
First response stable < 5s
Zero downtime
Human handoff < 30%

6-week prep#

Week 1 — baseline load test#

Locust simulated concurrent users:

Concurrent users	Holds?
500	✓
1,000	✓
2,000	⚠ first response time (FRT) 5s
5,000	✗ Dify worker OOM
10,000	✗ Postgres connections exhausted

Target: stable at 10,000 concurrent.

Week 2 — infra scale#

Chatwoot: 1 node → 3 nodes (K8s HPA)
Dify workers: 2 → 12 replicas
Postgres: 8 cores / 16 GB RAM + PgBouncer
Redis: 16GB
Front with Cloudflare (CDN + Turnstile)

Week 3 — KB refactor#

Normal KB is 800+ FAQs; promo introduces “flash sales,” “group buying,” “promo codes.”

Added 200+ promo-specific FAQs
Eval-set regression check: MRR@5 rose from 0.87 to 0.91
Added a “no match → template reply + handoff” fallback
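For readers unfamiliar with the metric, here is a minimal sketch of how MRR@5 can be computed over an eval set. It assumes each eval question has exactly one expected FAQ id and that the retriever returns a ranked list of ids; the function name and the data shape are illustrative, not taken from the team's tooling.

```python
from typing import Sequence


def mrr_at_k(ranked_ids: Sequence[Sequence[str]],
             expected_ids: Sequence[str],
             k: int = 5) -> float:
    """Mean reciprocal rank of the expected FAQ within the top k results.

    A query scores 1/rank if the expected id appears at `rank` (1-based)
    within the first k results, and 0 if it does not appear there.
    """
    if len(ranked_ids) != len(expected_ids):
        raise ValueError("need exactly one expected id per query")
    if not ranked_ids:
        return 0.0
    total = 0.0
    for results, expected in zip(ranked_ids, expected_ids):
        for rank, faq_id in enumerate(results[:k], start=1):
            if faq_id == expected:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)


if __name__ == "__main__":
    # Hypothetical eval data: three queries, hits at rank 1, rank 2, and a miss.
    retrieved = [["faq_1", "faq_9"], ["faq_4", "faq_2", "faq_7"], ["faq_3"]]
    expected = ["faq_1", "faq_2", "faq_8"]
    print(mrr_at_k(retrieved, expected))
```

Running the same eval set before and after a KB change gives a single number to compare, which is how a move such as 0.87 to 0.91 would be read.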

Week 4 — degradation plan#

3-tier degradation:

Trigger	Action
Dify QPS > 80	Switch to DeepSeek-V3 (cheaper than Qwen-72B)
Dify QPS > 150	Enable hot-FAQ cache (Redis top-100)
Dify QPS > 250	Fall back to AnythingLLM (slightly lower quality, never down)

Each tier fires an automatic alert; activation requires human confirmation.
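The tier logic above can be sketched roughly as follows. The thresholds and actions are copied from the table; the `Tier` type, the `evaluate` function, and the idea of tracking confirmations as a set of thresholds are assumptions made for illustration, not the team's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Tier:
    qps_threshold: int
    action: str


# Thresholds and actions as listed in the degradation table.
TIERS = (
    Tier(80, "switch to DeepSeek-V3"),
    Tier(150, "enable hot-FAQ cache (Redis top-100)"),
    Tier(250, "fall back to AnythingLLM"),
)


def evaluate(dify_qps: float, confirmed: Set[int]) -> Tuple[List[Tier], List[Tier]]:
    """Return (alerted, active) tiers for the current Dify QPS.

    A tier alerts automatically once QPS exceeds its threshold, but it only
    becomes active if a human has confirmed it (hypothetically recorded here
    as the tier's threshold value in `confirmed`).
    """
    alerted = [t for t in TIERS if dify_qps > t.qps_threshold]
    active = [t for t in alerted if t.qps_threshold in confirmed]
    return alerted, active


if __name__ == "__main__":
    alerted, active = evaluate(160, confirmed={150})
    print([t.action for t in alerted])
    print([t.action for t in active])
```

Keeping alerting and activation separate means an on-call engineer can confirm tiers in whatever order fits the situation, which matches the "alert + human-confirmed" rule.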

Week 5 — rate limits + circuit breakers#

Multiple layers of limits were added in n8n:

Per IP: 30 req/min (anti-bot)
Per user_id: 10 req/min
Global Dify QPS: 300
Business API (order lookup): 200 req/s; overflow is queued
Monthly LLM token cap: enforces cache-only mode when hit
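The limits themselves lived in n8n, so the following is only a Python sketch of the layering idea: a sliding-window limiter applied per IP and per user_id with the numbers from the list above. The class name, the `admit` helper, and the in-memory storage are assumptions; a multi-node deployment would need shared state (for example in Redis).

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Limits taken from the list above: 30 req/min per IP, 10 req/min per user.
per_ip = SlidingWindowLimiter(limit=30)
per_user = SlidingWindowLimiter(limit=10)


def admit(ip: str, user_id: str) -> bool:
    """A request passes only if every layer allows it.

    The IP layer is checked first, so a request rejected there does not
    consume the user's quota. A request rejected at the user layer still
    counts against the IP.
    """
    return per_ip.allow(ip) and per_user.allow(user_id)
```

The global Dify QPS cap and the business-API limit would be further layers of the same shape, keyed on a constant instead of an IP or user.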

Week 6 — dry run#

Real-world simulation:

All-day load test Monday (10k concurrent for 4 hours)
Intentionally drop a Dify replica, verify auto-recovery
Fail over Postgres to the secondary
Practice “all-agent alert” runbook

The dry run surfaced 3 issues; all were fixed.

Traffic curve#

00:00 - 09:00: slow climb, 10k cumulative
09:00: opening, instant 500 QPS
09:00 - 09:30: peak 1,200 concurrent
09:30 - 12:00: plateau, 500-800 concurrent
12:00 - 18:00: afternoon waves, 300-600
18:00 - 22:00: second peak, 700
22:00 - 24:00: tapering

Total: 121,400 conversations.

AI performance#

Metric	Target	Actual
AI deflection	> 70%	81%
First response (avg)	< 5s	1.8s
First response P95	< 8s	4.2s
First response P99	< 15s	9.8s
Handoff rate	< 30%	19%
Downtime	0	0

All targets beaten.
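As a rough illustration of how the first-response rows could be derived from raw latency samples, the sketch below computes the average, P95, and P99 and checks them against the targets in the table. The function names and the sample data are hypothetical; the article does not say which tool produced its numbers.

```python
import statistics
from typing import Dict, Sequence

# Targets from the table above, in seconds.
TARGETS_S = {"avg": 5.0, "p95": 8.0, "p99": 15.0}


def frt_summary(latencies_s: Sequence[float]) -> Dict[str, float]:
    """Average, P95, and P99 of first-response times in seconds."""
    # quantiles(n=100) returns 99 cut points; index 94 is P95, index 98 is P99.
    cuts = statistics.quantiles(latencies_s, n=100)
    return {
        "avg": statistics.fmean(latencies_s),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def within_targets(summary: Dict[str, float]) -> bool:
    """True if every metric is strictly below its target."""
    return all(summary[name] < limit for name, limit in TARGETS_S.items())


if __name__ == "__main__":
    # Synthetic samples for demonstration only: 0.05s, 0.10s, ..., 5.00s.
    samples = [i / 20 for i in range(1, 101)]
    summary = frt_summary(samples)
    print(summary, within_targets(summary))
```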

Degradation actually fired#

09:05 — hot-FAQ cache activated (25 min ahead of expected)
09:13 — DeepSeek replacing Qwen (held 47 min)
AnythingLLM fallback never triggered

CSAT during degradation: 4.4 → 4.2, a small dip that users did not appear to notice.

Day-of incidents#

09:14 — Chatwoot Sidekiq OOM#

Symptom: ticket creation slowed; Sidekiq queue backed up.

Cause: the default Sidekiq queue was also handling n8n webhooks, and some of those webhooks were slow.

Action: raised Sidekiq concurrency from 25 to 60; recovered in 2 minutes.

Follow-up: dedicated Sidekiq process for webhooks; main process handles tickets only.

AI replies with invented promo codes#

Symptom: agents flagged 5 AI replies containing fake promo codes.

Action: n8n filter — any code in AI replies must be in {{valid_codes}}; replace with “please contact support” if not.

Follow-up: Prompt constraint “use only codes from {{valid_codes}}, never invent” + post-validation.
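The post-validation step can be sketched as below. Two things here are assumptions rather than details from the incident: the code format (6 to 12 uppercase letters and digits with at least one digit) and the choice to replace the whole reply instead of only the offending code. `valid_codes` stands in for the `{{valid_codes}}` variable.

```python
import re
from typing import Iterable

# Assumed code format: 6-12 uppercase letters/digits containing at least one digit.
CODE_PATTERN = re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,12}\b")

FALLBACK = "Please contact support for a valid promo code."


def filter_reply(reply: str, valid_codes: Iterable[str]) -> str:
    """Return the reply unchanged unless it mentions a code outside the whitelist."""
    found = set(CODE_PATTERN.findall(reply))
    unknown = found - set(valid_codes)
    if unknown:
        # Any code-looking token the business did not issue is treated as invented.
        return FALLBACK
    return reply


if __name__ == "__main__":
    valid = {"SAVE618", "GROUP50"}  # hypothetical whitelist
    print(filter_reply("Use SAVE618 at checkout.", valid))
    print(filter_reply("Try CODE999 for 90% off.", valid))
```

The prompt constraint lowers how often invented codes appear; the filter is what guarantees none reach a customer, so the two are complementary.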

19:23 — Cloudflare Turnstile false-positives#

Symptom: users reported chat box wouldn’t open.

Cause: Turnstile “auto-challenge” too aggressive on some mobile browsers.

Action: switched to “invisible” mode.

Follow-up: explicit fallback messaging for Turnstile.

Customer feedback#

Survey of 500 customers:

68% “AI support better than expected”
22% “same as before”
7% “obviously a machine” (mostly about not recommending courses)
3% complained “too fast, felt dismissive” (rare but real)

5 most important engineering lessons#

Load test 4 weeks ahead — time to find bottlenecks
Degrade, don’t break — slightly lower quality > total outage
Cache is cheaper than scale — top-100 cache absorbed 60% of LLM calls
Defensive prompt engineering — whitelist any field the LLM might invent
Practice “humans also fail” — pre-agreed runbook for “all-agent alert”
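To make the cache lesson concrete, here is a minimal sketch of a hot-FAQ lookup placed in front of the LLM. It uses an in-memory dict instead of Redis and exact matching on a normalized question, both of which are simplifying assumptions; the production cache may well have matched questions differently. All names are illustrative.

```python
import re
from typing import Callable, Dict, Iterable, Tuple


def normalize(question: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(cleaned.split())


class HotFaqCache:
    """Serve pre-approved answers for the top-N questions; call the LLM otherwise."""

    def __init__(self, top_faqs: Iterable[Tuple[str, str]], limit: int = 100) -> None:
        self._answers: Dict[str, str] = {}
        for question, answer in list(top_faqs)[:limit]:
            self._answers[normalize(question)] = answer
        self.hits = 0
        self.misses = 0

    def answer(self, question: str, llm: Callable[[str], str]) -> str:
        cached = self._answers.get(normalize(question))
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        return llm(question)  # only cache misses cost tokens

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking `hit_rate()` during the event is what would let a team report a figure like "absorbed 60% of LLM calls".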

Cost vs normal#

Item	Normal	Promo day
LLM tokens	$40	$580 (14× thanks to cache, not 48×)
Infra scale	—	$200 (K8s burst)
Engineering overtime	—	$1,200 (2 people × $300 × 2 days)
Day total	—	~$2,000

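Against the GMV uplift attributed to the 121,400 conversations (conservatively ¥800,000), the ~$2,000 day total works out to an ROI of ~56× once both figures are in the same currency.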
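The arithmetic behind the cost table and the ROI figure can be checked with a few lines. The exchange rate is an assumption: the article does not state one, and roughly 7.2 CNY per USD is the value that reproduces ~56×.

```python
# Cost items from the table above, in USD.
llm_tokens_normal = 40
llm_tokens_promo = 580
infra_burst = 200
overtime = 2 * 300 * 2  # 2 people x $300 x 2 days

day_total = llm_tokens_promo + infra_burst + overtime
token_multiple = llm_tokens_promo / llm_tokens_normal

# ASSUMPTION: exchange rate is not given in the article.
CNY_PER_USD = 7.2
gmv_uplift_cny = 800_000
gmv_uplift_usd = gmv_uplift_cny / CNY_PER_USD

roi = gmv_uplift_usd / day_total

print(f"day total: ${day_total}")
print(f"token cost multiple: {token_multiple:.1f}x")
print(f"ROI: {roi:.1f}x")
```

Note that this compares GMV uplift with incremental day-of cost, so it is a revenue multiple and not a profit margin.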

Next time#

True queueing — let users take a number rather than wait open-endedly
Mobile push — “close chat, ping me when reply lands”
LLM-as-judge KB warmup 1 week ahead