Case · An online-edu promo day with 50× traffic — engineering notes
A coding-education company handled 120k conversations in 24 hours during the 618 sale, with 81% AI deflection. How engineering load-tested ahead of time, rate-limited, and degraded gracefully.
Background#
- Business: adult online coding education
- Avg monthly conversations: ~25k
- 618 promo day: ~120k (48× normal)
- Team: 12 support agents + 2 engineers
- Stack: Chatwoot + Dify + FastGPT + n8n (an instance of the education industry playbook)
Pre-promo worries#
The system runs fine on normal days; promo days break it. Past lessons:
- 2024 June: support system down for 4 hours
- 2025 Nov: Dify retrieval latency rose from 200ms to 8s
- 2025 June: human agents backed up; average response time 40 minutes
Targets for the first AI-powered promo day:
- AI deflection > 70%
- First response stable < 5s
- Zero downtime
- Human handoff < 30%
6-week prep#
Week 1 — baseline load test#
Locust simulating users:
| Concurrent | Holds? |
|---|---|
| 500 | ✓ |
| 1,000 | ✓ |
| 2,000 | ⚠ first response time (FRT) reaches 5s |
| 5,000 | ✗ Dify worker OOM |
| 10,000 | ✗ Postgres connections exhausted |
Target: stable at 10,000 concurrent.
Week 2 — infra scale#
- Chatwoot: 1 node → 3 nodes (K8s HPA)
- Dify workers: 2 → 12 replicas
- Postgres: 8C/16G + PgBouncer
- Redis: 16GB
- Front with Cloudflare (CDN + Turnstile)
Week 3 — KB refactor#
The normal KB holds 800+ FAQs; the promo adds new topics: “flash sales,” “group buying,” and “promo codes.”
- Added 200+ promo-specific FAQs
- Eval-set regression run: MRR@5 improved from 0.87 to 0.91
- Added a “no match → template reply + handoff” fallback
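For readers unfamiliar with the metric, MRR@5 averages the reciprocal rank of the correct FAQ across the eval set, counting only hits in the top 5 results. The sketch below is a minimal illustration, not the team's eval harness; it assumes each eval query has exactly one correct FAQ id, and the function name and data shapes are hypothetical.

```python
from typing import Sequence


def mrr_at_k(ranked_results: Sequence[Sequence[str]],
             gold_ids: Sequence[str],
             k: int = 5) -> float:
    """Mean reciprocal rank, counting only hits within the top k results.

    ranked_results[i] is the retriever's ordered list of FAQ ids for query i;
    gold_ids[i] is the single correct FAQ id for that query (an assumption).
    """
    if len(ranked_results) != len(gold_ids):
        raise ValueError("one gold id is required per query")
    if not gold_ids:
        return 0.0
    total = 0.0
    for results, gold in zip(ranked_results, gold_ids):
        for rank, faq_id in enumerate(results[:k], start=1):
            if faq_id == gold:
                total += 1.0 / rank  # rank 1 scores 1.0, rank 2 scores 0.5, ...
                break                # a miss inside the top k contributes 0
    return total / len(gold_ids)
```

Running this before and after each KB change gives a single number to compare, which is how a move such as 0.87 to 0.91 can be tracked.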
Week 4 — degradation plan#
3-tier degradation:
| Trigger | Action |
|---|---|
| Dify QPS > 80 | Switch to DeepSeek-V3 (cheaper than Qwen-72B) |
| Dify QPS > 150 | Enable hot-FAQ cache (Redis top-100) |
| Dify QPS > 250 | Fall back to AnythingLLM (slightly lower quality, never down) |
Each tier raises an automatic alert and is activated only after human confirmation.
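The tier table can be sketched as a threshold lookup with a confirmation step. This is only an illustration of the logic: the action names and the `confirm` callback are hypothetical, and it assumes tiers are cumulative (crossing a higher threshold also proposes the lower tiers), which the notes do not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tier:
    threshold_qps: float
    action: str  # hypothetical action identifier


# Thresholds taken from the table above.
TIERS = (
    Tier(80, "switch_to_deepseek_v3"),
    Tier(150, "enable_hot_faq_cache"),
    Tier(250, "fallback_to_anythingllm"),
)


def proposed_actions(dify_qps: float) -> List[str]:
    """Return every tier whose threshold the current QPS exceeds."""
    return [t.action for t in TIERS if dify_qps > t.threshold_qps]


def activate(dify_qps: float, confirm: Callable[[str], bool]) -> List[str]:
    """Alert on each proposed tier; apply only those a human confirms."""
    return [action for action in proposed_actions(dify_qps) if confirm(action)]
```

Keeping the proposal and the confirmation separate lets an on-call engineer activate tiers in whatever order the situation calls for.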
Week 5 — rate limits + circuit breakers#
Multiple limit layers were added in n8n:
- Per IP: 30 req/min (anti-bot)
- Per user_id: 10 req/min
- Global Dify QPS: 300
- Business API (order lookup): 200 req/s; requests over the limit are queued
- Monthly LLM token cap: switches to cache-only mode once reached
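The per-IP and per-user limits can be illustrated with a sliding-window counter. The team built these layers in n8n; the Python below only sketches the logic under that assumption, and `SlidingWindowLimiter` and `admit` are hypothetical names. State is kept in process memory, whereas a multi-node deployment would need a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


PER_IP = SlidingWindowLimiter(limit=30, window_s=60)    # anti-bot layer
PER_USER = SlidingWindowLimiter(limit=10, window_s=60)  # per user_id layer


def admit(ip: str, user_id: str) -> bool:
    """Check the IP layer first; a request rejected by the per-user
    layer has still been counted against its IP."""
    return PER_IP.allow(ip) and PER_USER.allow(user_id)
```

The injectable `clock` is there so the window behavior can be exercised without waiting a real minute.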
Week 6 — dry run#
Real-world simulation:
- All-day load test Monday (10k concurrent for 4 hours)
- Intentionally drop a Dify replica, verify auto-recovery
- Fail over Postgres to the secondary
- Practice “all-agent alert” runbook
Found 3 issues, fixed all.
Promo-day numbers#
Traffic curve#
00:00 - 09:00: slow climb, 10k cumulative
09:00: opening, instant 500 QPS
09:00 - 09:30: peak 1,200 concurrent
09:30 - 12:00: plateau, 500-800 concurrent
12:00 - 18:00: afternoon waves, 300-600
18:00 - 22:00: second peak, 700
22:00 - 24:00: tapering
Total: 121,400 conversations.
AI performance#
| Metric | Target | Actual |
|---|---|---|
| AI deflection | > 70% | 81% |
| First response (avg) | < 5s | 1.8s |
| First response P95 | < 8s | 4.2s |
| First response P99 | < 15s | 9.8s |
| Handoff rate | < 30% | 19% |
| Downtime | 0 | 0 |
All targets beaten.
Degradation actually fired#
- 09:05: hot-FAQ cache activated (25 minutes earlier than expected)
- 09:13: DeepSeek-V3 replaced Qwen-72B (held for 47 minutes)
- AnythingLLM fallback never triggered
CSAT during degradation dipped from 4.4 to 4.2, a small drop that users largely did not notice.
Day-of incidents#
09:14 — Chatwoot Sidekiq OOM#
Symptom: ticket creation slowed; Sidekiq queue backed up.
Cause: the Sidekiq default queue was also handling n8n webhooks, and some webhooks were slow.
Action: raised Sidekiq concurrency from 25 to 60; the queue recovered within 2 minutes.
Follow-up: run a dedicated Sidekiq process for webhooks so the main process handles tickets only.
13:42 — LLM hallucinated promo codes#
Symptom: agents flagged 5 AI replies containing fake promo codes.
Action: added an n8n filter: any code in an AI reply must appear in {{valid_codes}}; otherwise it is replaced with “please contact support”.
Follow-up: prompt constraint “use only codes from {{valid_codes}}, never invent” plus post-validation.
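The post-validation step might look like the sketch below. It is not the team's n8n filter: the code format (6 to 12 uppercase letters and digits with at least one digit) is an assumption made up for illustration, as is the choice to replace the whole reply rather than just the offending code.

```python
import re
from typing import Set

# Hypothetical promo-code shape: 6-12 uppercase letters/digits, at least one digit.
# The real pattern would depend on how the business issues its codes.
CODE_PATTERN = re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,12}\b")
FALLBACK = "please contact support"


def validate_reply(reply: str, valid_codes: Set[str]) -> str:
    """Return the reply unchanged if every code-like token is whitelisted;
    otherwise return the fallback message instead of the reply."""
    found = CODE_PATTERN.findall(reply)
    if all(code in valid_codes for code in found):
        return reply
    return FALLBACK


valid = {"CODE618A"}
print(validate_reply("Use CODE618A at checkout.", valid))  # reply passes through
print(validate_reply("Try SAVE50NOW today.", valid))       # unknown code, fallback
```

Checking the output after generation matters because a prompt constraint alone lowers the odds of an invented code but cannot rule it out.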
19:23 — Cloudflare Turnstile false-positives#
Symptom: users reported that the chat box wouldn’t open.
Cause: Turnstile “auto-challenge” was too aggressive on some mobile browsers.
Action: switched to “invisible” mode.
Follow-up: show an explicit fallback message when Turnstile blocks the chat box.
Customer feedback#
Survey of 500 customers:
- 68% “AI support better than expected”
- 22% “same as before”
- 7% “obviously a machine” (mostly about not recommending courses)
- 3% complained “too fast, felt dismissive” (rare but real)
5 most important engineering lessons#
- Load test at least 4 weeks ahead: that leaves time to find and fix bottlenecks
- Degrade, don’t break: slightly lower quality beats a total outage
- Cache is cheaper than scale: the top-100 cache absorbed 60% of LLM calls
- Defensive prompt engineering: whitelist any field the LLM might invent
- Plan for humans failing too: rehearse a pre-agreed runbook for the “all-agent alert”
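To make the cache lesson concrete, here is a minimal sketch of a hot-FAQ lookup placed in front of the LLM. The team used Redis for the top-100 entries; a plain dict stands in for it here, and the exact-match-after-normalization key, the class name, and the `llm` callable are all assumptions for illustration.

```python
import re
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    near-identical phrasings share one cache key."""
    cleaned = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(cleaned.split())


class HotFaqCache:
    """Serve precomputed answers for hot questions; call the LLM only on a miss."""

    def __init__(self, answers: Dict[str, str], llm: Callable[[str], str]):
        self._answers = {normalize(q): a for q, a in answers.items()}
        self._llm = llm
        self.hits = 0
        self.misses = 0

    def answer(self, question: str) -> str:
        cached = self._answers.get(normalize(question))
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        return self._llm(question)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking `hit_rate()` is what makes a figure like "absorbed 60% of LLM calls" measurable during the event.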
Cost vs normal#
| Item | Normal | Promo day |
|---|---|---|
| LLM tokens | $40 | $580 (about 14× rather than 48×, thanks to the cache) |
| Infra scale | — | $200 (K8s burst) |
| Engineering overtime | — | $1,200 (2 people × $300 × 2 days) |
| Day total | — | ~$2,000 |
Against the conservatively estimated ¥800,000 GMV uplift from the 121,400 conversations, the ~$2,000 day total works out to an ROI of about 56×.
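The arithmetic behind these figures can be checked in a few lines. Costs are in US dollars and the uplift in yuan, so a conversion is needed; the rate of ¥7.2 per dollar used below is an assumption, chosen because it reproduces the stated ratio.

```python
# Promo-day costs from the table, in USD.
promo_costs_usd = {
    "llm_tokens": 580,
    "infra_scale": 200,
    "engineering_overtime": 2 * 300 * 2,  # 2 people x $300 x 2 days
}
total_usd = sum(promo_costs_usd.values())

# LLM spend grew far less than traffic because the cache absorbed many calls.
token_ratio = promo_costs_usd["llm_tokens"] / 40  # normal-day spend is $40

CNY_PER_USD = 7.2  # assumed exchange rate, not stated in the notes
gmv_uplift_usd = 800_000 / CNY_PER_USD
roi = gmv_uplift_usd / total_usd

print(f"total ${total_usd:,}, tokens {token_ratio}x, ROI {roi:.0f}x")
# prints: total $1,980, tokens 14.5x, ROI 56x
```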
Next time#
- True queueing — let users take a number rather than wait open-endedly
- Mobile push — “close chat, ping me when reply lands”
- LLM-as-judge KB warmup 1 week ahead