Published Fri Apr 24 2026 08:00:00 GMT+0800 (China Standard Time)
case study · education · promo day · load test

Case · An online-edu promo day with 50× traffic — engineering notes

A coding-education company handled 120k conversations in 24 hours during the 618 sale, with AI deflection at 81%. These notes cover how engineering load-tested in advance, rate-limited, and degraded gracefully.

Background#

  • Business: adult online coding education
  • Avg monthly conversations: ~25k
  • 618 promo day: ~120k (48× normal)
  • Team: 12 support agents + 2 engineers
  • Stack: Chatwoot + Dify + FastGPT + n8n (an instance of the education industry playbook)

Pre-promo worries#

The system runs fine under normal load, but promo days have broken it before. Past incidents:

  • 2024 June: support system down 4 hours
  • 2025 Nov: Dify retrieval latency 200ms → 8s
  • 2025 June: human agents pile up; avg response 40 minutes

Targets for the first AI-powered promo day:

  1. AI deflection > 70%
  2. First response stable < 5s
  3. Zero downtime
  4. Human handoff < 30%

6-week prep#

Week 1 — baseline load test#

Locust simulating users:

| Concurrent | Holds? |
| --- | --- |
| 500 | |
| 1,000 | |
| 2,000 | ⚠ FRT 5s |
| 5,000 | ✗ Dify worker OOM |
| 10,000 | ✗ Postgres connections exhausted |

Target: stable at 10,000 concurrent.

Week 2 — infra scale#

  • Chatwoot: 1 node → 3 nodes (K8s HPA)
  • Dify workers: 2 → 12 replicas
  • Postgres: 8C/16G + PgBouncer
  • Redis: 16GB
  • Front with Cloudflare (CDN + Turnstile)

Week 3 — KB refactor#

The normal KB has 800+ FAQs; the promo introduced new topics: “flash sales,” “group buying,” and “promo codes.”

  • Added 200+ promo-specific FAQs
  • Eval-set regression: MRR@5 0.87 → 0.91
  • Added a “no match → template reply + handoff” fallback
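For reference, MRR@5 and the no-match fallback can be sketched as below. This is a minimal illustration, not the team's eval harness: the FAQ ids, the `min_score` threshold, and the template wording are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical template; the real wording is not given in these notes.
FALLBACK_REPLY = "I could not find an answer to that. Connecting you to a human agent."


def mrr_at_k(ranked_ids: Sequence[Sequence[str]], gold_ids: Sequence[str], k: int = 5) -> float:
    """Mean reciprocal rank, counting only hits inside the top k results."""
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("one gold FAQ id is required per query")
    if not gold_ids:
        return 0.0
    total = 0.0
    for results, gold in zip(ranked_ids, gold_ids):
        for rank, faq_id in enumerate(results[:k], start=1):
            if faq_id == gold:
                total += 1.0 / rank  # first hit only
                break
    return total / len(gold_ids)


def reply_or_handoff(answer: Optional[str], score: float, min_score: float = 0.6) -> Tuple[str, bool]:
    """Return (reply, handoff). A missing or weak match yields the template plus a handoff."""
    if answer is None or score < min_score:
        return FALLBACK_REPLY, True
    return answer, False


# Three eval queries: hit at rank 1, hit at rank 2, no hit in the top 5.
runs = [["faq_12", "faq_7"], ["faq_3", "faq_9", "faq_1"], ["faq_4"]]
gold = ["faq_12", "faq_9", "faq_8"]
score = mrr_at_k(runs, gold)  # (1 + 1/2 + 0) / 3
```

Running the same eval set before and after a KB change gives a regression number comparable to the 0.87 → 0.91 above.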

Week 4 — degradation plan#

3-tier degradation:

| Trigger | Action |
| --- | --- |
| Dify QPS > 80 | Switch to DeepSeek-V3 (cheaper than Qwen-72B) |
| Dify QPS > 150 | Enable hot-FAQ cache (Redis top-100) |
| Dify QPS > 250 | Fall back to AnythingLLM (slightly lower quality, never down) |

Each tier raises an automatic alert and is activated only after a human confirms it.
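The alert-then-confirm flow can be sketched roughly as follows. The thresholds and actions come from the tiers above; the class and method names are illustrative assumptions, and the real wiring (alert channel, who confirms) is not described in these notes.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Tier:
    level: int
    qps_threshold: int
    action: str


# Thresholds and actions taken from the degradation plan.
TIERS = (
    Tier(1, 80, "switch to DeepSeek-V3"),
    Tier(2, 150, "enable hot-FAQ cache (Redis top-100)"),
    Tier(3, 250, "fall back to AnythingLLM"),
)


class DegradationController:
    """Hypothetical controller: crossing a threshold only alerts; a human activates."""

    def __init__(self) -> None:
        self.pending: Set[int] = set()  # alerted, waiting for confirmation
        self.active: Set[int] = set()   # confirmed by a human

    def observe(self, dify_qps: float) -> List[Tier]:
        """Return tiers that newly crossed their threshold and need an alert."""
        alerts = []
        for tier in TIERS:
            crossed = dify_qps > tier.qps_threshold
            if crossed and tier.level not in self.pending | self.active:
                self.pending.add(tier.level)
                alerts.append(tier)
        return alerts

    def confirm(self, level: int) -> bool:
        """Activate a tier, but only if it was alerted first."""
        if level not in self.pending:
            return False
        self.pending.remove(level)
        self.active.add(level)
        return True
```

Because confirmation is per tier, an operator can activate a higher tier before a lower one.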

Week 5 — rate limits + circuit breakers#

Multiple limit layers were added in n8n:

  • Per IP: 30 req/min (anti-bot)
  • Per user_id: 10 req/min
  • Global Dify QPS: 300
  • Business API (order lookup): 200 req/s; requests over the limit are queued
  • Monthly LLM token cap: enforces cache-only mode when hit
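The limits themselves lived in n8n. As a language-neutral illustration, the per-IP, per-user, and global layers could be expressed with a sliding-window limiter like the sketch below; the class and the `admit` helper are assumptions, and only the numeric limits come from the list above.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


per_ip = SlidingWindowLimiter(limit=30, window_s=60.0)       # anti-bot
per_user = SlidingWindowLimiter(limit=10, window_s=60.0)
global_dify = SlidingWindowLimiter(limit=300, window_s=1.0)  # global Dify QPS


def admit(ip: str, user_id: str) -> bool:
    """Check layers from cheapest to most shared; stop at the first refusal.

    Simplification: a request refused by a later layer still counts against
    the earlier layers it already passed.
    """
    return per_ip.allow(ip) and per_user.allow(user_id) and global_dify.allow("dify")
```

A single-process deque is enough for a sketch. With several replicas, the counters would need shared storage such as Redis.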

Week 6 — dry run#

Real-world simulation:

  • All-day load test Monday (10k concurrent for 4 hours)
  • Intentionally drop a Dify replica, verify auto-recovery
  • Failover Postgres to secondary
  • Practice “all-agent alert” runbook

Found 3 issues, fixed all.

Promo-day numbers#

Traffic curve#

00:00 - 09:00: slow climb, 10k cumulative
09:00: opening, instant 500 QPS
09:00 - 09:30: peak 1,200 concurrent
09:30 - 12:00: plateau, 500-800 concurrent
12:00 - 18:00: afternoon waves, 300-600
18:00 - 22:00: second peak, 700
22:00 - 24:00: tapering

Total: 121,400 conversations.

AI performance#

| Metric | Target | Actual |
| --- | --- | --- |
| AI deflection | > 70% | 81% |
| First response (avg) | < 5s | 1.8s |
| First response P95 | < 8s | 4.2s |
| First response P99 | < 15s | 9.8s |
| Handoff rate | < 30% | 19% |
| Downtime | 0 | 0 |

All targets beaten.
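As a sketch of how the first-response figures could be derived from raw samples, the snippet below uses the nearest-rank percentile definition. That definition is an assumption, since the notes do not say how P95 and P99 were calculated, and the sample data is made up. Only the target values come from the table.

```python
import math
from statistics import mean
from typing import Dict, Sequence

# First-response targets in seconds.
TARGETS_S = {"avg": 5.0, "p95": 8.0, "p99": 15.0}


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_s: Sequence[float]) -> Dict[str, float]:
    return {
        "avg": mean(samples_s),
        "p95": percentile(samples_s, 95),
        "p99": percentile(samples_s, 99),
    }


def meets_targets(summary: Dict[str, float]) -> bool:
    return all(summary[name] < limit for name, limit in TARGETS_S.items())


# Made-up samples: 90 fast replies, 8 slower ones, 2 slow outliers.
samples = [1.0] * 90 + [4.0] * 8 + [9.0] * 2
summary = summarize(samples)
```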

Degradation actually fired#

  • 09:05 — hot-FAQ cache activated (25 min ahead of expected)
  • 09:13 — DeepSeek replacing Qwen (held 47 min)
  • AnythingLLM fallback never triggered

CSAT during degradation dipped from 4.4 to 4.2, a drop users were unlikely to notice.

Day-of incidents#

09:14 — Chatwoot Sidekiq OOM#

Symptom: ticket creation slowed; Sidekiq queue backed up.

Cause: the default Sidekiq queue was also handling n8n webhooks, and some webhooks were slow.

Action: raised Sidekiq concurrency from 25 to 60; recovered in 2 minutes.

Follow-up: dedicated Sidekiq process for webhooks; main process handles tickets only.

13:42 — LLM hallucinated promo codes#

Symptom: agents flagged 5 AI replies containing fake promo codes.

Action: n8n filter — any code in AI replies must be in {{valid_codes}}; replace with “please contact support” if not.

Follow-up: Prompt constraint “use only codes from {{valid_codes}}, never invent” + post-validation.
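The post-validation step can be sketched as below. The actual filter ran in n8n, so this Python version is illustrative only. The code format (6 to 12 uppercase letters or digits with at least one digit) and the choice to replace the whole reply rather than just the code are assumptions.

```python
import re
from typing import Set, Tuple

# Hypothetical code format: 6-12 uppercase letters/digits containing at least one digit.
CODE_PATTERN = re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,12}\b")

SAFE_REPLY = "Please contact support for current promo codes."


def filter_promo_codes(reply: str, valid_codes: Set[str]) -> Tuple[str, bool]:
    """Return (reply, replaced). Any code outside the whitelist replaces the whole reply."""
    found = CODE_PATTERN.findall(reply)
    if any(code not in valid_codes for code in found):
        return SAFE_REPLY, True
    return reply, False


valid = {"SALE618A"}
ok = filter_promo_codes("Use SALE618A at checkout.", valid)
blocked = filter_promo_codes("Try CODE2025X for 50% off.", valid)
```

Replacing the whole reply is the conservative choice: a sentence built around an invented code is likely wrong in other ways too.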

19:23 — Cloudflare Turnstile false-positives#

Symptom: users reported chat box wouldn’t open.

Cause: Turnstile “auto-challenge” too aggressive on some mobile browsers.

Action: switched to “invisible” mode.

Follow-up: explicit fallback messaging for Turnstile.

Customer feedback#

Survey of 500 customers:

  • 68% “AI support better than expected”
  • 22% “same as before”
  • 7% “obviously a machine” (mostly about not recommending courses)
  • 3% complained “too fast, felt dismissive” (rare but real)

5 most important engineering lessons#

  1. Load test 4 weeks ahead — time to find bottlenecks
  2. Degrade, don’t break — slightly lower quality > total outage
  3. Cache is cheaper than scale — top-100 cache absorbed 60% of LLM calls
  4. Defensive prompt engineering — whitelist any field the LLM might invent
  5. Practice “humans also fail” — pre-agreed runbook for “all-agent alert”
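Lesson 3 in sketch form: a hot-FAQ cache answers the most frequent questions without calling the LLM. A plain dict stands in for Redis here, and exact matching on normalized text is a simplification. The function names are illustrative, and how the production cache matched questions is not described in these notes.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a cache key."""
    return " ".join(question.lower().split())


def build_hot_cache(question_log: Iterable[str],
                    answer_for: Callable[[str], str],
                    top_n: int = 100) -> Dict[str, str]:
    """Precompute answers for the top_n most frequent questions in the log."""
    counts = Counter(normalize(q) for q in question_log)
    return {q: answer_for(q) for q, _ in counts.most_common(top_n)}


def answer(question: str, cache: Dict[str, str],
           call_llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (reply, served_from_cache); only cache misses reach the LLM."""
    key = normalize(question)
    if key in cache:
        return cache[key], True
    return call_llm(question), False


log = ["How do I pay?", "how do I  pay?", "Refund policy", "Is there a group buy?"]
cache = build_hot_cache(log, answer_for=lambda q: f"answer:{q}", top_n=1)
```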

Cost vs normal#

| Item | Normal | Promo day |
| --- | --- | --- |
| LLM tokens | $40 | $580 (14× thanks to cache, not 48×) |
| Infra scale | | $200 (K8s burst) |
| Engineering overtime | | $1,200 (2 people × $300 × 2 days) |
| Day total | | ~$2,000 |

Against the GMV uplift from 121,400 conversations (conservatively ¥800,000), the ROI is roughly 56×.
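The arithmetic behind these figures can be checked with a short sketch. The cost items and GMV come from the notes. The exchange rate does not appear in them, so the 7.1 CNY per USD below is an assumption, chosen because it roughly reproduces the stated ROI.

```python
# Promo-day cost items in USD.
PROMO_COSTS_USD = {
    "llm_tokens": 580,
    "infra_scale": 200,
    "engineering_overtime": 1200,  # 2 people x $300 x 2 days
}
NORMAL_LLM_TOKENS_USD = 40
GMV_UPLIFT_CNY = 800_000

# Assumption: not stated in the notes.
CNY_PER_USD = 7.1

day_total_usd = sum(PROMO_COSTS_USD.values())
token_multiple = PROMO_COSTS_USD["llm_tokens"] / NORMAL_LLM_TOKENS_USD
gmv_uplift_usd = GMV_UPLIFT_CNY / CNY_PER_USD
roi = gmv_uplift_usd / day_total_usd

print(f"day total: ${day_total_usd}")
print(f"LLM token multiple: {token_multiple:.1f}x")
print(f"ROI: {roi:.1f}x")
```

The itemized total is $1,980, which the notes round to ~$2,000. Under the assumed rate the ROI lands between 56× and 57× depending on which total is used.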

Next time#

  • True queueing — let users take a number rather than wait open-endedly
  • Mobile push — “close chat, ping me when reply lands”
  • LLM-as-judge KB warmup 1 week ahead
