Published Fri May 15 2026 08:00:00 GMT+0800 (China Standard Time)
case study · e-commerce · practice

Case · A cross-border e-commerce shop running 80k daily orders: 6 months on Chatwoot + Dify

A Shopify + WhatsApp e-commerce shop migrated from Intercom to Chatwoot + Dify. Six months of real numbers — cost, deflection, satisfaction.

Numbers shared with the customer’s consent. Company name anonymized; scale, processes and figures untouched.

Background#

  • Business: B2C beauty, cross-border to US / EU / SEA
  • Channels: 3 Shopify storefronts + Amazon + WhatsApp Business
  • Daily orders: ~80,000
  • Monthly active customers: ~450,000
  • Previous SaaS: Intercom Pro + Fin AI Agent
  • Team: 10 multilingual support agents (Vietnam, Philippines, Mexico)

Pre-migration pain#

After 3 years on Intercom, four issues surfaced:

  1. Monthly bill > $4,000 — Fin AI is outcome-priced; peaks hit $4,200/mo
  2. Multi-store fragmentation — 3 storefronts + 1 Amazon; Intercom workspace isolation forced agent switching
  3. WhatsApp surcharge — Intercom WhatsApp adds another $800/mo
  4. AI not customizable — Fin accepts docs only, no business logic (e.g. “look up order before replying”)

Decision#

Evaluated November 2025:

| Option | Monthly est. | Decision |
| --- | --- | --- |
| Stay on Intercom | $4,200 | × Too expensive |
| Switch to Zendesk + AI | $3,500 | × Expensive and vendor-locked |
| Switch to Chatwoot + Dify | $250-400 | ✓ (chosen) |
| Build in-house | $50 infra + 3 person-months | × Not worth it |

Architecture as deployed#

[Architecture diagram. Components shown, in order:]

  • Channels: Shopify × 3, Amazon API, WhatsApp Business (self-managed WaBA), Email (Postmark), Web Widget
  • Chatwoot Inbox (per-brand)
  • Dify Workflow · Agent: language detect, intent classify (pre / post / complaint), n8n tool calls, Qwen2.5-72B generate
  • n8n tool calls: Order (Shopify), Tracking (Aftership), Refund (Shopify), CRM (HubSpot)
  • Human agent
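The flow in the diagram (language detect, intent classify, tool calls, generation, human handoff) can be sketched as a plain function chain. This is a minimal illustration, not the team's Dify workflow: the `Turn` type, the `run_pipeline` function, and the rule that complaints and unknown intents go straight to a human are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Intent labels taken from the diagram's "pre / post / complaint".
INTENTS = ("pre", "post", "complaint")


@dataclass
class Turn:
    """Hypothetical container for one customer message and its outcome."""
    text: str
    language: str = "en"
    intent: str = "pre"
    tool_data: Dict[str, str] = field(default_factory=dict)
    reply: Optional[str] = None
    handoff: bool = False


def run_pipeline(
    text: str,
    detect_language: Callable[[str], str],
    classify_intent: Callable[[str], str],
    call_tools: Callable[[str, str], Dict[str, str]],
    generate: Callable[[Turn], str],
) -> Turn:
    """Run one message through detect -> classify -> tools -> generate."""
    turn = Turn(text=text)
    turn.language = detect_language(text)
    turn.intent = classify_intent(text)
    if turn.intent not in INTENTS or turn.intent == "complaint":
        # Assumption: complaints and unknown intents skip the AI leg.
        turn.handoff = True
        return turn
    # Tool calls (order, tracking, refund, CRM) run before generation,
    # so the model replies from looked-up data rather than from memory.
    turn.tool_data = call_tools(turn.intent, text)
    turn.reply = generate(turn)
    return turn
```

Passing the four stages in as callables keeps the sketch testable with stubs; in the deployed system those stages are Dify nodes and n8n calls.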

6-month numbers#

Cost#

| Item | Old (Intercom) | New (Chatwoot + Dify) |
| --- | --- | --- |
| Platform | $1,990 (5 seats) | $0 |
| Fin AI (outcome) | $2,210 | $0 |
| WhatsApp | $800 | $200 (direct WaBA) |
| LLM tokens (DeepSeek + Qwen) | | $180 |
| VPS / DB | | $120 |
| Total / mo | $5,000 | $500 |
| Annual savings | | $54,000 |
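The totals follow directly from the line items. A short check of the arithmetic, using only the figures in the cost table (the dictionary keys are labels chosen for this example):

```python
# Monthly line items in USD, copied from the cost table.
old_stack = {"platform": 1990, "fin_ai": 2210, "whatsapp": 800}
new_stack = {"platform": 0, "whatsapp": 200, "llm_tokens": 180, "vps_db": 120}

old_total = sum(old_stack.values())
new_total = sum(new_stack.values())
monthly_savings = old_total - new_total
annual_savings = monthly_savings * 12
reduction_pct = 100 * monthly_savings / old_total

print(old_total, new_total, annual_savings, reduction_pct)
# prints: 5000 500 54000 90.0
```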

AI deflection#

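| Month | Conversations | AI-only | To human | Deflection |
| --- | --- | --- | --- | --- |
| 1 (ramp) | 28,400 | 11,360 | 17,040 | 40% |
| 2 | 31,200 | 18,720 | 12,480 | 60% |
| 3 | 30,800 | 21,560 | 9,240 | 70% |
| 4 (Black Friday) | 78,500 | 56,520 | 21,980 | 72% |
| 5 | 35,600 | 26,344 | 9,256 | 74% |
| 6 | 38,100 | 28,956 | 9,144 | 76% |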

Key insight: 40% → 76% took 5 months. Agents fed prompt improvements daily for the first 2 months; from month 3 the KB stabilized.
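Deflection here is AI-only conversations divided by total conversations. The sketch below recomputes the rate for each month from the table's counts and separates the gain in percentage points from the relative gain; the variable names are chosen for this example.

```python
# (month, conversations, ai_only) rows copied from the deflection table.
rows = [
    (1, 28_400, 11_360),
    (2, 31_200, 18_720),
    (3, 30_800, 21_560),
    (4, 78_500, 56_520),
    (5, 35_600, 26_344),
    (6, 38_100, 28_956),
]

# Deflection rate in percent: share of conversations closed without a human.
rates = {month: 100 * ai_only / total for month, total, ai_only in rows}

# Conversations that still reached a human agent.
to_human = {month: total - ai_only for month, total, ai_only in rows}

point_gain = rates[6] - rates[1]             # percentage points
relative_gain = 100 * point_gain / rates[1]  # percent of the starting rate

print(rates[1], rates[6], point_gain, relative_gain)
# prints: 40.0 76.0 36.0 90.0
```

The 36-point rise and the 90% relative rise describe the same change, so it is worth stating which one a headline number uses.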

Customer satisfaction#

| Metric | Before (Intercom Fin) | After (Chatwoot + Dify) |
| --- | --- | --- |
| First response (FRT) | 12 s | 2.4 s |
| Avg handling | 4 m 18 s | 2 m 02 s |
| CSAT (AI leg) | 4.1 / 5 | 4.3 / 5 |
| CSAT (human leg) | 4.5 / 5 | 4.6 / 5 |
| NPS | +28 | +34 |

Surprise: CSAT rose. We attribute it to longer, more detailed replies (agents tuned the prompt to “explain”), where Fin was terse.

Business impact#

  • Pre-sales conversion (inquiry → order): 8.2% → 10.5%
  • Refund rate: 3.1% → 2.4% (AI offers alternatives before refund)
  • Agent count: 10 → 6 (4 moved to ops / content)

Scars#

1. WhatsApp template approvals#

Meta pre-approves outbound templates. In the first 2 weeks our templates were rejected 14 times: urgency words (“now”, “immediately”) were flagged as marketing, and Chinese templates were approved more slowly than English ones.

Fix: hew to Meta’s template examples verbatim.
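A cheap local pre-check can catch the urgency words before a template is submitted. The sketch below is a heuristic only and does not reproduce Meta's review logic. The word list holds just the two words reported above, and `find_urgency_words` is a hypothetical helper name.

```python
import re
from typing import List

# The two words this case study reports as flagged; extend the tuple
# with words from your own rejection notices.
URGENCY_WORDS = ("now", "immediately")


def find_urgency_words(template: str) -> List[str]:
    """Return flagged words that appear as whole words, ignoring case."""
    found = []
    for word in URGENCY_WORDS:
        pattern = rf"\b{re.escape(word)}\b"
        if re.search(pattern, template, flags=re.IGNORECASE):
            found.append(word)
    return found
```

For example, `find_urgency_words("Order NOW, {{1}}!")` returns `["now"]`, while a template containing "knowledge" is not flagged because the match is on whole words.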

2. Multilingual KB drift#

We started with 4 language-specific KBs. Updating policy meant 4 edits.

Fix: single English KB + prompt-driven translation. See cross-border playbook.
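Prompt-driven translation means the knowledge base stays in English and the prompt tells the model which language to answer in. A minimal sketch of such a prompt builder follows. The wording and the `build_system_prompt` name are assumptions for illustration, not the team's actual prompt.

```python
from typing import List


def build_system_prompt(customer_language: str, kb_snippets: List[str]) -> str:
    """Assemble a system prompt from English KB snippets and a target language."""
    kb_text = "\n".join(f"- {snippet}" for snippet in kb_snippets)
    return (
        "You are a support assistant. The knowledge base below is in English.\n"
        f"Answer the customer in {customer_language}. Translate policy text "
        "faithfully and do not add terms that are not in the knowledge base.\n"
        "Knowledge base:\n"
        f"{kb_text}"
    )
```

With this shape, a policy change is one edit to the English snippets, and every language picks it up on the next reply.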

3. Qwen2.5-72B hallucinations#

~0.5% of replies invented order IDs / amounts.

Fix:

  1. Prompt rule: “Use only {{order_amount}}, never rewrite”
  2. LLM-as-judge sampler — 100 replies/day scored on faithfulness
  3. Hallucinations fed into a negative-example set; weekly regression
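The prompt rule can be backed by a deterministic check that runs before a reply is sent. The sketch below is a regex-based stand-in, not the LLM-as-judge sampler described above. It flags any dollar amount or order ID in a reply that does not appear in the order record. The `#1001`-style ID format and the record field names are assumptions.

```python
import re
from typing import Dict, List

# Dollar amounts such as "$42.50" or "$1,000".
AMOUNT_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")
# Assumed order ID format: "#" followed by digits, e.g. "#1001".
ORDER_ID_RE = re.compile(r"#\d+")


def unfaithful_tokens(reply: str, order: Dict[str, str]) -> List[str]:
    """Return amounts and order IDs in the reply that the order record lacks."""
    allowed = set(order.values())
    candidates = AMOUNT_RE.findall(reply) + ORDER_ID_RE.findall(reply)
    return [token for token in candidates if token not in allowed]
```

A non-empty result is a signal to block the reply or route it to a human; flagged replies could also feed the negative-example set used in the weekly regression.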

4. Black Friday stress#

Traffic 50× normal. Day 1 Dify workers blew up.

Fix:

  • Locust load test a week ahead
  • Worker replicas 2 → 8
  • Postgres index: messages(conversation_id, created_at)
  • Redis memory 256 MB → 2 GB
  • Critical workflows got a “graceful degrade” branch: 5-second API timeout → fallback answer
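The graceful-degrade branch amounts to wrapping a slow dependency in a timeout and returning a canned answer when it fires. The source does not show how the branch is built in Dify, so the sketch below illustrates the pattern in plain Python. The fallback wording, the pool size, and `call_with_fallback` are assumptions.

```python
import concurrent.futures
import time
from typing import Callable

# Hypothetical fallback wording; the real text is not given in the source.
FALLBACK_ANSWER = "We are checking your order and will reply shortly."

# A shared pool so each call does not pay thread start-up cost.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_fallback(
    fn: Callable[[], str],
    timeout_s: float = 5.0,
    fallback: str = FALLBACK_ANSWER,
) -> str:
    """Run fn in a worker thread; return fallback on timeout or error."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        return fallback
    except Exception:
        # Any upstream failure degrades to the same canned answer.
        return fallback
```

The slow call keeps running in its worker thread after the timeout, so the underlying HTTP client should carry its own timeout too, otherwise stuck calls can exhaust the pool.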

Still improving#

  • Tiered KB by customer segment (VIPs see “premium” KB)
  • Cross-channel conversation memory (WhatsApp + Email + Widget per customer)
  • Refund-decision LLM-as-judge classifier

Bottom line#

Headlines:

  • -90% cost ($5,000 → $500)
  • +90% relative deflection (40% → 76%)
  • -80% FRT (12s → 2.4s)
  • -40% agents (10 → 6)
  • +6 NPS (28 → 34)

Lessons:

  1. Don’t expect best numbers in week 1 — budget 3 months of ramp
  2. KB iteration beats model upgrades (6 months, no model swap, KB-only)
  3. n8n is the underrated business-side glue (see n8n orchestrated solution)
  4. LLM-as-judge is mandatory — without it you won’t know when prompts degrade
