Case · A securities firm shipped fully local AI support: the compliance path
No data egress, regulator approval, internal and external audits, a DR plan: the real compliance journey behind a mid-sized securities brokerage's fully local AI support system.
Background
- Business: mid-sized securities brokerage
- AUM: ~¥120B
- Onboarded customers: ~1.2M
- Support agents: 85 (3 cities)
- Regulator: CSRC, with front-line supervision by its local office
- Previous setup: human-only support plus a traditional call-center vendor
Why AI now
Four drivers:
- Support pressure: market-open peak hits 800+ inquiries / minute
- Audit burden: existing audit trails are too coarse; the annual external audit takes 8-12 weeks
- Training cost: new agents take 3 months to ramp up; 70% retention
- Peer pressure: leading brokerages have launched AI support, and customers notice
Blocker: cloud LLMs are not allowed, since CSRC requires customer dialogue data to stay on-shore.
Selection
Evaluated five options:
| Option | Decision |
|---|---|
| Cloud LLM | × Compliance dead end |
| Outsource to a top-3 brokerage’s IT | × Cross-firm data exposure + opacity |
| Build a dialogue model in-house | × 12-18 months, capability gap |
| On-prem open-source LLM + framework | ✓ |
| On-prem commercial LLM | ✓ but 3-5× more expensive |
Final stack: Rasa CALM + Chatwoot + RAGFlow + Qwen2.5-72B on-prem (a finance variant of the fully local solution).
Architecture
Infrastructure: inference on 8 × A100 80GB served by vLLM; Postgres HA with off-site backup; logs written to WORM storage with blockchain attestation.
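The post doesn't say how the blockchain attestation works. A common pattern, shown here as a minimal sketch under that assumption, is a hash-chained log: each entry commits to the hash of its predecessor, so only the latest head digest needs to be anchored to an external ledger. The class and field names are hypothetical, and WORM storage and the ledger itself are out of scope.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys) so identical records always hash identically.
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditChain:
    """Append-only log in which every entry commits to the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: list = []

    def head(self) -> str:
        # Latest digest: the value one would periodically anchor to an external ledger.
        return self.entries[-1]["hash"] if self.entries else GENESIS

    def append(self, record: dict) -> str:
        prev = self.head()
        digest = _digest(prev, record)
        self.entries.append({"prev": prev, "record": record, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every link; an edited, dropped, or reordered entry breaks the chain.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```

Anchoring only the head digest keeps the external ledger write to one small value per interval, while any later edit to a stored conversation still becomes detectable.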
Compliance journey (11 months)
Month 1-2: kickoff
- IT presents proposal
- Risk, compliance, legal review
- CRO signs off
Month 3-4: architecture justification
- Model selection rationale (why Qwen)
- Security architecture (traffic, permissions, encryption)
- DR architecture (3 sites, 5 replicas)
Month 5: pre-filing chat with the regulator
- Informal consultation with the local CSRC office
- Submitted the “AI System Plan,” “Data Security Statement,” and “Compliance Undertaking”
- Received “no objection”
Month 6-7: build
- Hardware in place (A100 × 8, PG HA, network)
- Base deployment
- Fine-tune Qwen on internal corpus (~3 weeks)
Month 8: internal UAT + audit
- Risk team joins UAT across 1,000 test cases
- 12 edge-case issues found (e.g. “recommend a stock”); all are now gated by Rasa flows
- Internal audit signs off
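A minimal sketch of the gating idea, assuming the NLU step emits an intent label before any LLM draft is used. The intent names, refusal text, and disclaimer wording are hypothetical, and in the deployed system the gate lives in Rasa flows rather than standalone Python. It also enforces the “not investment advice” disclaimer that compliance required (see Scars).

```python
# Hypothetical wording; the real disclaimer text was set by the compliance team.
DISCLAIMER = "This is general information, not investment advice."
REFUSAL = "I can't recommend specific securities. A licensed advisor can help with that."

# Hypothetical intent labels for requests that must never be answered by the model.
BLOCKED_INTENTS = {"recommend_stock", "predict_price", "guarantee_return"}


def gate_reply(intent: str, draft_reply: str) -> tuple[str, bool]:
    """Return (reply, escalate_to_human).

    Blocked intents get a fixed refusal and the LLM draft is discarded;
    every other reply is guaranteed to carry the disclaimer.
    """
    if intent in BLOCKED_INTENTS:
        return f"{REFUSAL}\n{DISCLAIMER}", True
    if DISCLAIMER not in draft_reply:
        draft_reply = f"{draft_reply}\n{DISCLAIMER}"
    return draft_reply, False
```

The key design choice is that a blocked intent never lets model-generated text through, so the outcome does not depend on how well the prompt behaves.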
Month 9: regulator on-site
- CSRC office on-site for 3 days
- Focus: data egress, audit-log completeness, kill switch
- 2 findings: log retention under 5 years; no quarterly penetration test
- 4 weeks of remediation
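One finding concerned retention. As a minimal sketch, assuming a purge job that must never delete audit logs younger than five years, the guard below computes the earliest permissible purge date. The function names are hypothetical, and treating a Feb 29 creation date as purgeable from Mar 1 is an assumption.

```python
from datetime import date

RETENTION_YEARS = 5  # minimum retention required after the regulator's finding


def earliest_purge_date(created: date, years: int = RETENTION_YEARS) -> date:
    """First date on which a log written on `created` may be purged."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year: roll to Mar 1.
        return date(created.year + years, 3, 1)


def may_purge(created: date, today: date) -> bool:
    # The purge job calls this per log batch; anything younger stays on WORM storage.
    return today >= earliest_purge_date(created)
```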
Month 10: external audit + third-party security review
- MLPS 2.0 Level 3 (China grade-3) assessment passed
- Third-party pen test passed
- ISO 27001 recert passed
Month 11: go-live
- Canary: 1% of authenticated customers
- Week 2: 5%
- Week 3: 20%
- Week 4: 100%
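The rollout needs each customer to stay in the same cohort as the percentage grows. A common way to do that, sketched here assuming the canary counts as week 1 and customers are keyed by a stable ID, is to hash the ID into one of 100 buckets. The salt and function names are hypothetical.

```python
import hashlib

# Week -> percent of authenticated customers routed to the AI (week 1 = canary, an assumption).
ROLLOUT_PCT = {1: 1, 2: 5, 3: 20, 4: 100}
SALT = "ai-support-rollout"  # hypothetical; changing it reshuffles every cohort


def bucket(customer_id: str) -> int:
    """Map a customer ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{SALT}:{customer_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def routed_to_ai(customer_id: str, week: int) -> bool:
    # Weeks before launch get 0%; anything past week 4 stays at 100%.
    pct = ROLLOUT_PCT.get(min(week, 4), 0)
    return bucket(customer_id) < pct
```

Because the percentage only grows, anyone in the 1% canary remains routed to the AI in weeks 2-4, which keeps their experience consistent and their feedback comparable across stages.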
6-month post-launch data
| Metric | Before | After |
|---|---|---|
| Monthly conversations | 480k | 520k (slight rise) |
| AI deflection | 0% | 58% |
| First response | 47 s avg | 2.3 s avg |
| Human agent hours / mo | 13,600 | 7,200 |
| Agent headcount | 85 | 60 (25 rotated to other roles) |
| CSAT | 4.1 | 4.3 |
| Complaint rate | 0.21% | 0.18% |
| External audit duration | 8-12 weeks | 4 weeks (structured logs) |
Technical decisions
1. Why Rasa, not Dify
Financial conversations leave no room for guessing. “Can I enable margin trading?” requires checking:
- Customer tier sufficient?
- Risk assessment sufficient?
- Risk disclosure signed?
- Funds account compliant?
- All yes → guide the customer into the “enable” flow
Rasa’s deterministic Flows, with the LLM used only for language understanding, fit this much better than Dify’s pure-LLM workflows.
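The four checks can be sketched as a deterministic gate in which the first failing check picks the next step. In the deployed system this branching lives in a Rasa Flow; the Python below only illustrates the logic, and all field and step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomerProfile:
    # Hypothetical fields; in production each value comes from a permission-checked business API.
    tier_ok: bool
    risk_assessment_ok: bool
    disclosure_signed: bool
    funds_account_compliant: bool


def margin_next_step(p: CustomerProfile) -> str:
    """Deterministic gate: the first failing check decides the reply path."""
    checks = [
        (p.tier_ok, "explain_tier_requirement"),
        (p.risk_assessment_ok, "start_risk_assessment"),
        (p.disclosure_signed, "present_risk_disclosure"),
        (p.funds_account_compliant, "handoff_to_agent"),
    ]
    for passed, step in checks:
        if not passed:
            return step
    return "guide_to_enable_flow"
```

The LLM's only job here is to recognize that the customer is asking about margin trading; whether they may proceed is decided by data, never by generated text.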
2. Why local Qwen instead of cloud GPT
CSRC’s data-egress red line is strict:
- OpenAI / Anthropic completely off-limits
- Azure OpenAI China-edition is theoretically possible but compliance overhead is heavy
- Domestic cloud LLM APIs (Alibaba, Tencent) are viable but data flows still need review
Local Qwen: zero egress, so the data-flow part of the audit passes without extra review.
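Zero egress is easier to audit when it is also enforced in configuration. A minimal sketch, assuming the inference endpoint is addressed by a pinned internal IP: a startup check that refuses any model URL whose host isn't a private or loopback address literal. The function names are hypothetical, and a check like this complements network-level controls rather than replacing them.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url: str) -> bool:
    """True only if the URL's host is a private or loopback IP literal.

    Hostnames (even "localhost") are rejected on purpose: a pinned IP
    cannot be redirected to a public address by a DNS change.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    return addr.is_private or addr.is_loopback


def check_model_endpoint(url: str) -> None:
    # Hypothetical startup hook: refuse to boot if the model URL could leave the network.
    if not is_internal_endpoint(url):
        raise RuntimeError(f"model endpoint {url!r} is not an internal address")
```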
3. Why no direct business-system access
Too risky: an LLM reading accounts, orders, or funds directly opens the door to privilege escalation. The design:
- LLM only sees “public KB” (rules, policy, flow)
- Account data requires a business interface (with permission check)
- Interface returns structured data → Rasa flow assembles the reply
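The three-step design can be sketched as follows. Everything here is hypothetical (the `Session` type, the in-memory account table, the field names). The point is that the permission check lives in the business interface and the reply is templated from structured fields, so no account data passes through the LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    customer_id: str
    authenticated: bool


class PermissionDenied(Exception):
    pass


# Hypothetical stand-in for the brokerage's account service.
_ACCOUNTS = {"C1001": {"margin_enabled": False, "risk_level": "C3"}}


def get_account_summary(session: Session, customer_id: str) -> dict:
    """Business interface: enforces the permission check before returning structured data."""
    if not session.authenticated or session.customer_id != customer_id:
        raise PermissionDenied("caller may only read their own account")
    return dict(_ACCOUNTS[customer_id])  # copy, so callers cannot mutate the source


def render_margin_status(summary: dict) -> str:
    """Flow-side template: the LLM neither sees nor writes these values."""
    status = "enabled" if summary["margin_enabled"] else "not enabled"
    return f"Margin trading is currently {status} on your account (risk level {summary['risk_level']})."
```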
Hardware + cost
| Item | Spec | Monthly |
|---|---|---|
| A100 80GB × 8 | ~$120k upfront, amortized over 36 months | ¥21k |
| GPU host × 2 | 64 cores / 256 GB RAM | ¥8k |
| Storage | 100TB (incl. WORM audit) | ¥6k |
| Network / security | Firewall + MLPS + monitoring | ¥5k |
| Software licenses | Rasa Pro + commercial support | ¥10k |
| 4-person ops team | — | ¥80k |
| Total | | ¥130k |
It sounds expensive, but rotating 25 agents out of support saves 25 × ¥15k/mo = ¥375k/mo, for a net saving of ¥245k/mo.
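The arithmetic behind the cost table and the savings line, as a quick check that uses only the figures stated above (all amounts in ¥k per month):

```python
# Monthly line items in ¥k, copied from the cost table above.
MONTHLY_COST_K = {
    "a100_amortization": 21,
    "gpu_hosts": 8,
    "storage": 6,
    "network_security": 5,
    "software_licenses": 10,
    "ops_team": 80,
}

AGENTS_ROTATED_OUT = 25  # headcount change after launch (85 -> 60)
COST_PER_AGENT_K = 15    # ¥k per agent per month, as stated above


def monthly_total_k() -> int:
    return sum(MONTHLY_COST_K.values())


def monthly_net_saving_k() -> int:
    # Gross saving from agents no longer staffing support, minus the system's running cost.
    return AGENTS_ROTATED_OUT * COST_PER_AGENT_K - monthly_total_k()


print(monthly_total_k(), monthly_net_saving_k())  # 130 245
```

Note that the ops team is 80 of the 130: the GPUs are not the dominant cost.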
Intangible wins
- Audit efficiency: external audit 8-12 weeks → 4 weeks, ~600 hours saved annually
- Training: agent ramp-up 3 months → 6 weeks (the AI catches what new agents miss)
- Brand: cited by the regulator as a “digitization exemplar”
- New business support: preparing support for the ETF options launch took 3 days of training instead of 2 weeks
Scars
- Compliance review rejected the first prompt for missing a “not investment advice” disclaimer; the prompts were rewritten from scratch
- Qwen 72B made errors on compound-interest math; the fix was to have Rasa call an external compute service instead
- Rasa Flows are heavier to maintain than expected: when business rules change, editing a Flow costs more than editing a prompt; the team's workflow stabilized by month 4
- A100 utilization is low: 30-40% during the day and idle at night; the team is considering training jobs to absorb the spare capacity
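The post doesn't describe the compute service. As a minimal sketch, assuming it exposes a deterministic function along these lines, compound growth A = P(1 + r/n)^(nt) computed with Python's `Decimal` avoids both LLM arithmetic and binary floating-point rounding. The function name, string inputs, and half-up rounding to cents are assumptions.

```python
from decimal import ROUND_HALF_UP, Decimal


def compound_amount(principal: str, annual_rate: str, years: int,
                    periods_per_year: int = 1) -> Decimal:
    """A = P * (1 + r/n) ** (n * t), computed in Decimal and rounded to cents.

    Inputs are strings so values like "0.05" are exact, with no binary float error.
    """
    p = Decimal(principal)
    r = Decimal(annual_rate)
    growth = (1 + r / periods_per_year) ** (periods_per_year * years)
    return (p * growth).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, `compound_amount("10000", "0.05", 2)` returns `Decimal('11025.00')`. The flow passes the returned value into a reply template, so the model never does the arithmetic itself.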
Advice for regulated peers
If you’re in a regulated industry, four cautions:
- Talk to the regulator early: start the conversation 2 months before the build
- Design the compliance architecture first: data flows, permission matrix, and audit requirements before any code
- Don’t chase the newest model: pick a stable, commercially supported stack (Qwen + Rasa Pro)
- Budget compliance time: expect 2-3 months of approvals after technical readiness, and build that into the plan