Published Fri May 01 2026 08:00:00 GMT+0800 (China Standard Time)

case study, finance, compliance, on-prem

Case · A securities firm shipped fully local AI support: the compliance path

No data egress, regulator approval, internal and external audits, and a DR plan. This is the real compliance journey behind a mid-sized securities brokerage's local AI support deployment.

Background#

Business: mid-sized securities brokerage
AUM: ~¥120B
Onboarded customers: ~1.2M
Support agents: 85 (3 cities)
Regulator: CSRC (front-line supervision by the local office)
Previous: human-only support + traditional call center vendor

Why AI now#

Four drivers:

Support pressure: market-open peak hits 800+ inquiries / minute
Audit burden: existing trails are too coarse; annual external audit takes 8-12 weeks
Training cost: new agents take 3 months to ramp up, and retention is 70%
Peer pressure: top brokerages have launched AI support, and customers notice

Blocker: no cloud LLM is allowed, because CSRC requires customer dialogue to stay on-shore.

Selection#

Evaluated 5 directions:

Option	Decision
Cloud LLM	× Compliance dead end
Outsource to a top-3 brokerage’s IT	× Cross-firm data exposure + opacity
Build a dialogue model in-house	× 12-18 months, capability gap
On-prem open-source LLM + framework	✓
On-prem commercial LLM	✓ but 3-5× more expensive

Final stack: Rasa CALM + Chatwoot + RAGFlow + Qwen 2.5-72B on-prem (a finance variant of the fully local solution).

Architecture#

Infrastructure: inference on 8 × A100 80GB (vLLM); Postgres HA + off-site backup; logs to WORM storage + blockchain attestation.
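
To make the "WORM storage + attestation" idea concrete, here is a minimal sketch of a hash-chained audit log, in which each record commits to the one before it so that any later edit is detectable. It is an illustration only and not the firm's actual implementation: the names `AuditRecord`, `append_record`, and `verify_chain` are hypothetical, and a real deployment would write the records to WORM media and anchor the latest hash in whatever attestation service is in use.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous record"


@dataclass(frozen=True)
class AuditRecord:
    seq: int
    payload: dict
    prev_hash: str
    record_hash: str


def _digest(seq: int, payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    body = json.dumps(
        {"seq": seq, "payload": payload, "prev": prev_hash},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> AuditRecord:
    """Append a record whose hash covers its payload and the previous hash."""
    prev_hash = chain[-1].record_hash if chain else GENESIS_HASH
    seq = len(chain)
    record = AuditRecord(seq, payload, prev_hash, _digest(seq, payload, prev_hash))
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash; False if any record was altered or reordered."""
    prev_hash = GENESIS_HASH
    for i, rec in enumerate(chain):
        if rec.seq != i or rec.prev_hash != prev_hash:
            return False
        if rec.record_hash != _digest(rec.seq, rec.payload, rec.prev_hash):
            return False
        prev_hash = rec.record_hash
    return True
```

An auditor only needs the final hash (attested externally) plus the stored records to confirm that nothing in the trail was rewritten after the fact.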

Compliance journey (11 months)#

Month 1-2 — kickoff#

IT presents proposal
Risk, compliance, legal review
CRO signs off

Month 3-4 — architecture justification#

Model selection rationale (why Qwen)
Security architecture (traffic, permissions, encryption)
DR architecture (3 sites, 5 replicas)

Month 5 — pre-regulator chat#

Informal contact with the local CSRC office
Submitted the “AI System Plan,” “Data Security Statement,” and “Compliance Undertaking”
Received a “no objection” response

Month 6-7 — build#

Hardware in place (A100 × 8, PG HA, network)
Base deploy
Fine-tune Qwen on internal corpus (~3 weeks)

Month 8 — internal UAT + audit#

Risk team sits in on 1,000 test cases
12 edge-case issues found (e.g. “recommend a stock”), all now gated by Rasa flows
Internal audit clears

Month 9 — regulator on-site#

CSRC office on-site for 3 days
Focus: data egress, audit log completeness, kill switch
2 findings: log retention shorter than 5 years, and no quarterly pen test
4 weeks of remediation

Month 10 — external audit + 3rd-party security#

MLPS Level 3 (China's grade-3 cybersecurity protection) passed
Third-party pen test passed
ISO 27001 recert passed

Month 11 — go-live#

Week 1 (canary): 1% of authenticated customers
Week 2: 5%
Week 3: 20%
Week 4: 100%
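
A staged rollout like this is often implemented with stable hash bucketing, so a customer admitted at 1% stays in the cohort as the percentage grows. The sketch below shows one way to do that. It is an assumption about mechanism, not a description of what the firm built: the salt, the function names, and the mapping of the canary step to week 1 are all hypothetical.

```python
import hashlib

# Rollout schedule from the go-live plan: week number -> percent of customers.
# Treating the 1% canary as week 1 is an assumption.
ROLLOUT_PERCENT = {1: 1, 2: 5, 3: 20, 4: 100}


def bucket(customer_id: str, salt: str = "ai-support-canary") -> int:
    """Map a customer to a stable bucket in [0, 10000)."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def in_canary(customer_id: str, week: int) -> bool:
    """True if this customer is routed to the AI assistant in the given week."""
    percent = ROLLOUT_PERCENT.get(week, 0)  # unknown weeks get no traffic
    return bucket(customer_id) < percent * 100
```

Because the bucket depends only on the customer ID and the salt, cohorts are nested: everyone in the 1% group is also in the 5% and 20% groups, which keeps the customer experience consistent across weeks.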

6-month post-launch data#

Metric	Before	After
Monthly conversations	480k	520k (slight rise)
AI deflection	0%	58%
First response	47 s avg	2.3 s avg
Human agent hours / mo	13,600	7,200
Agent headcount	85	60 (25 rotated to other roles)
CSAT	4.1	4.3
Complaint rate	0.21%	0.18%
External audit duration	8-12 weeks	4 weeks (structured logs)

Technical decisions#

1. Why Rasa, not Dify#

Financial conversations leave no room for guessing. Answering “Can I enable margin trading?” requires checking:

Customer tier sufficient?
Risk assessment sufficient?
Risk disclosure signed?
Funds account compliant?
All yes → guide to “enable” flow

Rasa’s combination of deterministic Flows with an LLM used for language understanding fits this much better than Dify’s pure-LLM workflows.
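
The margin-trading checklist above is the kind of logic that should be deterministic rather than left to a model. The sketch below expresses it in plain Python purely for illustration. It is not Rasa flow syntax, and `CustomerProfile`, `MARGIN_CHECKS`, and the returned step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerProfile:
    tier_sufficient: bool
    risk_assessment_sufficient: bool
    risk_disclosure_signed: bool
    funds_account_compliant: bool


# Ordered checks: (profile field, reason reported when the check fails).
MARGIN_CHECKS = [
    ("tier_sufficient", "customer tier is not sufficient"),
    ("risk_assessment_sufficient", "risk assessment is not sufficient"),
    ("risk_disclosure_signed", "risk disclosure has not been signed"),
    ("funds_account_compliant", "funds account is not compliant"),
]


def margin_trading_next_step(profile: CustomerProfile):
    """Return (next_step, failed_reasons) with no model judgment involved."""
    failed = [reason for field, reason in MARGIN_CHECKS if not getattr(profile, field)]
    if failed:
        return "explain_blockers", failed
    # All checks passed: guide the customer into the "enable" flow.
    return "start_enable_flow", []
```

The LLM's job in this arrangement is limited to recognizing that the customer is asking about margin trading. Which answer they get is decided by the checks.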

2. Why local Qwen vs cloud GPT#

CSRC’s data-egress red line is strict:

OpenAI / Anthropic completely off-limits
Azure OpenAI China-edition is theoretically possible but compliance overhead is heavy
Domestic cloud LLM APIs (Alibaba, Tencent) are viable but data flows still need review

Local Qwen: zero egress, audit passes straight through.

3. Why no direct business-system access#

Too risky. An LLM that reads accounts, orders, or funds directly risks privilege escalation. The design instead:

The LLM sees only the “public KB” (rules, policies, flows)
Account data is reachable only through a business interface that enforces a permission check
The interface returns structured data → the Rasa flow assembles the reply
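
A minimal sketch of that separation, assuming a single balance lookup: the business interface enforces the permission check and returns a typed record, and the reply is assembled from a template rather than by handing account data to the model. Everything here is hypothetical, including `Session`, `BalanceView`, `get_balance`, and the in-memory `_FAKE_LEDGER` that stands in for the real business system.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when a session tries to read data it is not entitled to."""


@dataclass(frozen=True)
class Session:
    customer_id: str
    authenticated: bool


@dataclass(frozen=True)
class BalanceView:
    customer_id: str
    available_cash: str  # decimal string, already formatted by the backend


# Stand-in for the core business system (hypothetical data).
_FAKE_LEDGER = {"C001": "12500.00", "C002": "880.50"}


def get_balance(session: Session, customer_id: str) -> BalanceView:
    """Business interface: permission check first, structured data second."""
    if not session.authenticated or session.customer_id != customer_id:
        raise PermissionDenied("a session may only read its own account")
    return BalanceView(customer_id, _FAKE_LEDGER[customer_id])


def render_balance_reply(view: BalanceView) -> str:
    # Template-based reply: account data never passes through the LLM.
    return f"Your available cash balance is ¥{view.available_cash}."
```

The point of the design is that a prompt injection can at worst change wording in the public-KB path. It cannot widen what `get_balance` is willing to return.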

Hardware + cost#

Item	Spec	Monthly
A100 80GB × 8	$120k upfront / 36-month amort	¥21k
GPU host × 2	64C/256G	¥8k
Storage	100TB (incl. WORM audit)	¥6k
Network / security	Firewall + MLPS + monitoring	¥5k
Software licenses	Rasa Pro + commercial support	¥10k
4-person ops team	—	¥80k
Total		¥130k / mo

That sounds expensive, but rotating out 25 agents at ¥15k/mo each saves ¥375k/mo, for a net saving of ¥245k/mo (¥375k saved minus ¥130k cost).
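
The arithmetic behind those figures can be reproduced in a few lines. The numbers come from the table and the paragraph above, and the variable names are illustrative only.

```python
# Monthly cost items from the table, in CNY.
MONTHLY_COST_CNY = {
    "A100 80GB x 8 (amortized)": 21_000,
    "GPU host x 2": 8_000,
    "Storage": 6_000,
    "Network / security": 5_000,
    "Software licenses": 10_000,
    "4-person ops team": 80_000,
}

AGENTS_ROTATED = 25
COST_PER_AGENT_CNY = 15_000  # per month, as stated in the text

total_monthly_cost = sum(MONTHLY_COST_CNY.values())   # 130_000
gross_saving = AGENTS_ROTATED * COST_PER_AGENT_CNY    # 375_000
net_saving = gross_saving - total_monthly_cost        # 245_000

print(f"cost ¥{total_monthly_cost:,} / saving ¥{gross_saving:,} / net ¥{net_saving:,}")
```

Note that the ops team is by far the largest line item at ¥80k of the ¥130k total, so staffing assumptions move the result more than hardware prices do.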

Intangible wins#

Audit efficiency: external audit 8-12 weeks → 4 weeks, ~600 hours saved annually
Training: agent ramp-up went from 3 months to 6 weeks (the AI catches what new agents miss)
Brand: cited by the regulator as a “digitization exemplar”
New business support: launching ETF options support went from 2 weeks of training to 3 days

Scars#

Compliance review rejected the first prompt: it lacked a “not investment advice” disclaimer, so everything was rewritten
Qwen 72B makes errors on compound-interest math, so Rasa now calls an external compute service for calculations
Rasa Flows are heavier to maintain than expected — when business changes, Flow edits cost more than prompt edits; team stabilized at month 4
A100 utilization is low — 30-40% daytime, idle at night; considering training jobs to soak it up
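
On the compound-interest point, the calculation that gets moved out of the model is small. Below is a sketch of what such a compute function might look like, using Python's `decimal` module so results are exact to the cent. The function name, its parameters, and the half-up rounding rule are assumptions, not the firm's actual service.

```python
from decimal import Decimal, ROUND_HALF_UP


def compound_amount(principal: str, annual_rate: str, years: int,
                    periods_per_year: int = 1) -> Decimal:
    """Future value with periodic compounding: P * (1 + r/n) ** (n * years).

    Amounts are passed as strings to avoid binary floating-point error.
    """
    if years < 0 or periods_per_year < 1:
        raise ValueError("years must be >= 0 and periods_per_year >= 1")
    p = Decimal(principal)
    r = Decimal(annual_rate)
    n = periods_per_year
    amount = p * (1 + r / n) ** (n * years)
    # Round to cents; half-up is an assumed rounding convention.
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# ¥10,000 at 5% compounded annually for 2 years -> 11025.00
print(compound_amount("10000", "0.05", 2))
```

The dialogue layer extracts the principal, rate, and term from the conversation, and a function like this produces the number that goes into the reply.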

Advice for regulated peers#

If you’re in a regulated industry, 4 cautions:

Talk to the regulator early — start the conversation 2 months before build
Design compliance architecture first — data flows, permission matrix, audit needs before any code
Don’t chase the newest model — pick stable + commercially supported (Qwen + Rasa Pro)
Budget compliance time — 2-3 months of approvals after technical readiness; build it into the plan