AI customer support for financial services
Banking, brokerage, insurance: strict regulation, sensitive data, zero tolerance for wrong answers. AI support has to walk a tightrope between compliance and customer experience.
Recommended stack
Rasa, Chatwoot, RAGFlow, Ollama (local inference)
Monthly cost
$400 to $1,500 (including local GPU inference)
Compliance notes
Subject to banking, securities and insurance regulations. Conversations, along with knowledge-base, prompt and model versions, are all retained on WORM (write once, read many) storage. Inference stays on-shore.
Key challenges
- Strict data-residency rules usually rule out cloud LLMs
- A wrong answer can trigger regulatory action — AI must be explainable
- Customers must authenticate before private-data Q&A
- Compliance audits need full conversation and decision trails
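The authentication requirement can be enforced at retrieval time, before any text reaches the model. The following is a minimal sketch, assuming each knowledge-base document carries a visibility label and an optional owner; `KBDocument`, `Session` and `retrievable` are hypothetical names for illustration, not part of Rasa, Chatwoot or RAGFlow.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KBDocument:
    doc_id: str
    visibility: str  # "public" or "private" (hypothetical labels)
    owner_id: Optional[str] = None  # customer a private document belongs to


@dataclass(frozen=True)
class Session:
    customer_id: Optional[str] = None  # stays None until OAuth / SSO login succeeds

    @property
    def authenticated(self) -> bool:
        return self.customer_id is not None


def retrievable(session: Session, docs: list[KBDocument]) -> list[KBDocument]:
    """Return only the documents this session may use as RAG context."""
    allowed = []
    for doc in docs:
        if doc.visibility == "public":
            allowed.append(doc)
        elif session.authenticated and doc.owner_id == session.customer_id:
            # Private data is visible only to its authenticated owner.
            allowed.append(doc)
    return allowed
```

Filtering before retrieval, rather than asking the model to withhold private data, means an unauthenticated session never has private text in its context in the first place.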
Why finance is the hard mode
| Requirement | Impact on stack |
|---|---|
| Data stays on-shore | Local inference: Qwen / DeepSeek / GLM + Ollama / vLLM |
| Explainable answers | RAG with citations, layered with rule engines |
| Strong authentication | Widget requires OAuth / SSO; unauthenticated users see only public KB |
| Full audit trail | Log conversations, KB versions, prompt versions, model versions to WORM storage |
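The audit-trail row can be made concrete with a record that pins every answer to the exact KB, prompt and model versions that produced it. This is a sketch under stated assumptions: the field names are hypothetical, and the SHA-256 hash chain is an extra tamper-evidence layer added here for illustration; it complements WORM storage rather than replacing it.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # hypothetical marker for the first record in a chain


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_audit_record(prev_hash, *, conversation_id, user_message, answer,
                      kb_version, prompt_version, model_version, timestamp=None):
    """Build one audit record linked to the previous record's hash."""
    record = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "conversation_id": conversation_id,
        "user_message": user_message,
        "answer": answer,
        "kb_version": kb_version,
        "prompt_version": prompt_version,
        "model_version": model_version,
        "prev_hash": prev_hash,
    }
    record["hash"] = _digest(record)  # computed before the "hash" key is added
    return record


def verify_chain(records) -> bool:
    """Check that no record was altered, removed or reordered."""
    prev = GENESIS_HASH
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev or _digest(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

Each record would then be appended to the WORM store as it is produced; an auditor can replay `verify_chain` over an exported range to confirm the trail is intact.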
Recommended architecture
Why Rasa rather than Dify
Financial conversations cannot “guess.” Rasa’s CALM separates flow logic from language understanding: declarative flows track where the user is in a procedure and what information is still missing, while the LLM handles language understanding. Dify is better suited to free-form Q&A but gives up some control over strict procedures.
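The split between flow logic and language understanding can be illustrated in plain Python. This is not Rasa's flow syntax or API, only a sketch of the idea: the flow declares which slots a procedure needs and decides the next step deterministically, while the values passed to `fill` stand in for whatever the LLM extracted from the user's message. The `Flow` class and the slot names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Flow:
    name: str
    required_slots: list[str]  # declared order in which information is collected
    slots: dict[str, str] = field(default_factory=dict)

    def fill(self, extracted: dict[str, str]) -> None:
        """Accept values from the language-understanding layer.

        Anything the flow did not declare is ignored, so the model
        cannot add steps or data the procedure does not define.
        """
        for key, value in extracted.items():
            if key in self.required_slots:
                self.slots[key] = value

    def next_step(self) -> str:
        """Decide the next action from state alone, never from model output."""
        for slot in self.required_slots:
            if slot not in self.slots:
                return f"ask:{slot}"
        return "confirm"


flow = Flow("report_lost_card", ["card_last4", "date_lost"])
flow.fill({"date_lost": "2024-05-01", "mood": "angry"})  # "mood" is dropped
print(flow.next_step())  # still asks for the first missing slot, card_last4
```

Because `next_step` depends only on declared slots, the same conversation state always produces the same next action, which is what makes the procedure auditable.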
On-prem hardware reference
| Scenario | Model | Hardware |
|---|---|---|
| Public FAQ | Qwen 2.5-7B-Instruct | 1 × A10 24GB |
| Internal business chat | Qwen 2.5-14B-Instruct | 1 × A100 40GB |
| Heavy RAG + long context | Qwen 2.5-32B | 2 × A100 80GB |
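A back-of-the-envelope check shows why these pairings are plausible. The sketch below assumes 16-bit weights (2 bytes per parameter) and the nominal parameter counts from the model names; actual counts are slightly higher, and quantized weights would need less. Whatever VRAM remains after loading the weights has to cover the KV cache, activations and runtime overhead, which grow with context length and concurrency.

```python
BYTES_PER_PARAM_FP16 = 2  # assumption: unquantized 16-bit weights


def weight_gb(params_billion: float,
              bytes_per_param: float = BYTES_PER_PARAM_FP16) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def headroom_gb(params_billion: float, total_vram_gb: float,
                bytes_per_param: float = BYTES_PER_PARAM_FP16) -> float:
    """VRAM left for KV cache, activations and runtime overhead."""
    return total_vram_gb - weight_gb(params_billion, bytes_per_param)


# (scenario, nominal parameters in billions, total VRAM in GB) from the table
rows = [
    ("Public FAQ", 7, 24),
    ("Internal business chat", 14, 40),
    ("Heavy RAG + long context", 32, 2 * 80),
]
for scenario, params, vram in rows:
    print(f"{scenario}: weights ~{weight_gb(params):.0f} GB, "
          f"headroom ~{headroom_gb(params, vram):.0f} GB")
```

Under these assumptions the weights take roughly 14, 28 and 64 GB, leaving about 10, 12 and 96 GB of headroom respectively; the large margin on the last row is what long-context RAG consumes in KV cache.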
Anti-patterns
- Sending customer conversations to a cloud LLM (data export violation)
- Letting an LLM execute trades directly (always gate through a rule engine)
- Returning LLM-generated “numbers” to users (always source from structured data)
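The last two anti-patterns share one remedy: the model may propose or phrase, but deterministic code decides and supplies the figures. A minimal sketch, assuming a hypothetical `TradeProposal` structure, illustrative limits, and a `ledger` dictionary standing in for the system of record:

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal

MAX_QUANTITY = 1_000              # illustrative limit, not a real policy
ALLOWED_SYMBOLS = {"AAPL", "MSFT"}  # illustrative whitelist


@dataclass(frozen=True)
class TradeProposal:
    """What the LLM extracted from the conversation; never executed directly."""
    account_id: str
    symbol: str
    quantity: int
    customer_confirmed: bool


def rule_violations(p: TradeProposal) -> list[str]:
    """Deterministic gate: an empty list means the proposal may proceed."""
    violations = []
    if p.symbol not in ALLOWED_SYMBOLS:
        violations.append("symbol not tradable")
    if not 0 < p.quantity <= MAX_QUANTITY:
        violations.append("quantity outside limits")
    if not p.customer_confirmed:
        violations.append("missing explicit customer confirmation")
    return violations


def render_balance(ledger: dict[str, Decimal], account_id: str) -> str:
    """The figure comes from structured data; only the sentence is templated."""
    balance = ledger[account_id]
    return f"Your available balance is {balance:.2f}."
```

A proposal with any violation is rejected or routed to a human, and the returned list doubles as the explanation recorded in the audit trail. Using `Decimal` rather than `float` keeps monetary values exact.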