AI customer support for financial services

Banking, brokerage, insurance — strict regulation, sensitive data, zero tolerance for wrong answers. AI support has to walk a tightrope between compliance and experience.

Recommended stack

Rasa, Chatwoot, RAGFlow, Ollama (local inference)

Monthly cost

$400 to $1,500 (incl. local GPU inference)

Compliance notes

Subject to banking, securities and insurance regulations. Conversations, plus the knowledge-base, prompt and model versions behind each answer, are all retained on WORM (write once, read many) storage. Inference stays on-shore.

Key challenges

Strict data-residency rules usually rule out cloud LLMs
A wrong answer can trigger regulatory action — AI must be explainable
Customers must authenticate before private-data Q&A
Compliance audits need full conversation and decision trails
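
The audit-trail requirement can be made concrete with a small record type. The following is a minimal sketch, not a prescribed schema: the field names, the `GENESIS` value and the hash chaining are assumptions added for illustration (the chaining is an extra tamper-evidence measure, not something WORM storage requires), and the actual write to WORM storage is left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Assumed placeholder for "no previous record" at the start of a chain.
GENESIS = "0" * 64


@dataclass(frozen=True)
class AuditRecord:
    """One conversation turn plus the versions that produced the answer."""
    conversation_id: str
    turn: int
    user_text: str
    answer_text: str
    kb_version: str
    prompt_version: str
    model_version: str
    prev_hash: str  # hash of the previous record; links records into a chain


def record_hash(record: AuditRecord) -> str:
    """SHA-256 over the record's canonical (key-sorted) JSON form."""
    payload = json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_chain(records: list[AuditRecord]) -> bool:
    """Return True if every record points at the hash of its predecessor."""
    expected = GENESIS
    for rec in records:
        if rec.prev_hash != expected:
            return False
        expected = record_hash(rec)
    return True
```

Each serialized record would be appended to the WORM store as it is created; an auditor can later recompute the chain to confirm that no turn was altered or dropped.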

Why finance is the hard mode

Requirement	Impact on stack
Data stays on-shore	Local inference: Qwen / DeepSeek / GLM + Ollama / vLLM
Explainable answers	RAG with citations, layered with rule engines
Strong authentication	Widget requires OAuth / SSO; unauthenticated users see only public KB
Full audit trail	Log conversations, KB versions, prompt versions, model versions to WORM storage
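
The authentication row above amounts to choosing which knowledge-base collections a session may search before any retrieval happens. A minimal sketch, assuming hypothetical collection names (`public_faq`, `account_kb`) and a `Session` object whose `user_id` is set only after the OAuth / SSO login succeeds:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical collection names; a real deployment would use its own.
PUBLIC_KB = "public_faq"
PRIVATE_KB = "account_kb"


@dataclass
class Session:
    # Populated only after the OAuth / SSO handshake completes.
    user_id: Optional[str] = None

    @property
    def authenticated(self) -> bool:
        return self.user_id is not None


def allowed_collections(session: Session) -> list[str]:
    """Unauthenticated sessions may search the public KB only."""
    if session.authenticated:
        return [PUBLIC_KB, PRIVATE_KB]
    return [PUBLIC_KB]
```

Enforcing the restriction at retrieval time, rather than asking the model to withhold private data, means an unauthenticated prompt never has private content in its context to begin with.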

Recommended architecture

Why Rasa rather than Dify

Financial conversations cannot “guess.” Rasa’s CALM separates flow logic from language understanding: declarative flows track where the user is in a procedure and what information is still missing, while the LLM is confined to understanding what the user said. Dify is better suited to free-form Q&A but gives up some control over strict procedures.
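
The separation can be illustrated in plain Python. This is not Rasa's flow syntax or API, only a sketch of the idea: the flow is a fixed, declared list of slots, and whatever the language model extracts is accepted only if the flow asked for it. The flow name and slot names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical flow: the ordered slots a card-block request needs.
CARD_BLOCK_FLOW = ["card_last4", "reason", "confirmation"]


@dataclass
class FlowState:
    slots: dict[str, str] = field(default_factory=dict)


def next_missing_slot(flow: list[str], state: FlowState) -> Optional[str]:
    """Deterministic flow logic: the first declared slot not yet filled."""
    for name in flow:
        if name not in state.slots:
            return name
    return None


def apply_extraction(flow: list[str], state: FlowState,
                     extracted: dict[str, str]) -> None:
    """Accept only slots the flow declares; ignore anything else the model emits."""
    for name, value in extracted.items():
        if name in flow:
            state.slots[name] = value
```

The model's output influences only which declared slots get filled; the order of questions and the completion condition stay in code that can be reviewed and audited.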

On-prem hardware reference

Scenario	Model	Hardware
Public FAQ	Qwen 2.5-7B-Instruct	1 × A10 24GB
Internal business chat	Qwen 2.5-14B-Instruct	1 × A100 40GB
Heavy RAG + long context	Qwen 2.5-32B	2 × A100 80GB
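
A rough sanity check on the pairings above: unquantized FP16/BF16 weights take about 2 bytes per parameter, and whatever VRAM remains is available for the KV cache and activations. The sketch below assumes nominal parameter counts (7, 14 and 32 billion), 1 GB = 10^9 bytes, and no quantization; real usage depends on context length, batch size and the serving runtime.

```python
# Assumption: FP16/BF16 weights, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2

# (model, nominal parameters in billions, total VRAM in GB) from the table above.
ROWS = [
    ("Qwen 2.5-7B-Instruct", 7, 24),    # 1 x A10 24GB
    ("Qwen 2.5-14B-Instruct", 14, 40),  # 1 x A100 40GB
    ("Qwen 2.5-32B", 32, 160),          # 2 x A100 80GB
]


def weight_gb(params_billion: float,
              bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate weight memory in GB (billions of params x bytes per param)."""
    return params_billion * bytes_per_param


def headroom_gb(params_billion: float, total_vram_gb: float) -> float:
    """VRAM left for KV cache and activations after loading the weights."""
    return total_vram_gb - weight_gb(params_billion)


if __name__ == "__main__":
    for model, params, vram in ROWS:
        print(f"{model}: weights ~{weight_gb(params)} GB, "
              f"headroom ~{headroom_gb(params, vram)} GB")
```

Under these assumptions the three rows leave roughly 10 GB, 12 GB and 96 GB of headroom respectively; the large margin on the last row is what accommodates the KV cache for long contexts.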

Anti-patterns

Sending customer conversations to a cloud LLM (data export violation)
Letting an LLM execute trades directly (always gate through a rule engine)
Returning LLM-generated “numbers” to users (always source from structured data)
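
The last two anti-patterns suggest a simple shape for the code around the model: the LLM may propose an action, but deterministic rules decide whether it proceeds, and any figure shown to the user is read from structured data. A minimal sketch, in which the symbol whitelist, quantity limit and in-memory `BALANCES` table are hypothetical stand-ins for a real rule engine and ledger:

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical stand-ins for a real rule engine's configuration and a ledger.
ALLOWED_SYMBOLS = {"AAA", "BBB"}
MAX_QUANTITY = 100
BALANCES = {"acct-1": Decimal("1520.35")}


@dataclass(frozen=True)
class ProposedTrade:
    """What the LLM extracted from the conversation; never executed directly."""
    account_id: str
    symbol: str
    quantity: int


def rule_check(trade: ProposedTrade) -> list[str]:
    """Return the list of rule violations; an empty list means the trade may proceed."""
    violations = []
    if trade.symbol not in ALLOWED_SYMBOLS:
        violations.append("symbol not permitted")
    if not 0 < trade.quantity <= MAX_QUANTITY:
        violations.append("quantity outside limit")
    return violations


def balance_reply(account_id: str) -> str:
    """Build the reply from the ledger value, not from model-generated text."""
    balance = BALANCES[account_id]
    return f"Your available balance is {balance:.2f}."
```

Returning the violations as data, rather than a bare true/false, also gives the audit trail a record of why a proposed action was blocked.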