
AI customer support for financial services

Banking, brokerage, insurance — strict regulation, sensitive data, zero tolerance for wrong answers. AI support has to walk a tightrope between compliance and experience.

Recommended stack
Rasa, Chatwoot, RAGFlow, Ollama (local inference)
Monthly cost
$400 - $1500 (incl. local GPU inference)
Compliance notes
Subject to banking, securities and insurance regulations. Conversations, knowledge-base versions, prompt versions and model versions are all retained on WORM storage. Inference stays on-shore.

Key challenges

  • Strict data-residency rules usually rule out cloud LLMs
  • A wrong answer can trigger regulatory action — AI must be explainable
  • Customers must authenticate before private-data Q&A
  • Compliance audits need full conversation and decision trails

Why finance is the hard mode

| Requirement | Impact on stack |
| --- | --- |
| Data stays on-shore | Local inference: Qwen / DeepSeek / GLM + Ollama / vLLM |
| Explainable answers | RAG with citations, layered with rule engines |
| Strong authentication | Widget requires OAuth / SSO; unauthenticated users see only public KB |
| Full audit trail | Log conversations, KB versions, prompt versions, model versions to WORM storage |
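The audit-trail requirement above can be sketched as a record that pins every version involved in an answer. This is a minimal illustration, not part of the recommended stack: the field names, `append_record` and `verify_chain` are hypothetical, and the hash chain only adds tamper evidence on top of whatever WORM guarantee the storage layer provides.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical field names; adapt to the actual logging schema.
    conversation_id: str
    user_message: str
    bot_answer: str
    kb_version: str
    prompt_version: str
    model_version: str
    prev_hash: str  # hash of the previous record, "" for the first one


def record_hash(record: AuditRecord) -> str:
    """Deterministic SHA-256 over the record's canonical JSON form."""
    payload = json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(log: list, **fields: str) -> dict:
    """Append a record that is chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else ""
    record = AuditRecord(prev_hash=prev, **fields)
    entry = {**asdict(record), "hash": record_hash(record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and check each link to the previous entry."""
    prev = ""
    for entry in log:
        fields = {k: v for k, v in entry.items() if k != "hash"}
        if fields["prev_hash"] != prev:
            return False
        if record_hash(AuditRecord(**fields)) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

In this sketch the in-memory list stands in for the WORM store; editing any earlier entry makes `verify_chain` return `False`, which is the property an auditor would check.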

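[Diagram: request routing. A User reaches the Web / App front end and passes through Authentication. Unauthenticated users go to the Public FAQ Bot (Dify + public KB). Authenticated users go to Rasa CALM, which sends procedural requests (Balance / Transfer / Close) to a strict state machine, free-form questions to the RAGFlow private KB, and complex cases to a Chatwoot human agent. Remaining nodes: Local LLM (Qwen 14B / 32B) and Core trading system.]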
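The routing suggested by the diagram labels above can be sketched as a small dispatch function. The edge mapping is one reading of the diagram, and `Request`, `Route` and the `kind` labels are hypothetical names for illustration; in practice the classification would come from the NLU layer.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PUBLIC_FAQ = "public_faq_bot"          # Dify + public KB
    PROCEDURAL = "rasa_flow"               # strict state machine
    PRIVATE_RAG = "ragflow_private_kb"
    HUMAN_AGENT = "chatwoot_human_agent"


@dataclass
class Request:
    authenticated: bool
    kind: str  # assumed labels: "procedural", "free_form" or "complex"


def route(request: Request) -> Route:
    """Pick a handler; anything unrecognized escalates to a human."""
    if not request.authenticated:
        # Unauthenticated users never reach private data.
        return Route.PUBLIC_FAQ
    if request.kind == "procedural":
        return Route.PROCEDURAL
    if request.kind == "free_form":
        return Route.PRIVATE_RAG
    return Route.HUMAN_AGENT
```

Checking authentication first means an unauthenticated request can never fall through to a private-data handler, whatever its `kind`. Defaulting unknown kinds to the human agent is a conservative choice made for this sketch.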

Why Rasa rather than Dify

Financial conversations cannot “guess.” Rasa’s CALM separates flow logic from language understanding: declarative flows track where the user is and what is still missing, while the LLM handles NLU. Dify is better suited to free-form Q&A but gives less control over strict procedures.
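The split between flow logic and language understanding can be illustrated in plain Python. This is not Rasa's API: `Flow`, `FlowState` and the slot names are hypothetical, and the `extracted` dict stands in for whatever the LLM-based NLU returns.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Flow:
    name: str
    required_slots: tuple  # declared order in which slots are collected


# Hypothetical transfer flow: the declaration, not the LLM, fixes the steps.
TRANSFER = Flow("transfer", ("recipient", "amount", "confirmation"))


@dataclass
class FlowState:
    flow: Flow
    slots: dict = field(default_factory=dict)

    def fill(self, extracted: dict) -> None:
        """Accept only slots the flow declares; ignore anything else."""
        for name, value in extracted.items():
            if name in self.flow.required_slots:
                self.slots[name] = value

    def next_missing(self) -> Optional[str]:
        """Return the first declared slot not yet filled, or None if done."""
        for name in self.flow.required_slots:
            if name not in self.slots:
                return name
        return None
```

The language model only proposes slot values; which slot is asked for next, and when the flow is complete, is decided by the declared flow alone.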

On-prem hardware reference

| Scenario | Model | Hardware |
| --- | --- | --- |
| Public FAQ | Qwen 2.5-7B-Instruct | 1 × A10 24GB |
| Internal business chat | Qwen 2.5-14B-Instruct | 1 × A100 40GB |
| Heavy RAG + long context | Qwen 2.5-32B | 2 × A100 80GB |
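A rough sanity check on these pairings is to compare weight memory against total GPU memory. The sketch below assumes FP16/BF16 weights (2 bytes per parameter) and nominal parameter counts (7B, 14B, 32B); it ignores KV cache, activations and framework overhead, which is exactly what the remaining headroom has to absorb, so treat it as a lower bound rather than a sizing tool.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Weights only: (params_billion * 1e9 params) * bytes / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def headroom_gb(params_billion: float, gpu_count: int, gpu_gb: float,
                bytes_per_param: float = 2.0) -> float:
    """GPU memory left for KV cache and activations after loading weights."""
    return gpu_count * gpu_gb - weight_memory_gb(params_billion, bytes_per_param)


# Rows of the table above, using nominal parameter counts (an assumption).
rows = [
    ("Public FAQ", 7, 1, 24),
    ("Internal business chat", 14, 1, 40),
    ("Heavy RAG + long context", 32, 2, 80),
]

for scenario, params_b, gpus, gb in rows:
    print(f"{scenario}: weights {weight_memory_gb(params_b):.0f} GB, "
          f"headroom {headroom_gb(params_b, gpus, gb):.0f} GB")
```

Under these assumptions the weights take about 14, 28 and 64 GB respectively, leaving 10, 12 and 96 GB. The large headroom on the last row is consistent with it being the long-context scenario, where the KV cache grows with context length.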

Anti-patterns

  • Sending customer conversations to a cloud LLM (data export violation)
  • Letting an LLM execute trades directly (always gate through a rule engine)
  • Returning LLM-generated “numbers” to users (always source from structured data)
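The last two anti-patterns can be sketched together: an LLM may propose an action, but deterministic rules decide whether it runs, and any figure shown to the user is read from structured data rather than generated. The limit, the recipient allow-list and all names below are hypothetical values for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ProposedTransfer:
    """What the LLM extracted from the conversation; never executed as-is."""
    amount: Decimal
    recipient: str


# Hypothetical policy values; real limits come from the institution's rules.
MAX_SINGLE_TRANSFER = Decimal("5000.00")
ALLOWED_RECIPIENTS = {"ACC-1001", "ACC-1002"}


def check_transfer(proposal: ProposedTransfer, user_confirmed: bool) -> list:
    """Return the list of rule violations; an empty list means it may proceed."""
    violations = []
    if proposal.amount <= 0:
        violations.append("amount must be positive")
    if proposal.amount > MAX_SINGLE_TRANSFER:
        violations.append("amount exceeds single-transfer limit")
    if proposal.recipient not in ALLOWED_RECIPIENTS:
        violations.append("recipient is not a registered payee")
    if not user_confirmed:
        violations.append("explicit user confirmation missing")
    return violations


def render_balance(ledger: dict, account_id: str) -> str:
    """Build the reply from the ledger value; the LLM never supplies the figure."""
    balance = ledger[account_id]
    return f"Your available balance is {balance:.2f}."
```

Returning the full list of violations, rather than a single boolean, also gives the audit log an explainable reason for every refusal.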
