flag92
Published Fri Apr 17 2026 08:00:00 GMT+0800 (China Standard Time)
Tags: case study · healthcare · compliance · RAG

Case · How a healthcare provider built RAG with zero PHI leakage

An internet hospital launched AI health consultations. The hardest requirement: PHI must never reach the LLM. This post walks through their architecture: a three-level knowledge base, Pipeline-based redaction, and a full audit trail.

Background

  • Business: internet hospital (telemedicine, e-prescriptions, drug delivery)
  • MAU: ~800k
  • Daily consults: ~4,000 (health questions: ~1,200)
  • Team: 23 physicians + 8 pharmacists + 12 support
  • Legal boundaries: the AI must not diagnose, and PHI leakage must be zero

Why AI

Physicians’ time was being eaten up by questions like “What does my lab report mean?”, “Can I take these meds together?” and “Is this covered by insurance?” Triage data showed:

  • 65% of consults can be answered from public medical knowledge plus published policy
  • 25% need patient history (PHI), so a physician is required
  • 10% are emergencies (symptoms) that need immediate triage

Goal: hand the 65% “information lookup” tier to AI, with zero leakage.

3-tier architecture

Tier 1: knowledge layering

| Tier | Content | Visible to |
|---|---|---|
| L1 public | Hospital info, intake flow, insurance policy, common health knowledge, drug labels | AI + any visitor |
| L2 semi-private | Authenticated user’s appointments, payments, lab-report download links | AI via tool calls + the user themselves |
| L3 private PHI | Medical records, lab-report contents, physician notes | Never in RAG; physicians only |

L1 lives in a RAGFlow knowledge base. L2 is reached only through structured business APIs (tool calls). L3 is completely siloed from the AI.
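
The tiering can be pictured as a default-deny check in front of every data source. A minimal sketch, assuming each source is tagged with its tier; `Tier`, `SOURCE_TIER` and `allowed_sources` are hypothetical names (not RAGFlow or Open WebUI APIs), and the source names are made up for illustration.

```python
from enum import Enum


class Tier(Enum):
    L1_PUBLIC = 1        # RAGFlow knowledge base
    L2_SEMI_PRIVATE = 2  # business APIs, scoped to the logged-in user
    L3_PHI = 3           # medical records: never exposed to the AI


# Hypothetical mapping from data source to tier
SOURCE_TIER = {
    "kb.insurance_policy": Tier.L1_PUBLIC,
    "kb.drug_labels": Tier.L1_PUBLIC,
    "api.appointments": Tier.L2_SEMI_PRIVATE,
    "api.payments": Tier.L2_SEMI_PRIVATE,
    "emr.records": Tier.L3_PHI,
}


def allowed_sources(authenticated: bool) -> set[str]:
    """Return the sources the AI may touch for this request (default deny)."""
    allowed = set()
    for source, tier in SOURCE_TIER.items():
        if tier is Tier.L1_PUBLIC:
            allowed.add(source)
        elif tier is Tier.L2_SEMI_PRIVATE and authenticated:
            # Only via tool calls scoped to the user's own data
            allowed.add(source)
        # Tier.L3_PHI is never added, whatever the authentication state
    return allowed
```

Because L3 sources are never added to the result, PHI stays out of reach by construction rather than through a check someone could forget to call.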

Tier 2: Pipeline redaction middleware

All user messages pass through Open WebUI Pipelines:

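```python
# pipelines/healthcare_redact.py
import re
from typing import List, Optional

from pydantic import BaseModel

# Digit lookarounds instead of \b: in Python 3, CJK characters count as word
# characters, so \b never fires between e.g. "号" and a digit in Chinese text.
PATTERNS = [
    # Resident ID numbers (18 chars, last may be X; legacy IDs have 15 digits)
    (re.compile(r'(?<!\d)\d{15,18}[Xx0-9]?(?!\d)'), '[ID]'),
    # Mainland mobile numbers: 11 digits starting with 13-19
    (re.compile(r'(?<!\d)1[3-9]\d{9}(?!\d)'), '[PHONE]'),
    # Case numbers: require a digit so "in case you..." is left alone
    (re.compile(r'(?<![a-z])case\s*#?\s*[a-z]*\d[a-z0-9-]*', re.I), 'case#:[CASE_ID]'),
    # 50+ more patterns for lab IDs, insurance numbers
]

EMERGENCY_KEYWORDS = [
    'chest pain', 'unconscious', 'bleeding', 'suicide', 'overdose',
    'cannot breathe', 'seizure', 'stroke', 'heart attack', 'shock',
    # ~80 total, in EN + ZH variants
]


def notify_oncall_doctor(user, msg):
    """Deployment hook (body not shown): page the on-call physician."""
    ...


def audit_log(user, original, redacted):
    """Deployment hook (body not shown): encrypted original + WORM redacted copy."""
    ...


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = []
        priority: int = 0

    def __init__(self):
        # Filter pipelines run inlet() on each request before the model call
        self.type = "filter"
        self.name = "Healthcare Redact + Emergency Detect"
        self.valves = self.Valves(**{"pipelines": ["*"]})  # attach to all models

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        original = body['messages'][-1]['content']

        # 1. Emergency check on the raw text, before anything else
        lowered = original.lower()
        if any(k.lower() in lowered for k in EMERGENCY_KEYWORDS):
            # skip_llm / fallback are deployment-specific keys; something
            # downstream has to honor them
            body['skip_llm'] = True
            body['fallback'] = {
                'role': 'assistant',
                'content': "Emergency keyword detected. Routing to on-call physician. Please call your local emergency line immediately."
            }
            notify_oncall_doctor(user, original)
            audit_log(user, original, None)
            return body

        # 2. Redact PHI so only placeholders reach the LLM
        msg = original
        for pattern, repl in PATTERNS:
            msg = pattern.sub(repl, msg)
        body['messages'][-1]['content'] = msg

        # 3. Audit both versions: the original and what the LLM will see
        audit_log(user, original, msg)
        return body
```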

Tier 3: audit trail

Every conversation produces 4 records:

| Record | Storage |
|---|---|
| Original message (pre-redaction) | Encrypted, physician-only access |
| Redacted text (what the LLM saw) | WORM, 6 months online, 5 years in archive |
| LLM reply | WORM, same retention |
| Decision chain (which KB chunks were retrieved) | WORM, same retention |

During a regulator audit, the complete trail for any conversation can be pulled within 5 minutes.
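
A minimal sketch of the four-record trail, assuming an append-only store keyed by conversation and turn. It only simulates WORM semantics in memory (a record, once written, cannot be overwritten); real WORM guarantees come from the storage layer, and encrypting the original message is left out. `AuditStore`, `AuditRecord` and `RecordKind` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RecordKind(Enum):
    ORIGINAL = "original"        # pre-redaction text (encrypt before writing)
    REDACTED = "redacted"        # exactly what the LLM saw
    REPLY = "reply"              # the LLM's answer
    DECISION_CHAIN = "decision"  # IDs of the KB chunks that were retrieved


@dataclass(frozen=True)
class AuditRecord:
    conversation_id: str
    turn: int
    kind: RecordKind
    payload: str
    written_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class AuditStore:
    """In-memory stand-in for WORM storage: each record can be written once."""

    def __init__(self) -> None:
        self._records: list[AuditRecord] = []
        self._keys: set[tuple[str, int, RecordKind]] = set()

    def write(self, record: AuditRecord) -> None:
        key = (record.conversation_id, record.turn, record.kind)
        if key in self._keys:
            raise PermissionError(f"WORM: {key} was already written")
        self._keys.add(key)
        self._records.append(record)

    def trail(self, conversation_id: str) -> list[AuditRecord]:
        """All records for one conversation, in write order."""
        return [r for r in self._records if r.conversation_id == conversation_id]
```

Pulling a trail for an auditor is then a single lookup by conversation ID; rewriting history raises instead of silently succeeding.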

Hard constraints on AI replies

Every prompt instructs the model to end its reply with:

Important: this is based on public medical knowledge and our policy. It is for information only and does not constitute diagnosis or treatment advice. For specific concerns, consult a licensed physician.
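
A model can ignore prompt instructions, so one option, assuming replies pass through a post-processing step after the LLM, is to append the disclaimer deterministically in code. `ensure_disclaimer` is a hypothetical helper, not part of Open WebUI or Dify.

```python
DISCLAIMER = (
    "Important: this is based on public medical knowledge and our policy. "
    "It is for information only and does not constitute diagnosis or "
    "treatment advice. For specific concerns, consult a licensed physician."
)


def ensure_disclaimer(reply: str) -> str:
    """Append the disclaimer unless the reply already ends with it."""
    body = reply.rstrip()
    if body.endswith(DISCLAIMER):
        return body
    return f"{body}\n\n{DISCLAIMER}"
```

The helper is idempotent: if the model already added the text, no second copy is appended.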

Dify Workflow enforces:

  • “Which medicine should I take?” → never names a drug; always answers “consult a physician”
  • “Can I stop this medication?” → immediate human handoff
  • Pregnancy / children / elderly → enhanced disclaimer
  • Mental health → always handed to a human
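
These rules amount to a small decision table. In Dify they are built as workflow nodes; the Python sketch below only mirrors the routing logic, assuming a naive keyword matcher (the post does not describe the real classifier). `Action`, `RULES` and `route` are hypothetical names, and the keyword lists are illustrative only.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"                 # normal RAG answer + disclaimer
    NO_DRUG_NAMES = "no_drug_names"   # answer, but never name a drug
    HANDOFF = "handoff"               # immediate human handoff
    ENHANCED_DISCLAIMER = "enhanced"  # answer with a stronger disclaimer


# Checked in order, first match wins: safety-critical handoffs come first.
RULES = [
    (("stop this medication", "stop taking"), Action.HANDOFF),
    (("anxiety", "depression", "panic"), Action.HANDOFF),  # mental health
    (("which medicine", "what should i take"), Action.NO_DRUG_NAMES),
    (("pregnan", "my child", "my baby", "elderly"), Action.ENHANCED_DISCLAIMER),
]


def route(message: str) -> Action:
    """Map a user message to the workflow action it should trigger."""
    text = message.lower()
    for keywords, action in RULES:
        if any(k in text for k in keywords):
            return action
    return Action.ANSWER
```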

Results after 4 months

| Metric | Before | After |
|---|---|---|
| Daily consults | 4,000 | 4,800 (slight rise) |
| AI deflection | 0% | 62% (close to the 65% target) |
| Avg. response time | 8 min | 1.8 s (AI leg) |
| Physician hours / day | 280 | 105 |
| Emergency keyword hits | | ~18/day (all handed over in < 30 s) |
| Leakage incidents | | 0 |

A real emergency

At 23:47 one night, a user wrote in a regular consultation: “I have chest pain and can’t breathe.”

  • Pipelines detected the emergency keyword in 0.2 s
  • Immediately returned the emergency reply with the emergency phone number
  • Simultaneously pinged the on-call physician via Lark + SMS
  • The on-call physician joined the conversation 28 s later
  • Guided the user to call an ambulance; the user was transported 5 min later
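
The “simultaneously” step can be expressed with `asyncio.gather`, so a slow SMS gateway does not delay the Lark message. `send_lark` and `send_sms` are hypothetical stand-ins; the post does not say which Lark or SMS APIs were used.

```python
import asyncio


async def send_lark(doctor: str, text: str) -> str:
    # Placeholder for a real Lark bot/webhook call
    await asyncio.sleep(0.01)
    return f"lark:{doctor}"


async def send_sms(doctor: str, text: str) -> str:
    # Placeholder for a real SMS gateway call
    await asyncio.sleep(0.01)
    return f"sms:{doctor}"


async def page_oncall(doctor: str, text: str) -> list:
    """Fire both channels concurrently; results keep the call order.

    With return_exceptions=True a failing channel comes back as an
    exception object instead of cancelling the other one.
    """
    results = await asyncio.gather(
        send_lark(doctor, text),
        send_sms(doctor, text),
        return_exceptions=True,
    )
    return list(results)
```

Usage: `asyncio.run(page_oncall("dr-oncall", "Emergency keyword: chest pain"))` returns one result per channel, and the caller can retry any entry that is an exception.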

Post-mortem: this was the 1,847th emergency-keyword trigger in 4 months and the first life saved. Worth it.

Regulator audit

In month 3, the provincial and city health commissions ran a joint review:

| Check | Our response |
|---|---|
| Does it diagnose? | Pulled 200 samples; all carried disclaimers |
| Does PHI reach the LLM? | Pulled 50 before/after redaction pairs |
| Emergency handling | Pulled 50 emergency-trigger records |
| Audit completeness | Pulled the full trail for an arbitrary moment |
| Patient consent | Provided the signup terms |

Result: passed, and the hospital was recognized as a “Digital Health Consultation Compliance Unit.”
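
The first review check (200 replies, all carrying the disclaimer) is easy to script. A sketch, assuming reply texts can be exported as plain strings; `spot_check` is a hypothetical helper, and the fixed seed simply makes a given sample reproducible if the check has to be re-run.

```python
import random


def spot_check(replies: list[str], disclaimer: str, n: int = 200, seed: int = 0) -> list[str]:
    """Sample up to n replies reproducibly; return those missing the disclaimer."""
    rng = random.Random(seed)  # private RNG so the global state is untouched
    sample = rng.sample(replies, k=min(n, len(replies)))
    return [r for r in sample if disclaimer not in r]
```

An empty result means every sampled reply carried the disclaimer.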

Unsolved problems

1. UX for elderly users

Elderly users write with non-standard phrasing, typos and odd punctuation, which hurts RAG retrieval. Workaround: a looser embedding-similarity threshold plus a bias toward handing off to a human. The root cause is still unsolved.
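
One way to read that workaround, assuming similarity scores in [0, 1]: retrieve with a looser cutoff so misspelled queries still find candidate chunks, but let the AI answer alone only when the best match clears a stricter bar; everything in between goes to a human. Both threshold values and the names `Hit` and `decide` are hypothetical, not the hospital's settings.

```python
from dataclasses import dataclass

RETRIEVE_CUTOFF = 0.55  # loosened: keep more candidates for noisy input
ANSWER_CUTOFF = 0.75    # confidence needed before the AI answers alone


@dataclass
class Hit:
    chunk_id: str
    score: float  # similarity score, assumed to lie in [0, 1]


def decide(hits: list[Hit]) -> tuple[str, list[Hit]]:
    """Return ("answer", chunks) or ("human", chunks for the hand-off note)."""
    kept = sorted(
        (h for h in hits if h.score >= RETRIEVE_CUTOFF),
        key=lambda h: h.score,
        reverse=True,
    )
    if kept and kept[0].score >= ANSWER_CUTOFF:
        return "answer", kept
    # Weak or no match: bias toward a human instead of guessing
    return "human", kept
```

Passing the kept chunks along with a "human" decision gives the physician some context even when the AI does not answer.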

2. Dialect

Some inputs in southern Chinese dialects confuse even Qwen. The team is building a preprocessing model that converts dialect text into standard Chinese.

3. Multi-turn entity tracking

When a patient refers back to “that medication” five turns later, the AI often guesses the wrong drug. The team is adding an entity-tracking layer.
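
A deliberately small sketch of the entity-tracking idea, assuming drug names can be spotted with a lexicon (a production layer would more likely draw on the drug-label KB and proper NER): remember the most recently mentioned drug and substitute it for vague references before retrieval. It is English-only, and `DRUG_LEXICON`, `VAGUE_REF` and `EntityTracker` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical, English-only lexicon for illustration
DRUG_LEXICON = {"metformin", "amlodipine", "ibuprofen", "omeprazole"}
# Vague references to be resolved to the last-mentioned drug
VAGUE_REF = re.compile(r"\b(?:that|this) (?:medication|medicine|drug|pill)\b", re.I)


class EntityTracker:
    """Remembers the most recently mentioned drug across turns."""

    def __init__(self) -> None:
        self.last_drug: Optional[str] = None

    def observe(self, turn: str) -> None:
        # Scan left to right so the last mention in the turn wins
        for word in re.findall(r"[a-z]+", turn.lower()):
            if word in DRUG_LEXICON:
                self.last_drug = word

    def resolve(self, turn: str) -> str:
        # Rewrite "that medication" etc. only when a drug is being tracked
        if self.last_drug is None:
            return turn
        return VAGUE_REF.sub(self.last_drug, turn)
```

For example, after `observe("I was prescribed metformin last week.")`, `resolve("Can I take that medication with food?")` returns `"Can I take metformin with food?"`.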

5 recommendations for medical peers

  1. L1/L2/L3 tiering is non-negotiable: PHI never enters RAG, no exceptions
  2. Update the emergency keyword dictionary monthly. The medical team owns it; adding new terms matters more than removing old ones
  3. Put a disclaimer on every reply; it is legally required
  4. Pipelines beat Workflow for redaction: redaction runs one layer before the LLM, which is more reliable
  5. The audit trail isn’t just conversations: log retrieved chunks, prompt versions and model versions too

Cost

| Item | Monthly |
|---|---|
| GPU inference (2 × A100 40GB) | ¥30k |
| Chatwoot + RAGFlow + Open WebUI servers | ¥6k |
| WORM audit storage | ¥4k |
| Software / security licenses | ¥5k |
| 2-person ops | ¥40k |
| Total | ¥85k / mo |

Physician hours saved: 280 − 105 = 175 h/day; 175 h/day × 30 days × ¥200/h = ¥1,050k/mo. After the ¥85k/mo running cost, net savings are ¥965k/mo.
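
The savings line can be checked directly from the two tables; the ¥200/h physician rate and the 30-day month are the post's own figures.

```python
HOURS_BEFORE, HOURS_AFTER = 280, 105  # physician hours per day
RATE_CNY_PER_HOUR = 200
DAYS_PER_MONTH = 30

monthly_cost = {
    "gpu": 30_000, "servers": 6_000, "worm_storage": 4_000,
    "licenses": 5_000, "ops": 40_000,
}

hours_saved_per_day = HOURS_BEFORE - HOURS_AFTER                  # 175
gross = hours_saved_per_day * DAYS_PER_MONTH * RATE_CNY_PER_HOUR  # 1,050,000
total_cost = sum(monthly_cost.values())                           # 85,000
net = gross - total_cost                                          # 965,000
print(f"saved {hours_saved_per_day} h/day, net ¥{net:,}/month")
```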
