Agents for Customer Support: FAQ bot → Ticket routing → Escalation

A 3-tier FAQ → Triage → Escalation architecture for customer support agents: uncertainty as a load-balancing signal, warm handoff without losing context, reducing 60-...

Why should you care? Traditional support systems are breaking down: rule-based routing built on keyword matching fails on complex semantics, causing misroutes and SLA breaches. The fix is a 3-tier inference pipeline (FAQ → Triage → Escalation) that uses model uncertainty as a load-balancing signal and hands a ticket to a human only when entropy is high. This shifts 60-75% of ticket volume to AI agents while keeping edge cases safe.

The problem

Customer support is a classification problem with asymmetric error costs: a wrong bot answer leads to angry customers, churn, and legal risk, whereas handing off to a human slightly too early only costs extra labor.

The old keyword-matching approach fails on semantic nuance. For example, "I can't login" could mean:

Billing: expired card, suspended subscription
Technical: MFA failure, expired cookie
Account recovery: forgotten password, hacked account

A rule-based system driven by keywords, such as login → technical_queue, will misroute 30-40% of tickets, forcing human agents to transfer them again, which causes SLA breaches and the dreaded "Can you repeat that?" experience.

A single monolithic agent fails too: a 73% failure rate on complex tasks when it must handle retrieval, classification, and escalation prediction simultaneously in one context window.

The core idea

Treat support as a multi-stage inference pipeline with 3 specialized tiers, each using a confidence threshold to decide whether to pass the request up to the next tier or stop. This is a pessimistic triage strategy: when model uncertainty is high (a flat, high-entropy distribution), the system always hands off to a human rather than risk a wrong answer.

3-Tier Architecture

1. FAQ Agent (L0) - Pattern Recognition. This agent performs RAG-based deflection with grounded citations. If retrieval confidence falls below the threshold (typically cosine similarity < 0.7), the agent does not hallucinate an answer; it creates a ticket and passes it up to the Triage Agent.

# SOUL.md - FAQ Agent (Level 0)

## Identity
You are the L0 FAQ Agent for a B2B SaaS platform. Your job: answer frequently asked
questions using RAG, or create a ticket when you are not sure.

## Constraints
- NEVER make up an answer when retrieval confidence < 0.7
- Always provide citation links for every claim
- Use the `create_ticket` tool when information is missing

## Tools
- `search_knowledge_base(query: str) -> List[Chunk]`: Search the vector DB for knowledge
- `create_ticket(customer_id: str, summary: str, confidence: float)`: Create a new ticket

## Thresholds
- Confidence >= 0.85: Answer directly + citation
- 0.7 <= Confidence < 0.85: Answer with the disclaimer "May need further confirmation"
- Confidence < 0.7: Create a ticket, tag "needs_triage"
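The three threshold bands above can be sketched as a small decision function. This is only an illustration: the `Chunk` fields, the `decide_faq_action` name, and the idea of using the best chunk's similarity score as the "confidence" are assumptions, not something the SOUL.md file specifies.

```python
from dataclasses import dataclass

ANSWER_THRESHOLD = 0.85  # answer directly at or above this score
TICKET_THRESHOLD = 0.70  # below this score, create a ticket instead


@dataclass
class Chunk:
    """Hypothetical retrieval result; the field names are assumptions."""
    text: str
    url: str
    score: float  # assumed to be a cosine similarity in [0, 1]


def decide_faq_action(chunks: list[Chunk]) -> dict:
    """Map the best retrieval score onto the three threshold bands."""
    if not chunks:
        # Nothing retrieved: treat as zero confidence and hand off to triage.
        return {"action": "create_ticket", "tag": "needs_triage", "confidence": 0.0}

    best = max(chunks, key=lambda c: c.score)
    if best.score >= ANSWER_THRESHOLD:
        return {"action": "answer", "citation": best.url,
                "disclaimer": None, "confidence": best.score}
    if best.score >= TICKET_THRESHOLD:
        return {"action": "answer", "citation": best.url,
                "disclaimer": "May need further confirmation",
                "confidence": best.score}
    return {"action": "create_ticket", "tag": "needs_triage",
            "confidence": best.score}
```

Keeping the thresholds as named constants makes it easy to tune them per knowledge base without touching the prompt.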

2. Triage Agent - Semantic Routing. Uses intent classification plus urgency detection via a fine-tuned LLM or BERT-style models. The agent matches ticket embeddings against historical resolution clusters to assign the right queue (billing vs technical vs legal).

# SOUL.md - Triage Agent

## Identity  
You are the Triage Coordinator. Analyze newly created tickets and route them to the right
specialist queue based on semantic similarity with historical resolutions.

## Constraints
- Distinguish "urgency" (time-sensitive) from "priority" (important)
- Check repeat contact patterns over the past 24h
- Do NOT handle the ticket content, only route it

## Tools
- `classify_intent(text: str) -> Intent`: Classify intent (billing/technical/legal)
- `check_sentiment_velocity(thread_id: str) -> float`: Measure how fast sentiment is changing
- `assign_queue(ticket_id: str, queue: str, priority: int)`: Move the ticket into a queue
- `predict_escalation_risk(features: Dict) -> float`: Compute the probability that escalation is needed

## Routing Rules
- Intent = "billing" + Sentiment < -0.5 → Queue: "Billing_Urgent"
- Intent = "technical" + Keywords: ["production", "down"] → Immediate Escalation
- Legal keywords: ["lawyer", "cancel contract", "violation"] → Legal Queue + High Risk Flag
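The routing rules above could be expressed in code roughly as follows. Several details are assumptions made for the sketch: legal keywords are checked first, the outage rule requires both "production" and "down" as whole words, and the fallback queue names ("Legal", "Technical", and so on) are hypothetical; only "Billing_Urgent" comes from the rules themselves.

```python
import re

LEGAL_PHRASES = ("lawyer", "cancel contract", "violation")
OUTAGE_WORDS = {"production", "down"}


def route_ticket(text: str, intent: str, sentiment: float) -> dict:
    """Apply the three routing rules, then fall back to a per-intent queue."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))  # whole words, so "download" != "down"

    # Assumption: legal triggers win over every other rule.
    if any(phrase in lowered for phrase in LEGAL_PHRASES):
        return {"queue": "Legal", "high_risk": True, "escalate_now": False}

    # Assumption: both outage words must be present to trigger escalation.
    if intent == "technical" and OUTAGE_WORDS <= words:
        return {"queue": "Technical", "high_risk": True, "escalate_now": True}

    if intent == "billing" and sentiment < -0.5:
        return {"queue": "Billing_Urgent", "high_risk": False, "escalate_now": False}

    # Hypothetical default: one queue per intent.
    return {"queue": intent.capitalize(), "high_risk": False, "escalate_now": False}
```

In a real deployment `intent` and `sentiment` would come from `classify_intent` and a sentiment model; here they are passed in directly so the rules can be tested in isolation.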

3. Escalation Prediction Agent - Risk Scoring. Uses survival analysis (Deep Neural Networks) or LLM-based risk scoring to predict the probability that human intervention will be needed within the next 4 hours. Features include sentiment velocity, repeat contact patterns, and legal keywords.

When predicted risk is high, the system performs a Warm Handoff: it serializes the entire conversation state (intent scores, retrieved KB articles with relevance weights, sentiment trajectory) into a context bundle so the human agent can take over without asking "Can you repeat that?".

# SOUL.md - Escalation Agent

## Identity
You are the Escalation Predictor. Based on survival analysis and the thread context, decide
whether to bring a human in immediately or let the AI agent keep handling the ticket.

## Constraints
- Risk score > 0.7: Warm handoff immediately
- Risk score 0.4-0.7: Label "monitor closely", check every 2 turns
- Risk score < 0.4: Continue autonomous resolution

## Warm Handoff Protocol
When escalating, create a context bundle containing:
1. Intent classification scores (top-3 with probabilities)
2. Retrieved KB articles with relevance weights
3. Sentiment trajectory graph (timestamped)
4. Tool calls already made and their results
5. Current plan state (if in the middle of a multi-step task)
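One possible shape for the context bundle and the risk bands above is sketched below. The `ContextBundle` class, its field names, the JSON layout, and the sample values are all hypothetical; the source only lists which pieces of information the bundle should contain.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


def risk_tier(risk_score: float) -> str:
    """Map a risk score onto the three bands from the Constraints section."""
    if risk_score > 0.7:
        return "warm_handoff"
    if risk_score >= 0.4:
        return "monitor_closely"
    return "autonomous"


@dataclass
class ContextBundle:
    ticket_id: str
    risk_score: float
    intent_scores: dict[str, float]        # intent name -> probability
    kb_articles: list[dict]                # e.g. {"url": ..., "relevance": ...}
    sentiment_trajectory: list[dict]       # e.g. {"ts": ..., "score": ...}
    tool_calls: list[dict] = field(default_factory=list)
    plan_state: Optional[dict] = None      # only set during multi-step tasks

    def to_json(self) -> str:
        """Serialize the bundle, keeping only the top-3 intents."""
        payload = asdict(self)
        top3 = sorted(self.intent_scores.items(),
                      key=lambda kv: kv[1], reverse=True)[:3]
        payload["intent_scores"] = [
            {"intent": name, "probability": prob} for name, prob in top3
        ]
        payload["tier"] = risk_tier(self.risk_score)
        return json.dumps(payload, ensure_ascii=False, indent=2)


# Illustrative values only.
bundle = ContextBundle(
    ticket_id="T-1001",
    risk_score=0.82,
    intent_scores={"billing": 0.55, "technical": 0.30, "legal": 0.10, "other": 0.05},
    kb_articles=[{"url": "https://kb.example.com/refunds", "relevance": 0.64}],
    sentiment_trajectory=[{"ts": "2025-01-01T10:00:00Z", "score": -0.2},
                          {"ts": "2025-01-01T10:05:00Z", "score": -0.6}],
)
```

Calling `bundle.to_json()` yields a document the human agent's console could render directly, so the handoff carries the reasons for escalation along with the conversation.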

Team Configuration (Multi-Agent)

# agents.yaml - Customer Support Team
team:
  name: "Tiered Support System"
  handoff_protocol: "capability_based_with_audit_trail"
  
  agents:
    - name: "faq_agent"
      role: "l0_deflection"
      model: "gpt-4-turbo"
      soul_file: "souls/faq_agent.md"
      max_turns: 3
      exit_conditions: 
        - "confidence_low"
        - "ticket_created"
      
    - name: "triage_agent"  
      role: "semantic_router"
      model: "claude-3-5-sonnet"
      soul_file: "souls/triage_agent.md"
      input_filter: "ticket_created_events"
      
    - name: "escalation_agent"
      role: "risk_predictor"  
      model: "fine_tuned_bert_risk"
      soul_file: "souls/escalation_agent.md"
      hooks:
        pre_action: "log_risk_calculation"
        post_action: "notify_human_if_escalated"

  shared_memory:
    type: "vector_db_redis_hybrid"
    retention: "30_days"
    
  channels:
    - type: "web_chat"
      entry_point: "faq_agent"
    - type: "email"  
      entry_point: "triage_agent"  # Email usually needs immediate triage
    - type: "telegram"
      entry_point: "faq_agent"

The "Entropy as Vital Sign" Mechanism

The key strategy is to use model uncertainty as a stethoscope. When the LLM's top-2 token probabilities are close together (high entropy), or semantic search returns a max similarity < 0.7, that is a sign of "abnormal vitals": the request should go to a specialist right away.
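A minimal sketch of these "vital sign" checks, assuming the model exposes a probability distribution over candidate intents or tokens. The 0.7 similarity floor comes from the text; the entropy and margin limits (0.8 and 0.1) and the function names are illustrative assumptions that would need tuning on real traffic.

```python
import math


def normalized_entropy(probs: list[float]) -> float:
    """Shannon entropy divided by log(n): 0 means certain, 1 means uniform."""
    if len(probs) < 2:
        return 0.0
    total = sum(probs)
    positive = [p / total for p in probs if p > 0]
    entropy = -sum(p * math.log(p) for p in positive)
    return entropy / math.log(len(probs))


def top2_margin(probs: list[float]) -> float:
    """Gap between the two most likely outcomes; a small gap means uncertainty."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def needs_specialist(probs: list[float], max_similarity: float,
                     entropy_limit: float = 0.8,     # assumed value
                     margin_limit: float = 0.1,      # assumed value
                     similarity_floor: float = 0.7   # from the text
                     ) -> bool:
    """Escalate when any one of the three uncertainty signals fires."""
    return (normalized_entropy(probs) > entropy_limit
            or top2_margin(probs) < margin_limit
            or max_similarity < similarity_floor)
```

Any single signal is enough to escalate, which matches the pessimistic stance of the pipeline: it is cheaper to over-escalate than to answer on a flat distribution.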

This pipeline mirrors the cognitive hierarchy of problem-solving:

Pattern Recognition (FAQ): "Have I seen this exact question 1000 times?" — pure retrieval
Semantic Routing (Triage): "What domain is this?" — classification
Risk Prediction (Escalation): "Will this explode if I handle it wrong?" — survival analysis

Why it works

Asymmetric Error Cost Optimization: The pessimistic triage strategy recognizes that the cost of a wrong bot answer is catastrophic (churn, legal risk), whereas the cost of handing off to a human early is only labor cost. The optimal strategy is to maximize precision, even at the expense of recall.
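The asymmetry can be made concrete with a simple expected-cost rule. This is a simplified sketch under stated assumptions: a correct bot answer costs nothing, a handoff has a fixed labor cost, and the cost figures in the usage note are invented purely for illustration.

```python
def should_answer(p_correct: float, cost_wrong_answer: float,
                  cost_human_handoff: float) -> bool:
    """Answer only if the expected cost of a wrong answer is below the handoff cost."""
    expected_cost_of_answering = (1.0 - p_correct) * cost_wrong_answer
    return expected_cost_of_answering < cost_human_handoff


def break_even_confidence(cost_wrong_answer: float,
                          cost_human_handoff: float) -> float:
    """Confidence above which answering is cheaper than handing off."""
    return max(0.0, 1.0 - cost_human_handoff / cost_wrong_answer)
```

For example, if a wrong answer is assumed to cost 50 units and a handoff 5 units, `break_even_confidence(50, 5)` is 0.9: the bot should answer only when it is more than 90% likely to be right. The larger the gap between the two costs, the higher the confidence bar, which is why high thresholds fall out of the cost structure instead of being arbitrary.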

Model Entropy as Stethoscope: Unlike a rule-based system (which only matches patterns), an LLM agent can measure the "flatness" of its probability distribution. When the distribution is flat (the top-2 tokens are nearly equal), the agent recognizes that it is "not sure", which makes this an ideal load-balancing signal for handing off to a human.

Survival Analysis: Instead of looking only at the current state, escalation prediction uses survival analysis to estimate time-to-escalation. It analyzes:

Sentiment velocity: how fast the sentiment score is falling (a fast drop means the customer is about to get angry)
Repeat contact patterns: a customer's third contact in one day is a strong escalation signal (the escalation model built on such features reaches ~87% AUC)
Legal trigger words: "lawyer", "cancel contract". These escalate even when sentiment is positive
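The three features above can be illustrated with a small rule-based stand-in. To be clear, this is not survival analysis: it is a hand-written heuristic that only shows how the features might be computed and combined. The least-squares slope as "velocity", the -0.05 per-minute limit, and the function names are all assumptions.

```python
LEGAL_TRIGGERS = ("lawyer", "cancel contract")


def sentiment_velocity(points: list[tuple[float, float]]) -> float:
    """Least-squares slope of sentiment over time.

    points: (minutes_since_start, sentiment_score) pairs.
    A negative result means sentiment is falling.
    """
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return 0.0
    cov = sum((t - mean_t) * (s - mean_s) for t, s in points)
    return cov / var_t


def must_escalate(text: str, velocity: float, contacts_today: int,
                  velocity_limit: float = -0.05) -> bool:  # assumed limit
    """Combine the three features into a yes/no escalation decision."""
    lowered = text.lower()
    if any(trigger in lowered for trigger in LEGAL_TRIGGERS):
        return True  # legal words override sentiment entirely
    if contacts_today >= 3:
        return True  # third contact in one day
    return velocity < velocity_limit
```

A trained survival model would replace `must_escalate` with a time-to-event estimate, but it would consume the same inputs.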

Warm Handoff vs Cold Transfer: Traditional routing passes a ticket along like transferring a phone call: the recipient knows nothing about the history. A warm handoff serializes the whole conversation graph (including failed retrieval attempts and relevance scores), which helps the human agent understand why the AI could not handle the ticket, not just what needs handling.

Comparison with the older approaches:

Approach	Accuracy	Latency	Cost	Risk
Keyword Routing	60-70%	`<100ms`	Low	High misroute rate
Single Agent	75-80%	`2-5s`	Medium	Dangerous hallucinations
3-Tier Pipeline	85-90%	`1-3s`	High (3x models)	Pessimistic fail-safe

Practical implications

Real-world benchmarks:

60-75% deflection for L0 FAQs (industry standard)
~87% AUC on escalation prediction (arXiv:2010.06145)
70% autonomous resolution for structured tasks (order tracking, returns), per AgentWorks data
~40% reduction in "trap loop" churn (eesel AI case studies) thanks to sentiment-driven escalation

Who is using it:

AgentWorks: 70% autonomous resolution for clearly structured use cases
eesel AI: sentiment-driven escalation that significantly reduces churn
Ada, Intercom Fin, Sierra: similar deployments with AI-first customer service
Airline industry: uses policy-constrained agents (arXiv:2602.16666v2) with hard guardrails on PII exposure

Limitations (what this does NOT solve):

Cold-start problem: a new product has no historical embeddings to match against
Cross-system diagnosis: novel bugs that require diagnosis across multiple systems (API, database, CDN) still need a human
EU AI Act compliance: requires citation trails, which an LLM can hallucinate
Vague requests: ambiguous references ("that thing I bought") remain an unsolved problem

Key trade-off: This pipeline increases token cost (running 3 models instead of 1) and latency (possibly +1-2s compared with a single shot), but it lowers total cost of ownership by avoiding expensive mistakes (wrong refund, legal escalation).

Going deeper

Official docs:

Anthropic Claude Support Agent Guide - best practices for multi-tier support
OpenClaw Support Templates - SOUL.md templates for customer service
AWS Contact Lens - sentiment analysis and escalation prediction for call centers

Related TroiSinh articles:

Same cluster (Use Cases):

Paper: Customer Support Ticket Escalation Prediction using Deep Neural Networks (2020) - survival analysis for DNN escalation prediction
Paper: Leveraging Large Language Models for Automated Ticket Escalation (2025) - LLM feature engineering for proactive escalation
Blog: eesel AI Chatbot Escalation Strategy - 3-tier trigger taxonomy (customer-driven, AI-driven, sentiment-driven)
