5-layer Security: Defense-in-depth from Rate Limiting to Encryption, a protection architecture for the Agent runtime

A five-layer security architecture for AI agents: rate limiting against brute force, injection detection through AST analysis, SSRF protection that blocks metadata IPs, Shell sanitizatio...

AI agent runtimes such as OpenClaw or GPT-5.3-Codex are not just "chatbots that call an API". They are processes with permission to execute shell commands, read the filesystem, and make outbound HTTP calls. A single validation gap can turn "fetch a profile image" into "exfiltrate AWS credentials" or "execute arbitrary commands". The 5-layer security architecture is not an ordinary security checklist; it is a defense-in-depth system that maps to five distinct trust boundaries, from network ingress to data storage.

The problem

First-generation frameworks (early LangChain) handled security with an independent "per-layer" model: the gateway checks auth, the exec policy checks commands, the sandbox checks the filesystem. But once an LLM can call tools, attackers exploit "cross-layer composition": prompt injection → tool call → shell escape.

The 2019 Capital One breach was the wake-up call: the attacker used SSRF to reach 169.254.169.254 (the AWS metadata service), exposing about 100 million customer records. OWASP ranks SSRF #7 in its API Security Top 10. For an agent runtime the problem is more serious: the agent can execute code automatically, call APIs continuously, and keep state across many turns, which creates an attack surface far larger than that of a traditional web app. A single missed validation lets an attacker pivot from "fetch a profile image" to "exfiltrate AWS credentials" within one tool call.

Core idea

5-layer security is not five sequential steps but five "confusion boundaries": places where data is mistaken for code, or where the attacker's context is mistaken for the server's context.

Layer 1: Rate Limiting — Intent Confusion Boundary

You cannot tell whether a request comes from a real user or from a bot brute-forcing injection payloads. Rate limiting here is not just "slowing things down"; it is behavior-based throttling.

Use a token bucket algorithm to cap traffic at 100 req/min per IP, but more importantly, limit by "behavior signature": the number of failed tool-call attempts and the complexity of the input (parse-tree depth). This keeps attackers from probing for injection vectors without blocking legitimate users behind a corporate NAT.

# GoClaw rate limiting config
rate_limit:
  algorithm: token_bucket
  capacity: 100                        # maximum burst size, in requests
  refill_rate: 100 per minute          # tokens added per minute; sustains the 100 req/min limit
  per_key: ip_address
  burst_behavior: exponential_backoff  # increase the wait time when an anomalous pattern is detected
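
To make the token bucket and the "behavior signature" idea concrete, here is a minimal Python sketch. It is not GoClaw's implementation: the class names, the `failure_cost` parameter, and the choice to charge extra tokens for failed tool calls are assumptions made for illustration. The clock is passed in explicitly so the behavior is easy to check.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Classic token bucket: refills continuously, spends one token per request."""
    capacity: float = 100.0
    refill_per_second: float = 100.0 / 60.0  # 100 requests per minute
    tokens: float = 100.0
    updated_at: float = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class BehaviorLimiter:
    """Hypothetical wrapper: one bucket per key, failed tool calls cost extra."""

    def __init__(self, failure_cost: float = 10.0):
        self.failure_cost = failure_cost
        self.buckets: dict[str, TokenBucket] = {}

    def _bucket(self, key: str, now: float) -> TokenBucket:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(updated_at=now)
        return self.buckets[key]

    def allow(self, key: str, now: float) -> bool:
        return self._bucket(key, now).allow(now)

    def record_failed_tool_call(self, key: str, now: float) -> None:
        # A key that keeps failing drains its own budget faster than one that succeeds.
        bucket = self._bucket(key, now)
        bucket.tokens = max(0.0, bucket.tokens - self.failure_cost)
```

In real use `now` would come from `time.monotonic()`, and the key could combine the IP with a session or API token so that users who share a NAT address do not share one bucket.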

Layer 2: Injection Detection — Language Confusion Boundary

This is the boundary where the application sees a string but the SQL engine or shell sees executable syntax. The old approach (regex searches for keywords such as "DROP" or "DELETE") fails against encoding tricks.

The solution is AST-based parsing plus pattern matching: parse the input with the same grammar as its consumer (a SQL parser for SQLi, a shell parser for command injection). Reject inputs that contain shell metacharacters (;, |, `, $()) or SQL keywords outside quoted contexts, right at ingress.

Example: the input "; cat /etc/passwd #" is flagged because its parse tree contains a command separator (;) that starts a second command, not because it contains the string "cat".
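
The difference between keyword matching and grammar-aware detection can be sketched with a small quote-aware scanner. This is a deliberately simplified stand-in for a real shell parser, not a full POSIX grammar and not the article's actual detector; `Finding` and `find_shell_syntax` are hypothetical names. It flags operators only where the shell would treat them as syntax, and flags command substitution even inside double quotes, where the shell still expands it.

```python
from dataclasses import dataclass

# Characters the shell treats as operators when they appear outside quotes.
UNQUOTED_OPERATORS = set(";|&<>\n")


@dataclass(frozen=True)
class Finding:
    position: int
    kind: str


def find_shell_syntax(text: str) -> list[Finding]:
    """Return the places where `text` would be parsed as shell syntax."""
    findings: list[Finding] = []
    quote = None  # None, "'" or '"'
    i = 0
    while i < len(text):
        ch = text[i]
        if quote == "'":
            # Inside single quotes everything is literal until the closing quote.
            if ch == "'":
                quote = None
        elif ch == "\\":
            i += 1  # the next character is escaped, so skip it
        elif ch == "`" or (ch == "$" and text[i + 1 : i + 2] == "("):
            # Command substitution is active both unquoted and in double quotes.
            findings.append(Finding(i, "command_substitution"))
        elif quote == '"':
            if ch == '"':
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in UNQUOTED_OPERATORS:
            findings.append(Finding(i, "operator"))
        i += 1
    if quote is not None:
        findings.append(Finding(len(text), "unterminated_quote"))
    return findings
```

With this scanner, `find_shell_syntax("; cat /etc/passwd #")` reports an operator at position 0, while a harmless filename such as `cat.png` produces no findings even though it contains "cat".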

Layer 3: SSRF Protection — Spatial Confusion Boundary

This is "spatial confusion", the most subtle of the five. You assume "localhost" is the user's machine, but the server is a privileged insider on its own network. When you ask the server to fetch http://127.0.0.1:8080/admin, you are not attacking the server; you are asking the server to attack its own colleagues.

The metadata service at 169.254.169.254 is not "magic"; it is just a REST API that trusts every request coming from "inside the house". Reportedly, about 90% of cloud SSRF attacks target this IP.

Solution: strict allow-list validation with rigorous URL parsing. Block:

Metadata, loopback, and private addresses: 169.254.169.254, 127.0.0.0/8, ::1, 10.0.0.0/8
Dangerous schemes: file://, gopher://, dict://
Bypass techniques: decimal IP notation (2130706433 = 127.0.0.1), the @ trick (http://evil.com@169.254.169.254)

Parse the URL before validating it, normalize the hostname, and only then check the allow-list; this avoids bypasses through the fragment (#) or userinfo.
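
The parse, normalize, then check order can be sketched with Python's standard `urllib.parse` and `ipaddress` modules. This is an illustrative sketch under stated assumptions, not the runtime's actual validator: `ALLOWED_HOSTS`, `parse_ip_literal`, and `check_url` are hypothetical names, and `images.example.com` is a placeholder allow-list entry.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"images.example.com"}  # hypothetical allow-list entry


def parse_ip_literal(host: str):
    """Return an IP address object if host is an IP literal, else None."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        pass
    # Decimal notation such as 2130706433 (= 127.0.0.1).
    if host.isascii() and host.isdigit() and int(host) < 2**32:
        return ipaddress.IPv4Address(int(host))
    return None


def check_url(url: str) -> tuple[bool, str]:
    """Decide whether an outbound fetch is allowed; returns (allowed, reason)."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # userinfo ("user@") and port are stripped here
    except ValueError:
        return False, "unparseable URL"
    if parts.scheme not in ALLOWED_SCHEMES:
        return False, "scheme not allowed"
    if not host:
        return False, "missing host"
    host = host.rstrip(".").lower()  # normalize before comparing
    ip = parse_ip_literal(host)
    if ip is not None:
        if ip.is_loopback or ip.is_link_local or ip.is_private:
            return False, "internal address"
        return False, "IP literals are not on the allow-list"
    if host not in ALLOWED_HOSTS:
        return False, "host not on the allow-list"
    return True, "ok"
```

Because the final check is default-deny, notations this sketch does not decode (hex or octal IPs, for instance) are still rejected as hosts that are not on the allow-list. A check on the URL string alone does not cover DNS rebinding; the resolved address would also need to be validated at connection time.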

Layer 4: Shell Sanitization — Execution Context Confusion Boundary

Spaces and semicolons are data to you but operators to /bin/sh. The solution is not "sanitization" (easily bypassed through encoding) but never passing user input to system() or exec() as a string.

Use parameterized APIs (for example, subprocess.run with a list of arguments instead of shell=True) or sandboxed containers with seccomp-bpf filters. GoClaw runs exec inside a Docker container with a syscall allow-list rather than relying on lexical parsing (which is bypassed by l\s or busybox multiplexing).
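
A short sketch of the parameterized approach, using only the standard library. The helper name `echo_argument` is hypothetical, and the child process is just the current Python interpreter printing its first argument, chosen so the example runs anywhere. Because the arguments are passed as a list and no shell is involved, the hostile input arrives in the child as one literal string.

```python
import subprocess
import sys


def echo_argument(user_input: str) -> str:
    """Pass user input to a child process as a single argv entry, with no shell."""
    result = subprocess.run(
        # List form: each element becomes exactly one argument. Nothing here is
        # interpreted by /bin/sh, so ';', '|' and '$()' stay plain data.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.rstrip("\n")
```

Calling `echo_argument("; cat /etc/passwd #")` returns the same string unchanged. The equivalent call with `shell=True` and string concatenation would hand the semicolon to the shell as a command separator.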

Layer 5: Encryption — Data Boundary

TLS 1.3 for data in transit (prevents credential sniffing when an SSRF leaks internal data through a network mirror). AES-256 for data at rest (protects against file-read exploits when an attacker gets past layer 3 but is stopped at layer 4).

This is more than "turning on HTTPS": it means end-to-end encryption from agent memory to the vector DB, including encryption of ephemeral container volumes and mutual TLS authentication between agent nodes.
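
For the transit half of this layer, the following sketch shows how TLS 1.3 and mutual authentication might be enforced with Python's standard `ssl` module. The function names are hypothetical and the certificate paths in the comments are placeholders. Encryption at rest with AES-256 needs a third-party library and is not shown.

```python
import ssl


def client_context() -> ssl.SSLContext:
    """Client side: verify the server certificate and refuse anything below TLS 1.3."""
    ctx = ssl.create_default_context()  # certificate and hostname checks are on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def server_context_requiring_client_certs() -> ssl.SSLContext:
    """Server side for mutual TLS: peers must present a certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.verify_mode = ssl.CERT_REQUIRED
    # A real deployment would also load its own identity and the CA that signs
    # peer certificates (paths below are placeholders):
    # ctx.load_cert_chain("node.crt", "node.key")
    # ctx.load_verify_locations("internal-ca.pem")
    return ctx
```

Setting `minimum_version` fails closed: a peer that only speaks TLS 1.2 cannot complete the handshake, rather than silently downgrading.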

Why it works

The core logic is separation of confusion types. Each layer fixes a different kind of semantic confusion:

Rate limiting resolves intent confusion: it looks at behavior instead of identity.
Injection detection resolves language confusion: it uses the parse tree instead of string matching.
SSRF protection resolves spatial confusion: it is aware of network topology.
Shell handling resolves execution context confusion: it uses parameterized APIs instead of string concatenation.
Encryption is the backstop at the data boundary: it limits what is readable when an earlier layer fails.

The trade-offs are explicit:

Rate limiting can produce false positives behind a corporate NAT (many users sharing one IP).
AST parsing is slower than regex (~10-50 ms overhead), but the cost is worth paying to avoid bypasses.
Sandboxing adds container start-up latency (~100-500 ms), so it does not suit real-time paths with sub-100 ms budgets.

Compared with monolithic security (a single-layer WAF):

Monolithic Security	5-Layer Defense
Single perimeter (WAF)	Multi-layer, zero trust between stages
Pattern matching (regex)	Grammar-aware parsing (AST)
Blacklist IPs	Behavior-based throttling + spatial awareness
Input sanitization	Architectural separation (parameterized APIs)

Practical significance

Capital One Breach (2019): about 100 million records were exposed through SSRF to 169.254.169.254. With the 5-layer design, layer 3 (SSRF protection) would block the metadata IP right at the gateway.

OWASP Benchmarks: SSRF is ranked #7 in the API Security Top 10. Reportedly, about 90% of cloud SSRF attacks target metadata IPs, which layer 3 blocks.

Reddit Case Study: shell injection through SEO poisoning of trusted channels (r/cybersecurity discussion, 2025). Layer 4 (shell sanitization) prevents it by disallowing arbitrary command execution even when the input comes from a "trusted source".

Limitations:

It does not protect against zero-day bypasses in URL parsers (parsing differences between Python's urllib and requests).
It does not stop insider threats with legitimate access.
It does not stop timing side-channel attacks (measuring processing time to infer data).

