Analyzing the Claude Code Harness: Lessons from the Source Code Leak

A technical analysis of the 512K lines of leaked Claude Code source, decoding how Anthropic designs the Agent-Computer Interface (ACI) and the agent harness for coding. From ...

Definition

The Claude Code harness is the Agent-Computer Interface (ACI) architecture that Anthropic designed so that Claude can read, write, and execute code in a local environment. It was exposed by a leak of 512K lines of source code in March 2025. This is a rare case study that lets us observe how a large tech company designs a real production harness: not just theory on paper, but the actual trade-offs between safety, latency, and capability.

Detailed explanation

Background of the leak and what it revealed

In March 2025, the repository containing the source code of Claude Code, a CLI tool that lets Claude operate directly on the filesystem and terminal, leaked with 512K lines of code. Although unintended, the incident created a rare opportunity to reverse-engineer a production-grade AI harness.

Importantly, this is not merely a "prompt template" or a "system message". We can see the whole system: how the agent initializes a session, manages state, asks the user for permission (permission model), handles errors (failure recovery), and "looks" at a terabyte-scale codebase without overflowing the context window.

Agent-Computer Interface (ACI) in Claude Code

Claude Code implements an ACI that is "conservative by design". Instead of giving the agent free rein to run rm -rf /, the harness enforces an explicit permission matrix:

Tool categorization:

Read-only tools: read_file, view_directory, grep_search. These run automatically without asking
Destructive tools: write_file, and bash with side effects. These require human-in-the-loop approval
Info tools: get_system_info, list_processes. These are read-only but rate-limited
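
The permission matrix above can be sketched as a lookup from tool name to category. This is a minimal illustration, not the leaked implementation: the `ToolCategory` enum, the `TOOL_CATEGORIES` table, and the `needs_approval` helper are hypothetical names, and treating every `bash` call as destructive is a simplifying assumption.

```python
from enum import Enum


class ToolCategory(Enum):
    READ_ONLY = "read_only"
    DESTRUCTIVE = "destructive"
    INFO = "info"


# Hypothetical mapping built from the three categories listed above.
TOOL_CATEGORIES = {
    "read_file": ToolCategory.READ_ONLY,
    "view_directory": ToolCategory.READ_ONLY,
    "grep_search": ToolCategory.READ_ONLY,
    "write_file": ToolCategory.DESTRUCTIVE,
    "bash": ToolCategory.DESTRUCTIVE,  # assumption: all bash calls count as destructive
    "get_system_info": ToolCategory.INFO,
    "list_processes": ToolCategory.INFO,
}


def needs_approval(tool_name: str) -> bool:
    """Return True when a human must approve the call before it runs."""
    # Unknown tools fall back to the strictest category.
    category = TOOL_CATEGORIES.get(tool_name, ToolCategory.DESTRUCTIVE)
    return category is ToolCategory.DESTRUCTIVE
```

Defaulting unknown tools to the destructive category keeps the matrix fail-closed: a tool that was never classified cannot run without a prompt.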

Context window strategy for large codebases: When working with a large repo (for example, the Linux kernel), Claude Code does not dump the entire codebase into the prompt. Instead, the harness implements search-driven "lazy loading":

The agent analyzes the task and identifies the relevant files with grep and find
It reads only the files it needs, usually as specific line ranges (for example, lines 45-120)
Edit operations go through write_file with explicit diff checking

The trade-off here: higher latency (several reads are needed) in exchange for better safety and lower cost (token usage).
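
The search-then-read flow can be approximated in a few lines. The sketch below is an assumption about how such tools might behave, not code from the leak: `grep_search` and `read_range` are hypothetical helpers that mimic the tool names used in this article.

```python
import re
from pathlib import Path


def grep_search(root: Path, pattern: str, glob: str = "*.py") -> list[tuple[Path, int]]:
    """Return (file, 1-based line number) pairs for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path, lineno))
    return hits


def read_range(path: Path, start: int, end: int) -> str:
    """Return lines start..end (1-based, inclusive) instead of the whole file."""
    lines = path.read_text(errors="ignore").splitlines()
    return "\n".join(lines[start - 1:end])
```

Only the text returned by `read_range` would enter the prompt, which is where the token savings come from; the cost is one extra tool round trip per lookup.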

The hidden multi-agent architecture behind the harness

Looking deeper into the orchestration logic, we see traces of a multi-agent architecture similar to the $124.70 DAW (Developer Autonomous Workflow) pattern that Anthropic has published:

Three hidden agents:

Planner: Creates the specification, analyzes dependencies, and decides which files need to be touched. Its output is a .planner/ directory containing the task breakdown.
Executor: The agent that does the work: writing code, running tests, debugging. This is the part of the CLI that the user interacts with directly.
Reviewer: Runs in the background (or on an explicit command), re-checks the changes through git diff, and evaluates test coverage.

Communication mechanism: Rather than message passing over an API (high latency), the agents use the "file system as message bus". The Planner writes the spec to /tmp/claude-plan.md, the Executor reads and executes it, and the Reviewer writes its comments to /tmp/claude-review.md. This "stateful harness" technique uses fewer tokens than a multi-turn conversation.
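
A rough sketch of the "file system as message bus" idea follows. The file names come from the description above; the function names, the checklist format, and the `bus_dir` parameter (used here instead of a hard-coded /tmp) are illustrative assumptions.

```python
from pathlib import Path

PLAN_FILE = "claude-plan.md"      # file names follow the description above
REVIEW_FILE = "claude-review.md"


def planner_write(bus_dir: Path, tasks: list[str]) -> None:
    """Planner side: publish the task list as a Markdown checklist."""
    body = "\n".join(f"- [ ] {task}" for task in tasks)
    (bus_dir / PLAN_FILE).write_text(body + "\n")


def executor_read(bus_dir: Path) -> list[str]:
    """Executor side: read the pending tasks back from the plan file."""
    prefix = "- [ ] "
    lines = (bus_dir / PLAN_FILE).read_text().splitlines()
    return [line[len(prefix):] for line in lines if line.startswith(prefix)]


def reviewer_write(bus_dir: Path, comments: list[str]) -> None:
    """Reviewer side: leave comments for the next agent, or the next run."""
    body = "\n".join(f"- {comment}" for comment in comments)
    (bus_dir / REVIEW_FILE).write_text(body + "\n")
```

Because each agent only reads the file it needs, none of them has to carry the other agents' full conversation in its own context.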

State Management and Session Design

Claude Code implements a hybrid memory model:

Persistent state (across sessions):

The .claude/ directory in the project root contains:
- memory.json: Project-specific context (tech stack, conventions, recent errors)
- scratchpad.md: Notes for the agent's "future self"
- plan-history/: Previously executed plans
Ephemeral context (within a single session):

Conversation history with the user
Recently accessed files cache (LRU eviction as the context window fills up)
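
The LRU eviction mentioned above can be sketched with `collections.OrderedDict`. The `FileCache` class, its token budget, and the four-characters-per-token estimate are assumptions made for illustration.

```python
from collections import OrderedDict


class FileCache:
    """Keep recently read file contents within a rough token budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self._entries = OrderedDict()  # path -> text, least recently used first

    @staticmethod
    def _cost(text: str) -> int:
        # Crude estimate (assumption): about 4 characters per token.
        return max(1, len(text) // 4)

    def total_tokens(self) -> int:
        return sum(self._cost(text) for text in self._entries.values())

    def get(self, path: str):
        if path not in self._entries:
            return None
        self._entries.move_to_end(path)  # mark as most recently used
        return self._entries[path]

    def put(self, path: str, text: str) -> None:
        self._entries[path] = text
        self._entries.move_to_end(path)
        # Evict least recently used entries until the budget fits again,
        # but always keep the entry that was just added.
        while self.total_tokens() > self.max_tokens and len(self._entries) > 1:
            self._entries.popitem(last=False)
```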

Context window pressure handling: When a conversation gets long, the harness does not simply drop old messages (the naive approach). Instead, it uses a "summarization checkpoint": at 80% of the context window, the agent automatically summarizes the work done so far, saves the summary to memory.json, and clears the conversation history while keeping critical tool outputs.
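
The checkpoint logic can be sketched as follows. Everything here is hypothetical: the `Message` type, the `critical` flag, the token estimate, and the injected `summarize` callable, which in a real harness would presumably be a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    role: str               # "user", "assistant" or "tool"
    text: str
    critical: bool = False  # tool outputs that must survive a checkpoint


def estimate_tokens(text: str) -> int:
    # Crude estimate (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def maybe_checkpoint(
    history: list[Message],
    context_limit: int,
    summarize: Callable[[list[Message]], str],
    threshold: float = 0.8,
) -> tuple[list[Message], Optional[str]]:
    """Return (new_history, summary); summary is None below the threshold."""
    used = sum(estimate_tokens(m.text) for m in history)
    if used < threshold * context_limit:
        return history, None
    summary = summarize(history)
    # Clear the conversation but keep the tool outputs flagged as critical.
    kept = [m for m in history if m.role == "tool" and m.critical]
    return kept, summary
```

The caller is expected to persist the returned summary (for example into memory.json) and to feed it back into the next prompt; that wiring is left out of the sketch.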

Security Guardrails and Permission Design

Analysis of the security layer shows that Anthropic applies defense in depth:

Tool-level guards:

bash commands are parsed against a denylist before execution (blocking rm -rf, sudo, curl | sh)
write_file has a backup mechanism: it creates a copy under .claude/backups/ before overwriting
Network access is sandboxed: the agent cannot call external APIs on its own, except those on an explicit allowlist
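
Two of these guards are easy to sketch: a regex denylist and a backup-before-overwrite helper. The patterns and the `is_denied` and `write_with_backup` names are illustrative assumptions. Matching on the raw command string is easy to bypass (for example with `rm -r -f`), which is one reason a denylist is only one layer among several.

```python
import re
import shutil
from pathlib import Path

# Hypothetical denylist covering the three examples named above.
DENY_PATTERNS = [
    re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-[a-zA-Z]*f[a-zA-Z]*r)\b"),  # rm -rf / rm -fr
    re.compile(r"\bsudo\b"),
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sh|bash)\b"),                     # curl ... | sh
]


def is_denied(command: str) -> bool:
    """Return True when the command matches any denylist pattern."""
    return any(pattern.search(command) for pattern in DENY_PATTERNS)


def write_with_backup(project_root: Path, rel_path: str, content: str) -> None:
    """Copy the existing file under .claude/backups/ before overwriting it."""
    target = project_root / rel_path
    if target.exists():
        backup = project_root / ".claude" / "backups" / rel_path
        backup.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(target, backup)  # simplification: keeps only the latest backup
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
```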

Approval workflows: The harness implements "sticky permissions": if the user approves write_file for src/utils.py, the agent can keep editing that file for the next 5 minutes without asking again. Switching to another file or another tool resets the approval. This is a trade-off between UX (no confirmation spam) and safety (limited blast radius).
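
A sticky permission is essentially a single grant with a time-to-live. The sketch below assumes a 300-second TTL, matching the 5 minutes described above; the `StickyPermissions` class and its injectable `clock` parameter are hypothetical.

```python
import time


class StickyPermissions:
    """Remember one approved (tool, target) pair for a limited time."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry logic can be tested
        self._grant = None   # (tool, target, expires_at)

    def approve(self, tool: str, target: str) -> None:
        self._grant = (tool, target, self.clock() + self.ttl)

    def is_allowed(self, tool: str, target: str) -> bool:
        if self._grant is None:
            return False
        granted_tool, granted_target, expires_at = self._grant
        if (tool, target) != (granted_tool, granted_target) or self.clock() >= expires_at:
            # Switching file or tool, or timing out, resets the approval.
            self._grant = None
            return False
        return True
```

Holding a single grant rather than a set keeps the blast radius small: approving one file never implies approval for another.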

Practical examples

How Claude Code handles a "large file edit"

Suppose you ask: "Add logging to the process_payment function in src/transaction.py (a 2000-line file)".

The naive approach (weak harness):

Read all 2000 lines into context (costing ~4000 tokens)
Claude finds the function, edits it, and writes the whole file back (risking overwrites elsewhere)
High token cost, high risk

How the Claude Code harness does it:

Search phase: Use grep_search to find def process_payment, which pinpoints line 845
Read phase: read_file with range 840-860 (only about 20 lines, ~40 tokens)
Analysis phase: The agent determines that 3 lines of logging need to be inserted
Edit phase: write_file with start_line=845, end_line=847 and the new content, editing only the required span and leaving the rest untouched
Verify phase: Automatically read_file again to confirm the change matches the intent
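
The edit and verify phases can be sketched as a line-range replacement followed by a read-back check. `replace_lines` and `verify_range` are hypothetical helpers; the `start_line` and `end_line` parameters mirror the ones named in the steps above, and the 1-based inclusive convention is an assumption.

```python
from pathlib import Path


def replace_lines(path: Path, start_line: int, end_line: int, new_lines: list[str]) -> None:
    """Replace lines start_line..end_line (1-based, inclusive) and leave the rest untouched."""
    lines = path.read_text().splitlines()
    if not (1 <= start_line <= end_line <= len(lines)):
        raise ValueError(f"invalid range {start_line}-{end_line} for a {len(lines)}-line file")
    lines[start_line - 1:end_line] = new_lines
    path.write_text("\n".join(lines) + "\n")


def verify_range(path: Path, start_line: int, expected: list[str]) -> bool:
    """Re-read the file and confirm the edited span holds the expected lines."""
    lines = path.read_text().splitlines()
    begin = start_line - 1
    return lines[begin:begin + len(expected)] == expected
```

The range check up front matters: a stale line number from an earlier read should fail loudly instead of silently overwriting the wrong span.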

Lesson learned: A good harness is not about "a smarter AI" but about "the right information delivered at the right time".

Multi-agent workflow in feature implementation

Scenario: "Implement user authentication feature".

Planner agent execution:

Creates the file .claude/plan/auth-impl.md containing:
- Tasks: 1) Create User model, 2) Add login endpoint, 3) Write tests
- Files to touch: models/user.py, routes/auth.py, tests/test_auth.py
- Risks: "Database migration needed, backup first"

Executor agent (CLI):

Reads the plan from the file system
Executes each task using tool calls
When a test fails, it does not patch logic at random; it records a note in .claude/scratchpad.md: "Test failed because of a missing salt in the password hash"

Reviewer agent (triggered explicitly or post-completion):

Runs git diff to review the changes
Checks for hardcoded secrets (regex scan)
Writes the results to .claude/review/auth-review.md
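
The Reviewer's regex scan might look like the sketch below. The three patterns and the `scan_diff` helper are illustrative assumptions and far from exhaustive; the function takes unified diff text as input and only inspects added lines.

```python
import re

# Hypothetical patterns; a production scanner would use a much larger rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"(?i)\b(password|secret|api_key|token)\b\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_diff(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, pattern name) for each suspicious added line."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines and skip the '+++' file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

The diff text could be obtained with `subprocess.run(["git", "diff"], capture_output=True, text=True).stdout` and the findings written to the review file.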

Trade-off analysis: Latency is higher than with a single agent (execution has to wait for the planner), but quality and safety are considerably higher. This illustrates "Harness Engineering > Prompt Engineering": with the same model (Claude 3.5 Sonnet), different orchestration produces different results.

Applications

Students and Researchers

If you are studying AI engineering, the Claude Code leak is a "gold mine" for understanding real production systems:

Comparing theory vs. practice: The SWE-agent paper discusses optimizing the ACI, while the Claude Code implementation shows the practical constraints (latency, user patience, safety requirements)
Reverse engineering exercise: Try implementing a subset of the harness (for example, only file read/write with an approval flow) to understand why Anthropic designed it this way

AI Engineers and Tech Leads

When building internal coding tools for your company:

ACI Design Pattern: Apply lazy loading and selective context injection from Claude Code to reduce token cost
Permission Matrix: Design tool categories (read-only vs. destructive) the way Claude Code does instead of "giving the agent every permission"
Human-in-the-loop: Implement "sticky permissions" to balance UX and safety, so that users do not have to approve every ls command while rm -rf is still blocked

Companies building a Coding Agent

For startups or enterprises that want to build "their own Devin":

Benchmark: Use the Claude Code harness as a baseline for evaluating your own harness (is state management missing? Is the permission model strict enough?)
Safety-first approach: Learn from how Anthropic prioritizes safety over capability: the "auto-run" feature is disabled by default and requires explicit opt-in
Context scaling: Apply the line-range reading technique to your RAG pipeline to handle enterprise-scale codebases (terabytes of code)

Comparison

Feature	Claude Code Harness	SWE-agent (Princeton)	Devin (Cognition)
Architecture	Hidden multi-agent (Planner-Executor-Reviewer)	Single agent with an optimized ACI	End-to-end autonomous
Permission Model	Explicit approval, sticky permissions	Semi-autonomous, human interrupt	Fully autonomous (sandbox)
Context Strategy	Lazy loading, line-range reads	Repo-wide search, file-level	IDE-like full context
State Management	Persistent `.claude/` directory	Session-based, limited persistence	Cloud session, resume capability
Safety Approach	Conservative, denylisted commands	Moderate, relies on sandbox	Aggressive, full isolation
Trade-off	Safety + Cost > Speed	Speed + Capability > Safety	Maximum capability, expensive

Conclusion: Claude Code sits at the "conservative and safe" end of the spectrum: it trades some autonomy (the agent does not decide everything on its own) for trust and safety. This is a deliberate choice by Anthropic, consistent with positioning the product as an "AI assistant" rather than an "AI replacement". SWE-agent, by contrast, prioritizes throughput and the number of tasks completed, which suits research benchmarks better but is riskier for production use.

