Scaling to 1000+ Agents: Lessons Learned (When More Agents Does Not Mean Faster)

Scaling agent systems follows structural laws, not compute laws. Understand the dependency graph, the trade-off between parallel and sequential work, and why Go goroutines...

When you move from 10 agents to 1000, every performance assumption breaks down. The cause is not a shortage of GPU or RAM but coordination errors and cascading failures. The assumption that "more agents = faster" is like assuming that adding cooks speeds up a small kitchen: if nobody knows who is making which dish, they just bump into each other.

The problem

The "More Agents Is All You Need" heuristic fails because it ignores task topology. Agents are not CPU cores: they are stateful, non-deterministic, and error-prone. In a sequential chain (A→B→C→D), if each of $N$ steps fails independently with probability $p$, the probability that the workflow fails is $P_\text{total} = 1 - (1 - p)^N$. With $p = 5\%$ and $N = 10$ agents, $P_\text{total} = 1 - 0.95^{10} \approx 0.40$: a 40% chance that the whole workflow fails because of a single bad handoff.

The old approach: add agents until things run fast. The result: coordination overhead explodes, context windows bloat, and you get "runaway token cost", where one failing agent drags nine others into continuous retries and burns API budget unchecked. The Reddit community calls this the "financial circuit breakers" moment: the point where you realize that adding agents does not solve the problem and only adds failure points.

Core idea

The key insight: agents are distributed-system nodes with non-deterministic memory, not a thread pool. You do not scale by adding cores; you scale by analyzing the dependency graph.

Dependency Graph Topology

Two workflow shapes determine the architecture:

1. Wide tree (Embarrassingly Parallel). Researching 10 independent documents, verifying 100 data points in parallel. → Use the Independent architecture (map-reduce). Each agent runs in isolation and results are aggregated at the end. Benchmarks show a 30-40% improvement over Single-Agent.

2. Long chain (Sequential Chain). Write code → Debug → Test → Deploy. → Use Single-Agent (SAS) or Centralized Hub-and-Spoke. Adding agents to this chain adds handoff points, and each one is a failure risk. A Decentralized (P2P) architecture runs 15-25% slower than SAS because coordination overhead does nothing to shorten the critical path.

Five Architecture Patterns

Pattern	Topology	When to use	Trade-off
SAS	One agent does everything	Sequential tasks where context must stay intact	Context window bloat, but no handoff errors
Independent	Fan-out/fan-in, no orchestrator	Wide trees, parallelizable research	30-40% gain, but cross-agent debugging is hard
Centralized	Hub-and-spoke with an orchestrator	Mixed workloads that need flexible routing	The orchestrator is a bottleneck and a single point of failure
Decentralized	P2P consensus	Emergent collaboration, agent negotiation	15-25% slower on sequential work, high overhead
Hybrid	Hierarchy + directed peer links	Production reality, mixed task types	Complex governance, but optimal for mixed workloads
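As an illustration of the Independent row, here is a minimal fan-out/fan-in sketch. The `Worker` type stands in for an isolated agent call and is hypothetical; a real agent would call a model API and return an error as well.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Worker is a hypothetical stand-in for one isolated agent call.
type Worker func(input string) string

// FanOut runs worker on every input concurrently (fan-out) and returns
// the results in input order once all workers have finished (fan-in).
func FanOut(inputs []string, worker Worker) []string {
	results := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			results[i] = worker(in) // each goroutine writes only its own slot
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	docs := []string{"doc-a", "doc-b", "doc-c"}
	summaries := FanOut(docs, func(d string) string { return "summary of " + d })
	// The aggregation step is a plain join here.
	fmt.Println(strings.Join(summaries, "; "))
}
```

No worker reads another worker's output, which is why there is no orchestrator to become a bottleneck and also why a cross-agent bug has no single place to be observed.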

Governance at Scale

Beyond 100 agents, code is no longer the problem; people and templates are.

Template libraries: enforce standardized interfaces through SOUL.md templates. Do not let "snowflake agents" exist: every agent must inherit from a base template with 23 validation layers.
Phased rollout: 10 power users → 50-100 core departments → 500+ self-service users with guardrails. It is like mining: run a test before the big blast.

Runtime Engineering: Go vs Node

Why does GoClaw choose Go over Node.js for a fleet of 1000+ agents?

Goroutines vs Event Loop:

Go: M:N scheduling lets 1000+ agents run concurrent I/O on a handful of OS threads. Each goroutine starts with a 2 KB stack that grows as needed. When one agent blocks (calling an LLM API), the scheduler switches to another agent right away, with no "callback hell".
Node.js: a single-threaded event loop. If one agent processes a file synchronously or runs CPU-heavy work, the entire event loop blocks. An awaited LLM call does not block the loop by itself, but a stalled call keeps its closures alive until it times out. 1000 agents means 1000 closure chains in memory, which leak easily when nesting gets deep.

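Memory isolation: in Go, each agent is a lightweight goroutine that communicates over channels. In Node.js, each agent is a promise chain on one shared heap, so a single leaking agent can kill the whole process. Goroutines share a heap as well, so the isolation in Go comes from the discipline of passing messages instead of sharing state.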
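A minimal sketch of channel-based communication, assuming each agent owns its state inside one goroutine and is reachable only through an inbox channel. `Message`, `StartAgent`, and `Ask` are hypothetical names for illustration.

```go
package main

import "fmt"

// Message is a hypothetical request that carries its own reply channel.
type Message struct {
	Body  string
	Reply chan string
}

// StartAgent launches an agent goroutine that owns its state and is
// reachable only through the returned inbox channel.
func StartAgent(name string) chan<- Message {
	inbox := make(chan Message)
	go func() {
		handled := 0 // private to this goroutine; no other agent can touch it
		for msg := range inbox {
			handled++
			msg.Reply <- fmt.Sprintf("%s handled #%d: %s", name, handled, msg.Body)
		}
	}()
	return inbox
}

// Ask sends one message to an agent and waits for its reply.
func Ask(inbox chan<- Message, body string) string {
	reply := make(chan string, 1)
	inbox <- Message{Body: body, Reply: reply}
	return <-reply
}

func main() {
	researcher := StartAgent("researcher")
	fmt.Println(Ask(researcher, "summarize doc-a"))
	fmt.Println(Ask(researcher, "summarize doc-b"))
	close(researcher) // ends the agent's receive loop
}
```

The counter `handled` is never shared, so no lock is needed; the channel is the only way in or out of the agent.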

Why it works

At its core this is structural load balancing rather than blind scaling.

The trade-off between parallel and sequential work is clear-cut:

Parallel tasks: coordination cost is low because there are no dependencies. The Independent architecture removes the orchestrator bottleneck and exploits 100% of the parallelizable work.
Sequential tasks: the critical path stays the same no matter how many agents you add. Extra handoffs only add latency and failure points. The Decentralized architecture loses to SAS here because consensus overhead cannot parallelize the critical path.

Error containment: in a multi-agent architecture, errors are contained within an agent pod. In a monolithic single agent, a hallucination at step 3 spreads through the whole reasoning chain across the next 20+ steps. This is the "laundered error" phenomenon: the error is washed clean through many steps and its origin becomes hard to trace.
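At the runtime level, one form of containment is turning an agent crash into a reported error instead of a process-wide failure. The sketch below uses hypothetical names (`Result`, `RunContained`) and covers only panics; it does not detect a hallucinated but well-formed answer, which needs validation at the handoff.

```go
package main

import "fmt"

// Result carries either an agent's output or the error that stopped it.
type Result struct {
	Agent  string
	Output string
	Err    error
}

// RunContained runs task in its own goroutine and converts a panic into
// an error Result, so one failing agent cannot take down the others.
func RunContained(agent string, task func() string, out chan<- Result) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				out <- Result{Agent: agent, Err: fmt.Errorf("agent %s failed: %v", agent, r)}
			}
		}()
		out <- Result{Agent: agent, Output: task()}
	}()
}

func main() {
	out := make(chan Result, 2)
	RunContained("parser", func() string { return "ok" }, out)
	RunContained("flaky", func() string { panic("bad tool output") }, out)

	// Results arrive in whatever order the agents finish.
	for i := 0; i < 2; i++ {
		r := <-out
		if r.Err != nil {
			fmt.Println("contained:", r.Err)
		} else {
			fmt.Println(r.Agent, "returned", r.Output)
		}
	}
}
```

Each result is tagged with its agent name, so a failure points at its origin instead of being laundered through later steps.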

Predictive model: research from Google and MIT shows that the optimal architecture can be predicted with 87% accuracy just by analyzing task properties: the parallelizability score and the step dependency graph. This is the "aha moment": you do not need painful trial and error, only a look at the structure of the problem.

Practical implications

Metric	Naive Scaling (adding agents blindly)	Structural Scaling (analyzing the graph)
Parallel Tasks	+5% (high overhead)	+30-40% (Independent)
Sequential Tasks	-25% (handoff errors)	0% (stay on SAS)
Debug Complexity	$N^2$ interactions	Isolated failure domains
Cost Control	Runaway tokens	Financial circuit breakers

Real-world benchmarks:

Mantel: 10k+ B2B users on a hybrid architecture, handling sequential workflows with SAS and parallel research with Independent agents.
MIT Media Lab: simulated 1,052 concurrent agents with persistent memory, showing that the Decentralized architecture pays off only when the task involves heavy negotiation.
Google Research: 87% accuracy in predicting the optimal architecture from task properties, cutting tuning time from weeks to hours.

Who is using it: Mantel (10k+ users), Sonic Automotive, BNY Mellon (enterprise deployment), and platforms in the OpenClaw/GoClaw ecosystem that run 1000+ concurrent agents on cheap VPS instances thanks to goroutine efficiency.

Limitations:

Runtime cost explosions: agents running 24/7 cause "token spend runaway" unless a circuit breaker is in place (the AdenHQ pattern).
Task classification: you need to know the task structure in advance, and it is often unknown until the task runs.
Governance overhead: grows super-linearly past 500 agents without automated template enforcement.

Going deeper

Main references:

Kim et al. (2026), "Towards a Science of Scaling Agent Systems": defines quantitative scaling laws.
Park et al. (2024), "Generative Agent Simulations of 1,000 People": architecture for simulating 1,052 agents.
arXiv:2602.03794, "Understanding Agent Scaling in LLM-Based Multi-Agent Systems": diminishing-returns analysis.
