Benchmark suite for evaluating multi-agent coding systems across multi-step, multi-file workflows.
Automated code-review and fix loop that also produces ML training data. Runs stack-aware review agents, verifies findings against the code, applies fixes validated by the test suite, and records every run as an ATIF trajectory for fine-tuning.
Claude Code plugin marketplace. 131 agent skills for code review, docs, testing, strategy, and planning across Python, Go, Rust, Elixir, React, and iOS.
Multi-agent orchestrator for software development. Human-in-the-loop approval gates, per-agent model routing, sandboxed execution with credential isolation, and a real-time dashboard.
Self-improving coding agent in Rust. A directed-search harness rewrites the agent's own prompts and tool descriptions, re-evaluates each candidate end-to-end on a frozen slice of real terminal tasks under a token-penalized reward, and accepts only changes that clear a measured noise margin.
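
The acceptance rule in the last description (token-penalized reward, accept only above a measured noise margin) can be illustrated with a minimal sketch. This is not the project's code, which is in Rust. Every name here (`TaskResult`, `penalized_reward`, `noise_margin`, `accept`), the linear per-token penalty, and the choice of `k` standard deviations of repeated baseline runs as the margin are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class TaskResult:
    """Outcome of one task run (hypothetical structure)."""
    passed: bool  # whether the agent completed the task
    tokens: int   # tokens consumed on the task


def penalized_reward(results, token_penalty=1e-6):
    """Mean over tasks of (success - token_penalty * tokens).

    The linear penalty is an assumed form, not taken from the source.
    """
    return mean(
        (1.0 if r.passed else 0.0) - token_penalty * r.tokens
        for r in results
    )


def noise_margin(baseline_scores, k=2.0):
    """Assumed margin: k sample standard deviations of repeated
    baseline evaluations on the same frozen task slice."""
    return k * stdev(baseline_scores)


def accept(candidate_score, baseline_scores, k=2.0):
    """Accept a candidate only if it beats the baseline mean by more
    than the measured noise margin."""
    gain = candidate_score - mean(baseline_scores)
    return gain > noise_margin(baseline_scores, k)


# Three repeated baseline runs give a mean near 0.50 and a margin near 0.04.
baseline = [0.50, 0.52, 0.48]
print(accept(0.53, baseline))  # False: a gain of about 0.03 is within the noise
print(accept(0.56, baseline))  # True: a gain of about 0.06 clears the margin
```

Measuring the margin from repeated runs of the unchanged agent is one simple way to keep run-to-run variance from being mistaken for an improvement; a real harness might use a different statistical test.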