Lambda
$400 cloud credits to every individual or team
Compete for over $1M in prizes and resources. This two-phase competition challenges participants to first build novel benchmarks or enhance existing benchmarks for agentic AI (Phase 1), and then create AI agents to excel on them (Phase 2)—advancing the field by creating high-quality, broad-coverage, realistic agent evaluations as shared public goods.
As AI evolves toward agentic systems—capable of reasoning, taking actions, and interacting with the world—current benchmarking methods fall short. Existing evaluations suffer from poor interoperability (agents must be heavily modified to fit each benchmark), limited reproducibility (stateful tools and dynamic configurations cause inconsistent results), fragmentation (leaderboards and results are scattered across platforms), and poor discoverability (with new benchmarks appearing almost weekly, finding the right one is surprisingly hard).
Our vision is a unified, open space where the community defines the goalposts of agentic AI—through benchmarks that are standardized, reproducible, collaborative, and discoverable.
Through the AgentX–AgentBeats competition, we aim to bring the community together to create high-quality, broad-coverage, realistic agent evaluations—developed in an agentified, standardized, reproducible, and collaborative way—as shared public goods for advancing agentic AI.
$400 cloud credits to every individual or team
$50 inference credits to every individual or team
$100 cloud credits to every individual or team
$50 inference credits to every individual or team
$50 credits to every individual or team
Up to $50k prize pool in GCP/Gemini credits to be shared among the winning teams.
Up to $50k prize pool in inference credits to be shared among the winning teams.
OpenAI credits of $10,000, $5,000, and $1,000 will be awarded to the 1st, 2nd, and 3rd place winners in each of the two tracks: the Research Track and the Finance Agent Track.
Up to $10k prize pool in AWS credits to be shared among the winning teams.
Each winning team member who is currently a student will receive:
Up to $50k prize pool in GCP/Gemini credits to be shared among the winning teams.
Up to $50k prize pool in inference credits to be shared among the winning teams.
OpenAI credits of $10,000, $5,000, and $1,000 will be awarded to the 1st, 2nd, and 3rd place winners in each of the two tracks: the Research Track and the Finance Agent Track.
Up to $10k prize pool in AWS credits to be shared among the winning teams.
Each winning team member who is currently a student will receive:
Each winning team will receive up to 2 complimentary tickets to the Agentic AI Summit 2026 (August 1–2 at UC Berkeley).
Hugging Face credits of $5,000, $3,000, and $2,000 will be awarded to the 1st, 2nd, and 3rd place winners in the custom track—the OpenEnv Challenge.
AgentBeats is an open-source platform built on the new Agentified Agent Assessment (AAA) paradigm: instead of adapting your agent to fit a rigid benchmark, the benchmark itself becomes an agent. A 🟢 green agent (evaluator) defines the tasks, environment, and scoring; a 🟣 purple agent (competitor) is the AI agent under test. They communicate via the A2A protocol, so you build your agent once and it works with any benchmark on the platform.
Want a detailed walkthrough? Watch the competition intro video or skim the slides.
Oct 16, 2025 to Jan 31, 2026
Participants build green agents that define assessments and automate scoring. Pick your evaluation track:
March 2 to May 24, 2026
We're excited to announce that Phase 2 of the AgentX–AgentBeats competition will officially launch on March 2, 2026. Participants will build purple agents to tackle the select top green agents from Phase 1 and compete on the public leaderboards. Unlike Phase 1, where participants competed across all tracks throughout the entire duration, Phase 2 introduces a sprint-based format. The competition will be organized into four rotating sprints:
The 4th Sprint is the grand finale of AgentX-AgentBeats, focused on general-purpose agents. While earlier sprints emphasized depth within specific tracks, Sprint 4 emphasizes breadth: strong, consistent performance across many green agents, benchmarks, and evaluation categories. Sprint 4 includes all green agents from the first three Phase 2 sprints, plus additional selected benchmarks introduced for this final sprint:
To be eligible for Sprint 4 judging, a team must evaluate its purple agent on at least 5 green agents spanning at least 3 distinct categories. Teams are strongly encouraged to go beyond this minimum; broader coverage across more green agents and more categories will be viewed favorably.
Judging will reward purple agents that demonstrate strong cross-benchmark performance, category diversity, generality, cost efficiency, and technical quality. A strong Sprint 4 submission should show that the same purple-agent architecture can adapt across substantially different task types without benchmark-specific hardcoding or special-case lookup tables.
A red-teaming and automated security testing challenge.
Learn more about the challenge—full details and guidelines are available here.
Phase 2: Feb 23, 2026 – March 30, 2026
Learn more about the τ²-Bench Challenge—full details and guidelines are available here.
Deadline: March 30, 2026
SOTA Environments to drive general intelligence.
Learn more about the challenge—full details and guidelines are available here.
Deadline: April 21, 2026
| Date | Event |
|---|---|
| Oct 16, 2025 | Participant registration open |
| Oct 24, 2025 | Team signup & Build Phase 1 |
| Jan 31, 2026 | Green agent submission |
| Feb 1, 2026 | Green agent judging |
| March 2, 2026 | Phase 2: Build purple agents |
| May 24, 2026 | End of Phase 2 |