This LLM Agents Hackathon, hosted by Berkeley RDI in conjunction with the LLM Agents MOOC, aims to bring together students, researchers, and practitioners to build and showcase innovative work in LLM agents, grow the AI agent community, and advance LLM agent technology. It is open to the public and will be held both virtually and in person at UC Berkeley.
The hackathon is organized into 5 tracks:
- Applications Track: Building innovative LLM agent applications across diverse domains, from coding assistants to personal AI companions.
- Benchmarks Track: Creating and improving benchmarks for AI agents, enabling standardized evaluation and comparison of different agent architectures and capabilities.
- Fundamentals Track: Enhancing core agent capabilities such as memory, planning, reasoning, and tool use through novel frameworks and techniques.
- Safety Track: Addressing critical safety concerns in AI agent deployment, including misuse prevention, privacy, interpretability, and broader societal impacts.
- Decentralized and Multi-Agents Track: Advancing tools, frameworks, and applications for decentralized multi-agent systems, focusing on enhanced capabilities, interactions, and deployment.
We hope this hackathon, with its specially designed tracks, can help demonstrate that LLM agent technology is entering a new phase of maturity and practicality, where:
- Every developer can learn to use LLM agent technology for building innovative applications (Applications Track)
- Decentralized, community-driven collaboration can effectively build key technologies and infrastructure for LLM agents, serving as important foundations and public goods for the AI community (Benchmarks, Fundamentals, Safety, and Multi-Agent Tracks)
PRIZES & RESOURCES
⭐ More than $200k in prizes and resources! With more to be announced...
- PRIZES: 3 winners! $25k, $10k, and $5k in OpenAI credits for 1st, 2nd, and 3rd place. RESOURCES: Access and credits are available for hackathon teams.
- PRIZES: Winners receive prizes totaling $25k in Google Cloud credits. RESOURCES: Access and credits are available for hackathon teams.
- PRIZES: Winners will be selected for a total of $15,000 in gift cards for the Applications track. RESOURCES: Learn more and check out their openings.
- PRIZES: Winners selected across all 5 tracks for prizes totaling $4.5k in Lambda credits. RESOURCES: A Llama API endpoint throughout the hackathon and GPU compute credits.
- PRIZES: Winners receive prizes totaling up to $32k in Intel Tiber Cloud credits. RESOURCES: Compute resources, including CPU and GPU.
- PRIZES: Winners will be selected for up to $10,000 in gift cards for the Safety track. RESOURCES: Learn more about their work on AI safety.
- PRIZES: Up to $20,000 in cash for winners of the Safety track (up to $10k/$6.5k/$3.5k for 1st/2nd/3rd place). RESOURCES: Learn more about their grants.
- PRIZES: Up to $6,000 in AWS cloud credits for winners. RESOURCES: Learn more and check out their openings, especially PhD internships in GenAI/LLMs.
- RESOURCES: Learn more, watch for an info session, and check out their openings.
- RESOURCES: Learn more and check out their openings.
- RESOURCES: Learn more and check out their openings.
SPECIAL RAFFLE
All participants are eligible. Raffle winners receive travel grants for the LLM Agents Summit in August 2025 in Berkeley, CA.
SHOWCASE OPPORTUNITY
Hackathon winners will be invited to showcase their projects at the Summit.
HACKATHON TRACKS
Applications Track
Develop innovative LLM-based agents for various domains, including coding assistants, customer service, regulatory compliance, data science, AI scientists, and personal assistants. Focus on both hard-design problems (novel domain-specific tools) and soft-design problems (high-fidelity human simulations and improved AI agent interfaces).
Benchmarks Track
Create or improve AI agent benchmarks for novel tasks or extend existing ones. Focus on developing multi-modal or multi-agent benchmarks, improving evaluation methods, and creating more robust and efficient testing environments for AI agents.
Fundamentals Track
Enhance core agent capabilities in memory, planning, reasoning, tool-use, and multimodal interactions. Improve existing frameworks, design novel prompting schemes, and develop better methods for agents to interact with various tools and environments.
Safety Track
Address critical safety concerns in AI agent deployment, including preventing misuse, ensuring privacy, improving interpretability, and assessing broader societal impacts. Develop methods for better control, auditing, and accountability of AI agents in various applications and multi-agent systems.
Decentralized and Multi-Agents Track
Develop innovative tools, frameworks, and infrastructure to enhance multi-agent capabilities and decentralized deployment. Investigate how multiple agents interact with each other and how we can better leverage their capabilities. Design novel applications across domains, emphasizing decentralization benefits like robustness, scalability, and privacy.
TIMELINE (SUBJECT TO CHANGE)
It's not too late to join! Sign up today!
| Date | Event |
| --- | --- |
| October 21 | Begin Hackathon; Participant Sign Up Open (required) |
| October 28 | Team Sign Up Open (required) |
| November 20 | Mid-hackathon Progress Check-in DUE (optional) |
| November 25 | Credits & API Access Sign Up DUE (optional); Compute Resources Sign Up DUE (optional) |
| December 19, 11:59pm PST | Project Submission Form DUE (required) |
| December 20 | Judging (12/20 9am - 1/7 11:59pm PST) |
| January 9 | Winners Announced at 9am PST |
HACKATHON PROGRAM SCHEDULE
| Date | Program Session |
| --- | --- |
| November 12 | Your Compute to Win the LLM Agents MOOC Hackathon - 'Get Started' Demos with Lambda (Event Recording) |
| November 21 | Building with Intel: Tiber AI Cloud and Intel Liftoff (Event Recording) |
| November 26 | Workshop with Google AI: Building with Gemini for the LLM Agents MOOC Hackathon (Event Recording) |
| December 3 | Info Session with Sierra (Event Recording) |
JUDGING CRITERIA
Applications Track
Strong submissions will demonstrate novel use cases addressing real-world problems, with seamless integration of the LLM agent into the target domain and intuitive UI/UX. Projects should display strong potential for impact and widespread adoption.
Benchmarks Track
Strong submissions will provide comprehensive, standardized benchmarks with clear evaluation criteria for agent capabilities. Another option is to expand and improve existing benchmarks by generating more high-quality data or curating more accurate examples. Projects should enable meaningful cross-agent comparisons and offer insights into efficiency, accuracy, and generalization.
Fundamentals Track
Strong submissions will aim to enhance current LLM agent capabilities (e.g., long-term memory, planning, function calling, tool use, or multi-step reasoning). Projects should contain innovative approaches to solving complex problems autonomously.
Safety Track
Strong submissions will thoroughly define and address high-impact safety risks, proposing effective solutions, frameworks, or protocols. Projects should showcase effectiveness through comprehensive testing, validation, and evaluation.
Decentralized and Multi-Agents Track
Strong submissions will identify limitations of existing frameworks and offer solutions for improved communication, multi-agent collaboration, and scalability. Projects should address the practical challenges of real-world deployment.
SUBMISSION
Project submission will happen through our Google Form. For more information, see the submission requirements. Include the following items to ensure that your project is accepted:
- Video Presentation - Include a link to your video presentation. It should be no more than 3 minutes. Your video should contain a presentation walking through your project and a recorded demo of how your project works.
- Presentation Slides - Include a link to your presentation slides in PDF format.
- Project Code - Include a GitHub repository with an informative README with steps on how to run your project. Link any large datasets externally.
- Documentation - Include project information that describes the findings of your project or how it works.
JUDGES
- Dawn Song, UC Berkeley
- Xinyun Chen, Google DeepMind
- Burak Gokturk, Google
- Chi Wang, Google DeepMind
- Shunyu Yao, OpenAI
- Yuandong Tian, Meta AI
- Edwin Arbus, OpenAI
- Rahul Unnikrishnan Nair, Intel
- Chuan Li, Lambda
- Henry Xiao, AMD
- Will Lu, Orby AI
- Elie Bursztein, Google DeepMind
- Soheil Koushan, Anthropic
- Shai Limonchik, Anthropic
- Caiming Xiong, Salesforce
- Jason Wu, Salesforce
- Tim Weingarten, Adept AI Labs
- Josh Albrecht, Imbue
- Rumman Chowdhury, Humane Intelligence
- Alok Tongaonkar, Palo Alto Networks
- Anand Raghavan, Cisco
- Pushkar Nandkar, SambaNova Systems
- Chenxi Wang, Rain Capital
- Corinne Riley, Greylock Partners
- Rajko Radovanovic, a16z
- Daniel Miessler, Unsupervised Learning
- Julian Stastny, Center on Long Term Risk
- Henry Sleight, MATS