This LLM Agents Hackathon, hosted by Berkeley RDI in conjunction with the LLM Agents MOOC, aims to bring together students, researchers, and practitioners to build and showcase innovative work in LLM agents, grow the AI agent community, and advance LLM agent technology. It is open to the public and will be held both virtually and in person at UC Berkeley.


The hackathon has five tracks:

  • Applications Track: Building innovative LLM agent applications across diverse domains, from coding assistants to personal AI companions.
  • Benchmarks Track: Creating and improving benchmarks for AI agents, enabling standardized evaluation and comparison of different agent architectures and capabilities.
  • Fundamentals Track: Enhancing core agent capabilities such as memory, planning, reasoning, and tool use through novel frameworks and techniques.
  • Safety Track: Addressing critical safety concerns in AI agent deployment, including misuse prevention, privacy, interpretability, and broader societal impacts.
  • Decentralized and Multi-Agents Track: Advancing tools, frameworks, and applications for decentralized multi-agent systems, focusing on enhanced capabilities, interactions, and deployment.

We hope this hackathon and its specially designed tracks will help demonstrate that LLM agent technology is entering a new phase of maturity and practicality, in which:

  • Every developer can learn to use LLM agent technology for building innovative applications (Applications Track)
  • Decentralized community collaboration can effectively bring the community together to build key technologies and infrastructure for LLM agents, serving as important foundations and public goods for the AI community (Benchmarks, Fundamentals, Safety, and Multi-Agent Tracks)

For hackathon discussion, please join the Hackathon channel on the LLM Agents Discord. For more information and answers to frequently asked questions, please refer to our ongoing Hackathon FAQ.


PRIZES & RESOURCES

⭐ More than $200k in prizes and resources, with more to be announced!

PRIZES Winners will be selected for a total of $15,000 in gift cards for the Applications track.


RESOURCES Learn more and check out their openings.


⧉ Learn More Here

PRIZES Winners will be selected across all 5 tracks for prizes totaling $4.5k in Lambda (λ) credits.

RESOURCES A Llama API endpoint throughout the hackathon, plus GPU compute credits.


⧉ Learn More Here

PRIZES Winners will receive prizes totaling up to $32k in Intel Tiber Cloud credits.

RESOURCES Compute resources, including CPU and GPU.



⧉ Learn More Here

PRIZES Winners will be selected for up to $10,000 in gift cards for the Safety track.

RESOURCES Learn more about their work on AI safety.


⧉ Learn More Here

PRIZES Up to $20,000 in cash for winners of the Safety track (up to $10k/$6.5k/$3.5k for 1st/2nd/3rd place)!

RESOURCES Learn more about their grants.

⧉ Learn More Here

PRIZES Up to $6,000 in AWS cloud credits for winners.

RESOURCES Learn more and check out their openings, especially PhD internships in GenAI/LLMs.

⧉ Learn More Here

RESOURCES Learn more, watch for an info session, and check out their openings.

⧉ Learn More Here

RESOURCES Learn more and check out their openings.


⧉ Learn More Here

RESOURCES Learn more and check out their openings.


⧉ Learn More Here

SPECIAL RAFFLE All participants are eligible. Raffle winners will receive travel grants to attend the LLM Agents Summit in August 2025 in Berkeley, CA.


SHOWCASE OPPORTUNITY Hackathon winners will be invited to showcase their projects at the Summit.


HACKATHON TRACKS

Applications Track

Develop innovative LLM-based agents for various domains, including coding assistants, customer service, regulatory compliance, data science, AI scientists, and personal assistants. Focus on both hard-design problems (novel domain-specific tools) and soft-design problems (high-fidelity human simulations and improved AI agent interfaces).

Benchmarks Track

Create or improve AI agent benchmarks for novel tasks or extend existing ones. Focus on developing multi-modal or multi-agent benchmarks, improving evaluation methods, and creating more robust and efficient testing environments for AI agents.

Fundamentals Track

Enhance core agent capabilities in memory, planning, reasoning, tool-use, and multimodal interactions. Improve existing frameworks, design novel prompting schemes, and develop better methods for agents to interact with various tools and environments.

Safety Track

Address critical safety concerns in AI agent deployment, including preventing misuse, ensuring privacy, improving interpretability, and assessing broader societal impacts. Develop methods for better control, auditing, and accountability of AI agents in various applications and multi-agent systems.

Decentralized and Multi-Agents Track

Develop innovative tools, frameworks, and infrastructure to enhance multi-agent capabilities and decentralized deployment. Investigate how multiple agents interact with each other and how we can better leverage their capabilities. Design novel applications across domains, emphasizing decentralization benefits like robustness, scalability, and privacy.


TIMELINE (SUBJECT TO CHANGE)

It's not too late to join! Sign up today!

Date                        Event
October 21                  Begin Hackathon; Participant Sign Up Open (required)
October 28                  Team Sign Up Open (required)
November 20                 Mid-hackathon Progress Check-in DUE (optional)
November 25                 Credits & API Access Sign Up DUE (optional); Compute Resources Sign Up DUE (optional)
December 19, 11:59pm PST    Fill Out Project Submission Form DUE (required)
December 20                 Judging (12/20 9am - 1/7 11:59pm PST)
January 9                   Winners Announced at 9am PST

HACKATHON PROGRAM SCHEDULE

Date           Program Session
November 12    Your Compute to Win the LLM Agents MOOC Hackathon - 'Get Started' Demos with Lambda (Event Recording)
November 21    Building with Intel: Tiber AI Cloud and Intel Liftoff (Event Recording)
November 26    Workshop with Google AI: Building with Gemini for the LLM Agents MOOC Hackathon (Event Recording)
December 3     Info Session with Sierra (Event Recording)

JUDGING CRITERIA

Applications Track

Strong submissions will demonstrate novel use cases addressing real-world problems, with seamless integration of the LLM agent into the target domain and intuitive UI/UX. Projects should display strong potential for impact and widespread adoption.

Benchmarks Track

Strong submissions will provide comprehensive, standardized benchmarks with clear evaluation criteria for agent capabilities. Alternatively, submissions may expand and improve existing benchmarks by generating more high-quality data or curating more accurate examples. Projects should enable meaningful cross-agent comparisons and offer insights into efficiency, accuracy, and generalization.

Fundamentals Track

Strong submissions will aim to enhance current LLM agent capabilities (e.g., long-term memory, planning, function calling, tool use, or multi-step reasoning). Projects should contain innovative approaches to solving complex problems autonomously.

Safety Track

Strong submissions will thoroughly define and address high-impact safety risks, proposing effective solutions, frameworks, or protocols. Projects should showcase effectiveness through comprehensive testing, validation, and evaluation.

Decentralized and Multi-Agents Track

Strong submissions will address the limitations of existing frameworks, offering solutions for improved communication, multi-agent collaboration, and scalability. Projects should address the practical challenges of real-world deployment.


SUBMISSION

Projects are submitted through our Google Form. For more information, see the submission requirements. To ensure that your project is accepted, include the following items:

  • Video Presentation - Include a link to your video presentation, which should be no longer than 3 minutes. The video should walk through your project and include a recorded demo of how it works.
  • Presentation Slides - Include a link to your presentation slides in PDF format.
  • Project Code - Include a link to a GitHub repository with an informative README that explains how to run your project. Link externally to any large datasets.
  • Documentation - Include project information that describes your project's findings or how it works.

JUDGES

With more to be announced!

Dawn Song
UC Berkeley

Xinyun Chen
Google DeepMind

Burak Gokturk
Google

Chi Wang
Google DeepMind

Shunyu Yao
OpenAI

Yuandong Tian
Meta AI

Edwin Arbus
OpenAI

Rahul Unnikrishnan Nair
Intel

Chuan Li
Lambda

Henry Xiao
AMD

Will Lu
Orby AI

Elie Bursztein
Google DeepMind

Soheil Koushan
Anthropic

Shai Limonchik
Anthropic

Caiming Xiong
Salesforce

Jason Wu
Salesforce

Tim Weingarten
Adept AI Labs

Josh Albrecht
Imbue

Rumman Chowdhury
Humane Intelligence

Alok Tongaonkar
Palo Alto Networks

Anand Raghavan
Cisco

Pushkar Nandkar
SambaNova Systems

Chenxi Wang
Rain Capital

Corinne Riley
Greylock Partners

Rajko Radovanovic
a16z

Daniel Miessler
Unsupervised Learning

Julian Stastny
Center on Long-Term Risk

Henry Sleight
MATS



Join the mailing list to stay informed!