CS294/194-280 Advanced Large Language Model Agents

Prospective Students

Students interested in the course should first try enrolling through CalCentral. The class number for CS194-280 is 33840. The class number for CS294-280 is 33841. Please join the waitlist if the class is full.
We plan to expand the class size to allow more students to join. Please fill in the petition form if you are on the waitlist or can’t get added to the waitlist. You will receive an email notification around the beginning of the spring semester if you are allowed in.
Do not email course staff or TAs. Please use Edstem for any questions. For private matters, post a private question on Edstem and make sure it is visible to all teaching staff.

Course Staff

Instructor	(Guest) Co-instructor	(Guest) Co-instructor
Dawn Song	Xinyun Chen	Kaiyu Yang
Professor, UC Berkeley	Research Scientist, Google DeepMind	Research Scientist, Meta FAIR

Teaching Staff: Alex Pan, Tara Pande, Ashwin Dara, Jason Yan

Class Time and Location

Lecture: 4-6pm PT Monday at Anthro/Art Building 160

Course Description

Large language model (LLM) agents have become an important frontier in AI; however, they still fall short on critical skills, such as complex reasoning and planning, that are needed to solve hard problems and enable end-to-end applications in real-world scenarios. Building on our previous course, this course dives deeper into advanced topics in LLM agents, focusing on reasoning, AI for mathematics, code generation, and program verification. We begin by introducing advanced inference and post-training techniques for building LLM agents that can search and plan. Then, we focus on two application domains: mathematics and programming. We study how LLMs can be used to prove mathematical theorems, as well as to generate and reason about computer programs. Specifically, we will cover the following topics:

Inference-time techniques for reasoning
Post-training methods for reasoning
Search and planning
Agentic workflow, tool use, and function calling
LLMs for code generation and verification
LLMs for mathematics: data curation, continual pretraining, and finetuning
LLM agents for theorem proving and autoformalization

Syllabus

Date	Guest Lecture (4:00PM-6:00PM PT)	Supplemental Readings
Jan 27th	Inference-Time Techniques for LLM Reasoning (Xinyun Chen, Google DeepMind). Recording, Intro, Slides	Large Language Models as Optimizers; Large Language Models Cannot Self-Correct Reasoning Yet; Teaching Large Language Models to Self-Debug. All readings are optional this week.
Feb 3rd	Learning to reason with LLMs (Jason Weston, Meta). Recording, Slides	Direct Preference Optimization: Your Language Model is Secretly a Reward Model; Iterative Reasoning Preference Optimization; Chain-of-Verification Reduces Hallucination in Large Language Models
Feb 10th	On Reasoning, Memory, and Planning of Language Agents (Yu Su, Ohio State University). Recording, Slides	Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization; HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models; Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Feb 17th	No Class - Presidents’ Day
Feb 24th	Open Training Recipes for Reasoning in Language Models (Hanna Hajishirzi, University of Washington). Recording, Slides	Tulu 3: Pushing Frontiers in Open Language Model Post-Training; Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback; OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Mar 3rd	Coding Agents and AI for Vulnerability Detection (Charles Sutton, Google DeepMind). Recording, Slides	Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities; From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Mar 10th	Multimodal Autonomous AI Agents (Ruslan Salakhutdinov, CMU/Meta). Recording, Slides	Mind2Web: Towards a Generalist Agent for the Web; WebArena: A Realistic Web Environment for Building Autonomous Agents; VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks; Tree Search for Language Model Agents
Mar 17th	Multimodal Agents – From Perception to Action (Caiming Xiong, Salesforce AI Research). Recording, Slides	OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments; AGUVIS: Unified Pure Vision Agents For Autonomous GUI Interaction
Mar 24th	No Class - Spring Recess
Mar 31st	AlphaProof: when reinforcement learning meets formal mathematics (Thomas Hubert, Google DeepMind), 10am-noon PT. Recording, Slides	AI achieves silver-medal standard solving International Mathematical Olympiad problems; Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm; The Future of Mathematics?; Building the Mathematical Library of the Future
Apr 7th	Language models for autoformalization and theorem proving (Kaiyu Yang, Meta FAIR). Recording, Slides	LeanDojo: Theorem Proving with Retrieval-Augmented Language Models; Autoformalization with Large Language Models; Autoformalizing Euclidean Geometry
Apr 14th	Advanced topics in theorem proving (Sean Welleck, CMU). Recording, Slides	Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs; miniCTX: Neural Theorem Proving with Long-Contexts; Lean-STaR: Learning to Interleave Thinking and Proving; ImProver: Agent-Based Automated Proof Optimization
Apr 21st	Abstraction and Discovery with Large Language Model Agents (Swarat Chaudhuri, UT Austin), 10am-noon PT. Recording, Slides	An In-Context Learning Agent for Formal Theorem-Proving; Symbolic Regression with a Learned Concept Library
Apr 28th	Towards building safe and secure agentic AI (Dawn Song, UC Berkeley). Recording, Slides	Privtrans: Automatically Partitioning Programs for Privilege Separation; DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks; AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases; Progent: Programmable Privilege Control for LLM Agents

Enrollment and Grading

Prerequisites: Students are strongly encouraged to have experience with, and a basic understanding of, machine learning and deep learning before taking this class, e.g., from courses such as CS182, CS188, and CS189.

Please fill out the petition form if you are on the waitlist or can’t get added to the waitlist.

This is a variable-unit course. All enrolled students are expected to participate in lectures in person and complete weekly reading summaries related to the course content. Students enrolling in one unit are expected to submit an article that summarizes one of the lectures. Students enrolling in more than one unit are expected to submit a lab assignment and a project instead of the article. For students enrolling in 2 units, the project should have a written report, which can be a survey in a certain area related to LLMs. For students enrolling in 3 or 4 units, projects will follow either an applications track or a research track:

Applications Track: Projects in this track focus on applied use cases of LLMs and do not necessarily need to contribute novel research. Students in this track will work in groups of 3-4. The project for 3-unit students should include an implementation (coding) component that programmatically interacts with LLMs, while 4-unit students must complete a more substantial implementation with the potential for real-world impact.
Research Track: Students in this track will conduct novel research under the supervision of postdocs and graduate students, with the goal of publishing in a workshop or conference. Research track projects must be completed in groups of 2-3, and students must apply to participate via a forthcoming Google form. The expectations for implementation and intellectual contributions will align with the project requirements for 3- and 4-unit students.

The grade breakdowns for students enrolled in different units are the following:

	1 unit	2 units	3/4 units
Participation	40%	16%	8%
Reading Summaries	10%	4%	2%
Quizzes	10%	4%	2%
Article	40%
Lab		16%	8%
Project
Proposal		10%	10%
Milestone		10%	10%
Poster Presentation		10%	10%
Presentation Recording		10%	5%
Report		20%	20%
Implementation			25%
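
As a sanity check on the table above, the following is a minimal Python sketch that confirms each column's weights sum to 100% and shows how a weighted total could be computed from them. The names `WEIGHTS` and `weighted_total` and the idea of scoring each component as a fraction between 0 and 1 are illustrative assumptions, not part of the official course materials; the actual grade computation is determined by the course staff.

```python
# Weights (in percent) transcribed from the grade breakdown table.
# The dictionary layout and function name are hypothetical, for illustration only.
WEIGHTS = {
    "1 unit": {
        "Participation": 40, "Reading Summaries": 10, "Quizzes": 10, "Article": 40,
    },
    "2 units": {
        "Participation": 16, "Reading Summaries": 4, "Quizzes": 4, "Lab": 16,
        "Proposal": 10, "Milestone": 10, "Poster Presentation": 10,
        "Presentation Recording": 10, "Report": 20,
    },
    "3/4 units": {
        "Participation": 8, "Reading Summaries": 2, "Quizzes": 2, "Lab": 8,
        "Proposal": 10, "Milestone": 10, "Poster Presentation": 10,
        "Presentation Recording": 5, "Report": 20, "Implementation": 25,
    },
}


def weighted_total(track, scores):
    """Return the weighted total (0-100) for one unit option.

    `scores` maps a component name to the fraction earned in [0, 1];
    components missing from `scores` are counted as 0 (an assumption).
    """
    weights = WEIGHTS[track]
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


# Each column of the table should account for exactly 100%.
for track, weights in WEIGHTS.items():
    assert sum(weights.values()) == 100, track

# Example: a hypothetical 2-unit student with full marks on every component.
full_marks = {name: 1.0 for name in WEIGHTS["2 units"]}
print(weighted_total("2 units", full_marks))
```

The per-column check is the useful part: all three unit options distribute exactly 100%, with the project components (proposal through implementation) making up 60% of the 2-unit grade and 80% of the 3/4-unit grade.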

Lab and Project Timeline

	Released	Due
Project group formation	1/27	2/24
Project proposal	2/3	2/24
Project milestone	2/24	3/31
Lab	3/31	4/28
Project final poster presentation	4/28	5/5
Project final presentation recording	4/28	5/16
Project final report	4/28	5/16

Office Hours

Alex: 6-7pm on Mondays on Zoom