Prospective Students
- Students interested in the course should first try enrolling through CalCentral. The class number for CS194-196 is 32306, and the class number for CS294-196 is 32304. Please join the waitlist if the class is full.
- We plan to expand the class size to allow more students to join. Please fill out the signup form if you are on the waitlist or can’t get added to the waitlist. You will receive an email notification around the beginning of the fall semester if you are admitted.
- Do not email course staff or TAs. Please use Edstem for all questions. For private matters, post a private question on Edstem and make sure it is visible to all teaching staff.
Course Staff
| Instructor | (Guest) Co-instructor |
| --- | --- |
| Dawn Song | Xinyun Chen |
| Professor, UC Berkeley | Research Scientist, Google DeepMind |
GSIs: Alex Pan & Sehoon Kim
Readers: Tara Pande & Ashwin Dara
Class Time and Location
Lecture: Mondays, 3-5pm PT, in Latimer 120
Course Description
Large language models (LLMs) have revolutionized a wide range of domains. In particular, LLMs are increasingly being developed as agents that interact with the world and handle a variety of tasks. With the continuous advancement of LLM techniques, LLM agents are poised to be the next major breakthrough in AI, transforming our daily lives through intelligent task automation and personalization. In this course, we will first discuss fundamental concepts essential for LLM agents, including the foundations of LLMs, the core LLM abilities required for task automation, and the infrastructure for agent development. We will then cover representative agent applications, including code generation, robotics, web automation, medical applications, and scientific discovery. We will also discuss the limitations and potential risks of current LLM agents, and share insights into directions for further improvement. Specifically, this course will include the following topics:
- Foundations of LLMs
- Reasoning
- Planning, tool use
- LLM agent infrastructure
- Retrieval-augmented generation
- Code generation, data science
- Multimodal agents, robotics
- Evaluation and benchmarking on agent applications
- Privacy, safety and ethics
- Human-agent interaction, personalization, alignment
- Multi-agent collaboration
Syllabus
| Date | Guest Lecture | Readings (due Sunday 11:59pm before lecture on Gradescope) |
| --- | --- | --- |
| Sept 9 | LLM Reasoning. Denny Zhou, Google DeepMind. Intro Slides, Original Recording, Edited Video | Chain-of-Thought Reasoning Without Prompting; Large Language Models Cannot Self-Correct Reasoning Yet; Premise Order Matters in Reasoning with Large Language Models; Chain-of-Thought Empowers Transformers to Solve Inherently Serial Problems. All readings are optional this week. |
| Sept 16 | LLM agents: brief history and overview. Shunyu Yao, OpenAI. Slides, Original Recording, Edited Video | WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents; ReAct: Synergizing Reasoning and Acting in Language Models |
| Sept 23 | Agentic AI Frameworks & AutoGen (Chi Wang, AutoGen-AI); Building a Multimodal Knowledge Assistant (Jerry Liu, LlamaIndex). Chi’s Slides, Jerry’s Slides, Original Recording, Edited Video | AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation; StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows |
| Sept 30 | Enterprise trends for generative AI, and key components of building successful agents/applications. Burak Gokturk, Google. Slides, Original Recording, Edited Video | Google Cloud expands grounding capabilities on Vertex AI; The Needle In a Haystack Test: Evaluating the performance of RAG systems; The AI detective: The Needle in a Haystack test and how Gemini 1.5 Pro solves it |
| Oct 7 | Compound AI Systems & the DSPy Framework. Omar Khattab, Databricks. Slides, Original Recording, Edited Video | Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs; Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together |
| Oct 14 | Agents for Software Development. Graham Neubig, Carnegie Mellon University. Slides, Original Recording, Edited Video | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering; OpenHands: An Open Platform for AI Software Developers as Generalist Agents |
| Oct 21 | AI Agents for Enterprise Workflows. Nicolas Chapados, ServiceNow. Slides, Original Recording, Edited Video | WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?; WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks; TapeAgents: a Holistic Framework for Agent Development and Optimization |
| Oct 28 | Towards a unified framework of Neural and Symbolic Decision Making. Yuandong Tian, Meta AI (FAIR). Slides, Original Recording, Edited Video | Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping; Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces; Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets; SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems |
| Nov 4 | Project GR00T: A Blueprint for Generalist Robotics. Jim Fan, NVIDIA. Slides, Original Recording, Edited Video | Voyager: An Open-Ended Embodied Agent with Large Language Models; Eureka: Human-Level Reward Design via Coding Large Language Models; DrEureka: Language Model Guided Sim-To-Real Transfer |
| Nov 11 | No Class (Veterans Day) | |
| Nov 18 | Open-Source and Science in the Era of Foundation Models. Percy Liang, Stanford University. Slides, Original Recording, Edited Video | Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models |
| Nov 25 | Measuring Agent capabilities and Anthropic’s RSP. Ben Mann, Anthropic. Slides, Original Recording, Edited Video | Announcing our updated Responsible Scaling Policy; Developing a computer use model |
| Dec 2 | Towards Building Safe & Trustworthy AI Agents and A Path for Science- and Evidence-based AI Policy. Dawn Song, UC Berkeley. Slides, Edited Video | A Path for Science- and Evidence-based AI Policy; DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models; Representation Engineering: A Top-Down Approach to AI Transparency; Extracting Training Data from Large Language Models; The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. All readings are optional this week. |
Enrollment and Grading
- Prerequisites: Students are strongly encouraged to have prior experience with and a basic understanding of machine learning and deep learning before taking this class, e.g., by having taken courses such as CS182, CS188, or CS189.
- Please fill out the signup form if you are on the waitlist or can’t get added to the waitlist.
- This is a variable-unit course.
- All enrolled students are expected to attend lectures in person and complete weekly reading summaries related to the course content.
- Students enrolled for one unit are expected to submit an article that summarizes one of the lectures.
- Students enrolled for more than one unit are expected to submit a lab assignment and a project instead of the article.
- For students enrolled for 2 units, the project should produce a written report, which can be a survey of an area related to LLMs.
- For students enrolled for 3 units, the project should also include an implementation (coding) component that programmatically interacts with LLMs; for students enrolled for 4 units, the project should include a very significant implementation component with the potential for real-world impact or intellectual contributions.
- The grade breakdown for students enrolled at each unit level is as follows:
| | 1 unit | 2 units | 3/4 units |
| --- | --- | --- | --- |
| Participation | 40% | 16% | 8% |
| Reading Summaries & Q/A | 10% | 4% | 2% |
| Quizzes | 10% | 4% | 2% |
| Article | 40% | | |
| Lab | | 16% | 8% |
| Project | | | |
| Proposal | | 10% | 10% |
| Milestone 1 | | 10% | 10% |
| Milestone 2 | | 10% | 10% |
| Presentation | | 15% | 15% |
| Report | | 15% | 15% |
| Implementation | | | 20% |
Lab and Project Timeline
| | Released | Due |
| --- | --- | --- |
| Project group formation | 9/9 | 9/16 |
| Project proposal | 9/22 | 9/30 |
| Labs | 10/1 | 10/15 |
| Project milestone #1 | 10/19 | 10/25 |
| Project milestone #2 | 10/29 | 11/20 |
| Project final presentation | 11/19 | 12/17 |
| Project final report | 11/19 | 12/17 |
Office Hours
- Alex: 5-6pm on Mondays on Zoom
- Sehoon: 10-11am on Tuesdays on Zoom