CS 194/294-267 Understanding Large Language Models: Foundations and Safety

Videos: YouTube videos of many of the lectures are available here.

The syllabus contains links to the video of each lecture, if available.

Prerequisite: Prospective students should have taken CS 182/282A Deep Neural Networks or its equivalent(s) and had some hands-on experience with deep learning.

Course Staff

Instructor	(Guest) Co-instructor
Dawn Song	Dan Hendrycks
Professor, UC Berkeley	Director, Center for AI Safety

GSI: Yu Gai

edstem

Class Time and Location

Lecture: 3:30-5pm PT Tuesday at Soda 306

First lecture rescheduled to Jan 19 noon-1:30pm at Soda 306

Course Description

Generative AI and Large Language Models (LLMs), including ChatGPT, have ushered the world into a new era with rich new capabilities for wide-ranging application domains. At the same time, there is little understanding of how these new capabilities emerge, or of their limitations and potential risks. In this class, we will introduce the foundations of LLMs, study methods for better understanding LLMs, discuss scaling laws and emergence, and explore the risks and challenges of these technologies and how we can build towards safe and beneficial AI. In particular, this class will cover a wide range of topics, including:

Foundations of LLMs
Interpretability
Scaling laws
Adversarial robustness
AI alignment and governance
Trojans and unlearning
Privacy and watermarking
Agency and emergence
Reasoning and mathematics
Evaluation and benchmarking

Syllabus

Date	Topic	Readings (forms due at 2pm before the day of the lecture)
Jan 19	Intro (slides) & transformer, LLM foundations (slides) Guest speaker: Łukasz Kaiser (Member of Technical Staff, OpenAI)	- Attention Is All You Need - Chain-of-Thought Prompting - (Optional) Neural GPUs - (Optional) GPT becoming a Turing machine Fill this out after reading the papers!
Jan 23	AI safety primer (slides) and representation engineering (slides) (video) Guest speaker: Dan Hendrycks (Director, Center for AI Safety)	- Unsolved Problems in ML Safety - Representation Engineering - (Optional) Catastrophic AI Risks Fill this out after reading the papers!
Jan 30	Interpretability and model editing (slides) (video) Guest speaker: David Bau (Assistant Professor, Northeastern University)	- Locating and Editing Factual Associations in GPT - Function Vectors in LLMs - (Optional) Visualizing and Understanding CNNs - (Optional) Linear Classifier Probes - (Optional) In-context Learning and Induction Heads Fill this out after reading the papers!
Feb 6	Inside-out interpretability: training dynamics in multi-layer transformer (slides) (video) Guest speaker: Yuandong Tian (Research Scientist and Senior Manager, Meta AI Research)	- Scan and Snap - JoMA - (Optional) StreamingLLM - (Optional) Deja Vu - (Optional) H₂O Fill this out after reading the papers!
Feb 13	Models within models: how do LLMs represent the world? (slides) (video) Guest speaker: Martin Wattenberg (Professor, Harvard University)	- Emergent World Representations - The System Model and the User Model - (Optional) Probing Classifiers - (Optional) Climbing towards NLU - (Optional) Can LMs Encode Perceptual Structure without Grounding? Fill this out after reading the papers!
Feb 20	Benchmarks and evals, safety vs. capabilities, machine ethics (slides) (video) Guest speakers: Dan Hendrycks (Director, Center for AI Safety) and Bo Li (Associate Professor, University of Chicago)	- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models - Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark Fill this out after reading the papers!
Feb 27	Memorization in language models (slides) (video) Guest speaker: Eric Wallace (UC Berkeley)	- (Optional) Extracting Training Data from Large Language Models - (Optional) Scalable Extraction of Training Data from (Production) Language Models Fill this out for bonus points!
Mar 5	Watermarking and AI safety Guest speaker: Boaz Barak (Professor, Harvard University)	- (Optional) Watermarking in the sand - (Optional) Thoughts on AI safety Fill this out for bonus points!
Mar 12	Using AI to understand AI (slides) (video) Guest speaker: Jacob Steinhardt (Assistant Professor, UC Berkeley)	- Interpreting CLIP’s Image Representation via Text-Based Decomposition - Mass-Producing Failures of Multimodal Systems with Language Models - (Optional) Describing Differences between Text Distributions with Natural Language Fill this out for bonus points!
Mar 19	The security of LLMs (slides) (video) Guest speaker: Nicholas Carlini (Research Scientist, Google DeepMind)	- Universal and Transferable Adversarial Attacks on Aligned Language Models - (Optional) Poisoning Web-Scale Training Datasets is Practical - (Optional) Stealing Part of a Production Language Model Fill this out after reading the papers!
Mar 26	Spring recess
Apr 2	No lecture
Apr 9	Towards safer AI through mechanistic interpretability and formal verification Guest speaker: Max Tegmark (Professor, MIT)	- Opening the AI black box: program synthesis via mechanistic interpretability Fill this out after reading the papers!
Apr 16	Deep Learning for Mathematical Reasoning (slides) Guest speaker: Christian Szegedy (Co-founder, xAI)	- Autoformalization with Large Language Models - (Optional) Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs - (Optional) Magnushammer: A Transformer-Based Approach to Premise Selection Fill this out after reading the papers!
Apr 23	Project presentation

Enrollment and Grading

Prospective students should first try enrolling in the course through CalCentral. The class number of CS 194-267 (for undergraduate students) is 34188 and the class number of CS 294-267 (for graduate students) is 34187. Please join the waitlist if the course is full. Please fill in the enrollment petition form if you are on or cannot join the waitlist. We will reach out to you if your petition is approved.

This is a variable-unit course. All enrolled students are expected to attend lectures in person and to submit reading summaries and questions for Q&A before each lecture. Students enrolling in one unit are expected to write an article that summarizes one of the lectures. Students enrolling in more than one unit are expected to complete a lab assignment and a project. For students enrolling in 2 units, the project should include a written report, which can be a survey of an area relevant to LLMs. For students enrolling in 3 units, the project should also include an implementation (coding) component that programmatically interacts with LLMs. For students enrolling in 4 units, the project should include a significant implementation component with the potential for real-world impact or intellectual contribution. The grade breakdown by number of units is as follows:

Component	1 unit	2 units	3/4 units
Lecture participation	25%	10%	10%
Reading summaries & questions for Q&A	25%	10%	10%
Article	50%
Lab		20%	10%
Project
- Proposal		10%	10%
- Milestone		10%	10%
- Presentation		20%	15%
- Report		20%	15%
- Implementation			20%

Lab and Project Timeline

Events	Released	Due
Project group formation	Jan 19	Jan 30
Project proposal	Jan 23	Feb 13
Lab	Jan 30	Feb 27
Project milestone	Feb 13	Mar 19
Project presentation	Mar 19	Apr 23
Project final report	Mar 19	Apr 30

Office Hours

Yu Gai