
Course Staff

Role                    Person
Instructor              Dawn Song, Professor, UC Berkeley
Guest Co-instructor     Dan Hendrycks, Director, Center for AI Safety

GSI: Yu Gai

Discussion forum: edstem

Class Time and Location

Lecture: Tuesdays, 3:30-5pm PT, Soda 306

The first lecture has been rescheduled to Jan 19, noon-1:30pm PT, at Soda 306.

Course Description

Generative AI and large language models (LLMs), including ChatGPT, have ushered the world into a new era, bringing rich new capabilities to wide-ranging application domains. At the same time, there is little understanding of how these capabilities emerge, what their limitations are, and what risks they pose. In this class, we will introduce the foundations of LLMs, study methods for better understanding LLMs, discuss scaling laws and emergence, and explore the risks and challenges of these technologies and how we can build towards safe and beneficial AI. In particular, this class will cover a wide range of topics, as outlined in the syllabus below.

Syllabus (subject to change)

Date / Topic / Readings
(reading forms are due at 2pm on the day before each lecture)
Jan 19 Intro (slides) & transformer, LLM foundations (slides)
Guest speaker: Łukasz Kaiser (Member of Technical Staff, OpenAI)
- Attention Is All You Need
- Chain-of-Thought Prompting
- (Optional) Neural GPUs
- (Optional) GPT becoming a Turing machine
Fill this out after reading the papers!
Jan 23 AI safety primer (slides) and representation engineering (slides)
Guest speaker: Dan Hendrycks (Director, Center for AI Safety)
- Unsolved Problems in ML Safety
- Representation Engineering
- (Optional) Catastrophic AI Risks
Fill this out after reading the papers!
Jan 30 Interpretability and model editing (slides)
Guest speaker: David Bau (Assistant Professor, Northeastern University)
- Locating and Editing Factual Associations in GPT
- Function Vectors in LLMs
- (Optional) Visualizing and Understanding CNNs
- (Optional) Linear Classifier Probes
- (Optional) In-context Learning and Induction Heads
Fill this out after reading the papers!
Feb 6 Inside-out interpretability: training dynamics in multi-layer transformers (slides)
Guest speaker: Yuandong Tian (Research Scientist and Senior Manager, Meta AI Research)
- Scan and Snap
- JoMA
- (Optional) StreamingLLM
- (Optional) Deja Vu
- (Optional) H2O
Fill this out after reading the papers!
Feb 13 Models within models: how do LLMs represent the world? (slides)
Guest speaker: Martin Wattenberg (Professor, Harvard University)
- Emergent World Representations
- The System Model and the User Model
- (Optional) Probing Classifiers
- (Optional) Climbing towards NLU
- (Optional) Can LMs Encode Perceptual Structure without Grounding?
Fill this out after reading the papers!
Feb 20 Benchmarks and evals, safety vs. capabilities, machine ethics (slides)
Guest speakers: Dan Hendrycks (Director, Center for AI Safety) and Bo Li (Associate Professor, University of Chicago)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Fill this out after reading the papers!
Feb 27 Memorization in language models (slides)
Guest speaker: Eric Wallace (UC Berkeley)
- (Optional) Extracting Training Data from Large Language Models
- (Optional) Scalable Extraction of Training Data from (Production) Language Models
Fill this out for bonus points!
Mar 5 Watermarking and AI safety
Guest speaker: Boaz Barak (Professor, Harvard University)
- (Optional) Watermarking in the sand
- (Optional) Thoughts on AI safety
Fill this out for bonus points!
Mar 12 Using AI to understand AI (slides)
Guest speaker: Jacob Steinhardt (Assistant Professor, UC Berkeley)
- Interpreting CLIP’s Image Representation via Text-Based Decomposition
- Mass-Producing Failures of Multimodal Systems with Language Models
- (Optional) Describing Differences between Text Distributions with Natural Language
Fill this out for bonus points!
Mar 19 The security of LLMs (slides)
Guest speaker: Nicholas Carlini (Research Scientist, Google DeepMind)
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- (Optional) Poisoning Web-Scale Training Datasets is Practical
- (Optional) Stealing Part of a Production Language Model
Fill this out after reading the papers!
Mar 26 Spring recess  
Apr 2 No lecture  
Apr 9 Towards safer AI through mechanistic interpretability and formal verification
Guest speaker: Max Tegmark (Professor, MIT)
- Opening the AI black box: program synthesis via mechanistic interpretability
Fill this out after reading the papers!
Apr 16 Deep Learning for Mathematical Reasoning (slides)
Guest speaker: Christian Szegedy (Co-founder, xAI)
- Autoformalization with Large Language Models
- (Optional) Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
- (Optional) Magnushammer: A Transformer-Based Approach to Premise Selection
Fill this out after reading the papers!
Apr 23 Project presentation  

Enrollment and Grading

Prospective students should first try enrolling in the course through CalCentral. The class number is 34188 for CS 194-267 (undergraduate students) and 34187 for CS 294-267 (graduate students). If the course is full, please join the waitlist. If you are on the waitlist or cannot join it, please fill in the enrollment petition form; we will reach out to you if your petition is approved.

This is a variable-unit course. All enrolled students are expected to attend lectures in person and to submit reading summaries and questions for Q&A before each lecture. Students enrolling for one unit are expected to write an article summarizing one of the lectures. Students enrolling for more than one unit are expected to complete a lab assignment and a project. For students enrolling for 2 units, the project should include a written report, which can be a survey of an area relevant to LLMs. For 3 units, the project should also include an implementation (coding) component that programmatically interacts with LLMs; for 4 units, it should include a significant implementation component with the potential for real-world impact or intellectual contribution. The grade breakdowns for each number of units are as follows:

Component                               1 unit   2 units   3/4 units
Lecture participation                   25%      10%       10%
Reading summaries & questions for Q&A   25%      10%       10%
Article                                 50%      -         -
Lab                                     -        20%       10%
Project
- Proposal                              -        10%       10%
- Milestone                             -        10%       10%
- Presentation                          -        20%       15%
- Report                                -        20%       15%
- Implementation                        -        -         20%

Lab and Project Timeline

Event                     Released   Due
Project group formation   Jan 19     Jan 30
Project proposal          Jan 23     Feb 13
Lab                       Jan 30     Feb 27
Project milestone         Feb 13     Mar 19
Project presentation      Mar 19     Apr 23
Project final report      Mar 19     Apr 30

Office Hours

Yu Gai