Program Task Description

Category 1: Preventing Accidental Misuse

How will hallucinations and bias in LLM agents differ from typical LLMs? What novel mitigations do we need to develop?
- Reduce the amount of wasteful or harmful API calls in tool-based LLM agents through either software-based or ML-based solutions
- [2305.15334] Gorilla: Large Language Model Connected with Massive APIs
What are some unintended consequences of deploying LLM agents?
- Demonstrate a novel failure mode of LLM agents
- [2402.06627] Feedback Loops With Language Models Drive In-Context Reward Hacking
- [2402.06664] LLM Agents can Autonomously Hack Websites

How do we attack and defend LLM agents? Will this paradigm differ from standard jailbreaking of LLMs?
- Create novel prompt injection attacks or demonstrate a novel defense for LLM agents
- [2311.01011] Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
- [2402.06363] StruQ: Defending Against Prompt Injection with Structured Queries
What privacy challenges will LLM agents face? How can we mitigate these?
- Design a novel differentially private training method to train agents on user trajectories while hiding PII
- [2401.05459] Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
- [2407.19354] The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
How do we prevent LLM agents from being directed to create bioweapons or cyberweapons?
- Analyze the difficult of creating a novel cyberweapon with an LLM
- Building an early warning system for LLM-aided biological threat creation | OpenAI
- [2403.03218] The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

How do we better evaluate and audit LLM agents?
- Develop methods to automatically red-team agents
- [2309.15817] Identifying the Risks of LM Agents with an LM-Emulated Sandbox
How do we monitor and hold LLM agents accountable for their actions?
- Create techniques that identify dangerous actions in AI agent trajectories
- [2401.13138] Visibility into AI Agents

What will be the environmental cost of agents?
- Design a project that estimates the carbon emissions cost of deploying an agent on a task
- [2111.00364] Sustainable AI: Environmental Implications, Challenges and Opportunities
Are agents fair? How might their deployment impact societal values or influence economic incentives? How should we govern them?
- Study how LLM agents may affect a particular sector of society, e.g., how will LLMs lead to content homogenization
- What happens when ChatGPT starts to feed on its own writing?
- [2407.14981] Open Problems in Technical AI Governance