Program Task Description
Category 1: Memory, Planning, Reasoning
- Can you create agent frameworks that improve on existing fundamental agent benchmarks, such as ALFWorld or HotpotQA?
- Design a prompting scheme that improves an agent’s ability to recover from errors during deployment (see the sketch after this list)
- [2203.14465] STaR: Bootstrapping Reasoning With Reasoning
- [2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models
- [2211.01910] Large Language Models Are Human-Level Prompt Engineers
- [2303.11366] Reflexion: Language Agents with Verbal Reinforcement Learning
- [2303.17651] Self-Refine: Iterative Refinement with Self-Feedback
- [2304.05128] Teaching Large Language Models to Self-Debug
- [2305.10601] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- [2309.03409] Large Language Models as Optimizers
- [2312.07540] diff History for Neural Language Agents
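
For the error-recovery task above, here is a minimal sketch of a Reflexion-style retry loop wrapped around a ReAct-style episode. Everything in it is illustrative rather than taken from the papers: `LLM` stands for any text-in/text-out completion function, `execute` is a stub environment step (you would swap in ALFWorld or HotpotQA tool calls), and the `FINISH` stopping convention is an assumption of this sketch.

```python
from typing import Callable

# Hypothetical interface (not from the papers above): any function that
# maps a prompt string to a completion string.
LLM = Callable[[str], str]

def execute(action: str) -> str:
    """Stub environment step; replace with ALFWorld/HotpotQA tool calls."""
    return f"(observation for: {action})"

REFLECT_PROMPT = (
    "The previous attempt at the task failed. Trajectory:\n{trajectory}\n"
    "In two sentences, explain what went wrong and how to avoid it next time."
)

def run_episode(llm: LLM, task: str, reflections: list[str],
                max_steps: int = 10) -> tuple[bool, str]:
    """One ReAct-style attempt: interleave thoughts/actions with observations."""
    memory = "".join(f"Reflection: {r}\n" for r in reflections)
    lines = [f"{memory}Task: {task}"]
    for _ in range(max_steps):
        step = llm("\n".join(lines) + "\nThought and Action:")
        lines.append(step)
        if "FINISH" in step:  # sketch convention: model emits FINISH when done
            return True, "\n".join(lines)
        lines.append(f"Observation: {execute(step)}")
    return False, "\n".join(lines)

def reflexion_loop(llm: LLM, task: str, max_trials: int = 3) -> bool:
    """Retry the task, accumulating verbal reflections after each failure."""
    reflections: list[str] = []
    for _ in range(max_trials):
        success, trajectory = run_episode(llm, task, reflections)
        if success:
            return True
        reflections.append(llm(REFLECT_PROMPT.format(trajectory=trajectory)))
    return False
```

The design choice to study here is what goes into `REFLECT_PROMPT` and how reflections are carried into the next attempt; Reflexion and Self-Refine differ mainly in where that feedback comes from and how it is injected.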
Category 2: Tool-Use, Function-Calling, RAG
- How should LLMs best interact with tools such as retrieval and code?
- Improve upon the ideas in Gorilla through prompting (see the sketch after this list)
- [2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools
- [2302.07842] Augmented Language Models: a Survey
- [2304.05376] ChemCrow: Augmenting large-language models with chemistry tools
- [2303.17580] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- [2305.17126] Large Language Models as Tool Makers
- [2305.06983] Active Retrieval Augmented Generation
- AutoGPT
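
For the Gorilla-style task above, here is a minimal sketch of retrieval-then-prompt tool calling. The API store, the keyword-overlap retriever, and the JSON output format are all assumptions made for illustration; Gorilla itself retrieves real model-hub documentation, and a dense retriever (cf. Active Retrieval Augmented Generation above) would replace the toy scorer.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical text-in/text-out LLM wrapper

# Toy API store for the sketch; a real system would index hub documentation.
API_DOCS = [
    "translate(text: str, target_lang: str) -> str  # translate text",
    "web_search(query: str, k: int = 5) -> list[str]  # search the web",
    "run_python(code: str) -> str  # execute a code snippet",
]

def retrieve_docs(request: str, k: int = 2) -> list[str]:
    """Stand-in retriever using keyword overlap instead of embeddings."""
    words = set(request.lower().split())
    return sorted(API_DOCS,
                  key=lambda d: -len(words & set(d.lower().split())))[:k]

def prompt_for_call(llm: LLM, request: str) -> dict:
    """Retrieve candidate API docs, then ask the model for one JSON call."""
    docs = "\n".join(retrieve_docs(request))
    prompt = (
        "You may call exactly one of these APIs:\n"
        f"{docs}\n"
        f"User request: {request}\n"
        'Answer with only JSON: {"name": "...", "arguments": {...}}'
    )
    return json.loads(llm(prompt))
```

A sturdier version would validate the returned JSON against the chosen API's signature and re-prompt on parse failure; measuring how much that validate-and-retry loop helps is itself a reasonable project.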
Category 3: Multimodal and Interactive Agents
- How should agents operate in the world and with users?
- Design an interactive LLM agent that improves upon the GAIA benchmark or that helps users “debug” real-world problems, such as fixing a tire (see the sketch after this list)
- [2303.08128] ViperGPT: Visual Inference via Python Execution for Reasoning
- [2303.11381] MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
- [2304.10592] MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- [2404.14394] A Multimodal Automated Interpretability Agent
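
For the interactive “debugging” task above, here is a minimal sketch of one turn of an MM-ReAct-style loop. The `describe_image` vision tool, the `Session` transcript, and the ask-one-question-or-give-one-step policy are all assumptions of this sketch, not APIs from the papers above.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # hypothetical chat-style completion function

def describe_image(path: str) -> str:
    """Stub vision tool; replace with a captioning or VQA model."""
    return f"(caption for {path})"

@dataclass
class Session:
    """Running transcript of an interactive debugging session."""
    history: list[str] = field(default_factory=list)

def turn(llm: LLM, session: Session, text: str, image: str | None = None) -> str:
    """One turn: fold any image into the text context via a vision tool,
    then either ask one clarifying question or give the next repair step."""
    if image is not None:
        text += f"\n[Image: {describe_image(image)}]"
    session.history.append(f"User: {text}")
    prompt = (
        "You are walking a user through fixing a real-world problem.\n"
        "If you lack information, ask one specific question (possibly for\n"
        "a photo); otherwise state the single next step.\n"
        + "\n".join(session.history) + "\nAssistant:"
    )
    reply = llm(prompt)
    session.history.append(f"Assistant: {reply}")
    return reply
```

The interesting design space is the tool-routing step: MM-REACT and ViperGPT differ in whether the model narrates tool use in text or emits executable code, and either choice could be evaluated against GAIA.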