Program Task Description
Category 1: Memory, Planning, Reasoning
- Can you create agent frameworks that improve on existing fundamental agent benchmarks, such as ALFWorld or HotpotQA?
- Design a prompting scheme that improves an agent’s ability to recover from errors during deployment (see the sketch after this list)
- [2203.14465] STaR: Bootstrapping Reasoning With Reasoning
- [2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models
- [2211.01910] Large Language Models Are Human-Level Prompt Engineers
- [2303.11366] Reflexion: Language Agents with Verbal Reinforcement Learning
- [2303.17651] Self-Refine: Iterative Refinement with Self-Feedback
- [2304.05128] Teaching Large Language Models to Self-Debug
- [2305.10601] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- [2309.03409] Large Language Models as Optimizers
- [2312.07540] diff History for Neural Language Agents
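
For the error-recovery task above, here is a minimal sketch of a Reflexion-style retry loop wrapped around a ReAct-style episode. Everything in it is illustrative rather than taken from the papers: `LLM` stands for any text-in/text-out completion function, `execute` is a stub environment step (you would swap in ALFWorld or HotpotQA tool calls), and the `FINISH` stopping convention is an assumption of this sketch.

```python
from typing import Callable

# Hypothetical interface (not from the papers above): any function that
# maps a prompt string to a completion string.
LLM = Callable[[str], str]

def execute(action: str) -> str:
    """Stub environment step; replace with ALFWorld/HotpotQA tool calls."""
    return f"(observation for: {action})"

REFLECT_PROMPT = (
    "The previous attempt at the task failed. Trajectory:\n{trajectory}\n"
    "In two sentences, explain what went wrong and how to avoid it next time."
)

def run_episode(llm: LLM, task: str, reflections: list[str],
                max_steps: int = 10) -> tuple[bool, str]:
    """One ReAct-style attempt: interleave thoughts/actions with observations."""
    memory = "".join(f"Reflection: {r}\n" for r in reflections)
    lines = [f"{memory}Task: {task}"]
    for _ in range(max_steps):
        step = llm("\n".join(lines) + "\nThought and Action:")
        lines.append(step)
        if "FINISH" in step:  # sketch convention: model emits FINISH when done
            return True, "\n".join(lines)
        lines.append(f"Observation: {execute(step)}")
    return False, "\n".join(lines)

def reflexion_loop(llm: LLM, task: str, max_trials: int = 3) -> bool:
    """Retry the task, accumulating verbal reflections after each failure."""
    reflections: list[str] = []
    for _ in range(max_trials):
        success, trajectory = run_episode(llm, task, reflections)
        if success:
            return True
        reflections.append(llm(REFLECT_PROMPT.format(trajectory=trajectory)))
    return False
```

The design choice to study here is what goes into `REFLECT_PROMPT` and how reflections are carried into the next attempt; Reflexion and Self-Refine differ mainly in where that feedback comes from and how it is injected.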
Category 2: Tool-Use, Function-Calling, RAG
- How should LLMs best interact with tools such as retrieval and code?
- Improve upon the ideas in Gorilla through prompting (see the sketch after this list)
- [2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools
- [2302.07842] Augmented Language Models: a Survey
- [2304.05376] ChemCrow: Augmenting large-language models with chemistry tools
- [2303.17580] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- [2305.17126] Large Language Models as Tool Makers
- [2305.06983] Active Retrieval Augmented Generation
- AutoGPT
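
For the Gorilla-style task above, here is a minimal sketch of retrieval-then-prompt tool calling. The API store, the keyword-overlap retriever, and the JSON output format are all assumptions made for illustration; Gorilla itself retrieves real model-hub documentation, and a dense retriever (cf. Active Retrieval Augmented Generation above) would replace the toy scorer.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical text-in/text-out LLM wrapper

# Toy API store for the sketch; a real system would index hub documentation.
API_DOCS = [
    "translate(text: str, target_lang: str) -> str  # translate text",
    "web_search(query: str, k: int = 5) -> list[str]  # search the web",
    "run_python(code: str) -> str  # execute a code snippet",
]

def retrieve_docs(request: str, k: int = 2) -> list[str]:
    """Stand-in retriever using keyword overlap instead of embeddings."""
    words = set(request.lower().split())
    return sorted(API_DOCS,
                  key=lambda d: -len(words & set(d.lower().split())))[:k]

def prompt_for_call(llm: LLM, request: str) -> dict:
    """Retrieve candidate API docs, then ask the model for one JSON call."""
    docs = "\n".join(retrieve_docs(request))
    prompt = (
        "You may call exactly one of these APIs:\n"
        f"{docs}\n"
        f"User request: {request}\n"
        'Answer with only JSON: {"name": "...", "arguments": {...}}'
    )
    return json.loads(llm(prompt))
```

A sturdier version would validate the returned JSON against the chosen API's signature and re-prompt on parse failure; measuring how much that validate-and-retry loop helps is itself a reasonable project.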
Category 3: Multimodal and Interactive Agents
- How should agents operate in the world and with users?
- Design an interactive LLM agent that improves upon the GAIA benchmark or that helps users “debug” real-world problems, such as fixing a tire (see the sketch after this list)
- [2303.08128] ViperGPT: Visual Inference via Python Execution for Reasoning
- [2303.11381] MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
- [2304.10592] MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- [2404.14394] A Multimodal Automated Interpretability Agent
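
For the interactive “debugging” task above, here is a minimal sketch of one turn of an MM-ReAct-style loop. The `describe_image` vision tool, the `Session` transcript, and the ask-one-question-or-give-one-step policy are all assumptions of this sketch, not APIs from the papers above.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # hypothetical chat-style completion function

def describe_image(path: str) -> str:
    """Stub vision tool; replace with a captioning or VQA model."""
    return f"(caption for {path})"

@dataclass
class Session:
    """Running transcript of an interactive debugging session."""
    history: list[str] = field(default_factory=list)

def turn(llm: LLM, session: Session, text: str, image: str | None = None) -> str:
    """One turn: fold any image into the text context via a vision tool,
    then either ask one clarifying question or give the next repair step."""
    if image is not None:
        text += f"\n[Image: {describe_image(image)}]"
    session.history.append(f"User: {text}")
    prompt = (
        "You are walking a user through fixing a real-world problem.\n"
        "If you lack information, ask one specific question (possibly for\n"
        "a photo); otherwise state the single next step.\n"
        + "\n".join(session.history) + "\nAssistant:"
    )
    reply = llm(prompt)
    session.history.append(f"Assistant: {reply}")
    return reply
```

The interesting design space is the tool-routing step: MM-REACT and ViperGPT differ in whether the model narrates tool use in text or emits executable code, and either choice could be evaluated against GAIA.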