Program Task Description

Category 1: Create your own AI agent benchmark on a novel task

Category 2: Build upon existing AI agent benchmarks