Artificial intelligence (AI) is evolving at an unprecedented pace, making it increasingly difficult to anticipate its societal impacts and risks. For example, recent benchmarks such as CyberGym and BountyBench have demonstrated that AI agents can already tackle real-world cybersecurity tasks, including zero-day discovery. In cybersecurity, AI plays a dual role, strengthening both offensive and defensive capabilities. It is therefore critical for developers, researchers, and policymakers to keep up with these capabilities as they emerge.
To address this need, we are building the “Frontier AI Cybersecurity Observatory”, a central hub and open platform for continuously tracking frontier AI capabilities in cybersecurity. By aggregating and maintaining cybersecurity benchmarks across the stages of attack and defense, the observatory will enable the following:
- Systematic monitoring of AI cybersecurity capabilities across different stages of attack and defense, providing early signals of potential imbalances between offensive and defensive AI capabilities
- Comparison of existing benchmarks to identify gaps where communities need to develop improved benchmarks and evaluation frameworks
- Identification of which models perform best across different attack and defense stages
Want to add your own benchmark?
Please visit us on Hugging Face and contribute by submitting a pull request!
Have suggestions to improve the observatory?
As this is an early-stage effort, we are actively gathering feedback from the community and would greatly value your input. Please share your suggestions here.