# Steel Agent Leaderboard

> A community-maintained leaderboard and benchmark registry for AI agents. Tracks performance across web navigation, coding, desktop control, tool use, research, and general reasoning benchmarks. Maintained by Steel (https://steel.dev). All data is community-sourced and independently verified where possible. The leaderboard covers 50+ benchmarks and hundreds of agent results.

## Leaderboard

- [WebVoyager Leaderboard](https://leaderboard.steel.dev/index.md): Rankings of web navigation agents on the WebVoyager benchmark, the most widely adopted web agent evaluation.

## Results Index

- [All Results](https://leaderboard.steel.dev/results.md): All agent benchmark results across every tracked benchmark, filterable by category, benchmark, and agent.

## Benchmark Registry

- [Full Registry](https://leaderboard.steel.dev/registry.md): All benchmarks across all categories with descriptions, top agents, scores, and metadata.
- [Web Navigation](https://leaderboard.steel.dev/registry/web-navigation.md): WebVoyager, WebArena, VisualWebArena, BrowserGym, AssistantBench, and more.
- [Research](https://leaderboard.steel.dev/registry/research.md): BrowseComp, MMSearch-Plus, and deep research agent benchmarks.
- [Desktop Control](https://leaderboard.steel.dev/registry/desktop-control.md): OSWorld, AndroidWorld, Windows Agent Arena, macOSWorld, and more.
- [Coding](https://leaderboard.steel.dev/registry/coding.md): SWE-bench Verified, HumanEval+, MLE-bench, Aider Benchmark, and more.
- [Tool Use](https://leaderboard.steel.dev/registry/tool-use.md): ToolBench, Tau-bench, MCP Atlas, Gorilla APIBench, and more.
- [General Reasoning](https://leaderboard.steel.dev/registry/general-reasoning.md): GAIA, ARC-AGI-2, GPQA Diamond, Humanity's Last Exam, and more.
- [Specialized](https://leaderboard.steel.dev/registry/specialized.md): Sotopia, AgentHarm, MedAgentBench, FORTRESS, and more.

## Optional

- [Full context file](https://leaderboard.steel.dev/llms-full.txt): All leaderboard data, benchmark descriptions, and results in a single markdown file optimized for LLM context.