General reasoning benchmark · Public
Humanity's Last Exam
3,000 expert-level questions spanning 100+ academic disciplines, contributed by domain experts. Designed to sit at or beyond the frontier of human knowledge, it is among the hardest factual benchmarks published to date.
- Benchmark type: Public benchmark
- Benchmark domain: General reasoning
- Task count: 3,000
- Evaluation method: Exact match (see the sketch after this list)
- Top model score: ~26% (o3 (high), OpenAI)
- Human score: N/A
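Exact match means a response earns credit only if it reproduces the reference answer verbatim after normalization, with no partial credit. The Python sketch below illustrates the idea; the normalization rules and function names here are illustrative assumptions, not HLE's actual grading harness.

```python
# Minimal sketch of exact-match scoring. The normalization choices
# (case-folding, punctuation stripping, whitespace collapsing) are
# assumptions for illustration, not HLE's official grader.
import re


def normalize(answer: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    answer = answer.lower().strip()
    answer = re.sub(r"[^\w\s]", "", answer)  # drop punctuation
    answer = re.sub(r"\s+", " ", answer)     # collapse whitespace runs
    return answer


def exact_match(prediction: str, gold: str) -> bool:
    """Score 1 only if prediction matches the gold answer exactly
    after normalization; partial credit is never awarded."""
    return normalize(prediction) == normalize(gold)


def benchmark_score(predictions: list[str], golds: list[str]) -> float:
    """Mean exact-match accuracy over all items (e.g. ~0.26 for ~26%)."""
    assert len(predictions) == len(golds)
    matches = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return matches / len(golds)


if __name__ == "__main__":
    preds = ["Paris ", "mitochondria", "42"]
    golds = ["paris", "the mitochondria", "42"]
    print(benchmark_score(preds, golds))  # 2/3 ≈ 0.667
```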
Where this benchmark fits
Use this page when you need benchmark-specific context. For side-by-side comparisons, go back to the full registry or open the general reasoning view. You can also jump straight to this benchmark in the master registry list.