Gorilla APIBench
1,645 API tasks across HuggingFace, TorchHub, and TensorHub. Evaluates whether agents generate accurate API calls, with correct arguments and library usage, without hallucinating APIs.
- Benchmark type: Public benchmark
- Benchmark domain: Tool use
- Task count: 1,645
- Evaluation method: AST matching
- Top model score: ~80%
- Human score: N/A
About this benchmark
Gorilla APIBench is a benchmark for evaluating LLMs on API call generation, introduced by Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez at UC Berkeley in May 2023. It draws on APIs from three major ML model hubs (HuggingFace, TorchHub, and TensorHub) and totals 1,645 API calls. Given a natural language query, a model must generate a semantically and syntactically correct API call with the right function name, parameters, and argument values. The benchmark specifically targets hallucination in API usage, a key failure mode in which models invent non-existent APIs or pass incorrect parameters.
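To make the hallucination failure mode concrete, here is a minimal Python sketch that parses a model's generated code with the standard `ast` module and flags any called function whose dotted name is missing from a set of known APIs. The `KNOWN_APIS` set and the helper names are hypothetical illustrations, not part of APIBench, and this check is much cruder than the benchmark's own evaluation.

```python
import ast
from typing import List, Optional, Set

# Hypothetical registry of valid API names; APIBench's own API database
# is not reproduced here.
KNOWN_APIS: Set[str] = {
    "torch.hub.load",
    "transformers.pipeline",
}


def dotted_name(node: ast.expr) -> Optional[str]:
    """Turn a Name/Attribute chain such as torch.hub.load into 'torch.hub.load'."""
    parts: List[str] = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a method called on the result of another call


def called_apis(code: str) -> List[str]:
    """Collect the dotted names of every function call in a code snippet."""
    names: List[str] = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is not None:
                names.append(name)
    return names


def hallucinated_calls(code: str, known: Set[str] = KNOWN_APIS) -> List[str]:
    """Return the called names that do not appear in the known-API set."""
    return [name for name in called_apis(code) if name not in known]


print(hallucinated_calls("m = torch.hub.load_model('pytorch/vision', 'resnet50')"))
# ['torch.hub.load_model']
```

Because every call in the snippet is checked, ordinary helpers such as `print` would also be flagged; the sketch assumes the model output is a single API call, as in APIBench.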
Evaluation checks the functional correctness of each generated API call by matching its abstract syntax tree (AST) against the ground-truth call. Gorilla, a fine-tuned LLaMA-based model, surpasses GPT-4 on API call accuracy. When paired with a document retriever, Gorilla adapts well to test-time changes in API documentation and hallucinates substantially less than base LLMs prompted directly.
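The sketch below illustrates the sub-tree idea behind AST matching in Python under a simplified rule. The generated call must use the same callee as the reference, repeat the reference's positional arguments in order, and include each reference keyword argument with the same value. Extra arguments in the generated call are tolerated. The function names are hypothetical, and the official APIBench evaluator may normalize calls differently. For example, it might treat a positional and a keyword form of the same argument as equal, which this sketch does not.

```python
import ast
from typing import Optional


def parse_call(code: str) -> Optional[ast.Call]:
    """Return the call from the first statement shaped like `f(...)` or `m = f(...)`."""
    for stmt in ast.parse(code).body:
        value = getattr(stmt, "value", None)  # Expr and Assign both carry .value
        if isinstance(value, ast.Call):
            return value
    return None


def _same(a: ast.AST, b: ast.AST) -> bool:
    # ast.dump omits line/column attributes by default, so structurally
    # identical expressions produce identical strings.
    return ast.dump(a) == ast.dump(b)


def subtree_match(generated: str, reference: str) -> bool:
    """True if the reference call is contained in the generated call."""
    gen, ref = parse_call(generated), parse_call(reference)
    if gen is None or ref is None:
        return False
    if not _same(gen.func, ref.func):  # same callee, e.g. torch.hub.load
        return False
    if len(gen.args) < len(ref.args):
        return False
    if not all(_same(g, r) for g, r in zip(gen.args, ref.args)):
        return False
    # Every reference keyword must be present with an identical value.
    gen_kwargs = {kw.arg: kw.value for kw in gen.keywords}
    return all(
        kw.arg in gen_kwargs and _same(gen_kwargs[kw.arg], kw.value)
        for kw in ref.keywords
    )


ref = "torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)"
print(subtree_match("model = torch.hub.load('pytorch/vision', 'resnet50', pretrained=True, verbose=False)", ref))  # True
print(subtree_match("model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)", ref))  # False
```

With this reference, a generated call that adds `verbose=False` still matches. A call that loads `'resnet18'` or uses a fabricated `torch.hub.load_model` does not match. Comparing `ast.dump` strings keeps the sketch short but only handles exact structural equality of argument expressions.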
The project has since grown into a broader ecosystem that includes the Berkeley Function Calling Leaderboard (BFCL), which has evolved from V1 to V4 and added multi-turn, multi-step, and agentic evaluations. Gorilla and APIBench are Apache 2.0 licensed, and the project reports serving approximately 500,000 requests since launch; it is one of the most widely adopted function-calling evaluation frameworks.