Leaderboards
Rigorous benchmarks, not cherry-picked results.
Design custom evaluations that measure the model capabilities you specify.
LeetBench: A Benchmark for Competitive Programming & Algorithmic Reasoning

  Model                                      Score
  -----------------------------------------  -----
  OpenAI o3                                  46%
  Anthropic Opus 4                           37.6%
  DeepMind Gemini 2.5 Pro                    23.4%
  Amazon Nova Premier                        10%
  Mistral Magistral Medium                   9.4%
  xAI Grok 3                                 8.4%
  NVIDIA Llama-3.1-Nemotron-Ultra-253B-v1    8.4%
  Meta Llama 4 Maverick                      6.4%
  Microsoft Phi 4                            4.4%
VADER: Vulnerability Assessment, Detection, Explanation, and Remediation

  Model            Score
  ---------------  -----
  OpenAI o3        54.6%
  Gemini 2.5 Pro   53.6%
  Claude 3.7       52.3%
  Grok 3 Beta      52.0%
  GPT-4.1          50.0%
  GPT-4.5          49.2%
FinanceArena: FinanceQA, Assumption-Based

  Model                            Score
  -------------------------------  -----
  OpenAI o3                        21.7%
  Anthropic Claude Opus 4          13%
  xAI Grok 4                       10.9%
  Qwen QwQ-32B                     10.9%
  OpenAI 4o mini                   10.9%
  Meta Llama 4 Maverick            8.7%
  xAI Grok 3                       8.7%
  Google DeepMind Gemini 2.5 Pro   6.5%