Research
Data quality makes all the difference.
We’re driven by the conviction that model performance is fundamentally bounded by training data quality. Through expert collaboration, rigorous curation methodologies, and deep domain expertise, we research datasets that power tomorrow’s models.
SpreadsheetBench 2
Evaluating LLM agents on challenging, expert-curated, end-to-end spreadsheet tasks — financial modeling, debugging, and visualization in complex multi-sheet workbooks.
IDE-Bench
A comprehensive framework for evaluating AI IDE agents on real-world software engineering tasks through an IDE-native tool interface.
Core Research Areas
1. Computer Use
We’ve created training data and reinforcement learning environments that teach AI agents to navigate real software workflows end-to-end, capturing judgment calls and edge cases that only experienced practitioners recognize.