Products
Intelligence isn’t built in a lecture hall. It’s built through practice, feedback, and failure. It’s built through experience.
Our products encode that full spectrum: expert demonstrations, preference signals, adversarial environments, and the reinforcement loops that turn a capable model into a reliable one.
As the frontier of what models can do expands, what they need to learn from expands with it.
Rubric and Verifier-based RL
Combines expert-crafted rubrics with automated verifiers that grade model outputs the way a seasoned professional would, rewarding nuance and penalizing shortcuts across reasoning, code generation, and instruction-following tasks.
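The mechanism can be pictured as a minimal sketch (names and criteria are illustrative, not a specific product API): a rubric pairs each criterion with an automated check and a weight, and the reward is the weighted fraction of criteria the output satisfies.

```python
# Toy rubric-based reward: each item is (criterion, automated check, weight).
def make_rubric():
    return [
        ("cites reasoning",   lambda out: "because" in out.lower(), 0.4),
        ("no filler hedging", lambda out: "as an ai" not in out.lower(), 0.2),
        ("under length cap",  lambda out: len(out.split()) <= 120, 0.4),
    ]

def rubric_reward(output: str, rubric) -> float:
    """Weighted fraction of rubric criteria the output satisfies."""
    total = sum(w for _, _, w in rubric)
    earned = sum(w for _, check, w in rubric if check(output))
    return earned / total

reward = rubric_reward("Correct, because the test passes.", make_rubric())
```

In practice the checks range from simple verifiers (does the code compile, does the answer match) to model-graded rubric items written by domain experts.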
Tool-calling RL Environments
Provides custom RL environments built on top of real APIs, MCP servers, and developer tools, enabling models to learn how to call, chain, and recover from errors across complex service workflows with automated evaluation at every step.
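A toy version of such an environment, sketched with a gym-style reset/step interface (the tools and reward values here are assumptions for illustration): the agent must chain tool calls in a valid order, and the environment grades every step.

```python
# Toy tool-calling RL environment: the episode succeeds only when
# "fetch" is called before "summarize".
class ToolEnv:
    def reset(self):
        self.fetched = False
        return "task: summarize the latest report"

    def step(self, tool_call: str):
        """Return (reward, done) for one tool call."""
        if tool_call == "fetch":
            self.fetched = True
            return 0.0, False          # valid intermediate step
        if tool_call == "summarize":
            if self.fetched:
                return 1.0, True       # correct chain: reward and stop
            return -1.0, True          # called out of order: penalize
        return -0.1, False             # unknown tool: mild penalty, recoverable

env = ToolEnv()
env.reset()
r1, done1 = env.step("fetch")
r2, done2 = env.step("summarize")
```

Real environments replace the hard-coded checks with live APIs, MCP servers, and automated evaluators, but the reward-per-step structure is the same.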
SFT
Delivers high-quality prompt-response pairs and chain-of-thought demonstrations for supervised fine-tuning, giving models a foundation of skills before RL begins and teaching them to reason, follow instructions, and navigate professional tasks from the ground up.
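The shape of one such record can be sketched as follows (the field names are an assumption, not a fixed schema): a prompt, the expert's chain of thought, and the final response, serialized one record per line as JSON Lines.

```python
# Illustrative SFT training record, serialized as one JSONL line.
import json

record = {
    "prompt": "A patient reports sudden chest pain. What do you ask first?",
    "chain_of_thought": "Rule out emergencies before routine history-taking.",
    "response": "Ask about onset, radiation, and associated shortness of breath.",
}

line = json.dumps(record)      # one record per line in a .jsonl file
restored = json.loads(line)    # round-trips losslessly
```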
Computer-use and Browser-use Environments
Pairs high-fidelity browser and desktop environments with expert-demonstrated trajectories, teaching agents to navigate interfaces, complete multi-step workflows, and operate software the way a domain expert would.
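An expert-demonstrated trajectory can be sketched as an ordered list of interface actions (the action names and selectors below are illustrative): each step records what was done, where, and what state should result.

```python
# Illustrative browser trajectory: a logged sequence of expert actions.
trajectory = [
    {"action": "navigate", "target": "https://example.com/login"},
    {"action": "type",     "target": "#username", "text": "demo"},
    {"action": "click",    "target": "button[type=submit]"},
    {"action": "assert",   "target": "h1", "expect": "Dashboard"},
]

def replay_length(traj) -> int:
    """Count interface actions, excluding final state assertions."""
    return sum(1 for step in traj if step["action"] != "assert")
```

Agents trained on such trajectories learn both the actions and the expected intermediate states, which is what makes multi-step workflows recoverable when something goes wrong.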
RLHF
Captures the subtleties of what makes one response genuinely better than another through RL from human feedback, training models to internalize the taste, judgment, and standards of domain experts across thousands of comparison pairs.
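The standard way comparison pairs become a training signal is the Bradley-Terry preference loss, sketched here with scalar stand-ins for reward-model scores: the loss is small when the chosen response outranks the rejected one, and large when the ordering is wrong.

```python
# Bradley-Terry preference loss on a single comparison pair.
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): low when chosen > rejected."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

low = preference_loss(2.0, 0.0)    # correctly ordered pair: small loss
high = preference_loss(0.0, 2.0)   # misordered pair: large loss
```

Summed over thousands of expert-labeled pairs, this objective is what lets a reward model internalize the judgment behind the preferences.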
Code Generation
Spans expert-written code, test cases, and debugging traces that teach models to write production-quality software, handle edge cases, and reason through architectural decisions the way experienced engineers do.
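Pairing expert code with test cases means every sample can be mechanically verified before it trains anything; a toy sketch of that check (real pipelines sandbox execution rather than calling exec on trusted demo strings):

```python
# Toy verification of a code sample against its paired test cases.
sample = {
    "code": "def add(a, b):\n    return a + b",
    "tests": [("add(2, 3)", 5), ("add(-1, 1)", 0)],
}

def verify(record) -> bool:
    """Execute the sample, then check every test expression's value."""
    scope = {}
    exec(record["code"], scope)    # trusted demo only; sandbox in practice
    return all(eval(expr, scope) == want for expr, want in record["tests"])
```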
Professional Domains
Draws on 100,000+ verified practitioners across medicine, law, finance, engineering, and more, capturing the tacit knowledge and real-world judgment that textbooks leave out and synthetic data can’t replicate.
Deep Research
Covers long-horizon research tasks, teaching models to gather evidence across sources, synthesize findings, and produce thorough analyses that mirror how skilled researchers build understanding over hours of investigation.
Loss Analyses
Identifies where and why models fail in professional contexts through systematic study, pinpointing the precise failure modes and distributional gaps that inform how every dataset and environment is designed.
Multimodal
Teaches models to see, interpret, and reason across image, audio, video, and text together, closing the gap between how humans experience the world and how AI processes it.
Custom Evals and Training Datasets
Tailors evaluation suites and training datasets to your specific capability targets, designing every prompt, rubric, and environment from scratch to address the exact gaps your models need to close.