Our research is guided by the thesis that model performance is bounded by the quality of its training data. Great models start with great data.
A comprehensive testing suite evaluating LLMs' performance on complex numerical financial analysis tasks that mirror real-world investment work.
A benchmark designed to assess LLM performance across four key vulnerability-handling dimensions using 174 real-world software vulnerabilities.
We focus on several key areas to advance the field of artificial intelligence.
Researching methods to ensure AI systems behave in accordance with human values and intentions while minimizing potential risks.
Developing systems that can understand and reason across multiple modalities including images, audio, and video.
Developing AI systems that can interact with and control computer interfaces, enabling autonomous task execution and workflow automation.
Developing methods for creating high-quality training datasets that drive superior model performance and reliability.
Creating comprehensive benchmarks and evaluation frameworks to assess AI model capabilities across diverse real-world scenarios.
Our research advances foundational model capabilities through specialized, human-generated datasets.