Elite Speed Advantage: The solve-rate advantage narrowed sharply at the top (from 3.2x overall to 1.7x in the top 5%), confirming ...
Artificial intelligence systems now breeze through many academic tests that once challenged both machines and people. That ...
OpenAI's new GPT-5.4 clobbers humans on pro-level work in tests - by 83% ...
New ORCA results show Gemini leading in practical math, but no AI matches the consistency of a simple calculator.
The three corporate human rights-related benchmarks published so far in 2026 are: ...
Scientists created a benchmark to measure empathy in AI conversations, revealing that some chatbots now rival average human emotional support.
Mandatory testing was brought in last year, with World Athletics president Sebastian Coe declaring it would "protect and promote the integrity of women’s sport" ...
Despite its dramatic name, Humanity’s Last Exam is not meant to signal the end of human importance. Instead, it highlights ...
Living human neurons were trained to play Doom, extending the long-running engineering benchmark into biological computing.
OpenAI released GPT-5.4 today with native computer use, a 1M-token context window, and new professional benchmarks. Find what ...
The challenge for modern marketers is not whether to trust the data, but how to translate it into work that still feels human ...
As large language models (LLMs) gain momentum worldwide, there’s a growing need for reliable ways to measure their performance. Benchmarks that evaluate LLM outputs allow developers to track ...