Top AI Models Fail to Profit in Soccer Betting Simulation
Sonic Intelligence
Top AI models, including xAI Grok, consistently lost money in a simulated soccer betting season.
Explain Like I'm Five
"Imagine you teach a super-smart computer brain all about soccer and tell it to bet on games to win money. Even the smartest computer brains from big companies tried, but most of them lost money! It shows that even though computers are smart with facts, they're not very good at guessing what will happen in a real, changing game, especially over a long time. It's harder than it looks!"
Deep Intelligence Analysis
The study rigorously tested eight top AI systems, providing them with detailed historical data and statistics, and instructing them to maximize returns while managing risk. Despite this comprehensive input, most models struggled to adapt to new events and updated player data as the season progressed. Anthropic's Claude Opus 4.6, while performing best, still incurred an average loss of 11%. Most notably, xAI's Grok 4.20 experienced outright bankruptcy in one attempt, underscoring its fragility in dynamic, long-term scenarios. Google's Gemini 3.1 Pro showed inconsistent results, achieving a profit in one instance but failing in another, indicating a lack of robust, generalizable performance.
These findings carry substantial implications beyond the realm of sports betting, particularly for industries considering AI for complex forecasting, resource allocation, or strategic decision-making in unpredictable environments. The inability of current frontier models to consistently manage risk and adapt to evolving real-world conditions suggests that their deployment in high-stakes scenarios requires extreme caution and extensive human oversight. This research provides critical feedback for AI developers, emphasizing the need to move beyond static knowledge and improve models' capabilities in continuous learning, uncertainty quantification, and robust adaptive reasoning to truly unlock their potential in dynamic, real-world applications.
Impact Assessment
This study underscores the persistent limitations of even frontier AI models in dynamic, unpredictable real-world environments, contrasting sharply with their performance on controlled benchmarks. The inability to consistently generate profit in a complex domain like sports betting reveals a fundamental challenge in AI's capacity for long-term adaptive reasoning and risk management, crucial for broader societal applications.
Key Details
- A 'KellyBench' report tested eight top AI systems from Google, OpenAI, Anthropic, and xAI on a simulated 2023-24 Premier League season.
- Most AI models lost money, highlighting struggles with real-world analysis and adapting to dynamic events.
- Anthropic's Claude Opus 4.6 performed best, with an average loss of 11% across attempts.
- xAI's Grok 4.20 went bankrupt in one attempt and failed to complete two others.
- Google's Gemini 3.1 Pro achieved a 34% profit in one go but also experienced bankruptcy in another.
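The benchmark's name alludes to the Kelly criterion, a classic formula for sizing bets to maximize long-run bankroll growth. The report's exact methodology isn't detailed here, so the following is only a minimal sketch of that criterion and of how a season-long bankroll simulation might be structured; the function names, the flat win probability, and the fixed odds are all illustrative assumptions, not the study's actual setup.

```python
import random

def kelly_fraction(p_win: float, net_odds: float) -> float:
    # Kelly criterion: f* = p - (1 - p) / b, where b is the net odds
    # (profit per unit staked). A non-positive edge means the bet has
    # no expected value, so the rational stake is zero.
    edge = p_win - (1.0 - p_win) / net_odds
    return max(0.0, edge)

def simulate_season(p_win: float, net_odds: float, n_bets: int = 38,
                    bankroll: float = 100.0, seed: int = 0) -> float:
    # Hypothetical 38-match season (one bet per Premier League round),
    # staking the full Kelly fraction each time. Real bettors often
    # stake a fraction of Kelly to reduce variance.
    rng = random.Random(seed)
    for _ in range(n_bets):
        stake = bankroll * kelly_fraction(p_win, net_odds)
        if rng.random() < p_win:
            bankroll += stake * net_odds
        else:
            bankroll -= stake
    return bankroll
```

Note that the criterion only protects a bettor whose probability estimates are accurate: if a model's `p_win` is systematically overconfident, full-Kelly staking amplifies the error, which is one plausible route to the bankruptcies the report describes.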
Optimistic Outlook
Identifying these specific failure points in real-world, dynamic scenarios provides valuable insights for AI researchers to develop more robust and adaptable models. Future iterations could incorporate improved mechanisms for handling uncertainty, learning from continuous feedback, and managing risk, leading to more reliable AI systems for complex decision-making.
Pessimistic Outlook
The consistent underperformance of leading AI models in a relatively structured, albeit dynamic, environment like sports betting raises concerns about their readiness for more critical real-world applications. Over-reliance on current AI capabilities for complex prediction or resource allocation tasks could lead to significant financial losses or suboptimal outcomes if models cannot adapt to unforeseen events or manage inherent risks effectively.