SIRE Intelligence: Data Science Report

How In-House Modelling, LLMs and a community of Machine Learning Engineers Are Powering a New Era of Sustainable +EV Betting


1. Summary: Over the past season, Score’s Data Science team was given a clear mandate: develop a framework we could expand into a sustainable, scalable betting strategy. From day one, the end goal was a Multi-Source Prediction Engine, a self-optimising system capable of learning from multiple contributors and adapting in real time, which would power our LLM approach. To measure the impact of such an approach, we first developed robust baseline models, because they would:

  • Quantify the predictive edge of traditional statistical models.

  • Serve as benchmarks for both our in-house Large Language Models (LLMs) and models developed by the Machine Learning (ML) community.

  • Provide feature inputs the LLM could draw on if it determined they were valuable.

Building these market models taught us that historical simulations have severe limitations. They can overfit to past data, fail to capture changing market dynamics, and create a false sense of stability. In today’s sports markets, where odds are efficiently priced and heavily influenced by bookmakers, traditional models often end up being little more than noise. The real edge lies elsewhere. Our new approach leverages LLMs to uncover orthogonal signals hidden within massive context windows of data. This isn’t just an incremental improvement; it is a fundamentally new way of extracting value, designed for live, adaptive performance. As Peter Cotton, Score Co-Founder and author of “MicroPrediction - Building an Open AI Network”, puts it: “The LLMs we have now, we didn’t have six months ago. Statistics can be a form of reasoning, but it’s not the only one at our disposal.”

With those lessons learned, we moved from static baselines to real-time LLM predictions powered by the Multi-Source Prediction Engine.

2. The Modelling Foundations

We developed four in-house models to predict football match outcomes (1X2: home win, draw, or away win).

  1. Elo

The Elo model was originally developed to rate the skill of chess players. Each player’s Elo rating rises or falls after every match depending on the result. In our implementation, we estimate the probability of each outcome from the rating difference between the home and away teams.
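As a rough illustration of the rating mechanics, the sketch below uses the standard Elo expected-score formula and update rule. The K-factor, the 400-point scale, and the `home_advantage` offset are illustrative assumptions, not values from our implementation. Note that the expected score treats a draw as half a win; converting it into three separate 1X2 probabilities needs an extra step that is not shown here.

```python
def elo_expected_home(home_rating: float, away_rating: float,
                      home_advantage: float = 0.0, scale: float = 400.0) -> float:
    """Expected score for the home team (win = 1, draw = 0.5, loss = 0)."""
    diff = home_rating + home_advantage - away_rating
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def elo_update(home_rating: float, away_rating: float, home_score: float,
               k: float = 20.0, home_advantage: float = 0.0) -> tuple[float, float]:
    """Return updated (home, away) ratings after one match.

    home_score is 1.0 for a home win, 0.5 for a draw, 0.0 for an away win.
    k (hypothetical value) controls how fast ratings react to results.
    """
    expected = elo_expected_home(home_rating, away_rating, home_advantage)
    delta = k * (home_score - expected)
    return home_rating + delta, away_rating - delta


if __name__ == "__main__":
    # Two equally rated teams; the home side wins.
    print(elo_update(1500.0, 1500.0, home_score=1.0))
```

The update is zero-sum: whatever the home team gains, the away team loses.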

  2. Davidson

The Davidson model extends the Bradley-Terry framework by incorporating the possibility of a draw in pairwise comparisons. Each team is assigned a strength, and match outcomes are modelled probabilistically based on the relative strengths of the teams. These parameters are fitted using maximum likelihood:

  • Team strength

  • Home advantage

  • Draw parameter
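The three parameter types listed above can be turned into 1X2 probabilities as in the following minimal sketch. It assumes the textbook Davidson form, with home advantage applied as a multiplier on the home team's strength; the function and argument names are hypothetical, and the maximum likelihood fitting step is not shown.

```python
import math


def davidson_probs(home_strength: float, away_strength: float,
                   home_adv: float = 1.0, nu: float = 1.0) -> tuple[float, float, float]:
    """Return (P(home win), P(draw), P(away win)) under a Davidson model.

    home_strength, away_strength: positive team strengths.
    home_adv: multiplicative home advantage (assumed form, 1.0 = none).
    nu: draw parameter; larger values make draws more likely.
    """
    h = home_adv * home_strength
    a = away_strength
    tie = nu * math.sqrt(h * a)
    total = h + a + tie
    return h / total, tie / total, a / total


if __name__ == "__main__":
    print(davidson_probs(1.2, 1.0, home_adv=1.3, nu=0.8))
```

Setting `nu` to zero recovers the plain Bradley-Terry model with no draws.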

  3. Dixon-Coles

The Dixon-Coles model is designed to model the number of goals scored by each team. It extends the Poisson regression framework by introducing a correction factor for low-scoring games and accounting for the dependency between teams' goals. In our implementation, we estimate the attacking and defensive strengths of each team to derive outcome probabilities from predicted goal distributions.
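To make the low-score correction concrete, here is a small sketch of how outcome probabilities can be derived from a predicted goal distribution. It assumes the correction factor from the original Dixon-Coles formulation, with `lam` and `mu` as the expected home and away goals and `rho` as the dependence parameter. How the expected goals are built from attacking and defensive strengths is left out, and the values used are illustrative only.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Low-score correction; equals 1 for every score except 0-0, 0-1, 1-0, 1-1."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def dixon_coles_1x2(lam: float, mu: float, rho: float,
                    max_goals: int = 10) -> tuple[float, float, float]:
    """Sum the corrected scoreline grid into (home, draw, away) probabilities."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise for the truncated grid
    return home / total, draw / total, away / total


if __name__ == "__main__":
    print(dixon_coles_1x2(lam=1.5, mu=1.1, rho=-0.1))
```

With a negative `rho`, probability mass shifts from 1-0 and 0-1 toward 0-0 and 1-1, which raises the draw probability relative to independent Poisson scores.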

  4. Sarmanov

The Sarmanov Negative Binomial model is designed to model the number of goals scored by each team using a bivariate negative binomial distribution. It extends the standard framework by allowing for overdispersion in goal counts and introducing a dependency structure between the two teams’ scores through the Sarmanov distribution. In our implementation, we estimate attacking and defensive strengths along with a correlation parameter to generate outcome probabilities from the joint goal distribution.
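The sketch below shows one way such a joint goal distribution can be assembled. It is an assumption-laden illustration: the kernel `exp(-x) - E[exp(-X)]` is one common choice for Sarmanov-type distributions and may differ from the one we use, the negative binomial is parameterised by a mean and a `size` (dispersion) value, and all numbers are made up. The parameter `omega` plays the role of the correlation parameter and must be small enough to keep every joint probability non-negative.

```python
import math


def nb_pmf(k: int, mean: float, size: float) -> float:
    """Negative binomial pmf with the given mean; smaller size = more overdispersion."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def nb_laplace_at_one(mean: float, size: float) -> float:
    """E[exp(-X)] for the negative binomial above (its pgf evaluated at exp(-1))."""
    p = size / (size + mean)
    return (p / (1.0 - (1.0 - p) * math.exp(-1.0))) ** size


def sarmanov_joint(x: int, y: int, mean_h: float, mean_a: float,
                   size_h: float, size_a: float, omega: float) -> float:
    """Joint probability of the scoreline x-y with Sarmanov dependence."""
    phi_x = math.exp(-x) - nb_laplace_at_one(mean_h, size_h)
    phi_y = math.exp(-y) - nb_laplace_at_one(mean_a, size_a)
    marginals = nb_pmf(x, mean_h, size_h) * nb_pmf(y, mean_a, size_a)
    return marginals * (1.0 + omega * phi_x * phi_y)


def sarmanov_1x2(mean_h: float, mean_a: float, size_h: float, size_a: float,
                 omega: float, max_goals: int = 40) -> tuple[float, float, float]:
    """Sum the joint distribution into (home, draw, away) probabilities."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = sarmanov_joint(x, y, mean_h, mean_a, size_h, size_a, omega)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


if __name__ == "__main__":
    print(sarmanov_1x2(1.5, 1.1, size_h=5.0, size_a=5.0, omega=0.5))
```

Because each kernel has zero mean under its own marginal, the dependence term changes how goals co-occur without altering either team's marginal goal distribution.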

3. Building Meta Models

With the four individual models performing well, we moved toward ensemble modelling, combining multiple models to improve predictive accuracy.

  1. Meta Pairwise

The first meta model we developed, Meta Pairwise, combines the outputs of the Elo and Davidson models with the implied probabilities from bookmaker odds.
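For readers unfamiliar with implied probabilities, the sketch below converts decimal odds into probabilities by normalising out the bookmaker margin, then blends several probability sets. The weighted average is purely a placeholder assumption, since this report does not specify how Meta Pairwise combines its inputs; the function names and weights are hypothetical.

```python
def implied_probabilities(odds_home: float, odds_draw: float,
                          odds_away: float) -> list[float]:
    """Convert decimal 1X2 odds to probabilities, removing the overround."""
    raw = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    overround = sum(raw)  # exceeds 1 when the bookmaker takes a margin
    return [r / overround for r in raw]


def blend(prob_sets: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of several (home, draw, away) probability sets.

    Placeholder combiner for illustration only, not the Meta Pairwise method.
    """
    total_weight = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_sets)) / total_weight
        for i in range(3)
    ]


if __name__ == "__main__":
    market = implied_probabilities(2.10, 3.40, 3.60)
    model = [0.50, 0.26, 0.24]  # made-up model output
    print(blend([market, model], weights=[0.5, 0.5]))
```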

To measure the performance of the Meta Model, we tested the model on bankroll growth using match data from the top 5 leagues (Premier League, La Liga, Serie A, Ligue 1, and Bundesliga) across 2 seasons (23/24 and 24/25). The line chart below illustrates the bankroll growth for five different models and strategies.

  1. ChatGPT (Flat Lines)

  2. Sportsmonks (Flat Lines)

  3. Agentic Model (Flat Lines)

  4. Sportsmonks (+ Kelly Criterion)

  5. The Meta Model (Agentic Execution)

As illustrated in the chart, both the ChatGPT model and the Sportsmonks model (based on a premium paid football data subscription) experienced early bankroll depletion, while the Agentic Model went bust near the end of the 2023/24 season. However, when bankroll management was applied using a fractional Kelly Criterion combined with an edge threshold, the Sportsmonks model lowered its loss to approximately 10% over two seasons.
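The staking rule described above can be sketched as follows. This is a minimal version of fractional Kelly with an edge threshold, using the standard Kelly formula for decimal odds; the parameter names and the example values are illustrative assumptions rather than the settings used in the backtest.

```python
def kelly_stake(bankroll: float, prob: float, decimal_odds: float,
                kelly_fraction: float, edge_threshold: float) -> float:
    """Stake for one bet under fractional Kelly with an edge filter.

    prob: model probability that the bet wins.
    edge: expected profit per unit staked, prob * odds - 1.
    Bets whose edge does not exceed edge_threshold are skipped (stake 0).
    """
    edge = prob * decimal_odds - 1.0
    if edge <= edge_threshold:
        return 0.0
    full_kelly = edge / (decimal_odds - 1.0)  # optimal fraction of bankroll
    return bankroll * kelly_fraction * full_kelly


if __name__ == "__main__":
    # Hypothetical bet: 50% model probability at odds of 2.20.
    print(kelly_stake(1000.0, prob=0.50, decimal_odds=2.20,
                      kelly_fraction=0.3, edge_threshold=0.05))
```

Scaling the full Kelly stake down trades some expected growth for lower variance, and the threshold filters out bets whose estimated edge is too small to trust.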

In contrast, the Meta Model achieved a solid return of +32.96% over the same period. This highlights the critical role of bankroll management in long-term betting sustainability. Additionally, the Meta Model consistently outperformed the Sportsmonks approach, further demonstrating the effectiveness and added value of our model.

  2. Meta Logistic

Meta Logistic combines the outputs of the Elo, Davidson, Dixon-Coles, and Sarmanov models with implied probabilities from bookmaker odds and historical team statistics to predict the outcome of a football match.

We backtested the model from the 19/20 season to the 24/25 season across a grid of Kelly fractions and edge thresholds. The resulting ROI for each combination is illustrated in the grid search plot.
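A grid search of this kind can be sketched in a few lines. The example below is a simplified assumption of the procedure: it replays a list of bets sequentially, stakes with fractional Kelly, and reports bankroll growth for every pair of Kelly fraction and edge threshold. The bet list is toy data invented for the example, and measuring ROI as total bankroll return is also an assumption.

```python
from itertools import product


def simulate(bets, kelly_fraction: float, edge_threshold: float,
             start: float = 1.0) -> float:
    """Replay bets in order and return total bankroll growth (0.1 = +10%).

    bets: iterable of (model_prob, decimal_odds, won) tuples.
    """
    bankroll = start
    for prob, odds, won in bets:
        edge = prob * odds - 1.0
        if edge <= edge_threshold:
            continue  # skip bets without enough estimated edge
        stake = bankroll * kelly_fraction * edge / (odds - 1.0)
        bankroll += stake * (odds - 1.0) if won else -stake
    return bankroll / start - 1.0


def grid_search(bets, fractions, thresholds) -> dict:
    """Bankroll growth for every (kelly_fraction, edge_threshold) pair."""
    return {(f, t): simulate(bets, f, t) for f, t in product(fractions, thresholds)}


if __name__ == "__main__":
    # Toy data only: (model probability, decimal odds, bet won?)
    toy_bets = [(0.30, 3.8, False), (0.32, 3.6, True), (0.28, 3.9, False),
                (0.35, 3.3, True), (0.31, 3.5, False)]
    results = grid_search(toy_bets, fractions=[0.1, 0.3, 0.5],
                          thresholds=[0.0, 0.05, 0.10])
    for params, growth in sorted(results.items()):
        print(params, round(growth, 4))
```

Picking the best cell of such a grid is itself a form of fitting to past data, so the top result tends to overstate what the strategy would return live.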

The best parameters returned 370.3% when betting on draw outcomes. This is a significant improvement over the first meta model, which we attribute to adding more models and features to the ensemble. Although a 370.3% return looks impressive on paper, plotting the bankroll growth reveals that the model's performance is unstable, likely due to the aggressive nature of Kelly betting, even at a 30% Kelly fraction.
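One common way to put a number on this kind of instability is maximum drawdown, the largest peak-to-trough fall in the bankroll. The sketch below is an illustration only; this report does not state which stability metric was used, and the bankroll path is made up.

```python
def max_drawdown(bankroll_path: list[float]) -> float:
    """Largest fractional drop from a running peak (0.5 = a 50% fall)."""
    peak = bankroll_path[0]
    worst = 0.0
    for value in bankroll_path:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


if __name__ == "__main__":
    # Made-up path: ends at double the start but halves along the way.
    print(max_drawdown([100.0, 150.0, 75.0, 200.0]))
```

Two strategies with the same final return can have very different drawdowns, which is why the final ROI alone says little about whether a strategy is sustainable.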

Key takeaways: Although one strategy in our Meta Logistic backtest yielded +370%, the bankroll growth was unstable, indicating that the ROI is not sustainable when deployed live without careful adjustment.

4. Expanding to LLMs

In today’s fast-paced sports prediction industry, winning means adapting in real time. That’s why we’ve embraced a more ambitious, next-generation approach with Large Language Models (LLMs), moving beyond the constraints of traditional approaches to harness live performance data, accelerate learning, and enable autonomous improvement. This is more than a technical upgrade; it’s a strategic transformation designed to put SIRE LLM at the forefront of AI-driven sports forecasting.

The following pillars outline the foundation of this upgrade:

Moving Beyond Backtesting

Traditional backtesting often suffers from overfitting and fails to capture real-world adaptability. Past performance alone doesn’t guarantee future results, so instead of relying on historical simulations, we are shifting our focus toward live performance evaluation: measuring SIRE LLM’s predictions in real time, where it truly matters.

Proven Real-World Results

During the Club World Cup, our LLM-powered SIRE terminal delivered outstanding performance, achieving over 3x bankroll growth. These results underscore the model’s capacity to adapt, learn, and thrive under real competitive conditions.

A Vision for Autonomous Model Evolution

We are entering an era where neural networks are trained, evaluated, analyzed, and iterated without direct human intervention. This is the path toward Artificial Superintelligence (ASI). Our goal for SIRE LLM is to learn from its past predictions, autonomously refine itself, and deliver increasingly precise forecasts.

Strategic Role of the Data Science Team

Our in-house Data Science team will focus exclusively on advancing SIRE LLM’s architecture and performance. Meanwhile, our community of sports machine learning (ML) experts will focus on modelling and backtesting to uncover +EV (positive expected value) features and models for integration into the LLM.

Collaborative Advantage

With improved modelling from our community of sports ML experts and a streamlined focus on the SIRE LLM, we will generate more accurate predictions. These predictions will feed into a meta-model responsible for optimal bankroll management and bet sizing, ensuring that every insight from the LLM is translated into maximum strategic advantage.

The Future Is Now

By combining cutting-edge live evaluation with autonomous improvement, SIRE LLM isn’t just keeping pace with the future of AI in sports prediction; it’s leading the charge.
