PunterScore's prediction system is a supervised machine learning pipeline that converts structured football match data into probability estimates for specific betting markets. Every published prediction is the output of this pipeline — not editorial opinion, not tipster judgement.
The pipeline has four stages: data ingestion, feature engineering, model inference, and output filtering. Predictions are generated automatically for all qualifying fixtures across 30+ leagues, reviewed for data quality, then published with a written analysis summary generated from the model's top-weighted input variables.
Our core principle: A prediction is only as trustworthy as the process that produced it. Every design decision in our methodology prioritises calibration accuracy — meaning our stated confidence ratings should match observed win rates as closely as possible over large sample sizes.
Our model ingests data from multiple authoritative sources covering match events, player statistics, team performance metrics, and contextual match information. Data is collected via live feeds and historical archives, normalised to a common schema, and stored in a match-level database that currently spans over 5 million historical fixtures across 30+ leagues from 2008 onwards.
| Data Category | Variables | Update Frequency |
| --- | --- | --- |
| Match Results | Scoreline, goals by minute, HT/FT, attendance | Post-match |
| Team Statistics | Shots, shots on target, possession, corners, fouls, cards | Post-match |
| Shot-Level Data | Location, shot type, assist type, body part, game state | Post-match |
| Player Data | Minutes played, goals, assists, key passes, dribbles, tackles | Post-match |
| Squad & Injuries | Confirmed absences, return dates, suspension status | Daily |
| Line-ups | Starting XI, formation, tactical shape | Pre-match (~1hr) |
| Odds & Markets | Opening/closing lines across major bookmakers | Live |
| Contextual | Weather, pitch surface, crowd capacity, travel distance | Pre-match |
All data undergoes automated quality checks before being passed to the feature engineering stage. Fixtures with insufficient data depth — typically very low-tier leagues or newly-promoted clubs — are excluded from the prediction pipeline until minimum data thresholds are met.
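As an illustration, the pre-pipeline quality gate could look like the sketch below. The threshold values and field names are hypothetical stand-ins, not PunterScore's actual configuration:

```python
# Hypothetical sketch of the automated quality gate described above.
# MIN_HISTORICAL_MATCHES and REQUIRED_FIELDS are illustrative assumptions.

MIN_HISTORICAL_MATCHES = 20   # assumed minimum data depth per team
REQUIRED_FIELDS = ("shots", "shots_on_target", "possession")

def passes_quality_gate(fixture: dict) -> bool:
    """Return True if a fixture has enough data depth to enter the pipeline."""
    for side in ("home", "away"):
        team = fixture[side]
        # Newly-promoted or low-tier clubs may lack historical depth.
        if team["historical_matches"] < MIN_HISTORICAL_MATCHES:
            return False
        # Every required statistic must be present and non-null.
        if any(team.get(f) is None for f in REQUIRED_FIELDS):
            return False
    return True

fixture = {
    "home": {"historical_matches": 54, "shots": 13, "shots_on_target": 5, "possession": 0.58},
    "away": {"historical_matches": 8,  "shots": 9,  "shots_on_target": 2, "possession": 0.42},
}
print(passes_quality_gate(fixture))  # False: away side is below the depth threshold
```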
Expected Goals (xG) is one of our most predictive feature groups. Rather than using raw goals scored and conceded — which carry substantial match-to-match variance — xG measures the underlying quality of chances created and allowed, providing a more stable signal of team offensive and defensive capability.
We calculate our own xG values using a shot-level logistic regression model trained on historical shot data. Each shot is assigned a probability between 0 and 1 representing the likelihood of it resulting in a goal, based on the shot-level inputs described above: location, shot type, assist type, body part, and game state.
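A shot-level model of this kind can be sketched with scikit-learn. The features below (distance, angle, body part, game state) and the synthetic training labels are illustrative stand-ins, not our production feature set:

```python
# Minimal sketch of a shot-level xG model: logistic regression over shot
# features. Training data here is synthetic; the coefficients used to
# generate the labels are arbitrary, not fitted to real football data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.uniform(5, 35, n),          # shot distance to goal (m)
    rng.uniform(0.1, 1.4, n),       # visible goal angle (rad)
    rng.integers(0, 2, n),          # body part: 1 = header
    rng.integers(0, 2, n),          # game state proxy: 1 = fast break
])
# Synthetic goal labels: closer shots with wider angles score more often
logit = -0.15 * X[:, 0] + 1.5 * X[:, 1] - 0.8 * X[:, 2] + 0.4 * X[:, 3]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression().fit(X, y)

# xG of a single shot: the probability it results in a goal, in [0, 1]
shot = [[11.0, 0.9, 0, 0]]  # 11 m out, decent angle, footed, open play
xg = model.predict_proba(shot)[0, 1]
print(f"xG = {xg:.2f}")
```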
Team-level xG is the sum of individual shot xG values across a match or rolling window. We use several xG-derived features as model inputs, including xG per 90 minutes, xG difference (xGD), xG against (xGA), and xG overperformance — the difference between actual goals and xG, which indicates regression candidates.
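The rolling aggregation into team-level features can be sketched with pandas. The match data below is invented for illustration; the 5-match window follows the "last 5" features named above:

```python
# Sketch of the xG-derived features described above, computed from
# per-match team xG totals. Values are illustrative.
import pandas as pd

matches = pd.DataFrame({
    "xg_for":     [1.8, 0.9, 2.3, 1.1, 1.6, 2.0],
    "xg_against": [0.7, 1.4, 1.0, 1.9, 0.8, 1.2],
    "goals_for":  [2,   1,   3,   0,   2,   3],
})

# Rolling 5-match windows, as used for the "last 5" features
feats = pd.DataFrame({
    "xg_last5":  matches["xg_for"].rolling(5).sum(),
    "xga_last5": matches["xg_against"].rolling(5).sum(),
})
feats["xgd_last5"] = feats["xg_last5"] - feats["xga_last5"]
# Overperformance: actual goals minus xG; large positive values flag
# finishing rates that may regress toward the sustainable level
feats["xg_overperf"] = matches["goals_for"].rolling(5).sum() - feats["xg_last5"]
print(feats.tail(1))
```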
Why xG matters for Over/Under predictions: Teams that consistently outscore their xG are likely finishing above their sustainable rate. Our model adjusts expected future output toward xG, reducing the influence of short-term variance when predicting goals markets like Over/Under 2.5 or BTTS.
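One simple way to express this adjustment is a shrinkage blend between observed scoring and xG. The weighting below is a hypothetical illustration of the idea, not our actual adjustment formula:

```python
# Sketch of regressing expected output toward xG: blend observed scoring
# with the xG rate, down-weighting hot finishing streaks.
# The shrink weight of 0.7 is an illustrative assumption.
def adjusted_goal_rate(goals_per90: float, xg_per90: float, shrink: float = 0.7) -> float:
    """Blend observed scoring with xG; shrink=1.0 trusts xG entirely."""
    return shrink * xg_per90 + (1 - shrink) * goals_per90

# A team scoring 2.4 goals/90 on only 1.5 xG/90 is a regression
# candidate: its projected rate moves back toward xG (0.7*1.5 + 0.3*2.4)
rate = adjusted_goal_rate(2.4, 1.5)
print(round(rate, 2))
```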
The core prediction engine is a multi-layer feedforward neural network with residual connections, trained via stochastic gradient descent with adaptive learning rate scheduling. The network takes the engineered feature vector for each fixture as input and outputs a probability distribution across match outcomes.
The architecture uses separate output heads per market type — so the Over/Under 2.5 head, the BTTS head, and the 1X2 head are each trained with market-specific loss functions and output calibration. This produces better per-market accuracy than a single shared output layer.
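The shared-trunk, multi-head idea can be illustrated with a toy forward pass in numpy. Layer sizes and weights below are arbitrary, and the residual connections and training loop of the production model are not shown:

```python
# Toy numpy sketch of the multi-head architecture: one shared trunk feeds
# separate output heads per betting market. Weights are random; this
# demonstrates the output structure only, not a trained model.
import numpy as np

rng = np.random.default_rng(1)

def relu(x): return np.maximum(0.0, x)
def softmax(z): e = np.exp(z - z.max()); return e / e.sum()
def sigmoid(z): return 1 / (1 + np.exp(-z))

n_features, hidden = 16, 8
W_trunk = rng.normal(size=(n_features, hidden))
W_1x2  = rng.normal(size=(hidden, 3))   # head: home / draw / away
W_ou25 = rng.normal(size=(hidden, 1))   # head: Over 2.5 goals
W_btts = rng.normal(size=(hidden, 1))   # head: both teams to score

def predict(features):
    h = relu(features @ W_trunk)          # shared representation
    return {
        "1x2":    softmax(h @ W_1x2),     # sums to 1 across outcomes
        "over25": sigmoid(h @ W_ou25)[0], # independent binary market
        "btts":   sigmoid(h @ W_btts)[0],
    }

out = predict(rng.normal(size=n_features))
print(out["1x2"].sum())  # the 1X2 head is a proper distribution
```

Because each head has its own weights and loss, the Over/Under head can be calibrated for goals markets without disturbing the 1X2 distribution, which is the per-market benefit described above.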
The network is retrained on a rolling basis — incorporating new match data as seasons progress — to capture evolving tactical trends, league-specific goal rate shifts, and team quality changes across transfer windows. A full retrain occurs at the start of each major season (August), with incremental updates every 30 days.
The table below lists the primary feature categories used by the model, their data type, the rolling window applied, and their relative contribution to prediction accuracy as measured by permutation feature importance.
| Variable Group | Examples | Window | Importance |
| --- | --- | --- | --- |
| xG (Attack) | xG/90, xG last 5, xG trend | 5 / 10 / season | Very High |
| xG (Defence) | xGA/90, xGA last 5, clean sheet rate | 5 / 10 / season | Very High |
| Team Form | Points/game, win%, goals scored/conceded | 5 / 10 | High |
| Home/Away Split | Home xG, home win%, away clean sheet rate | Season + H2H | High |
| Head-to-Head | H2H result, H2H goals avg, H2H BTTS rate | Last 5 meetings | Medium |
| Injury Impact | Weighted absence score, position coverage | Current | Medium |
| Tactical Setup | Formation, pressing intensity, defensive line | Last 3 matches | Medium |
| League Position | Table position, points gap, relegation/title pressure | Current | Medium |
| Fixture Congestion | Days rest, matches in 14 days, rotation index | Current | Low–Medium |
| Market Odds | Opening line, line movement, bookmaker consensus | Pre-match | Medium |
| Weather / Pitch | Temperature, precipitation, pitch condition | Pre-match | Low |
A model that outputs a 90% confidence rating should be correct approximately 90% of the time, measured over a large sample. This property — called calibration — is distinct from raw accuracy, and it is what makes our confidence ratings genuinely informative rather than decorative.
After initial training, raw model probabilities are post-processed using Platt scaling — a logistic regression layer fitted on a held-out calibration dataset — to align stated probabilities with observed frequencies. We re-calibrate the model with each retraining cycle and validate calibration against the most recent 90-day results window.
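Platt scaling amounts to fitting a one-feature logistic regression that maps raw scores to calibrated probabilities on held-out data. The sketch below uses synthetic, deliberately overconfident scores to show the mechanism:

```python
# Sketch of Platt scaling: a logistic regression layer fitted on held-out
# (raw score, outcome) pairs. Data here is synthetic and overconfident
# by construction, so calibration pulls the scores downward.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
raw = rng.uniform(0.5, 0.99, 1000)    # raw, miscalibrated model scores
# True hit probability is lower than the raw score (overconfidence)
y = rng.random(1000) < 0.8 * raw

platt = LogisticRegression().fit(raw.reshape(-1, 1), y)

# At inference time, every raw score passes through the fitted mapping
calibrated = platt.predict_proba([[0.90]])[0, 1]
print(f"raw 0.90 -> calibrated {calibrated:.2f}")
```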
| Stated Confidence Band | Observed Win Rate | Sample (30d) | Calibration Error |
| --- | --- | --- | --- |
| 85–100% | 91.2% | 68 predictions | ±2.1% |
| 75–84% | 83.5% | 97 predictions | ±2.8% |
| 65–74% | 71.8% | 124 predictions | ±3.4% |
| 60–64% | 63.2% | 59 predictions | ±4.1% |
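A calibration table of this kind can be computed directly from a results log by bucketing predictions into confidence bands and comparing stated confidence with the observed win rate. The records below are invented for illustration:

```python
# Sketch of per-band calibration measurement: group (stated probability,
# outcome) pairs into confidence bands and compare averages. The bands
# mirror the table above; the sample records are illustrative.
def calibration_table(records, bands=((0.60, 0.65), (0.65, 0.75), (0.75, 0.85), (0.85, 1.01))):
    rows = []
    for lo, hi in bands:
        in_band = [(p, won) for p, won in records if lo <= p < hi]
        if not in_band:
            continue  # no predictions fell in this band
        stated = sum(p for p, _ in in_band) / len(in_band)
        observed = sum(won for _, won in in_band) / len(in_band)
        rows.append((f"{lo:.0%}-{hi - 0.01:.0%}", observed, len(in_band), observed - stated))
    return rows

records = [(0.88, True), (0.91, True), (0.87, False), (0.70, True), (0.68, False), (0.62, True)]
for band, win_rate, n, err in calibration_table(records):
    print(f"{band}: {win_rate:.1%} over {n} picks (error {err:+.1%})")
```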
We only publish predictions where model confidence clears a minimum 60% threshold. Tips below this level are not surfaced — not because they have no informational value, but because small-sample variance at lower confidence makes them unreliable at a per-user level.
Not all betting markets are equally predictable. Our model performs differently across market types — reflecting both the underlying predictability of each market and the quality of our feature engineering for that specific problem. Below are our 30-day win rates per market.
Correct Score is our most challenging market — the correct score is an exact discrete outcome with high inherent variance. We publish correct score tips selectively and only at high confidence thresholds. All figures are sourced from the public results log.
Every version of our model is validated against historical data before deployment using a walk-forward backtesting methodology. Unlike standard train/test splits, walk-forward testing simulates real-world deployment — the model is trained on data up to a given date, tested on the immediately following period, then the window advances forward.
This prevents look-ahead bias — a common failure mode in sports prediction models where historical test performance overstates what would actually have been achievable in live deployment. Our backtesting covers seasons 2018–2024 across all major leagues, representing approximately 300,000 test fixtures.
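The walk-forward loop itself is simple: train strictly on the past, test on the period immediately after, then advance. The sketch below uses a toy "model" (the mean of past values) purely to show the window mechanics:

```python
# Sketch of walk-forward backtesting: each test window sees a model
# trained only on data that precedes it, preventing look-ahead bias.
def walk_forward(series, train_min, window, fit, score):
    """series: chronologically sorted; fit(train) -> model; score(model, test) -> float."""
    results = []
    start = train_min
    while start + window <= len(series):
        train = series[:start]               # strictly past data only
        test = series[start:start + window]  # the period right after
        results.append(score(fit(train), test))
        start += window                      # advance the window forward
    return results

# Toy usage: "model" is the mean of past values, scored by mean abs error
data = list(range(100))
res = walk_forward(
    data, train_min=50, window=10,
    fit=lambda tr: sum(tr) / len(tr),
    score=lambda m, te: sum(abs(x - m) for x in te) / len(te),
)
print(len(res))  # 5 non-overlapping test windows after the warm-up period
```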
A new model version is only deployed to production if it meets all of the following criteria on held-out validation data: overall win rate ≥ 74%, calibration error < 5% across all confidence bands, and no statistically significant degradation versus the current live model on a matched sample of fixture types.
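The deployment gate can be expressed as a conjunction of the three criteria. The function below is a hypothetical sketch; the significance test is reduced to a pre-computed p-value for brevity:

```python
# Sketch of the deployment gate: a candidate ships only if every
# criterion from the text passes. The p-value stands in for the full
# significance test against the live model, which is not shown.
def meets_deployment_criteria(win_rate, band_calibration_errors, degradation_p_value):
    checks = [
        win_rate >= 0.74,                                     # overall win rate
        all(abs(e) < 0.05 for e in band_calibration_errors),  # every band under 5%
        degradation_p_value >= 0.05,                          # no significant regression
    ]
    return all(checks)

print(meets_deployment_criteria(0.76, [0.021, 0.028, 0.034, 0.041], 0.31))  # True
print(meets_deployment_criteria(0.76, [0.021, 0.062, 0.034, 0.041], 0.31))  # False: one band fails
```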
Live vs. Backtest performance: Our current live 30-day win rate of 78% is in line with our backtested expectation of 76–80% for the current model version. Small discrepancies between live and backtest figures are normal due to sample size variation at the 30-day window level.
Transparency about limitations is a core part of our methodology. The following are genuine sources of variance and prediction failure that no model can eliminate:
In-game events: Red cards, goalkeeper injuries, and tactical changes at half-time are unpredictable in advance and can fundamentally alter a match's trajectory. Our model is a pre-match tool — it has no live updating capability during a fixture.
Squad rotation: Managers occasionally field unexpected line-ups — particularly in cup competitions or when rotation is concealed pre-match. Our model re-runs with confirmed line-ups approximately 1 hour before kick-off, but early published tips reflect pre-announcement uncertainty.
Low-data leagues: For lower-tier competitions with limited historical data, our model's confidence intervals are wider and feature quality is lower. We apply stricter minimum confidence thresholds for these leagues before publishing.
Genuine randomness: Football contains an irreducible random component. Even a perfectly calibrated 90% prediction fails 10% of the time — by definition. Over 348 predictions in a 30-day window, variance alone will cause some deviations from expected win rates in both directions. The 78% observed figure is the product of both model quality and normal statistical variance.
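The size of that variance is easy to quantify: for n independent predictions with true hit rate p, the observed win rate has standard deviation sqrt(p(1-p)/n). Plugging in the figures above:

```python
# Quick check of the variance claim: the spread of an observed win rate
# over n predictions at true hit rate p follows the binomial formula.
import math

n, p = 348, 0.78
sd = math.sqrt(p * (1 - p) / n)
lo, hi = p - 2 * sd, p + 2 * sd
print(f"+/-2 SD band over {n} picks: {lo:.1%} to {hi:.1%}")
# Roughly 73.6% to 82.4%: a 78% observed rate sits well inside
# normal statistical variance for a window of this size.
```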
We publish all outcomes — wins and losses — in our results tracker, and we never adjust historical performance figures retroactively. This log is the only credible measure of our model's live performance.