
720,000 Rows of Obsession: Cracking the FPL Price Algorithm (Part 2 of 7)

olbaud
February 18, 2026
FPL Analysis · Price Algorithm

Part 2 of 7: Cracking the FPL Price Algorithm

Read Part 1: “The Rabbit Hole”


I had 4 seasons of data. 720,254 rows sitting in a parquet file. One row per player per day, with price, ownership, transfers, form, status, everything the FPL API gives you.

Now I had to figure out what to do with it.


Cleaning the dataset

Part 1 covered where the data came from. Wayback Machine, Supabase, 4 seasons stitched into one file. What it didn’t cover is why that stitching was the hardest part of the whole project.

Player IDs aren’t consistent across seasons. FPL assigns new IDs every year, so “Mohamed Salah” is player 253 in one season and player 308 in the next. You have to match by name, team, and position, and names aren’t consistent either (“M.Salah” vs “Salah” vs “Mohamed Salah”). Some players get transferred between clubs mid-season. Some get reclassified: a midfielder one year becomes a forward the next. Some get promoted, relegated, retired, or just disappear from the API without explanation. A fun fact: last week, 17 people transferred Son out of their teams even though he left Spurs in 2025. That data still needs to be included.

Price bases reset every season. A player who started 2023-24 at £7.0m and ended at £8.5m starts 2024-25 at £8.0m. A new base, a new price, no continuity with their previous trajectory. Transfer counts reset to zero. Form resets. Ownership resets. Every season is a clean slate as far as the API is concerned, which means every derived metric (cumulative transfers, days since last price change, rolling averages) had to be recalculated per season from scratch.
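In pandas terms that just means grouping by season before computing anything cumulative. A minimal sketch, assuming illustrative column names (season, player_key, date, net_transfers) rather than the pipeline's real ones:

```python
import pandas as pd

# Minimal sketch of the per-season reset. Column names are illustrative.
df = pd.read_parquet("player_days.parquet")
df = df.sort_values(["season", "player_key", "date"])

# Running totals must never cross a season boundary, so every cumulative
# feature is computed inside a (season, player) group. In the full feature
# set the cumulative counter additionally resets at each price change.
df["cumulative_transfers"] = (
    df.groupby(["season", "player_key"], sort=False)["net_transfers"].cumsum()
)
```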

As always, I want to be transparent: there are gaps in the data. The Wayback Machine captures snapshots when it feels like it, not on a schedule. 2022-23 is missing about 80 days. 2024-25 is missing 24 days. You can’t just interpolate. A missing day means missing transfers, which means any cumulative feature is wrong from that point forward. The models had to know which days were real observations and which were reconstructed.
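Rather than interpolating, the gap days get flagged so downstream features can treat them differently. Something like this sketch, again with illustrative column names rather than the actual pipeline code:

```python
import pandas as pd

# Sketch: reindex one player's rows onto a full daily calendar so gap days
# become explicit rows, then flag which rows came from a real snapshot.
def flag_observed(player_days: pd.DataFrame) -> pd.DataFrame:
    calendar = pd.date_range(player_days["date"].min(), player_days["date"].max(), freq="D")
    out = (
        player_days.set_index("date")
        .reindex(calendar)
        .rename_axis("date")
        .reset_index()
    )
    out["is_observed"] = out["net_transfers"].notna()  # False on Wayback gap days
    return out
```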

The final dataset has zero duplicate player-days, no nulls in any key column, and every value in a valid range. The coverage gaps are documented and the models account for them. Good enough to trust, imperfect enough to keep honest.
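The checks behind that claim are simple assertions. A sketch; the column names and bounds are mine, not the production validation script:

```python
import pandas as pd

df = pd.read_parquet("player_days.parquet")  # illustrative path

# No duplicate player-days
assert not df.duplicated(subset=["season", "player_key", "date"]).any()

# No nulls in key columns
assert df[["price", "ownership_percent", "net_transfers"]].notna().all().all()

# Every value in a plausible range (bounds are illustrative)
assert df["price"].between(3.8, 15.5).all()
assert df["ownership_percent"].between(0, 100).all()
```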

The first charts that made me sit up

The obvious first step: take every player-day in the dataset and ask a simple question. How many net transfers did they get, and did they actually rise within the next few days?

The noise cloud: net transfer distributions for rises vs non-rises, massively overlapping

The grey is every player-day where the player didn’t rise within 3 days. The green is every day where they did. Even with this generous window, the overlap is massive. At 150,000+ net transfers, nearly a quarter of player-days still don’t result in a rise. And 218 player-days preceded a rise despite having negative net transfers that day. The player had built up enough pressure over previous days that today’s transfers didn’t matter. Net transfers alone doesn’t cut it.

Here’s a real example from this season. Declan Rice, February 2026:

| Date   | Price |      In |    Out |      Net | Cumulative |
|--------|------:|--------:|-------:|---------:|-----------:|
| Feb 6  |  £7.5 | 166,643 | 38,890 | +127,753 |    178,919 |
| Feb 7  |  £7.5 |  46,299 | 21,189 |  +25,110 |    202,774 |
| Feb 8  |  £7.5 |  64,141 | 28,393 |  +35,748 |    237,021 |
| Feb 9  |  £7.5 | 116,925 | 32,193 |  +84,732 |    318,533 |
| Feb 10 |  £7.5 | 192,613 | 53,406 | +139,207 |    465,125 |
| Feb 11 |  £7.6 |   5,575 | 21,948 |  -16,373 |          0 |
| Feb 12 |  £7.6 |   4,332 | 13,390 |   -9,058 |     -7,880 |
| Feb 13 |  £7.6 |   4,769 | 20,000 |  -15,231 |    -21,131 |

Five days of heavy buying pushed Rice’s cumulative pressure to 465,125. By the time the algorithm triggered the rise on Feb 11, the market had already reversed. 21,948 managers were selling him that day. Only 5,575 buying. The price went up anyway. The algorithm doesn’t care about today. It cares about what already happened.

(Why “within 3 days” and not “that exact day”? Because the algorithm tracks cumulative transfer pressure, not just today’s snapshot. A player can cross the threshold tomorrow or the day after based on transfers that happened today. Predicting “will rise within 3 days” is the right question, and it’s what the models target throughout this series. Why not 5 or 7 days? Because the algorithm’s memory fades. Transfers lose their influence over time, and beyond 3 days you’d be linking today’s data to a rise today’s data didn’t cause. More on that decay in Part 3.)
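For the curious, building that target is straightforward. A sketch, with illustrative column names, not the production code:

```python
import pandas as pd

df = pd.read_parquet("player_days.parquet")  # illustrative path
df = df.sort_values(["season", "player_key", "date"])

def will_rise_3d(price: pd.Series) -> pd.Series:
    # Highest price seen over the next three days, compared to today's price.
    future_max = pd.concat([price.shift(-k) for k in (1, 2, 3)], axis=1).max(axis=1)
    return (future_max > price).astype(int)

# Compute per player and per season so the window never crosses a boundary.
df["will_rise_3d"] = (
    df.groupby(["season", "player_key"], sort=False)["price"].transform(will_rise_3d)
)
```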

But then I split it by ownership.

Rise rate vs net transfers, split by ownership bucket: dramatically different curves

Same transfers. Completely different outcomes. A 2-5% owned player getting 50,000 net transfers has a real chance of rising. A 20%+ owned player getting the same 50,000? Barely a blip. The threshold for highly-owned players is dramatically higher.

This was the first real pattern. Not just “more transfers = more likely to rise” but “the same number of transfers means completely different things depending on who owns the player.” The algorithm scales.

The class imbalance problem

Before going further, I need to explain why this is genuinely hard.

99.7% of player-days are non-rises, 0.28% are rises

Of the 720,254 player-days in the dataset, exactly 2,035 are rises. That’s 0.28%. Falls are more common at 7,934 (1.1%), but still rare. The other 98.6% is nothing happening.

If you build a model that says “nobody will rise today, ever”, it’s 99.7% accurate, beating LiveFPL, FFhub and FFFix. And completely useless.

This is the fundamental problem with price prediction. Accuracy is a worthless metric when the thing you’re trying to predict almost never happens. Say a model predicts 10 rises and gets 8 right. Good, right? But if there were actually 20 rises that day and it missed 12 of them, is that a good model?

You need a metric that cares about both problems: false alarms (saying someone will rise when they won’t) and missed rises (a player rises and you didn’t see it coming). That metric is F1, the harmonic mean of precision and recall. It’s the only number that matters for the rest of this series.

Technical Sidebar: F1, Precision, and Recall

– Precision: “of the rises I predicted, how many actually rose?” Low precision = too many false alarms.
– Recall: “of all actual rises, how many did I catch?” Low recall = too many misses.
– F1: harmonic mean of precision and recall. Punishes you for being good at one but bad at the other.
– F1 = 0.50 is mediocre. F1 = 0.65 is good. F1 = 1.0 is perfect (and impossible here).
– Accuracy is useless at this imbalance. A “predict nothing” model gets 99.7%.
– The key technical challenge: getting the model to care about the 0.28% that matters without drowning in false positives.
– Techniques tested: SMOTE oversampling (didn’t help), scale_pos_weight reweighting (worked).
– Note: the raw class ratio (0.28%) implies ~357:1 imbalance, but the production target (will_rise_3d at 5%+ ownership) has an effective ratio of ~11:1. A later ablation (Part 4) showed the optimal scale_pos_weight is actually very low; the model performs best when not heavily overweighting the positive class.
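A toy illustration of the accuracy trap, with synthetic labels that only mirror the ~0.28% rise rate:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Simulate labels at roughly the real imbalance (~0.28% positives) and score
# the "nobody will rise today, ever" model. Numbers are synthetic.
rng = np.random.default_rng(42)
y_true = (rng.random(100_000) < 0.0028).astype(int)
y_never = np.zeros_like(y_true)

print(accuracy_score(y_true, y_never))             # ~0.997: looks brilliant
print(f1_score(y_true, y_never, zero_division=0))  # 0.0: catches nothing
```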

The hypothesis testing phase

Once I had the data visualised, I started testing specific ideas about how the algorithm might work. Not machine learning yet, just old-fashioned hypothesis testing. Does X cause Y? Does this variable matter?

Four hypotheses tested: three supported, one later resolved with better data

H1: Ownership scales the threshold. Confirmed. For every 1% increase in ownership, a player needs roughly 1,508 more net transfers to trigger a rise. A player at 30% ownership needs about 30,000 more transfers than a player at 10%. That’s 65% of the median rise threshold. Ownership isn’t just a factor, it’s the dominant factor.

H2: Price scales the threshold. Confirmed, but weaker. About 9,467 extra transfers per £1.0 of price. A £10.0 player needs ~52,000 more transfers than a £4.5 player. Statistically significant, but later work showed price is actually negligible once you control for ownership. Expensive players tend to be highly owned, and it’s the ownership doing the work.

H3: Falls are easier than rises. Strongly supported. Falls require only 17% of the normalised transfers that rises need. About 6x easier to trigger a fall than a rise. The algorithm punishes selling harder than it rewards buying. This explains why prices tend to deflate across a season. A few bad weeks and a player’s price crumbles, but getting it back takes sustained heavy demand.

H4: Cumulative transfers matter more than daily. Couldn’t test directly at first. I didn’t have the algorithm’s internal cumulative state. But later, when I got ground-truth data from Supabase, this turned out to be the most important insight of the entire project. More on that in Part 3.

These were the first four. Over the course of the project, I’d end up testing around 50 specific ideas about how the algorithm works. Market floors, decay rates, cooldown periods, wildcard effects, flag change locks, ownership protection rules. Most were confirmed. Some weren’t. A few turned out to be asking the wrong question entirely. This is the point where I’m expecting a few of you reading this to suggest rules to test. I’m hoping you can help me crack this.

Technical Sidebar: Hypothesis Validation

– Method: isolate one variable, hold others constant, measure effect on rise/fall probability
– Ownership scaling: linear regression on rise threshold vs ownership%. R2=0.036, slope=1,508 (p = 3.25e-09)
– Price scaling: regression on threshold vs price. slope=9,467 (p = 2.72e-07)
– Falls vs rises: Mann-Whitney U test. Rise median = 3,939 normalised transfers, fall median = 670. Ratio: 0.17 (p = 1.17e-127).
– Bonferroni correction applied (alpha = 0.0125). All significant even under the stricter threshold
– Research: p7_hypothesis_testing.py, p7_report.py
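In code, the two headline tests boil down to something like this sketch. The array names are illustrative; the real analysis lives in p7_hypothesis_testing.py:

```python
import numpy as np
from scipy import stats

def ownership_scaling(ownership_pct: np.ndarray, rise_threshold: np.ndarray):
    """H1: regress the net-transfer threshold at each rise on ownership%.
    Reported result: slope ~1,508 extra transfers per 1% owned, p ~3.25e-09."""
    return stats.linregress(ownership_pct, rise_threshold)

def falls_vs_rises(norm_transfers_rises: np.ndarray, norm_transfers_falls: np.ndarray):
    """H3: Mann-Whitney U on normalised transfer volumes before rises vs falls.
    Reported medians: 3,939 (rises) vs 670 (falls), p ~1.17e-127."""
    return stats.mannwhitneyu(norm_transfers_rises, norm_transfers_falls,
                              alternative="greater")
```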

The features that matter (first pass)

Armed with the hypothesis results, I built the first feature set. 22 features for the baseline model:

The obvious ones: net_transfers_daily (how many people moved today), cumulative_transfers (running total since last price change), ownership_percent, price.

The temporal ones: gameweek, days_to_deadline (the deadline drives transfer panic), is_deadline_day, day_of_week (less activity on weekends. People have lives, apparently).

The lagged ones: net_transfers_lag1 and lag2 (what happened yesterday and the day before), rolling_3d and rolling_7d averages, transfer_velocity (acceleration of transfers), positive_streak (consecutive days of net positive transfers).

The contextual ones: form, form_change, status (is the player injured/doubtful), prev_status, status_changed, became_available (just returned from injury).

And a few derived ones: excess_daily (transfers above the player’s average), excess_cumul (cumulative excess), v3_score (a hand-tuned formula from earlier experiments).
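Most of the lagged and derived features come straight out of grouped pandas operations. A sketch with illustrative column names and one plausible definition of each (the canonical list is BASELINE_FEATURES in config.py):

```python
import pandas as pd

df = pd.read_parquet("player_days.parquet")  # illustrative path
df = df.sort_values(["season", "player_key", "date"])
grp = df.groupby(["season", "player_key"], sort=False)["net_transfers_daily"]

df["net_transfers_lag1"] = grp.shift(1)
df["net_transfers_lag2"] = grp.shift(2)
df["rolling_3d"] = grp.transform(lambda s: s.rolling(3, min_periods=1).mean())
df["rolling_7d"] = grp.transform(lambda s: s.rolling(7, min_periods=1).mean())
df["transfer_velocity"] = grp.diff()  # day-over-day change in net transfers
df["positive_streak"] = grp.transform(
    # consecutive days of net-positive transfers, reset on any non-positive day
    lambda s: (s > 0).astype(int).groupby((s <= 0).cumsum()).cumsum()
)
```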

The first model

XGBoost. A gradient-boosted decision tree ensemble, the workhorse of tabular prediction. 200 trees, max depth 8, learning rate 0.1. Set scale_pos_weight=30 to handle the class imbalance.

Trained on older data, tested on recent data. Always chronological splits, never random, because random splits leak future information and give you results that look great and mean nothing.
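Roughly, the setup looks like this. A sketch using the parameters just described; the truncated feature list and the split point are illustrative:

```python
import pandas as pd
from xgboost import XGBClassifier
from sklearn.metrics import f1_score

df = pd.read_parquet("player_days.parquet")  # illustrative path

# Chronological split: everything up to a cutoff date trains, the rest tests.
df = df.sort_values("date")
cutoff = df["date"].quantile(0.8)  # illustrative 80/20 split point
train, test = df[df["date"] <= cutoff], df[df["date"] > cutoff]

features = ["cumulative_transfers", "ownership_percent",
            "net_transfers_daily", "price"]  # plus the other 18 baseline features

model = XGBClassifier(n_estimators=200, max_depth=8, learning_rate=0.1,
                      scale_pos_weight=30, eval_metric="logloss")
model.fit(train[features], train["will_rise_3d"])

print(f1_score(test["will_rise_3d"], model.predict(test[features])))
```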

F1 = 0.55.

Wrong almost as often as it was right. Precision around 0.50, recall around 0.60. It caught some rises, missed a lot, and flagged a fair number of false alarms. If this were a school report, it would say “shows potential, must try harder.”

But the feature importances were interesting.

Feature importance chart showing cumulative_transfers and ownership_percent dominating

cumulative_transfers, the running total of net transfers since the last price change, was the most important feature by a mile. Twice as important as the second-place feature. ownership_percent was second. Everything else was background noise by comparison.

The model was telling me the same thing the hypothesis tests said: how much total transfer pressure has built up, and how many people own the player. Those two things matter more than everything else combined.

The ownership bucket system

Before moving on to fancier models, I tried the other approach: hand-tuned rules.

Split players into ownership buckets: 0-1%, 1-2%, 2-5%, 5-10%, 10-15%, 15-20%, 20-30%, 30%+. For each bucket, calculate the transfer threshold that best separates rises from non-rises. Add day-of-week adjustments (Monday and Tuesday have lower thresholds than Friday and Saturday. People make different decisions at different points in the gameweek).
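Structurally it’s nothing more than a lookup table and a couple of if-statements. The thresholds and multipliers below are placeholders to show the shape, not the tuned values:

```python
# Per-bucket net-transfer thresholds with a day-of-week adjustment.
# All numbers here are placeholders, not the tuned values from the project.
BUCKET_THRESHOLDS = {
    (0, 1): None,        # below 1% ownership: no rises observed at all
    (1, 2): 8_000,
    (2, 5): 15_000,
    (5, 10): 30_000,
    (10, 15): 45_000,
    (15, 20): 60_000,
    (20, 30): 80_000,
    (30, 100): 110_000,
}
DOW_MULTIPLIER = {0: 0.85, 1: 0.85, 4: 1.10, 5: 1.10}  # Mon/Tue lower, Fri/Sat higher

def predict_rise(ownership_pct: float, cumulative_transfers: float, weekday: int) -> bool:
    for (lo, hi), threshold in BUCKET_THRESHOLDS.items():
        if lo <= ownership_pct < hi:
            if threshold is None:
                return False
            return cumulative_transfers >= threshold * DOW_MULTIPLIER.get(weekday, 1.0)
    return False
```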

Rise rate by ownership bucket: 0% at 0-1%, climbing steeply above 5%

The ownership bucket chart confirmed something important: below 1% ownership, the rise rate is zero in our data, across all 4 seasons and 532,000 player-days. Even on the 95 days when those players got over 20,000 net transfers: still zero observed rises.

Is it a hard rule? Almost certainly. A Clopper-Pearson exact binomial test gives a 95% confidence interval of [0%, 3.8%], meaning we can’t mathematically prove the rate is exactly zero, but it’s at most ~4% even in the worst case. With 95 high-transfer opportunities and zero successes, the pattern is overwhelming. The algorithm either blocks rises below 1% ownership entirely, or makes them so rare they’re effectively impossible.
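The calculation itself is two lines; scipy’s beta distribution gives the exact Clopper-Pearson bound:

```python
from scipy.stats import beta

# Clopper-Pearson exact 95% CI for 0 rises in 95 high-transfer player-days.
successes, trials, alpha = 0, 95, 0.05
upper = beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
print(f"95% CI: [0.0%, {upper:.1%}]")  # -> [0.0%, 3.8%]
```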

The humbling result

So I had two approaches: an ML model with 22 features and an XGBoost ensemble, and a hand-tuned rule system with per-bucket thresholds and day-of-week adjustments.

ML model F1 = 0.55 vs hand-tuned rules F1 = 0.59: rules won

ML model: F1 = 0.55. Hand-tuned rules: F1 = 0.59. The rules won.

Weeks of ML engineering. Gradient boosting, hyperparameter tuning, cross-validation. Beaten by a system I could have built in a spreadsheet. A few if-statements, per-bucket thresholds, and day-of-week adjustments outperformed 200 decision trees. The machine learned less than I could have worked out with a calculator and some patience.

This was humbling. But it was also the most useful result of the entire baseline phase. It told me two things. First, the bottleneck wasn’t the algorithm, it was the features. XGBoost is exceptionally good at finding patterns in tabular data. If it couldn’t beat hand-tuned rules with 22 features, the problem wasn’t that it needed more trees or a lower learning rate. The problem was that those 22 features didn’t contain enough signal. Second, the rules were winning because they encoded domain knowledge directly. Things like “ownership buckets matter” and “Monday is different from Friday” that the raw features couldn’t express.

The algorithm was hiding something. There were rules in the data that 22 features couldn’t capture. I needed to stop adding features and start thinking about what the algorithm was actually doing.

Technical Sidebar: Baseline Model (v1)

– Algorithm: XGBoost (n_estimators=200, max_depth=8, lr=0.1, scale_pos_weight=30)
– Also tested LightGBM with class_weight='balanced'. Similar F1
– 22 features (BASELINE_FEATURES in config.py)
– Temporal split: train on older data, test on recent data (no random split)
– Time-series cross-validation: p6_temporal_validation.py
– F1 = 0.5496
– Key insight from feature importance: cumulative_transfers >> net_transfers_daily. The algo tracks running totals, not daily snapshots
– Rule-based v3 (per-bucket thresholds + day-of-week conditional): F1 = 0.59. Beat the ML model
– The ML model lost to hand-tuned rules, confirming the features were the bottleneck, not the model


What’s next

F1 = 0.55 was the starting line, not the finish. I knew the features were the bottleneck. I knew ownership mattered. I knew transfers accumulated.

But I didn’t know the rules yet. The algorithm wasn’t just counting transfers. It was applying rules that 22 features couldn’t express. A market-wide floor. An exponential decay. A cooldown period after price changes. Protection rules for low-ownership players.

Each of those rules was sitting in the data. I just hadn’t looked in the right places.

Next: Part 3, “One Threshold to Rule Them All”


This is Part 2 of a 7-part series about reverse-engineering the FPL price change algorithm. Read Part 1: “The Rabbit Hole”. The research behind this series powers fplcore.com.

Tags: Data Science, FPL, Machine Learning, Price Changes, Series, XGBoost