Thursday, February 11, 2021

2020 NFL Game Predictions: Season Recap

Welcome!  I intend this to be an ongoing project of predicting NFL game outcomes, point spreads, and final scores.  My hope is that my models will improve over time as they become more sophisticated and use better data.  I will try to regularly publish predictions to keep myself accountable, and will make observations along the way about what is working and what isn't.  See below for latest updates.  Enjoy!





________________________________

Season Recap:

Here we are at the end of the 2020 season.  I have run predictions from week 2 all the way through the Super Bowl.  I have used two models of my own (LR and RF) to make these predictions, compared them with the models at 538, and made my own personal predictions about particular games.  So how did I do?

Model Statistics

Let's look at the stats:
  • Whole Season accuracy % (95% Confidence Interval)
    1. 538 QB: 68% (0.62, 0.74)
    2. Personal: 66% (0.60, 0.72)
    3. 538 Traditional: 65% (0.59, 0.71)
    4. LR Model: 63% (0.57, 0.69)
    5. RF Model: 63% (0.56, 0.69)
  • Regular Season accuracy % (95% Confidence Interval)
    1. 538 QB: 68% (0.62, 0.74)
    2. Personal: 67% (0.60, 0.72)
    3. 538 Traditional: 66% (0.59, 0.71)
    4. RF Model: 63% (0.56, 0.70)
    5. LR Model: 63% (0.57, 0.69)
  • Playoffs accuracy % (95% Confidence Interval)
    1. 538 QB: 62% (0.31, 0.86) - not significant
    2. 538 Traditional: 62% (0.31, 0.86) - not significant
    3. LR Model: 62% (0.31, 0.86) - not significant
    4. Personal: 62% (0.31, 0.86) - not significant
    5. RF Model: 54% (0.25, 0.81) - not significant
538's QB-adjusted model performed the best at 68% accuracy over the course of the whole season.  My personal predictions were second best at 66% accuracy.  The 538 traditional model took third at 65% accuracy.  My LR model barely edged out the RF model, with both rounding to 63% accuracy.  So while my homegrown models did NOT beat the professional models, they still did pretty well.  Sure, it would be nice to beat the professional models, but that is unlikely without a major investment of time and energy beyond what I am already putting in.

The whole-season and regular-season results are statistically significant (two-tailed binomial test), while the playoff results in isolation are not.  That is not surprising with so few playoff games.  This exercise also suggests, once again, that roughly 70% accuracy is an upper bound beyond which no model can reliably predict, given the randomness inherent in the game.  Anything above that would be unsustainable luck.
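For anyone who wants to reproduce these significance checks, here is a minimal sketch in Python.  It assumes the test is against a 50% coin-flip baseline with exact (Clopper-Pearson) intervals; the correct/total counts below are illustrative placeholders, not my actual tallies.

```python
# Two-tailed binomial test of prediction accuracy against a coin flip (p = 0.5),
# plus a 95% confidence interval for the accuracy. Counts are illustrative.
from scipy.stats import binomtest

def accuracy_significance(correct: int, total: int, baseline: float = 0.5):
    """Test whether correct/total differs from `baseline` (two-tailed)."""
    result = binomtest(correct, total, p=baseline, alternative="two-sided")
    ci = result.proportion_ci(confidence_level=0.95)  # Clopper-Pearson by default
    return result.statistic, result.pvalue, (ci.low, ci.high)

# Roughly 240 regular-season games (weeks 2-17) and 13 playoff games were
# predicted; the "correct" counts here are placeholders near 68% and 62%.
for label, correct, total in [("Regular season", 163, 240), ("Playoffs", 8, 13)]:
    acc, p, (lo, hi) = accuracy_significance(correct, total)
    print(f"{label}: {acc:.0%}, p = {p:.4f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

With only 13 playoff games, even 62% accuracy is indistinguishable from a coin flip, which is why those intervals are so wide.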

Trends

Week to week, the predictions tended to stay between 40% and 80% accuracy, with an average just above 60%.  That changed in the playoffs, with divergence in both directions, but throughout the regular season the models were fairly consistent.  A couple of exceptions:
  • Week 2: 538 QB got 15 of 16 games correct, the highest weekly accuracy of any model during the regular season.
  • Week 9: the LR model got 12 of 14 games correct.
  • Week 11: all of the models (except the 538 traditional) performed worse than 50%, with the LR model falling to 29% (4 of 14 games correct).

The models start the season with a wide spread in accuracy, but over time the cumulative accuracy stabilizes until the models sit in a tight cluster by the end of the season.
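For the curious, both the weekly accuracy and the stabilizing cumulative accuracy are simple group operations over a per-game log.  A sketch, with a hypothetical column layout and made-up rows:

```python
# Weekly and cumulative (running) accuracy per model from a per-game log.
# The column layout (week / model / correct) and the rows are hypothetical.
import pandas as pd

games = pd.DataFrame({
    "week":    [2, 2, 3, 3, 4, 4],
    "model":   ["LR", "RF", "LR", "RF", "LR", "RF"],
    "correct": [1, 1, 0, 1, 1, 0],  # 1 if the model picked the actual winner
}).sort_values(["model", "week"], kind="stable")

# Accuracy within each week (the 40%-80% band discussed above).
weekly = games.groupby(["model", "week"])["correct"].mean()

# Running accuracy through the season (the curve that tightens up late).
games["cumulative"] = (games.groupby("model")["correct"]
                            .transform(lambda s: s.expanding().mean()))

print(weekly, games, sep="\n\n")
```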


Probability vs. Actuality of Win

How does probability of a win relate to actuality of a win?  Presumably, the greater the probability of a win, the more likely the actuality of a win.  Did this happen in the models?

In the LR model, this turned out to be true.  The average predicted probability for predicted wins that turned into actual wins was 66% compared with 61% for actual losses.  Similarly, the probability for predicted losses that turned into actual losses was 34% compared with 39% for actual wins.  That is, probabilities that were higher did tend towards actual wins while probabilities that were lower tended toward actual losses.

The LR model did, however, tend to be overconfident, predicting an average of 36% for predicted losses and 64% for predicted wins.  In actuality, the average predicted probability for actual losses was 44% and for actual wins 56%, much closer together than what the LR model was claiming.



If I break the probabilities into clustered groups, we get the view below.  Generally speaking, the clusters with the lowest predicted probabilities had the lowest actual probabilities.  But this was not always true.  For example, teams given a predicted probability between 15% and 20% of winning a game actually won 36% of the time, while the 20% to 25% cluster won only 31% of the time.
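This clustered view is essentially a calibration table.  A minimal sketch of how to build one, with synthetic stand-ins for the real predictions and outcomes:

```python
# Calibration table: bin games by predicted win probability, then compare each
# bin's average prediction with its empirical win rate. Data are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pred_prob = rng.uniform(0.05, 0.95, size=250)          # model's predicted P(win)
actual_win = (rng.uniform(size=250) < pred_prob) * 1   # 1 if the team really won

bins = np.arange(0.0, 1.01, 0.05)  # 5-point-wide clusters, e.g. 15%-20%
table = (pd.DataFrame({"pred": pred_prob, "won": actual_win})
           .assign(cluster=lambda d: pd.cut(d["pred"], bins))
           .groupby("cluster", observed=True)
           .agg(predicted=("pred", "mean"),
                actual=("won", "mean"),
                games=("won", "size")))
print(table.round(2))
```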


How did the RF model do?  For one, the RF model did not produce such extreme probabilities.  It averaged 38% for predicted losses that were actual losses compared with 41% for actual wins, and 62% for predicted wins that were actual wins compared with 59% for actual losses.  As a result, there is much less separation between the average predicted-outcome probabilities (39%, 61%) and the actual-outcome probabilities (46%, 54%).




With the clustered probabilities, the RF model does a better job matching the predicted probability of outcome with the actual probability of outcome this season.  The lowest predicted probabilities do tend to match the lowest actual probabilities.


How do my models compare with 538's QB-adjusted model, which produced the best results this season?  The 538 QB-adjusted model looks a lot more like the LR model than the RF model.  For predicted losses, games that ended in actual losses had an average probability of 34%, compared with 38% for games that ended in actual wins.  Similarly, predicted wins that resulted in actual wins had an average probability of 69%, compared with 64% for actual losses.



Likewise, the 538 QB model's predicted probability clusters generally mirrored the actual probabilities, though not as well as the RF model's did.  Again, this behavior was more similar to the LR model.


In short, all models tended to have average predicted probabilities that, more or less, mapped correctly to average actual probabilities this season.  Probability of win did tend to correlate with actuality of win.

Probability, Point Difference, and Team Score

What is the relationship between the probability of a win, the actual point difference, and the actual team score?  The relationship between the probability of a win and the actual point difference is not strong in my models, which is not too surprising as the models are designed to predict a win (binary: 1 or 0).  

The LR model gives an R^2 value of 0.12 with an equation of PD = 27.774*prob - 13.887, while the RF model gives an R^2 of 0.13 with an equation of PD = 39.038*prob - 19.519.  So a probability of 0% predicts a rounded point difference of -14 (LR) or -20 (RF), a probability of 100% predicts +14 (LR) or +20 (RF), and both models predict a point difference of 0 at 50%.  Not great, but it's something.
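For reference, recovering a line and R^2 like these is a one-liner once you have the probabilities and actual point differences.  A sketch with synthetic data shaped to roughly match the LR line above:

```python
# Ordinary least-squares fit of actual point difference against predicted win
# probability. The data here are synthetic, shaped to mimic the LR line.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
prob = rng.uniform(0.1, 0.9, size=250)                         # predicted P(win)
point_diff = 27.774 * prob - 13.887 + rng.normal(0, 10, 250)   # noisy PD

fit = linregress(prob, point_diff)
print(f"PD = {fit.slope:.3f}*prob + {fit.intercept:.3f}, R^2 = {fit.rvalue**2:.2f}")
```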




Team score is much more promising.  By forcing the intercept of the fits to 0, the LR model yields an R^2 of 0.82 with an equation of score = 45.467*prob, while the RF model yields an R^2 of 0.84 with an equation of score = 47.15*prob.  Thus, a probability of 0% predicts a score of 0 in both models, while a probability of 100% predicts a rounded score of 45 (LR) or 47 (RF).
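One caveat worth knowing: when the intercept is forced to 0, the best-fit slope has a simple closed form, and the R^2 most tools report in that case is the uncentered version (measured against zero rather than the mean), which may be part of why these values look so much stronger than the point-difference fits.  A sketch with synthetic stand-in data:

```python
# Through-origin least squares: with the intercept pinned at 0, the best-fit
# slope is sum(x*y) / sum(x^2). Data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(2)
prob = rng.uniform(0.1, 0.9, size=250)                # predicted P(win)
score = 45.467 * prob + rng.normal(0, 7, size=250)    # actual team score (fake)

slope = (prob * score).sum() / (prob ** 2).sum()
resid = score - slope * prob
r2 = 1 - (resid ** 2).sum() / (score ** 2).sum()      # uncentered R^2
print(f"score = {slope:.3f}*prob, R^2 = {r2:.2f}")
```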




So there is a fairly strong relationship between the probability of a win and the team score, although the probability of a win and the resulting point difference are not strongly related (and a different model would be required).

Team Model Stats

So much for the models themselves.  Now on to the teams.  What did my models think of  each team throughout the season in terms of their likelihood of winning each game?  Below are some interesting stats:

Regular season
  • The Chiefs had the highest expected value (LR: 10.73; RF: 8.30) for total games.  The Jaguars had the lowest (LR: 3.71; RF: 4.08)
  • The Chiefs had the highest average probability (LR: 72%; RF: 64%).  The Jaguars had the lowest (LR: 25%; RF: 31%)
  • The Chiefs and the Vikings had the highest max LR predicted probability (92%) for a single matchup.  The Jets had the lowest (50%), meaning they were never predicted to win a game with more than 50% likelihood.
  • The Colts had the highest max RF predicted probability (86%) for a single matchup.  The Broncos had the lowest (51%).
  • The Packers had the highest minimum LR predicted probability (49%), meaning that they were never predicted to have a probability of winning a game less than 49%.  The Jaguars and the Panthers had the lowest minimum predicted LR probability at 8%.
  • The Chiefs had the highest minimum RF predicted probability (54%).  They were never predicted to lose by the RF model.  The Ravens and the Packers were also never predicted to lose by RF.  The Jaguars had the lowest minimum predicted RF probability at 14%.
  • The Raiders had the highest Actual Wins - LR Predicted wins with 4 (predicted 3 wins; 7 actual).  The Cowboys had the lowest with -5 (11 predicted wins; 6 actual).  The Vikings also had -5 (12 predicted; 7 actual).
  • The Browns, Giants, and Lions had the highest Actual Wins - RF Predicted wins with 4.  The Ravens and the Vikings had the lowest with -3.
  • The Buccaneers had the highest Actual - LR Probabilities with 3.39.  They were expected to have 7.61 wins but ended up with 11.  My model clearly underestimated them.  The Bills had 3.37 (8.63 expected, but 12 actual).  The Jaguars had the lowest with -3.71 (3.71 expected, 0 actual).
  • The Chiefs had the highest Actual - RF Probabilities with 4.70 (8.30 expected, 13 actual). The  Bills had 4.69.  The Jaguars had the lowest with -4.08 (4.08 expected, 0 actual).
In short, the models judged the Chiefs to be the best team of the regular season and the Jaguars the worst.  The Raiders, Buccaneers, and Bills overperformed; the Vikings and Ravens underperformed.
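All of these per-team stats fall out of the per-game prediction log, where "expected value" is just the sum of predicted win probabilities.  A sketch with a hypothetical column layout and made-up rows:

```python
# Per-team season stats from a per-game prediction log. "Expected wins" is the
# sum of predicted win probabilities. Columns and rows here are hypothetical.
import pandas as pd

log = pd.DataFrame({
    "team": ["KC", "KC", "KC", "JAX", "JAX", "JAX"],
    "prob": [0.92, 0.71, 0.80, 0.25, 0.08, 0.30],  # predicted P(win) per game
    "won":  [1, 1, 0, 0, 0, 0],                    # actual outcome
})

stats = (log.groupby("team")
            .agg(expected_wins=("prob", "sum"),
                 avg_prob=("prob", "mean"),
                 max_prob=("prob", "max"),
                 min_prob=("prob", "min"),
                 actual_wins=("won", "sum")))
stats["actual_minus_expected"] = stats["actual_wins"] - stats["expected_wins"]
print(stats.round(2))
```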


Playoffs
  • The Buccaneers had the highest LR expected value (1.99), but they also played the most games.  The Bills were second (1.62) and the Chiefs third (1.61).  The Bears were the lowest (0.32), with only one game and the worst probability for that single game.
  • The Buccaneers had the highest RF expected value (1.92) with the Chiefs second (1.62).  The Steelers had the lowest (0.37) along with Washington (0.37).
  • The Chiefs had the most predicted LR wins (3), with the Buccaneers at 2.  For RF, the Chiefs, Bills, Packers, Saints, and Ravens all tied at 2 predicted wins, while the Buccaneers had 1.
  • The Buccaneers had the most actual wins with 4.  That's a lot of wins in the playoffs.
  • The Seahawks had the highest average LR probability at 63% (only 1 game, which they lost).  The Bears had the lowest (32%, also 1 game).
  • The Saints had the highest average RF probability at 61%.  The Steelers and Washington had the lowest at 37%.
  • The Saints had the highest max LR probability at 68% (against the Bears).  The Bears had the lowest (32%).
  • The Browns and Buccaneers had the highest max RF probability at 63%, against the two teams with the lowest: the Steelers and Washington (37%).  
  • The Seahawks had the highest minimum LR probability (63%), the Bears had the lowest (32%).
  • The Saints had the highest minimum RF probability (61%), while the Steelers and Washington had the lowest (37%).
  • The Buccaneers had the highest actual - LR predicted wins with 2 (2 predicted wins, 4 actual).  Several teams tied at -1: the Steelers, Titans, Chiefs, Seahawks, and Packers.
  • The Buccaneers had the highest actual - RF predicted wins with 3 (1 predicted, 4 actual).  Several teams tied at -1: the Packers, Saints, Ravens, and Seahawks.
  • The Buccaneers had the highest Actual - LR probabilities with 2.01 (1.99 expected, 4 actual).  The next highest was the Chiefs with 0.40.  The lowest was the Seahawks with -0.63.
  • The Buccaneers had the highest Actual - RF probabilities with 2.08 (1.92 expected, 4 actual).  Seahawks again had the lowest with -0.58.
In short, the Chiefs were the favorite in the playoffs.  The Buccaneers were the underdog, the overperformer, and the biggest surprise to my models.  The Seahawks were on the losing end of the biggest upset and were the playoff disappointment.  The Bears were the worst team in the playoffs.


The Best Team:

Which was the "best" team this season?  If each team were to play every other team once, which team would have the most wins?  By summing up the probabilities of each game matchup generated by the LR model, the 32 game season would have a final top 5 rank of:
  1. Chiefs
  2. Packers
  3. Ravens
  4. Saints
  5. Bills
Chiefs remain on top at 1.  Packers remain at 2.  Ravens jump up to 3.  Saints move from 5 to 4.  Bills slide from 3 to 5.  The Buccaneers (oddly) drop from 4 to 8, despite their crushing Super Bowl victory over the Chiefs.
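Both of the round-robin rankings used here (summed probabilities above, and 1 point per predicted win in the next ranking below) can be computed in one pass over all the pairings.  A sketch, where predict_prob is a hypothetical stub standing in for the trained model:

```python
# Hypothetical round robin: every team plays every other team once. Rank by
# expected wins (summed probabilities) or by 1 point per predicted win.
# predict_prob is a stub standing in for the trained LR/RF model.
from itertools import combinations

TEAMS = ["KC", "GB", "BAL", "NO", "BUF", "TB"]  # small subset for illustration

def predict_prob(team_a: str, team_b: str) -> float:
    """Stub for the model call: returns P(team_a beats team_b)."""
    ratings = {"KC": 1.9, "GB": 1.8, "BAL": 1.5, "NO": 1.4, "BUF": 1.3, "TB": 1.2}
    return ratings[team_a] / (ratings[team_a] + ratings[team_b])

expected = {t: 0.0 for t in TEAMS}  # summed win probabilities
wins = {t: 0 for t in TEAMS}        # 1 point per predicted win

for a, b in combinations(TEAMS, 2):
    p = predict_prob(a, b)
    expected[a] += p
    expected[b] += 1 - p
    wins[a if p >= 0.5 else b] += 1

for team in sorted(TEAMS, key=expected.get, reverse=True):
    print(f"{team}: expected wins {expected[team]:.2f}, predicted wins {wins[team]}")
```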

By summing the outcome of each game matchup (1 point per win), the final top 5 rank would be:
  1. Chiefs
  2. Packers
  3. Buccaneers
  4. Ravens
  5. Bills
Chiefs remain at 1.  Packers move up to 2 from 3.  Buccaneers move from 2 to 3.  Ravens take 4 and Bills slide to 5.  

What does the RF model say?  The top 5 by sum of probabilities of each game matchup are:
  1. Packers
  2. Chiefs
  3. Buccaneers
  4. Bills
  5. Saints
The RF model is largely in agreement with the LR model.  Packers are number 1 instead of the Chiefs, but barely.  Chiefs are 2 instead of 1.  Buccaneers are 3 instead of 8.  Bills take 4 (instead of 5) and Saints take 5 (instead of 4).  The Ravens are placed at 9.  

The top 5 by outcome of each game matchup are:
  1. Buccaneers
  2. Packers
  3. Chiefs
  4. Saints
  5. Bills
The Buccaneers take the number 1 spot (finally) instead of 3.  Packers take 2 as they did before.  Chiefs are 3 instead of 1.  Ravens move to 8 instead of 4.  Saints are 4 instead of 6.  Bills are 5 in both.

So how do we arrive at an answer to our question?  One way is to average the ranks into a single rank to balance out the strengths/weaknesses of each model and way of ranking.  When we do that we find the following:
  • 1 and 2. Chiefs/Packers: on average, they tie in rank.  Both do well in rank by expected value and in rank by total wins.  Yes, both teams were beaten by the Buccaneers, but the models suggest that both would have more expected wins and total wins than the Buccaneers (and in the LR model, both are still predicted to beat the Buccaneers in a single-game matchup).  If pushed, I would choose the Chiefs as the best team, following the LR model, which performed better than the RF model throughout the season, and in line with the other statistics already mentioned.
  • 3. Buccaneers: The Buccaneers are hard to judge.  They won the Super Bowl and beat both the Chiefs and the Packers.  However, they were a 5th seed coming into the playoffs and finished the season at 11-5.  They rank poorly by expected wins for the LR model and are 3rd for the RF model.  However, they perform well by total wins: 3rd for LR and 1st for RF.  So the models judged them likely to win a lot of games, but each by a narrow probability margin.  Balancing these factors places them 3rd, behind the Chiefs and Packers.  So yes, that would mean that the better teams (Chiefs and Packers) lost to a worse team (Buccaneers) in the playoffs, and the "best" team did NOT win the Super Bowl, but that's always a possibility in the playoffs and in football.
  • 4 and 5. Saints and Bills: both tie in average rank.  Both teams are consistently 4 or 5 in expected value and total wins.
  • 6-10. In order of average rank: Ravens, Seahawks, Steelers, Colts, and Titans.
In short, the best teams were the Chiefs/Packers, then Buccaneers, then Saints/Bills.  The numbers are below, sorted by combined average rank.
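The combined rank is just the row-wise mean of the four rank columns.  Here is that calculation for the teams whose positions were reported above (I left out teams whose rank in one of the four lists wasn't stated):

```python
# Combined ranking: average each team's rank across the four schemes
# (LR/RF x expected-wins/total-wins), using the positions reported above.
import pandas as pd

ranks = pd.DataFrame({
    "lr_expected": {"Chiefs": 1, "Packers": 2, "Buccaneers": 8, "Bills": 5},
    "lr_wins":     {"Chiefs": 1, "Packers": 2, "Buccaneers": 3, "Bills": 5},
    "rf_expected": {"Chiefs": 2, "Packers": 1, "Buccaneers": 3, "Bills": 4},
    "rf_wins":     {"Chiefs": 3, "Packers": 2, "Buccaneers": 1, "Bills": 5},
})
ranks["combined"] = ranks.mean(axis=1)
print(ranks.sort_values("combined"))  # Chiefs and Packers tie at 1.75
```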



The Worst Team:

Based on the above rankings, the "winners" for worst team were the Jets and Jaguars.  The LR model chose the Jaguars as the worst, followed by the Jets, while the RF model picked the Jets as worst, followed by the Jaguars.  When averaged, they tied for last.  Based on other stats I would probably choose the Jaguars as the worst team if forced to choose between the two.

The Best Team to Miss the Playoffs:

Which was the best team to miss the playoffs? I'd suggest that the Dolphins were the best team to miss. The Dolphins ranked ahead of several teams that did make the playoffs (Rams, Washington, Bears).  They finished at 10-6, second in the AFC East behind the Bills.

The Worst Team to Make the Playoffs:

Which was the worst team to make the playoffs?  Here are the two possibilities:
  • Bears - the Bears had an average rank of 19 and finished the regular season at 8-8.  However, they took the 7th-seed wild card by finishing second in the NFC North.
  • Washington - Washington had an average rank of 15.25 and finished the regular season at 7-9, winning the NFC East to take the 4th seed.
Between these two and based on other statistics, I'd choose the Bears as the worst team to make the playoffs.

The Biggest Playoff Upset

The Seahawks were the biggest playoff disappointment.  They had the highest win probability (63%) of any team that failed to win a playoff game and, consequently, the biggest negative gap between expected wins and actual wins.  As a Seahawks fan, this is especially disappointing.

Summary and Conclusion

I could endlessly analyze and explore, but this post is already too long and life demands that I move on to other things.  But to summarize my findings:
  • My models were 63% accurate, while 538's best model was 68% accurate.
  • The Chiefs were the best team overall; the Jaguars were the worst.
  • The Buccaneers, although they won the Super Bowl, were NOT the best team this season.  But credit is due to them for beating the models over and over again.
  • Don't bet against Tom Brady.
So that concludes my predictions and analysis for the 2020 NFL season.  I hope you enjoyed these posts.  See you in a few months for the 2021 season with new predictions, and hopefully some model improvements and additional insights.

Thanks!