Wednesday, September 30, 2020

2020 NFL Game Predictions: Week 4

Welcome!  I intend this to be an ongoing project of predicting NFL game outcomes, point spreads, and final scores.  My hope is that my models will improve over time as they become more sophisticated and use better data.  I will try to regularly publish predictions to keep myself accountable, and will make observations along the way about what is working and what isn't.  See below for latest updates.  Enjoy!




Previous 2020 Predictions:
_________________________________________________________________________________
Recap:
11-4.  The models certainly did better.  The Steelers-Titans game was postponed due to Covid-19 illnesses and will be played in week 7 after some schedule reshuffling.  All other games were played, although the Patriots had to play without their starting QB.  The logistic model had an expected value of 9.38 correct picks while the random forest model had an expected value of 8.78, so actual performance this week was higher than expected.
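
One way an "expected value" like this could be computed is sketched below.  It assumes (the post does not say) that the model picks the more likely side in each game and that its probabilities are well calibrated, so each pick is right with probability max(p, 1 - p).  The function name and the sample probabilities are hypothetical.

```python
def expected_correct(win_probs):
    """Expected number of correct picks for a slate of games.

    Assumption: for each game the pick is the side with the higher predicted
    probability, so the chance the pick is right is max(p, 1 - p).
    """
    return sum(max(p, 1.0 - p) for p in win_probs)


# Hypothetical home-team win probabilities for a four-game slate
# (illustrative only, not the actual model output).
example_probs = [0.80, 0.55, 0.30, 0.50]
print(round(expected_correct(example_probs), 2))
```

Comparing this number with the actual count of correct picks gives a quick read on whether a week went better or worse than the model itself anticipated.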

Of the oddities and bold predictions, 3 of the 4 games went my way.  Only the Bengals won against my model predictions.  Upsets included the 49ers' loss to the Eagles and the stunning victory of the Browns over the Cowboys.

On to week 5!
_________________________________________________________________________________
 
Week 4:
Week 4 brings an additional new model: a random forest.  I chose a random forest because the outcome we are trying to predict is categorical ("win" or "lose").  These models are easy to train (although not as fast as the logistic regression model) and, with many decision trees in the forest, are known to be powerful and accurate.  They also rank features by importance, which acts as a built-in form of feature selection and makes the model easier to interpret.

Perhaps more importantly, a logistic model assumes that the variables are linearly related to the log-odds of the target, and I suspect that this is not true.  Said differently, a logistic model roughly assumes that variable x + variable y + variable z = target A.  However, football may not work that way.  It may not be the case that a certain winratio + years since the playoffs - starters injured = "win".

Instead, the relationship may be more like, "if starting QB is injured -> "lose"; else, if win ratio is greater than 0.9 -> "win"; else, if years since playoffs is greater than 5 -> "lose"...."  That is, various combinations of variables may lead to a win or loss, but the specific combination matters, and changing any value in that combination even slightly can lead to a different result.  In short, we would have a non-linear relationship between the variables and the target.
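
The contrast between the two kinds of relationship can be sketched in a few lines of code.  This is only an illustration: the coefficients in the additive function are made up, not fitted values, and because the rule chain above trails off, anything past the third rule is returned as "undecided" rather than guessed.

```python
import math


def logistic_win_probability(win_ratio, years_since_playoffs, starters_injured):
    """Additive model: each variable shifts the log-odds independently.

    The intercept and coefficients are made up for illustration.
    """
    log_odds = (-1.0 + 3.0 * win_ratio
                - 0.1 * years_since_playoffs
                - 0.4 * starters_injured)
    return 1.0 / (1.0 + math.exp(-log_odds))


def rule_based_pick(qb_injured, win_ratio, years_since_playoffs):
    """Decision-tree style rules mirroring the if/else chain in the text."""
    if qb_injured:
        return "lose"
    if win_ratio > 0.9:
        return "win"
    if years_since_playoffs > 5:
        return "lose"
    return "undecided"  # the text leaves the remaining branches open


# Flipping one input flips the rule-based outcome outright...
print(rule_based_pick(False, 0.95, 8))  # win
print(rule_based_pick(True, 0.95, 8))   # lose
# ...while the additive model only nudges the probability.
print(logistic_win_probability(0.95, 8, 0) > logistic_win_probability(0.95, 8, 1))  # True
```

A single decision tree is essentially one such rule chain learned from data, and a random forest averages the votes of many of them.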

A non-linear relationship is exactly the sort of relationship that a random forest model can capture.  Consequently, it may perform much better than a logistic regression model.  I intend to run both models in parallel to see how each does, with the hope that the random forest model is in fact better and can be used as my production model in future weeks and years.

The random forest model performs largely the same on training and test data as the logistic regression model, with 72% accuracy.  Will it perform better than logistic regression in practice?  I hope so.

Which variables matter most in the random forest model?  In order:
  • Rolling winratio over past 16 games
  • Opponent's years since the playoffs
  • Team's average point difference with opponents
  • Rolling winratio over past 32 games
  • Opponent prior year points for
  • Average team score
  • Average opponent rushing yards
  • Team's years since playoffs
  • Average opponent's total yards
  • Average opponent's passing yards
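
A feature like the rolling winratio at the top of this list could be built roughly as follows.  This is a minimal sketch, not the actual pipeline: the function name, the encoding of results (1 = win, 0 = loss, 0.5 = tie), and the choice to use only games before the current one are all assumptions.

```python
def rolling_win_ratio(results, window=16):
    """Win ratio over the `window` games played before each game.

    `results` is a chronological list: 1 for a win, 0 for a loss, 0.5 for a tie
    (the tie encoding is an assumption).  The value at index i uses only games
    before i, so a game's own outcome never leaks into its feature.  Games with
    no history get None.
    """
    ratios = []
    for i in range(len(results)):
        prior = results[max(0, i - window):i]
        ratios.append(sum(prior) / len(prior) if prior else None)
    return ratios


# Hypothetical four-game history with a two-game window.
print(rolling_win_ratio([1, 0, 1, 1], window=2))  # [None, 1.0, 0.5, 0.5]
```

Calling the same function with window=32 would give the longer-horizon version of the feature.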

I notice that although there are some important variables that overlap with the prior logistic regression model (e.g., winratio, years since the playoffs), there are more traditional stats in this model as well (e.g., passing yards) and it seems to be less redundant.  Hopefully that means it can provide more nuance and better predictions overall.

I also retrained the logistic regression model using automated feature selection and interestingly, it now includes other more traditional stats as well.  So hopefully it will also do better than previously.

Now for the predictions.

Oddities and Bold Predictions:
Before retraining the logistic regression model, I noticed some glaring differences between the random forest model and the previous logistic regression model.  While most predictions were the same, some obvious discrepancies along with the poor performance thus far encouraged me to retrain the model.  After retraining, both the logistic regression model and the random forest model have the same picks for this week, which is convenient and a little strange.  But hopefully that means better performance going forward.
  • Bears vs. Colts
    • Both models pick Colts to win.  But Bears are 3-0 and playing at home, while Colts are 2-1.  Both have played mediocre teams so far.  I'm kinda leaning towards the Bears personally.  538 also picks the Bears.
  • Bengals vs. Jaguars
    • The old model predicted a Bengals victory while the new models pick the Jaguars.  Bengals are 0-2-1 and Jaguars are 1-2 but against weaker teams.  Not really inclined either way.  538 picks the Bengals to win.
  • Buccaneers vs. Chargers
    • The old model had Buccaneers losing, which is clearly the wrong prediction.  The new models make the "right" choice in my opinion by picking the Buccaneers.  538 agrees.
  • Texans vs. Vikings
    • The old model barely picked the Texans to win, but the new models barely pick the Vikings to win.  Texans play at home.  Both are 0-3 but both have played good teams.  No personal inclination on this one.  538 picks the Texans to win.
Good luck in week 4!

Here are the predictions for Week 4:




Tuesday, September 22, 2020

2020 NFL Game Predictions: Week 3





Previous 2020 Predictions:
_________________________________________________________________________________
Recap:
7-8-1.  With an expected value of 10.2 correct picks, the actual result was 7.5 (the tie is "half" correct).  Not going so well thus far.  Some definite surprises as Lions beat Cardinals and Bengals tied Eagles.  I expected Seahawks to win against my model's prediction, and I wasn't surprised to see the other missed predictions as most were close probabilistically.  That's the way it is sometimes.  Hopefully things improve next week.
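
The arithmetic behind counting a tie as "half" correct can be written out as a small helper.  The function name is hypothetical; the half-credit rule is the one stated in the recap.

```python
def score_record(correct, incorrect, ties=0):
    """Return (credited picks, accuracy), counting each tie as half correct."""
    credited = correct + 0.5 * ties
    games = correct + incorrect + ties
    return credited, credited / games


credited, accuracy = score_record(correct=7, incorrect=8, ties=1)
print(credited, accuracy)  # 7.5 0.46875
```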
_________________________________________________________________________________
 
Week 3:
Week 3 brings a retrained logistic model.  As the game has changed somewhat since 2017, a model that relies on features and weights that were predictive in 2017 may not be as effective in predicting games in 2020.  Hence, the need for retraining.

The result of the retraining is a logistic model that predicts with training and test accuracy of about 73% (11 or 12 predictions of 16 per week).  In practice, the accuracy has been much lower, suggesting that I am overfitting in some way still (and will need to do some investigation as to what needs changing).  Important features in order of importance include:
  • Rolling win ratio of team over past 16 games
  • Team prior wins in season
  • Team season win ratio
  • Rolling win ratio of team over past 32 games
  • Opponent's years since playoffs
  • Opponent's season win ratio
  • Opponent's years since playoffs with the same coach
  • Opponent's years with same coach
  • Opponent's top 1 passer playing in the game
  • Team's top 1 passer playing in the game
  • Team's years since playoffs with the same coach
A lot of these variables are collinear and a future to-do item will be to test for and remove collinear variables.
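
One simple way to test for collinear variables is to flag pairs of features whose pairwise correlation is very high.  The sketch below is an assumption about how that to-do item might be approached, not the method actually used; the feature names, sample values, and the 0.9 threshold are all hypothetical.  Pairwise correlation also misses collinearity that involves three or more variables at once.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical feature columns (illustrative values only).
features = {
    "years_with_same_coach": [1, 2, 3, 4],
    "years_since_playoffs_same_coach": [2, 4, 6, 8],
    "top_passer_playing": [1, -1, 1, -1],
}
print(collinear_pairs(features))
```

For each flagged pair, one of the two features would be a candidate to drop before retraining.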


Oddities and Bold Predictions:
  • Dolphins vs. Jaguars
    • 538 picks Jaguars but my model chooses Dolphins by quite a bit.  Dolphins are 0-2 against Patriots and Bills and did fairly well.  Jaguars are 1-1 against Colts and Titans.  I am sort of inclined to go with my model on this one.
  • Bills vs. Rams
    • Essentially a toss up.  Rounded to three decimal places the model predicts 0.500 victory for Rams.  538 goes for Bills.
  • Browns vs. Washington
    • My model has Browns losing, but 538 has them winning. Both are 1-1 but Washington played (IMHO) better teams.  I'd choose Washington for this one.
  • Cowboys vs. Seahawks
    • My model picks against Seahawks again.  But Seahawks are home and are 2-0 while Dallas is 1-1.  I'd definitely go with Seahawks, as does 538.  I don't know why my model doesn't like the Seahawks.
  • Packers vs. Saints
    • The model picks the Saints to win, but I don't know.  Packers are 2-0 and Saints are 1-1 although they will be playing at home.  538 sides with Saints, and though it's a close call, I think I'd still pick Saints.
Good luck in week 3!

Here are the predictions for Week 3:













Wednesday, September 16, 2020

2020 NFL Game Predictions: Week 2




Previous 2020 Predictions:
_________________________________________________________________________________
Recap:
8-8.  Not a great start, especially with an expected value of 10.78 correct picks.  Some games went the way I personally expected contrary to the model (e.g., Bills, Buccaneers, Seahawks).  The surprise was Saints losing to Raiders.  But that's football, right?  Maybe the model is a little nervous about its first performance of the year...

On to next week.
_________________________________________________________________________________
 
Week 2:
First week of predictions!  I have my dusty code up and running, which had last been run over two years ago.  After package updates, syntax changes, name changes (i.e., Washington), and other code adjustments I have a first successful pass at the predictions.  So I begin with a model and code as good as I left it, which looking back on it now, isn't great.  But it's a place to start.

A logistic regression model on the data yields 72% accuracy on training data and 72% accuracy on test data.  AUC is 70% for both training and test.  This is consistent with past usage of the model.  Sensitivity and specificity are roughly the same, and F1 is also about 72%.  So the model is fairly balanced in predicting positives/wins and negatives/losses.
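
For reference, these metrics all derive from the four confusion-matrix counts.  The sketch below uses hypothetical counts chosen to produce a balanced 72% model; they are not the actual numbers from the training or test data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision and F1 from confusion counts.

    "Positive" means a predicted or actual win.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # share of actual wins predicted as wins
    specificity = tn / (tn + fp)   # share of actual losses predicted as losses
    precision = tp / (tp + fp)     # share of predicted wins that were wins
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 200 team-games (illustrative only).
metrics = classification_metrics(tp=72, fp=28, tn=72, fn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

When sensitivity and specificity are close, as here, the model is not systematically favoring either "win" or "lose" predictions.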

In practice, the model yielded about 61% accuracy in 2016 and 63% accuracy in 2017.  Fivethirtyeight.com scored 64% in 2016 according to my calculations, and Elliot Harrison has a lifetime record of 65.5%.  I can't remember where I read this, but I seem to remember reading an article that hypothesized something like a 70% upper bound prediction accuracy, calculating that 30% of a win is pure random luck instead of team skill or decision making (e.g., ball bounces this way vs. that way, wind direction, etc.).  So a good goal for this season is to beat my own previous seasons, perhaps having an overall accuracy of 64%, with the realistic understanding that I am approaching the upper bound of what is possible in this field.  However, I am in good company amongst these other predictors.

Oddities and Bold Predictions:
Not having rigorously followed NFL player movements or team trends in the past two years, I do not quite have the sense for which teams are "good" and "ought" to win.  But here are some predictions that stand out to me.
  • Browns vs. Bengals
    • Browns win.  I guess the Browns are good now?  538 agrees.
  • Dolphins vs. Bills
    • Dolphins beat Bills.  Personally, I'd go the other way.  538 has Bills winning.  We shall see.
  • Eagles vs. Rams
    • Eagles win.  538 has the Rams winning but barely (51%).
  • Panthers vs. Buccaneers
    • Panthers win?  Probably not.  My model doesn't truly grasp the meaning of "Tom Brady".  538 also picks Buccaneers to win.
  • Patriots vs. Seahawks
    • The model picks Patriots, but I'd personally go with Seahawks.  With the Patriots' loss of Brady, and playing away, while Seahawks are at home and have lots of stability from the prior season, I'd think Seahawks are a much better pick.  And I'm not just saying that as a Seahawks fan...  538 chooses Seahawks.
  • Washington vs. Cardinals
    • My model picks Washington with 79% and 538 picks Cardinals with 68%.  Are the Cardinals good again?  I know they beat SFO last week.  I don't know what to make of this one.

So some oddities and perhaps bold/crazy predictions.  I may get lucky, right? You never know... It's a new season! Good luck!

Here are the predictions for Week 2:









Wednesday, September 9, 2020

2020 NFL Game Predictions: Kickoff




 _________________________________________________________________________________

Kickoff

I last did football predictions for the 2017 season, and even then only for the playoffs.  It's been over two years since I ran the code to ingest data and output predictions and analysis of the NFL regular season and postseason.  However, after completing my MS Data Science degree and having work and life somewhat calm down again, I have taken this up again for the 2020 season.

Who really knows how the season will play out?  With many players sitting out due to concerns over the virus, with games and schedules likely to be changed or canceled, and with no preseason to get teams ready for the regular season, I don't really know how well training data from past years will predict this year's game outcomes.

However, with low expectations for the season, perhaps it is the perfect time to try new things in code, debug, retrain, experiment, and otherwise develop and advance this yearly NFL prediction and analysis effort.  At the very least, it is a good time to try and get this going again.  Accordingly, I will do so as time permits, while recognizing that I also do not know what my life holds in the coming months and will likely need to adjust.

My plan is to at least do predictions and recap every week starting with the second week of the regular season.  Games start tomorrow, so look for my first round of predictions a week from now.  Time to get coding...

Good luck!


Thursday, September 3, 2020

Let's Begin... Again...

I started this blog on November 1, 2014.  Having completed my MS in Data Science this past June 2020, I find that I once again have some time to devote to blogging, as well as the desire to do so.  Revisiting my first post, I find that many of the sentiments expressed there are still true.

"...I recognize the value of blogging personally for me.  It has generated good conversations in the past with others, and I have enjoyed the thought and work that has gone into various posts.  It has kept my mind active and my skills sharp.  It provides an outlet for creative work that is above and beyond what I do for my day job."

Since writing this, I have been blessed with two children, completed an MS in Data Science, and transitioned to a role in Data Science at a leading Seattle-area tech company.  A lot has happened, but this sentiment remains true.  Despite the challenges of family and work life, and especially in the context of 2020 Covid-19 challenges and social unrest, the desire to create, to write, to analyze, to think, remains the same.

Hence my return.

My blog is called "Philosophical Analytics" to combine my love of philosophy and of analytics and to rely on both disciplines to inform and benefit the other.  In going forward, I intend to look back and revisit prior blogs, updating, revising, and maybe contradicting my prior self.  I also intend to look and move forward with new ideas and interests, hopefully beneficially informed by six additional years of academic, professional, and life experiences.

To quote my past self, "I hope someone out there finds something that I write and do on this blog insightful, intriguing, and interesting.  If not, if only I benefit from what I do here, it will still have been worth it."

Let's begin again...