Wednesday, December 27, 2017

Data Science and Philosophy of Science: What Makes a Model Good?

Introduction

In a previous article, I discussed philosophical views on the nature of scientific theories and applied those discussions to data science models.  I concluded that data science models, the terms they invoke, and the relationships they postulate ought to be considered to correspond to reality in some way.  That is, a model's terms do in fact represent something real in the world (although perhaps as an abbreviation, summary, or approximation of potentially many real entities).  Similarly, a model's postulated relationships represent something real in the world (e.g., a causal relationship amongst the terms in the model, or amongst hidden terms that make up the terms in the model, or....).  While such correspondence may only be approximate and fall far short of 100% perfection and predictive accuracy, the model is nevertheless not merely useful: it approximates the truth, or attaches to reality, in some admittedly imperfect way.

Whether or not you agree, let's move on to another question in the philosophy of science that does not necessarily depend on how you answered the realist/anti-realist debate: what makes a good scientific model?  How does this apply to data science models?  Let's explore some ideas and then summarize at the end.

The Problem of Induction

Induction is the formation of generalizations or laws on the basis of past experience.  We believe that future occurrences will behave like past occurrences, and so, on the basis of past occurrences, we can predict future ones.  For example, based on past experience, we believe that we know (and have mathematically formulated as a law) that when billiard ball A hits billiard ball B in a certain way, with a certain force, in conditions X, Y, Z, etc., then ball A will go in this direction at this speed and ball B will go in that direction at that speed.

However, we have no guarantee that the future will be like the past, since there are typically no necessary relationships between the objects we are interested in.  It is conceivable, because it is not a matter of logical necessity, that ball A will spontaneously combust, or turn into a carrot, when it hits ball B.  Such a thing has never occurred before, but that does not mean it cannot happen.  Such thoughts have caused some people (most famously David Hume) to be skeptical about our ability to acquire knowledge through induction.

And yet, this is precisely what we do in the sciences.  Even in the absence of logical necessity, we believe that we know what will happen to ball A and ball B in these circumstances, and we can reliably predict what does in fact happen with a very small margin of error.  We even go so far as to form a law, a matter of physical necessity, to explain this relationship.

But what do we do in the face of competing "laws" that both explain the data we have?  Which theory do we go with and use for future research and development of theory?  This is the problem of induction.  How can we justify inductive inferences?  That is, how can we make universal or natural law claims based on experience, when so many alternative claims could be postulated?

Falsifiable

Enter Karl Popper.  His goal was to answer the problem of induction and to distinguish genuinely scientific theories from pseudo-scientific ones.  He observes that it is really easy to formulate a theory that explains the known data, since the theory is constructed using that very data (hindsight is 20/20).  While such a theory may be correct, one can think of many alternative theories that also explain the data.  How can one tell which theory to accept?

Popper answers that each theory must make so-called risky predictions, that is, predictions one should expect to be false unless the theory is right.  A theory that is not refutable is merely pseudo-scientific.  Once we have excluded the pseudo-scientific theories and are left with competing scientific theories, we can test them on the basis of what each predicts, focusing in particular on where their predictions disagree.  That is, each theory must propose hypotheses that are empirically tested after the theory has been formulated.

Conclusions are deduced from the theory, and these are then compared against each other to make sure that the theory is internally consistent, externally consistent with other unfalsified theories, and that when it makes a prediction, that prediction is correct.  When a theory fails to predict accurately, or is discovered to be inconsistent, it is falsified.  If it is consistent and does predict accurately, it is acceptable for use (although it may be falsified in the future).  In this, Popper proposes a deductive method of testing, in the form of modus tollens: if theory A is true, then X must occur; X did not occur; therefore, A is falsified.

In short, "Science in his view is a deductive process in which scientists formulate hypotheses and theories that they test by deriving particular observable consequences. Theories are not confirmed or verified. They may be falsified and rejected or tentatively accepted if corroborated in the absence of falsification by the proper kinds of tests" (Stanford Encyclopedia).  Theories are held to be true so far as they succeed in making predictions and surviving attempts at falsification.  Theories are judged by the deductive consequences of the hypotheses they make.

So a virtue of a theory is its ability to be falsified.  Theories that make stronger claims are more falsifiable because the predictions they make are bolder and, typically, more informative.

While this is all well and good, we still have a problem: we can have two theories that are both unfalsified and that make different predictions.  Which should we use until those predictions can be tested?  To answer, let's look at some other virtues that make a model good.

Elegance and Parsimony

A theory that is simpler is to be preferred over a more complex theory, all else being equal.  Simplicity can refer both to syntactic simplicity (the number and complexity of hypotheses in a theory; it is elegant) and to ontological simplicity (the number and kinds of entities postulated by the theory; it is parsimonious) (Stanford Encyclopedia).  Most famously, Occam's razor asserts that "entities must not be multiplied beyond necessity."

So why should we prefer more elegant and parsimonious theories?  That is, when faced with a choice between two theories that both explain the data equally well, why choose one over the other on the grounds that one is simpler?  To answer, let us consider the field of epistemology, that is, the study of knowledge.  Knowledge is classically said to consist in justified true belief.  When faced with competing theories, we are asking which theory we ought to believe to be true, so our focus is on the justification for each theory.  Now we have already said that each theory is consistent with the data, so what other grounds do we have for believing one theory to be more likely true than another?  Which is more justified?

The simpler theory is more likely to be true as a matter of probability.  Each entity in a theory has some probability of existing or of standing in the postulated relationship to the other entities.  So the more we multiply entities and relationships, the more probabilities we multiply together, and since each is less than 1, the overall probability falls.  For example, suppose one theory postulates 2 entities and another postulates 3.  If each entity has a probability of 0.75 of existing/having a certain relationship, then the former theory has a probability of (0.75)^2 = 0.56 of being true, versus (0.75)^3 = 0.42 for the latter.  Probabilistically speaking, you ought to prefer the former theory because it is more likely to be true, and since the theories are otherwise indistinguishable, you have no other reason to prefer the latter.
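
To make the arithmetic concrete, here is a minimal sketch in Python, using the illustrative 0.75 figure from above and assuming the entities' probabilities are independent:

```python
# Probability that a theory is true if each of its n postulated
# entities/relationships independently holds with probability p.
def theory_probability(n_entities: int, p: float = 0.75) -> float:
    return p ** n_entities

print(theory_probability(2))  # 0.5625 -- the simpler, 2-entity theory
print(theory_probability(3))  # 0.421875 -- the more complex, 3-entity theory
```
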
Or returning to epistemology and the notion of justification: you have no reason for choosing a more complex theory over a simpler theory when both are equally explanatory of the data.  Suppose, for example, that you return home and find that your house has been robbed.  What would you conclude?  You know that at least one person must have robbed your house.  But are you justified in believing that two people robbed your house?  What about an alien from outer space that came and robbed your house?  If you have no reason to believe that more than one person robbed your house (or that an alien robbed your house), then it seems you are not justified in believing so.  Instead, you must hold the theory that only a single robber broke into your house.  This is so even if, unknown to you, two robbers really did break into your house.  That is, for your belief to be justified, you must hold the simplest theory that explains the data.

Granted, judgements about which theories are more simple, elegant, and parsimonious can be subjective to a degree.  We may have disagreements in certain cases.  However, we all intuitively have some understanding of what we are talking about, and we can agree in many cases that one theory is simpler than another.

Predictive and "Accurate"

The next three virtues were mentioned in the discussion of falsifiability, but deserve more attention in their own right.  The first is that a model must be predictive.  This is related to being falsifiable, in that a falsifiable theory makes predictions that can be proven false.  But we are interested in theories that not only make predictions, but that make accurate predictions.  In Popper's terms, we want theories that are strongly falsifiable and have so far resisted falsification.  These are our best theories, from whose claims we have derived many relatively accurate predictions.  Consequently, they are extremely useful in advancing our understanding of the world and our interaction with it, according to our aims and purposes.

Coherence

In order not to be falsified, a theory must be internally and externally consistent.  We can think about this in terms of coherence.  First, the theory must be internally coherent: no claim the theory makes may contradict any of its other claims.  Such contradictions can be logical or, less strongly, physical.  Even better is the case where the claims support each other (without being simply alternative ways of saying the same thing).  Second, the theory must be externally coherent: it must not contradict any of our best scientific theories (unless it is deliberately challenging the existing paradigm).

Informative and Explanatory

While there are perhaps other virtues that could be considered, let us consider a final one here.  We do not want theories that are merely predictive and accurate.  We want to understand why.  Thus, we expect a good scientific theory to be informative, to explain why things are the way they are in the world.  It will postulate the causal mechanisms that explain why something happens the way the theory accurately predicts.  It will provide direction for new avenues of research in light of those causal explanations.  In short, we do NOT want a black box, no matter how accurate that black box may be.

Data Science and Model Virtues

So how can the above be applied to data science models?

Falsifiable

A data science model must be falsifiable.  It must make predictions (i.e., hypotheses) that are capable of being false, and that are tested accordingly.  This is why separating one's data into a training set, test set (and validation set) is so important: it keeps one's model falsifiable.  When one builds a model on all of the data, one can have an extremely accurate model when looking only at the data at hand.  However, one is in danger of overfitting: modeling aberrations, errors, outliers, or biases in the sample data, with the consequence that the model will not generalize to future data.  It has NOT captured the real relationships underlying the data.  Using a hold-out test set keeps your model honest and makes sure that it will generalize to data it has not seen before.
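
As a concrete illustration, here is a minimal sketch of the hold-out discipline using scikit-learn; the dataset and model are placeholders invented for the example, not anything from a real project:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for whatever you are modeling.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set the model never sees during fitting:
# this is what keeps the model's predictions falsifiable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Training accuracy alone can reward overfitting; the hold-out score
# is the "risky prediction" that can actually falsify the model.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))
```
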
Furthermore, doing so prevents you from refitting the model with each new addition of data.  If one were to receive data daily and retrain the model on it, and if that model changed significantly each day, how confident would you be that it was going to predict well?  If it predicts one thing today and something different tomorrow, then your model is not stable, it is not going to make accurate predictions, and it is no longer useful.  It is as though your model changes its mind every day, shifting with the wind, never subjected to critical scrutiny because it is always explaining the latest data without being held accountable for the inaccurate predictions it has made.  This would be a pseudo-scientific model.

Elegance and Parsimony

A data science model must be as simple as possible, or as simple as is necessary for one's purposes.  Why?  Again, it is more likely to be "true" in the sense that one is more likely to have captured the actual relationships among the independent variables and their relationship to the dependent variable.  But there is a challenge here, because simpler data science models tend not to be as accurate or predictive, and this can be due to excluding variables that are predictive, even to a small degree.  So we don't want a model that is too simple, and yet we don't want one that is too complex either, given the risk of overfitting.  We want a model that is as simple as possible without sacrificing accuracy, and one that generalizes well when tested (see the sketch below).
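
One hedged way to walk this trade-off is lasso (L1) regularization, which zeroes out the coefficients of weakly predictive variables while cross-validation arbitrates how much parsimony accuracy can afford; the data below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data: 50 candidate features, only 5 truly informative.
X, y = make_regression(n_samples=500, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV picks the penalty strength by cross-validation: stronger
# penalties zero out more coefficients (more parsimony), and the CV
# score shows when parsimony starts costing predictive accuracy.
model = LassoCV(cv=5, random_state=0).fit(X, y)

n_kept = int(np.sum(model.coef_ != 0))
print(f"chosen alpha: {model.alpha_:.3f}")
print(f"features kept: {n_kept} of {X.shape[1]}")
```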


Predictive and "Accurate"

A data science model must be predictive and accurate.  This is the whole point!  We want to accurately predict unknown values.  If a model doesn't do this, it doesn't matter if it is elegant or falsifiable.  It isn't true.  It does not accurately model reality.  Your model must generalize to new data.
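
One standard way to estimate how well a model will generalize is k-fold cross-validation, sketched here on placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# k-fold cross-validation repeatedly holds out one fold for testing,
# giving a distribution of out-of-sample accuracy rather than a
# single, possibly lucky, number.
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=5)
print("accuracy per fold:", scores.round(3))
print("mean:", round(scores.mean(), 3), "std:", round(scores.std(), 3))
```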

Coherence

A data science model must be coherent.  I suppose one could have a model that contains a variable that is nearly the opposite of another variable in the model, and the model could use them both.  While possible, I am not sure both variables would survive even minimal feature selection.  Nevertheless, if your data science model is incoherent in some way, correct it, or look into why your model is paradoxical in this way.
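
The "nearly the opposite" case can at least be screened for mechanically before modeling.  A minimal sketch, assuming pandas (the 0.95 threshold is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

def flag_incoherent_pairs(df: pd.DataFrame, threshold: float = 0.95):
    """Return feature pairs whose correlation magnitude exceeds threshold,
    e.g. a variable that is nearly the opposite of another (corr near -1)."""
    corr = df.corr()
    cols = corr.columns
    return [(cols[i], cols[j], round(corr.iloc[i, j], 3))
            for i in range(len(cols))
            for j in range(i + 1, len(cols))
            if abs(corr.iloc[i, j]) > threshold]

# Tiny demonstration: feature 'b' is (noisily) the negative of 'a'.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({"a": a,
                   "b": -a + rng.normal(scale=0.01, size=200),
                   "c": rng.normal(size=200)})
print(flag_incoherent_pairs(df))  # [('a', 'b', -1.0)]
```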


Informative and Explanatory

Does your data science model explain, inform, or illuminate the relationships among the variables you are using to predict?  That is, when you look at the coefficients of your linear model, or the branches in your decision tree, do you understand something new or get an "aha"?  A good model will help us understand what is really going on.  This is especially important when one wants to know what action to take.  Is it better to add square footage or a new roof if one is trying to improve the resale value of a home?  A good model should be able to answer this and quantify the answer.  This is where avoiding overfitting is so important: an overfitted model will not have stable or reliable relationships among its variables, and so it cannot be relied on for informing decisions.

There is a downside, though, in that some types of models (e.g., neural networks) are extremely accurate but very difficult to interpret.  This is not always a problem: if one does not need to understand why the model predicted the way it did, and the outcome is all that matters, then interpretability is not as important.
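
When interpretability is the goal, inspecting a fitted linear model's coefficients is the simplest version of the "aha" above.  A sketch using the square-footage/roof example; the data is entirely made up for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical housing data; the column names echo the example above,
# and the numbers are invented purely for illustration.
df = pd.DataFrame({
    "sqft_added":  [0, 100, 250, 0, 400, 150, 0, 300],
    "new_roof":    [0, 0, 1, 1, 0, 1, 0, 1],
    "resale_gain": [0, 5000, 18000, 8000, 21000, 15000, 500, 23000],
})

model = LinearRegression().fit(df[["sqft_added", "new_roof"]],
                               df["resale_gain"])

# Each coefficient is the model's estimate of the marginal effect of
# that action with the other held fixed -- the quantified "why".
for name, coef in zip(["sqft_added", "new_roof"], model.coef_):
    print(f"{name}: {coef:,.1f}")
```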


Conclusion

Unfortunately, there are no hard and fast rules here for how to create good models, which is why model development in data science can be likened to an art or skill.  But with practice, one can develop that skill.  Consider these high-level principles when creating your models.  Refer back to them, and work to make sure that either your models have these virtues or you have really good reasons when they lack them.  If you do so, you will have a good model.


Wednesday, December 20, 2017

2017 NFL Game Predictions: Rest of Season, Playoffs, Best Team

Welcome!  I intend this to be an ongoing project of predicting NFL game outcomes, point spreads, and final scores.  My hope is that my models will improve over time as they become more sophisticated and use better data.  I will try to publish predictions regularly to keep myself accountable, and will make observations along the way about what is working and what isn't.  See below for the latest updates.  Enjoy!


  _________________________________________________________________________________

Welcome!  In years past I have predicted NFL scores and outcomes through the season and into the playoffs and Superbowl.  With work, school, and family life, this was not possible this year.  However, I managed to scrape together a little time to run the code to see what the rest of the season looked like.  With a few minor changes to the code, I was able to run predictions for weeks 16 and 17, the playoffs, the Superbowl, and a "best team" prediction.  Given that I may not have time to revisit this, I lay out the predictions for the rest of the season below.

Week 16:
Not having followed much of the NFL this year, I am not sure I can really comment on the below games.  But from what little I know (a quick sanity check using win/loss comparisons), the below outcome predictions seem reasonable, although some of the probabilities strike me as a little strange.





Week | Date | Team | HomeAway | Opponent | ProbabilityWin | PredictedTeamWin
---- | ---- | ---- | -------- | -------- | -------------- | ----------------
16 | 12/23/2017 | Colts | @ | Ravens | 0.133 | 0
16 | 12/23/2017 | Packers |  | Vikings | 0.211 | 0
16 | 12/23/2017 | Ravens |  | Colts | 0.867 | 1
16 | 12/23/2017 | Vikings | @ | Packers | 0.789 | 1
16 | 12/24/2017 | 49ers |  | Jaguars | 0.406 | 0
16 | 12/24/2017 | Bears |  | Browns | 0.727 | 1
16 | 12/24/2017 | Bengals |  | Lions | 0.295 | 0
16 | 12/24/2017 | Bills | @ | Patriots | 0.371 | 0
16 | 12/24/2017 | Broncos | @ | Redskins | 0.257 | 0
16 | 12/24/2017 | Browns | @ | Bears | 0.273 | 0
16 | 12/24/2017 | Buccaneers | @ | Panthers | 0.075 | 0
16 | 12/24/2017 | Cardinals |  | Giants | 0.866 | 1
16 | 12/24/2017 | Chargers | @ | Jets | 0.774 | 1
16 | 12/24/2017 | Chiefs |  | Dolphins | 0.628 | 1
16 | 12/24/2017 | Cowboys |  | Seahawks | 0.399 | 0
16 | 12/24/2017 | Dolphins | @ | Chiefs | 0.372 | 0
16 | 12/24/2017 | Falcons | @ | Saints | 0.447 | 0
16 | 12/24/2017 | Giants | @ | Cardinals | 0.134 | 0
16 | 12/24/2017 | Jaguars | @ | 49ers | 0.594 | 1
16 | 12/24/2017 | Jets |  | Chargers | 0.226 | 0
16 | 12/24/2017 | Lions | @ | Bengals | 0.705 | 1
16 | 12/24/2017 | Panthers |  | Buccaneers | 0.925 | 1
16 | 12/24/2017 | Patriots |  | Bills | 0.629 | 1
16 | 12/24/2017 | Rams | @ | Titans | 0.581 | 1
16 | 12/24/2017 | Redskins |  | Broncos | 0.743 | 1
16 | 12/24/2017 | Saints |  | Falcons | 0.553 | 1
16 | 12/24/2017 | Seahawks | @ | Cowboys | 0.601 | 1
16 | 12/24/2017 | Titans |  | Rams | 0.419 | 0
16 | 12/25/2017 | Eagles |  | Raiders | 0.813 | 1
16 | 12/25/2017 | Raiders | @ | Eagles | 0.187 | 0
16 | 12/25/2017 | Steelers | @ | Texans | 0.948 | 1
16 | 12/25/2017 | Texans |  | Steelers | 0.052 | 0

Week 16 Results:
  • Estimated: 11.743 correct (about 12), so 4 incorrect (this is the model's summed confidence over the 16 games; see the sketch after this list)
  • Actual: 14 - 2
    • Jaguars lost to the 49ers
    • Bengals beat the Lions
  • Comments
    • Seahawks may actually make it!  I have the Falcons losing in week 17 and the Seahawks winning.  If both happen, the Seahawks make the wildcard spot!
    • I have the Titans losing to the Jaguars, so they would go 8-8.  The Chargers will beat the Raiders, so they will go 9-7.  The Bills will beat the Dolphins, so they will go 9-7.  Since the Chargers beat the Bills earlier this season, I'd now pick the Chargers to get into the playoffs over the Bills.
    • I'm a little shaken in my confidence in the Jaguars as the team to win the Superbowl.  The 49ers are looking really good, though, and are a different team with Jimmy Garoppolo, having won 4 in a row since he became the starter.  So perhaps it is not the end of the world for my audacious predictions.
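
For reference, the "estimated correct" figure is just the model's summed confidence over the 16 games, i.e., max(p, 1 - p) for each game's predicted outcome.  A sketch of that bookkeeping, assuming the predictions sit in a pandas DataFrame with one row per game (the table above lists each game twice, once per team):

```python
import pandas as pd

def expected_correct(preds: pd.DataFrame) -> float:
    """Sum of the model's confidence in each predicted game outcome.

    Assumes one row per game with a ProbabilityWin column for the
    listed team; confidence in the predicted outcome is max(p, 1 - p).
    """
    p = preds["ProbabilityWin"]
    return float(p.where(p >= 0.5, 1 - p).sum())

# Summing max(p, 1 - p) over the 16 unique Week 16 games above
# gives 11.743, the estimate reported in the results.
```
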
Week 17:
What about week 17?  Again, hard to comment since I haven't been following closely.  But the below again looks reasonable.
Week | Date | Team | HomeAway | Opponent | ProbabilityWin | PredictedTeamWin
---- | ---- | ---- | -------- | -------- | -------------- | ----------------
17 | 12/31/2017 | 49ers | @ | Rams | 0.416 | 0
17 | 12/31/2017 | Bears | @ | Vikings | 0.233 | 0
17 | 12/31/2017 | Bengals | @ | Ravens | 0.375 | 0
17 | 12/31/2017 | Bills | @ | Dolphins | 0.652 | 1
17 | 12/31/2017 | Broncos |  | Chiefs | 0.129 | 0
17 | 12/31/2017 | Browns | @ | Steelers | 0.072 | 0
17 | 12/31/2017 | Buccaneers |  | Saints | 0.089 | 0
17 | 12/31/2017 | Cardinals | @ | Seahawks | 0.259 | 0
17 | 12/31/2017 | Chargers |  | Raiders | 0.817 | 1
17 | 12/31/2017 | Chiefs | @ | Broncos | 0.871 | 1
17 | 12/31/2017 | Colts |  | Texans | 0.473 | 0
17 | 12/31/2017 | Cowboys | @ | Eagles | 0.338 | 0
17 | 12/31/2017 | Dolphins |  | Bills | 0.348 | 0
17 | 12/31/2017 | Eagles |  | Cowboys | 0.662 | 1
17 | 12/31/2017 | Falcons |  | Panthers | 0.419 | 0
17 | 12/31/2017 | Giants |  | Redskins | 0.093 | 0
17 | 12/31/2017 | Jaguars | @ | Titans | 0.591 | 1
17 | 12/31/2017 | Jets | @ | Patriots | 0.101 | 0
17 | 12/31/2017 | Lions |  | Packers | 0.737 | 1
17 | 12/31/2017 | Packers | @ | Lions | 0.263 | 0
17 | 12/31/2017 | Panthers | @ | Falcons | 0.581 | 1
17 | 12/31/2017 | Patriots |  | Jets | 0.899 | 1
17 | 12/31/2017 | Raiders | @ | Chargers | 0.183 | 0
17 | 12/31/2017 | Rams |  | 49ers | 0.584 | 1
17 | 12/31/2017 | Ravens |  | Bengals | 0.625 | 1
17 | 12/31/2017 | Redskins | @ | Giants | 0.907 | 1
17 | 12/31/2017 | Saints | @ | Buccaneers | 0.911 | 1
17 | 12/31/2017 | Seahawks |  | Cardinals | 0.741 | 1
17 | 12/31/2017 | Steelers |  | Browns | 0.928 | 1
17 | 12/31/2017 | Texans | @ | Colts | 0.527 | 1
17 | 12/31/2017 | Titans |  | Jaguars | 0.409 | 0
17 | 12/31/2017 | Vikings |  | Bears | 0.767 | 1

Week 17 Results:
  • Estimated: 11.8 (so roughly 12 correct, 4 incorrect)
  • Actual: 7-9
  • Comments
    • I really need to look at what happens when a team has secured a playoff spot and has nothing to gain by winning their last game.  It makes sense for a team in this position to not make an effort to win the last game, to rest their players, avoid injuries, and prepare for the playoffs.  Although I haven't watched the replays, I suspect this can help explain why the Eagles, Jaguars, Panthers, Rams, and Saints all lost.  Maybe next year I can adjust my model to account for this factor.
    • Falcons won and the Seahawks lost, so the Falcons made the playoffs after all.  The missed field goal at the end of the game is, in a way, a summary of the Seahawks entire season: at times, greatness, but too often, a miss.
    • In the AFC, the Ravens are out and the Titans are in the playoffs.  The Bills made it in over the Chargers.  Lots of tie-breaking rules at play here.  Four teams tied at 9-7: Titans, Ravens, Chargers, and Bills.  See here for how the tie-breaking worked out.
    • In the NFC, all of the teams made it that I thought would, although the Rams are seeded at 3 and the Saints are seeded at 4.  Panthers, Saints, and Rams tied at 11-5.  The Rams beat the Saints earlier in the season so they go ahead of the Saints, and the Saints beat the Panthers, so they go ahead of the Panthers.
    • Playoff impact?
      • Titans vs. Chiefs - my model picks the Titans to win and the Chiefs to lose.  The Titans will then lose to the Patriots, so that doesn't change.
      • Rams vs. Falcons - the Rams beat the Falcons and then play the Vikings but lose, so the Vikings continue.
      • Panthers vs. Saints - my model predicts a Panthers victory.  The Panthers then play the Eagles and lose, so the Eagles go on.
      • So while things are initially different, the ultimate outcome is not changed (based on my model from two weeks ago, which should probably be updated).

Playoffs and Superbowl


So what if the above is mostly correct?  Who will get into the playoffs?  My current predictions suggest the following by seed:
  • AFC - Patriots (1), Steelers (2), Jaguars (3), Chiefs (4), Ravens (5), Bills (6)
    • Patriots beat the Steelers, so in an expected tie in record, the Patriots get the first seed.
    • The Chiefs beat the Chargers twice, so in the case of a tie, they get the division.
    • Ravens will likely go 10-6, so they would have the top wildcard spot.
    • The Bills lost to the Chargers, so if they tie, the Chargers would go in ahead of them.  Neither of them has played the Titans.  The Bills have a slightly higher probability of winning, so I am giving them the 6th seed, although it could be the Chargers or Titans.
  • NFC - Eagles (1), Vikings (2), Saints (3), Rams (4), Panthers (5), Falcons (6)
    • Eagles will likely go 14-2 and get the top spot, with the Vikings second.
    • Saints beat the Panthers twice and I expect them to go 12-4, so I gave them the third seed.  I also expect the Panthers to go 12-4, but the Saints would win the tie-break.
    • Rams should win the West and get the 4 seed.
    • Panthers then get the highest wildcard spot (5), and the Falcons get the last one (6).
Sadly (at least for me), it looks like the Seahawks are out of the playoffs. But clearly, a lot can still change...

Team | Remaining Predicted Wins | Remaining Predicted Losses | Previous Wins | Previous Losses | Total Wins | Total Losses | Rounded Wins | Rounded Losses | Conference | Division | Conference Rank
---- | ------------------------ | -------------------------- | ------------- | --------------- | ---------- | ----------- | ------------ | -------------- | ---------- | -------- | ---------------
Patriots | 1.528 | 0.472 | 11 | 3 | 12.528 | 3.472 | 13 | 3 | AFC | East | 1
Steelers | 1.876 | 0.124 | 11 | 3 | 12.876 | 3.124 | 13 | 3 | AFC | North | 2
Jaguars | 1.185 | 0.815 | 10 | 4 | 11.185 | 4.815 | 11 | 5 | AFC | South | 3
Ravens | 1.492 | 0.508 | 8 | 6 | 9.492 | 6.508 | 9 | 7 | AFC | North | 4
Chiefs | 1.499 | 0.501 | 8 | 6 | 9.499 | 6.501 | 9 | 7 | AFC | West | 5
Bills | 1.023 | 0.977 | 8 | 6 | 9.023 | 6.977 | 9 | 7 | AFC | East | 6
Titans | 0.828 | 1.172 | 8 | 6 | 8.828 | 7.172 | 9 | 7 | AFC | South | 7
Chargers | 1.591 | 0.409 | 7 | 7 | 8.591 | 7.409 | 9 | 7 | AFC | West | 8
Dolphins | 0.72 | 1.28 | 6 | 8 | 6.72 | 9.28 | 7 | 9 | AFC | East | 9
Raiders | 0.37 | 1.63 | 6 | 8 | 6.37 | 9.63 | 6 | 10 | AFC | West | 10
Bengals | 0.67 | 1.33 | 5 | 9 | 5.67 | 10.33 | 6 | 10 | AFC | North | 11
Broncos | 0.386 | 1.614 | 5 | 9 | 5.386 | 10.614 | 5 | 11 | AFC | West | 12
Jets | 0.327 | 1.673 | 5 | 9 | 5.327 | 10.673 | 5 | 11 | AFC | East | 13
Texans | 0.579 | 1.421 | 4 | 10 | 4.579 | 11.421 | 5 | 11 | AFC | South | 14
Colts | 0.606 | 1.394 | 3 | 11 | 3.606 | 12.394 | 4 | 12 | AFC | South | 15
Browns | 0.345 | 1.655 | 0 | 14 | 0.345 | 15.655 | 0 | 16 | AFC | North | 16
Eagles | 1.475 | 0.525 | 12 | 2 | 13.475 | 2.525 | 13 | 3 | NFC | East | 1
Vikings | 1.556 | 0.444 | 11 | 3 | 12.556 | 3.444 | 13 | 3 | NFC | North | 2
Saints | 1.464 | 0.536 | 10 | 4 | 11.464 | 4.536 | 11 | 5 | NFC | South | 3
Panthers | 1.506 | 0.494 | 10 | 4 | 11.506 | 4.494 | 12 | 4 | NFC | South | 4
Rams | 1.165 | 0.835 | 10 | 4 | 11.165 | 4.835 | 11 | 5 | NFC | West | 5
Falcons | 0.866 | 1.134 | 9 | 5 | 9.866 | 6.134 | 10 | 6 | NFC | South | 6
Seahawks | 1.342 | 0.658 | 8 | 6 | 9.342 | 6.658 | 9 | 7 | NFC | West | 7
Lions | 1.442 | 0.558 | 8 | 6 | 9.442 | 6.558 | 9 | 7 | NFC | North | 8
Cowboys | 0.737 | 1.263 | 8 | 6 | 8.737 | 7.263 | 9 | 7 | NFC | East | 9
Packers | 0.474 | 1.526 | 7 | 7 | 7.474 | 8.526 | 7 | 9 | NFC | North | 10
Redskins | 1.65 | 0.35 | 6 | 8 | 7.65 | 8.35 | 8 | 8 | NFC | East | 11
Cardinals | 1.125 | 0.875 | 6 | 8 | 7.125 | 8.875 | 7 | 9 | NFC | West | 12
Bears | 0.96 | 1.04 | 4 | 10 | 4.96 | 11.04 | 5 | 11 | NFC | North | 13
49ers | 0.822 | 1.178 | 4 | 10 | 4.822 | 11.178 | 5 | 11 | NFC | West | 14
Buccaneers | 0.164 | 1.836 | 4 | 10 | 4.164 | 11.836 | 4 | 12 | NFC | South | 15
Giants | 0.227 | 1.773 | 2 | 12 | 2.227 | 13.773 | 2 | 14 | NFC | East | 16

Suppose that the above playoff spot predictions come true.  Then what happens?  Using head-to-head predictions based on the season so far:
Wildcard:
Jaguars (3) vs. Bills (6) -> Jaguars (3) (0.569)
Chiefs (4) vs. Ravens (5) -> Ravens (5) (0.529)
Saints (3) vs. Falcons (6) -> Saints (3) (0.55)
Rams (4) vs. Panthers (5) -> Rams (4) (0.501)
Division:
Patriots (1) vs. Ravens (5) -> Patriots (1) (0.71)
Steelers (2) vs. Jaguars (3) -> Jaguars (3) (0.523)
Eagles (1) vs. Rams (4) -> Eagles (1) (0.501)
Vikings (2) vs. Saints (3) -> Vikings (2) (0.514)
Conference:
Patriots (1) vs. Jaguars (3) -> Jaguars (3) (0.542)
Eagles (1) vs. Vikings (2) -> Vikings (2) (0.501)
Superbowl:
Vikings (2) vs. Jaguars (3) -> Jaguars (3) (0.509)
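
Walking such a bracket is mechanical once you have a head-to-head predictor.  Here is a minimal sketch; the p_win function is a stub standing in for my model's head-to-head predictions, not the model itself:

```python
def advance(team_a: str, team_b: str, p_win) -> str:
    """Pick the more probable winner of one matchup.

    p_win(a, b) is assumed to return P(a beats b) from the model.
    """
    return team_a if p_win(team_a, team_b) >= 0.5 else team_b

def play_round(matchups, p_win):
    """Resolve a list of (team_a, team_b) pairs into a list of winners."""
    return [advance(a, b, p_win) for a, b in matchups]

# Example: the AFC wildcard round above, with a stubbed-in predictor.
stub = {("Jaguars", "Bills"): 0.569, ("Chiefs", "Ravens"): 0.471}
winners = play_round([("Jaguars", "Bills"), ("Chiefs", "Ravens")],
                     lambda a, b: stub.get((a, b), 0.5))
print(winners)  # ['Jaguars', 'Ravens']
```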

Shocking!  The Jaguars will beat the Steelers, then the Patriots, before beating the Vikings in the Superbowl!  I should note that I originally had the Eagles going all the way and winning the Superbowl, but after factoring injuries into my model, this was enough to push the Eagles out.  It's too bad Carson Wentz is injured; otherwise I would have the Eagles going all the way.

Best Team of 2017

If each team played every other team, who would come out on top?  If measured by total probability of wins, the Patriots, Vikings, and Steelers take 1, 2, and 3.  The Jaguars come in 4th.  The Eagles are down in 9th (before injuries, they were at 3).  The Giants, Texans, and Buccaneers take 32, 31, and 30.  If measured by total games won, then the Jaguars, Vikings, and Eagles take 1, 2, and 3, while the Giants, Colts, and Buccaneers take 32, 31, and 30.  The Patriots drop to number 7.
What does this mean?  In short, it means that the Patriots have the highest summed probability of winning across their games.  However, they are predicted to win fewer games than some other teams.  As such, they are less consistent: predicted to win with high probability in some games, but predicted to lose narrowly in others.  Meanwhile, the Vikings, for example, are the only team to appear in both top-three lists, meaning that they are expected to win with fairly high probability when predicted to win AND are predicted to win every game.  However, their total probability is lower than the Patriots', meaning they are predicted to have closer victories than the Patriots, but more victories in total.

I should note that before factoring in injuries, the Eagles were my top choice.  But after injuries and the loss of their QB, their ranking has significantly fallen.
The Vikings appear in both lists at 2.  The Jaguars are at 4 and 1.  The Patriots are at 1 and 7.  The Eagles are at 9 and 3.  The Steelers are at 3 and 6.  Any of these teams has a claim to being considered the best team of the season.  Originally, I would have said the Eagles, as I had predicted them to win the Superbowl and they appeared in both top-three lists.  After factoring in injuries, total probability of winning, and total games predicted to win, I am inclined to pick the Jaguars, with the Vikings barely behind.  If the Jaguars can beat the Steelers and the Patriots, and then beat the Vikings in the Superbowl as I have predicted, they will without a doubt be the best team of the year.  Thus, I conclude that the Jaguars are (or will be) the best team of the year.
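
For reference, the two rankings below come from the same round-robin predictions aggregated in two ways.  Here is a generic sketch of that aggregation (a reconstruction for illustration, not my actual code), assuming a matrix P where P[i, j] is the predicted probability that team i beats team j:

```python
import numpy as np

def round_robin_rankings(P: np.ndarray, teams: list):
    """Rank teams two ways from a full round-robin prediction matrix.

    P[i, j] is assumed to be the model's P(team i beats team j);
    the diagonal is ignored.
    """
    n = len(teams)
    off_diag = ~np.eye(n, dtype=bool)
    total_prob = (P * off_diag).sum(axis=1)          # summed win probability
    total_wins = ((P > 0.5) & off_diag).sum(axis=1)  # games predicted won
    by_prob = sorted(zip(teams, total_prob), key=lambda t: -t[1])
    by_wins = sorted(zip(teams, total_wins), key=lambda t: -t[1])
    return by_prob, by_wins
```
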
Total Probability:

Rank | Team | Expected Value
---- | ---- | --------------
1 | Patriots | 23.376
2 | Vikings | 22.234
3 | Steelers | 22.124
4 | Jaguars | 21.967
5 | Panthers | 21.914
6 | Rams | 21.623
7 | Saints | 21.236
8 | Falcons | 21.046
9 | Eagles | 20.896
10 | Chargers | 20.286
11 | Lions | 19.844
12 | Seahawks | 19.799
13 | Bills | 19.246
14 | Titans | 19.121
15 | 49ers | 18.623
16 | Ravens | 18.326
17 | Cowboys | 17.831
18 | Chiefs | 17.524
19 | Dolphins | 15.069
20 | Bengals | 15.046
21 | Packers | 14.251
22 | Redskins | 14.011
23 | Cardinals | 13.375
24 | Bears | 13.13
25 | Raiders | 11.364
26 | Jets | 10.437
27 | Broncos | 8.201
28 | Browns | 6.772
29 | Colts | 6.725
30 | Buccaneers | 6.511
31 | Texans | 6.315
32 | Giants | 3.777

Total Wins:

Rank | Team | Total Wins
---- | ---- | ----------
1 | Jaguars | 32
2 | Vikings | 31
3 | Eagles | 30
4 | Rams | 29
5 | Panthers | 28
6 | Steelers | 27
7 | Patriots | 26
8 | Saints | 25
9 | Falcons | 24
10 | Chargers | 23
11 | Lions | 22
12 | Seahawks | 21
13 | Bills | 20
14 | Titans | 19
15 | 49ers | 18
16 | Ravens | 17
17 | Cowboys | 16
18 | Chiefs | 15
19 | Dolphins | 14
20 | Bengals | 13
21 | Packers | 12
22 | Redskins | 11
23 | Cardinals | 10
24 | Bears | 9
25 | Raiders | 8
26 | Jets | 7
27 | Broncos | 6
28 | Texans | 5
29 | Browns | 4
30 | Buccaneers | 3
31 | Colts | 2
32 | Giants | 1
Averages:

Team | AverageRank | AverageWins
---- | ----------- | -----------
Jaguars | 2.5 | 26.9835
Vikings | 2 | 26.617
Eagles | 6 | 25.448
Rams | 5 | 25.3115
Panthers | 5 | 24.957
Patriots | 4 | 24.688
Steelers | 4.5 | 24.562
Saints | 7.5 | 23.118
Falcons | 8.5 | 22.523
Chargers | 10 | 21.643
Lions | 11 | 20.922
Seahawks | 12 | 20.3995
Bills | 13 | 19.623
Titans | 14 | 19.0605
49ers | 15 | 18.3115
Ravens | 16 | 17.663
Cowboys | 17 | 16.9155
Chiefs | 18 | 16.262
Dolphins | 19 | 14.5345
Bengals | 20 | 14.023
Packers | 21 | 13.1255
Redskins | 22 | 12.5055
Cardinals | 23 | 11.6875
Bears | 24 | 11.065
Raiders | 25 | 9.682
Jets | 26 | 8.7185
Broncos | 27 | 7.1005
Texans | 29.5 | 5.6575
Browns | 28.5 | 5.386
Buccaneers | 30 | 4.7555
Colts | 30 | 4.3625
Giants | 32 | 2.3885

Conclusion

So there you have it.  If you haven't been paying attention all season, now, having read this post, you can speak somewhat accurately about which teams are good, which are bad, and which are likely to make the playoffs or do well in them.  Just make sure no one asks you why...
Good luck!