Wednesday, December 27, 2017

Data Science and Philosophy of Science: What Makes a Model Good?

Introduction

In a previous article, I discussed philosophical views on the nature of scientific theories and applied those discussions to data science models.  I concluded that data science models, the terms they invoke, and the relationships they postulate ought to be considered to correspond to reality in some way.  That is, a model's terms do in fact represent something real in the world (although perhaps as an abbreviation, summary, or approximation of potentially many real entities).  Similarly, a model's postulated relationships represent something real in the world (e.g., a causal relationship amongst the terms in the model, or amongst hidden terms that make up the terms in the model, or....).  While such correspondence may only be approximate and fall far short of 100% perfection and predictive accuracy, the model is nevertheless not merely useful: it approximates the truth, or attaches to reality, in some admittedly imperfect way.

Whether or not you agree, let's move on to another question in the philosophy of science that does not necessarily depend on how you answered the realist/anti-realist debate: what makes a good scientific model?  How does this apply to data science models?  Let's explore some ideas and then summarize at the end.

The Problem of Induction

Induction is the formation of generalizations or laws on the basis of past experience.  We believe that future occurrences will behave like past occurrences, and so, on the basis of past occurrences, we can predict future ones.  For example, based on past experience, we believe that we know (and have mathematically formulated as a law) that when billiard ball A hits billiard ball B in a certain way, with a certain force, in conditions X, Y, Z, etc., then ball A will go in this direction at this speed and ball B will go in that direction at that speed.

However, we have no guarantee that the future will be like the past, since there are typically no necessary relationships between the objects we are interested in.  It is conceivable, because it is not a matter of logical necessity, that ball A will spontaneously combust, or turn into a carrot, when it hits ball B.  Such a thing has never occurred before, but that does not mean it cannot happen.  Such thoughts have caused some people (most famously David Hume) to be skeptical about our ability to acquire knowledge through induction.

And yet, this is precisely what we do in the sciences.  Even in the absence of logical necessity, we believe that we know what will happen to ball A and ball B in these circumstances, and we can reliably predict what does in fact happen with a very small margin of error.  We even go so far as to form a law, a matter of physical necessity, to explain this relationship.

But what do we do in the face of competing "laws" that both explain the data we have?  Which theory do we go with and use for future research and development of theory?  This is the problem of induction.  How can we justify inductive inferences?  That is, how can we make universal or natural law claims based on experience, when so many alternative claims could be postulated?

Falsifiable

Enter Karl Popper.  His goal was to answer the problem of induction and to distinguish genuinely scientific theories from pseudo-scientific ones.  He observes that it is really easy to formulate a theory that explains the known data, since the theory is constructed using that very data (hindsight is 20/20).  While such a theory may be correct, one can think of many alternative theories that also explain the data.  How can one tell which theory to accept?

Popper answers that each theory must make so-called risky predictions, that is, predictions one should expect to be false unless the theory is right.  A theory that is not refutable is merely pseudo-scientific.  Once we have excluded the pseudo-scientific theories and are left with competing scientific theories, we can test them on the basis of what each predicts, focusing in particular on where their predictions disagree.  That is, each theory must propose hypotheses that are empirically tested after the theory has been formulated.

Conclusions are deduced from the theory, and these are then compared against each other to make sure that the theory is internally consistent, externally consistent with other unfalsified theories, and that when it makes a prediction, that prediction is correct.  When a theory fails to predict accurately, or is discovered to be inconsistent, it is falsified.  If it is consistent and does predict accurately, it is acceptable for use (although it may be falsified in the future).  In this, Popper proposes a deductive method of testing, in the form of modus tollens: if theory A is true, then X must occur; X did not occur; therefore, A is falsified.

In short, "Science in his view is a deductive process in which scientists formulate hypotheses and theories that they test by deriving particular observable consequences. Theories are not confirmed or verified. They may be falsified and rejected or tentatively accepted if corroborated in the absence of falsification by the proper kinds of tests" (Stanford Encyclopedia).  Theories are held to be true so far as they succeed in making predictions and surviving attempts at falsification.  Theories are judged by the deductive consequences of the hypotheses they make.

So a virtue of a theory is its ability to be falsified.  Theories that make stronger claims are more falsifiable because the predictions they make are bolder and, typically, more informative.

While this is all well and good, we still have a problem: we can have two theories that are both unfalsified and that make different predictions.  Which should we use until those predictions can be tested?  To answer, let's look at some other virtues that make a model good.

Elegance and Parsimony

A theory that is simpler is to be preferred over a more complex theory, all else being equal.  Simplicity can refer both to syntactic simplicity (the number and complexity of hypotheses in a theory; it is elegant) and to ontological simplicity (the number and kinds of entities postulated by the theory; it is parsimonious) (Stanford Encyclopedia).  Most famously, Occam's razor asserts that "entities must not be multiplied beyond necessity."

So why should we prefer more elegant and parsimonious theories?  That is, when faced with a choice between two theories that both explain the data equally well, why choose one over the other on the grounds that one is simpler?  To answer, let us consider the field of epistemology, that is, the study of knowledge.  Knowledge is classically said to consist in justified true belief.  When faced with competing theories, we are asking which theory we ought to believe to be true, so our focus is on the justification for each theory.  Now we have already said that each theory is consistent with the data, so what other grounds do we have for believing one theory to be more likely true than another?  Which is more justified?

The simpler theory is more likely to be true as a matter of probability.  Each entity in a theory has some probability of existing or of standing in the postulated relationship to the other entities.  So the more we multiply entities and relationships, the more probabilities we multiply together, and since each is less than 1, the overall probability falls.  For example, suppose one theory postulates 2 entities and another postulates 3.  If each entity has a probability of 0.75 of existing/having a certain relationship, then the former theory has a probability of (0.75)^2 = 0.56 of being true, versus (0.75)^3 = 0.42 for the latter.  Probabilistically speaking, you ought to prefer the former theory because it is more likely to be true, and since the theories are otherwise indistinguishable, you have no other reason to prefer the latter.
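
To make the arithmetic concrete, here is a minimal sketch in Python, using the illustrative 0.75 figure from above and assuming the entities' probabilities are independent:

```python
# Probability that a theory is true if each of its n postulated
# entities/relationships independently holds with probability p.
def theory_probability(n_entities: int, p: float = 0.75) -> float:
    return p ** n_entities

print(theory_probability(2))  # 0.5625 -- the simpler, 2-entity theory
print(theory_probability(3))  # 0.421875 -- the more complex, 3-entity theory
```
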
Or returning to epistemology and the notion of justification: you have no reason for choosing a more complex theory over a simpler theory when both are equally explanatory of the data.  Suppose, for example, that you return home and find that your house has been robbed.  What would you conclude?  You know that at least one person must have robbed your house.  But are you justified in believing that two people robbed your house?  What about an alien from outer space that came and robbed your house?  If you have no reason to believe that more than one person robbed your house (or that an alien robbed your house), then it seems you are not justified in believing so.  Instead, you must hold the theory that only a single robber broke into your house.  This is so even if, unknown to you, two robbers really did break into your house.  That is, for your belief to be justified, you must hold the simplest theory that explains the data.

Granted, judgements about which theories are more simple, elegant, and parsimonious can be subjective to a degree.  We may have disagreements in certain cases.  However, we all intuitively have some understanding of what we are talking about, and we can agree in many cases that one theory is simpler than another.

Predictive and "Accurate"

The next three virtues were mentioned in the discussion of falsifiability, but deserve more attention in their own right.  The first is that a model must be predictive.  This is related to being falsifiable, in that a falsifiable theory makes predictions that can be proven false.  But we are interested in theories that not only make predictions, but that make accurate predictions.  In Popper's terms, we want theories that are strongly falsifiable and have so far resisted falsification.  These are our best theories, from whose claims we have derived many relatively accurate predictions.  Consequently, they are extremely useful in advancing our understanding of the world and our interaction with it, according to our aims and purposes.

Coherence

In order not to be falsified, a theory must be internally and externally consistent.  We can think about this in terms of coherence.  First, the theory must be internally coherent: no claim the theory makes may contradict any of its other claims.  Such contradictions can be logical or, less strongly, physical.  Even better is the case where the claims support each other (without being simply alternative ways of saying the same thing).  Second, the theory must be externally coherent: it must not contradict any of our best scientific theories (unless it is deliberately challenging the existing paradigm).

Informative and Explanatory

While there are perhaps other virtues that could be considered, let us consider a final one here.  We do not want theories that are merely predictive and accurate.  We want to understand why.  Thus, we expect a good scientific theory to be informative, to explain why things are the way they are in the world.  It will postulate the causal mechanisms that explain why something happens the way the theory accurately predicts.  It will provide direction for new avenues of research in light of those causal explanations.  In short, we do NOT want a black box, no matter how accurate that black box may be.

Data Science and Model Virtues

So how can the above be applied to data science models?

Falsifiable

A data science model must be falsifiable.  It must make predictions (i.e., hypotheses) that are capable of being false, and that are tested accordingly.  This is why separating one's data into a training set, test set (and validation set) is so important: it keeps one's model falsifiable.  When one builds a model on all of the data, one can have an extremely accurate model when looking only at the data at hand.  However, one is in danger of overfitting: modeling aberrations, errors, outliers, or biases in the sample data, with the consequence that the model will not generalize to future data.  It has NOT captured the real relationships underlying the data.  Using a hold-out test set keeps your model honest and makes sure that it will generalize to data it has not seen before.
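
As a concrete illustration, here is a minimal sketch of the hold-out discipline using scikit-learn; the dataset and model are placeholders invented for the example, not anything from a real project:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for whatever you are modeling.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set the model never sees during fitting:
# this is what keeps the model's predictions falsifiable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Training accuracy alone can reward overfitting; the hold-out score
# is the "risky prediction" that can actually falsify the model.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))
```
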
Furthermore, doing so prevents you from refitting the model with each new addition of data.  If one were to receive data daily and retrain the model on it, and if that model changed significantly each day, how confident would you be that it was going to predict well?  If it predicts one thing today and something different tomorrow, then your model is not stable, it is not going to make accurate predictions, and it is no longer useful.  It is as though your model changes its mind every day, shifting with the wind, never subjected to critical scrutiny because it is always explaining the latest data without being held accountable for the inaccurate predictions it has made.  This would be a pseudo-scientific model.

Elegance and Parsimony

A data science model must be as simple as possible, or as simple as is necessary for one's purposes.  Why?  Again, it is more likely to be "true" in the sense that one is more likely to have captured the actual relationships among the independent variables and their relationship to the dependent variable.  But there is a challenge here, because simpler data science models tend not to be as accurate or predictive, and this can be due to excluding variables that are predictive, even to a small degree.  So we don't want a model that is too simple, and yet we don't want one that is too complex either, given the risk of overfitting.  We want a model that is as simple as possible without sacrificing accuracy, and one that generalizes well when tested (see the sketch below).
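
One hedged way to walk this trade-off is lasso (L1) regularization, which zeroes out the coefficients of weakly predictive variables while cross-validation arbitrates how much parsimony accuracy can afford; the data below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data: 50 candidate features, only 5 truly informative.
X, y = make_regression(n_samples=500, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV picks the penalty strength by cross-validation: stronger
# penalties zero out more coefficients (more parsimony), and the CV
# score shows when parsimony starts costing predictive accuracy.
model = LassoCV(cv=5, random_state=0).fit(X, y)

n_kept = int(np.sum(model.coef_ != 0))
print(f"chosen alpha: {model.alpha_:.3f}")
print(f"features kept: {n_kept} of {X.shape[1]}")
```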


Predictive and "Accurate"

A data science model must be predictive and accurate.  This is the whole point!  We want to accurately predict unknown values.  If a model doesn't do this, it doesn't matter if it is elegant or falsifiable.  It isn't true.  It does not accurately model reality.  Your model must generalize to new data.
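
One standard way to estimate how well a model will generalize is k-fold cross-validation, sketched here on placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# k-fold cross-validation repeatedly holds out one fold for testing,
# giving a distribution of out-of-sample accuracy rather than a
# single, possibly lucky, number.
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=5)
print("accuracy per fold:", scores.round(3))
print("mean:", round(scores.mean(), 3), "std:", round(scores.std(), 3))
```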

Coherence

A data science model must be coherent.  I suppose one could have a model that contains a variable that is nearly the opposite of another variable in the model, and the model could use them both.  While possible, I am not sure both variables would survive even minimal feature selection.  Nevertheless, if your data science model is incoherent in some way, correct it, or look into why your model is paradoxical in this way.
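
The "nearly the opposite" case can at least be screened for mechanically before modeling.  A minimal sketch, assuming pandas (the 0.95 threshold is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

def flag_incoherent_pairs(df: pd.DataFrame, threshold: float = 0.95):
    """Return feature pairs whose correlation magnitude exceeds threshold,
    e.g. a variable that is nearly the opposite of another (corr near -1)."""
    corr = df.corr()
    cols = corr.columns
    return [(cols[i], cols[j], round(corr.iloc[i, j], 3))
            for i in range(len(cols))
            for j in range(i + 1, len(cols))
            if abs(corr.iloc[i, j]) > threshold]

# Tiny demonstration: feature 'b' is (noisily) the negative of 'a'.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({"a": a,
                   "b": -a + rng.normal(scale=0.01, size=200),
                   "c": rng.normal(size=200)})
print(flag_incoherent_pairs(df))  # [('a', 'b', -1.0)]
```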


Informative and Explanatory

Does your data science model explain, inform, or illuminate the relationships among the variables you are using to predict?  That is, when you look at the coefficients of your linear model, or the branches in your decision tree, do you understand something new or get an "aha"?  A good model will help us understand what is really going on.  This is especially important when one wants to know what action to take.  Is it better to add square footage or a new roof if one is trying to improve the resale value of a home?  A good model should be able to answer this and quantify the answer.  This is where avoiding overfitting is so important: an overfitted model will not have stable or reliable relationships among its variables, and so it cannot be relied on for informing decisions.

There is a downside, though, in that some types of models (e.g., neural networks) are extremely accurate but very difficult to interpret.  This is not always a problem: if one does not need to understand why the model predicted the way it did, and the outcome is all that matters, then interpretability is not as important.
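
When interpretability is the goal, inspecting a fitted linear model's coefficients is the simplest version of the "aha" above.  A sketch using the square-footage/roof example; the data is entirely made up for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical housing data; the column names echo the example above,
# and the numbers are invented purely for illustration.
df = pd.DataFrame({
    "sqft_added":  [0, 100, 250, 0, 400, 150, 0, 300],
    "new_roof":    [0, 0, 1, 1, 0, 1, 0, 1],
    "resale_gain": [0, 5000, 18000, 8000, 21000, 15000, 500, 23000],
})

model = LinearRegression().fit(df[["sqft_added", "new_roof"]],
                               df["resale_gain"])

# Each coefficient is the model's estimate of the marginal effect of
# that action with the other held fixed -- the quantified "why".
for name, coef in zip(["sqft_added", "new_roof"], model.coef_):
    print(f"{name}: {coef:,.1f}")
```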


Conclusion

Unfortunately, there are no hard and fast rules here for how to create good models, which is why model development in data science can be likened to an art or skill.  But with practice, one can develop that skill.  Consider these high-level principles when creating your models.  Refer back to them, and work to make sure that either your models have these virtues or you have really good reasons when they lack them.  If you do so, you will have a good model.


Wednesday, December 20, 2017

2017 NFL Game Predictions: Rest of Season, Playoffs, Best Team

Welcome!  I intend this to be an ongoing project of predicting NFL game outcomes, point spreads, and final scores.  My hope is that my models will improve over time as they become more sophisticated and use better data.  I will try to publish predictions regularly to keep myself accountable, and will make observations along the way about what is working and what isn't.  See below for the latest updates.  Enjoy!


  _________________________________________________________________________________

Welcome!  In years past I have predicted NFL scores and outcomes through the season and into the playoffs and Superbowl.  With work, school, and family life, this was not possible this year.  However, I managed to scrape together a little time to run the code to see what the rest of the season looked like.  With a few minor changes to the code, I was able to run predictions for weeks 16 and 17, the playoffs, the Superbowl, and a "best team" prediction.  Given that I may not have time to revisit this, I lay out the predictions for the rest of the season below.

Week 16:
Not having followed much of the NFL this year, I am not sure I can really comment on the below games.  But from what little I know (a quick sanity check using win/loss comparisons), the below outcome predictions seem reasonable, although some of the probabilities strike me as a little strange.





Week | Date | Team | HomeAway | Opponent | ProbabilityWin | PredictedTeamWin
---- | ---- | ---- | -------- | -------- | -------------- | ----------------
16 | 12/23/2017 | Colts | @ | Ravens | 0.133 | 0
16 | 12/23/2017 | Packers |  | Vikings | 0.211 | 0
16 | 12/23/2017 | Ravens |  | Colts | 0.867 | 1
16 | 12/23/2017 | Vikings | @ | Packers | 0.789 | 1
16 | 12/24/2017 | 49ers |  | Jaguars | 0.406 | 0
16 | 12/24/2017 | Bears |  | Browns | 0.727 | 1
16 | 12/24/2017 | Bengals |  | Lions | 0.295 | 0
16 | 12/24/2017 | Bills | @ | Patriots | 0.371 | 0
16 | 12/24/2017 | Broncos | @ | Redskins | 0.257 | 0
16 | 12/24/2017 | Browns | @ | Bears | 0.273 | 0
16 | 12/24/2017 | Buccaneers | @ | Panthers | 0.075 | 0
16 | 12/24/2017 | Cardinals |  | Giants | 0.866 | 1
16 | 12/24/2017 | Chargers | @ | Jets | 0.774 | 1
16 | 12/24/2017 | Chiefs |  | Dolphins | 0.628 | 1
16 | 12/24/2017 | Cowboys |  | Seahawks | 0.399 | 0
16 | 12/24/2017 | Dolphins | @ | Chiefs | 0.372 | 0
16 | 12/24/2017 | Falcons | @ | Saints | 0.447 | 0
16 | 12/24/2017 | Giants | @ | Cardinals | 0.134 | 0
16 | 12/24/2017 | Jaguars | @ | 49ers | 0.594 | 1
16 | 12/24/2017 | Jets |  | Chargers | 0.226 | 0
16 | 12/24/2017 | Lions | @ | Bengals | 0.705 | 1
16 | 12/24/2017 | Panthers |  | Buccaneers | 0.925 | 1
16 | 12/24/2017 | Patriots |  | Bills | 0.629 | 1
16 | 12/24/2017 | Rams | @ | Titans | 0.581 | 1
16 | 12/24/2017 | Redskins |  | Broncos | 0.743 | 1
16 | 12/24/2017 | Saints |  | Falcons | 0.553 | 1
16 | 12/24/2017 | Seahawks | @ | Cowboys | 0.601 | 1
16 | 12/24/2017 | Titans |  | Rams | 0.419 | 0
16 | 12/25/2017 | Eagles |  | Raiders | 0.813 | 1
16 | 12/25/2017 | Raiders | @ | Eagles | 0.187 | 0
16 | 12/25/2017 | Steelers | @ | Texans | 0.948 | 1
16 | 12/25/2017 | Texans |  | Steelers | 0.052 | 0

Week 16 Results:
  • Estimated: 11.743 correct (about 12), so 4 incorrect (this is the model's summed confidence over the 16 games; see the sketch after this list)
  • Actual: 14 - 2
    • Jaguars lost to the 49ers
    • Bengals beat the Lions
  • Comments
    • Seahawks may actually make it!  I have the Falcons losing in week 17 and the Seahawks winning.  If both happen, the Seahawks make the wildcard spot!
    • I have the Titans losing to the Jaguars, so they would go 8-8.  The Chargers will beat the Raiders, so they will go 9-7.  The Bills will beat the Dolphins, so they will go 9-7.  Since the Chargers beat the Bills earlier this season, I'd now pick the Chargers to get into the playoffs over the Bills.
    • I'm a little shaken in my confidence in the Jaguars as the team to win the Superbowl.  The 49ers are looking really good, though, and are a different team with Jimmy Garoppolo, having won 4 in a row since he became the starter.  So perhaps it is not the end of the world for my audacious predictions.
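
For reference, the "estimated correct" figure is just the model's summed confidence over the 16 games, i.e., max(p, 1 - p) for each game's predicted outcome.  A sketch of that bookkeeping, assuming the predictions sit in a pandas DataFrame with one row per game (the table above lists each game twice, once per team):

```python
import pandas as pd

def expected_correct(preds: pd.DataFrame) -> float:
    """Sum of the model's confidence in each predicted game outcome.

    Assumes one row per game with a ProbabilityWin column for the
    listed team; confidence in the predicted outcome is max(p, 1 - p).
    """
    p = preds["ProbabilityWin"]
    return float(p.where(p >= 0.5, 1 - p).sum())

# Summing max(p, 1 - p) over the 16 unique Week 16 games above
# gives 11.743, the estimate reported in the results.
```
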
Week 17:
What about week 17?  Again, hard to comment since I haven't been following closely.  But the below again looks reasonable.
Week | Date | Team | HomeAway | Opponent | ProbabilityWin | PredictedTeamWin
---- | ---- | ---- | -------- | -------- | -------------- | ----------------
17 | 12/31/2017 | 49ers | @ | Rams | 0.416 | 0
17 | 12/31/2017 | Bears | @ | Vikings | 0.233 | 0
17 | 12/31/2017 | Bengals | @ | Ravens | 0.375 | 0
17 | 12/31/2017 | Bills | @ | Dolphins | 0.652 | 1
17 | 12/31/2017 | Broncos |  | Chiefs | 0.129 | 0
17 | 12/31/2017 | Browns | @ | Steelers | 0.072 | 0
17 | 12/31/2017 | Buccaneers |  | Saints | 0.089 | 0
17 | 12/31/2017 | Cardinals | @ | Seahawks | 0.259 | 0
17 | 12/31/2017 | Chargers |  | Raiders | 0.817 | 1
17 | 12/31/2017 | Chiefs | @ | Broncos | 0.871 | 1
17 | 12/31/2017 | Colts |  | Texans | 0.473 | 0
17 | 12/31/2017 | Cowboys | @ | Eagles | 0.338 | 0
17 | 12/31/2017 | Dolphins |  | Bills | 0.348 | 0
17 | 12/31/2017 | Eagles |  | Cowboys | 0.662 | 1
17 | 12/31/2017 | Falcons |  | Panthers | 0.419 | 0
17 | 12/31/2017 | Giants |  | Redskins | 0.093 | 0
17 | 12/31/2017 | Jaguars | @ | Titans | 0.591 | 1
17 | 12/31/2017 | Jets | @ | Patriots | 0.101 | 0
17 | 12/31/2017 | Lions |  | Packers | 0.737 | 1
17 | 12/31/2017 | Packers | @ | Lions | 0.263 | 0
17 | 12/31/2017 | Panthers | @ | Falcons | 0.581 | 1
17 | 12/31/2017 | Patriots |  | Jets | 0.899 | 1
17 | 12/31/2017 | Raiders | @ | Chargers | 0.183 | 0
17 | 12/31/2017 | Rams |  | 49ers | 0.584 | 1
17 | 12/31/2017 | Ravens |  | Bengals | 0.625 | 1
17 | 12/31/2017 | Redskins | @ | Giants | 0.907 | 1
17 | 12/31/2017 | Saints | @ | Buccaneers | 0.911 | 1
17 | 12/31/2017 | Seahawks |  | Cardinals | 0.741 | 1
17 | 12/31/2017 | Steelers |  | Browns | 0.928 | 1
17 | 12/31/2017 | Texans | @ | Colts | 0.527 | 1
17 | 12/31/2017 | Titans |  | Jaguars | 0.409 | 0
17 | 12/31/2017 | Vikings |  | Bears | 0.767 | 1

Week 17 Results:
  • Estimated: 11.8 (so roughly 12 correct, 4 incorrect)
  • Actual: 7-9
  • Comments
    • I really need to look at what happens when a team has secured a playoff spot and has nothing to gain by winning their last game.  It makes sense for a team in this position to not make an effort to win the last game, to rest their players, avoid injuries, and prepare for the playoffs.  Although I haven't watched the replays, I suspect this can help explain why the Eagles, Jaguars, Panthers, Rams, and Saints all lost.  Maybe next year I can adjust my model to account for this factor.
    • Falcons won and the Seahawks lost, so the Falcons made the playoffs after all.  The missed field goal at the end of the game is, in a way, a summary of the Seahawks entire season: at times, greatness, but too often, a miss.
    • In the AFC, the Ravens are out and the Titans are in the playoffs.  The Bills made it in over the Chargers.  Lots of tie-breaking rules at play here.  Four teams tied at 9-7: Titans, Ravens, Chargers, and Bills.  See here for how the tie-breaking worked out.
    • In the NFC, all of the teams made it that I thought would, although the Rams are seeded at 3 and the Saints are seeded at 4.  Panthers, Saints, and Rams tied at 11-5.  The Rams beat the Saints earlier in the season so they go ahead of the Saints, and the Saints beat the Panthers, so they go ahead of the Panthers.
    • Playoff impact?
      • Titans vs. Chiefs - my model picks the Titans to win and the Chiefs to lose.  The Titans will then lose to the Patriots, so that doesn't change.
      • Rams vs. Falcons - the Rams beat the Falcons and then play the Vikings but lose, so the Vikings continue.
      • Panthers vs. Saints - my model predicts a Panthers victory.  The Panthers then play the Eagles and lose, so the Eagles go on.
      • So while things are initially different, the ultimate outcome is not changed (based on my model from two weeks ago, which should probably be updated).

Playoffs and Superbowl


So what if the above is mostly correct?  Who will get into the playoffs?  My current predictions suggest the following by seed:
  • AFC - Patriots (1), Steelers (2), Jaguars (3), Chiefs (4), Ravens (5), Bills (6)
    • Patriots beat the Steelers, so in an expected tie in record, the Patriots get the first seed.
    • The Chiefs beat the Chargers twice, so in the case of a tie, they get the division.
    • Ravens will likely go 10-6, so they would have the top wildcard spot.
    • The Bills lost to the Chargers, so if they tie, the Chargers would go in ahead of them.  Neither of them has played the Titans.  The Bills have a slightly higher probability of winning, so I am giving them the 6th seed, although it could be the Chargers or Titans.
  • NFC - Eagles (1), Vikings (2), Saints (3), Rams (4), Panthers (5), Falcons (6)
    • Eagles will likely go 14-2 and get the top spot, with the Vikings second.
    • Saints beat the Panthers twice and I expect them to go 12-4, so I gave them the third seed.  I also expect the Panthers to go 12-4, but the Saints would win the tie-break.
    • Rams should win the West and get the 4 seed.
    • Panthers then get the highest wildcard spot (5), and the Falcons get the last one (6).
Sadly (at least for me), it looks like the Seahawks are out of the playoffs. But clearly, a lot can still change...

Team | Remaining Predicted Wins | Remaining Predicted Losses | Previous Wins | Previous Losses | Total Wins | Total Losses | Rounded Wins | Rounded Losses | Conference | Division | Conference Rank
---- | ------------------------ | -------------------------- | ------------- | --------------- | ---------- | ----------- | ------------ | -------------- | ---------- | -------- | ---------------
Patriots | 1.528 | 0.472 | 11 | 3 | 12.528 | 3.472 | 13 | 3 | AFC | East | 1
Steelers | 1.876 | 0.124 | 11 | 3 | 12.876 | 3.124 | 13 | 3 | AFC | North | 2
Jaguars | 1.185 | 0.815 | 10 | 4 | 11.185 | 4.815 | 11 | 5 | AFC | South | 3
Ravens | 1.492 | 0.508 | 8 | 6 | 9.492 | 6.508 | 9 | 7 | AFC | North | 4
Chiefs | 1.499 | 0.501 | 8 | 6 | 9.499 | 6.501 | 9 | 7 | AFC | West | 5
Bills | 1.023 | 0.977 | 8 | 6 | 9.023 | 6.977 | 9 | 7 | AFC | East | 6
Titans | 0.828 | 1.172 | 8 | 6 | 8.828 | 7.172 | 9 | 7 | AFC | South | 7
Chargers | 1.591 | 0.409 | 7 | 7 | 8.591 | 7.409 | 9 | 7 | AFC | West | 8
Dolphins | 0.72 | 1.28 | 6 | 8 | 6.72 | 9.28 | 7 | 9 | AFC | East | 9
Raiders | 0.37 | 1.63 | 6 | 8 | 6.37 | 9.63 | 6 | 10 | AFC | West | 10
Bengals | 0.67 | 1.33 | 5 | 9 | 5.67 | 10.33 | 6 | 10 | AFC | North | 11
Broncos | 0.386 | 1.614 | 5 | 9 | 5.386 | 10.614 | 5 | 11 | AFC | West | 12
Jets | 0.327 | 1.673 | 5 | 9 | 5.327 | 10.673 | 5 | 11 | AFC | East | 13
Texans | 0.579 | 1.421 | 4 | 10 | 4.579 | 11.421 | 5 | 11 | AFC | South | 14
Colts | 0.606 | 1.394 | 3 | 11 | 3.606 | 12.394 | 4 | 12 | AFC | South | 15
Browns | 0.345 | 1.655 | 0 | 14 | 0.345 | 15.655 | 0 | 16 | AFC | North | 16
Eagles | 1.475 | 0.525 | 12 | 2 | 13.475 | 2.525 | 13 | 3 | NFC | East | 1
Vikings | 1.556 | 0.444 | 11 | 3 | 12.556 | 3.444 | 13 | 3 | NFC | North | 2
Saints | 1.464 | 0.536 | 10 | 4 | 11.464 | 4.536 | 11 | 5 | NFC | South | 3
Panthers | 1.506 | 0.494 | 10 | 4 | 11.506 | 4.494 | 12 | 4 | NFC | South | 4
Rams | 1.165 | 0.835 | 10 | 4 | 11.165 | 4.835 | 11 | 5 | NFC | West | 5
Falcons | 0.866 | 1.134 | 9 | 5 | 9.866 | 6.134 | 10 | 6 | NFC | South | 6
Seahawks | 1.342 | 0.658 | 8 | 6 | 9.342 | 6.658 | 9 | 7 | NFC | West | 7
Lions | 1.442 | 0.558 | 8 | 6 | 9.442 | 6.558 | 9 | 7 | NFC | North | 8
Cowboys | 0.737 | 1.263 | 8 | 6 | 8.737 | 7.263 | 9 | 7 | NFC | East | 9
Packers | 0.474 | 1.526 | 7 | 7 | 7.474 | 8.526 | 7 | 9 | NFC | North | 10
Redskins | 1.65 | 0.35 | 6 | 8 | 7.65 | 8.35 | 8 | 8 | NFC | East | 11
Cardinals | 1.125 | 0.875 | 6 | 8 | 7.125 | 8.875 | 7 | 9 | NFC | West | 12
Bears | 0.96 | 1.04 | 4 | 10 | 4.96 | 11.04 | 5 | 11 | NFC | North | 13
49ers | 0.822 | 1.178 | 4 | 10 | 4.822 | 11.178 | 5 | 11 | NFC | West | 14
Buccaneers | 0.164 | 1.836 | 4 | 10 | 4.164 | 11.836 | 4 | 12 | NFC | South | 15
Giants | 0.227 | 1.773 | 2 | 12 | 2.227 | 13.773 | 2 | 14 | NFC | East | 16

Suppose that the above playoff spot predictions come true.  Then what happens?  Using head-to-head predictions based on the season so far:
Wildcard:
Jaguars (3) vs. Bills (6) -> Jaguars (3) (0.569)
Chiefs (4) vs. Ravens (5) -> Ravens (5) (0.529)
Saints (3) vs. Falcons (6) -> Saints (3) (0.55)
Rams (4) vs. Panthers (5) -> Rams (4) (0.501)
Division:
Patriots (1) vs. Ravens (5) -> Patriots (1) (0.71)
Steelers (2) vs. Jaguars (3) -> Jaguars (3) (0.523)
Eagles (1) vs. Rams (4) -> Eagles (1) (0.501)
Vikings (2) vs. Saints (3) -> Vikings (2) (0.514)
Conference:
Patriots (1) vs. Jaguars (3) -> Jaguars (3) (0.542)
Eagles (1) vs. Vikings (2) -> Vikings (2) (0.501)
Superbowl:
Vikings (2) vs. Jaguars (3) -> Jaguars (3) (0.509)
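
Walking such a bracket is mechanical once you have a head-to-head predictor.  Here is a minimal sketch; the p_win function is a stub standing in for my model's head-to-head predictions, not the model itself:

```python
def advance(team_a: str, team_b: str, p_win) -> str:
    """Pick the more probable winner of one matchup.

    p_win(a, b) is assumed to return P(a beats b) from the model.
    """
    return team_a if p_win(team_a, team_b) >= 0.5 else team_b

def play_round(matchups, p_win):
    """Resolve a list of (team_a, team_b) pairs into a list of winners."""
    return [advance(a, b, p_win) for a, b in matchups]

# Example: the AFC wildcard round above, with a stubbed-in predictor.
stub = {("Jaguars", "Bills"): 0.569, ("Chiefs", "Ravens"): 0.471}
winners = play_round([("Jaguars", "Bills"), ("Chiefs", "Ravens")],
                     lambda a, b: stub.get((a, b), 0.5))
print(winners)  # ['Jaguars', 'Ravens']
```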

Shocking!  The Jaguars will beat the Steelers, then the Patriots, before beating the Vikings in the Superbowl!  I should note that I originally had the Eagles going all the way and winning the Superbowl, but after factoring injuries into my model, this was enough to push the Eagles out.  It's too bad Carson Wentz is injured; otherwise I would have the Eagles going all the way.

Best Team of 2017

If each team played every other team, who would come out on top?  If measured by total probability of wins, the Patriots, Vikings, and Steelers take 1, 2, and 3.  The Jaguars come in 4th.  The Eagles are down in 9th (before injuries, they were at 3).  The Giants, Texans, and Buccaneers take 32, 31, and 30.  If measured by total games won, then the Jaguars, Vikings, and Eagles take 1, 2, and 3, while the Giants, Colts, and Buccaneers take 32, 31, and 30.  The Patriots drop to number 7.
What does this mean?  In short, it means that the Patriots have the highest summed probability of winning across their games.  However, they are predicted to win fewer games than some other teams.  As such, they are less consistent: predicted to win with high probability in some games, but predicted to lose narrowly in others.  Meanwhile, the Vikings, for example, are the only team to appear in both top-three lists, meaning that they are expected to win with fairly high probability when predicted to win AND are predicted to win every game.  However, their total probability is lower than the Patriots', meaning they are predicted to have closer victories than the Patriots, but more victories in total.

I should note that before factoring in injuries, the Eagles were my top choice.  But after injuries and the loss of their QB, their ranking has significantly fallen.
The Vikings appear in both lists at 2.  The Jaguars are at 4 and 1.  The Patriots are at 1 and 7.  The Eagles are at 9 and 3.  The Steelers are at 3 and 6.  Any of these teams has a claim to being considered the best team of the season.  Originally, I would have said the Eagles, as I had predicted them to win the Superbowl and they appeared in both top-three lists.  After factoring in injuries, total probability of winning, and total games predicted to win, I am inclined to pick the Jaguars, with the Vikings barely behind.  If the Jaguars can beat the Steelers and the Patriots, and then beat the Vikings in the Superbowl as I have predicted, they will without a doubt be the best team of the year.  Thus, I conclude that the Jaguars are (or will be) the best team of the year.
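
For reference, the two rankings below come from the same round-robin predictions aggregated in two ways.  Here is a generic sketch of that aggregation (a reconstruction for illustration, not my actual code), assuming a matrix P where P[i, j] is the predicted probability that team i beats team j:

```python
import numpy as np

def round_robin_rankings(P: np.ndarray, teams: list):
    """Rank teams two ways from a full round-robin prediction matrix.

    P[i, j] is assumed to be the model's P(team i beats team j);
    the diagonal is ignored.
    """
    n = len(teams)
    off_diag = ~np.eye(n, dtype=bool)
    total_prob = (P * off_diag).sum(axis=1)          # summed win probability
    total_wins = ((P > 0.5) & off_diag).sum(axis=1)  # games predicted won
    by_prob = sorted(zip(teams, total_prob), key=lambda t: -t[1])
    by_wins = sorted(zip(teams, total_wins), key=lambda t: -t[1])
    return by_prob, by_wins
```
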
Total Probability:

Rank | Team | Expected Value
---- | ---- | --------------
1 | Patriots | 23.376
2 | Vikings | 22.234
3 | Steelers | 22.124
4 | Jaguars | 21.967
5 | Panthers | 21.914
6 | Rams | 21.623
7 | Saints | 21.236
8 | Falcons | 21.046
9 | Eagles | 20.896
10 | Chargers | 20.286
11 | Lions | 19.844
12 | Seahawks | 19.799
13 | Bills | 19.246
14 | Titans | 19.121
15 | 49ers | 18.623
16 | Ravens | 18.326
17 | Cowboys | 17.831
18 | Chiefs | 17.524
19 | Dolphins | 15.069
20 | Bengals | 15.046
21 | Packers | 14.251
22 | Redskins | 14.011
23 | Cardinals | 13.375
24 | Bears | 13.13
25 | Raiders | 11.364
26 | Jets | 10.437
27 | Broncos | 8.201
28 | Browns | 6.772
29 | Colts | 6.725
30 | Buccaneers | 6.511
31 | Texans | 6.315
32 | Giants | 3.777

Total Wins:

Rank | Team | Total Wins
---- | ---- | ----------
1 | Jaguars | 32
2 | Vikings | 31
3 | Eagles | 30
4 | Rams | 29
5 | Panthers | 28
6 | Steelers | 27
7 | Patriots | 26
8 | Saints | 25
9 | Falcons | 24
10 | Chargers | 23
11 | Lions | 22
12 | Seahawks | 21
13 | Bills | 20
14 | Titans | 19
15 | 49ers | 18
16 | Ravens | 17
17 | Cowboys | 16
18 | Chiefs | 15
19 | Dolphins | 14
20 | Bengals | 13
21 | Packers | 12
22 | Redskins | 11
23 | Cardinals | 10
24 | Bears | 9
25 | Raiders | 8
26 | Jets | 7
27 | Broncos | 6
28 | Texans | 5
29 | Browns | 4
30 | Buccaneers | 3
31 | Colts | 2
32 | Giants | 1
Averages:

Team | AverageRank | AverageWins
---- | ----------- | -----------
Jaguars | 2.5 | 26.9835
Vikings | 2 | 26.617
Eagles | 6 | 25.448
Rams | 5 | 25.3115
Panthers | 5 | 24.957
Patriots | 4 | 24.688
Steelers | 4.5 | 24.562
Saints | 7.5 | 23.118
Falcons | 8.5 | 22.523
Chargers | 10 | 21.643
Lions | 11 | 20.922
Seahawks | 12 | 20.3995
Bills | 13 | 19.623
Titans | 14 | 19.0605
49ers | 15 | 18.3115
Ravens | 16 | 17.663
Cowboys | 17 | 16.9155
Chiefs | 18 | 16.262
Dolphins | 19 | 14.5345
Bengals | 20 | 14.023
Packers | 21 | 13.1255
Redskins | 22 | 12.5055
Cardinals | 23 | 11.6875
Bears | 24 | 11.065
Raiders | 25 | 9.682
Jets | 26 | 8.7185
Broncos | 27 | 7.1005
Texans | 29.5 | 5.6575
Browns | 28.5 | 5.386
Buccaneers | 30 | 4.7555
Colts | 30 | 4.3625
Giants | 32 | 2.3885

Conclusion

So there you have it.  If you haven't been paying attention all season, now, having read this post, you can speak somewhat accurately about which teams are good, which are bad, and which are likely to make the playoffs or do well in them.  Just make sure no one asks you why...
Good luck!