Last week, Clay Travis of Fanhouse and Claynation wrote an article about the improbability that 3 NFL players from one university would cause fatalities due to alcohol-impaired driving. When Joel pointed it out here at RTT, a few eyebrows were raised about its implications (and mine were raised about the math). So with a little number-crunching of my own and the gracious help of Clay Travis, I took a second look at the figures. After the jump, I'll give my thoughts on the numbers and the results of my math (along with the equations and some explanation). Since the subject matter is such a serious one, however, let me first say a few things and hopefully avoid unnecessary misunderstandings.
- I am not trying to diminish the events at hand. Fatality is fatality and tragedy is tragedy.
- I am not trying to defend UT. This is solely a look at the numbers and some thoughts on how they may be interpreted. If there is/was a culture at UT that raised the odds of these things happening, then it would be in my interest to know this and try to help correct it rather than to try to hide it. I live in Knoxville, after all.
- I am not trying to defend these three players.
- However, I am trying to de-mystify the statistical process a little bit. There's a big difference between establishing odds and making conclusions and I hope to discuss this a bit.
- Having run the numbers, the results in Clay's original article are accurate. I'm not going to try to disprove anybody here.
Onward. I'll put the thoughts first and the math second; if you're not one for following stats, you can simply skip the end stuff.
The Significance of the Numbers
The bottom-line number from Clay's article is 0.016% - the probability of one university being the alma mater to 3 football players who caused fatalities through alcohol-impaired driving. Roughly speaking, that's on the order of 1 in 10,000 (closer to 1 in 6,000 if you do the division). Because that probability is so low, it sets off alarm bells in our heads when we see it actually happen. Much like winning the lottery, hitting that 1 in 10,000 is seriously beating the odds. But simply calculating those odds is the easy part; the difficulty comes in interpreting the results. This is the point where most people fall short with statistical research; we all like to look at the number we just crunched and run with it. But now we have to figure out its place.
On Causation and Correlation
We've all likely heard the phrase "correlation is not causation," meaning that just because two things can be linked does not mean one necessarily caused the other to happen. We've also heard that "statistics cannot prove causation," which is a warning that the probabilities may link two events but do not tell us how they're related. In short, stats can tell you where to look (correlation), but they're not designed to tell you why the correlation exists (causation). At this point, it is up to the researcher to trace the relationship between the two events and find out the why for themselves. This is the real difficulty in statistics, and this is the step that is widely ignored.
You can actually establish causation through statistics, but only in certain cases and only with an insane amount of effort. Chances are you've never seen any statistical research with enough effort to do this. Generally, you look for correlation with stats and use other research methods to find causation. The stats tell you where to look and the research (hopefully) connects the dots.
Correlation and the 3 Incidents
Certainly, having all three players come from UT stands out. When 0.0567% of the driving population will cause an alcohol-related fatality over 11 years, it's an eye-opener to see that number jump to 1.71% for one school. The problem with the math at this point is that these numbers are so tiny that they cause mathematical problems. For example, if only 1 player came from UT, then the percentage would have been 0.571% rather than 1.71% - still tenfold greater than the national average. We can't even make the jump from zero to one without blowing past the national average, so understanding exactly how deviant that 1.71% mark is can be hard. You can gauge how significant the 3/175 is, but for the sake of brevity I'll leave that alone. We'll just say that we have something that could warrant a deeper look.
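To make the scale problem concrete, here is a minimal sketch that recomputes the percentages quoted above. The figures come from the post; the variable names are illustrative only.

```python
# Figures quoted in the post; the variable names are illustrative only.
DRIVERS = 220_000_000          # drivers in the US
FATAL_DRIVERS_11YR = 124_667   # drivers causing a DUI fatality over 11 years
ROSTER = 175                   # approximate UT players brought in, 1996 through 2001

# Share of all drivers who cause a fatality over the 11-year window.
national_rate = FATAL_DRIVERS_11YR / DRIVERS

for incidents in (1, 3):
    school_rate = incidents / ROSTER
    multiple = school_rate / national_rate
    print(f"{incidents} of {ROSTER}: {school_rate:.3%} "
          f"(about {multiple:.0f}x the national {national_rate:.4%})")
```

Even a single incident on a 175-player roster lands at roughly ten times the national rate, which is why a small roster cannot sit anywhere near the national average unless its count is exactly zero.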
Correlation and Strength of Evidence
The numbers that Clay points out are striking. For our purposes, let's maintain the proposition that they suggest there might be some linkage between UT and an increased likelihood of DUI fatalities. (That is, I'm not going through that math, so we'll keep it.) But how strong is this evidence? Honestly, it's really shaky. Again, there's that problem of scale. For a program to have just one alum in this category renders the program as an outlier because the probabilities are so astronomically low.
But there's something else to consider: the measuring stick we're using here is not well-suited to give this kind of information. The problem is that driving fatalities are a very uncommon event compared to other alcohol-related incidents. If we were to instead look at things like: public intoxication and other alcohol-related arrests; alcohol purchase and consumption rates by players and communities; rates of alcohol-related news stories; and mandatory and voluntary rehab attendance, we'd have data that is scaled much better to the problem at hand. Over the 11 years, the numbers used here suggest about 125,000 drivers guilty of manslaughter while driving impaired. During that same period of time, there are millions of alcohol-related arrests and incidents, countless alcohol purchases, and who knows how many AA (or similar) admittees. Those are numbers that would give a better picture of events because they don't border on the zero-probability limit.
And if you want to make a case against UT, there are definitely things to look at. We all remember Colquitt's 5th incident, for example. Also, UT is located in what was the heart of moonshining country for much of the 20th century. If you want to look to see if a problem exists/existed at UT, feel free to look. There are mountains of data to examine - data that don't rely on inherently unstable statistics.
Establishing the Link
The statistics, by themselves, are not a link between UT and DUI fatalities. The link is found after the stats tell you that there may be something worth investigating. But there are a lot of questions to answer. For example, Stallworth has been away from UT for a long time; what connection between him and UT would increase his odds at this point in time? How has his affiliation with the NFL (and his life after UT) affected him? Enough to alter UT's effect on him? What happened at UT to increase their likelihood of this failure? Did something happen?
How is UT different from other schools? Why would life at UT turn these players into a higher risk later on in life? What could have been done differently, if at all? Did UT do anything that turned out to lower their risk? (That's an interesting question that often gets ignored, but damping factors can be very important.)
My Conclusions to Date
If you haven't guessed already, I'm very leery of attempting to establish UT as a causal factor to the fatalities. Whether the connection exists or not, the numbers are very problematic to deal with and there are much better metrics available. That is not to say that I'm defending UT. Fatalities from DUIs are a serious affair, and no sports fan should ever factor team/school loyalty into the equation when dealing with this issue. The goal is to have zero fatalities and we should be excited whenever we can find ways to cut down the odds. At the same time, it's not healthy to go on a witch hunt and toss around accusations until we feel better. (That is, however, human nature.)
I don't know the other numbers I alluded to, but they're the place to start looking. And if we have reason to suspect a real connection, then with a new staff in place, now might be a great time to see how the risks can be cut down. And as a fan or as a rival, all of us should be supportive of any such effort - at UT or at any other school.
As to the numbers - they are indeed attention-getting.
[First, many apologies for the hard-to-read equations. After a half hour of fighting with the post editor and Word, these were the best I could manage. A readable Word 2007 copy of the equations may be found here. --hooper]
The basic question is: what are the odds that, of the 175(ish) players brought into UT from 1996 through 2001, 3 would cause fatalities while driving under the influence of alcohol over an 11-year period?
Here are the relevant numbers for the problem at hand.
- Number of people in US: 300,000,000
- Number of drivers in US: 220,000,000
- Number of fatalities/year: 17,000
- Average deaths per death-inducing incident: 1.5
- Number of drivers causing fatalities/year: 11,333 (equals 17,000/1.5)
- Number of years: 11
- Total number of fatalities in question: 187,000
- Total number of drivers causing fatalities: 124,667
- Total number of drivers not causing fatalities: 219,875,333
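The derived entries in the list above follow from the first few by simple arithmetic. A minimal sketch, assuming the per-year figures are held constant over all 11 years (variable names are illustrative):

```python
# Inputs taken from the list above; names are illustrative only.
DRIVERS = 220_000_000
FATALITIES_PER_YEAR = 17_000
DEATHS_PER_INCIDENT = 1.5
YEARS = 11

# One driver per incident, so drivers = deaths / (deaths per incident).
fatal_drivers_per_year = FATALITIES_PER_YEAR / DEATHS_PER_INCIDENT

total_fatalities = FATALITIES_PER_YEAR * YEARS
total_fatal_drivers = round(fatal_drivers_per_year * YEARS)
total_nonfatal_drivers = DRIVERS - total_fatal_drivers

print(f"fatal drivers per year: {fatal_drivers_per_year:,.0f}")
print(f"total fatalities:       {total_fatalities:,}")
print(f"total fatal drivers:    {total_fatal_drivers:,}")
print(f"total nonfatal drivers: {total_nonfatal_drivers:,}")
```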
To calculate the odds 'exactly' would be horrifically difficult, so a few simplifications are added to make things easier. For one, we're assuming those numbers are constant for each year. Another is that a driver can only cause 1 incident (after which the driver loses driving privileges and is likely in jail anyhow); this means there is no "replacement". A third assumption is that the odds of any given driver being 'that driver' are equal. (This is quite untrue, as some people like myself never drink and therefore cannot cause an alcohol-impaired fatality. But we're only looking for the ballpark, so this is useful.)
This is a "combination" problem. We're looking for the number of possible combinations of picking 175 drivers such that 3 of them caused fatalities while DUI. This may feel a bit backward: we started with 175 players prior to the incidents but now we're choosing 175 after the fact. But this works mathematically, so it's a more useful way to think about the problem. Here, we'll use a choose function (a/k/a a combination function), where the number of possible combinations is:

$$x = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

In English, x is the number of distinct combinations you can get when choosing k items out of n items.
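We're going to choose 175 people. Of those 175, 3 caused fatalities and 172 did not. So we have to choose 3 from the 124,667 drivers who caused fatalities and 172 from the 219,875,333 drivers who didn't. I'll call the number of possible combinations from the subsets of drivers x_fatal and x_nonfatal, respectively. Dividing their product by the number of ways to choose any 175 of the 220,000,000 drivers (call it x_total) gives the odds. The end goal is to fill in this relationship:

$$P = \frac{x_{\text{fatal}} \cdot x_{\text{nonfatal}}}{x_{\text{total}}} = \frac{\binom{124{,}667}{3}\,\binom{219{,}875{,}333}{172}}{\binom{220{,}000{,}000}{175}}$$

It's a mouthful in words, but each set of parentheses is a choose function. For example, the choose function for the first parentheses (x_fatal) is:

$$x_{\text{fatal}} = \binom{124{,}667}{3} = \frac{124{,}667!}{3!\,(124{,}667-3)!} \approx 3.2 \times 10^{14}$$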
So there are a lot of combinations here for any 3 drivers causing fatalities. Here is where big-number math gets fun:
You can't make Excel do this one directly. I also have MATLAB, and it won't work. You either have to get some software designed specifically for this or simplify. So let's simplify. That first equation (the choose function) can be written in shorthand as:
With this, the odds are:
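A sketch of what such a shorthand could look like, assuming it is the falling-product form of the choose function (an assumption chosen because it fits the simplifications that follow, not a confirmed transcription). It cancels the largest factorial before anything is evaluated:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}$$

Under that reading, each of the three choose functions in the odds becomes a product of only k terms (3, 172, or 175 of them) divided by k!, rather than a ratio of factorials of nine-digit numbers.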
To make the math workable, I'll now assume that the number of nonfatal drivers is effectively equal to the total number of drivers. Since 219,875,333 is really close to 220,000,000 and the total we're choosing is microscopically small in comparison, this won't hurt the answer too badly. Also, I'll allow for replacement, which means that the 220,000,000 will stay constant as I choose people. With these two simplifications, the problem becomes:
Again, the math is not directly doable because of the huge numbers. But a quick hand simplification gives:
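A reconstruction of the two steps just described, built from the surrounding text rather than from the original equations. It assumes that "allowing replacement" means treating each falling product n(n-1)...(n-k+1) as n^k, and that the nonfatal pool is rounded up to 220,000,000:

$$P \approx \frac{\dfrac{124{,}667^{3}}{3!}\cdot\dfrac{220{,}000{,}000^{172}}{172!}}{\dfrac{220{,}000{,}000^{175}}{175!}}$$

The powers of 220,000,000 cancel down to a cube in the denominator, and 175!/172! leaves 175 x 174 x 173 in the numerator:

$$P \approx \frac{175 \cdot 174 \cdot 173}{3!}\left(\frac{124{,}667}{220{,}000{,}000}\right)^{3}$$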
[Note: GAH! That 175*174*173 should be in the numerator, not the denominator. Massive error on my part there. I'll fix it when I get home tonight, but there you go.]
[Note 2: Fixed. --hooper]
That can be entered into Excel or your standard scientific calculator. You get 0.0001597, which is about 0.016% - exactly the number that Clay Travis received from his math buddy at Maryland. So with all of the assumptions listed above, the odds of this happening from a random draw are somewhere around 1 percent of 1 percent or, as Travis said, it would happen for about 1 of every 10,000 D-1A programs.
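As a cross-check, here is a minimal Python sketch that evaluates both the simplified expression and the unsimplified combination formula. Python integers have arbitrary precision, so `math.comb` (Python 3.8+) can handle the huge intermediate values directly. The function names are illustrative, and the simplified form assumes the with-replacement reading of the simplification described above.

```python
from math import comb

# Figures from the post; names are illustrative only.
DRIVERS = 220_000_000
FATAL = 124_667
NONFATAL = DRIVERS - FATAL
ROSTER = 175
HITS = 3


def simplified_odds() -> float:
    """Hand-simplified form: C(175, 3) * (fatal / drivers) ** 3."""
    return comb(ROSTER, HITS) * (FATAL / DRIVERS) ** HITS


def exact_odds() -> float:
    """Unsimplified form: choose 3 fatal and 172 nonfatal out of all drivers."""
    ways = comb(FATAL, HITS) * comb(NONFATAL, ROSTER - HITS)
    # True division of Python ints is safe here even though both are huge.
    return ways / comb(DRIVERS, ROSTER)


if __name__ == "__main__":
    print(f"simplified: {simplified_odds():.4%}")
    print(f"exact:      {exact_odds():.4%}")
```

The simplified value lands at about 0.016%, matching the figure above. The unsimplified value comes out somewhat lower, roughly 0.0145%, because rounding the nonfatal pool up to the full driver population drops the small chance that one of the other 172 picks is also a fatal driver. Either way the answer stays in the same ballpark.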