Answer = Abuse of Function
As you all know, I'm a big stats freak. I love putting up all kinds of pretty numbers and pictures to explain my arguments and to help shape my take on things in football. That said, statistics are easily overused, often misused, and sometimes completely abused beyond reason. So here, without actually getting all stat-geeky on you, I'm going to highlight a few uses of statistics that I've seen recently that bother me and how they could have been better managed.
Three points before I start, though: (1) The value in statistics is that they show you where to look, not how to fix things. For example, stats tell you that Tennessee's offense is horrible but the defense is great. With stats, you know to look at the offense to find room for improvement, while the defense can be largely left alone. That example is a bit obvious, but that's the general idea with stats. (2) I'm going to use examples from The Wiz of Odds, not because I'm calling him for abuse, but because he finds good tidbits of information and extends the logic too far at times. Jay is one of the better-read feeds in my list and I have a high regard for his stuff. I'd rather use useful information and improve it than illustrate points with junk information, so this is really more of a compliment than anything else. (3) It's not entirely a stats post here, but more of a numbers post. These are merely things I tend to see with quantitative writers more so than qualitative writers.
CASE 1: A SINGLE DATA POINT IS MEANINGLESS
Wiz pointed out this play recently, where a 2-star D-tackle absolutely owned a 4-star guard:
BJ Raji uses one Notre Dame player to tackle another. (via OsborneJC)
Wiz then concludes with a salient point - the star ranking doesn't always tell the story. He doesn't explicitly say that this proves the case, but he does use ATL_Eagle's take: "I don't want to spark the whole recruiting rankings debate again, but let's all agree that sometimes the experts get it wrong."
Plays like this often get used to illustrate that so-and-so (in this case, the DT) is better than such-and-such (here, the OG). The problem with this conclusion is that one data point effectively has a 100% margin of error: with a sample of one, there is no way to estimate how much the result varies from play to play. That is, the result could indicate a correlation, or it could mean absolutely nothing. You just don't know. So while the conclusion may actually be valid in a general sense, using one play does nothing to strengthen the argument.
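To put a rough number on "you just don't know," here is a minimal sketch in Python. It treats each snap as a win or loss for the defender and computes a Wilson score interval for his true win rate. That framing is my own simplification, not anything Wiz did, and the 30-of-50 line is a made-up comparison, not data from any game.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# One play, one win for the defender: the only thing actually observed.
one_play = wilson_interval(successes=1, trials=1)

# Hypothetical comparison: 30 wins over 50 snaps (made-up numbers).
many_plays = wilson_interval(successes=30, trials=50)

print(f"1 of 1:   {one_play[0]:.2f} to {one_play[1]:.2f}")
print(f"30 of 50: {many_plays[0]:.2f} to {many_plays[1]:.2f}")
```

With one play, the plausible range for the defender's win rate runs from roughly 0.21 all the way up to 1.00, which is another way of saying the play tells you almost nothing about the matchup in general. The range only tightens once you have a pile of snaps to work with.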
Remember, though, that stats, at best, merely tell us where to look. We already know to look at the matchup, so let's break it down:
- 19 seconds: Watching the snap, you'll notice that both players move at approximately the same time; this isn't a case of the defender getting off the snap quicker and beating the guard to the spot. However, note that the defender appears shorter than the guard. That's because (a) he's keeping his body lower for upcoming leverage, and (b) he is shorter.
- 20 seconds: Note the body positioning when contact is made. (See the freeze-frame below for a reference.) B.J. Raji (the BC D-tackle) has his left shoulder squarely into Eric Olsen's upper chest. That means the force of contact is mostly in-line with Raji's legs and back, giving him maximum use of his strength. Meanwhile, Olsen's contact point means the force is perpendicular to his legs, removing much of his strength from the struggle. Also note the legs: Raji's legs are coiled underneath him, allowing him to use his leg muscles in an attempt to push Olsen back. Meanwhile, Olsen's legs are much straighter. Even if Olsen had a better blocking position, he has very little leg strength available. Right now, Raji is a well-placed spring ready to unload, while Olsen is little more than a 300-lb board - and an upright one at that. (For a contrast, note the position of the other BC tackle; he's much straighter and his legs are fully extended; even without the double-team, he doesn't have a chance on this play.)
- 20-24 seconds: For the rest of this play, Raji simply uses his strength to drive Olsen back. Olsen actually does a respectable job of using lateral leverage to steer Raji outward; it's a shame for Olsen that it happened to be steerage right into the running back. By the conclusion of the play, Olsen actually had control, but Raji had done his job. At the very least, the play was disrupted and a lane was open for the linebackers to pursue. At best (i.e. what happened in reality), the running back is caught in the wash and brought down by obstruction.
Looking at the play tells you what a stat sheet can't: that Raji had better initial position and leverage, and Olsen made contact with no available power. Is Raji better? Perhaps. Was Raji better on that one play? Yes. However, if Raji were truly that much better than Olsen, we would have seen more plays like that throughout the game. This is a best-case illustration, where Raji's most notable result is used to stand in for the game overall. This is why a single data point is statistically useless. (The play's pretty cool, though.)
CASE 2: DISTORTION OF EFFECT - ABSOLUTE VERSUS RELATIVE COMPARISON
Early on in the season, Wiz started looking at the effect of the clock rule changes. From September 4th, he provides this chart:
The effect of the clock changes in 2006 and 2008 appears enormous. The problem is that it's hard for the mind to evaluate the actual degree of difference when the chart is zoomed in so closely (minimum of 125, maximum of 145). A more honest approach is to set the baseline to zero:
Now that the bars actually illustrate the average number of plays in a game, the meaning of the differences is easily comprehensible. It's still a tangible difference, but it doesn't automatically make you think that the NCAA has completely and utterly abandoned football for a commercial-thon. (Although, for the record, it still feels like that. If they have any sense, they'll change the clock rules back and bring back 4th quarter drama. Right now, endgames are boring.)
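For the numerically inclined, the distortion is easy to quantify. The sketch below compares how tall one bar looks relative to another when the axis starts at 125 versus when it starts at zero. One loud assumption: I'm plugging in the 2006 and 2007 plays-per-game figures from the table in Case 3 of this post, which may not match the exact values behind the September 4th chart. The function name is mine, purely for illustration.

```python
def apparent_ratio(taller: float, shorter: float, baseline: float = 0.0) -> float:
    """How many times taller one bar looks than another for a given axis baseline."""
    if shorter <= baseline:
        raise ValueError("the shorter bar must rise above the baseline")
    return (taller - baseline) / (shorter - baseline)


# Plays per game, taken from the Case 3 table (assumed to be close to the chart's values).
plays_2007 = 143.42
plays_2006 = 127.53

zoomed = apparent_ratio(plays_2007, plays_2006, baseline=125.0)
honest = apparent_ratio(plays_2007, plays_2006, baseline=0.0)

print(f"Axis starting at 125: the 2007 bar looks {zoomed:.1f}x as tall as 2006")
print(f"Axis starting at 0:   the 2007 bar looks {honest:.2f}x as tall as 2006")
```

Same data, same difference of about 16 plays. The zoomed axis makes one bar look several times the size of the other, while the zero baseline shows the real gap of roughly 12 percent.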
CASE 3: FAILURE TO ADDRESS NOISE IN THE SYSTEM
Sometimes a statistic is affected by more than one variable. Take, for example, the Wiz's recent look at the length of games over the last weekend:
Season  G    Plays/G  Time/G  Pts/G
2005    717  140.71   3:21    52.61
2006    792  127.53   3:07    47.53
2007    792  143.42   3:23    55.37
2008    614  134.59   3:11    52.32
Wk 11   55   134.38   3:12    51.09
The longest games of Week 11:
Alabama-Louisiana State: 3:49
Clemson-Florida State: 3:48
Oklahoma-Texas A&M: 3:43
Nevada-Fresno State: 3:36
North Carolina State-Duke: 3:35
Cincinnati-West Virginia: 3:35
Marshall-East Carolina: 3:30
The shortest games of Week 11:
Colorado State-Air Force: 2:42
Louisiana Tech-San Jose State: 2:45
Bowling Green-Ohio: 2:47
Arizona-Washington State: 2:51
Southern Mississippi-Central Florida: 2:52
Ohio State-Northwestern: 2:53
(Cut/Paste from this article by the Wiz.)
The italics on the Alabama-LSU game were added by me. Wiz's conclusion about the game: Another CBS telecast — Alabama at Louisiana State — is at the top of the list of longest games. Clearly, CBS stands for the Commercial Broadcasting System.
While the accusation may indeed be true, there is obvious interference. The Alabama-LSU game went into overtime. Naturally, an overtime game is going to be one of the longer games of the weekend. Yet the overtime effect is not mentioned. This is common with journalists and is usually a result of working to a word limit or a quick turnaround on an article. The logistics of the profession often create pressure to skip over relevant pieces of information. It's not that Wiz didn't know this; quite the opposite, I'm sure. But in today's skeptical society, it does hurt your conclusion to accuse CBS of commercialization when you skip past a relevant explanation for the game length. Even if CBS is the Commercial Broadcasting System.
To give credit, Wiz did point out that the Oklahoma-Texas A&M game included a team that routinely gets a very high number of offensive plays (that'd be Oklahoma). More plays create more time-stopping situations (incomplete passes, runs out of bounds, penalties, etc.). While each stoppage may be small, they do add up quickly, and you have something to look into.
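One crude way to separate "more plays" from "more dead time" is to divide game length by the number of plays. Here is a minimal sketch using only the averages from the table above. The helper names are mine, and the result lumps everything (commercials, halftime, reviews) into a single seconds-per-play figure, so treat it as a pointer on where to look and nothing more.

```python
def to_minutes(h_mm: str) -> int:
    """Convert an 'H:MM' game length such as '3:21' into total minutes."""
    hours, minutes = h_mm.split(":")
    return int(hours) * 60 + int(minutes)


def seconds_per_play(game_time: str, plays_per_game: float) -> float:
    """Average wall-clock seconds elapsed per play, all stoppages included."""
    return to_minutes(game_time) * 60 / plays_per_game


# (Plays/G, Time/G) copied from the table in this post.
averages = {
    "2005": (140.71, "3:21"),
    "2006": (127.53, "3:07"),
    "2007": (143.42, "3:23"),
    "2008": (134.59, "3:11"),
    "Wk 11": (134.38, "3:12"),
}

pace = {
    label: seconds_per_play(time, plays)
    for label, (plays, time) in averages.items()
}

for label, secs in pace.items():
    print(f"{label:>6}: {secs:.1f} seconds per play")
```

Run the same calculation on an individual game and compare it with the season figure. A long game played at a normal pace just had a lot of plays (or an overtime); a long game played at a slow pace is the one worth digging into.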
Remember, statistics tell you where to look, not what to conclude. It's the post-mortem analysis that finds the problems, which is the phase that far too many numbers-freaks ignore. Also, this is not a knock on Wiz; he does a good job of thinking through his numbers, but he does get caught by his journalism training every now and then.