The following post comes courtesy of RTT community member hooper. Do not mess with a guy who's this close to getting a Ph.D. in nuclear engineering. I'm bumping his conclusion to the top because frankly, it's the only part I understand:
Translation: I was right. Na-na-na-na-na-na. Oh, and Tennessee wins! Woo! Maybe. So whoa on woo.
Anyway, the meat is after the jump, but be warned, wicked math, charts, and graphs ahead. Make sure the safety goggles are snug before proceeding.
For the data given in "How do you solve a problem like McFadden, part II", the best fit of points is by yards:
Linear Fit
Points = -21.45121 + 0.3460472 Yards
Summary of Fit

| Statistic | Value |
| --- | --- |
| RSquare | 0.567489 |
| RSquare Adj | 0.495404 |
| Root Mean Square Error | 12.87058 |
| Mean of Response | 42 |
| Observations (or Sum Wgts) | 8 |
Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
| --- | --- | --- | --- | --- | --- |
| Model | 1 | 1304.0891 | 1304.09 | 7.8725 | 0.0309 |
| Error | 6 | 993.9109 | 165.65 | | |
| C. Total | 7 | 2298.0000 | | | |
Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
| --- | --- | --- | --- | --- |
| Intercept | -21.45121 | 23.06764 | -0.93 | 0.3883 |
| Yards | 0.3460472 | 0.123333 | 2.81 | 0.0309 |
Just look at the big red text and ignore the rest (the software package [I'm using] gives a lot more, but it’s easier to highlight than to edit further). Interpretation:
- RSquare (R²): Rush yardage explains a little over half (about 57%) of the variance in Arkansas’s points.
- Prob > F: A number below 0.05 is generally considered a sign that the model is statistically significant. Here it is 0.0309, so the relationship is unlikely to be a fluke of these eight games. In other words, the model is useful.
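For anyone who wants to see where the summary numbers come from, here is a minimal sketch that rebuilds them from the Analysis of Variance table above. It uses only the standard textbook formulas and the values printed in that table; the variable names are mine, not the software package's.

```python
import math

# Values copied from the Analysis of Variance table for the 8-game fit.
ss_model, ss_error, ss_total = 1304.0891, 993.9109, 2298.0
df_model, df_error = 1, 6
n_games = 8

# Share of the variance in points that the yardage line accounts for.
r_square = ss_model / ss_total
# Same idea, penalized for using up a degree of freedom on the slope.
r_square_adj = 1 - (1 - r_square) * (n_games - 1) / (n_games - df_model - 1)
# Typical size of a miss, in points.
rmse = math.sqrt(ss_error / df_error)
# Mean square for the model divided by mean square for the error.
f_ratio = (ss_model / df_model) / (ss_error / df_error)

print(f"RSquare     = {r_square:.6f}")
print(f"RSquare Adj = {r_square_adj:.6f}")
print(f"RMSE        = {rmse:.5f}")
print(f"F Ratio     = {f_ratio:.4f}")
```

The printed values should line up with the Summary of Fit and F Ratio entries above, up to rounding.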
However, notice that the two leftmost points really stand apart. Without them, the remaining points appear to trend very nicely. Treating them as outliers, I’ll remove them:
Linear Fit
Points = -280.3811 + 1.6089548 Yards
Summary of Fit

| Statistic | Value |
| --- | --- |
| RSquare | 0.585571 |
| RSquare Adj | 0.481964 |
| Root Mean Square Error | 9.124053 |
| Mean of Response | 48.5 |
| Observations (or Sum Wgts) | 6 |
Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
| --- | --- | --- | --- | --- | --- |
| Model | 1 | 470.50665 | 470.507 | 5.6518 | 0.0762 |
| Error | 4 | 332.99335 | 83.248 | | |
| C. Total | 5 | 803.50000 | | | |
Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
| --- | --- | --- | --- | --- |
| Intercept | -280.3811 | 138.3889 | -2.03 | 0.1127 |
| Yards | 1.6089548 | 0.676782 | 2.38 | 0.0762 |
Removing the Alabama and Auburn results, the model is really no better at explaining things (look at RSquare: it only moved from 0.567 to 0.586, and RSquare Adj actually dropped from 0.495 to 0.482). Not only that, the statistical significance is lower (Prob > F rose from 0.0309 to 0.0762, which is bad). Besides, do you really believe a model that predicts 25 points if Arkansas plays a team that allows an average of 190 rushing yards, but predicts 57 points if Arkansas plays a team that allows an average of 210? Me neither.
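To see how touchy the six-game line is, here is a small sketch that plugs 190 and 210 yards into both fitted equations. The coefficients are copied from the two "Linear Fit" lines above; the function name and layout are just for illustration.

```python
def predict_points(intercept, slope, yards):
    """Predicted Arkansas points from a fitted line Points = intercept + slope * Yards."""
    return intercept + slope * yards

# (intercept, slope) pairs copied from the two Linear Fit equations.
all_eight_games = (-21.45121, 0.3460472)
without_outliers = (-280.3811, 1.6089548)

for yards in (190, 210):
    full = predict_points(*all_eight_games, yards)
    trimmed = predict_points(*without_outliers, yards)
    print(f"{yards} yards: 8-game fit {full:5.1f} pts, 6-game fit {trimmed:5.1f} pts")
```

A 20-yard change moves the six-game prediction by roughly 32 points, against about 7 points for the eight-game fit, which is the swing being questioned above.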
Now, for some real fun (well, a statistician would think so).
Whole Model Test

| Model | -LogLikelihood | DF | ChiSquare | Prob>ChiSq |
| --- | --- | --- | --- | --- |
| Difference | 5.2925058 | 1 | 10.58501 | 0.0011 |
| Full | 5.82257e-8 | | | |
| Reduced | 5.2925059 | | | |

| Statistic | Value |
| --- | --- |
| RSquare (U) | 1.0000 |
| Observations (or Sum Wgts) | 8 |

Converged by Objective
Parameter Estimates

| Term | Flag | Estimate | Std Error | ChiSquare | Prob>ChiSq |
| --- | --- | --- | --- | --- | --- |
| Intercept | Unstable | 603.457952 | 149969.72 | 0.00 | 0.9968 |
| Yards | Unstable | -3.0404375 | 749.93609 | 0.00 | 0.9968 |

For log odds of L/W
Ignore the data junk.
This is a logistic regression, where wins and losses are modeled against yardage. (Ignore the vertical stuff and read the left-right of the graph to simplify things.) The nearly vertical blue line effectively says that if Arkansas plays a team who gives up an average of 200 or more rushing yards, they win, otherwise they lose. Nifty, huh? If you read the "Parameter Estimates" piece, you see the word "unstable" twice, and the standard errors are hundreds of times larger than the estimates themselves. This unfortunately tells you that the model is not reliable. So, while it sounds good, [it's] really not useful.
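The "200 or more" cutoff can be read straight off the Parameter Estimates: it is the yardage where the log odds of a loss cross zero. Here is a rough sketch using the two estimates from the table above; the function names are mine, and since both estimates are flagged unstable, treat the output as a reading of the blue line rather than a trustworthy number.

```python
import math

# Estimates copied from the Parameter Estimates table ("For log odds of L/W").
INTERCEPT = 603.457952
SLOPE = -3.0404375

def loss_probability(yards):
    """Probability of an Arkansas loss implied by the fitted log odds of L/W."""
    log_odds = INTERCEPT + SLOPE * yards
    return 1.0 / (1.0 + math.exp(-log_odds))

# Log odds are zero (a 50-50 game) where INTERCEPT + SLOPE * yards = 0.
break_even_yards = -INTERCEPT / SLOPE

print(f"Break-even yardage: {break_even_yards:.1f}")
for yards in (190, 210):
    print(f"{yards} yards: P(loss) = {loss_probability(yards):.4f}")
```

The break-even point lands a hair under 200 yards, and the probability flips from nearly 1 to nearly 0 within a few yards on either side, which is why the line on the graph looks almost vertical.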
The problem is the Kentucky game, where a really bad defense produced the same result as a really good defense. Since there are so few data points, it’s enough to throw the whole thing off. Removing Kentucky:
Whole Model Test

| Model | -LogLikelihood | DF | ChiSquare | Prob>ChiSq |
| --- | --- | --- | --- | --- |
| Difference | 4.1878871 | 1 | 8.375774 | 0.0038 |
| Full | 3.56579e-8 | | | |
| Reduced | 4.1878871 | | | |

| Statistic | Value |
| --- | --- |
| RSquare (U) | 1.0000 |
| Observations (or Sum Wgts) | 7 |

Converged by Objective
Parameter Estimates

| Term | Flag | Estimate | Std Error | ChiSquare | Prob>ChiSq |
| --- | --- | --- | --- | --- | --- |
| Intercept | Unstable | 77.8017059 | 26623.368 | 0.00 | 0.9977 |
| Yards | Unstable | -0.4704935 | 144.43734 | 0.00 | 0.9974 |

For log odds of L/W
Again, just ignore the data junk and watch the pretty blue line. Without Kentucky, the break point lies closer to about 167. If that number is eerily frightening, it should be; your estimate of Tennessee’s run defense came out to basically exactly this result. Again, the model has stability problems due to a lack of data points (it’s "underpowered" in stats lingo). Still, it’s as useful as anything else will be for predicting this game.
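One common way a logistic fit ends up with runaway, "unstable" estimates is when a handful of wins and losses split cleanly at a single cutoff, so an ever steeper curve always fits a bit better. The near-zero Full -LogLikelihood and RSquare (U) of 1.0000 in the tables look consistent with that, but that is my assumption about what is going on, not something the output states. The sketch below uses made-up toy data and a hypothetical cutoff of 200, not the actual Arkansas games, purely to show the effect.

```python
import math

# Made-up toy data, NOT the Arkansas games: yards allowed and a loss flag
# (1 = loss, 0 = win) that splits cleanly at a hypothetical cutoff of 200.
yards = [150, 160, 170, 180, 210, 220, 230, 240]
loss = [1, 1, 1, 1, 0, 0, 0, 0]
CUTOFF = 200.0

def neg_log_likelihood(steepness):
    """-LogLikelihood when log odds of a loss = steepness * (CUTOFF - yards)."""
    total = 0.0
    for x, y in zip(yards, loss):
        log_odds = steepness * (CUTOFF - x)
        # Margin is positive when the curve leans toward the observed result.
        margin = log_odds if y == 1 else -log_odds
        total += math.log1p(math.exp(-margin))
    return total

steepness_values = (0.1, 1.0, 10.0)
nlls = [neg_log_likelihood(k) for k in steepness_values]
for k, nll in zip(steepness_values, nlls):
    print(f"steepness {k:>4}: -LogLikelihood = {nll:.3g}")
```

Every step toward a steeper curve shrinks the -LogLikelihood, so the fit never settles on a finite slope. With so few games there is nothing in the data to pin the estimate down.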
Summary? Summary:
If the current information provides any prediction about this game, it’s that Tennessee beats Arkansas based on the average rush defense predictor. If the Kentucky game is treated as an anomaly, UT has about a 50-50 shot at beating Arkansas based on their average rush defense. That’s not exactly a revelation, but it does verify the stuff you wrote and gives some nice pretty numbers and pictures to play with.