So, do you like basketball and math? At least one of the two? I've got a fun exercise for you if you do.
Rocky Top Talk reader kidbourbon sent the following question, which I am happy to answer:
How do you use kenpom's pythag rating to get a predicted margin of victory?
This is an excellent question! But first, let's talk briefly about what "pythag" is, and then I'll show you how to use it to calculate score predictions. (I had a little discussion of pythag over at The BruceBall Blog a while back if you're interested.)
"pythag" is short for "Pythagorean Expected Winning Percentage." You can see why we like to abbreviate it, and "pythag" sounds a lot more appropriate than "PEWP." Anyway, the concept comes from baseball, which has a long and rich history of intense, complex statistical analysis. The basic idea was to calculate how many games a team should win, based on how many runs it scored and how many runs it allowed. It measures how well a team plays overall factoring out wins and losses due to luck or timing. Simple enough.
The basic baseball formula is
E(W%) = runs scored^2 / (runs scored^2+runs allowed^2).
At their root, baseball and basketball work on the same principle: you win by scoring more than you allow. So it's natural to try to use the same formula to assess the strength of a team, particularly as it relates to predicting future outcomes. There are two major differences: 1) many more points are scored in basketball, and 2) good teams win a far higher percentage of their games in basketball. Consequently, the exponent has to be changed to make it a realistic measure for basketball. Pomeroy, using the log5 formula and a whole slew of games, calculated that an exponent of 11.5 would be the most accurate for college basketball. Fine by me.
So for NCAA basketball, the formula becomes
E(W%) = points scored^11.5 / (points scored^11.5+points allowed^11.5).
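If you'd rather let a computer do the exponents, here's a minimal Python sketch of the formula. The function name `pythag` and its arguments are my own labels for illustration, not anything Pomeroy publishes; the default exponent is the 11.5 mentioned above, and passing 2 gives the basic baseball version.

```python
def pythag(points_scored, points_allowed, exponent=11.5):
    """Pythagorean expected winning percentage.

    exponent=11.5 is the college basketball value quoted in the text;
    exponent=2 reproduces the basic baseball formula.
    """
    scored = points_scored ** exponent
    allowed = points_allowed ** exponent
    return scored / (scored + allowed)


# Sanity check: a team that scores exactly what it allows should win half the time.
print(pythag(100.0, 100.0))  # 0.5
```

Note that only the ratio of scoring to allowing matters here, so it makes no difference whether you feed in points per game or points per 100 possessions, as long as both numbers use the same basis.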
Of course, in NCAA basketball, no two teams play the same schedule. So Pomeroy adjusts the points scored and allowed to account for schedule strength. Each team gets a scoring rate and a defensive scoring rate, both adjusted to an average schedule. Those are plugged into the above formula, and voila.
So there you go. A quick and dirty pythag introduction and a brief look at how it's calculated. Now, back to kidbourbon's question . . . so how do we use this number to predict margin of victory? I'll show you the series of required calculations, using the upcoming Kentucky@Tennessee game as an example.
Now, Pomeroy reports points scored and points allowed on a per-100-possession basis, calling them offensive and defensive efficiency. Adjusted for schedule, Tennessee's numbers are 117.7 (offense) and 89.5 (defense). Kentucky's are 108.0 and 92.8. However, homecourt advantage is not accounted for in these numbers. So for predictive purposes, kenpom gives the home team a 1.4% bonus on both ends (offense goes up 1.4%, points allowed go down 1.4%), and gives the visiting team the same bump in the other direction. After this adjustment, we have
Vols: 119.3 offense, 88.2 defense
Wildcats: 106.5 offense, 94.1 defense
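The homecourt adjustment can be sketched in a few lines of Python. This assumes the 1.4% bump is applied as a simple multiplier (times 1.014 or 0.986), which reproduces the numbers above; the names `HOME_EDGE` and `adjust_for_site` are made up for this example.

```python
HOME_EDGE = 0.014  # the 1.4% homecourt adjustment described in the text


def adjust_for_site(offense, defense, is_home):
    """Return (offense, defense) per 100 possessions after the site adjustment.

    Assumption: the bonus is multiplicative. The home team scores 1.4% more
    and allows 1.4% less; the visitor gets the reverse.
    """
    if is_home:
        return offense * (1 + HOME_EDGE), defense * (1 - HOME_EDGE)
    return offense * (1 - HOME_EDGE), defense * (1 + HOME_EDGE)


vols = adjust_for_site(117.7, 89.5, is_home=True)
cats = adjust_for_site(108.0, 92.8, is_home=False)
print("Vols:", [round(x, 1) for x in vols])      # [119.3, 88.2]
print("Wildcats:", [round(x, 1) for x in cats])  # [106.5, 94.1]
```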
From these adjusted numbers, we can recalculate pythag for each team using the basketball E(W%) formula above (the one with the 11.5 exponent). When we do, we see that Tennessee's pythag is now 0.970 and Kentucky's is 0.806. These are the two inputs we need to compute Tennessee's probability of winning.
I mentioned the log5 formula above, and that is what all of this is based on. The log5 formula gives the chance of a team winning a game, given their pythag and their opponent's pythag. For team A vs. team B, the log5 prediction for A's chance of winning is
P(W) = (A - A * B) / (A + B - 2*A*B).
For Tennessee, this becomes
P(W) = (0.970 - 0.970 * 0.806) / (0.970 + 0.806 - 2 * 0.970 * 0.806) = 88.6%.
If you look at kenpom's prediction for the UT-UK game, he lists the Vols' chances of winning at 89%. Bingo.
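For anyone who wants to check the arithmetic or try other matchups, the log5 step is a one-liner in Python. The function name `log5` is just my label for the formula above, and the inputs are the two site-adjusted pythag values from the text.

```python
def log5(a, b):
    """Chance that team A beats team B, given each team's pythag rating."""
    return (a - a * b) / (a + b - 2 * a * b)


vols_win = log5(0.970, 0.806)
print(f"{vols_win:.1%}")  # 88.6%
```

A nice property to sanity-check with: two equally rated teams come out at exactly 50%, and swapping the arguments gives the other team's chance, so the two probabilities always add up to 1.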
But what about the margin? That's what kidbourbon is looking for, after all.
First we have to know how many possessions to expect. Adjusted for schedule, Tennessee averages 72.3 possessions per game while Kentucky averages 64.9. Across all NCAA games, the average tempo is 67.3 possessions. So Tennessee plays at 107.4% of the average tempo, and Kentucky at 96.4%. To predict the tempo, we simply take UT's 107.4%, multiply it by Kentucky's 96.4%, and then multiply by the average number to get expected tempo:
E(tempo) = 107.4% * 96.4% * 67.3 = 69.7,
or 70 possessions. If you look at kenpom's prediction, you'll see this expected tempo in brackets next to the score prediction. With me so far? Good.
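Here's the tempo step as a small Python sketch. The function name `expected_tempo` and the `national_avg` parameter are my own naming; the default of 67.3 is the national average quoted above.

```python
def expected_tempo(tempo_a, tempo_b, national_avg=67.3):
    """Expected possessions when two teams with these adjusted tempos meet.

    Each team's tempo is expressed as a fraction of the national average,
    the two fractions are multiplied, and the result is scaled back up.
    """
    return (tempo_a / national_avg) * (tempo_b / national_avg) * national_avg


tempo = expected_tempo(72.3, 64.9)
print(round(tempo, 1))  # 69.7
```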
As it turns out, the calculation for the predicted score is very similar to the calculation for the expected tempo, which is why I introduced that one first.
For each team we'll need to compare their offense and defense to the average, just as we did with tempo. The average points per 100 possessions nationwide is 101.5. Dividing each home-adjusted figure by that average, Tennessee's offense comes to 117.6% of the average and its defense to 86.9%. For Kentucky, these numbers are 104.9% and 92.7%. Remember, we already added in the homecourt advantage, so these really represent how we expect UT to do at home and how we expect UK to do on the road.
Now getting each team's expected output is easy. We simply multiply their offense by the opposing defense and then by the average output:
Tennessee expected output = 117.6% * 92.7% * 101.5 = 110.6
Kentucky expected output = 104.9% * 86.9% * 101.5 = 92.6
Seem a little high? That's because these numbers are for 100 possessions. Recall that for this game, we expect 69.7 possessions. To adjust for that, multiply each output by (69.7/100). The result?
Tennessee 77, Kentucky 65. That's a predicted margin of 12 points for the Vols.
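The scoring step can be sketched the same way. This is only an illustration of the arithmetic above, not kenpom's actual code; `expected_points` and `NATIONAL_AVG_EFFICIENCY` are names I made up, and the inputs are the rounded, site-adjusted efficiencies and the 69.7-possession tempo from the text.

```python
NATIONAL_AVG_EFFICIENCY = 101.5  # points per 100 possessions, from the text


def expected_points(offense, opp_defense, tempo, avg=NATIONAL_AVG_EFFICIENCY):
    """Expected points for one team in a game of `tempo` possessions."""
    # Offense and opposing defense, each as a fraction of average, times average.
    per_100 = (offense / avg) * (opp_defense / avg) * avg
    # Scale from 100 possessions down to the expected game tempo.
    return per_100 * tempo / 100


tempo = 69.7
vols = expected_points(119.3, 94.1, tempo)  # UT offense vs. UK defense
cats = expected_points(106.5, 88.2, tempo)  # UK offense vs. UT defense
print(f"Tennessee {vols:.1f}, Kentucky {cats:.1f}, margin {vols - cats:.1f}")
```

Because this starts from the rounded inputs, the decimals can differ slightly from a calculation carried out at full precision, but the result should land right around the 77 to 65 score above, with a margin of roughly 12 to 13 points.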
If this seems like a long and complicated way to get a prediction for who wins and by how much, that's because it is. Thankfully, for real games kenpom has already done the calculating for us. But it's nice to know how to do this just in case you want to look at some hypotheticals. Does that answer your question, kidbourbon?