(What to take from the preseason)
Edit Note: This is an excerpt from my prior post; on re-reading, I thought it stood better on its own.
The above image is a diagram of my informal model of the trade-off between expert opinion and data analysis as more observations are taken. Basically, the fewer observations one has, the more weight opinions on process from people with subject matter expertise should get. With more observations we can weight the numbers more heavily, though subject matter experts continue to retain value. Preseason definitely lies somewhere toward the upper left of the chart, with only seven or eight games and few extended outings by any one player.
The main things to look at in preseason are the same ones I think most of the coaches are looking at: first and foremost, whatever shiny new players the team signed in the offseason, whether they are free agents or rookies. Rookies, for example, tend to get much more run in the preseason than most of them will in the regular season, and it's the first time seeing them against legit NBA opponents alongside legit NBA teammates.
Last year I created a visualization matching preseason individual stats with regular season performance, which is here. In general, the stats we think of as stable, like rebounds, performed well, while scoring was only marginally related to preseason performance. (Sidenote: it is always important to understand that some of the 'stability' of stats like assists or rebounds is due to positional/role dependency; point guards will probably continue to have the ball and make passes, and centers will continue to stand near the basket.)
The data gained from preseason is generally more valuable for any player who either has little existing data, like rookies, or has a significant discontinuity between this season and previous seasons, and that's true of scouting or coaching observations too. For example, last year while building my win predictions, I found that the ASPM box score metric was less predictive season to season for any player who changed teams, which is similar to the findings of numerous other studies on box score data. Players like Derrick Rose or Kobe Bryant, coming off major injuries and essentially lost seasons, are also examples of a significant discontinuity.
We can also take advantage of the fact that we have other information about players prior to preseason games, essentially applying Bayesian principles. There's no inherent reason for preseason data to stand on its own, or for us not to combine that data with previous observations, just as there is no reason for scouts to discard their previous film reviews or prior in-person scouting every time the calendar turns over.
So, for example, looking at rookies' preseason numbers, we can and should assume that the numbers will show some mean reversion to the long-term rookie mean. Using the linear box score metric Alt. Win Score (AWS) that I used for my draft model, I ran a simple regression with both the raw AWS and a version stabilized by reversion to the mean. The Stabilized AWS showed a consistent, though slight, advantage in predicting the regular season AWS per 40 minutes.
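To make the idea of a stabilized rate concrete, here is a minimal sketch of one common way to regress a small-sample number toward a group mean. It is not necessarily the formula behind the Stabilized AWS above. The function name, the `prior_mean` value, and the `prior_minutes` "pseudo-minutes" weight are all hypothetical choices for illustration.

```python
def stabilized_rate(raw_per40, minutes, prior_mean, prior_minutes):
    """Shrink a per-40-minute rate toward a prior mean.

    The prior acts like `prior_minutes` of play at the `prior_mean` rate,
    so a player with few preseason minutes is pulled strongly toward the
    mean, while one with many minutes keeps most of the raw number.
    All parameter names and values here are illustrative assumptions.
    """
    if minutes < 0 or prior_minutes <= 0:
        raise ValueError("minutes must be >= 0 and prior_minutes must be > 0")
    weight = minutes / (minutes + prior_minutes)
    return weight * raw_per40 + (1 - weight) * prior_mean


# Hypothetical example: a rookie posting 10.0 AWS per 40 in 70 preseason
# minutes, with an assumed rookie mean of 5.0 and 200 pseudo-minutes of prior.
estimate = stabilized_rate(raw_per40=10.0, minutes=70, prior_mean=5.0, prior_minutes=200)
print(round(estimate, 2))
```

The larger `prior_minutes` is, the more playing time it takes before the preseason number outweighs the rookie mean, which is the same trade-off between prior knowledge and new observations described above.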
For rookies with as few as 70 minutes played in the preseason and at least 100 minutes in the regular season I found a .44 correlation (R2 .198) over the last three years (a period with very poor team win percent correlations from preseason play).
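For readers wondering how the correlation and the R2 relate: assuming the regression used a single predictor, R2 is just the square of the correlation. A correlation of about .445 before rounding squares to roughly .198, which matches the figures quoted. The sketch below computes both from paired values. The helper names are mine, and any data passed in would be the reader's own, not the sample used here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)


def r_squared_single_predictor(xs, ys):
    """R2 of a one-predictor least-squares fit, which equals r squared."""
    return pearson_r(xs, ys) ** 2
```

Calling `r_squared_single_predictor(preseason_aws, regular_season_aws)` on two matched lists (hypothetical variable names) would give the share of regular season variance accounted for by the preseason number alone.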
I am in the process of combining that preseason rookie data from 2010, 2012 and 2013 with the data from their pre-NBA seasons. Using the sample matched so far, I ran some regressions with both sets of data to predict their performance in their rookie year:
- The R2 in the complete set was .351, with sub-sample regression R2s ranging from approximately .26 to .44.
- The preseason numbers explained 40% of the model's predictive power, with sub-sample regressions running from approximately 30% to 50%.
- Offensive rebounds appeared to be the strongest preseason predictor of any single stat, though that appears to be a function of the rookie success of a few outstanding big men like Andre Drummond and Anthony Davis.
In addition to this general analysis, I plan to add to the cases and columns of data matched to date in order to try a more fully realized model and create a simulation run from that to apply to this year's class, as well as revisit a Bayes' theorem analysis I did on last year's rookie class.