The Celtics got off to a better start to their preseason this year than last year, the first under Brad Stevens. After two twenty-point wins, optimism was running high in Celtics land, before it reverted after three losses in a row.
So, I wanted to quantify the weight one ought to give to preseason optimism or pessimism, as well as identify the things we most ought to watch for in preseason.
In the case of team performance, there is a definite, but weak, positive relationship between preseason performance and the regular season. The website 82games did a study on preseason win percentage back in 2006 and found a .401 correlation (R2 of .16) between preseason win percentage and regular season win percentage.
I did a similar regression using the seasons picking up from where 82games left off, 2007-2014, omitting the lockout season in 2011, which featured only two preseason games and a shortened training camp. My results were slightly lower using both point differential and win percentage, with correlations of .367 and .319 respectively (R2 of .139 and .102). Interestingly, the correlation for both has been even lower in the last three non-lockout years. Three years isn't enough for me to take out my protractor and draw a straight line projecting out, but it's an interesting note.
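For readers who want to see how a correlation turns into an R2, here is a minimal sketch in Python. The team win percentages are made up purely for illustration (they are not real NBA data), and `pearson_r` is a helper name introduced here, not something from the 82games study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up win percentages for six hypothetical teams (NOT real NBA data).
preseason_win_pct = [0.250, 0.375, 0.500, 0.625, 0.750, 0.875]
regular_win_pct = [0.390, 0.610, 0.330, 0.520, 0.450, 0.700]

r = pearson_r(preseason_win_pct, regular_win_pct)
r_squared = r ** 2  # with a single predictor, R2 is simply r squared

print(f"toy data: r = {r:.3f}, R2 = {r_squared:.3f}")
print(f"82games figure: r = .401 gives R2 = {0.401 ** 2:.2f}")
```

With one predictor, the share of variance explained is just the square of the correlation, which is why a correlation around .4 leaves most of the regular season unexplained.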
It's possible that the quickening turnover in coaches and shorter player contracts have made preseason more of a vehicle for discovery and experimentation, or that the growing talent divergence between East and West has made the largely intra-divisional preseason schedule even less indicative of the regular season (Hello Atlantic Division!). Or it's just noise.
As a side note, preseason does retain some independent information even when paired in a model with prior-year information, whether point differential or winning percentage. Historically, using both prior-year and preseason information has explained ~38% of a team's winning percentage (correlation of .61), with the preseason providing a third of the explanatory power. Of course, if you lose LeBron James or acquire 2008-vintage Kevin Garnett all bets are off: the 2011 Cavaliers and 2008 Celtics were the two biggest outliers, which is why one is better off using player-metric methods like Nathan Walker or Kevin Ferrigan do.
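One way to think about "a third of the explanatory power" is as the extra R2 the preseason adds on top of a prior-year-only model. The sketch below is an assumption about how such a split could be computed, not necessarily the method used above. It relies on the standard identity for the R2 of a two-predictor least-squares fit in terms of pairwise correlations, and every number in it is invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def two_predictor_r2(r_y1, r_y2, r_12):
    """R2 of a least-squares fit of y on x1 and x2, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Made-up win percentages for six hypothetical teams (NOT real NBA data).
prior_year = [0.30, 0.45, 0.50, 0.55, 0.60, 0.70]
preseason = [0.50, 0.25, 0.625, 0.375, 0.75, 0.625]
regular = [0.35, 0.40, 0.55, 0.50, 0.65, 0.68]

r_y_prior = pearson_r(regular, prior_year)
r_y_pre = pearson_r(regular, preseason)
r_prior_pre = pearson_r(prior_year, preseason)

r2_full = two_predictor_r2(r_y_prior, r_y_pre, r_prior_pre)
r2_prior_only = r_y_prior ** 2

# Assumed definition: preseason's share is the incremental R2 over prior year alone.
preseason_share = (r2_full - r2_prior_only) / r2_full

print(f"R2 prior year only: {r2_prior_only:.3f}")
print(f"R2 with preseason added: {r2_full:.3f}")
print(f"share credited to preseason: {preseason_share:.1%}")
```

Incremental R2 credits any overlap between the two predictors to the prior year, so it is a conservative estimate of what preseason contributes; other ways of apportioning explanatory power would give different shares.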
Part of the reason preseason isn't a particularly good predictor of anything for teams, other than sample size and strength of schedule, is that rotations do not much resemble their regular season form.
So what to take from the preseason?
The above image is a diagram of my informal model of the trade-off between expert opinion and data analysis as more observations are taken. Basically, the fewer observations one has, the more weight opinions on process from people with subject matter expertise should get; with more observations we can weight the numbers more heavily, though subject matter experts continue to retain value. Preseason definitely lies somewhere towards the upper left of the chart, with only seven or eight games and few extended outings by any one player.
The main things to look at in preseason are the same ones I think most of the coaches are looking at: first and foremost, whatever shiny new players the team signed in the offseason, whether they are free agents or rookies. Rookies, for example, tend to get much more run in the preseason than most of them will in the regular season, and it's the first time seeing them against legit NBA opponents.
Last year I created a visualization matching preseason individual stats with regular season performance, which is here. In general, the stats we usually think of as stable, like rebounds, performed well, while regular season scoring was only marginally related to preseason performance. (Sidenote: it's always important to understand that some of the 'stability' of stats like assists or rebounds is due to positional/role dependency; point guards will probably continue to have the ball and make passes, and centers will continue to stand near the basket.)
The data gained from preseason is generally more valuable for any player who has little existing data, like rookies, or any time there is a significant discontinuity between this season and previous seasons, and that's true of scouting or coaching observations too. For example, while building my win predictions last year, I found that the ASPM box score metric was less predictive season to season for any player who changed teams, which is similar to the findings of numerous other studies on box score data. Players like Derrick Rose or Kobe Bryant, coming off major injuries and essentially lost seasons, are another example of a significant discontinuity.
We can also take advantage of the fact that we have other information about players, applying Bayesian principles. There's no inherent reason for preseason data to stand on its own, or for us not to combine that data with previous observations, just as there is no reason for scouts to discard their previous film reviews or prior in-person scouting every time the calendar turns over.
So, for example, looking at rookies' preseason numbers, we can and should assume that the numbers will show some reversion to the long-term rookie mean. Using the linear box score metric Alt. Win Score (AWS) that I used for my draft model, I ran a simple regression with both the raw AWS and one stabilized with a reversion to the mean. The stabilized AWS showed a consistent, though slight, advantage in predicting the regular season AWS per 40 minutes.
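A common way to stabilize a small-sample rate is to pad it with some number of minutes at the long-term mean. The sketch below shows that idea; it is an illustration under stated assumptions, not the exact stabilization used for the AWS regression. The rookie mean, the padding minutes, and the player numbers are all hypothetical placeholders.

```python
def stabilized_rate(raw_rate, minutes, prior_mean, prior_minutes):
    """Shrink an observed per-40 rate toward a prior mean.

    Equivalent to adding `prior_minutes` of average play to the sample:
    the fewer minutes observed, the closer the result sits to `prior_mean`.
    """
    return (raw_rate * minutes + prior_mean * prior_minutes) / (minutes + prior_minutes)

# Hypothetical values, chosen only to show the mechanics.
ROOKIE_MEAN_AWS40 = 5.0   # assumed long-term rookie mean AWS per 40 minutes
PRIOR_MINUTES = 200.0     # assumed strength of the prior, in minutes

# The same hot raw rate is trusted less over 70 minutes than over 400.
hot_small_sample = stabilized_rate(12.0, 70, ROOKIE_MEAN_AWS40, PRIOR_MINUTES)
hot_large_sample = stabilized_rate(12.0, 400, ROOKIE_MEAN_AWS40, PRIOR_MINUTES)

print(f"70 minutes:  raw 12.0 -> stabilized {hot_small_sample:.2f}")
print(f"400 minutes: raw 12.0 -> stabilized {hot_large_sample:.2f}")
```

In practice the padding constant would be tuned against data, for example by picking the value that best predicts regular season AWS, rather than set by hand as it is here.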
For rookies with as few as 70 minutes played in the preseason and at least 100 minutes in the regular season, I found a .44 correlation (R2 .198) over the last three years (a period with very poor team win percentage correlations from preseason play).
I am in the process of combining that preseason rookie data from 2010, 2012 and 2013 with the data from their pre-NBA seasons. Using the sample matched so far, I ran some regressions with both sets of data to predict their performance in their rookie year:
- The R2 in the complete set was .351, with sub-sample regression R2s ranging from approximately .26 to .44.
- The preseason numbers explained 40% of the model's predictive power, with sub-sample regressions running from approximately 30% to 50%.
- Offensive rebounds appeared to be the strongest preseason predictor of any single stat, though that appears to be a function of the rookie success of a few outstanding big men like Andre Drummond and Anthony Davis.
In addition to this general analysis, I plan to add to the cases and columns of data matched to date in order to try a more fully realized model and create a simulation run from that to apply to this year's class, as well as revisit a Bayes' theorem analysis I did.