I am in the middle of doing some modelling to predict a player's three point shooting percentages. Three point shooting is notoriously one of the most variable stats in basketball.
Over at Nylon Calculus, Darryl Blackport did a great analysis indicating that it takes 750 shots before half the variation in a player's three point percentage is explained by their underlying shooting percentage rather than noise. By contrast, in a similar analysis in baseball, strikeout rate took only 60 at bats to stabilize and walk rate 120, though on base percentage took 460 at bats and batting average 910 (but that stat is not recommended).
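One common way to read a stabilization point like this is as a regression-to-the-mean constant: at 750 attempts, the observed percentage and a baseline get equal weight. The sketch below is a minimal illustration of that reading, not necessarily the method used in the Nylon Calculus analysis. The function name `regressed_3p_pct` and the 35% baseline are assumptions chosen for the example.

```python
STABILIZATION_ATTEMPTS = 750  # stabilization point quoted above


def regressed_3p_pct(makes, attempts, baseline=0.35):
    """Shrink an observed three point percentage toward a baseline.

    Pads the sample with STABILIZATION_ATTEMPTS attempts at the baseline
    rate, so a player with exactly 750 attempts lands halfway between
    the observed percentage and the baseline. The 0.35 baseline is an
    assumed placeholder, not a figure from the analysis.
    """
    padded_makes = makes + STABILIZATION_ATTEMPTS * baseline
    padded_attempts = attempts + STABILIZATION_ATTEMPTS
    return padded_makes / padded_attempts


# Hypothetical shooter: 40 makes on 100 attempts (40% observed).
print(round(regressed_3p_pct(40, 100), 3))
```

With only 100 attempts, the estimate stays close to the baseline, which is the practical meaning of a 750 shot stabilization point for a college-sized sample.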
My modelling work is designed to help that along a little by looking at other measures related to a player's shooting ability in order to stabilize things a bit quicker. One of the data sets I have available is from my draft models, so I decided to do a little exploration with that data.
Whenever we talk about a player's shooting potential coming into the draft, our natural inclination, or at least mine, is to look at their three point percentage in either the last year or over their career and project them from there. The problem is that very few players take anywhere near 750 three point attempts in college or European basketball before declaring for the draft, and, in fact, in the first few years of their careers in the NBA there aren't many players with that kind of shot attempt numbers.
But it turns out there is other data in that stat sheet that is just as helpful in projecting their numbers forward as the three point percentage, if not more so. I ran a number of models and, so far, none of them has reached an R² of .50, but the models have clearly been a significant improvement over just running the pre-NBA three point percentage, which explains no more than a small percentage of the variation in any of the models.
In general there is significant information conveyed by the player's free throw percentage, the frequency of their three point attempts, and then the three point percentage itself.
Whether the target variable was three point percentage in a player's third and fourth NBA years or in years two through five, weighted by three point attempts, the model explained about 30% of the variation in three point percentage. The formula for the third and fourth year three point percentage using pre-draft stats is below:
NBA 3 Point % = .175 + .128 * Free Throw Percentage + .00449 * 3PTA per 40 + .163 * Three Point Percentage
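As a quick way to plug numbers into the formula above, here is a minimal sketch in Python. The function name `model_one_3p_pct` and the sample inputs are hypothetical, and the code assumes both percentages are entered as fractions (for example 0.75 rather than 75), which is the scale on which the coefficients produce plausible outputs.

```python
def model_one_3p_pct(ft_pct, three_pa_per_40, three_pt_pct):
    """Projected NBA three point percentage from pre-draft stats.

    Coefficients are copied from the formula above. Assumes ft_pct and
    three_pt_pct are fractions between 0 and 1 (an assumption about the
    units), and three_pa_per_40 is attempts per 40 minutes.
    """
    return (
        0.175
        + 0.128 * ft_pct
        + 0.00449 * three_pa_per_40
        + 0.163 * three_pt_pct
    )


# Hypothetical prospect: 75% free throws, 5 attempts per 40, 35% from three.
print(round(model_one_3p_pct(0.75, 5.0, 0.35), 4))
```

Because the intercept is large and the coefficients are small, projections from this formula are compressed into a fairly narrow band around the mid-30s.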
The three stats carry approximately equal amounts of information as measured by their beta coefficients, though running the model on numerous random subsets of the data most consistently indicated that free throw percentage is the best indicator. Here is how I view the variables in relation to predicting downtown shooting numbers:
- Free Throw Percentage basically conveys information about the player's shooting stroke without the 'noise' that can affect any particular three point attempt from the shot clock, pass placement, or opponent contests.
- 3PTA per 40 is a variable that I think one needs to be a bit careful interpreting. In essence, it conveys something of the confidence both the player and their coach have in their three point stroke. On the other hand, we can't necessarily interpret it as a causal variable: Josh Smith and Charles Barkley spent their careers proving that jacking up threes doesn't necessarily cause your accuracy to rise.
- Three Point percentage, of course, mimics most closely the variable we're looking for: three point accuracy in the NBA. But the high variation and relatively low number of attempts in pre-NBA shooting make that number unreliable as a predictor.
An alternative model that uses only the player's three pointers made per 40 minutes and their free throw percentage actually explained somewhat more of the variation, about 34%. Its simple formula is below:
NBA 3 Point % = .22 + .01571 * 3pt Made per 40 + .1389 * Free Throw Pct
- 3PT Made per 40 conveys some of both frequency and accuracy, but has the same interpretation issues as three point attempts.
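The second formula can be sketched the same way. As before, the function name `model_two_3p_pct` and the sample inputs are hypothetical, and free throw percentage is assumed to be entered as a fraction.

```python
def model_two_3p_pct(three_pm_per_40, ft_pct):
    """Projected NBA three point percentage from the two-variable model.

    Coefficients are copied from the formula above. Assumes ft_pct is a
    fraction between 0 and 1 (an assumption about the units), and
    three_pm_per_40 is three pointers made per 40 minutes.
    """
    return 0.22 + 0.01571 * three_pm_per_40 + 0.1389 * ft_pct


# Hypothetical prospect: 2 made threes per 40 minutes, 75% free throws.
print(round(model_two_3p_pct(2.0, 0.75), 4))
```

Note that pre-draft three point percentage never enters this version directly; it only shows up through makes per 40 minutes.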
With all that said, I thought I'd apply the models to some of the notable rookies from this incoming class. Below I have both models one and two, which give largely the same result.
- Marcus Smart comes out much better than would be expected from his three point percentage alone, thanks to his decent free throw percentage and confidence taking the long shot. As such he's a decent test of the three point volume variable.
- James Young doesn't project much better than Smart from deep, which is concerning given that's his primary selling point.
- Doug McDermott and Nik Stauskas project the highest, unsurprisingly, which is like the LeBron test for a player metric: if your model doesn't think McDermott is a good shooter, you may want to re-check your model.
- Neither Elfrid Payton nor Aaron Gordon projects well going forward as a shooter: Gordon due to his poor free throw shooting and relatively low volume from beyond the arc, and Payton due to a combination of all three.
- Noah Vonleh probably isn't going to hit 48.5% of his three pointers in the pros.
Three point shooting is definitely one of the areas where I think scouting comes most into play, especially for players with more unusual profiles. For players like Vonleh, Smart, or Bogdan Bogdanovic, whose volume and shooting percentages don't match up, context is necessary to extend the reach of statistical modeling, given the variation at play. Did they get pushed into poor shots as each tried to carry their team, or is their volume of three pointers something to dismiss because the team had no other effective options?
In any case, next year I will try to avoid mentally projecting a player's NBA three point prowess looking simply at his percentage this year.