I debuted my Player Tracking Plus Minus here last year as a Beta stat. I have basically never stopped making changes since, and I have already put out one update on Nylon Calculus.
In terms of writing, I try to find a balance between writing up every little finding, for whoever is actually interested, and just holding everything until the end. My plan is to wait for the end of this season, sadly less than twenty games away, to do a full 2.0 version. That will give me a better target variable, in terms of reliability, than last year's, and I will not have to worry about synchronizing data updates from the different sources I am using. But the defensive side is the most interesting, and I thought I'd put some of the prep work out there in order to clarify my own thoughts and get some feedback.
Play by Play Fouling
As I alluded to in my fouling rate model the other day, I have been adding the specific foul data via NBA Miner to the model, and it appears to be a big improvement. The two specific foul types that consistently show value in cross-validation are offensive fouls drawn as a positive and shooting fouls as a negative. Happily, these are nice common-sense terms that any basketball coach would recognize: defend without fouling and beat the offensive player to their spots.
The general term for offensive fouls drawn, which includes charges as a subset, is +1.26 per offensive foul drawn per 40 minutes. That is a high figure, higher than steals for example. So it's important to keep in mind what an orthogonal RAPM-derived model is trying to do, which is to estimate player value, not to directly describe the value of each play. Sometimes a model will pick up an attribute that seems to be associated with good or bad players and that may, in fact, act partially as a proxy for other actions not seen in the model. This is especially true on defense, where our data is fairly limited.
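To make the per-40-minute framing concrete, here is a minimal sketch of how a single term like this would turn into a contribution to a player's rating. Only the +1.26 coefficient comes from the model described above; the function names and the example player's totals are hypothetical, and the result is in whatever plus-minus units the model itself uses.

```python
# Coefficient from the post: value per offensive foul drawn per 40 minutes.
OFF_FOUL_DRAWN_COEF = 1.26


def per_40(count, minutes):
    """Convert a raw season total into a per-40-minute rate."""
    return count * 40.0 / minutes


def off_foul_contribution(off_fouls_drawn, minutes):
    """Contribution of the offensive-fouls-drawn term alone (hypothetical helper)."""
    return OFF_FOUL_DRAWN_COEF * per_40(off_fouls_drawn, minutes)


# Hypothetical player: 20 offensive fouls drawn in 2,000 minutes,
# i.e. 0.4 drawn per 40 minutes.
rate = per_40(20, 2000)
contribution = off_foul_contribution(20, 2000)
print(round(rate, 3), round(contribution, 3))
```

Because offensive fouls drawn are rare events, even a large coefficient moves most players' ratings by only a fraction of a point, which is part of why the term can be read as a proxy rather than as the literal value of each charge.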
Blocks, prior to SportVU data, have been a proxy for rim protection: a decent but flawed one. Steals, in draft models especially, have acted as proxies for spatial awareness and defensive engagement. Here I think that offensive fouls drawn act as a proxy for physicality on defense and defensive anticipation. Another interesting note is that there is no other term in the model with a strong direct correlation to drawing offensive fouls, so there is definitely some independent information being conveyed by a player's ability to draw offensive fouls, however that ability is interpreted.
Contesting three-point shooting is a noisy business, with little year-over-year relationship in the results, as I found here, and a coin-flip-like pattern from one season to the next. Defending shots within the three-point arc is still noisy, but there also seems to be some signal. So, for that reason, I have used the percent plus minus on two-point shots contested by the player. Even though including all shots improves the in-sample correlation to this year's defensive RAPM more, it is less likely to help predict next year's performance.
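The year-over-year check behind that choice can be sketched as a simple correlation between the same players' contested-shot plus minus in consecutive seasons. This is only an illustration: the function is a plain Pearson correlation, and the two lists are made-up numbers, not actual player data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up contested two-point percent plus minus for the same five
# players in two consecutive seasons (negative = opponents shoot worse).
season_one = [-3.1, -1.4, 0.2, 1.8, 2.5]
season_two = [-2.2, -0.3, -0.9, 1.1, 2.0]

print(round(pearson(season_one, season_two), 2))
```

A value near zero would be the coin-flip pattern described for three-point contests; a clearly positive value would suggest a skill that carries over from one season to the next.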
Lastly, the NBA version of defensive rating (DRTG), which is a descriptive statistic of how the team performed with the player on the floor, is used as a second-stage variable on the residual of the individual defensive statistics model. I have used that technique for a while. The good thing about this method is that it lets me pick up the information from individual performance without being overwhelmed by the correlation between DRTG and defensive RAPM, and then lets me glean the additional information from team performance rather than throwing that information away. It is also an improvement over using overall team performance, which includes defensive performance when the player isn't even on the court.
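The two-stage idea can be sketched in a few lines. This is a simplified illustration rather than the actual model: a single made-up "individual score" stands in for all of the individual defensive terms, both stages are ordinary one-variable least squares, and every number below is invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage(individual, drtg, rapm):
    """Stage 1: RAPM on individual stats. Stage 2: DRTG on the stage-1 residual."""
    slope1, intercept1 = fit_line(individual, rapm)
    stage1 = [intercept1 + slope1 * v for v in individual]
    residual = [r - p for r, p in zip(rapm, stage1)]
    # DRTG only gets to explain what the individual stats left over.
    slope2, intercept2 = fit_line(drtg, residual)
    final = [p + intercept2 + slope2 * d for p, d in zip(stage1, drtg)]
    return stage1, residual, final


def sse(pred, actual):
    """Sum of squared errors between predictions and targets."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual))


# Invented data for six players: a composite individual defensive score,
# on-court DRTG (lower is better), and defensive RAPM (higher is better).
individual = [0.5, -0.2, 1.1, -0.8, 0.3, -0.4]
drtg = [101.0, 106.5, 99.8, 108.2, 103.0, 104.9]
rapm = [1.2, -0.9, 2.0, -1.8, 0.1, -0.3]

stage1, residual, final = two_stage(individual, drtg, rapm)
print(round(sse(stage1, rapm), 3), round(sse(final, rapm), 3))
```

Fitting DRTG only on the residual means the individual terms get first claim on the variance they share with team performance, which is the point of ordering the stages this way.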
One good piece of news is that, with the new information added, the DRTG effect is just over two thirds of what it was in the older version, so the model is able to attribute a bit more value to the discrete actions of the player.
My player updates at Nylon this year will continue to use the PT-PM as developed earlier, since it is confusing to change both the metric and the input data at the same time. But next year I will unleash the power of this fully operational player metric.