Below I have a pretty simple data visualization focused on the Celtics' new draftees. The box plots are all by position to get a better sense of how the Celtics players rate against their fellow newly drafted rookies.
It is probably a good thing for Celtics fans that free agency in the NBA follows so closely on the heels of the draft, if only to distract them from their palpable disappointment. The collective draft letdown stems from both the inability of the front office to move up in the draft from the sixteenth spot and the resulting selection of Terry "Who Now?" Rozier out of Louisville with that pick.
This year I added the high school consensus recruiting index rating to my draft model, which I discussed here. The primary benefit of adding the rating is to add prior information to college performance, especially for the highest rated prospects.
As mentioned in the introductory post, the RSCI rating is most closely related to the age that players enter the draft, as that's a self selection based on the player's expected draft position. Therefore the biggest effect on the model, other than to boost highly rated recruits, is to reduce the relative power of the age variable.
So, while I consider the version including RSCI to be the best estimate and plan to use it going forward, it is sometimes useful to compare different versions of a model in order to better understand how variables are being weighed and to take a closer look at any instances where there is a particular divergence between the versions.
First, here are the top twenty or so players based on recent Draft Express rankings (may not be completely up to date with current rankings).
The top of the order of the two models is slightly different. Jahlil Okafor is rated as the top prospect when including the information about his recruiting rank, but switches places with Karl Towns when that info is dropped. D'Angelo Russell moves into a very close third without the recruiting information, but is a more distant fourth when the RSCI rating is included.
Further down Justise Winslow and Stanley Johnson also swap places in the eight and twelve spots, with Johnson rising to eighth when the recruiting info is included.
In terms of the bigger list, I wanted to pull out the biggest risers and fallers between the two models. As above, the 'Diff' column shows how much the player was helped, or hurt, by the recruiting information. First the biggest risers:
Andrew Harrison goes from completely undraftable at 78, to almost draftable at 62, which is not a great place for a former top 5 recruit out of high school to be. Sam Dekker also goes up given that he is relatively old for a former 13th rated recruit to enter the draft. Cliff Alexander is another beneficiary.
Then there are those lowered in the model by adding the additional rating:
Devin Booker falls despite being a well regarded recruit, simply because he is such a young prospect and gets less of a boost from his borderline out-of-sample age.
If you are interested in the entire list it's linked here on Google docs.
Stylin' n' Profilin'
I put together a big group of data visualizations using the top 100 draft prospects and data via Draft Express. The visualizations are meant to give some insight on the different players' style of play in college or Europe, as well as demonstrate some more general statistical relationships. For example, one can see how balanced Tyus Jones' scoring was at Duke, visualize how many more blocks Robert Upshaw averaged than anyone else, or see where Rondae Hollis-Jefferson ranks in made three pointers for small forwards (Spoiler: Low). This is a companion to my overall rating viz here, and I may take some still frames out for future articles.
They are all interactive, so play around with them and let me know if there are other charts or filters you would like to see.
Below is an interactive version of the stat breakdowns I made for this year's draft. The basic format is to use the standardized scores of prospects in five different categories, including rebounding, scoring, distribution, and a group combining age, competition level and scouting ratings. Bars above zero indicate that the player was above average for that group, and bars below zero mean they were rated below average in that group. The floating ball with the number beside it, meanwhile, gives the player's overall rating in the draft model.
Then there are a couple of panels with scatter plots to show some of the playing styles of the prospects.
(And Introducing the I-Test)
I re-calculated my PAWS draft model with all of the updated data from the NCAA tournament, and just like in everyone's mock draft, Karl-Anthony Towns moves up but narrowly misses the number one position. Basically the model puts the two big freshmen as the top two in a tier by themselves, and higher than anyone in last year's class.
The next tier is tournament MOP Tyus Jones and Ohio State shooting guard D'Angelo Russell. To date Jones's slight frame has held him back in most scouting rankings, and Layne Vashro has noted that Jones's lack of shots at the rim holds him back in his model. However, his production, including his ability to get to the free throw line, has been phenomenal for his age. He is also helped by his high ranking coming out of high school, which historically has been a good indicator, especially for players ranked in the top ten.
But, before the rankings, I am going to put up some visualizations of the statistics that go into the model, which I think give a little better picture of the prospects' strengths and weaknesses.
First the centers:
Then the Power Forwards, where the model has Kevon Looney as the highest rated prospect:
Then the small forwards, with Stanley Johnson and Aleksandar Vezenkov, a 19 year old playing in Greece, just ahead of Justise Winslow. Vezenkov, though, may need to play stretch four to have a chance to defend in the NBA.
And the point guards:
In addition to my regular P-AWS draft rank I've added a Rookie Impact Test (Rookie I-Test). This is a model built only off of rookie production. As such it is less powerful than the P-AWS model: it has less data to build from in the target variable, and the early entry of the most talented players into the NBA means that the development curve has had less time to play out. The biggest difference between the two, other than the lower expected production for all players, is the lower impact age has on the model, so you will see some of the younger players with a lower I-Test ranking.
Here is my top forty:
I ran my draft model against the latest Draft Express data, which includes the first weekend of the tournament, so only a few players will have more than one or two games of data added (though most likely that will include all of the prospects on Kentucky).
Before I get to the inevitable list, I wanted to look both at the relationship between the model output and the Draft Express Top 100 prospects and at a couple of visualizations of the stats in the models.
Using the rank given by the model and the DX rank I plotted them and ran some correlations based on position and a split between top and bottom 50.
So basically the model and Draft Express ratings agree much more at the top of the rankings than at the bottom, where there is no agreement. There's some logic to that given that the history of the draft indicates that there is more consistent differentiation at the top of the draft order than the bottom, which gives us the famous logarithmic shape of draft value.
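The comparison described above (model rank against DX rank, split by position or by top and bottom 50) can be sketched roughly as follows. This is only an illustration: the post does not say which correlation statistic was used, so Spearman rank correlation is an assumption here, and the field names (`model_rank`, `dx_rank`, `position`) and the toy rows are hypothetical rather than actual prospect data.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def correlation_by_group(rows, key):
    """Correlate model rank with DX rank separately within each group."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return {
        name: spearman([r["model_rank"] for r in members],
                       [r["dx_rank"] for r in members])
        for name, members in groups.items()
        if len(members) >= 3  # too few points to say anything otherwise
    }


# Hypothetical toy rows, NOT real prospect rankings.
toy_rows = [
    {"position": "C", "model_rank": 1, "dx_rank": 1},
    {"position": "C", "model_rank": 4, "dx_rank": 6},
    {"position": "C", "model_rank": 9, "dx_rank": 5},
    {"position": "C", "model_rank": 12, "dx_rank": 15},
    {"position": "PG", "model_rank": 2, "dx_rank": 30},
    {"position": "PG", "model_rank": 7, "dx_rank": 8},
    {"position": "PG", "model_rank": 20, "dx_rank": 17},
]

by_position = correlation_by_group(toy_rows, key=lambda r: r["position"])
```

The top and bottom 50 split uses the same helper with a different key, for example `key=lambda r: "top 50" if r["dx_rank"] <= 50 else "bottom 50"`.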
Point Guard Island
The relationship by position is interesting. It is tempting to see the higher correlation for centers and power forwards as something indicative of the PAWS model being more in tune with scouting for bigs than guards. But last year the model and the consensus had Marcus Smart, for example, in the same neighborhood, with the greatest disagreement coming on a couple of wings and a power forward/center in Clint Capela.
In any case, right now there is basically no agreement between my model and the DX rating of point guards, but a decent relationship with centers, power forwards, and shooting guards.
Below we can see a picture of what no correlation looks like, along with my little (literally) point guard island on the left side of the graph.
The diminutive trio of Tyus Jones, Fred VanVleet and Tyler Ulis are ranked 2nd, 5th and 19th by the PAWS model, while they're 28th, 86th and 89th respectively by Draft Express. Any analytics draft model is at a disadvantage on the defensive end, and it's clear that is a concern with these three. The defensive measures we do have, blocks and steals, are below average for both of the freshmen, Jones and Ulis, but VanVleet is competitive with the other higher ranked point guards.
Below the statistics used to inform the draft model are graphed in standardized ratings for scoring, rebounds, blocks plus steals, distribution (assists and turnovers) and a rating combining age, consensus high school recruiting rank, and competition level. The weights mimic the model, so offensive rebounds are more valuable than defensive, steals are more valuable than blocks and age is the dominant factor in the Age and Competition rating.
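As a rough illustration of how a standardized category rating like this can be built, here is a minimal sketch for the rebounding group. The only thing taken from the text is the ordering of the weights (offensive rebounds count for more than defensive); the numeric weights, the stat keys `orb` and `drb`, and the toy players are hypothetical, and the model's actual weights are not given in the post.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


# HYPOTHETICAL weights: only the ordering (offensive > defensive) is from the text.
REBOUND_WEIGHTS = {"orb": 0.6, "drb": 0.4}


def composite(players, weights):
    """Weighted sum of standardized stats, re-standardized across players.

    A positive result means above average for the group being compared,
    a negative result means below average.
    """
    standardized = {stat: z_scores([p[stat] for p in players]) for stat in weights}
    raw = [
        sum(weights[stat] * standardized[stat][i] for stat in weights)
        for i in range(len(players))
    ]
    return z_scores(raw)


# Hypothetical per-40-minute rebounding lines, NOT real prospect data.
toy_players = [
    {"orb": 3.0, "drb": 6.0},
    {"orb": 1.0, "drb": 8.0},
    {"orb": 2.0, "drb": 4.0},
]

rebounding_rating = composite(toy_players, REBOUND_WEIGHTS)
```

The same function would apply to the other groups by swapping in a different weights dictionary, for instance one that weights steals above blocks.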
Jones, VanVleet and Delon Wright form the model's first tier. Jones gets there in significant part due to his age, his high consensus recruiting rank and his distribution numbers. Wright essentially does everything well, but is on the older side. VanVleet is younger than Wright, but has been a below average scorer. We'll see how VanVleet's tournament performance affects his draft stock; a good game against Duke could make him a fair amount of money.
Here's the same stacked visualization for the top ranked shooting guards:
D'Angelo Russell is the clear leader, with something of a second tier in Jerian Grant and R.J. Hunter. Mario Hezonja does not grade out as well as his scouting, but he is playing on a high level professional team in Spain, where he has struggled to get playing time consistently.
Here are the small forwards:
The model has Stanley Johnson and Aleksandar Vezenkov as a first tier of sorts, with Justise Winslow, Kelly Oubre, Sam Dekker and Rondae Hollis-Jefferson bunched behind. The big IF with Vezenkov is his defense; his scoring in the Greek league has been efficient at a high volume. The model does not rate Jake Layman as a top prospect.
Here are the power forwards:
Frank Kaminsky, Bobby Portis and Kevon Looney top the power forwards. Kristaps Porzingis rates less highly than he did last year, after failing to make much progress in a chaotic season for his club in Spain, though his scouting stock hasn't yet suffered.
Finally, the centers:
There is a clear first tier with Jahlil Okafor and Karl Towns, and then a very talented second tier. Willie Cauley-Stein is somewhat lower in the model estimate than his DX rank, in part due to the limitations in valuing defense, but also because of Cauley-Stein's age and relative lack of scoring.
I debuted my Player Tracking Plus Minus here last year as a Beta stat. Basically I have never stopped making changes and already put out one update on Nylon Calculus.
In terms of writing I try to find a balance between writing up every little finding, for whoever is actually interested, and just holding on to the end. My plan is to wait for the end of this season, sadly fewer than twenty games away, to do a full 2.0 version. That will give me a better target variable in terms of reliability from last year, and I will not have to worry about synchronizing any data updates from the different sources I am using. But the defensive side is the most interesting and I thought I'd put some of the prep work out there in order to clarify my own thoughts and get any feedback.
Play by Play Fouling
As I alluded to in my fouling rate model the other day, I have been adding the specific foul data via NBA Miner to the model, and it appears to be a big improvement. The two specific foul types that consistently show value in cross validation are offensive fouls drawn as a positive and shooting fouls as a negative. Happily these are nice common sense terms that any basketball coach would recognize: defend without fouling and beat the offensive player to their spots.
The general term for offensive fouls drawn, which includes charges as a subset, is +1.26 per offensive foul drawn per 40 minutes. That is a high figure, higher than steals for example. So it's important to keep in mind what an orthogonal RAPM derived model is trying to do, which is to determine an estimate of player value, not a direct description of the value of each play. Sometimes a model will pick up an attribute that seems to be associated with good or bad players, and may, in fact, act partially as a proxy for other actions not seen in the model. This is especially true on defense, where our data is fairly limited.
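To make the scale of that term concrete, here is a small arithmetic sketch. The +1.26 coefficient is the figure quoted above; the function names and the example season totals are hypothetical, and the result is the contribution of this one term to the model estimate, not a claim about what each drawn foul is worth on the court.

```python
# Coefficient quoted in the text: +1.26 per offensive foul drawn per 40 minutes.
OFFENSIVE_FOUL_DRAWN_COEF = 1.26


def per_40(count, minutes):
    """Convert a raw season count into a per-40-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count / minutes * 40.0


def offensive_foul_term(fouls_drawn, minutes):
    """Contribution of the offensive-fouls-drawn term to the model estimate."""
    return OFFENSIVE_FOUL_DRAWN_COEF * per_40(fouls_drawn, minutes)


# Hypothetical player: 30 offensive fouls drawn in 2,000 minutes,
# which is 0.6 per 40 minutes.
example_term = offensive_foul_term(30, 2000)
```

Because the rate is per 40 minutes, even a small difference in how often a player draws offensive fouls moves this term noticeably, which is part of why it reads better as a proxy than as a literal play value.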
Blocks, prior to SportVU data, have been a proxy for rim protection, a decent but flawed one. Steals in draft models especially have acted as proxies for spatial awareness and defensive engagement. Here I think that offensive fouls drawn act as a proxy for physicality on defense and defensive anticipation. Another interesting note is that there is no other term in the model with a strong direct correlation to drawing offensive fouls, so there is definitely some independent information being conveyed by a player's ability to draw offensive fouls, however that ability is interpreted.
Contesting three point shooting is noisy business, with little year over year relationship in the results, as I found here, and a coin flip like pattern in the results. Defending shots within the three point arc is still noisy, but there also seems to be some signal. So, for that reason I have used the percent plus minus on two point shots contested by the player. Even though including all shots improves the in-sample correlation to this year's defensive RAPM more, it is less likely to help predict next year's performance.
Lastly, the NBA version of defensive rating (DRTG), which is a descriptive statistic of how the team performed with the player on the floor, is used as a second stage variable on the residual of the individual defensive statistics model. I have used that technique for a while. The good thing about this method is that it allows me to pick up the information from individual performance without being overwhelmed by the correlation between DRTG and defensive RAPM, and then allows me to glean the additional information from team performance rather than throwing that information away. It is also an improvement over using overall team performance, which includes defensive performance when the player isn't even on the court.
One good piece of news is that the DRTG effect is just over two thirds of what it was in the older version with the new information added, so the model is able to attribute a bit more value to the discrete actions of the player.
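The two stage idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual PT-PM fit: the first stage here uses a single hypothetical composite of individual defensive stats where the real model would use several terms, both stages are ordinary least squares with one predictor, and all names and numbers are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_fit(individual, drtg, target):
    """Stage 1: fit the target on individual stats.
    Stage 2: fit what stage 1 could not explain (the residual) on DRTG."""
    a0, a1 = fit_line(individual, target)
    residuals = [t - (a0 + a1 * x) for x, t in zip(individual, target)]
    b0, b1 = fit_line(drtg, residuals)
    return (a0, a1), (b0, b1)


def predict(stage1, stage2, individual_value, drtg_value):
    """Combined estimate: individual-stats prediction plus the DRTG adjustment."""
    a0, a1 = stage1
    b0, b1 = stage2
    return (a0 + a1 * individual_value) + (b0 + b1 * drtg_value)


# Hypothetical toy data, NOT real player values.
toy_individual = [0.0, 1.0, 2.0, 3.0, 4.0]       # composite of individual stats
toy_drtg = [108.0, 104.0, 106.0, 101.0, 103.0]   # on-court defensive rating
toy_target = [-1.5, 0.2, 0.1, 2.0, 1.4]          # stand-in for defensive RAPM

stage1, stage2 = two_stage_fit(toy_individual, toy_drtg, toy_target)
estimate = predict(stage1, stage2, 2.5, 102.0)
```

Fitting DRTG only on the residual means the individual stats get first claim on the variation they can explain, so DRTG is credited only with what is left over.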
My player updates at Nylon this year will continue to use the PT-PM as developed earlier, since it is confusing to change both the metric and the input data at the same time. But next year I will unleash the power of this fully operational player metric.
Quick post on an in-process update of my draft model. Essentially I am working in the consensus high school ratings provided by the Recruiting Services Consensus Index (RSCI) for college basketball.
There are a couple of challenges to this, the first being that not every top pro prospect gets an RSCI rating, especially when looking at foreign prospects. Second, in the analysis I found that there is a quick degradation of the information conveyed by the ratings, which can reach into the hundreds. Essentially for NBA prospects I was only able to pick up a signal in the top twenty or so rankings, and even then a log scale transformation seemed to pick up information best in the training data. Keep in mind, for example, that the current favorite for the regular season MVP, Steph Curry, was not even rated by the RSCI service, because he wasn't considered good enough in high school to even rate.
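One way such a feature could be encoded is sketched below. The exact functional form is an assumption for illustration: the post only says that a log scale worked best and that the signal fades after roughly the top twenty, so the cutoff of 20, the `log(cutoff + 1) - log(rank)` shape, and the function name are all hypothetical.

```python
import math


def rsci_feature(rank, cutoff=20):
    """Log-scaled recruiting feature (HYPOTHETICAL form).

    Unrated players (rank is None) and players ranked beyond the cutoff
    get 0.0, i.e. no prior boost; within the cutoff the boost shrinks
    quickly as the rank number grows.
    """
    if rank is None or rank > cutoff:
        return 0.0
    return math.log(cutoff + 1) - math.log(rank)


# The gap between 1st and 5th is much larger than between 15th and 20th.
top_gap = rsci_feature(1) - rsci_feature(5)
late_gap = rsci_feature(15) - rsci_feature(20)
```

Treating unrated prospects as zero keeps foreign players and late bloomers in the data set without penalizing them beyond the lack of a boost.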
Age entering the draft and RSCI rating are positively correlated. Essentially players who are highly rated coming out of high school are more likely to enter the draft earlier, because of the regard scouts and teams hold them in, and, therefore, the more money they can make by leaving. A less heralded player may both take more time to earn a starting position on the team and even with a similar freshman year performance would be more likely to stay in school as a first round selection would be far from guaranteed until a longer record of high performance is demonstrated.
Therefore, the biggest single difference, other than the new variable, between the RSCI-PAWS (beta) and the PAWS model is the age adjustment. The age coefficient falls by approximately 13% when the RSCI ratings are included, which ends up aiding some non-blue chip rated prospects entering the draft a bit older.
In terms of draft order the most significant difference is Jahlil Okafor moving from third to first, and D'Angelo Russell moving from first to second. Willie Cauley-Stein also moves up due, in part, to the lessened emphasis on age.
Originally posted on Nylon Calculus on 1/02/2015
As soon as the Celtics traded Rajon Rondo to the Dallas Mavericks some pundits announced the move as the start of the Marcus Smart era. However, in the first six games since the trade it has been less clear that any particular era has dawned, certainly at the point guard position.
Celtics Coach Brad Stevens has started three different players at point guard in the six games: Evan Turner, Jameer Nelson and Smart. Stevens returned to Turner on a New Year's Eve matinee after Smart's rough start in Washington, which was perhaps his worst game as a pro so far. Smart was pulled early in the first quarter after four quick turnovers. With that short hook, Smart has averaged just 24.2 minutes since the trade.
As soon as the Celtics took Smart with the sixth pick in the draft there were superficial comparisons made between Rondo and Smart: players who played point guard in college with a reputation for defense and a somewhat shaky jump shot. However, at the point guard position there is a tremendous difference in the two players' passing and distribution level. Rondo, both stylistically and statistically, is one of the best passers in the NBA, while Smart is, at this point, essentially a combo guard. Smart's 4.2 assists per 36 minutes of play rank him 76th out of guards with at least 250 minutes played this year, per Basketball Reference, in the same range as combo guard types like Victor Oladipo and O.J. Mayo. His assist rate of 5.0 per 36 since the trade would rank him 59th. His Assist% numbers tell largely the same story.
There are a couple of ways to illustrate this: simple assists per 36 minutes, assist percentage, points created by assist per 48, and percent of two point shots assisted. In nearly all measures Rondo is elite, while Smart has performed (and been used) more as a combo guard. Below are those measures for Rondo, Smart and the rest of the Celtics rotation players.
Smart's assist numbers are clearly below those of the more traditional point guards like Rondo, Phil Pressey, and Nelson. His points created by assist per 36 minutes (scaled to match the Basketball Reference numbers) are about one third of Rondo's and Nelson's and less than those of Turner, who, for now, has won the starting point guard position. Jared Sullinger and Kelly Olynyk show up as plus passers for their positions, but still below typical guard level.
Another statistic I like to look at in terms of ball handling aptitude and responsibilities, which is in the far right column above, is the percent of a player's made two point shots that were assisted. Last year I used that stat both to group guards into 'shot creation' clusters and to project that Jordan Crawford would grab the starting point guard spot until Rondo returned from his knee injury. In terms of percent of two pointers assisted, Smart's numbers are in line with typical 'creator' point guards. But, in the second most telling measure I used, he is attempting shots at the rim at a rate of only about 2.5 per 48 minutes, which would have been more in line with being a 'Spot Up' point guard, and which SportVU data indicates is mostly because of his lack of drive attempts at the basket.
Drives and Line Up Balance
In half court sets the ability of a team to break down a defense off the dribble in drives to the basket is a valuable weapon. For example, Steven Shea, at his blog Basketball Analytics Book, found that team Drive efficiency explained 63% of total offensive efficiency last year and, combined with corner three efficiency, explains most of offensive efficiency. When Rondo was on the team he ran the drive game of the starting line up along with Jeff Green.
Thus far Smart has not shown much drive game at the NBA level, which is probably the biggest concern in his developmental progress. While there is no rule that drives need to be performed by the point guard, if Smart is injected into the Celtics starting back court with Avery Bradley there may not be much other choice. Below is a table with the Celtics' drive statistics from NBA.com to highlight the fit issues Stevens is dealing with in putting together a back court.
Rondo, Pressey, and Nelson have all averaged nearly ten drives per 48 minutes with the Celtics, with Turner driving just below seven times. Bradley averages 2.6, fewer drives per 48 minutes than either Olynyk or Sullinger, while Smart's 3.5 is just about the same as Sullinger's. With Smart and Bradley in the line up it's not clear the Celtics' starting line up would have enough dribble penetration on the court outside of Jeff Green, who drives well with the most efficient production per drive on the team, but with one of the highest percentages of points generated for himself rather than teammates. Added to this is that Smart has the lowest points generated per drive on the team.
Looking at these numbers it is tough to see how a Bradley and Smart starting back court would be successful offensively at this point in Smart's development. Putting Nelson in the starting line up might create too many defensive issues, especially along with Bradley since the Celtics like to have him sometimes cover the opposing point guard. That leaves the starting combo Stevens used on New Year's Eve, or starting Smart alongside a more creative back court mate.