One of the interesting things about creating draft models is that there is a choice of what dependent variable (the outcome the model predicts) to use, and that choice often has a significant effect on the outcome of the model. For this run I chose to stick with my Alternative Win Score (AWS) box score metric, as that matches how the base model was built, giving me an apples-to-apples comparison to start. It also has the virtue of being relatively simple.
I chose to emphasize the 3rd and 4th years of a draftee playing in the NBA, in part because that is the time when teams expect to start seeing results and are nearing the player's restricted free agency period. Using the 4th year rather than waiting for the player's expected statistical peak also allowed me to look at players up to the 2009-2010 draft.
Playing time was the last issue to deal with in choosing a dependent variable. Generally, I believe that playing time conveys some information, certainly about the coaching staff's evaluation of the player. That view is supported by studies like the one Daniel Myers did to build his ASPM metric, which attributed some independent contribution to court time when looking at long-term RAPM. On the other hand, John Hollinger found in a 2004 Basketball Prospectus that a player's per-minute box score performance is largely stable as his playing time increases.
To put it more simply, coaches are efficient enough to make LeBron James a starter and give him starter's minutes. But for more middling players, a number of issues that are at best marginally related to ability may significantly affect playing time, including roster depth at the position, injuries, fit as a complement to star players, and the organization's investment in the player. For example, even if a lottery pick and a second rounder exhibit similar production, the coach may have an incentive to start the lottery pick to justify the draft selection (I am thinking specifically of Harrison Barnes and Draymond Green here).
So my dependent AWS variable was stabilized using a statistical prior of 200 minutes at a replacement level of 1.5 AWS per 40 minutes, a prior that loses weight as the player performs at a higher level for more time. Players who were out of the league by their fourth year were assumed to be replacement level, which adds a significant amount of information to the model compared with leaving them out.
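One way to read that stabilization is as padding each player's record with 200 phantom minutes at replacement level. The sketch below assumes a simple minutes-weighted average, which is my guess at the general form rather than the exact formula behind the model, and it only captures the playing-time part of how the prior loses weight. The function names and sample numbers are hypothetical.

```python
REPLACEMENT_AWS_PER_40 = 1.5   # replacement level stated in the text
PRIOR_MINUTES = 200.0          # weight of the prior, in minutes, stated in the text


def stabilized_aws_per_40(aws_per_40, minutes):
    """Blend an observed AWS/40 rate with a replacement-level prior.

    Assumed form: a minutes-weighted average of the observed rate and the
    prior, so the prior's share shrinks as real minutes accumulate.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    weighted_sum = aws_per_40 * minutes + REPLACEMENT_AWS_PER_40 * PRIOR_MINUTES
    return weighted_sum / (minutes + PRIOR_MINUTES)


def target_for_player(aws_per_40, minutes):
    """Score players with no NBA record (out of the league) at replacement level."""
    if aws_per_40 is None or minutes == 0:
        return REPLACEMENT_AWS_PER_40
    return stabilized_aws_per_40(aws_per_40, minutes)
```

Under this assumed form, a player posting 5.0 AWS per 40 over only 200 minutes is pulled halfway back to replacement level, while the same rate over 1,800 minutes is left nearly untouched.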
The Results
With my dependent variable chosen, I was able to do some testing of my base model. The two things I was particularly interested in looking at were my league adjustments for non-NCAA prospects and the age adjustment I have been using, which was developed by looking at the deltas as players age in the NCAA and NBA. The league adjustment held up nicely in multiple configurations. (I re-ran the model multiple times holding specific years out of the sample, and I also ran boosting and bagging models.)
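The hold-out check mentioned in that parenthetical can be pictured as a leave-one-draft-year-out split. This is only an illustration of the splitting step, assuming one full draft class is held out at a time; the field name draft_year and the toy records are hypothetical, and the boosting and bagging models themselves are not shown.

```python
def leave_one_year_out(rows, year_key="draft_year"):
    """Yield (held_out_year, train_rows, test_rows) once per distinct year."""
    years = sorted({row[year_key] for row in rows})
    for year in years:
        train = [row for row in rows if row[year_key] != year]
        test = [row for row in rows if row[year_key] == year]
        yield year, train, test


# Hypothetical toy records; a real run would carry the box score features
# and the stabilized AWS target for each draftee.
prospects = [
    {"name": "A", "draft_year": 2007},
    {"name": "B", "draft_year": 2007},
    {"name": "C", "draft_year": 2008},
    {"name": "D", "draft_year": 2009},
]

splits = list(leave_one_year_out(prospects))
```

Each split would be fit on the training rows and scored on the held-out class, so an adjustment that only looks good because of one unusual draft year shows up as unstable across splits.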
The age issue was more mixed, with the overall evidence indicating that my current age adjustment may be too conservative. In addition, assists and offensive rebounds were fairly consistently tagged as underweighted in the base model, and two-point attempts show some indication of being under-penalized.
All that said, the model continues to pick some candidates who are lower down on most mock draft boards right now, though the favorites mostly show respectably. The model continues to not love Andrew Wiggins, and there's not much I can do to make it love him. On the other hand, in all configurations the model continues to love 19-year-old Swiss big man Clint Capela and UCLA star Jordan Adams.
Below are the top ten players according to the test runs of the base model, along with a link to a Google doc with the top thirty (or so) prospects.
Rank | Name | League | Team
1 | Clint Capela | French | Chalon
2 | Jordan Adams | NCAA | UCLA
3 | Jabari Parker | NCAA | Duke
4 | Jarnell Stokes | NCAA | Tennessee
5 | Jusuf Nurkic | Adriatic | Cedevita
6 | Kyle Anderson | NCAA | UCLA
7 | Bobby Portis | NCAA | Arkansas
8 | Montrezl Harrell | NCAA | Louisville
9 | Noah Vonleh | NCAA | Indiana
10 | Marcus Smart | NCAA | Oklahoma State
In the spreadsheet, the base model scores are reordered based on the NBA translation predicted by the test models, which are shown on a standardized scale:
Here is the Google Doc