Given that the free agency period is winding down, I decided to check in on the performance of the free agency models I built. Using market data from the last three years, I built two different models to predict the average annual value (AAV) of the contracts for this year’s free agent crop. One is a regression model; the other is a Bayesian machine learning variation of a random forest model.

The primary factors for predicting AAV in both models were Win Shares, age, usage, the cap spike, and playing time. For both models I used the percent of the salary cap for the contract’s first year as the target variable. The benefit of the regression model is that it gives straightforward coefficients, while the ML model gives the “importance” of each variable. However, the R stats package I worked with also provides partial dependence plots that give an idea of the shape and direction of each variable’s influence in the context of the model. (The context of the model matters, since partial dependence is shaped by the other variables included in the model as well as by the sample being modeled.)
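The R package’s plotting function isn’t reproduced here, but the idea behind a partial dependence curve can be sketched in a few lines. This is a minimal sketch, assuming a fitted model exposed as a plain `predict` function over per-player feature dicts (a hypothetical interface, not the R package’s API). For each grid value, the chosen feature is overwritten for every player, while all other variables keep their observed values, and the predictions are averaged. That is exactly why the curve depends on the other variables and the sample.

```python
from statistics import fmean
from typing import Callable, Sequence

Row = dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> list[float]:
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy each row and overwrite only the feature of interest;
        # every other variable keeps its observed value.
        forced = [{**row, feature: value} for row in rows]
        curve.append(fmean(predict(r) for r in forced))
    return curve


# Hypothetical stand-in model: percent of cap is flat in age until 29, then falls.
def toy_predict(row: Row) -> float:
    age_penalty = max(0.0, row["age"] - 29) * 1.5
    return 2.0 * row["win_shares"] - age_penalty


players = [{"age": 25, "win_shares": 4.0}, {"age": 31, "win_shares": 6.0}]
print(partial_dependence(toy_predict, players, "age", [27, 29, 31, 33]))
```

With the toy model the curve stays flat through 29 and then drops, the same kind of shape the age plot shows, but the numbers here are illustrative only.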

Below are a couple of the more interesting variables in the ML model.

Age, for example, has little effect on the percent of cap in a player’s contract until he turns twenty-nine, after which it declines quickly.

Win Shares also shows a nonlinear pattern in the model, taking off at around two Win Shares.

And while minutes played looks relatively linear, games started shows a single jump at around 41 starts and then stays flat.

In addition to learning about the free agency market, part of my motivation was to expand my modeling skills and experiment with ML models. And, of course, I was interested to see which one would get better results out of sample. To measure overall success I used a simple mean absolute error (MAE), which is the average error regardless of its direction. So far the ML model has slightly outperformed the regression model, with an MAE of $3.3 million versus $3.4 million for the regression. But, as it turns out, averaging the two models’ predictions gives a slightly lower error than either one alone, at $3.2 million.
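As a rough illustration of both calculations, here is a minimal Python sketch. The three contracts below are hypothetical numbers chosen only to show the mechanics, and the blend is assumed to be a simple equal-weight average of the two models’ predictions. Averaging helps most when the two models miss in opposite directions, so their errors partly cancel.

```python
from statistics import fmean
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error: the average miss, ignoring over vs. under."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and actuals must be the same length")
    return fmean(abs(p - a) for p, a in zip(predicted, actual))


def blend(a: Sequence[float], b: Sequence[float]) -> list[float]:
    """Equal-weight average of two models' predictions (assumed blending rule)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# Hypothetical AAVs in $ millions, for illustration only (not the real results).
actual = [10.0, 20.0, 5.0]
regression = [14.0, 17.0, 6.0]
ml = [8.0, 23.0, 2.0]

print(round(mae(regression, actual), 2))             # 2.67
print(round(mae(ml, actual), 2))                     # 2.67
print(round(mae(blend(regression, ml), actual), 2))  # 0.67
```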

For perspective, simply guessing that every player gets the average contract produces an average error of $7.7 million per player, so the models offer a decent improvement.
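The naive baseline can be computed the same way. This sketch assumes “the average contract” means the mean AAV of the same set of players being scored, and the contract values are hypothetical. One design note: under MAE the median, not the mean, is the constant guess that minimizes absolute error, so a median guess would be a slightly stricter baseline; the mean is used here to match the text.

```python
from statistics import fmean


def naive_baseline_mae(actual: list[float]) -> float:
    """MAE of guessing the sample-average contract for every player."""
    average = fmean(actual)
    return fmean(abs(a - average) for a in actual)


actual = [2.0, 6.0, 10.0, 30.0]  # hypothetical AAVs in $ millions
print(naive_baseline_mae(actual))  # 9.0
```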

But there do appear to be some systematic errors between the model and this year’s market. To start, so far the model has, on average, overestimated the contracts of centers and underestimated the contracts of point guards. Below is the blend of the two models plotted by position.
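Unlike MAE, a systematic error only shows up if the sign of each miss is kept. A minimal sketch, assuming residuals are defined as actual minus predicted (so a positive group average means the model underestimated that position) and using hypothetical numbers:

```python
from collections import defaultdict
from statistics import fmean


def mean_residual_by_position(
    positions: list[str], predicted: list[float], actual: list[float]
) -> dict[str, float]:
    """Average signed residual (actual - predicted) for each position.

    Positive means the model underestimated that group on average;
    negative means it overestimated.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for pos, p, a in zip(positions, predicted, actual):
        groups[pos].append(a - p)
    return {pos: fmean(res) for pos, res in groups.items()}


# Hypothetical blended predictions and actual AAVs, in $ millions.
positions = ["PG", "PG", "C", "C"]
predicted = [10.0, 15.0, 12.0, 20.0]
actual = [13.0, 18.0, 9.0, 16.0]
print(mean_residual_by_position(positions, predicted, actual))
```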

Whether that is part of the league’s continuing evolution or a reflection of this year’s free agent group is tough to say.

When I then compared the residuals against players’ out-of-sample individual statistics, I found that usage was still undervalued, while age and blocks were overvalued. That said, 40-year-old Vince Carter’s one-year, $8 million deal seems to be more or less responsible for the age effect.

Lastly, there are the individual outliers. When the model’s projection is much lower than the player’s contract, it’s not clear whether that reflects a poor projection by the model or an overpay by the team. Last year, cases like Timofey Mozgov proved to be a warning sign of an overpay. However, the models only give a rough baseline of where the market may fall. This year, one of the biggest "overpays" according to the model was Stephen Curry, who is not only part of the undervalued point guard class but also received a Super Max contract, which did not exist in the training data. The next two "overpays," JJ Redick and Paul Millsap, signed short-term contracts that are probably a bit high on a per-year basis, but that is purposely mitigated by attaching fewer contract years. The last two, Blake Griffin and Jrue Holiday, could be more concerning for the signing teams given the length of their contracts. Both were projected at about $9 million below their actual AAV, and both were given five-year deals to stay with teams that had little to no leverage.

The best-value contracts (where the model’s projection was furthest above the actual contract) were Luc Mbah a Moute and Ersan Ilyasova, at around $7 million less than projected. The link to the full list is attached here.