Once you've computed feature importance scores for all of your features, you can rank them in terms of predictive usefulness. To help explain permutation feature importance more concretely, consider the following synthetic case study.

Say that we want to train a model to predict price from the other nine predictors. We could use any black box model, but for the sake of this example, let's train a random forest regressor. To do this, we split our data into a train and test dataset. Then, we use sklearn to fit a simple random forest model.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

X = df.drop(columns='price')

# One-hot encode color for sklearn
X['color'] = (X['color'] == 'red')

y = df.price

# Train test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Instantiate and fit a random forest regressor
regr = RandomForestRegressor(max_depth=100, random_state=0)
regr.fit(X_train, y_train)
```

At this point, feel free to take some time to tune the hyperparameters of your random forest regressor. But, since this isn't a guide on hyperparameter tuning, I am going to continue with this naive random forest model - it'll be fine for illustrating the usefulness of permutation feature importance.

One commonly used metric for assessing the quality of regression predictions is root mean squared error (RMSE) evaluated on the test set. Let's calculate the RMSE of our model's predictions and store it as rmse_full_mod.

```python
from sklearn.metrics import mean_squared_error

rmse_full_mod = mean_squared_error(regr.predict(X_test), y_test, squared=False)
```

Now, we can implement permutation feature importance by shuffling each predictor and recording the increase in RMSE. This will allow us to assess which predictors are useful for making predictions.
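As a minimal sketch of that loop (assuming X_test is a pandas DataFrame, and using a single shuffle per predictor for brevity; in practice you would average over several shuffles to reduce noise):

```python
import numpy as np

# For each predictor: shuffle its values in the test set, re-score the model,
# and record how much the RMSE rises relative to the unshuffled baseline.
importances = {}
for col in X_test.columns:
    X_perm = X_test.copy()
    X_perm[col] = np.random.permutation(X_perm[col].values)
    rmse_perm = mean_squared_error(regr.predict(X_perm), y_test, squared=False)
    importances[col] = rmse_perm - rmse_full_mod

# Rank predictors from most to least useful
for col, increase in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{col}: {increase:.3f}")
```

Note that sklearn also ships its own implementation, sklearn.inspection.permutation_importance, which handles repeated shuffles and scoring for you.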