Feature Selection from High Dimensions


Feature selection is a known challenge and, in its current state, is more of an art than a science, since the approach can differ depending on the problem the data scientist is looking to solve. While there are methods such as regularization, recursive feature selection, or automated processes like Boruta, these methods all perform differently depending on the type of algorithm used in the training process. With the rise of larger ensembles built from a wide range of underlying models, relying on a single feature selection process can lead to an underperforming model. This problem is especially prevalent with the rise of automated machine learning platforms that use a variety of base models.

Automated processes like Boruta showed early promise, as they were able to provide superior performance with Random Forests, but they have some deficiencies, including slow computation time, especially with high-dimensional data. Regardless of run time, Boruta does perform well on Random Forests but performs poorly on other algorithms such as boosting or neural networks. Similar deficiencies occur with regularization via LASSO, elastic net, or ridge regression: these perform well on linear regressions but poorly on other modern algorithms.

I am proposing and demonstrating a feature selection algorithm in a similar spirit to Boruta that uses XGBoost as the base model. The algorithm runs in a fraction of the time Boruta takes and has superior performance on a variety of datasets, including one with nearly twenty-two thousand features. These results hold up across a number of UCI Repository datasets. Evaluation results and timings will be shared along with the underlying code, which will later be converted into a library posted on CRAN.
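
For readers unfamiliar with the Boruta idea, the following is a minimal sketch of the shadow-feature test it is built on: each real feature is compared against shuffled ("shadow") copies of the features, and only features that repeatedly beat the best shadow are kept. This is not the algorithm presented in the talk. To stay dependency-free it swaps in absolute Pearson correlation as a stand-in importance score where Boruta uses Random Forest importances and the proposed method uses XGBoost, and the names `abs_pearson`, `shadow_select`, `rounds`, and `min_hit_rate` are hypothetical.

```python
import random


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; a stand-in for a model-based importance score."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_select(columns, target, rounds=20, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shadow feature in most rounds.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric values, same length as each column.
    """
    rng = random.Random(seed)
    # The real-feature scores do not change between rounds, so compute them once.
    real_scores = {name: abs_pearson(values, target) for name, values in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(rounds):
        # Shadow features: shuffled copies, so any link to the target is destroyed.
        shadow_scores = []
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_scores.append(abs_pearson(shadow, target))
        threshold = max(shadow_scores)
        # A real feature scores a "hit" when it beats every shadow this round.
        for name, score in real_scores.items():
            if score > threshold:
                hits[name] += 1
    return [name for name, count in hits.items() if count / rounds >= min_hit_rate]


# Synthetic demo data (an assumption for illustration, not from the talk).
data_rng = random.Random(42)
target = [data_rng.gauss(0, 1) for _ in range(200)]
columns = {
    "signal": [t + data_rng.gauss(0, 0.5) for t in target],
    "noise_a": [data_rng.gauss(0, 1) for _ in range(200)],
    "noise_b": [data_rng.gauss(0, 1) for _ in range(200)],
}
selected = shadow_select(columns, target)
```

In a model-based version, the importance score would come from refitting the base learner on the real and shadow columns together in each round, which is where most of the run time goes.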


Chase is currently a Data Scientist at Progressive Leasing in Draper, Utah, working on a variety of cool projects. Prior to his current position, he was an Assistant Professor of Finance and Economics at the University of South Carolina Upstate. He holds a BS, MS, and PhD, all in Economics, from the University of Utah.
