Date of Award

3-2022

Document Type

Thesis

Degree Name

Master of Science

Department

Department of Operational Sciences

First Advisor

Mark A. Gallagher, PhD

Abstract

Feature selection may be summarized as identifying the features salient to a given response. Knowing which features affect the response allows future data collection to be limited to consequential data; hence, feature selection can save the effort spent collecting data, as well as storage and computational resources for making predictions. We propose a generalized approach to selecting the salient features of data sets. Our approach may also be applied to unsupervised data sets to determine which data streams provide unique information. We contend that our approach identifies salient features that are robust to the choice of subsequent predictive model. The proposed algorithm treats all provided variables, their squares, and all two-way interactions as an extended data set. The algorithm implements a forward selection approach, based on correlation with the response, while fitting deep neural networks to the selected variables. These deep neural networks maintain an adaptive architecture that mirrors a full factorial design, and they handle both numeric and categorical values for features and responses alike. Implementing this approach in an ensemble with Recursive Feature Elimination (RFE), we establish a new Pareto frontier, consisting solely of this technique, for the Wisconsin Breast Cancer problem instance. This Pareto frontier highlights our ensemble approach as the best-performing method in both feature reduction and predictive accuracy.
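The extended data set and correlation-driven forward selection described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the function names `extend_features` and `forward_select` and the step budget `k` are hypothetical, and ordinary least squares stands in for the deep neural networks so the example stays short. At each step it adds the candidate column most correlated (in absolute value) with the residual of the current fit, which is one common reading of "forward selection based on correlation with the response." The adaptive full-factorial network architecture and the RFE ensemble are not shown.

```python
import itertools

import numpy as np


def extend_features(X, names):
    """Augment X with squared columns and all two-way interaction columns.

    Column order: originals, then squares, then pairwise products.
    """
    cols = [X[:, j] for j in range(X.shape[1])]
    labels = list(names)
    for j, name in enumerate(names):
        cols.append(X[:, j] ** 2)
        labels.append(f"{name}^2")
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels


def forward_select(Z, y, k):
    """Greedily choose up to k columns of Z.

    Assumption: an OLS fit (with intercept) replaces the thesis's deep neural
    networks; each step picks the unselected column with the largest absolute
    Pearson correlation with the current residual.
    """
    selected = []
    residual = y - y.mean()
    for _ in range(k):
        if np.allclose(residual, 0):
            break  # the response is already fully explained
        best, best_corr = None, -1.0
        for j in range(Z.shape[1]):
            if j in selected or np.std(Z[:, j]) == 0:
                continue  # skip chosen or constant columns
            corr = abs(np.corrcoef(Z[:, j], residual)[0, 1])
            if corr > best_corr:
                best, best_corr = j, corr
        if best is None:
            break
        selected.append(best)
        # Refit on all selected columns and update the residual.
        A = np.column_stack([np.ones(len(y)), Z[:, selected]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return selected


if __name__ == "__main__":
    # Synthetic data: the response depends on one interaction and one square.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = 3 * X[:, 0] * X[:, 1] + 2 * X[:, 2] ** 2
    Z, labels = extend_features(X, ["x0", "x1", "x2", "x3"])
    print(sorted(labels[j] for j in forward_select(Z, y, 5)))
```

On this synthetic response, which is built only from `x0*x1` and `x2^2`, the sketch should select exactly those two extended columns and then stop early because the residual vanishes. The point of the extended set is that neither term is visible to a selector that looks only at the raw variables.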

AFIT Designator

AFIT-ENS-MS-22-M-148

DTIC Accession Number

AD1172360

Included in

Data Science Commons
