Jacob P. Batt

Date of Award


Document Type


Degree Name

Master of Science


Department of Operational Sciences

First Advisor

Raymond R. Hill, PhD


The Air Force must modernize, but funding for technology remains as tight as ever. To this end, the Air Force Audit Agency is looking to use machine learning techniques to enhance its capabilities. This research explores Logistic Regression and Random Forest modeling to streamline data collection and cost classification. The final Logistic Regression model identified 4 significant attributes out of the 36 given and was 85% accurate in predicting whether a purchase amount was over or under $10,000. To expand beyond binary classification, a six-category Random Forest classification model was developed. It identified 6 significant attributes and was 34% accurate in predicting which of 6 amount categories a purchase fell into. Because of the class imbalance in the given data, class weighting and over-sampling were necessary to improve the Random Forest model. The final class-balanced model identified the same 6 significant attributes but was 78% accurate in predicting the amount category. No model was able to predict whether a purchase should be classified as an information technology purchase or not.
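As a rough illustration of the class weighting and over-sampling mentioned in the abstract, the sketch below computes inverse-frequency class weights and randomly duplicates minority-class rows until every class is the same size. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the weighting formula (the common "balanced" heuristic, n_samples / (n_classes * class_count)), and the made-up label counts are all assumptions introduced here.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the largest class size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical, made-up labels for six amount categories (0-5); not the thesis data.
labels = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 5 + [4] * 3 + [5] * 2
rows = [[i] for i in range(len(labels))]

weights = balanced_class_weights(labels)
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In practice, over-sampling of this kind is applied only to the training split, so that duplicated rows do not leak into the data used to measure accuracy.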

AFIT Designator


DTIC Accession Number