Date of Award

3-2025

Document Type

Thesis

Degree Name

Master of Science in Engineering Management

Department

Department of Systems Engineering and Management

First Advisor

Daniel J. Weeks, PhD

Abstract

The Department of Defense lost over 500 million dollars between 2016 and 2024, partially due to poor early cost estimates resulting in cost overruns. Cost estimation practice relied on parametric techniques that incorporate historical data, subject matter experts in cost estimating, and predictive software applications. The main motivation for this study was to assess the viability of artificial neural networks (ANNs) as a means of providing a more accurate cost estimate in the early design phases of a construction project. The dataset initially contained approximately 48,000 data points from a database of various Air Force projects, including maintenance, repair, minor construction, and demolition. After data pre-processing, formatting, feature extraction, and feature imputation, the dataset was reduced to 16,812 data points. Although features were removed from the original dataset, which initially reduced the number of data points, the number of features grew, largely because one-hot encoding of categorical features added many more columns to the data frame. To inspect potential numerical features, a correlation heat map analysis was conducted. The highest correlation occurred between the number of stories and facility height, with a Pearson correlation coefficient of 0.62. Additionally, Fiscal Year and CP_ID showed a Pearson correlation coefficient of 0.6. For comparison, a trivial linear regression model was constructed as a baseline for the other regression models. Both a classical linear regression model and a random forest regression model were also developed using recursive feature elimination with cross-validation. The trivial model achieved a coefficient of determination (R²) of -0.0007, the classical model reached 0.0887, and the random forest model obtained 0.8502. In contrast, the best neural network model produced an R² of 0.03. Poor data quality and limited data quantity posed significant challenges in implementing an ANN capable of delivering high-quality early-phase cost estimates.
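
The slightly negative coefficient of determination reported for the trivial model can look surprising, so a minimal sketch of how such a value can arise is given here. It assumes the trivial baseline behaves like a model that always predicts the mean cost of the training set, which the abstract does not state explicitly. The cost figures and the helper `r_squared` are hypothetical and are not taken from the thesis data or code.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical project costs in arbitrary units (illustrative only).
train_costs = [120.0, 95.0, 410.0, 230.0, 180.0]
test_costs = [150.0, 300.0, 90.0, 260.0]

# Assumed trivial baseline: always predict the mean cost seen in training.
baseline = mean(train_costs)
baseline_preds = [baseline] * len(test_costs)

# A perfect predictor, included only as a reference point.
perfect_preds = list(test_costs)

r2_baseline = r_squared(test_costs, baseline_preds)
r2_perfect = r_squared(test_costs, perfect_preds)

print(f"baseline R^2 = {r2_baseline:.4f}")
print(f"perfect  R^2 = {r2_perfect:.4f}")
```

Because the training mean differs slightly from the held-out mean, the baseline's residual sum of squares exceeds the total sum of squares and R² falls just below zero. Under this assumption, a value such as -0.0007 indicates a model that explains essentially none of the variance, which is the reference point against which the other reported scores can be read.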

AFIT Designator

AFIT-ENV-MS-25-M-098

Comments

An embargo was observed for this posting.

Approved for public release, Distribution Unlimited. PA Case Number 88ABW-2025-0283
