Date of Award

3-2025

Document Type

Thesis

Degree Name

Master of Science in Cost Analysis

Department

Department of Systems Engineering and Management

First Advisor

Edward D. White III, PhD

Abstract

Accurately estimating software costs is critical for effective project management within the Department of Defense (DoD), where early decisions shape resource allocation and risk management. This work evaluates regression-based Cost Estimating Relationships (CERs), probabilistic models, and machine learning techniques to address limitations of traditional estimation methods. Using records from two DoD repositories, the analysis applied Ordinary Least Squares (OLS) regression, Multinomial Logistic Regression (MLR), Random Forest, and neural networks to model and classify software costs, with key predictors including Source Lines of Code (SLOC), Equivalent Source Lines of Code (ESLOC), and programming hours. The findings highlight strengths and trade-offs of each methodology. Regression-based CERs delivered reliable, interpretable results (Adjusted R² ≈ 0.62). Probabilistic models categorized costs flexibly with strong AUC values (>0.90) for high- and low-cost classifications. Random Forest captured non-linear relationships with strong predictive performance (R² = 0.73), while neural networks, constrained by limited data, showed minimal advantage over regression. This analysis underscores the importance of aligning methodological complexity with practical constraints, especially in data-limited contexts. By integrating regression, probabilistic, and machine learning approaches, it advances cost estimation practices and supports future innovations to meet evolving defense software project demands.

AFIT Designator

AFIT-ENV-MS-25-M-073

Comments

An embargo was observed for this posting.

Approved for Public Release. PA Case Number on file.
