Faculty Publications

Multicollinearity Applied Stepwise Stochastic Imputation: A Large Dataset Imputation through Correlation‑based Regression

Benjamin D. Leiby
Darryl K. Ahner, Air Force Institute of Technology

Document Type

Article

Publication Date

2-15-2023

Abstract

This paper presents a stochastic imputation approach for large datasets using a correlation selection methodology when preferred commercial packages struggle to iterate due to numerical problems. A variable range-based guard rail modification is proposed that improves the convergence rate of data elements while simultaneously providing increased confidence in the plausibility of the imputations. A large country conflict dataset motivates the search to impute missing values at rates well above a common threshold of 20% missingness. The Multicollinearity Applied Stepwise Stochastic imputation methodology (MASS-impute) capitalizes on correlation between variables within the dataset and uses model residuals to estimate unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Tailorable tolerances exploit residual information to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the methodology provides usable and defensible results in imputing missing elements of a country conflict dataset.
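
The abstract names three ingredients: predictors chosen by correlation, imputations drawn from model residuals, and a range-based guard rail. The sketch below illustrates those ideas in their simplest form and is not the authors' MASS-impute procedure. The function name `impute_column`, the parameter `n_predictors`, the restriction to fully observed predictor columns, the plain linear least-squares fit, and the single pass with no iteration are all assumptions made for illustration.

```python
import numpy as np

def impute_column(X, target, rng, n_predictors=2):
    """Return a copy of the 2-D float array X with NaNs in column `target` filled.

    Hypothetical helper: one pass of stochastic regression imputation
    with a range clip, not the published algorithm.
    """
    X = X.copy()
    y = X[:, target]
    missing = np.isnan(y)
    if not missing.any():
        return X
    obs = ~missing

    # Assumption: only fully observed columns may serve as predictors.
    candidates = [j for j in range(X.shape[1])
                  if j != target and not np.isnan(X[:, j]).any()]
    if not candidates:
        raise ValueError("no fully observed predictor column available")

    # Correlation-based selection: rank candidates by |Pearson r| with the target.
    corr = [abs(np.corrcoef(X[obs, j], y[obs])[0, 1]) for j in candidates]
    order = np.argsort(corr)[::-1][:n_predictors]
    chosen = [candidates[i] for i in order]

    # Ordinary least squares with an intercept, fitted on the observed rows.
    A = np.column_stack([np.ones(obs.sum()), X[obs][:, chosen]])
    beta, *_ = np.linalg.lstsq(A, y[obs], rcond=None)
    residuals = y[obs] - A @ beta

    # Stochastic step: prediction plus a residual resampled from the fit.
    A_miss = np.column_stack([np.ones(missing.sum()), X[missing][:, chosen]])
    draws = rng.choice(residuals, size=missing.sum(), replace=True)
    imputed = A_miss @ beta + draws

    # Guard rail (assumed form): clip to the observed range of the variable.
    X[missing, target] = np.clip(imputed, y[obs].min(), y[obs].max())
    return X
```

Called as `impute_column(X, 1, np.random.default_rng(0))`, the function leaves observed entries untouched and fills only the missing entries of column 1. Clipping is the bluntest possible guard rail; it guarantees plausibility with respect to the observed range but can pile imputations up at the boundaries, which is one reason to treat this as a sketch only.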

Comments

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

DOI

10.1186/s40537-023-00698-4

Source Publication

Journal of Big Data

Recommended Citation

Leiby, B.D., Ahner, D.K. Multicollinearity applied stepwise stochastic imputation: a large dataset imputation through correlation-based regression. J Big Data 10, 23 (2023). https://doi.org/10.1186/s40537-023-00698-4

Included in

Data Science Commons, Statistics and Probability Commons
