Date of Award
3-24-2016
Document Type
Thesis
Degree Name
Master of Science
Department
Department of Operational Sciences
First Advisor
Kenneth W. Bauer, Jr., PhD
Abstract
Despite considerable academic interest in using machine learning methods to detect cyber attacks and malicious network traffic, there is little evidence that modern organizations employ such systems. Because attacks are targeted and cybercriminals constantly change their behavior, valid observations of attack traffic suitable for training a classifier are extremely rare. These rare positive cases, combined with the fact that the overwhelming majority of network traffic is benign, create an extreme class imbalance problem. Using publicly available datasets, this research examines the class imbalance problem by using small samples of the attack observations to create multiple training sets that reflect a realistic class imbalance. A variety of techniques to alleviate the imbalance are examined, including undersampling the majority class and three techniques that oversample the minority attack observations by creating new synthetic observations. We test these methods on four of the most popular machine learning classifiers: two single-model classifiers, artificial neural networks and support vector machines, and two ensemble methods, gradient boosting and random forests. We find that undersampling generally outperforms the oversampling techniques and that both ensemble methods outperform the single models. We then show that the apparent superiority of the ensemble methods may be illusory, a result of the “laboratory conditions” of using well-crafted public datasets. By introducing an element of noise into the training data, we show that neural networks’ robustness to noise makes them the preferred approach in real-world settings where the more sophisticated ensemble methods fail. We also present a technique in which neural networks are used to select features from the noisy dataset that improve the performance of random forests and gradient boosting, allowing for the creation of an improved ensemble classifier.
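As an illustration of the undersampling idea named in the abstract, the following is a minimal sketch of random undersampling of the majority (benign) class. It is not the thesis's implementation. The function name `undersample_majority`, the `ratio` parameter, the 0/1 label encoding, and the toy data are all assumptions made for this example.

```python
import random


def undersample_majority(rows, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class rows (label 0, assumed benign) so that
    at most `ratio` majority rows remain per minority row (label 1, assumed attack).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == 1]
    majority = [i for i, lab in enumerate(labels) if lab == 0]

    # Never ask for more majority rows than actually exist.
    n_keep = min(len(majority), int(ratio * len(minority)))
    kept = rng.sample(majority, n_keep)  # sampling without replacement

    # Keep every minority row plus the sampled majority rows, in original order.
    idx = sorted(minority + kept)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy data (made up): 1000 benign observations and 10 attack observations.
rows = [[float(i), float(i % 7)] for i in range(1010)]
labels = [0] * 1000 + [1] * 10

balanced_rows, balanced_labels = undersample_majority(rows, labels, ratio=1.0)
```

With `ratio=1.0` the returned training set holds equal numbers of benign and attack rows. Larger ratios keep more of the benign traffic at the cost of a less balanced set.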
AFIT Designator
AFIT-ENS-MS-16-M-131
DTIC Accession Number
AD1054015
Recommended Citation
Walter, Russell W., "Methods to Address Extreme Class Imbalance in Machine Learning Based Network Intrusion Detection Systems" (2016). Theses and Dissertations. 380.
https://scholar.afit.edu/etd/380