Theses and Dissertations

Methods to Address Extreme Class Imbalance in Machine Learning Based Network Intrusion Detection Systems

Russell W. Walter

Date of Award

3-24-2016

Document Type

Thesis

Degree Name

Master of Science

Department

Department of Operational Sciences

First Advisor

Kenneth W. Bauer, Jr., PhD.

Abstract

Despite considerable academic interest in using machine learning methods to detect cyber attacks and malicious network traffic, there is little evidence that modern organizations employ such systems. Because attacks are targeted and cybercriminals constantly change their behavior, valid observations of attack traffic suitable for training a classifier are extremely rare. These rare positive cases, combined with the fact that the overwhelming majority of network traffic is benign, create an extreme class imbalance problem. Using publicly available datasets, this research examines the class imbalance problem by using small samples of the attack observations to create multiple training sets that reflect a realistic class imbalance. A variety of techniques to alleviate the imbalance are examined, including undersampling the majority class and three techniques that oversample the minority attack observations by creating new synthetic observations. We test these methods on four of the most popular machine learning classifiers: two single-model classifiers, artificial neural networks and support vector machines, and two ensemble methods, gradient boosting and random forests. We find that undersampling generally outperforms the oversampling techniques and that both ensemble methods outperform the single models. We show, however, that the apparent superiority of the ensemble methods may be illusory, owing to the “laboratory conditions” of using well-crafted public datasets. By introducing an element of noise into the training data, we show that neural networks’ robustness to noise makes them the preferred approach in real-world settings where the more sophisticated ensemble methods fail. We also present a technique in which neural networks are used to select features from the noisy dataset that improve the performance of random forests and gradient boosting, allowing for the creation of an improved ensemble classifier.
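
As a rough illustration of the two resampling ideas named in the abstract, the sketch below shows random undersampling of the majority class and a simple interpolation-based way of creating synthetic minority observations. It is not the thesis's implementation: the abstract does not name the three oversampling techniques, so the interpolation scheme (a simplified, SMOTE-like approach that pairs minority rows at random rather than by nearest neighbor), the function names, the `ratio` parameter, and the toy data are all assumptions made for illustration.

```python
import random


def undersample_majority(majority, minority, ratio=1.0, seed=0):
    """Randomly keep about ratio * len(minority) majority rows (no replacement).

    Hypothetical helper: `ratio` is an illustrative parameter, not from the thesis.
    """
    rng = random.Random(seed)
    keep = min(len(majority), int(round(ratio * len(minority))))
    return rng.sample(majority, keep), list(minority)


def oversample_by_interpolation(minority, n_new, seed=0):
    """Create synthetic minority rows on the segment between two minority rows.

    Assumption: pairs are chosen at random. SMOTE proper interpolates toward
    one of a row's k nearest minority neighbors instead.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority observations")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # two distinct minority rows
        t = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Toy data (made up): 1000 benign rows versus 3 attack rows.
benign = [[float(i), float(i % 7)] for i in range(1000)]
attack = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]

kept_benign, kept_attack = undersample_majority(benign, attack, ratio=2.0)
new_attacks = oversample_by_interpolation(attack, n_new=5)
```

In practice the two ideas pull in opposite directions: undersampling discards benign rows to shrink the imbalance, while synthetic oversampling adds attack-like rows, so either one changes the class ratio the classifier sees during training.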

AFIT Designator

AFIT-ENS-MS-16-M-131

DTIC Accession Number

AD1054015

Recommended Citation

Walter, Russell W., "Methods to Address Extreme Class Imbalance in Machine Learning Based Network Intrusion Detection Systems" (2016). Theses and Dissertations. 380.
https://scholar.afit.edu/etd/380

Included in

Operational Research Commons
