Synthetic data generation with machine learning for network intrusion detection systems
Document Type
Conference Proceeding
Publication Date
7-4-2019
Abstract
Machine learning is becoming an integral part of cybersecurity today, particularly in the area of network anomaly detection. However, machine learning techniques require large volumes of data to be effective. Although there are some datasets available for training Network Intrusion Detection Systems (NIDS), many of them are outdated or do not contain enough useful information for training/classification of NIDS. Therefore, generating synthetic data that is realistic is imperative for training effective intrusion detection systems. Currently, the most common methods for generating synthetic data are simulation or emulation through a software package like OPNET, after which machine learning is used to analyze the dataset for correctness. This paper argues for a less commonly used approach: utilizing machine learning to develop models that generate the datasets themselves for NIDS. In this paper, we discuss some of the well-known available datasets, the features that make up a good dataset, and the reasons for utilizing generative modeling to synthesize network data, and we lay out a basic approach to developing generative models for synthetic data by leveraging machine learning. © 2019, Curran Associates Inc. All rights reserved.
Source Publication
European Conference on Information Warfare and Security, ECCWS 2019
Recommended Citation
Newlin, M., Reith, M. G., & Deyoung, M. (2019). Synthetic data generation with machine learning for network intrusion detection systems. European Conference on Information Warfare and Security, ECCWS 2019, 785–789.
Comments
Co-author M. Newlin was an AFIT graduate student (M.S. in Cyber Operations) at the time of this conference. (AFIT-ENG-MS-20-M-048, March 2020)