Document Type

Conference Proceeding

Publication Date



Most cyber infiltration and exfiltration intrusions leave a network footprint, but because network intrusions are multifaceted, that footprint is often difficult to detect. In this work, a Zeek-processed PCAP dataset containing the metadata of 36,667 network packets was modeled with several machine learning algorithms to classify normal vs. anomalous network activity. Principal component analysis (PCA) with a 10% contamination factor was used to identify anomalous behavior. Models were created using recursive feature elimination (RFE) with logistic regression and XGBClassifier algorithms, and also using Bayesian and bandit optimization of neural network hyperparameters. These models were trained on a dataset with numeric features, and again with the addition of categorical variables for connection state, transport state, and application protocol. The XGBClassifier algorithm generated near-perfect models, with an F1 score of 0.994 or better on the train/test data for both datasets. The mean accuracy of the best model on each dataset was 99.9%, which compares favorably to prior machine learning-assisted network intrusion detection system (NIDS) work that reported a mean accuracy of 96.8%. The addition of state and protocol variables gave a slight improvement in modeling, and the XGBClassifier algorithm showed the best model performance. Notably, when RFE was applied to the best model, performance was sustained after transport layer protocol information was removed.
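The abstract says PCA with a 10% contamination factor produced the normal/anomalous labels, and that models were scored with F1, but it does not say how the PCA anomaly scores were computed or which library was used. The sketch below shows one common formulation under stated assumptions: each row is scored by its squared reconstruction error after projecting onto the top k principal components, and the "contamination factor" is read as the fraction of highest-scoring rows to flag as anomalous (the meaning of `contamination` in scikit-learn and PyOD outlier detectors). All function names here are hypothetical, and plain Python with power iteration stands in for a linear-algebra library so the example is self-contained.

```python
import math
import random


def _mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def _covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def _top_components(cov, k, iters=200):
    """Leading k eigenvectors of a covariance matrix via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    rng = random.Random(0)  # fixed seed so results are reproducible
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for this unit eigenvector.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_anomaly_scores(rows, k=1):
    """Assumed scoring: squared reconstruction error from the top-k components."""
    centered = _mean_center(rows)
    comps = _top_components(_covariance(centered), k)
    scores = []
    for r in centered:
        recon = [0.0] * len(r)
        for c in comps:
            p = sum(a * b for a, b in zip(r, c))  # projection onto component c
            recon = [x + p * ci for x, ci in zip(recon, c)]
        scores.append(sum((a - b) ** 2 for a, b in zip(r, recon)))
    return scores


def label_by_contamination(scores, contamination=0.10):
    """Flag the top `contamination` fraction of scores as anomalous (1).

    Rows tied with the threshold score are also flagged.
    """
    n_flag = max(1, int(round(contamination * len(scores))))
    threshold = sorted(scores, reverse=True)[n_flag - 1]
    return [1 if s >= threshold else 0 for s in scores]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (anomalous) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (not from the paper): nine points on the line y = 2x plus one off-line point.
    rows = [[float(x), 2.0 * x] for x in range(9)] + [[4.0, 14.0]]
    labels = label_by_contamination(pca_anomaly_scores(rows, k=1), contamination=0.10)
    print(labels)  # only the last (off-line) row is flagged
    print(f1_score([0] * 9 + [1], labels))
```

With 10 rows and a 10% contamination factor, exactly one row is flagged; on real Zeek features one would more likely use a library PCA and standardize each column first, since reconstruction error is scale-sensitive.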


The authors declare this is a work of the U.S. Government and is not subject to copyright protections in the United States.

Conference location: Las Vegas, NV, July 25-28, 2022

Source Publication

World Congress in Computer Science, Computer Engineering, and Applied Computing (CSCE 2022)