Numerical Analysis for Relevant Features in Intrusion Detection (NARFid)

Date of Award

3-5-2009

Document Type

Thesis

Degree Name

Master of Science in Computer Engineering

Department

Department of Electrical and Computer Engineering

First Advisor

Michael J. Mendenhall, PhD

Abstract

Identification of cyber attacks and network services is an active field of study in the machine learning community. Less effort has gone into understanding the domain space of real network data in order to identify the features that matter most for cyber attack and network service classification. Such work could enable anomaly detection systems that place fewer requirements on data “sniffed” off the network and on feature extraction from the traffic, reduce the learning time of algorithms, and ideally improve the classification of anomalous behavior. This thesis evaluates the usefulness of a good feature subset for the general classification task of identifying cyber attacks and network services. The generality of the selected features elucidates the relevance or irrelevance of the feature set for the classification task of intrusion detection. Additionally, the thesis provides an extension to the Bhattacharyya method, which selects features by means of inter-class separability (the Bhattacharyya coefficient). The extension for multi-class problems selects a minimal set of features with the best separability across all class pairs. Several feature selection algorithms (e.g., accuracy rate with a genetic algorithm, RELIEF-F, GRLVQI, and the median Bhattacharyya and minimum surface Bhattacharyya methods) create feature subsets that describe the decision boundary for intrusion detection problems. The selected feature subsets maintain or improve classification performance for at least three of the four anomaly detectors (i.e., classifiers) under test. The feature subsets that demonstrate generality range in size from 12 to 27 features, compared with 248 features in the original set. Of these subsets, the one generated by the extension to the Bhattacharyya method is the second smallest.
This thesis quantitatively demonstrates that a relatively small feature set may be used for intrusion detection with machine learning classifiers.
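
The abstract describes ranking features by inter-class separability measured with the Bhattacharyya coefficient, BC(p, q) = Σ sqrt(p_i · q_i), where 1 means two class-conditional distributions are identical and values near 0 mean little overlap. The following is a minimal sketch, not the thesis's implementation, assuming histogram density estimates with shared bin edges and aggregating over all class pairs with the median (loosely modeled on the median Bhattacharyya method named above). The function names bhattacharyya_coefficient and rank_features_by_median_bc, the bins=20 default, and the synthetic data are hypothetical; the sketch does not reproduce the minimum surface method or the thesis's multi-class extension.

```python
import itertools

import numpy as np


def bhattacharyya_coefficient(a, b, bins=20):
    """Histogram estimate of the Bhattacharyya coefficient between two 1-D samples.

    BC = sum_i sqrt(p_i * q_i): 1.0 for identical histograms, 0.0 for no overlap.
    """
    # Shared bin edges so both histograms are defined on the same support.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # Normalize counts to probability mass functions.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))


def rank_features_by_median_bc(X, y, bins=20):
    """Rank feature indices from most to least separable (hypothetical helper).

    Each feature is scored by the median BC over all class pairs; a lower
    score means the feature separates the classes better.
    """
    classes = np.unique(y)
    scores = []
    for j in range(X.shape[1]):
        pair_bcs = [
            bhattacharyya_coefficient(X[y == c1, j], X[y == c2, j], bins)
            for c1, c2 in itertools.combinations(classes, 2)
        ]
        scores.append(np.median(pair_bcs))
    scores = np.asarray(scores)
    # Ascending BC corresponds to descending separability.
    return np.argsort(scores), scores


if __name__ == "__main__":
    # Synthetic example: feature 0 is pure noise, feature 1 shifts with the class label.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1, 2], 200)
    X = np.column_stack([rng.normal(size=y.size), y * 5.0 + rng.normal(size=y.size)])
    order, scores = rank_features_by_median_bc(X, y)
    print("ranking:", order, "median BC:", np.round(scores, 3))
```

The median is used here only because it is robust to a single poorly separated class pair; a minimum or mean over pairs would be equally easy to substitute, and a feature subset could then be taken from the top of the ranking.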

AFIT Designator

AFIT-GCE-ENG-09-02

DTIC Accession Number

ADA499600

Recommended Citation

Gonzalez, Jose Andres, "Numerical Analysis for Relevant Features in Intrusion Detection (NARFid)" (2009). Theses and Dissertations. 2533.
https://scholar.afit.edu/etd/2533

Included in

Digital Communications and Networking Commons, Information Security Commons
