In security-related areas there is concern over novel “zero-day” attacks that penetrate system defenses and wreak havoc. The best methods for countering these threats are recognizing “nonself” as in an Artificial Immune System or recognizing “self” through clustering. For either case, the concern remains that something that appears similar to self could be missed. Given this situation, one could incorrectly assume that a preference for a tighter fit to self over generalizability is important for false positive reduction in this type of learning problem. This article confirms that in anomaly detection as in other forms of classification a tight fit, although important, does not supersede model generality. This is shown using three systems each with a different geometric bias in the decision space. The first two use spherical and ellipsoid clusters with a k-means algorithm modified to work on the one-class/blind classification problem. The third is based on wrapping the self points with a multidimensional convex hull (polytope) algorithm capable of learning disjunctive concepts via a thresholding constant. All three of these algorithms are tested using the Voting dataset from the UCI Machine Learning Repository, the MIT Lincoln Labs intrusion detection dataset, and the lossy-compressed steganalysis domain.
Knowledge and Information Systems
Peterson, G. L., & McBride, B. T. (2008). The importance of generalizability for anomaly detection. Knowledge and Information Systems, 14(3), 377–392. https://doi.org/10.1007/s10115-007-0072-8