Firewalls, especially at large organizations, process high velocity internet traffic and flag suspicious events and activities. Flagged events can be benign, such as misconfigured routers, or malignant, such as a hacker trying to gain access to a specific computer. Confounding this is that flagged events are not always obvious in their danger and the high velocity nature of the problem. Current work in firewall log analysis is manual intensive and involves manpower hours to find events to investigate. This is predominantly achieved by manually sorting firewall and intrusion detection/prevention system log data. This work aims to improve the ability of analysts to find events for cyber forensics analysis. A tabulated vector approach is proposed to create meaningful state vectors from time-oriented blocks. Multivariate and graphical analysis is then used to analyze state vectors in human–machine collaborative interface. Statistical tools, such as the Mahalanobis distance, factor analysis, and histogram matrices, are employed for outlier detection. This research also introduces the breakdown distance heuristic as a decomposition of the Mahalanobis distance, by indicating which variables contributed most to its value. This work further explores the application of the tabulated vector approach methodology on collected firewall logs. Lastly, the analytic methodologies employed are integrated into embedded analytic tools so that cyber analysts on the front-line can efficiently deploy the anomaly detection capabilities.
Journal of Algorithms and Computational Technology
Gutierrez, R. J., Bauer, K. W., Boehmke, B. C., Saie, C. M., & Bihl, T. J. (2018). Cyber anomaly detection: Using tabulated vectors and embedded analytics for efficient data mining. Journal of Algorithms & Computational Technology, 12(4), 293–310. https://doi.org/10.1177/1748301818791503