Sequence pattern mining (SPM) seeks to find multiple items that commonly occur together in a specific order. One common assumption is that all of the relevant differences between items are captured by creating distinct items; e.g., if color matters, the same item in two different colors is represented as two items, one for each color. In some domains, that assumption is unrealistic. This paper makes two contributions. The first extends SPM algorithms to allow item differentiation through attribute variables in domains with large numbers of items, e.g., by having one item with a color attribute variable rather than a distinct item for each color. It demonstrates this by incorporating variables into Discontinuous Varied Order Sequence Mining (DVSM). The second contribution is Sequence Mining of Temporal Clusters (SMTC), a new SPM algorithm that addresses the interleaving issue common to SPM algorithms. Most SPM algorithms address interleaving by using a distance measure to separate co-occurring sequences. SMTC instead clusters all subsets of temporally close items and defers the sequencing of mined patterns until the entire dataset is examined. Evaluation of the SPM algorithms on a digital forensics media analysis task results in a 96% reduction in terms to review, 100% detection of true positives, and no false positives.
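To make the "cluster first, sequence later" idea concrete, here is a minimal sketch in Python. It is not the SMTC algorithm from the paper. It only illustrates the general approach described above under stated assumptions: events are `(timestamp, item)` pairs, a new cluster starts whenever the gap between consecutive events exceeds a fixed `window`, every unordered subset of distinct items in a cluster is counted, and ordering is decided only afterward from the most common first-occurrence order. The function names, the `window` and `max_size` parameters, and the example data are all hypothetical. The attribute-variable extension is not shown.

```python
from collections import Counter
from itertools import combinations


def temporal_clusters(events, window):
    """Group (timestamp, item) events into temporal clusters (hypothetical rule):
    a new cluster starts when the gap to the previous event exceeds `window`."""
    clusters, current, last_t = [], [], None
    for t, item in sorted(events):
        if last_t is not None and t - last_t > window:
            clusters.append(current)
            current = []
        current.append((t, item))
        last_t = t
    if current:
        clusters.append(current)
    return clusters


def count_subsets(clusters, max_size=3):
    """Count every unordered subset (size 2..max_size) of distinct items per cluster.
    Order is ignored here, so interleaved activity does not break a pattern."""
    counts = Counter()
    for cluster in clusters:
        items = sorted({item for _, item in cluster})
        for k in range(2, min(max_size, len(items)) + 1):
            for subset in combinations(items, k):
                counts[subset] += 1
    return counts


def order_pattern(pattern, clusters):
    """Deferred sequencing: once counting is done, order a subset by its most
    common first-occurrence order across the clusters that contain it."""
    orders = Counter()
    for cluster in clusters:
        first_seen = {}
        for t, item in cluster:  # clusters are time-sorted, so the first t is earliest
            first_seen.setdefault(item, t)
        if all(p in first_seen for p in pattern):
            orders[tuple(sorted(pattern, key=lambda p: first_seen[p]))] += 1
    return orders.most_common(1)[0][0] if orders else None


# Hypothetical usage on a toy event log.
events = [(0, "open"), (1, "edit"), (2, "save"),
          (10, "open"), (11, "edit"), (12, "save"),
          (30, "open"), (31, "print")]
clusters = temporal_clusters(events, window=5)
counts = count_subsets(clusters)
frequent = [p for p, c in counts.items() if len(p) == 3 and c >= 2]
print(order_pattern(frequent[0], clusters))  # ('open', 'edit', 'save')
```

Deferring the ordering step means the counting pass only needs set membership within each cluster. Direction is recovered once, for the patterns that turn out to be frequent.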
IEEE Transactions on Knowledge and Data Engineering
Okolica, J. S., Peterson, G. L., Mills, R. F., & Grimaila, M. R. (2020). Sequence Pattern Mining with Variables. IEEE Transactions on Knowledge and Data Engineering, 32(1), 177–187. https://doi.org/10.1109/TKDE.2018.2881675
Artificial Intelligence and Robotics Commons, Information Security Commons, Theory and Algorithms Commons
This record provides the accepted pre-final version of the article.
The publisher's final version of record is available at IEEE Xplore, as cited above.
(c) 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.