Document Type


Publication Date



Sequence pattern mining (SPM) seeks to find multiple items that commonly occur together in a specific order. One common assumption is that all of the relevant differences between items are captured through creating distinct items, e.g., if color matters then the same item in two different colors would have two items created, one for each color. In some domains, that is unrealistic. This paper makes two contributions. The first extends SPM algorithms to allow item differentiation through attribute variables for domains with large numbers of items, e.g, by having one item with a variable with a color attribute rather than distinct items for each color. It demonstrates this by incorporating variables into Discontinuous Varied Order Sequence Mining (DVSM). The second contribution is the creation of Sequence Mining of Temporal Clusters (SMTC), a new SPM that addresses the interleaving issue common to SPM algorithms. Most SPM algorithms address interleaving by using a distance measure to separate co-occurring sequences. SMTC addresses interleaving by clustering all subsets of temporally close items and deferring the sequencing of mined patterns until the entire dataset if examined. Evaluation of the SPM algorithms on a digital forensics media analysis task results in a 96% reduction in terms to review, 100% detection of true positives and no false positives.


This record provides the accepted pre-final version of the article.

The publisher's final version of record is available at IEEEXplore, as cited below.

(c) 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.



Source Publication

IEEE Transactions on Knowledge and Data Engineering