Sequence pattern mining (SPM) seeks to find multiple items that commonly occur together in a specific order. A common assumption is that every relevant difference between items is captured by creating distinct items; for example, if color matters, the same item in two different colors is represented as two items, one per color. In some domains, that is unrealistic. This paper makes two contributions. The first extends SPM algorithms to differentiate items through attribute variables in domains with large numbers of items, e.g., by using one item with a color attribute variable rather than a distinct item for each color. It demonstrates this by incorporating variables into Discontinuous Varied Order Sequence Mining (DVSM). The second contribution is Sequence Mining of Temporal Clusters (SMTC), a new SPM algorithm that addresses the interleaving issue common to SPM algorithms. Most SPM algorithms address interleaving by using a distance measure to separate co-occurring sequences. SMTC instead clusters all subsets of temporally close items and defers the sequencing of mined patterns until the entire dataset has been examined. Evaluated on a digital forensics media analysis task, the SPM algorithms yield a 96% reduction in terms to review, 100% detection of true positives, and no false positives.
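The idea of clustering temporally close items and deferring sequencing can be illustrated with a small sketch. This is not the paper's SMTC implementation; it is a minimal illustration under stated assumptions: events are (timestamp, item) pairs, clusters are split wherever the time gap exceeds a hypothetical max_gap threshold, and the function names, the min_support parameter, and the sample events are all invented for this example.

```python
from collections import Counter
from itertools import combinations


def temporal_clusters(events, max_gap):
    """Split time-sorted (timestamp, item) events wherever the gap exceeds max_gap."""
    clusters, current, last_t = [], [], None
    for t, item in sorted(events):
        if last_t is not None and t - last_t > max_gap:
            clusters.append(current)
            current = []
        current.append((t, item))
        last_t = t
    if current:
        clusters.append(current)
    return clusters


def frequent_itemsets(clusters, min_support, max_size=3):
    """Count unordered item subsets per cluster; keep those found in >= min_support clusters."""
    counts = Counter()
    for cluster in clusters:
        items = sorted({item for _, item in cluster})
        for size in range(2, max_size + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def most_common_order(itemset, clusters):
    """Deferred sequencing: after all clusters are known, pick the most frequent item order."""
    orders = Counter()
    for cluster in clusters:
        first_seen = {}
        for t, item in cluster:
            if item in itemset and item not in first_seen:
                first_seen[item] = t
        if len(first_seen) == len(itemset):
            orders[tuple(sorted(first_seen, key=first_seen.get))] += 1
    return orders.most_common(1)[0][0] if orders else ()


# Hypothetical event log; "mail" interleaves with the open/edit/save pattern.
events = [
    (0, "open"), (1, "edit"), (2, "save"),
    (100, "open"), (101, "mail"), (102, "edit"), (103, "save"),
    (200, "open"), (201, "edit"), (202, "save"),
]
clusters = temporal_clusters(events, max_gap=10)
frequent = frequent_itemsets(clusters, min_support=3)
pattern = most_common_order(("edit", "open", "save"), clusters)
```

Because the subsets are counted without regard to order, the interleaved "mail" event in the second cluster does not prevent the open, edit, save pattern from being found; the order is recovered only after every cluster has been examined.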

