Seq2Pat: Sequence-to-Pattern Generation Library
Seq2Pat is a research library for sequence-to-pattern generation to discover sequential patterns that occur frequently in large sequence databases. The library supports constraint-based reasoning to specify desired properties over patterns.
From an algorithmic perspective, the library takes advantage of multi-valued decision diagrams. It is based on the state-of-the-art approach for sequential pattern mining from Hosseininasab et. al. AAAI 2019.
From an implementation perspective, the library is written in Cython
that brings together the efficiency of a low-level C++
backend and
the expressiveness of a high-level Python
public interface.
Seq2Pat is developed as a joint collaboration between Fidelity Investments and the Tepper School of Business at CMU.
Available Constraints
Average: This constraint specifies the average value of an attribute across all events in a pattern.
Gap: This constraint specifies the difference between the attribute values of every two consecutive events in a pattern.
Median: This constraint specifies the median value of an attribute across all events in a pattern.
Span: This constraint specifies the difference between the maximum and the minimum value of an attribute across all events in a pattern.
Quick Start
Constraint-based Sequential Pattern Mining
# Example to show how to find frequent sequential patterns
# from a given sequence database subject to constraints
from sequential.seq2pat import Seq2Pat, Attribute
# Seq2Pat over 3 sequences
seq2pat = Seq2Pat(sequences=[["A", "A", "B", "A", "D"],
["C", "B", "A"],
["C", "A", "C", "D"]])
# Price attribute corresponding to each item
price = Attribute(values=[[5, 5, 3, 8, 2],
[1, 3, 3],
[4, 5, 2, 1]])
# Average price constraint
seq2pat.add_constraint(3 <= price.average() <= 4)
# Patterns that occur at least twice (A-D)
patterns = seq2pat.get_patterns(min_frequency=2)
Dichotomic Pattern Mining
# Example to show how to run Dichotomic Pattern Mining
# on sequences with positive and negative outcomes
from sequential.seq2pat import Seq2Pat
from sequential.pat2feat import Pat2Feat
from sequential.dpm import dichotomic_pattern_mining, DichotomicAggregation
# Create seq2pat model for positive sequences
sequences_pos = [["A", "A", "B", "A", "D"]]
seq2pat_pos = Seq2Pat(sequences=sequences_pos)
# Create seq2pat model for negative sequences
sequences_neg = [["C", "B", "A"], ["C", "A", "C", "D"]]
seq2pat_neg = Seq2Pat(sequences=sequences_neg)
# Run DPM to get mined patterns
aggregation_to_patterns = dichotomic_pattern_mining(seq2pat_pos, seq2pat_neg,
min_frequency_pos=1,
min_frequency_neg=2)
# DPM patterns with Union aggregation
dpm_patterns = aggregation_to_patterns[DichotomicAggregation.union]
# Encodings of all sequences
sequences = sequences_pos + sequences_neg
pat2feat = Pat2Feat()
encodings = pat2feat.get_features(sequences, dpm_patterns,
drop_pattern_frequency=False)
Source Code
The source code is hosted on GitHub.