
[Study notes] Frequent pattern mining: current status and future directions (DMKD, 2007)

December 26, 2012

These are study notes, excerpted here only so that I can look them up easily.
Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan: Frequent pattern mining: current status and future directions. Data Min. Knowl. Discov. 15(1): 55-86 (2007)

1. Types of frequent patterns
frequent itemsets: unordered
(frequent) sequential pattern: ordered
(frequent) structural pattern: structure-aware, e.g. subgraphs, subtrees, etc.

2. Three basic frequent itemset mining methods
(1) Apriori principle
Apriori: a downward closure property, a k-itemset is frequent only if all of its sub-itemsets are frequent.
horizontal data format
(2) FP-growth
FP-tree: frequent pattern tree
horizontal data format
(3) Eclat
vertical data format
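
To make the horizontal vs. vertical distinction concrete, here is a minimal sketch of two of the methods above (FP-growth is left out for brevity). It is not the implementation from the paper: the function names `apriori` and `eclat`, the use of an absolute `min_support` count, and the toy data are all assumptions of this note.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise mining over horizontal data (one item set per transaction)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Count the transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (downward closure): every k-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k))}
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent


def eclat(transactions, min_support):
    """Depth-first mining over vertical data (item -> set of transaction ids)."""
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tids), where tids are the transactions
        # that contain prefix plus that item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # support by tid-list intersection
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            extend(itemset, suffix)

    start = sorted(((item, tids) for item, tids in tidlists.items()
                    if len(tids) >= min_support), key=lambda pair: pair[0])
    extend(frozenset(), start)
    return frequent
```

On the toy data `[{"a","b","c"}, {"a","b"}, {"a","c"}, {"b","c"}, {"a","b","c"}]` with `min_support=3`, both functions return the same six itemsets: the three single items with support 4 and the three pairs with support 3. Apriori rescans the transactions to count support, while Eclat only intersects tid-lists.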

3. closed frequent pattern
a pattern a is a closed frequent pattern in a data set D if
(1) a is frequent
(2) there exists no proper super-pattern b such that b has the same support as a in D

maximal frequent pattern (max-pattern)
(1) a is frequent
(2) there exists no super-pattern b such that a ⊂ b and b is frequent in D
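
The two definitions can be checked directly against a table of all frequent itemsets. The sketch below is only an illustration for itemsets; the function name `closed_and_maximal` and the input format (a dict from `frozenset` to support) are assumptions of this note. Any super-pattern with the same support as a frequent pattern is itself frequent, so it is enough to look at frequent supersets.

```python
def closed_and_maximal(frequent):
    """frequent: dict mapping frozenset -> support, holding ALL frequent itemsets."""
    closed, maximal = set(), set()
    for a, sup in frequent.items():
        # Proper super-patterns of a that are frequent.
        supers = [b for b in frequent if a < b]
        # Closed: no proper super-pattern has the same support as a.
        if all(frequent[b] != sup for b in supers):
            closed.add(a)
        # Maximal: no proper super-pattern is frequent at all.
        if not supers:
            maximal.add(a)
    return closed, maximal
```

For example, with supports `{a: 3, b: 2, ab: 2}`, the closed patterns are `a` and `ab` (`b` is not closed because `ab` has the same support), and the only max-pattern is `ab`. Every max-pattern is closed, but not the other way around.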

4. Sequential pattern mining
Common algorithms:
GSP: A Sequential Pattern Mining Algorithm Based on Candidate Generate-and-Test
SPADE: An Apriori-Based Vertical Data Format Sequential Pattern Mining Algorithm
PrefixSpan: Prefix-Projected Sequential Pattern Growth

Performance comparison: PrefixSpan > SPADE > GSP
When the number of frequent subsequences is large, all three algorithms slow down.
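
As a reminder of how prefix projection works, here is a simplified sketch in the spirit of PrefixSpan. It assumes every sequence element is a single item (the real algorithm also handles itemset elements), and the function name `prefixspan` and the pseudo-projection by (sequence index, offset) pairs are choices made for this note.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # projected: (sequence index, start offset) pairs, one per sequence
        # containing the prefix; the part from the offset on is the postfix.
        counts = {}
        for seq_id, start in projected:
            seen = set()
            for pos in range(start, len(sequences[seq_id])):
                item = sequences[seq_id][pos]
                if item not in seen:
                    # Keep only the first occurrence: it leaves the longest
                    # postfix and counts each sequence at most once.
                    seen.add(item)
                    counts.setdefault(item, []).append((seq_id, pos + 1))
        for item, occurrences in counts.items():
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                # Recurse on the database projected by the longer prefix.
                mine(pattern, occurrences)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

With the sequences `["abc", "acb", "ab"]` and `min_support=2`, the result contains `("a",)`, `("b",)` and `("a", "b")` with support 3, and `("c",)` and `("a", "c")` with support 2. No candidate sequences are generated and tested; each recursion only scans the projected postfixes.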

5. frequent substructures mining
Apriori-based approach
"The search for frequent graphs starts with graphs of small "size", and proceeds in a bottom-up manner".
AGM and FSG are both of this kind
Pattern-growth approach

6. Mining interesting frequent patterns
Constraint-based mining: efficiently mining only the patterns that satisfy user-specified constraints.
Categories of constraints:
    succinct constraints
    anti-monotonic constraints
    monotonic constraints
    convertible constraints
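
A small sketch of why the category matters, covering only the anti-monotonic and monotonic cases. The price table, the constraint functions `within_budget` and `at_least`, and the enumerator `grow` are all hypothetical names invented for this note, and the example assumes non-negative prices.

```python
# Hypothetical item prices, all non-negative (an assumption of this sketch).
PRICES = {"a": 10, "b": 20, "c": 40}


def total(itemset):
    return sum(PRICES[i] for i in itemset)


def within_budget(itemset, budget=50):
    # Anti-monotonic: if an itemset violates this, so does every superset.
    return total(itemset) <= budget


def at_least(itemset, floor=30):
    # Monotonic: if an itemset satisfies this, so does every superset.
    return total(itemset) >= floor


def grow(constraint):
    """Enumerate itemsets level by level, extending only those that pass
    an anti-monotonic constraint (violators are pruned with all supersets)."""
    items = sorted(PRICES)
    level = [(i,) for i in items if constraint((i,))]
    kept = []
    while level:
        kept.extend(level)
        # Extend each surviving itemset with a larger item only.
        level = [s + (i,) for s in level for i in items
                 if i > s[-1] and constraint(s + (i,))]
    return kept
```

`grow(within_budget)` keeps `a`, `b`, `c`, `ab` and `ac`; `bc` costs 60, so it is dropped and `abc` is never counted against the budget through it. A monotonic constraint such as `at_least` cannot prune this way, but once an itemset satisfies it, the check can be skipped for all of its supersets.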

For constraint-based mining in the context of sequential pattern mining, refer to
    Garofalakis M, Rastogi R, Shim K (1999) SPIRIT: Sequential pattern mining with regular expression constraints. VLDB99
    Pei J, Han J, Wang W (2002) Constraint-based sequential pattern mining in large databases. CIKM02

Memo
1. Material on DAG mining within frequent substructures mining
