以位元序列為基礎探勘容錯重複樣式之研究

No Thumbnail Available

Date

2004

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

本論文提出一個有效率的方法對資料序列探勘出前K個非顯然且滿足最小長度限制的容錯重複樣式。我們擴展出現位元序列的表示法,設計出容錯出現位元序列,在考慮有插入或刪除錯誤的容錯情況下,用來表示候選樣式在資料序列中的出現位置。本論文提出二個演算法,分別命名為TFTRP-Mine (Top-K non-trivial FT-RPs Mining)及RE-TFTRP-Mine (REfinement of TFTRP-Mine)。兩個演算法皆根據論文中歸納出的遞迴公式,可系統性地計算出一個樣式的容錯出現位元序列,因而很有效率地得到每個侯選樣式的容錯出現次數。此外,RE-TFTRP-Mine演算法採用額外兩個技巧來砍除搜尋空間以加快探勘效率。由實驗結果得知,當K及min_len的值較小時,RE-TFTRP-Mine比TFTRP-Mine有較好的執行效率;而由實際的樂曲資料之實驗顯示,當探勘過程中有考慮容錯比對時,可以找出更多重要且隱藏的重複樣式。
An efficient way of mining top-K non-trivial fault-tolerant repeating patterns (FT-RPs in short) with length no less than min_len for data sequences is proposed in this thesis. By extending the idea of appearing bit sequences, fault-tolerant appearing bit sequences are defined to represent the locations where candidate patterns appear in a data sequence with insertion/deletion errors allowed. Two algorithms, named TFTRP-Mine (Top-K non-trivial FT-RPs Mining) and RE-TFTRP-Mine (REfinement of TFTRP-Mine), respectively, are proposed. Both of two algorithms use the recursive formulas to obtain fault-tolerant appearing bit sequences of a pattern systematically and then the fault-tolerant frequency of each candidate pattern could be counted quickly. Besides, RE-TFTRP-Mine adopts two additional strategies to prune the searching space in order to increase the mining efficiency. From experiment results, we can know that RE-TFTRP-Mine outperforms TFTRP-Mine algorithm when K and min_len are small. In addition, when adopting fault tolerant mining, more important and implicit repeating patterns could be found for music objects.

Description

Keywords

重複樣式, 容錯探勘, 位元序列, 前K個樣式探勘, 資料序列, Repeating Patterns, Fault-Tolerant Mining, Bit Sequences, Top-K patterns Mining, Data Sequence

Citation

Collections