Mining Recent Path Traversal Patterns on Webclick Streams
Date
2006
Authors
Chun-Wei Hsieh (謝俊緯)
Abstract
Mining Recent Path Traversal Patterns on Webclick Streams
by Chun-Wei Hsieh (謝俊緯)
Frequent traversal patterns mined from historical data reflect long-term behavior and do not necessarily reflect recent trends, yet website administrators are usually more interested in how recent users traverse the site. This thesis therefore proposes RPTP (mining Recent Path Traversal Patterns on webclick streams), an algorithm for mining recent closed frequent traversal patterns from webclick streams. RPTP combines a sliding window with the Lossy Counting technique: it maintains only the frequent and semi-frequent (potentially frequent) patterns found in a fixed number of the most recent user sessions, so frequent patterns can be discovered efficiently and dynamically as the stream arrives. RPTP does not store the original data; it records only the occurrence information of recent frequent and semi-frequent patterns. The thesis also presents strategies for mining closed frequent patterns from the constructed data structures, which avoids redundant information in the mining result and makes the result easier to analyze. The concept of closed patterns is likewise applied to reduce the number of patterns that must be maintained. Experimental results show that RPTP mines recent frequent traversal patterns quickly under a reasonable memory requirement and, compared with related work, responds faster to changes in the frequent traversal patterns of a webclick stream.
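The abstract names two building blocks, Lossy Counting and closed patterns, without giving details. The sketch below is only an illustration of those two general ideas and is not the RPTP algorithm itself: it applies textbook Lossy Counting to the contiguous sub-paths of each session and then filters the result down to closed patterns. The class and function names, the `max_len` limit, the choice of contiguous sub-paths as patterns, and the definition of "closed" used here (no longer path containing the pattern has the same count) are all assumptions made for the example. The sliding window over recent sessions that RPTP adds is not modeled.

```python
import math


def subpaths(session, max_len=3):
    """Return the distinct contiguous sub-paths (as tuples) of one session."""
    found = set()
    for start in range(len(session)):
        for end in range(start + 1, min(start + max_len, len(session)) + 1):
            found.add(tuple(session[start:end]))
    return found


class LossyPathCounter:
    """Textbook Lossy Counting, with one user session as the counting unit."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # sessions per bucket
        self.n = 0                           # sessions seen so far
        self.entries = {}                    # path -> [count, max undercount]

    def add_session(self, session, max_len=3):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        for path in subpaths(session, max_len):
            if path in self.entries:
                self.entries[path][0] += 1
            else:
                # A new entry may have been missed in up to bucket - 1 sessions.
                self.entries[path] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {p: e for p, e in self.entries.items()
                            if e[0] + e[1] > bucket}

    def frequent(self, min_support):
        """Paths whose recorded count reaches (min_support - epsilon) * n."""
        threshold = (min_support - self.epsilon) * self.n
        return {p: e[0] for p, e in self.entries.items() if e[0] >= threshold}


def is_subpath(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def closed_only(patterns):
    """Keep patterns that no longer pattern with an equal count contains."""
    return {p: c for p, c in patterns.items()
            if not any(p != q and c == patterns[q] and is_subpath(p, q)
                       for q in patterns)}


# Hypothetical sessions, each a list of visited page identifiers.
counter = LossyPathCounter(epsilon=0.25)
for session in [["A", "B", "C"], ["A", "B"], ["A", "B", "D"], ["X"], ["A", "B"]]:
    counter.add_session(session)

frequent_paths = counter.frequent(min_support=0.5)
closed_paths = closed_only(frequent_paths)
print(frequent_paths)
print(closed_paths)
```

In this toy run the single pages A and B occur exactly as often as the path A → B, so the closed filter keeps only the longer path; this is the sense in which closed patterns remove redundant information from a result.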
Keywords
Data Mining, Data Streams, Path Traversal Patterns