An Applied Study of Association-Rule-Based Mining of Financial News Events That Affect Individual Stock Price Movements

Abstract

This thesis presents a research system that combines the news side and the numerical (trading) side of the stock market. It draws on Internet technologies, a data crawler, a data analyzer, a Chinese word segmentation system, the K-means clustering algorithm, and data mining. The goal is to uncover implicit association rules between news events about an individual stock and that stock's trading activity, and so to provide information of reference value. The system retrieves the required information from the Internet and stores it in a database. A Chinese word segmentation system extracts key items from the title of each news event in the database, and a similarity threshold applied to those key items filters out near-duplicate news events. After all price changes are normalized, the K-means clustering algorithm groups them into clusters, and association rule mining finds the large itemsets within each cluster. Using support and confidence as the two criteria, the system mines implicit association rules between a stock's news events and its trading, giving users credible reference information for stock trading.
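The clustering and rule-mining steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the key items have already been extracted from each news title, uses min-max scaling as the normalization, and mines only single-keyword rules of the form "keyword → price-change cluster" instead of full large itemsets. All data, function names, and thresholds are hypothetical.

```python
from collections import Counter

def min_max_normalize(values):
    """Scale values into [0, 1] (assumed normalization; the source does not name one)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def kmeans_1d(values, k, iterations=100):
    """Plain K-means on one-dimensional data with a deterministic initialization."""
    ordered = sorted(values)
    # Start with centroids spread evenly across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def mine_rules(keyword_sets, labels, min_support, min_confidence):
    """Find rules keyword -> cluster that meet both support and confidence thresholds."""
    n = len(keyword_sets)
    keyword_counts = Counter()
    pair_counts = Counter()
    for keywords, label in zip(keyword_sets, labels):
        for kw in set(keywords):
            keyword_counts[kw] += 1
            pair_counts[(kw, label)] += 1
    rules = []
    for (kw, label), count in pair_counts.items():
        support = count / n                    # share of all news items with kw and label
        confidence = count / keyword_counts[kw]  # share of kw items that fall in label
        if support >= min_support and confidence >= min_confidence:
            rules.append((kw, label, support, confidence))
    return sorted(rules)

# Hypothetical news items: (key items from the title, price change in percent).
news = [
    ({"earnings", "record"}, 4.8),
    ({"earnings", "beat"}, 5.2),
    ({"recall", "lawsuit"}, -4.5),
    ({"recall"}, -5.0),
    ({"earnings", "dividend"}, 4.9),
    ({"lawsuit"}, -0.2),
]

keyword_sets = [keywords for keywords, _ in news]
changes = min_max_normalize([change for _, change in news])
labels, centroids = kmeans_1d(changes, k=3)  # clusters ordered low, middle, high
rules = mine_rules(keyword_sets, labels, min_support=0.3, min_confidence=0.8)

for kw, label, support, confidence in rules:
    print(f"{kw} -> cluster {label} (support={support:.2f}, confidence={confidence:.2f})")
```

Because the initial centroids are taken from the sorted values, cluster 0 holds the largest declines and cluster 2 the largest gains in this toy data. A keyword that appears in more than one cluster with no dominant label, such as "lawsuit" here, fails the confidence threshold and produces no rule.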

Keywords

Data Mining, Association Rules, Data Crawler, K-means Clustering Algorithm
