Adjustment Methods for Support Vector Machines with Imbalanced Data

Date

2024

Abstract

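Data imbalance is a common phenomenon across datasets in machine learning and can markedly affect the results of model training. Among the many proposed solutions, the most commonly used is the Synthetic Minority Over-sampling Technique (SMOTE), which addresses data imbalance while achieving highly accurate classification. In this study, we generate random data under different parameter settings to balance the class proportions and examine how Support Vector Machines (SVM) perform when classifying imbalanced data. Like SMOTE, this method generates data so that the class proportions approach balance, and in our experiments the two achieve similar results. In addition, we use a binary search algorithm to improve the results of the original SVM and raise classification performance on the minority class; the binary-search SVM obtains better classification results without generating any data. Finally, we compare the results with SMOTE. The experiments show that the binary-search SVM gives better classification results for the minority class, and that the random data generation method, which balances the class proportions, also improves classification when the class proportions are close.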
In various real datasets, data imbalance is a common phenomenon that can significantly impact the outcomes of model training in the field of machine learning. Among the various proposed solutions, one of the most commonly used methods is the Synthetic Minority Over-sampling Technique (SMOTE), which addresses data imbalance while achieving highly accurate classification. In this thesis, we explore the performance of Support Vector Machines (SVM) in classifying imbalanced datasets by setting different parameters to generate random data, thereby balancing the data distribution. Additionally, we utilize a binary search algorithm to fine-tune the results provided by the original SVM, enhancing the classification performance for the minority class. Finally, we compare the results with the SMOTE method. Experimental results indicate that the random data generation method, which balances the data distribution, can improve classification outcomes when data proportions are similar. Moreover, it achieves classification performance comparable to SMOTE.
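
For readers unfamiliar with the SMOTE baseline named in the abstract, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the thesis's random data generation method, whose details the abstract does not give. The function name `smote_like_oversample` and its parameters `n_new`, `k`, and `seed` are hypothetical names chosen for illustration. The sketch only shows the general idea: each synthetic minority point is drawn on the line segment between a minority sample and one of its nearest minority neighbors.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbors.

    Illustrative sketch only: the name and parameters are assumptions,
    not taken from the thesis or from any library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point cannot be its own neighbor
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbor.
        u = rng.random()
        synthetic.append(
            tuple(b + u * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic
```

To balance a dataset with this sketch, one would typically set `n_new` to the difference between the majority and minority class sizes, append the returned points to the minority class, and then train the SVM on the enlarged set.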

Keywords

Support Vector Machine, Imbalanced Data Classification, Random Data Generation Method, SMOTE Method, Binary Search
