分群演算法之超大型積體電路架構研究

No Thumbnail Available

Date

2012

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

本論文對於c-平均值(c-means)、競爭式學習(competitive learning)、模糊c-平均值(fuzzy c-means),以及帶空間約束之模糊c-平均值(fuzzy c-means with spatial constraint)等多種分群演算法分別提出硬體架構。這些架構皆已在場域可程式化閘陣列(Field Programmable Gate Array,FPGA)裝置上實作建構出適用於分群(clustering)的可程式化系統晶片(System on Programmable Chip,SOPC)系統。 由於分割(partitioning)與質心計算(centroid computation)等運算全為管線化運作,故本文所提出的c-平均值架構可同時處理多筆訓練向量(training vector)。查表式除法器(lookup table based divider)則用以減少面積成本及質心計算的延遲。 文中另提出兩種針對k贏家全取(k-winners-take-all,kWTA)操作的硬體實現。第一種架構,經由在小波域(wavelet domain)中執行部分距離搜尋(partial distance search,PDS)來找出關於每一個輸入向量的k個贏家。一種單純利用查表來做計算的硬體除法器則用以構成神經元的更新程序。部分距離搜尋模組及除法器均採取有限精度計算(finite precision calculation)來降低部分距離搜尋及硬體除法器的面積成本。另採用子空間搜尋(subspace search)及多係數累積(multiple-coefficient accumulation)等技巧來降低PDS的運算延遲。第二種則是一個高效率的管線化架構,可同時進行不同訓練向量的kWTA競賽。此管線化架構使用了一個嶄新的碼字交換機制(codeword swapping scheme),使那些在競賽過程中落敗的神經元可立即投入後續訓練向量的競賽。 文中所提出的模糊c-平均值架構是個高效率的平行計算方案。此架構利用查表式除法來降低計算權重值(membership coefficient)與質心的面積成本及計算複雜度。為了避開龐大的儲存需求,權重矩陣(membership coefficient matrix)及質心的更新,從過去慣用的迭代法,改為合併成單一步驟。這樣的架構還延伸到帶空間約束之模糊c-平均值的實現。並採用查表法來處理開根號運算,以便放寬模糊度(degree of fuzziness)的限制。 實驗結果顯示文中所提出的架構具有成本效益(cost-effective),且在面對龐大的資料集合及/或眾多的群集數時,較其他軟硬體實現能有更高的加速(speedup)。
In this dissertation, several hardware architectures are proposed for various clustering algorithms, including the c-means, competitive learning, fuzzy c-means, and fuzzy c-means with spatial constraint algorithms. All these architectures have been implemented on field programmable gate array (FPGA) devices to construct system on programmable chip (SOPC) systems for clustering. Both the partitioning and centroid computation operations in the proposed c-means architecture are fully pipelined thus multiple training vectors can be concurrently processed. A lookup table based divider is employed to reduce the area cost and latency for centroid computation. Two kinds of hardware realization of competitive learning algorithm with k-winners-take-all (kWTA) activation are presented. In the first architecture, the k winners associating with an input vector are identified by a module performing partial distance search (PDS) in the wavelet domain. The neuron updating process is based on a hardware divider with simple table lookup operations. Both the partial distance search module and hardware divider adopt finite precision calculation for area cost reduction. Subspace search and multiple-coefficient accumulation techniques are also employed to reduce the computation latency for the PDS search. The second architecture is based on an efficient pipeline allowing kWTA competition processes associated with different training vectors to be performed concurrently. The pipeline architecture employs a novel codeword swapping scheme so that neurons failing the competition for a training vector are immediately available for the competitions for subsequent training vectors. The proposed fuzzy c-means architecture is an efficient parallel solution. The architecture reduces the area cost and computational complexity for membership coefficients and centroid computation by employing lookup table based dividers. The usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. Such architecture is also extended to for the implementation of fuzzy c-means with spatial constraint. In the architecture, lookup table based root operators are adapted to relax the restriction on the degree of fuzziness. Experimental results show that the proposed architectures are cost-effective, and can attain high speedup over other hardware or software implementations for large data sets and/or large number of clusters.

Description

Keywords

分群, c-平均值, 競爭式學習, 模糊c-平均值, 可程式化閘陣列, 可程式化系統晶片, clustering, c-means, competitive learning, fuzzy c-means, field programmable gate array (FPGA), system on programmable chip (SOPC)

Citation

Collections

Endorsement

Review

Supplemented By

Referenced By