實現於多核心圖形處理器之平行樣式比對演算法的效能導向最佳化技術研究

No Thumbnail Available

Date

2012-07-31

Authors

林政宏

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

網路入侵偵測系統(Network Intrusion Detection System, NIDS)已經被廣泛運用於保護電腦系統免於網路攻擊。其中,比對封包內容與病毒樣式的樣式比對(pattern matching)核心決定了一個網路入侵偵測系統的效能。由於網路攻擊的大量增加與網路的複雜度,傳統循序的樣式比對演算法已經不適用於目前的高速網路。在這個計畫中,我們首先針對所提出的演算法配合GPU的特性做深入的探討,包括thread divergence, memory organization, caching and locality, and memory bandwidth等方面。由於GPU的特殊架構,要得到更高的效能,必須深入探討GPU實現上的細節。考慮GPU硬體上的特性,我們企圖提出多項吞吐量導向(throughput-oriented)的最佳化技術,例如降低輸入buffer的全域記憶體(global memory)讀取次數、降低transition table讀取的延遲、避免shared memory的bank conflict、還有全域記憶體的coalescing寫入技術。計畫最後,我們也將針對GPU記憶體的限制提出兩個解決方案。其中之一是提出一個最佳化的方法來降低transition table記憶體的使用量。另一個方法則是藉由將運算工作分配給多個GPUs,提升整體的吞吐量。針對多重GPU(multi-GPUs) 架構,我們則將提出使用多重GPU的機制,包括如何將GPU的thread嵌入於CPU的pthread、如何產生task pool、與如何處理CPU/GPU架構的錯誤與例外。在本計畫中,我們將使用Snort 2.8版本的攻擊樣式,並採用包含大量的攻擊樣式的DEFCON封包以測試我們所開發的PFAC核心。實驗結果除了跟傳統AC的OpenMP與Pthread版本比較以外,並與目前最新的GPU研究成果比較。研究成果已發表於2011 IEEE GLOBECOM國際會議和第22超大型積體電路設計暨計算機輔助設計研討會、2012 GPU Technology Conference。
Network Intrusion Detection Systems (NIDS) have been widely used to protect computer systems from network attacks. The kernel of pattern matching dominates the performance of an NIDS to inspect packet content against thousands of predefined patterns. Due to the ever-increasing number of attacks and network complexity, traditional sequential pattern matching algorithms have become inadequate for the high-speed network.In this project, we carry out an in-depth study of the proposed algorithm performed on GPUs including thread divergence, memory organization, caching and locality, and memory bandwidth. Due to specific architecture of GPUs, to explore higher performance, it is imperative to study the implementation issues on GPUs. Considering the GPU architecture, we attempt to introduce several throughput-oriented techniques including reducing global memory transaction of input buffer, reducing latency of transition table lookup, avoiding bank-conflict of shared memory, and coalescing write to global memory in GPUs. Finally, we intend to deal with the memory limitation of GPUs by two ways. One is to utilize the memory of GPUs by optimizing the state transition table of our proposed algorithm. The other is to use multi-GPUs to increase throughput by distributing workload to multiple GPUs. In this part, we will detail the mechanism for binding GPU thread to CPU Pthreads, creating a task pool, and handling errors and exception conditions of joint CPU/GPU architecture.To demonstrate our algorithm, we will use attack patterns from Snort V2.8 and test the PFAC kernel using DEFCON packets which contains large amounts of attack patterns. Experimental results will be compared to the AC algorithm performed on multicore CPUs with OpenMP and Pthread as well as state-of-the-art GPU-based approaches.The research results have been presented at 2011 IEEE GLOBECOM, the 22nd VLSI Design/CAD Symposium, and 2012 GPU Technology Conference (GTC).

Description

Keywords

Citation

Collections

Endorsement

Review

Supplemented By

Referenced By