A High-Performance VLSI Architecture for Implementing the H.264 Variable Block Size Motion Estimation Unit
Date
2005
Abstract
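This thesis proposes an efficient and flexible VLSI architecture for the H.264 variable block size motion estimation unit. The architecture performs the full-search block-matching algorithm on blocks whose sizes are 4×4 or integral multiples of 4×4. It partitions each 16×16 macroblock of the source frame into 16 non-overlapping 4×4 subblocks, called primitive subblocks, and consists of 16 modules and one variable block size motion estimation processor (VBSME processor). In each module, a cascading 1D systolic array performs block matching for a different primitive subblock; this cascaded 1D systolic array gives the architecture high computational throughput, high flexibility, and 100% processing element utilization. Motion estimation is performed on all primitive subblocks simultaneously, and the 16 primitive subblocks are combined to form 41 subblocks of different sizes. The VBSME processor concurrently computes the sums of absolute differences (SADs) of all 41 subblocks from the SADs of the primitive subblocks. Compared with previously published H.264 VBSME architectures, the proposed architecture has lower computational latency and higher throughput.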
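As a software reference for what the sixteen modules compute, the sketch below evaluates, for one candidate displacement, the SADs of all 16 primitive 4×4 subblocks of a macroblock, and then runs an exhaustive (full) search over a ±p window. It is a plain sequential model, not the cascading 1D systolic array itself; the function names `primitive_sads` and `full_search_primitive`, the list-of-lists frame layout, the ±p search range, and the rule of skipping candidates that leave the reference frame are all assumptions made for illustration.

```python
# Hypothetical software model of full-search block matching on the 16 primitive
# 4x4 subblocks of one 16x16 macroblock. Frames are lists of rows of pixel ints.

def primitive_sads(cur, ref, mb_x, mb_y, dx, dy):
    """4x4 grid of SADs: entry [i][j] belongs to the primitive subblock in
    row i, column j of the macroblock at (mb_x, mb_y), matched against the
    reference block displaced by the candidate motion vector (dx, dy).
    Assumes the displaced 16x16 region lies inside `ref`."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            diff = abs(cur[mb_y + y][mb_x + x] - ref[mb_y + dy + y][mb_x + dx + x])
            grid[y // 4][x // 4] += diff  # accumulate into the owning 4x4 subblock
    return grid


def full_search_primitive(cur, ref, mb_x, mb_y, p):
    """Exhaustive search over displacements in [-p, p] x [-p, p].
    Returns a 4x4 grid of (best_sad, (dx, dy)), one per primitive subblock;
    on ties the first candidate in raster scan order is kept."""
    h, w = len(ref), len(ref[0])
    best = [[None] * 4 for _ in range(4)]
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Assumption: skip candidates whose 16x16 region leaves the reference frame.
            if not (0 <= mb_y + dy <= h - 16 and 0 <= mb_x + dx <= w - 16):
                continue
            # All 16 subblocks are evaluated at the same candidate displacement.
            grid = primitive_sads(cur, ref, mb_x, mb_y, dx, dy)
            for i in range(4):
                for j in range(4):
                    if best[i][j] is None or grid[i][j] < best[i][j][0]:
                        best[i][j] = (grid[i][j], (dx, dy))
    return best


# Usage: the current frame is the reference shifted so the true vector is (-1, 2).
ref = [[y * 32 + x for x in range(32)] for y in range(32)]  # every pixel value unique
cur = [[ref[(y + 2) % 32][(x - 1) % 32] for x in range(32)] for y in range(32)]
print(full_search_primitive(cur, ref, 8, 8, 4)[0][0])  # (0, (-1, 2))
```

Because each reference pixel value in the usage example is unique, only the true displacement yields a zero SAD, so every primitive subblock recovers the same motion vector.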
This paper proposes a novel flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME). The architecture is able to perform a full motion search on blocks whose sizes are integral multiples of 4×4. To use the architecture, each 16×16 macroblock of the source frame is partitioned into sixteen non-overlapping 4×4 primitive subblocks; the architecture itself consists of sixteen modules and one VBSME processor. Each module, realized by cascading 1D systolic arrays, is responsible for the block-matching operations of a different primitive subblock. This realization offers high throughput, high flexibility, and 100% processing element (PE) utilization. The motion estimation of all the primitive subblocks is performed in parallel. These primitive subblocks are used to form 41 subblocks of different sizes. We use the VBSME processor to concurrently compute the sums of absolute differences (SADs) of all 41 subblocks from the SADs of the primitive subblocks. Compared with other existing VBSME architectures for the hardware implementation of H.264 encoders, the new architecture has lower latency and higher throughput.
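The VBSME processor's task of deriving all 41 SADs from the 16 primitive SADs can be modeled in software as below. H.264 allows seven block sizes inside a macroblock, 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4, which give 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41 subblocks. The sketch builds larger SADs from smaller ones, reusing partial sums in an adder-tree order; the abstract does not say how the processor schedules its additions, so this ordering, the function name `merge_sads`, and the `(width, height)` dictionary keys are illustrative assumptions.

```python
# Hypothetical model: derive the SADs of all 41 H.264 subblocks from the
# 16 primitive 4x4 SADs, all taken at the same candidate motion vector.

def merge_sads(p):
    """p[i][j] is the SAD of the primitive 4x4 subblock in row i, column j.
    Returns {(width, height): [SADs in raster order]} for the 7 block sizes."""
    s = {}
    s[(4, 4)] = [p[r][c] for r in range(4) for c in range(4)]
    # 8x4: horizontally adjacent pairs of 4x4 SADs.
    s[(8, 4)] = [p[r][c] + p[r][c + 1] for r in range(4) for c in (0, 2)]
    # 4x8: vertically adjacent pairs of 4x4 SADs.
    s[(4, 8)] = [p[r][c] + p[r + 1][c] for r in (0, 2) for c in range(4)]
    # 8x8: stack two 8x4 SADs (s[(8, 4)] is indexed as row * 2 + column).
    h = s[(8, 4)]
    s[(8, 8)] = [h[r * 2 + k] + h[(r + 1) * 2 + k] for r in (0, 2) for k in (0, 1)]
    q = s[(8, 8)]  # quadrants: top-left, top-right, bottom-left, bottom-right
    s[(16, 8)] = [q[0] + q[1], q[2] + q[3]]  # top half, bottom half
    s[(8, 16)] = [q[0] + q[2], q[1] + q[3]]  # left half, right half
    s[(16, 16)] = [s[(16, 8)][0] + s[(16, 8)][1]]
    return s


prim = [[i * 4 + j + 1 for j in range(4)] for i in range(4)]  # example SADs 1..16
sads = merge_sads(prim)
print(sum(len(v) for v in sads.values()))  # 41
print(sads[(8, 8)], sads[(16, 16)])  # [14, 22, 46, 54] [136]
```

The merge is only meaningful when all 16 primitive SADs refer to the same candidate displacement. In a full search it would run once per candidate, and each of the 41 entries would keep its own minimum SAD and motion vector.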
Keywords
Video Coding, VLSI Architecture, Variable Block Size Motion Estimation, H.264 Standard