A CNN-Based Virtual Fence System Combining a PTZ Camera and LiDAR


Date

2018

Abstract

This study develops a CNN (Convolutional Neural Network) virtual fence system that combines a PTZ (Pan-Tilt-Zoom) camera with LiDAR (Light Detection and Ranging). Unlike traditional isolation measures, a virtual fence does not require building a physical wall or railing; instead, it combines electronic devices with software to establish a defensive line imperceptible to the human eye. A virtual fence offers the following advantages: (A) low human involvement, with round-the-clock, wide-area surveillance; (B) mobility and expandability; (C) no damage to the original landscape; (D) real-time alerts with extensible follow-up handling. In practice, however, virtual fences have not yet gained wide acceptance, mainly because of high false alarm rates and slow processing and notification.

This study improves the accuracy and speed of traditional virtual fence detection and recognition from both the hardware and the software side. On the hardware side, the improvement comes from using LiDAR together with a PTZ camera. The infrared light emitted by the LiDAR not only serves as the system trigger, it is also largely insensitive to weather and lighting variations, which reduces misjudgements and raises the precision and stability of the system. In addition, the LiDAR passes the measured distance of the intruding object to the PTZ camera to control the lens zoom, so that the captured images show the object at an appropriate size, improving the accuracy of the subsequent CNN classification.

On the software side, the improvement comes from using a CNN, whose strong feature-learning capability raises classification speed and accuracy. This study runs experiments on VGG-16 and Darknet-19 with different training modes and different dataset preprocessing. Regarding training modes, the best results are obtained by taking parameters pretrained on the large ImageNet dataset and fine-tuning them with a dataset whose preprocessing resembles that of the test data. Regarding dataset preprocessing, this study distinguishes four main types: Original (the raw bounding-box image), Rescaled (automatically rescaling the bounding-box image isotropically and centering it on a black or grey underlay matching the CNN input size), Matting (removing the background by painting it black or grey), and Matting&Rescaled (isotropically rescaling the matted bounding-box image and centering it on a black or grey underlay matching the CNN input size). Experiments show that using the Rescaled datasets for both training and testing gives the highest mAP; in the VGG-16 experiments, training and testing with the Rescaled-Grey dataset reaches 96.3% mAP.

For a virtual fence system, intruding objects carry motion information that changes consecutive frames, so the system can find intruders and their bounding boxes with a moving-object localization method. It need not, like conventional object detection systems, take a single static image as input, generate and score many candidate bounding boxes, and waste resources and time detecting and recognizing irrelevant background objects. The moving-object localization method adopted here applies temporal differencing to three consecutive frames and uses the very fast bitwise_and function to take the intersection of the difference images, yielding a more precise moving foreground and bounding boxes (sketched after this abstract). Furthermore, the binarized foreground image, after dynamic morphological hole filling, can serve as a mask that is combined with the original image or the bounding-box image to achieve a rough matting (background removal) effect. The Matting&Rescaled-Grey dataset also reaches a high mAP (95.3%) with VGG-16.

The system currently distinguishes three intruder classes: "pedestrian", "animal", and "neither human nor animal". Users can handle the three classes differently according to the needs of the application site, which makes follow-up applications more flexible. Integration tests show that the overall detection accuracy of the proposed virtual fence system exceeds 95% mAP, while the average processing time from LiDAR-triggered image capture to object classification stays below 0.2 s, making it an accurate, fast, and practical system.
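As an illustration of the moving-object localization step above, the following Python/OpenCV sketch applies temporal differencing to three consecutive frames, intersects the two binarized difference images with the fast bitwise_and function, fills holes with morphological closing, and extracts bounding boxes and a rough grey-backed matte. This is a minimal sketch of the approach as the abstract describes it, not the thesis code; the threshold, kernel size, minimum contour area, and grey fill value are assumed parameters.

    import cv2
    import numpy as np

    def locate_moving_objects(f0, f1, f2, thresh=25, min_area=500):
        """Find moving objects in the middle frame f1 of three
        consecutive frames via temporal differencing."""
        g0, g1, g2 = (cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f0, f1, f2))

        # Binarized absolute differences between consecutive frames.
        _, b01 = cv2.threshold(cv2.absdiff(g1, g0), thresh, 255, cv2.THRESH_BINARY)
        _, b12 = cv2.threshold(cv2.absdiff(g2, g1), thresh, 255, cv2.THRESH_BINARY)

        # Intersection keeps only pixels that changed in both frame
        # pairs, suppressing one-off noise.
        motion = cv2.bitwise_and(b01, b12)

        # Morphological closing fills holes in the foreground mask;
        # the kernel size would be tuned per scene ("dynamic" morphology).
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
        motion = cv2.morphologyEx(motion, cv2.MORPH_CLOSE, kernel)

        # Bounding boxes of sufficiently large foreground contours.
        contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) >= min_area]

        # Rough matting: keep foreground pixels, paint the rest grey.
        matted = np.full_like(f1, 128)
        matted[motion > 0] = f1[motion > 0]
        return boxes, matted

Because only the intersection of the two difference masks is kept, changes that appear in a single frame pair are discarded, which is what yields the more precise foreground and bounding boxes mentioned above; the masked grey image corresponds to the rough matting used to build the Matting datasets.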
This study proposes a CNN (Convolutional Neural Network)-based virtual fence system equipped with a LiDAR (Light Detection and Ranging) sensor and a PTZ (Pan-Tilt-Zoom) camera. The proposed system detects and classifies invaders with high mAP (mean average precision) and short operation time. A virtual fence, as opposed to a real physical fence, plays an important role in intelligent surveillance systems: it requires fewer human resources to build, makes no physical impact on the surrounding area, and is easily extendable and portable. However, due to high false alarm rates and high time complexity, it is still challenging for a virtual fence system to provide satisfactory performance. The virtual fence system proposed in this study improves both the detection rate and the speed.

First, a LiDAR sensor is used to detect invaders. Once an invader is sensed, the sensor triggers the PTZ camera. LiDAR's tolerance to variations in weather and light enhances the robustness of the system. Moreover, since small objects in images easily cause detection and classification errors, the distance information provided by the LiDAR sensor is passed to the PTZ camera to control its zoom-in and zoom-out operations, ensuring that objects appear at proper sizes in the images. Then, a three-frame temporal differencing algorithm is applied to locate the moving objects in video frames. Through a bitwise-AND operation and dynamic morphological processing applied to the difference frames, the contours and bounding boxes of the moving objects can be determined quickly. Compared with existing object detection systems, such as the R-CNN and YOLO series, which generate and evaluate many bounding boxes at multiple locations and scales, the proposed object localization method is less complicated. Furthermore, those object detection systems try to locate and classify all objects appearing in an image, while a virtual fence system is only interested in detecting invading moving objects; the proposed moving-object localization method thus avoids unnecessary processing of irrelevant background objects.

Finally, a CNN is used to classify the bounding-box images into three classes, namely pedestrian, animal, and others. The CNN architectures investigated in this study are VGG-16 and Darknet-19 (the CNN framework used in YOLOv2). Different training modes and dataset preprocessing schemes for the CNN are investigated. For training modes, experiments with VGG-16 demonstrate that training with ImageNet-pretrained parameters and fine-tuning with bounding-box datasets achieves the best performance. For dataset preprocessing, there are four main types, namely Original, Rescaled (isotropically rescaling an image onto a predefined fixed-size black or grey underlay), Matting (painting the background black or grey), and Matting&Rescaled. Experimental results indicate that using Rescaled preprocessing for both the training and testing datasets outperforms the other combinations. VGG-16 with ImageNet-pretrained parameters, fine-tuned on a bounding-box dataset with Rescaled-Grey preprocessing, achieves 96.3% mAP. The integration test of the proposed virtual fence system demonstrates that the best-performing configuration described above achieves more than 95% mAP, and that the average processing time from LiDAR detection to the end of CNN classification is less than 0.2 second. The experimental results show that the proposed system is fast, accurate, stable, and of practical use.
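The preprocessing variants are straightforward to express in code. Below is a minimal sketch of the Rescaled-Grey variant, assuming the 224x224 input of VGG-16 and a mid-grey fill value of 128 (the exact fill value is an assumption):

    import cv2
    import numpy as np

    def rescale_grey(bbox_img, size=224, fill=128):
        """Isotropically resize a bounding-box crop and centre it on a
        fixed-size grey underlay matching the CNN input size."""
        h, w = bbox_img.shape[:2]
        scale = size / max(h, w)  # preserve the aspect ratio
        nh, nw = int(round(h * scale)), int(round(w * scale))
        resized = cv2.resize(bbox_img, (nw, nh), interpolation=cv2.INTER_AREA)

        canvas = np.full((size, size, 3), fill, dtype=np.uint8)
        y0, x0 = (size - nh) // 2, (size - nw) // 2  # centring offsets
        canvas[y0:y0 + nh, x0:x0 + nw] = resized
        return canvas

The best-performing training mode, ImageNet-pretrained parameters fine-tuned for the three intruder classes, can likewise be sketched in a few lines, here using torchvision's VGG-16 as a stand-in; the optimizer choice and learning rate are assumptions rather than the thesis settings:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Start from ImageNet-pretrained VGG-16 and replace the final fully
    # connected layer with a 3-way head (pedestrian / animal / others).
    model = models.vgg16(pretrained=True)
    model.classifier[6] = nn.Linear(4096, 3)

    # Fine-tune on the Rescaled-Grey bounding-box dataset.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    criterion = nn.CrossEntropyLoss()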

Keywords

virtual fence system, convolutional neural network, CNN, LiDAR, PTZ camera, temporal differencing method, morphology, dataset preprocessing
