Virtual Audience Cameraman System
Date
2014
Abstract
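The aim of this study is to build a virtual audience cameraman system that emulates a professional camera operator and takes the audience as its subject. Much information today is disseminated through lectures, and hiring a professional video crew to record an entire lecture is the most direct way to let viewers watch the content at any time. However, labor costs keep rising, and hiring a professional video crew is not cheap. This study therefore develops a virtual audience cameraman system that reduces labor costs while applying professional camera techniques to produce high-quality video.
The experimental setup consists of a pair of Pan Tilt Zoom (PTZ) cameras: one is called the global-view camera and the other the local-view camera. The global-view camera stands in for the camera operator's eyes; it monitors the scene, detects subjects, and finds the region of interest (ROI) in the frame. The local-view camera stands in for the camera in the operator's hands; once the system has determined the ROI and all the information needed for the camera movement, the local-view camera carries out the movement and records.
The main purpose of the system is to imitate a professional camera operator's shooting techniques and to carry out camera movement and recording automatically. To match those techniques, before each camera movement the system must first decide elements such as the type of camera movement, the shot size, and the subject. First, the system extracts motion features that describe audience behavior from the consecutive frames supplied by the global-view camera, then processes these features to find candidate ROIs in the frame. The candidate ROIs are fed into the STA (spatio-temporal attention) neural model, which records and provides information that helps the system find the ROI most suitable for shooting. Next, the system computes the relationship between the ROI to be shot and the position of the lens center and, based on the input data, outputs the camera movement and shot size best suited to the situation, which it uses to drive the local-view camera. The choice of subject for the local-view camera and the quality of the resulting picture are judged mainly by analysis of aesthetic and optical features. Through this process, the study emulates professional shooting techniques.
Experimental results show that the methods used allow the system to move the camera smoothly in real time and to emulate a professional camera operator's shooting style accurately, meeting the needs of lecture recording that would otherwise require a professional video crew.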
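The motion-feature and candidate-ROI step described in the abstract can be illustrated with a minimal sketch. It assumes plain frame differencing over a coarse grid as the motion feature, which the abstract does not specify; the function name `candidate_rois`, the cell size, and the threshold are hypothetical, and the STA neural model itself is not reproduced here.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of 0-255 intensities


def candidate_rois(prev: Frame, curr: Frame, cell: int = 2,
                   threshold: float = 20.0) -> List[Tuple[int, int, float]]:
    """Return (grid_row, grid_col, score) for every grid cell whose mean
    absolute frame difference exceeds `threshold`, strongest first.

    Assumption: frame differencing stands in for the thesis's motion
    features, whose exact definition is not given in the abstract.
    """
    height, width = len(curr), len(curr[0])
    cells = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            total, count = 0, 0
            for y in range(top, min(top + cell, height)):
                for x in range(left, min(left + cell, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            score = total / count
            if score > threshold:
                cells.append((top // cell, left // cell, score))
    # Sort by motion score so the most active region comes first.
    return sorted(cells, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    prev = [[0] * 4 for _ in range(4)]
    curr = [[0] * 4 for _ in range(4)]
    # Simulate movement confined to the lower-right 2x2 block.
    for y in (2, 3):
        for x in (2, 3):
            curr[y][x] = 100
    print(candidate_rois(prev, curr))  # [(1, 1, 100.0)]
```

In a pipeline like the one described, the ranked cells would be the candidates handed to the attention model, which would then pick the ROI to shoot.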
This thesis proposes a virtual audience cameraman system that captures video of the audience automatically. Lecture content can now be distributed widely and quickly as digital video, so recording important lectures for viewers is an essential task. However, hiring a video-recording team that includes professional photographers to capture good-quality video is expensive. This study therefore develops a virtual audience cameraman system that obtains good-quality video automatically and reduces the cost of hiring such a team. Two PTZ cameras are mounted together as a set: one is the global-view camera and the other is the local-view camera. The global-view camera acts as the photographer's eyes; it monitors the whole audience and supports region of interest (ROI) detection. The local-view camera acts as the camera in the photographer's hands; it captures video of the ROI once the system has determined the ROI's location. Because the system is meant to simulate how professional photographers control a camera when filming an audience, it must decide the camera steering mode, the shot class, and the subjects before steering the camera. First, the system takes input video from the global-view camera and detects audience motion features to locate ROI candidates. The ROI candidates are then fed into the spatiotemporal attention (STA) neural model, which records and provides information that helps the system identify the most suitable ROI to shoot. Next, the system computes the relative distance between the ROI's location in the frame and the center of the camera lens, and outputs the appropriate steering mode for the local-view camera. The local-view camera then captures the output video at the ROI's location, taking into account aesthetic considerations and the results of an optical-characteristics analysis. Through this process the system simulates professional shooting skills. The experimental results show that the proposed method steers the camera automatically, smoothly, and in real time, and that it simulates the style of professional photographers accurately.
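The step that relates the ROI's location to the lens center can be sketched as below. This is an illustration under stated assumptions, not the thesis's method: it assumes a pinhole camera model, assumes the two cameras are co-located and aligned so that an offset in the global-view frame maps directly to a pan/tilt angle, and uses hypothetical field-of-view values. The function names, the `hold_radius` threshold, and the mode labels `"hold"`, `"pan"`, and `"tilt"` are likewise hypothetical; the abstract does not list the actual steering modes.

```python
import math
from typing import Tuple


def roi_offset(roi_center: Tuple[float, float],
               frame_size: Tuple[int, int]) -> Tuple[float, float]:
    """Offset of the ROI center from the frame center, normalized to
    [-1, 1] per axis (positive x is right, positive y is down)."""
    (cx, cy), (width, height) = roi_center, frame_size
    return ((cx - width / 2) / (width / 2),
            (cy - height / 2) / (height / 2))


def pan_tilt_degrees(offset: Tuple[float, float],
                     hfov: float = 60.0,
                     vfov: float = 34.0) -> Tuple[float, float]:
    """Pinhole-model angles that would bring the ROI to the frame center.

    Assumption: hfov/vfov are illustrative fields of view in degrees.
    Positive pan is to the right; positive tilt is upward.
    """
    dx, dy = offset
    pan = math.degrees(math.atan(dx * math.tan(math.radians(hfov / 2))))
    tilt = -math.degrees(math.atan(dy * math.tan(math.radians(vfov / 2))))
    return pan, tilt


def steering_mode(offset: Tuple[float, float],
                  hold_radius: float = 0.1) -> str:
    """Hypothetical rule: hold when the ROI is already near the center,
    otherwise move along the axis with the larger offset."""
    dx, dy = offset
    if max(abs(dx), abs(dy)) <= hold_radius:
        return "hold"
    return "pan" if abs(dx) >= abs(dy) else "tilt"


if __name__ == "__main__":
    offset = roi_offset((480, 180), (640, 360))
    print(offset, pan_tilt_degrees(offset), steering_mode(offset))
```

A real deployment would replace the co-location assumption with a calibration between the global-view and local-view cameras, and would choose among steering modes using more than the offset alone, as the abstract indicates.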
Description
Keywords
Virtual Audience Cameraman System, STA (Spatiotemporal Attention) neural model, Camera Steering, Professional Photography