Lee, Chung-Mou (李忠謀); Chien, Yu-Shuan (簡郁璇)
Dates: 2022-06-08; 2026-08-22; 2022-06-08; 2021
https://etds.lib.ntnu.edu.tw/thesis/detail/8e0c62a609e4dffe0f272f18a76b87bf/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117313

Abstract: Technology is advancing rapidly, and multimedia streaming video platforms have become too numerous to count. All kinds of videos are uploaded to these platforms as streaming services, and users routinely watch their favorite TV dramas and series on electronic devices. Using a video for language learning requires automatically picking out every scene and every line of dialogue that belongs to a given protagonist, so that this material can be extracted for practice. This study therefore performs cluster analysis of the people who appear in a video.

Traditional methods for identifying people in videos require the protagonists' face images as input in advance. Facial features are extracted from these images to build a face database, and each detected face is then compared against that database to obtain a match. Without initial images of the protagonists, however, the people in the video cannot be identified. This study therefore investigates face clustering without supervised training, grouping the faces in a video into separate clusters.
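The abstract names Chinese Whispers as the clustering step. The following is a minimal sketch of that graph-based algorithm, assuming each detected face has already been turned into an embedding vector (for example by FaceNet). The distance threshold, the similarity weight 1 / (1 + d), the iteration cap, and the function names are illustrative assumptions, not the thesis's settings; toy 2-D points stand in for real embeddings.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chinese_whispers(embeddings, threshold, iterations=20, seed=0):
    """Cluster vectors with Chinese Whispers; returns one label per vector.

    Each embedding is a graph node. Two nodes are joined by an edge when
    their Euclidean distance is below `threshold` (assumed value), and the
    edge is weighted by 1 / (1 + d) (assumed weighting).
    """
    n = len(embeddings)
    neighbors = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(embeddings[i], embeddings[j])
            if d < threshold:
                w = 1.0 / (1.0 + d)
                neighbors[i].append((j, w))
                neighbors[j].append((i, w))

    labels = list(range(n))  # every node starts in its own class
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in random order each pass
        changed = False
        for i in order:
            if not neighbors[i]:
                continue  # isolated face: stays a singleton cluster
            # Sum edge weights per neighbouring label; adopt the heaviest one.
            scores = defaultdict(float)
            for j, w in neighbors[i]:
                scores[labels[j]] += w
            best = max(scores, key=scores.get)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # a full pass with no change: labels have converged

    # Renumber labels 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(lab, len(remap)) for lab in labels]


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
print(chinese_whispers(points, threshold=1.0))
```

Unlike K-means, Chinese Whispers does not need the number of clusters in advance, which suits a video whose cast size is unknown. The price is sensitivity to the edge threshold, which would have to be tuned for the embedding model actually used.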
The centroid of each cluster is then taken as that cluster's highest-quality image, and these centroids serve as the face database for face retrieval, so that the protagonists' facial features in the video can be compared against it. The proposed approach combines three techniques: FaceNet, Chinese Whispers clustering, and Annoy. In experiments on four TV series covering different scenes and settings, face detection accuracy reaches 95.3% for videos with at most five people, 87.9% for at most ten, and 82.7% for at most fifteen. Because faces change over time, the method was also tested with a protagonist face database built from first-year footage on videos from four years later, and accuracy still reached 81.8%. On the public LFW dataset, the proposed clustering method scores higher than K-means and DBSCAN, indicating close agreement between the resulting clusters and the true identities.

Keywords: face detection; face clustering; face retrieval
Title: 基於聚類之影片人數計數分析 (Clustering-based People Counting Analysis)
Type: Academic thesis
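The abstract describes picking a centroid image per cluster and then using those centroids as a face database for retrieval. The sketch below shows one way to do this. It assumes that the representative is the member whose embedding lies closest to the cluster mean, and that a face farther than a hypothetical `max_distance` from every representative counts as unknown. The thesis uses Annoy for the lookup. Annoy's `AnnoyIndex` class offers `add_item`, `build`, and `get_nns_by_vector` for approximate search, but exact brute-force search is substituted here to keep the example dependency-free. All function names are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_representatives(embeddings, labels):
    """Map each cluster label to the index of its member closest to the mean.

    Choosing the member nearest the mean is an assumed reading of
    "cluster centre as highest-quality image".
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    reps = {}
    for lab, idxs in members.items():
        dim = len(embeddings[idxs[0]])
        mean = [sum(embeddings[i][k] for i in idxs) / len(idxs) for k in range(dim)]
        reps[lab] = min(idxs, key=lambda i: euclidean(embeddings[i], mean))
    return reps


def build_gallery(embeddings, labels):
    """Face database: cluster label -> representative embedding."""
    reps = cluster_representatives(embeddings, labels)
    return {lab: embeddings[i] for lab, i in reps.items()}


def identify(query, gallery, max_distance):
    """Exact nearest-neighbour lookup (stand-in for Annoy's approximate search).

    Returns the best-matching cluster label, or None when even the closest
    representative is farther than `max_distance` (hypothetical threshold).
    """
    best_label, best_dist = None, math.inf
    for lab, vec in gallery.items():
        d = euclidean(query, vec)
        if d < best_dist:
            best_label, best_dist = lab, d
    return best_label if best_dist <= max_distance else None


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels = [0, 0, 0, 1, 1, 1]
gallery = build_gallery(points, labels)
print(identify((0.05, 0.02), gallery, max_distance=1.0))
```

Exact search costs one distance computation per cluster, which is negligible for a cast of a few dozen people. An approximate index such as Annoy becomes worthwhile when the gallery, or the number of per-frame queries, grows large.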