Fake Face Video Detection Based on Frequency-Domain and Temporal Features (基於頻率域和時序性特徵的假人臉影片偵測)

王順達; Wang, Shun-Ta

Files

60847047S-39965.pdf (2.61 MB)

Date

2021

Authors

王順達

Wang, Shun-Ta

Abstract

With the rapid development of deep generative models, fake faces generated by deep learning, so-called DeepFakes, are spreading widely on the Internet. A number of studies show that the human eye is becoming less and less capable of judging the authenticity of DeepFakes, and the task will only get harder: ever more realistic fake videos will be taken as genuine, spreading misinformation and causing social panic. Deep learning models, however, can detect subtle cues. Whether the artifacts are semantic, attribute-level, spectral, or frame-to-frame inconsistencies, they are hard to hide from such models. This is why we investigate DeepFake detection with deep learning.

In recent years, DeepFake detection has received increasing attention. Some researchers use the discrete cosine transform, the Fourier transform, and other methods to convert feature maps into the frequency domain and learn features from the spectrum. Others use attention mechanisms so that the model emphasizes specific local regions. Still others use recurrent neural networks to learn inconsistencies between frames. However, prior work often overlooks the fact that the goal of a DeepFake detection model is a high level of generalizability. After all, the fake clips people will encounter in the future will not have been seen during training, and DeepFakes will become harder to distinguish as generative technology evolves. How well a model detects such unseen DeepFakes is the real test of the algorithm's generalizability.

Therefore, we design a novel architecture that uses a convolutional neural network to extract spatial-domain features, the discrete cosine transform to extract frequency-domain features, an attention mechanism to emphasize tampered regions, and a GRU module to learn temporal features from the previously extracted features and then classify a video as real or fake. In addition, two loss functions, Focal Loss and Cross-Entropy Loss, are evaluated in pursuit of the best generalizability. Experiments show that our model achieves state-of-the-art generalization results on the Celeb-DF dataset without pre-training, and that it also generalizes well to other datasets.
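
The abstract compares Focal Loss with Cross-Entropy Loss but does not give the exact formulation or hyperparameters used in the thesis. As a rough illustration only, the following Python sketch shows the standard binary form of both losses for a single prediction. The function names, the label convention (1 = fake, 0 = real), and the value `gamma = 2.0` are assumptions made for this example, not details taken from the thesis.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def _prob_of_true_class(p_fake: float, label: int) -> float:
    """Probability the model assigned to the correct class.

    Assumed convention for this sketch: label 1 = fake, label 0 = real.
    """
    p_t = p_fake if label == 1 else 1.0 - p_fake
    return min(max(p_t, EPS), 1.0)


def cross_entropy(p_fake: float, label: int) -> float:
    """Binary cross-entropy for one sample: -log(p_t)."""
    return -math.log(_prob_of_true_class(p_fake, label))


def focal_loss(p_fake: float, label: int, gamma: float = 2.0) -> float:
    """Binary focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    The (1 - p_t)^gamma factor shrinks the loss of well-classified
    samples, so hard samples dominate training. gamma = 2.0 is a
    commonly used value, not one reported in the abstract.
    """
    p_t = _prob_of_true_class(p_fake, label)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    # An easy fake sample (p_t = 0.9) versus a hard one (p_t = 0.3).
    for p in (0.9, 0.3):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With `gamma = 0` the focal loss reduces to cross-entropy, which makes the two losses easy to compare within a single training setup.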

Keywords

Deep learning, Face Detection, Image Synthesis, Deepfake Forensics, Discrete Cosine Transform

URI

https://etds.lib.ntnu.edu.tw/thesis/detail/132d8e7c40a4408f0212971e5b504348/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117326

Collections

Theses and dissertations (學位論文)
