Exploring U.S.-Skepticism and Propaganda Techniques in YouTube Shorts with Large Language Models

Date

2025

Abstract

This study investigates how effectively a multimodal large language model (LLM) detects U.S.-skepticism in YouTube Shorts, and what role propaganda techniques play in model performance. Here, U.S.-skepticism refers to discourse advocating that Taiwan should keep its distance from the United States. The Shorts analyzed were collected from the YouTube channels of traditional Taiwanese media outlets and political talk shows.

To evaluate the impact of input modality on detection performance, three input conditions were designed: text-only, image-only, and a multimodal combination of both. The text input consists of the transcribed video content and the video title; the image input consists of frames extracted from each Short at one frame per second; the multimodal input combines the two.

The study comprises two experiments. Experiment 1 is a binary classification task that determines whether a Short contains U.S.-skepticism, using a dataset of 64 Shorts (32 labeled as containing U.S.-skepticism and 32 as not). Experiment 2 is a multiclass classification task involving 62 Shorts across 12 categories: 8 U.S.-skepticism subtypes from previous research, 3 subtypes newly proposed in this study, and 1 non-skeptical category. In addition to the predicted category, the model was required to output the reasoning behind its prediction.

To evaluate model performance comprehensively, four evaluation frameworks were proposed. The first and second frameworks assess binary and multiclass classification, respectively, using accuracy, precision, recall, and F1 score. The third framework vectorizes the model's reasoning and the manually annotated U.S.-skepticism snippets and computes their cosine similarity, as a measure of the model's understanding of each subtype; a two-way ANOVA then tests whether the similarity scores are affected by input modality and prediction correctness. The fourth framework uses chi-square tests to examine the association between the frequency of propaganda technique use and model performance.

The results are as follows. (1) The text modality performed best in binary classification, with accuracy and F1 score both above 0.8. (2) In multiclass classification, model performance varied with the input modality. (3) Cosine similarity exceeded 0.8 for most subtypes, and the ANOVA indicated that input modality significantly influenced similarity, with the text modality having the most significant effect. (4) Propaganda technique use was significantly associated with certain subtypes but not with classification performance; the three most frequently used techniques were Appeal to fear, Name calling, and Loaded language.

In summary, this study is the first in Taiwan to explore the application of multimodal LLMs to U.S.-skepticism detection. The results indicate that the model is feasible and promising for binary classification of U.S.-skepticism, but that there is still room for improvement in fine-grained multiclass classification. The study also suggests that, in future U.S.-skepticism detection applications, text input alone can effectively drive an LLM to perform classification, which would help speed up the identification of, and response to, U.S.-skepticism propaganda.
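
For readers unfamiliar with the measures named in the abstract, the following is a minimal Python sketch of the binary classification metrics (accuracy, precision, recall, F1) and of cosine similarity between two vectors. It is illustrative only: the function names, the toy labels, and the toy vectors are assumptions made for this example, and the abstract does not state which libraries or embedding model the study used.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy labels: 1 = contains U.S.-skepticism, 0 = does not (made-up data).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))

    # Toy vectors standing in for embeddings of a model rationale and an
    # annotated snippet; real embeddings would come from an embedding model.
    print(cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.6, 0.15]))
```

In the setting the abstract describes, `y_true` would hold the human labels for the Shorts and `y_pred` the model's predictions under one input modality, so the same function could be run once per modality to compare them.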

Keywords

Large Language Model, U.S.-skepticism, Propaganda technique, YouTube Shorts, Multimodality
