A Study on Enhancing the Accuracy of Chinese Speech-to-Text in Instructional Videos Using Large Language Models

dc.contributor: 周遵儒 [zh_TW]
dc.contributor: Chou, Tzren-Ru [en_US]
dc.contributor.author: 楊之昌 [zh_TW]
dc.contributor.author: Yang, Chih-Chang [en_US]
dc.date.accessioned: 2025-12-09T08:09:16Z
dc.date.available: 2025-07-15
dc.date.issued: 2025
dc.description.abstract: 隨著語音識別技術的迅速發展,中文語音轉文字(STT)系統對於字幕的製作,扮演著重要的角色,並經常應用於教學影片上。然而,由於中文的複雜性及同音字詞眾多,現有的STT系統在精準度方面仍存在明顯的提升空間。本研究針對提升中文STT精準度,提出了語言模型輔助編輯與微調語言模型輔助文本編輯等兩種基於大型語言模型(LLM)的優化方法,並透過製作多種領域課程的教學影片字幕,以萊文斯坦動態規劃來計算兩個字串之間的最短編輯距離進行驗證。研究結果顯示,使用語言模型輔助編輯不僅能提升精準度,微調語言模型輔助文本編輯的文字精準度更進一步得到提升,其能針對特定語言的特性產生微調策略,使其更有效地辨識出語言的細微差異,進一步提升中文語音轉文字系統的準確性。 [zh_TW]
dc.description.abstract: With the rapid evolution of speech-recognition technology, Chinese speech-to-text (STT) systems have come to play a critical role in subtitle production and are now routinely employed in instructional videos. Yet, because of the language’s inherent complexity and the prevalence of homophones, the accuracy of current STT systems still leaves ample room for improvement. To close this gap, the present study proposes two optimisation strategies grounded in large language models (LLMs): LLM-assisted post-editing and fine-tuned-LLM-assisted post-editing. Their effectiveness is evaluated by generating subtitles for courses spanning multiple disciplines and computing the minimum edit distance between reference and candidate strings through a dynamic-programming implementation of the Levenshtein algorithm. The results demonstrate that LLM-assisted post-editing enhances transcription accuracy, and that fine-tuned-LLM-assisted post-editing delivers an additional performance gain. Fine-tuning equips the model with language-specific adaptation strategies, enabling it to capture subtle linguistic distinctions more effectively and, ultimately, to further improve the accuracy of Chinese STT systems. [en_US]
dc.description.sponsorship: 圖文傳播學系碩士在職專班 [zh_TW]
dc.identifier: 012723109-47469
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/d352c9909d915dea4d0d03fd916b8775/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125370
dc.language: Chinese
dc.subject: 語音轉文字 [zh_TW]
dc.subject: 大型語言模型 [zh_TW]
dc.subject: 教學影片 [zh_TW]
dc.subject: 微調語言模型 [zh_TW]
dc.subject: 萊文斯坦距離 [zh_TW]
dc.subject: Speech-to-Text (STT) [en_US]
dc.subject: Large Language Models (LLM) [en_US]
dc.subject: Instructional Videos [en_US]
dc.subject: Fine-Tuned Language Models [en_US]
dc.subject: Levenshtein Distance [en_US]
dc.title: 基於大型語言模型的教學影片中文語音轉文字精準度提升方法之研究 [zh_TW]
dc.title: A Study on Enhancing the Accuracy of Chinese Speech-to-Text in Instructional Videos Using Large Language Models [en_US]
dc.type: Academic thesis
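
Note on the evaluation metric: the abstract states that transcripts are scored by the minimum edit distance between a reference string and a candidate string, computed with a dynamic-programming Levenshtein algorithm. The following is a minimal sketch of that general technique, not the thesis's own code. The function names and the `char_accuracy` formula (1 minus distance divided by reference length) are assumptions made for illustration; this record does not say how the distance is converted into an accuracy figure.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    # prev[j] = distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Hypothetical accuracy score (an assumption, not taken from the thesis):
    1 - distance / len(ref), floored at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))


# Two homophone-style character errors in a five-character reference.
print(levenshtein("語音轉文字", "語音轉蚊子"))  # 2
```

Python strings are sequences of Unicode code points, so each Chinese character counts as one edit unit here, which fits the character-level errors (such as homophone substitutions) that the abstract describes.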

Files

Original bundle

Name: 202500047469-109783.pdf
Size: 3 MB
Format: Adobe Portable Document Format
Description: Academic thesis
