A Study on Enhancing the Accuracy of Chinese Speech-to-Text in Instructional Videos Using Large Language Models

楊之昌; Yang, Chih-Chang

dc.contributor	周遵儒	zh_TW
dc.contributor	Chou, Tzren-Ru	en_US
dc.contributor.author	楊之昌	zh_TW
dc.contributor.author	Yang, Chih-Chang	en_US
dc.date.accessioned	2025-12-09T08:09:16Z
dc.date.available	2025-07-15
dc.date.issued	2025
dc.description.abstract	隨著語音識別技術的迅速發展，中文語音轉文字（STT）系統對於字幕的製作，扮演著重要的角色，並經常應用於教學影片上。然而，由於中文的複雜性及同音字詞眾多，現有的STT系統在精準度方面仍存在明顯的提升空間。本研究針對提升中文STT精準度，提出了語言模型輔助編輯與微調語言模型輔助文本編輯等兩種基於大型語言模型（LLM）的優化方法，並透過製作多種領域課程的教學影片字幕，以萊文斯坦動態規劃來計算兩個字串之間的最短編輯距離進行驗證。研究結果顯示，使用語言模型輔助編輯不僅能提升精準度，微調語言模型輔助文本編輯的文字精準度更進一步得到提升，其能針對特定語言的特性產生微調策略，使其更有效地辨識出語言的細微差異，進一步提升中文語音轉文字系統的準確性。	zh_TW
dc.description.abstract	With the rapid evolution of speech-recognition technology, Chinese speech-to-text (STT) systems have come to play a critical role in subtitle production and are now routinely employed in instructional videos. Yet, because of the language’s inherent complexity and the prevalence of homophones, the accuracy of current STT systems still leaves ample room for improvement. To close this gap, the present study proposes two optimisation strategies grounded in large language models (LLMs): LLM-assisted post-editing and fine-tuned-LLM-assisted post-editing. Their effectiveness is evaluated by generating subtitles for courses spanning multiple disciplines and computing the minimum edit distance between reference and candidate strings through a dynamic-programming implementation of the Levenshtein algorithm. The results demonstrate that LLM-assisted post-editing enhances transcription accuracy, and that fine-tuned-LLM-assisted post-editing delivers an additional performance gain. Fine-tuning equips the model with language-specific adaptation strategies, enabling it to capture subtle linguistic distinctions more effectively and, ultimately, to further improve the accuracy of Chinese STT systems.	en_US
dc.description.sponsorship	圖文傳播學系碩士在職專班	zh_TW
dc.identifier	012723109-47469
dc.identifier.uri	https://etds.lib.ntnu.edu.tw/thesis/detail/d352c9909d915dea4d0d03fd916b8775/
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125370
dc.language	Chinese
dc.subject	語音轉文字	zh_TW
dc.subject	大型語言模型	zh_TW
dc.subject	教學影片	zh_TW
dc.subject	微調語言模型	zh_TW
dc.subject	萊文斯坦距離	zh_TW
dc.subject	Speech-to-Text (STT)	en_US
dc.subject	Large Language Models (LLM)	en_US
dc.subject	Instructional Videos	en_US
dc.subject	Fine-Tuned Language Models	en_US
dc.subject	Levenshtein Distance	en_US
dc.title	基於大型語言模型的教學影片中文語音轉文字精準度提升方法之研究	zh_TW
dc.title	A Study on Enhancing the Accuracy of Chinese Speech-to-Text in Instructional Videos Using Large Language Models	en_US
dc.type	Academic thesis
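
The abstract describes evaluating subtitles by computing the minimum edit distance between reference and candidate strings with a dynamic-programming Levenshtein algorithm. A minimal sketch of that computation follows. The function names, the sample strings, and the `character_accuracy` formula (one minus edit distance divided by reference length) are illustrative assumptions and are not taken from the thesis.

```python
def levenshtein(reference: str, candidate: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `reference` into `candidate`."""
    # previous[j] holds the distance between the first i-1 characters
    # of reference and the first j characters of candidate.
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # turning i reference characters into an empty string
        for j, cand_char in enumerate(candidate, start=1):
            cost = 0 if ref_char == cand_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert cand_char
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, candidate: str) -> float:
    """Hypothetical accuracy score (an assumption, not the thesis's metric):
    1 - distance / len(reference), clamped at 0."""
    if not reference:
        return 1.0 if not candidate else 0.0
    return max(0.0, 1.0 - levenshtein(reference, candidate) / len(reference))


# Illustrative only: one character replaced by a similar-sounding one.
reference = "語音轉文字"
candidate = "語音專文字"
distance = levenshtein(reference, candidate)
accuracy = character_accuracy(reference, candidate)
```

Operating on Python strings compares Unicode code points, so each Chinese character counts as one unit without any word segmentation.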

Files

Original bundle

Name: 202500047469-109783.pdf
Size: 3 MB
Format: Adobe Portable Document Format
Description: Academic thesis

Collections

Theses and Dissertations