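Integration of Generative Artificial Intelligence Models: Multilingual Accurate Textual Image Customization (整合生成式人工智慧於多語言文本圖像客製化)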

dc.contributor 賴以威 zh_TW
dc.contributor 伊藤一秀 zh_TW
dc.contributor Lai, I-Wei en_US
dc.contributor Kazuhide ITO en_US
dc.contributor.author 吳巧心 zh_TW
dc.contributor.author Wu, Chiao-Hsin en_US
dc.date.accessioned 2025-12-09T08:03:02Z
dc.date.available 2030-02-04
dc.date.issued 2025
dc.description.abstract 生成式人工智慧已迅速發展,能夠自動創建內容,對創意、交流及生產力產生深遠影響。然而,生成式人工智慧仍面臨諸多挑戰,尤其是在實現準確且可靠的文字圖像方面。當前的研究往往難以處理非拉丁字母的文字,並無法有效提高文字準確性,限制了生成包含文字的圖片的應用潛力。為解決這些限制,本研究提出了多語言文本圖像客製化系統(Multilingual Accurate Textual Image Customization, MATIC),該系統結合了多模態大型語言模型與擴散模型並使用思維鏈的推理方法,將複雜任務分解為易於處理的子任務以提升多語言文字元素的準確性。實驗結果表明 MATIC 在多種語言的文字準確率超過 95%,表現優於現有模型,且無論文本長度如何均能保持穩定的準確性。此外,此系統包含了增強圖像分析精確度的網格系統,能夠輔助學術研究中的圖像分析。綜合而言,這些創新使 MATIC 成為一種變革性工具,應用範疇涵蓋從學術研究到跨語言交流。 zh_TW
dc.description.abstract Generative Artificial Intelligence (GAI) has rapidly advanced, enabling autonomous content creation that impacts creativity, communication, and productivity. However, notable challenges remain, particularly in achieving accurate textual images. Current research often struggles with non-Latin scripts and fails to effectively improve text accuracy, leaving a gap in GAI’s applicability across diverse visual contexts. To address these limitations, this study introduces the Multilingual Accurate Textual Image Customization (MATIC) system, which integrates Multimodal Large Language Models with diffusion models. Utilizing a Chain-of-Thought reasoning approach, MATIC decomposes complex tasks into manageable sub-tasks, enhancing the accuracy of multilingual textual elements. Experimental results demonstrate that MATIC achieves over 95% text accuracy across multiple languages, outperforming existing models and maintaining consistent accuracy regardless of text length. Additionally, the system incorporates a grid system that enhances the precision of image analysis, offering valuable support for visual content in academic research. Together, these innovations position MATIC as a transformative tool, with broad applications ranging from advanced research to cross-linguistic communication. en_US
dc.description.sponsorship 電機工程學系 (Department of Electrical Engineering) zh_TW
dc.identifier 61175021H-46583
dc.identifier.uri https://etds.lib.ntnu.edu.tw/thesis/detail/2ef807901482f0fd6e61f5625f5dfbf2/
dc.identifier.uri http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125038
dc.language English
dc.subject 多模態大型語言模型 zh_TW
dc.subject 擴散模型 zh_TW
dc.subject 思維鏈 zh_TW
dc.subject 文本生成圖像 zh_TW
dc.subject Multimodal Large Language Model en_US
dc.subject Stable Diffusion en_US
dc.subject Chain-of-Thought en_US
dc.subject Text-to-Image en_US
dc.title 整合生成式人工智慧於多語言文本圖像客製化 zh_TW
dc.title Integration of Generative Artificial Intelligence Models: Multilingual Accurate Textual Image Customization en_US
dc.type Academic thesis
