Integration of Generative Artificial Intelligence Models: Multilingual Accurate Textual Image Customization

吳巧心; Wu, Chiao-Hsin


dc.contributor	賴以威	zh_TW
dc.contributor	伊藤一秀	zh_TW
dc.contributor	Lai, I-Wei	en_US
dc.contributor	Kazuhide ITO	en_US
dc.contributor.author	吳巧心	zh_TW
dc.contributor.author	Wu, Chiao-Hsin	en_US
dc.date.accessioned	2025-12-09T08:03:02Z
dc.date.available	2030-02-04
dc.date.issued	2025
dc.description.abstract	生成式人工智慧已迅速發展，能夠自動創建內容，對創意、交流及生產力產生深遠影響。然而，生成式人工智慧仍面臨諸多挑戰，尤其是在實現準確且可靠的文字圖像方面。當前的研究往往難以處理非拉丁字母的文字，並無法有效提高文字準確性，限制了生成包含文字的圖片的應用潛力。為解決這些限制，本研究提出了多語言文本圖像客製化系統（Multilingual Accurate Textual Image Customization, MATIC），該系統結合了多模態大型語言模型與擴散模型並使用思維鏈的推理方法，將複雜任務分解為易於處理的子任務以提升多語言文字元素的準確性。實驗結果表明MATIC 在多種語言的文字準確率超過 95%，表現優於現有模型，且無論文本長度如何均能保持穩定的準確性。此外，此系統包含了增強圖像分析精確度的網格系統，能夠輔助學術研究中的圖像分析。綜合而言，這些創新使 MATIC 成為一種變革性工具，應用範疇涵蓋從學術研究到跨語言交流。	zh_TW
dc.description.abstract	Generative Artificial Intelligence (GAI) has rapidly advanced, enabling autonomous content creation that impacts creativity, communication, and productivity. However, notable challenges remain, particularly in achieving accurate textual images. Current research often struggles with non-Latin scripts and fails to effectively improve text accuracy, leaving a gap in GAI's applicability across diverse visual contexts. To address these limitations, this study introduces the Multilingual Accurate Textual Image Customization (MATIC) system, which integrates Multimodal Large Language Models with diffusion models. Utilizing a Chain-of-Thought reasoning approach, MATIC decomposes complex tasks into manageable sub-tasks, enhancing the accuracy of multilingual textual elements. Experimental results demonstrate that MATIC achieved over 95% text accuracy across multiple languages, outperforming existing models and maintaining consistent accuracy regardless of text length. Additionally, the system incorporates a grid system that enhances the precision of image analysis, offering valuable support for visual content in academic research. Together, these innovations position MATIC as a transformative tool, with broad applications ranging from advanced research to cross-linguistic communication.	en_US
dc.description.sponsorship	Department of Electrical Engineering	en_US
dc.identifier	61175021H-46583
dc.identifier.uri	https://etds.lib.ntnu.edu.tw/thesis/detail/2ef807901482f0fd6e61f5625f5dfbf2/
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125038
dc.language	English
dc.subject	多模態大型語言模型	zh_TW
dc.subject	擴散模型	zh_TW
dc.subject	思維鏈	zh_TW
dc.subject	文本生成圖像	zh_TW
dc.subject	Multimodal Large Language Model	en_US
dc.subject	Stable Diffusion	en_US
dc.subject	Chain-of-Thought	en_US
dc.subject	Text-to-Image	en_US
dc.title	整合生成式人工智慧於多語言文本圖像客製化	zh_TW
dc.title	Integration of Generative Artificial Intelligence Models: Multilingual Accurate Textual Image Customization	en_US
dc.type	Academic thesis

Collections

Theses and Dissertations
