Integrating Generative Artificial Intelligence for Multilingual Textual Image Customization
Date
2025
Abstract
Generative Artificial Intelligence (GAI) has advanced rapidly, enabling automated content creation that affects creativity, communication, and productivity. Notable challenges remain, however, particularly in rendering text accurately and reliably within generated images. Current research often struggles with non-Latin scripts and fails to improve text accuracy effectively, which limits the practical use of generated images that contain text. To address these limitations, this study introduces the Multilingual Accurate Textual Image Customization (MATIC) system, which integrates Multimodal Large Language Models with diffusion models. Using a Chain-of-Thought reasoning approach, MATIC decomposes complex tasks into manageable sub-tasks, improving the accuracy of multilingual textual elements. Experimental results show that MATIC achieves over 95% text accuracy across multiple languages, outperforming existing models and maintaining consistent accuracy regardless of text length. The system also incorporates a grid system that improves the precision of image analysis and can support image analysis in academic research. Together, these innovations position MATIC as a transformative tool, with applications ranging from academic research to cross-linguistic communication.
Keywords
Multimodal Large Language Model, Diffusion Model, Stable Diffusion, Chain-of-Thought, Text-to-Image