Cross-Language Information Retrieval (跨語言資訊檢索)
Date
2002-04-??
Authors
陳信希
Hsin-Hsi Chen
Publisher
國立台灣師範大學圖書資訊研究所
Graduate Institute of Library and Information Studies, National Taiwan Normal University
Abstract
Multilinguality is one of the major characteristics of the network era. The trend toward information globalization brings new challenges for information management: on the one hand, valuable web resources often need to be shared with users of different languages; on the other hand, users need to make use of knowledge presented in foreign languages. Cross-language information retrieval (CLIR), which lets users query in one language to retrieve documents written in another, is at the core of multilingual information management and has been an active research topic in recent years. This paper introduces the theories and technologies related to CLIR. It first presents three basic approaches: query translation, document translation, and no translation. It then discusses more advanced issues, namely translation ambiguity, target polysemy, and proper name transliteration. Because evaluation is indispensable for technical progress, the paper closes by introducing three major evaluation campaigns for cross-language information retrieval: TREC, CLEF, and NTCIR.