A Corpus-Based Study of Character Usage Tendencies in Chinese Transliteration: The Case of News from Taiwan and Mainland China

Date

2013

Abstract

The primary difference between transliteration in Chinese and transliteration in other languages is that, once a Chinese pronunciation has been chosen to represent a source-language sound, one character must still be selected from among the many homophonous characters with that pronunciation. Previous transliteration studies that discuss the principles of this character-choosing process have mostly been qualitative, relying on reference works such as transliteration dictionaries; few have been quantitative analyses of primary data. This study takes a corpus-based, statistical approach to identify the productive characters in contemporary Chinese transliteration and to describe the norms of character usage in contemporary Chinese transliteration. Using computer programs, the researcher compiles four news corpora from articles collected from four news websites in Taiwan and Mainland China. Transliterations accompanied by their source-language words in parentheses are then extracted from these corpora and annotated, according to the nature of their referents, as "person" (with gender tags where gender can be determined), "place", or "other entity", yielding four transliteration sub-corpora. The researcher examines the distribution of the characters in the transliteration sub-corpora relative to all Chinese characters in the news corpora, and uses log-likelihood-ratio tests to compare character usage under different conditions. The results show that roughly 80% of the characters used in transliteration are common to all categories of transliterations, while the remaining 20% show a significant tendency to be used under particular conditions, indicating that the norms of character usage in Chinese transliteration are internally heterogeneous.
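
The abstract names the log-likelihood-ratio test but does not give its exact formulation. As a minimal sketch, assuming the standard G² statistic over a 2×2 contingency table (one target character vs. all other characters, counted in two conditions such as female vs. male person names), a single character's usage could be compared as below. The function names, the toy word lists, and the 3.84 cutoff (χ², df = 1, p < .05) are illustrative assumptions, not the study's actual code, data, or thresholds.

```python
import math
from collections import Counter

# Chi-square critical value for df = 1 at p < 0.05 (assumed cutoff).
CRITICAL_P05 = 3.84


def log_likelihood_2x2(a: int, b: int, c: int, d: int) -> float:
    """Return G^2 = 2 * sum(O * ln(O / E)) for a 2x2 table of counts.

    Rows: target character vs. all other characters.
    Columns: condition A vs. condition B.
        a = target in A, b = target in B
        c = others in A, d = others in B
    """
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # the term 0 * ln(0) is taken as 0
            expected = row_sums[i] * col_sums[j] / total
            g2 += o * math.log(o / expected)
    return 2.0 * g2


def compare_character(char: str, words_a: list[str], words_b: list[str]) -> float:
    """G^2 for how often `char` occurs in two lists of transliterations."""
    counts_a = Counter("".join(words_a))  # character counts, condition A
    counts_b = Counter("".join(words_b))  # character counts, condition B
    size_a = sum(counts_a.values())
    size_b = sum(counts_b.values())
    a, b = counts_a[char], counts_b[char]
    return log_likelihood_2x2(a, b, size_a - a, size_b - b)


if __name__ == "__main__":
    # Toy data invented for illustration only; not drawn from the study.
    female_names = ["瑪麗", "安娜", "莉莉", "艾麗絲", "娜塔莉"]
    male_names = ["約翰", "彼得", "大衛", "馬克", "湯姆"]
    g2 = compare_character("莉", female_names, male_names)
    print(f"G2 = {g2:.2f}, significant at p < .05: {g2 > CRITICAL_P05}")
```

The full 2×2 form is used here because it tests whether the character's share of all characters is independent of the condition; a study may instead use the two-cell variant common in corpus frequency comparison, which gives somewhat different values.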

Keywords

transliteration, character usage, character choosing, corpus, log-likelihood-ratio test