Automatic Labeling of Hypernymy-Troponymy Relations among Chinese Verbs

Date

2009

Abstract

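In recent years, wordnets have become one of the most widely used resources in computational linguistics and related fields, and they have contributed substantially to the development of Information Retrieval and Natural Language Processing. A wordnet is built from synsets and lexical semantic relations; well-developed examples include the English-based Princeton WordNet and EuroWordNet, which covers several European languages. Building a wordnet, however, is beyond the effort of a single person or a short period: it demands considerable manpower and time. How to construct a wordnet efficiently and systematically has therefore been a major research goal in recent years. Because lexical semantic relations are the main components of a wordnet, automatically extracting them is an important step in wordnet construction. The Institute of Linguistics, Academia Sinica, has built Chinese WordNet (CWN), centered on middle-frequency words, to provide complete sense distinctions for Chinese words. In the current CWN system, however, the semantic relations between synsets are labeled manually, and the number of labels has not yet reached a scale sufficient for practical applications. This study therefore proposes a semi-automatic method for labeling semantic relations between words. Focusing on the hypernymy-troponymy relation between verbs, it proposes an automatic labeling method and extracts Chinese verb pairs that stand in this relation. The thesis proposes two parallel methods. First, lexical syntactic patterns are used to automatically extract verb pairs in CWN that stand in a hypernymy-troponymy relation. Second, a bootstrapping method uses the Chinese-English bilingual wordnet built by Academia Sinica (Sinica BOW) to map the semantic relations of Princeton WordNet onto Chinese at scale. Experimental results show that the system can rapidly extract large numbers of Chinese verb pairs in a hypernymy-troponymy relation. The author hopes to apply these methods to automatic semantic relation labeling in the developing Chinese WordNet and to automatic ontology construction, and thereby to build comprehensive Chinese lexical knowledge resources efficiently.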
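The first method, extraction through lexical syntactic patterns, can be illustrated with a minimal sketch. It assumes unsegmented Chinese sentences and a list of known verbs, and it fills two hypothetical patterns, "X是一種Y" ("X is a kind of Y") and "Y，例如X" ("Y, for example X"). These patterns, the function name `extract_pairs`, and the sample sentences are illustrative assumptions, not the patterns or data used in the thesis.

```python
import re
from collections import Counter

# Hypothetical lexical-syntactic patterns (not the thesis's actual patterns).
# "{v}" is a slot for any known verb; named groups mark the two roles.
PATTERNS = [
    r"(?P<hypo>{v})是一種(?P<hyper>{v})",   # "X is a kind of Y"
    r"(?P<hyper>{v})，例如(?P<hypo>{v})",   # "Y, for example X"
]

def extract_pairs(sentences, verbs):
    """Count (troponym, hypernym) candidate pairs matched by PATTERNS."""
    # Longest verbs first, so a two-character verb wins over a
    # one-character verb starting at the same position.
    alternatives = sorted((re.escape(v) for v in verbs), key=len, reverse=True)
    slot = "(?:" + "|".join(alternatives) + ")"
    regexes = [re.compile(p.replace("{v}", slot)) for p in PATTERNS]

    counts = Counter()
    for sentence in sentences:
        for rx in regexes:
            for m in rx.finditer(sentence):
                hypo, hyper = m.group("hypo"), m.group("hyper")
                if hypo != hyper:  # skip degenerate self-pairs
                    counts[(hypo, hyper)] += 1
    return counts

sentences = ["散步是一種走路的方式。", "各種移動，例如走路。"]
verbs = ["散步", "走路", "移動", "走"]
print(extract_pairs(sentences, verbs))
```

Matching raw strings against a verb list is a simplification: without word segmentation, a verb can be matched inside a longer word, so a real system would likely segment the text first and filter candidates by frequency.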
WordNet-like databases have become crucial resources for lexical semantic studies and for computational linguistic applications such as Information Retrieval (IR) and Natural Language Processing (NLP). The fundamental elements of a WordNet are synsets (groupings of synonymous words) and the semantic relations among them. Creating such a lexical network, however, is time-consuming and labor-intensive, and it is especially difficult for languages with few resources, such as Chinese. Chinese WordNet (CWN), composed of middle-frequency words, was launched by Academia Sinica following a paradigm similar to that of Princeton WordNet. In CWN, the synset to which each word sense belongs is labeled manually, but the lexical semantic relations among synsets are only partially constructed and lack systematic labeling. This thesis therefore proposes two independent approaches to automatically harvesting lexical semantic relations, focusing on the hypernymy-troponymy relation among verbs. The syntactic pattern-based approach rests on the observation that sentence structures can denote relations between lexical entries. The bootstrapping approach, by contrast, exploits existing databases, mapping relations from Princeton WordNet onto Chinese through Sinica BOW, and combines them within a common, standard framework. Given large-scale input data, the proposed approaches can rapidly extract large numbers of Chinese verb pairs in a hypernymy-troponymy relation, supporting more efficient construction of lexical databases. It is also hoped that these approaches will shed light on the automatic acquisition of other Chinese lexical semantic relations and on ontology learning.
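The mapping step behind the second approach can be sketched as relation projection: every English troponym-hypernym edge is carried over to each Chinese verb pair whose members are linked to the two English senses. The sketch below uses small toy tables in place of Princeton WordNet and Sinica BOW; the table contents, the name `project_relations`, and the lemma-style sense keys are assumptions for illustration. It shows a single projection pass only; any iteration or filtering in the thesis's bootstrapping procedure is not reproduced.

```python
from collections import defaultdict

# Toy English (troponym, hypernym) verb edges; illustrative data,
# not extracted from Princeton WordNet.
EN_EDGES = [
    ("stroll", "walk"),
    ("walk", "travel"),
]

# Toy Chinese-to-English sense links standing in for a bilingual
# resource such as Sinica BOW (its real format is not assumed here).
ZH_TO_EN = {
    "散步": ["stroll"],
    "走路": ["walk"],
    "步行": ["walk"],
    "移動": ["travel"],
}

def project_relations(en_edges, zh_to_en):
    """Project English troponym->hypernym edges onto Chinese verb pairs."""
    # Invert the bilingual links: English sense -> set of Chinese verbs.
    en_to_zh = defaultdict(set)
    for zh, senses in zh_to_en.items():
        for en in senses:
            en_to_zh[en].add(zh)

    pairs = set()
    for tropo, hyper in en_edges:
        for zh_tropo in en_to_zh[tropo]:
            for zh_hyper in en_to_zh[hyper]:
                if zh_tropo != zh_hyper:
                    pairs.add((zh_tropo, zh_hyper))
    return pairs

print(sorted(project_relations(EN_EDGES, ZH_TO_EN)))
```

Because one English sense may link to several Chinese verbs (here 走路 and 步行 both link to "walk"), a single English edge can yield several Chinese pairs; this is what lets the projection label relations in bulk, and also why a real system would need to check sense-link quality.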

Keywords

Automatic semantic relation labeling, Verb lexical semantics, Verb hypernymy-troponymy relation, Chinese WordNet, Automatic extraction, Lexical semantic relation, Troponymy