Learning Strategies for Extendable Relation Extraction from Limited Labeled Data (於有限標示資料下可擴展關係擷取之學習策略)

Date

2020

Abstract

This thesis studies how to perform relation extraction on Chinese text when only limited labeled data is available, using the content of junior high school biology textbooks as the corpus. The problem is divided into two tasks: 1) triple detection and 2) clustering and classification of semantic relations.

For triple detection, we combine a relation-term tagging model with a sentence-type classification model. These are trained with transfer learning, fine-tuning by sentence type (an ensemble of classifiers for different types of sentences), and conditional random field (CRF) prediction, and together they output the triples that a sentence may contain. For the extracted triples, we propose a two-phase clustering algorithm that discovers groups of triples whose relation terms are semantically similar. Each discovered group is then assigned a relation type by a modified KNN algorithm that uses a small set of labeled data. This semi-supervised learning strategy is what makes the relation extraction extendable.

Experiments show that adding these components and training strategies to a BERT model improves its tagging prediction over the original model. The proposed two-phase clustering algorithm also yields groups of triples with higher relation-type purity than traditional clustering algorithms. Finally, combining the methods for the two tasks, and aided by source-domain data that carries general relation-term tags, the proposed approach needs only a very small number of target-domain triples labeled with relation types to reach an accuracy of about 66%. This is higher than the accuracy of a supervised relation extraction approach that requires a much larger set of labeled triples.
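The two quantities the abstract relies on, relation-type purity of a group of triples and assignment of a relation type to a group from a few labeled triples, can be illustrated with a minimal sketch. This is not the thesis's method: the abstract does not describe how its KNN is modified, so the sketch uses a plain majority-vote KNN against a group centroid, and the function names, vectors, and relation type names (`part_of`, `function`) are all hypothetical.

```python
from collections import Counter
import math


def cluster_purity(cluster_ids, relation_types):
    """Fraction of items whose relation type equals the majority type of their cluster."""
    by_cluster = {}
    for cid, rel in zip(cluster_ids, relation_types):
        by_cluster.setdefault(cid, []).append(rel)
    # For each cluster, count how many members carry its most frequent relation type.
    majority_total = sum(
        Counter(rels).most_common(1)[0][1] for rels in by_cluster.values()
    )
    return majority_total / len(cluster_ids)


def assign_cluster_label(centroid, labeled_vectors, labeled_types, k=3):
    """Plain KNN vote (an assumption, not the thesis's modified KNN):
    label a cluster with the most common type among the k labeled
    triples nearest to the cluster centroid (Euclidean distance)."""
    order = sorted(
        range(len(labeled_vectors)),
        key=lambda i: math.dist(centroid, labeled_vectors[i]),
    )
    votes = Counter(labeled_types[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: five triples in two clusters.
clusters = [0, 0, 0, 1, 1]
types = ["part_of", "part_of", "function", "function", "function"]
purity = cluster_purity(clusters, types)

# Hypothetical 2-D embeddings of four labeled triples.
labeled_vecs = [(0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (0.0, 0.3)]
labeled_types = ["part_of", "part_of", "function", "function"]
label = assign_cluster_label((0.0, 0.0), labeled_vecs, labeled_types, k=3)
```

In this toy data, four of the five triples match the majority type of their cluster, and the cluster centred at the origin takes the type held by two of its three nearest labeled triples.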

Keywords

relation extraction, natural language processing, transfer learning
