|Title:||A Chinese word segmentation and POS tagging system for readability research|
Chang, T. H.
Sung, Y. T.
Lee, Y. T.
|Abstract:||In recent years, readability research has relied on natural language processing techniques to analyze documents. However, Chinese sentences consist of characters with no blanks between words. A mistake in word segmentation and/or part-of-speech (POS) tagging of Chinese sentences will therefore cause many errors in the follow-up analysis. The conditional random field (CRF) model has recently been the most popular and successful method for Chinese word segmentation. However, because of problems such as reiterative locution, unknown words, and incomplete sentences, many readings for children cannot be processed accurately by the CRF model. This study aims to develop a Chinese word segmentation and POS tagging system called WeCan. The system is composed of a bigram model, the SPLR algorithm, unknown-word extraction, and rule bases. WeCan has been applied to the preprocessing procedure of CRIE. In preliminary experiments, it also worked well on elementary school textbooks in Taiwan.|
|Appears in Collections:||教師著作 (Faculty Publications)|