科普文章自動化分級模型的建置 (Construction of an Automated Leveling Model for Popular Science Articles)

楊旻容; Yang, Min-Jung

Date

2015

Authors

楊旻容

Yang, Min-Jung

Abstract

Reading is the starting point of learning and a primary channel for acquiring knowledge. Popular science magazines are valuable supplementary reading: they help the public understand the basic concepts and applications of science, and they cultivate a scientific habit of thinking about and investigating everyday phenomena. Readers benefit most when the difficulty of a text matches their ability and purpose. Domestic publishers of popular science magazines do indicate a target readership for their products, but the stated range is often broad, either for commercial reasons or because the target readers include children who are still learning to read and are expected to read with help from parents and teachers. As a result, no precise reading-level guidance is available.

This study selected 150 popular science articles, written in or translated into Chinese, from three different magazine versions and used readability assessment to estimate the grade level each article suits. Readability assessment quantifies the difficulty of a text so that students can choose appropriate reading material. Because primary school pupils and students in junior high school and above have different learning needs, the study built a two-stage leveling model from general linguistic features and concept features, the latter drawn from a list of concept words graded by difficulty. The model is based on the readability features of domestic grade 1 to 12 textbooks in three subjects (Chinese, social studies, and natural science). It predicts the level of articles in scientific disciplines, whether from books or the internet, for readers from the third grade of primary school to the first year of university (grade 13).

As a baseline for comparison with the model, nine natural science teachers from primary, junior high, and senior high schools were invited to judge the appropriate reading grade of the articles. The model's classification accuracy is 59.73% under the strict criterion and 73.15% under a relaxed criterion derived from the teachers' judgments. The model can serve as a preliminary reference for leveling supplementary science reading.
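The two accuracy figures above can be illustrated with a minimal sketch. The abstract does not define how the relaxed criterion was derived from the teachers' judgments, so this example assumes one plausible reading: a prediction counts as correct if it falls inside a range of grades accepted for that article. The function names, the range representation, and the toy data are all hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def strict_accuracy(predicted: Sequence[int], expert: Sequence[int]) -> float:
    """Share of articles whose predicted grade equals the expert grade."""
    if not predicted or len(predicted) != len(expert):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == e for p, e in zip(predicted, expert))
    return hits / len(predicted)


def relaxed_accuracy(
    predicted: Sequence[int],
    accepted_ranges: Sequence[Tuple[int, int]],
) -> float:
    """Share of articles whose predicted grade lies inside an accepted range.

    ASSUMPTION: each article has an inclusive (low, high) grade range taken
    from the teachers' judgments; the thesis may define relaxation differently.
    """
    if not predicted or len(predicted) != len(accepted_ranges):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= p <= hi for p, (lo, hi) in zip(predicted, accepted_ranges))
    return hits / len(predicted)


# Hypothetical toy data (grades 3 to 13), not results from the thesis.
predicted = [3, 5, 7, 10, 13]
expert = [3, 6, 7, 9, 13]
accepted = [(3, 4), (5, 6), (7, 8), (7, 8), (12, 13)]

print(f"strict:  {strict_accuracy(predicted, expert):.2%}")
print(f"relaxed: {relaxed_accuracy(predicted, accepted):.2%}")
```

The relaxed score can never be lower than the strict one as long as each accepted range contains the single expert grade used for strict scoring, which matches the direction of the reported figures (59.73% versus 73.15%).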

Keywords

Readability, Popular Scientific Article, Text Classification, Latent Semantic Analysis, Support Vector Machine, Feature Selection

URI

http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22G060108004E%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/92713

Collections

Theses and Dissertations (學位論文)
