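Department of Applied Chinese Language and Culture, National Taiwan Normal University (國立臺灣師範大學應用華語文學系)
張莉萍; 蔡雅薰
2014-10-30; 2014-10-30; 2011-07-31
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/31322

This project is Subproject 3 of the integrated project "Construction of an Interlanguage Corpus of Learners of Chinese" (華語學習者中介語料庫之建構計畫). It is closely tied to the other three subprojects: "Subproject 1: Architecture and Retrieval System of the Chinese Learner Interlanguage Corpus", "Subproject 2: Construction of a Single-Sentence Corpus", and "Subproject 4: Construction of a Handwritten Writing Corpus". The data for this subproject come from the computer-based writing test of the Test of Proficiency (華語文能力測驗), which the Steering Committee for the Test Of Proficiency-Huayu (國家華語測驗推動委員會) administers on commission from the Ministry of Education, so the data differ in nature from those of the other two subprojects. The test levels currently correspond to the Common European Framework of Reference for Languages (CEFR) and are divided into six levels. Candidates choose the level that matches their proficiency. Results are reported as pass or fail on a banded scale of 0 to 5, and a score of 3 or above is a pass. The corpus can therefore be queried by each candidate's native language, the level taken, and the score on each component.

The project plans to collect 2.7 million characters of candidate essays over three years. After a first-stage correction of a small number of errors, the 2.7 million characters will be processed with the Academia Sinica automatic word segmentation and part-of-speech tagging system, yielding a segmented, part-of-speech-tagged corpus. Of this, 1 million words will be reviewed by researchers and annotated with error tags, producing an error-tagged corpus that can serve as a basic tool for theories of Chinese interlanguage analysis, Chinese acquisition, and related work. In the second and third years, the project will respectively analyze the correlation between error frequency in candidates' essays and their scores, and investigate the criterial features of Chinese that candidates display at each proficiency level.

As one of the subprojects of "The Construction of an Interlanguage Corpus of Non-Native Learners of Chinese Mandarin", this project aims to build a learner corpus of 2.7 million Chinese characters based on the Test of Proficiency writing test (hereafter, TOP-writing test). The TOP-writing test has two tasks at each level, following the six CEFR levels. For example, the TOP-Beginner (A2 level) writing test asks candidates to write a note or card for an immediate need and to tell a story based on four pictures. The TOP-Learner (B1 level) writing test asks candidates to write a personal letter to family or close friends and a narrative about events in daily life. To support research on Chinese interlanguage, the learning of Chinese as a second language, theoretical studies of Chinese teaching, and applied areas such as the development of Chinese teaching materials and tests, 1 million characters will be tagged with syntactic categories and error codes. Both the raw and the tagged data will be made available to teachers and researchers through a user-friendly interface.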
The establishment of the first open-access learner corpus, together with a research team of linguists, computer experts, and Chinese teachers from Taiwan and overseas, is expected to lay a theoretical foundation for Chinese learning and to mark Taiwan as a leading place for research on and teaching of Chinese as a second language. The work plan for each year is as follows. In the first year, about 700,000 characters will be collected, and error tagging will be studied and tested on 100,000 characters. In the second year, a further 1,000,000 characters will be collected, and about 1,000,000 characters will be tagged with syntactic categories and error codes. In the third year, a further 1,000,000 characters will be collected, and initial studies of Chinese interlanguage and of the criterial syntactic features of each proficiency level will be conducted.

Keywords (translated from Chinese): learner corpus; interlanguage; test essays; error tagging
Keywords (English): Learner Corpus; interlanguage; TOCFL-writing test; error tagging

華語學習者中介語料庫之建構計畫---子計畫三:電腦寫作考試語料庫之建構 (Construction of an Interlanguage Corpus of Learners of Chinese, Subproject 3: Construction of a Computer-Based Writing Test Corpus)