華語學習者中介語料庫之建構計畫---子計畫一 (Construction of a Chinese Learner Interlanguage Corpus: Subproject 1)

Date

2010/08-2011/07

Authors

陳浩然
高照明
張俊盛

Abstract

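Many scholars and teachers believe that corpora hold considerable potential for language learning and teaching. Learner corpora, which have emerged in recent years, have attracted particular attention from language teachers and researchers. A learner corpus is a language learning database built by collecting naturally produced written or spoken data from language learners. For Chinese as a second or foreign language, the data in a learner corpus can support research on many aspects of teaching Chinese to non-native speakers, for example Chinese interlanguage research, second language acquisition research, research on the theory of teaching Chinese as a foreign language, research on teaching materials, research on Chinese testing, and Chinese-language research related to teaching Chinese as a foreign language. Such research is important for raising the standard of Chinese teaching, Chinese testing, and Chinese acquisition research. It can also provide a valuable reference for research on printed Chinese teaching materials, digital Chinese teaching materials, and digital Chinese learning tools.
No large-scale, usable Chinese learner interlanguage corpus has yet been built in Taiwan, and one is urgently needed to advance research on Chinese teaching. The Mandarin Teaching Center at National Taiwan Normal University is the best place to collect interlanguage data from learners of Chinese: each three-month term it has more than 1,700 students and more than 180 teachers, it admits students every quarter throughout the year, its students come from all over the world (including Japan, Korea, Vietnam, Thailand, Indonesia, and countries in Europe and the Americas), and it has complete computer and network facilities. These conditions are sufficient for collecting Chinese language data on a large scale. If a Chinese learner interlanguage corpus can be built through the center, everyone engaged in Chinese research and teaching will be able to use it for research and to apply the findings to teaching practice, which should greatly benefit the development of Chinese. This project is the first subproject of the Chinese learner interlanguage corpus construction project. Over three years it is expected to complete a Chinese learner interlanguage corpus of about 8.7 million words, most of it with manually annotated errors. The goals for each year are as follows.
Year 1: Through computer-assisted manual error annotation and automatic part-of-speech analysis, we can add substantial value to the various kinds of Chinese learner interlanguage data (a single-sentence corpus, a computer-based writing test corpus, and a handwritten writing corpus), so that the data present more information and Chinese teachers and researchers can search and analyze them further.
Year 2: The learner data annotated in the first year will be put online step by step, and data annotated by each subproject will continue to be added to the corpus. We will build a range of search functions and try to find the best user interface. The data from the three subprojects will go online together, and users will be able to search in the following ways: (1) raw data without any annotation; (2) learner data with part-of-speech annotation; (3) learner data tagged with error types.
Year 3: Data annotated by each subproject will continue to be added to the corpus, and the search interface, speed, and related functions will continue to be improved. The data already in the corpus will also be used for learner language analysis, and the annotated data will be used to develop teaching materials on Chinese collocations. There are currently no good learning materials for Chinese collocations, so this project will use the learner data to identify students' difficulties with collocations and, together with collocations extracted from existing large Chinese corpora, compile materials suited to learners of Chinese as a second language.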
Many researchers and language teachers believe that language corpora have great potential for language learning and teaching. Learner corpora in particular have received much attention recently. Learner corpora are large collections of texts drawn from second language learners’ written or oral production. In research on Chinese as a second or foreign language, a learner corpus can support many research topics: researchers can study learners’ interlanguage, second language acquisition, Chinese language assessment, and Chinese language pedagogy. In addition, findings from the learner corpus can be used in developing Chinese teaching materials and other digital learning tools and resources. Although several learner corpora of Chinese as a second language exist, no large one has been developed in Taiwan. Interlanguage data are not easy to collect in Taiwan, but the need for a Chinese learner corpus is strong. The MTC (Mandarin Teaching Center) at National Taiwan Normal University is one of the largest Chinese learning centers in Taiwan. More than 1,700 students enroll each quarter, the center has more than 180 teachers, and its students come from more than 70 countries. The MTC is therefore a well-suited place to collect these data, which should be valuable to researchers of Chinese second language acquisition and to Chinese language teachers. In this project, we will develop an 8.7-million-word interlanguage corpus of Chinese as a second language. The learner corpus will be manually tagged with error tags, which will make it useful for research and teaching. The project comprises three sub-corpora, and the corpus will be made available to other teachers and researchers. The goals for each year are stated briefly below.
The first year: We will use both manual tagging and automatic tagging to make the learner data more accessible to users of the web-based concordancing system. The tags (error tags and part-of-speech tags) should help users retrieve and uncover hidden interlanguage patterns more easily and efficiently.
The second year: The corpora tagged by the other three teams of this project will be made available online. The research team in this project will develop a user-friendly interface for the 7-million learner corpus. Users will have several options for searching the sub-corpora: the raw data, the POS-tagged corpus, and the error-tagged corpus. Each type of tagging will allow researchers, teachers, and students to obtain different kinds of useful information from the corpus.
The third year: More data will be imported into the web-based concordancing system, and the interface and retrieval speed will be further improved. After all the learner data are loaded into the system, the research team will carry out further analysis to extract important interlanguage patterns from the corpus. Moreover, collocation learning and teaching has become a crucial research topic in recent years. We will use Sketch Engine, developed in the United Kingdom, to uncover collocation patterns in both Chinese native speaker corpora and the interlanguage corpus. Comparing these two sets of collocates will help the research team develop high-quality teaching and learning materials for learners of Chinese as a second language.
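
The collocate comparison described for the third year can be illustrated with a minimal sketch. This is not Sketch Engine and not the project's method; it only counts adjacent word pairs in pre-segmented sentences and lists the pairs that occur in a learner sample but never in a native sample. The function names (adjacent_pairs, learner_only_pairs), the min_count parameter, and the toy sentences are all hypothetical and invented for illustration.

```python
from collections import Counter


def adjacent_pairs(sentences):
    """Count adjacent word pairs in sentences that are already segmented into words."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def learner_only_pairs(learner_sentences, native_sentences, min_count=1):
    """Return, in sorted order, pairs that learners use at least min_count
    times but that never occur in the native sample."""
    learner = adjacent_pairs(learner_sentences)
    native = adjacent_pairs(native_sentences)
    return sorted(
        pair for pair, n in learner.items()
        if n >= min_count and pair not in native
    )


# Invented toy data, not taken from any real corpus.
learner_sentences = [["我", "開", "電話"], ["他", "打", "電話"]]
native_sentences = [["我", "打", "電話"], ["他", "打", "電話"]]

suspicious = learner_only_pairs(learner_sentences, native_sentences)
print(suspicious)
```

A pair that is absent from a small native sample is only a candidate for review, not evidence of an error; a real comparison would need much larger corpora and an association measure rather than raw adjacency counts.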
