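Development of Collocation Competence in Learners of Chinese as a Second Language: An Analysis of Learner Writing Data from Taiwan's TOCFL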

Date

2022

Abstract

This study evaluates how the lexical association and distribution of bigrams develop in texts written by learners of Chinese as a second language (L2) for the writing section of the Test of Chinese as a Foreign Language (TOCFL). Texts from four proficiency levels were analyzed, 2836 compositions in total. All contiguous two-word combinations in the L2 texts were compared with native-speaker collocation patterns drawn from a reference corpus, the Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus). To examine the relationship between learners' proficiency levels and their multifaceted collocation competence, four distributional metrics were adopted: mutual information (MI), Delta P, inverse document frequency (IDF), and unseen rate. Using the Sinica Corpus bigram list as a dictionary, each two-word combination in the L2 texts was assigned collocability scores, which served as the measure of learners' collocation competence. Two quantitative analyses were performed: a two-way ANOVA and a post hoc trend analysis. The two-way ANOVA tested the effects of proficiency level (LEVEL) and text genre (GENRE) on the collocability scores. For MI, backward Delta P, and unseen rate, there is a significant interaction between LEVEL and GENRE. For MI, although genres develop differently across proficiency levels, the intermediate levels show the lowest mean scores, which could be attributed to learners' growing vocabulary and their willingness to experiment with it. For backward Delta P, no ascending trend across levels is found. The only increase appears in B1 letters, which could be attributed to the emergence of local grammatical constructions at the B1 level.
Most of the unseen bigrams observed at C1 are regarded as divergent but legitimate representations, which suggests that advanced learners work hard to find word combinations that convey their ideas even when most native speakers would not commonly use them. By contrast, for forward Delta P and IDF there is no interaction between LEVEL and GENRE. Forward Delta P shows a positive linear trend across all proficiency levels, which reflects the left-to-right orientation preferred in human language processing. High-IDF bigrams are more common among A2 and C1 learners, for different reasons: at the A2 level they may result from bigrams tied to real-life circumstances, while at the C1 level they may reflect domain-specific bigrams. The thesis provides a comprehensive analysis of multifaceted L2 collocation competence and highlights the importance of formulaicity beyond single words in teaching Chinese as a second language (CSL).
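The four metrics named in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulas used in the thesis, so the code below assumes common textbook definitions: pointwise MI with a base-2 logarithm, Delta P computed from a 2x2 contingency table of positional counts, IDF as the natural logarithm of the number of reference documents divided by the number containing the bigram, and unseen rate as the share of learner bigrams absent from the reference list. The reference texts, the function names, and the `Reference` container are hypothetical and stand in for the Sinica Corpus bigram list.

```python
import math
from collections import Counter
from dataclasses import dataclass


def bigrams(tokens):
    """Return all contiguous two-word combinations of a segmented text."""
    return list(zip(tokens, tokens[1:]))


@dataclass
class Reference:
    bi: Counter      # bigram counts
    left: Counter    # counts of words in first position of a bigram
    right: Counter   # counts of words in second position of a bigram
    df: Counter      # number of documents containing each bigram
    n: int           # total number of bigram tokens
    n_docs: int      # number of reference documents


def build_reference(docs):
    """Build counts from a list of pre-segmented reference documents."""
    bi, left, right, df = Counter(), Counter(), Counter(), Counter()
    for doc in docs:
        pairs = bigrams(doc)
        bi.update(pairs)
        left.update(w1 for w1, _ in pairs)
        right.update(w2 for _, w2 in pairs)
        df.update(set(pairs))
    return Reference(bi, left, right, df, sum(bi.values()), len(docs))


def _ratio(x, y):
    """Divide safely; an empty row or column contributes 0."""
    return x / y if y else 0.0


def mi(ref, w1, w2):
    """Pointwise mutual information of an attested bigram (assumed log base 2)."""
    return math.log2(ref.bi[(w1, w2)] * ref.n / (ref.left[w1] * ref.right[w2]))


def delta_p(ref, w1, w2):
    """Return (forward, backward) Delta P from the 2x2 contingency table."""
    a = ref.bi[(w1, w2)]        # w1 followed by w2
    b = ref.left[w1] - a        # w1 followed by another word
    c = ref.right[w2] - a       # another word followed by w2
    d = ref.n - a - b - c       # neither
    forward = _ratio(a, a + b) - _ratio(c, c + d)    # P(w2|w1) - P(w2|not w1)
    backward = _ratio(a, a + c) - _ratio(b, b + d)   # P(w1|w2) - P(w1|not w2)
    return forward, backward


def idf(ref, w1, w2):
    """Inverse document frequency of an attested bigram (assumed natural log)."""
    return math.log(ref.n_docs / ref.df[(w1, w2)])


def unseen_rate(ref, tokens):
    """Share of bigrams in a learner text that never occur in the reference."""
    pairs = bigrams(tokens)
    return _ratio(sum(p not in ref.bi for p in pairs), len(pairs))


# Toy reference data (hypothetical, not taken from the Sinica Corpus).
reference_docs = [
    ["我", "喜歡", "喝", "茶"],
    ["我", "喜歡", "喝", "咖啡"],
    ["他", "喝", "茶"],
]
ref = build_reference(reference_docs)
learner_text = ["我", "喝", "茶"]

print(mi(ref, "喝", "茶"), delta_p(ref, "喝", "茶"), idf(ref, "喝", "茶"))
print(unseen_rate(ref, learner_text))
```

In this sketch, MI, Delta P, and IDF are defined only for bigrams attested in the reference data, which is one reason a separate unseen rate is useful: it accounts for the learner bigrams that the other three metrics cannot score.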

Keywords

usage-based theory, corpus, collocation, mutual information, Delta P, inverse document frequency, Chinese as a second language (CSL)
