以語料庫為本的學術英文字串使用分析 A Corpus-based Analysis of the Use of Lexical Bundles in English Academic Writing

Yu-Hsiu Lin
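Research on lexical bundles has attracted wide interest in recent years. Lexical bundles are recurrent word sequences identified in a given register on the basis of frequency. Most lexical bundle studies have analyzed only native-speaker data, and only a few have included nonnative-speaker data. Learner corpus researchers suggest that such gaps in the use of linguistic features should be studied: by comparing native and nonnative speakers, the deficiencies of second language learners can be revealed, pointing to directions for foreign language teaching. This study investigates the use of four-word lexical bundles in academic writing by native and nonnative speakers in the field of applied linguistics. The main aims are: (1) to identify the bundles used frequently and widely in the academic writing of native and nonnative speakers; (2) to analyze the structures of these bundles and the functions they serve in discourse; and (3) to examine whether nonnative speakers overuse or underuse these bundles in comparison with native speakers. Two academic writing corpora were built: one consists of two hundred research articles written by native speakers and published in applied linguistics journals, and the other of four hundred conference papers presented at related conferences by scholars and graduate students in applied linguistics in Taiwan. The corpora contain 1.4 million and 1.6 million words respectively. The researcher first identified bundles that recurred more than 20 times per million words and appeared in at least ten percent of the texts, then classified them by structure and function, and used statistical analysis to determine which bundles the nonnative speakers overused or underused. The results show that nonnative speakers use fewer bundles in academic writing: the native speakers used 151 bundle types in total, while the nonnative speakers used only 66. The statistical analysis also indicates that nonnative speakers substantially underuse bundles that native speakers commonly use. Of the 151 bundles common among native speakers, the nonnative speakers underused 112, many of which serve to frame arguments, express the writer's attitudinal judgment, and draw the reader's attention. This suggests that nonnative speakers are not fully familiar with how their discipline constructs knowledge and presents arguments, and they also seem to neglect the interactive side of academic writing for the sake of an objective authorial tone. Further concordance analysis shows that nonnative speakers have fewer linguistic resources than native speakers, and so tend to rely on certain expressions and underuse synonymous bundles. The nonnative speakers also overused 40 bundles. Some of these refer to the research setting and time or guide the development of the text and the reader; this overuse may reflect the nature of conference papers. In addition, nonnative speakers overuse bundles expressing cause and effect, possibly because they overemphasize the presentation of results. Four further bundles expressing stance rarely appear in the native-speaker data, indicating that nonnative speakers are not entirely familiar with the phrasing of academic writing in applied linguistics. The findings indicate that Taiwanese learners of English and native speakers differ in their use of bundles in academic writing: not only do frequencies differ markedly, but the learners also underuse many expressions common among native speakers while overusing bundles that are quite rare in the native-speaker data. Besides serving as a teaching resource for English academic writing in applied linguistics, the findings raise issues for further lexical bundle research.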
Lexical bundle research has attracted much interest in recent years. Lexical bundles are recurrent multiword sequences derived with a frequency-driven approach in a given register. While previous research has been largely conducted with native language data, only a few studies have discussed how nonnative speakers employ bundles in their language production. The gap between native and nonnative speakers' use of the feature, as advocates of learner corpus research suggest, should be explored and can inform EAP pedagogy, since L2 learners' linguistic deficiencies can be revealed through such comparison. This study intends to help fill the gap and aims to investigate the use of 4-word lexical bundles in academic writing by native and nonnative speakers of English in the field of applied linguistics. The purposes of the study are: (1) to identify lexical bundles in the corpora, (2) to analyze their structural patterns and functional purposes, and (3) to investigate the extent to which Taiwanese writers, in comparison with the native writers, have exhibited overuse and underuse of the lexical bundles. Two academic written corpora were compiled: the Native Speakers Corpus (NSC), a collection of two hundred research articles written by native speakers and published in applied linguistics journals, and the Nonnative Speaker Corpus (NNSC), a compilation of four hundred conference papers written by Taiwanese writers and presented at conferences in the field. The corpora contained approximately 1.4 and 1.6 million words respectively. Lexical bundles which occurred at least 20 times per million words and in at least 10% of all texts in the corpora were identified and categorized according to their structural patterns and functional purposes. Statistical analysis was then conducted to determine whether the bundles had been overused or underused by the nonnative speakers. The investigation and comparison yielded a number of findings.
First, the native speakers used 151 types of lexical bundles, whereas the Taiwanese writers used only 66 types. The results showed that the nonnative speakers overall used fewer lexical bundles in their academic writing. Second, the statistical analysis indicated that the nonnative speakers largely exhibited underuse of lexical bundles that were frequently used by the native speakers. Out of the 151 types in the NSC, 112 were underused. Many of them functioned as devices to frame arguments, express writers' attitudinal judgments, and draw readers' attention. This may suggest that the Taiwanese writers were not fully aware of the discursive ways in which their discipline constructs knowledge and presents arguments. They may also neglect the interactive aspect of academic writing, which may result from avoiding reference to the authors so as to sound objective. In-depth corpus analysis further revealed that the nonnative speakers had a more limited linguistic repertoire, which may have led to their overreliance on certain expressions and underuse of synonymous bundles. The nonnative speakers also overused 40 bundles. The overuse of bundles that specify research topic and location, along with structuring bundles, may reflect the nature of the nonnative corpus. The overuse of other bundles, including resultative signals, is likely due to the writers' overemphasis on presenting results to persuade. Four overused stance-signaling bundles were all very rarely used by the native speakers. This again shows that the Taiwanese writers may not be entirely familiar with the phraseology of academic writing in their discipline. On the basis of the findings, several pedagogical implications were drawn for English academic writing instruction in applied linguistics, and possible directions for future lexical bundle research were suggested.
字串, 語料庫分析, 學術英文寫作, lexical bundles, corpus analysis, English academic writing
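The identification step described in the abstract (4-word sequences occurring at least 20 times per million words and in at least 10% of the texts) can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `extract_bundles`, its parameters, and the simple regular-expression tokenizer are all assumptions, and the sketch lets word sequences run across sentence and punctuation boundaries, which a real analysis may want to exclude.

```python
import re
from collections import Counter


def extract_bundles(texts, n=4, min_per_million=20.0, min_text_share=0.10):
    """Return {bundle: (rate_per_million, share_of_texts)} for n-word
    sequences that meet both the frequency and the dispersion threshold.

    Hypothetical helper; names and defaults are illustrative assumptions.
    """
    freq = Counter()        # total occurrences of each n-word sequence
    text_range = Counter()  # number of texts containing each sequence
    total_words = 0

    for text in texts:
        # Naive tokenizer (assumption): lowercase alphabetic words,
        # optionally with an internal apostrophe.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        total_words += len(tokens)
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        text_range.update(set(grams))  # count each text at most once

    if total_words == 0:
        return {}

    bundles = {}
    for gram, count in freq.items():
        rate = count * 1_000_000 / total_words   # normalized frequency
        share = text_range[gram] / len(texts)    # proportion of texts
        if rate >= min_per_million and share >= min_text_share:
            bundles[gram] = (rate, share)
    return bundles
```

Applied to each corpus separately, a function like this would yield the two bundle lists whose normalized frequencies are then compared. Normalizing to a rate per million words matters here because the two corpora differ in size (about 1.4 and 1.6 million words).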