字詞辨識中個別差異之量度:個人詞彙行為之角色探究 (Measuring Individual Differences in Word Recognition: The Role of Individual Lexical Behaviors)
Date
2012
Abstract
本論文旨在以語料庫與計算語言學的研究方法,量測字詞辨識中受試者表現之個別差異。字詞辨識為心理語言學領域關注的議題,過去的研究 (Katz et al., in press; Lewellen, Goldinger, Pisoni,& Greene, 1993; Sears, Siakaluk, Chow, & Buchanan, 2008; Unsworth & Pexman, 2003; Yap, Balota, Sibley, & Ratcliff, 2012) 主要皆藉由測驗或問卷的方式,如詞彙測驗、詞彙熟悉度問卷,探討其中個別差異的來源;然而,這樣的研究方法,往往侷限於測驗可及的範疇,且受限於單一測驗包含的詞彙、分數、量尺等等。
為了將研究範圍拓展至語言的實際使用面向上,本文從個人日常生活的詞彙行為 (lexical behaviors) 著手,提出「個人用詞之頻率指數」以及「個人詞頻」兩種變項的計量法;進而探討它們是否能解釋字詞辨識實驗中因受試者個人表現所造成的變異。研究經由四個步驟完成。第一,實施中文詞彙判斷作業 (lexical decision task),用以收集字詞辨識之實驗數據。第二,自動抽取各受試者的臉書貼文,並加以斷詞。第三,利用斷詞結果,來計算前述兩種詞彙行為變項之數值。「個人用詞之頻率指數」是依據個人所用之詞彙在中研院平衡語料庫中相對應的詞頻而計算。「個人詞頻」意指詞彙判斷的實驗刺激 (stimuli) 於個人臉書貼文中出現的頻率高低。第四,統計分析的部分,採用擅於估計個人差異的混合效果模式 (mixed-effects models)。
實驗結果顯示,「個人詞頻」效果顯著,受試者對於自己使用頻率較高的詞彙,反應較快;「個人用詞之頻率指數」較低的受試者,與預期相反地,正確率較低。此外,作為量度個人詞彙行為的先驅研究,本文亦提供計算方法論上的建議,如下所列。與預期相反的頻率指數結果,可能源於計量時所參照的平衡語料庫是由書面資料所組成,建議未來類似的實驗,應參照口語語料庫中的詞頻。另外,經由我們的實驗測試,即使自動斷詞的結果包含許多錯誤,利用該結果所得的個人總詞數來正規化其詞頻數,仍具有可行性。最後,當使用與臉書貼文一樣的自然語料 (naturalistic data) 進行計量時,建議研究個人的詞彙偏好或習性,而非個人使用的每一字詞。
This thesis adopts a corpus-based computational linguistic approach to measuring individual differences (IDs) in visual word recognition. Word recognition has long been a central issue in psycholinguistics. Previous studies (Katz et al., in press; Lewellen, Goldinger, Pisoni, & Greene, 1993; Sears, Siakaluk, Chow, & Buchanan, 2008; Unsworth & Pexman, 2003; Yap, Balota, Sibley, & Ratcliff, 2012) examined IDs mainly through test-based or questionnaire-based measures (e.g., vocabulary tests and word familiarity questionnaires). Such measures, however, confine the research to what the tests can evaluate, and they differentiate individuals only within the limited scores, scales, or vocabulary of a single test. To extend the research toward IDs in real-life language use, the present study starts from observations of participants’ daily-life lexical behaviors. We propose methods for calculating "the frequency index of personal word usage" and "personal word frequency", and investigate whether each of them accounts for the by-participant variance in word recognition. The investigation was carried out in four steps. First, a lexical decision task containing 912 Chinese stimuli was conducted to collect visual word recognition data. Second, each participant’s Facebook posts were automatically extracted and segmented into words. Third, based on those words, the two variables of individual lexical behavior were computed. The frequency index of each person was derived from the frequencies of his/her words in the Academia Sinica Balanced Corpus. The personal word frequency referred to how frequently a given word-recognition stimulus occurred in one’s Facebook posts. Fourth, the experimental data were analyzed with mixed-effects models, which are well suited to estimating by-subject differences.
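The two lexical-behavior variables described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names, the per-1,000-token scaling, the use of a mean log10 corpus frequency as the "frequency index", and the toy frequency counts (which are not taken from the Academia Sinica Balanced Corpus).

```python
from collections import Counter
import math


def personal_word_frequency(tokens, stimuli, per=1000):
    """Relative frequency of each stimulus in one person's segmented posts.

    Counts are normalized by the person's total token count and scaled
    to occurrences per `per` tokens (the scaling is an assumption).
    """
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in stimuli}
    return {word: counts[word] / total * per for word in stimuli}


def frequency_index(tokens, corpus_freq):
    """Hypothetical frequency index: mean log10 reference-corpus frequency
    of the person's tokens, skipping tokens absent from the reference list.
    """
    logs = [
        math.log10(corpus_freq[token])
        for token in tokens
        if corpus_freq.get(token, 0) > 0
    ]
    return sum(logs) / len(logs) if logs else float("nan")


# Toy data: one participant's segmented posts and made-up corpus counts.
tokens = ["我", "今天", "很", "開心", "我", "喜歡", "咖啡"]
corpus_freq = {"我": 100000, "今天": 10000, "很": 10000, "開心": 1000, "咖啡": 100}

print(personal_word_frequency(tokens, ["我", "咖啡", "電腦"]))
print(frequency_index(tokens, corpus_freq))
```

Under these assumptions, a participant who mostly uses words that are common in the reference corpus gets a higher index, and a stimulus that never appears in the participant's posts gets a personal word frequency of zero.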
Results showed that the effect of personal word frequency reached significance: participants responded more rapidly to words that they themselves used more frequently. Participants with lower frequency indices of personal word usage had lower accuracy rates than others, which was contrary to our prediction. As a pioneering study in measuring lexical behaviors, this thesis also offers the following methodological suggestions. The counter-prediction finding for the frequency index may stem from the fact that the Sinica Corpus consists mainly of written data; future experiments of this kind should therefore refer to the frequency counts of a spoken corpus. In addition, our tests indicate that a person’s total token count remains feasible for normalizing his/her frequency counts, even when the tokens contain word segmentation errors. Finally, when naturalistic data such as Facebook posts are used for the measurement, we recommend basing the computation on personal preferences or patterns of lexical usage rather than on every single word in one’s language usage data.
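The reported effect of personal word frequency on response speed could be expressed by a mixed-effects model of the following general form. This is only a sketch: the abstract does not specify the model, so the log transform of response time, the crossed random intercepts for participants and items, and all symbol names are assumptions.

```latex
\log \mathrm{RT}_{ij} = \beta_0 + \beta_1\,\mathrm{PWF}_{ij} + \beta_2\,\mathrm{FI}_{i} + u_i + w_j + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
w_j \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $\mathrm{RT}_{ij}$ is the response time of participant $i$ to stimulus $j$, $\mathrm{PWF}_{ij}$ is that participant's personal word frequency for the stimulus, $\mathrm{FI}_i$ is the participant's frequency index, and $u_i$ and $w_j$ are the by-participant and by-item random intercepts. In this notation, faster responses to more frequently used words correspond to $\beta_1 < 0$.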
Keywords
個別差異, 字詞辨識, 詞彙行為, 自然語料, 混合效果模式, individual differences, word recognition, lexical behaviors, naturalistic data, mixed-effects models