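Construction and Reliability and Validity Analysis of the Scientific Multi-Text Reading Comprehension Assessment: The Example of a Test Booklet on the Relationship Between Climate Change and the Three Gorges Dam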
Date
2017-12-??
Publisher
國立臺灣師範大學教育心理學系
Department of Educational Psychology, NTNU
Abstract
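The main purpose of this study was to develop the Scientific Multi-Text Reading Comprehension Assessment (SMTRCA) and to establish a rubric for evaluating reading comprehension ability, the Rubric of Multi-Text Reading Comprehension Assessment. The science test booklet for this assessment, "The Relationship Between Climate Change and the Three Gorges Dam on China's Yangtze River," comprises four subscales (information retrieval, information generalization, information interpretation, and information integration), with 10 multiple-choice items and 9 constructed-response items in total. The analysis showed that the intra-rater Cronbach's α values were all greater than .9, indicating good intra-rater consistency. The inter-rater Kendall's coefficient of concordance was greater than .8 with p < .001, a significant association showing that the raters ranked responses in the same relative order. The chi-square tests comparing the many-facet Rasch measurement model of rater severity with the rating scale model and the partial credit model were also significant, indicating that the raters differed in severity and in threshold severity. For the former, the infit and outfit mean squares (MNSQ) all fell within 1 ± 0.3, indicating that both severe and lenient raters could effectively distinguish high-ability from low-ability students. The latter implies that a process involving raters' interpretation, evaluation, and scoring is inherently hard to make as consistent as machine scoring, which matches general expectations of human scoring and is understandable and acceptable. As for the internal consistency of the test booklet, all subscales other than information retrieval and information generalization had α values greater than .70, and the α for the full assessment was above .90, showing that the Cronbach's α of the SMTRCA is within an acceptable range. Finally, confirmatory factor analysis supported the hypothesized four-factor model of the SMTRCA, with acceptable fit. This study's preliminary finding is that the SMTRCA can be divided into the four subscales of information retrieval, information generalization, information interpretation, and information integration, and that the proportions of variance in the first-order latent factors represented by the four subscale scores that the SMTRCA explains are .60, .66, .80, and .80, respectively.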
This study aimed to develop the Scientific Multi-Text Reading Comprehension Assessment (SMTRCA), with a focus on the Rubric of Multi-Text Reading Comprehension Assessment (RMTRCA), which is designed to evaluate the extent of reading comprehension. To this end, we used scientific texts describing the dispute over the relationship between climate change and the Three Gorges Dam and developed assessment items according to our rubric. The test comprised 10 closed-ended and 9 open-ended questions, categorized into 4 subscales: information retrieval, information generalization, information interpretation, and information integration. The analysis showed that the Cronbach's α values were greater than .9, indicating good intra-rater consistency. Secondly, Kendall's coefficient of concordance was greater than .8 with a p value smaller than .001, denoting a consistent scoring pattern between raters. Additionally, the many-facet Rasch measurement (MFRM) analysis and the comparison of the rating scale model (RSM) with the partial credit model (PCM) showed that the chi-square tests of rater severity and threshold difficulty were significant. For the former, the infit and outfit MNSQ fell within 1 ± 0.3, meaning that both severe and lenient raters could effectively distinguish high-ability students from low-ability students. The latter means that rating involves human interpretation, evaluation, and scoring processes, so a machine-like level of consistency is difficult to reach; this is in line with expectations of typical human judgment processes. Thirdly, the Cronbach's α values of the subscales were larger than .7, except those for information retrieval and information generalization, and overall they were within an acceptable range. Finally, confirmatory factor analysis showed acceptable goodness of fit for the hypothesized four-factor model of the SMTRCA. The SMTRCA accounts for .60, .66, .80, and .80 of the variance associated with the first
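For readers unfamiliar with the two agreement statistics the abstract reports, the following minimal Python sketch shows how Cronbach's α and Kendall's coefficient of concordance are commonly computed. The function names, the data layout (one row per respondent or per rater), and the toy numbers are assumptions made for illustration; they are not the study's data or analysis code, and the concordance formula shown here omits the correction for tied ranks.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent
    and one column per item (hypothetical layout)."""
    k = len(scores[0])  # number of items
    # Sample variance (n - 1 denominator) of each item column.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for m raters scoring the same n responses
    (one list per rater). No tie correction is applied."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    # Sum of ranks each response received across raters.
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Toy data: three raters who order four responses identically.
    raters = [[1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40]]
    print("Kendall's W:", kendall_w(raters))
    # Toy data: three respondents, two perfectly consistent items.
    items = [[1, 1], [2, 2], [3, 3]]
    print("Cronbach's alpha:", cronbach_alpha(items))
```

Values of W near 1 mean the raters order the responses almost identically, even when one rater is consistently harsher than another, which is why the abstract can report high concordance alongside significant differences in rater severity.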