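Applying the Latent Trait Model to Validate English Achievement Tests from a Test-Writing Contest (應用潛在特質模型以驗證試題競試之英語成就測驗)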

Date

2010

Abstract

In Taiwan, little research has been done on school-based testing, even though the Ministry of Education values language testing and subsidizes local governments to hold English test-writing contests for junior high school teachers. In these contests, the quality of the achievement tests is usually judged on the basis of expert opinion, yet the robustness of such expert-based content validity is not well established. This study therefore validates tests from one such contest, the Kinmen contest, by comparing a winning test with a non-winning comparison test, using empirical evidence obtained by analyzing examinees’ responses with the Latent Trait Model. A qualitative content analysis was then conducted to locate possible sources of the poor items detected. Two hundred forty-one ninth graders at one junior high school in Taoyuan, roughly half male and half female, participated in the study. The results showed that all the subjectively scored polytomous items in both tests fit the model well. Among the objectively scored dichotomous items, however, the winning test did not have a lower percentage of misfitting items, and it showed more additional significant dimensions than the comparison test. Likewise, both the differential item functioning (DIF) analysis of gender bias and the local dependence analysis showed that the winning test did not have a lower percentage of poor items than the comparison test. Overall, and surprisingly, the winning test did not outperform the comparison test. The study thus supports the view that more types of validity evidence are needed to evaluate tests comprehensively. The content analysis, however, indicated that the comparison test contained several significant linguistic errors and that the items in the winning test were more appealing and innovative. The study therefore contends that evaluating test quality thoroughly requires both expert judgment and statistical analysis. Finally, the possible sources of the poor items included test designers’ failure to recognize the characteristics of item types, to make use of test specifications, to select content topics prudently, and to use testlets deliberately. Suggestions based on the findings are provided for contest organizers and junior high school English teachers.

Keywords

language testing, the Latent Trait Model, achievement test
