Standard Setting for a Scientific Inquiry Ability Assessment and Its Validity Evaluation

Date

2019-03-??

Publisher

國立臺灣師範大學教育心理學系
Department of Educational Psychology, NTNU

Abstract

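This study used empirical data collected from 605 Grade 11 students in the greater Taipei area of Taiwan who took a scientific inquiry ability assessment, and it pursued two aims. The first was to set standards for the assessment based on performance level descriptions for three levels: below basic, basic, and proficient. The second was to examine, using multiple sources of validity evidence (internal, procedural, and external), whether the Bookmark method is appropriate and valid for setting standards of scientific inquiry ability. The results showed that the standard setting in this study was supported by procedural validity evidence. Next, the internal validity evaluation showed that the standard errors for each performance level set by the 14 standard-setting panelists across rounds one and two were within the acceptable range (SE < 0.12), indicating that the within-panelist results were reliable. Consistency within the standard-setting method was evaluated using the standard error of the sample mean of the round-two median cut scores; the standard errors for every performance level were within the acceptable range (SE < 0.12), indicating that the results within the method were quite consistent. Furthermore, independent-samples t tests were used to test consistency between panelists, and the cut scores set by different groups of panelists did not differ significantly. In addition, monitoring of extreme values in the standard setting found only a few extreme values, so their influence on the overall cut scores was minimal. The standard setting in this study was therefore supported by internal validity evidence. Finally, this study carried out a standard setting based on cluster analysis to examine the convergent validity of the cut scores obtained with the Bookmark method. The correlation coefficient between the two standard-setting methods in classifying students into three performance levels was statistically significant, indicating a considerable degree of agreement in judging performance levels. Discriminant analysis was also used to examine the consistency of the standard setting. The overall classification consistency of the Bookmark method for "Observation and Question Formulation," "Planning and Execution," "Analysis and Discovery," and "Reasoning and Argumentation" was 79.50%, 86.00%, 100.00%, and 89.90%, respectively, showing that the cut scores obtained with the Bookmark method discriminate well among the performance levels and are supported by external validity evidence. Taken together, the evidence indicates that the scientific inquiry ability standards set with the Bookmark method are appropriate and valid.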
This study developed a standard setting for Grade 11 of the Multimedia-based Assessment of Scientific Inquiry Abilities (MASIA) based on three levels of standard performance descriptions: below basic, basic, and proficient. The study used the Bookmark method to identify the cut-off scores. Furthermore, the study examined the appropriateness of the MASIA standard using multiple sources of validity evidence, namely procedural, internal, and external evidence. First, the procedural evaluation showed that the standard setting of scientific inquiry abilities adopted in this study is supported by procedural evidence for validity. Second, the internal evaluation showed that the standard error of each performance level that participants reached in rounds one and two was within an acceptable range (standard error [SE] < 0.12), thus indicating good intra-rater consistency. The consistency within the standard-setting method was evaluated using the standard error of the sample mean based on the median of the cut-off scores from round two. The result showed that the standard error of every performance level was within an acceptable range (SE < 0.12), thus denoting high consistency within the results of the standard-setting method. Third, the inter-rater consistency of the standard setting was examined using an independent-samples t test, and the results showed that none of the differences between the cut-off scores set by participants of different groups reached statistical significance. Therefore, the standard setting of scientific inquiry abilities is supported by internal evidence for validity. Finally, this study treated the quasi-setting results derived from the cluster analysis as convergent validity-based evidence to assess external validity. The results showed that the correlation coefficient of the three performance levels of the students differentiated by two standard-setting methods reached statistical significance, thus in
