A Comparison of Three Polytomous DIF Detection Methods

The performance of three procedures in detecting differential item functioning (DIF) under the graded response model (GRM) was compared in a simulation study: the Logistic Regression procedure (Logi; French & Miller, 1996), the Likelihood Ratio test (LR; Thissen, Steinberg, & Gerrard, 1986), and the Differential Functioning of Items and Tests procedure (DFIT; Flowers, Oshima, & Raju, 1999). The manipulated factors were sample size, the difference in ability distributions between the focal and reference groups, and the percentage of DIF items in the test (four levels). For each of the sixteen combinations, 100 replications of DIF detection were simulated. All three procedures maintained Type I error rates near the nominal level under most conditions. LR was the most powerful of the three in all conditions. DFIT was less powerful than LR but remained useful for DIF detection, especially when the groups differed in ability distribution and the percentage of DIF items was relatively large. Logi, with mean power below 0.4 in all conditions, appeared to be sensitive only to items with large DIF. In addition, the three procedures were used to assess DIF in the Cognitive Abilities Screening Instrument (CASI), and the results of the DIF analysis were compared with those of previous studies.
Keywords: Cognitive Abilities Screening Instrument (CASI), Dementia, DFIT, Differential Item Functioning, Graded Response Model, Likelihood Ratio Test, Logistic Regression Procedure