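An Investigation of the Performance of the Large Language Model ChatGPT on Multiple-Choice and Integrated Questions in the GSAT English Test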

Date

2024

Abstract

As software eats the world, artificial intelligence (AI) is now eating software. In an era in which AI is integrated into every aspect of life, this cutting-edge technology is gradually changing how people perceive and interact with the world. In the field of English language learning, OpenAI's large language model, ChatGPT, has been a much-needed resource for people who use the language at work or are learning it ever since its launch in 2022. Given the crucial role of tests in evaluating learning outcomes, the present study examines (1) whether ChatGPT can be a reliable language partner for English language learners in post-exam review, and (2) its potential strengths and weaknesses in handling English language tests.
The study assesses the performance of ChatGPT (GPT-4 and GPT-4o) on the English subject tests of Taiwan's General Scholastic Ability Test (GSAT) from 2017 to 2024. The task types are Vocabulary, Rational Cloze, Banked Cloze, Discourse, Reading Comprehension, and Integrated Questions. The results show that, across the individual test years, GPT-4 and GPT-4o achieve accuracy rates of 86% to 100% and 92.86% to 100%, respectively. By task type, GPT-4o, which generally outperforms GPT-4, achieves full marks in Vocabulary and Discourse tasks and excels at Reading Comprehension, Banked Cloze, and Rational Cloze tasks, with accuracy rates of 98.28%, 97.50%, and 96.19%, respectively. However, on Integrated Questions, a task type introduced with the implementation of Taiwan's new curriculum, both GPT-4 and GPT-4o reach only 66.67% accuracy, which suggests that the models are relatively unreliable for this task type. This relatively low accuracy may result from the small number of Integrated Questions items, or from the logical fallacies and misreadings of the text observed in ChatGPT's responses to the multiple-selection items within this task type.
Overall, the study indicates that GPT-4o may possess stronger lexical and grammatical knowledge, reading skills, and image-processing capabilities than GPT-4. The high accuracy rates of ChatGPT on the GSAT English tests showcase its potential to assist English language learners who need help working through test items.

Keywords

ChatGPT, Language Model, GSAT, English Language Test
