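A Study on the Design of an ALBERT-Based Automated Essay Scoring Method for Senior-Grade Elementary School Writing (基於 ALBERT 模型國小高年級寫作自動評閱方法設計之研究)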
Date
2025
Abstract
This study develops and validates a regression method built on the pre-trained ALBERT model for automated scoring of Chinese essays written by senior-grade elementary school students. The experiments use a corpus of essays by senior-grade students in Taipei City elementary schools, annotated with expert ratings for the overall score and four sub-categories: theme expression, structural organization, language use, and presentation. ALBERT regression models of different sizes (base to xlarge) are evaluated by comparing system-generated scores with the expert ratings. Results show that automated scoring performance improves significantly as model size increases. The xlarge model performs best on overall scoring, with the lowest Mean Absolute Error (MAE: 0.6294) and the highest Quadratic Weighted Kappa (QWK: 0.6635), indicating the smallest prediction error and strong agreement with expert scores. Among the sub-categories, the xlarge model performs particularly well on "structural organization" (QWK: 0.6822) and "theme expression" (QWK: 0.6685), indicating that it captures essay structure and logic effectively. Its weakest sub-category is "presentation" (QWK: 0.5147). The study confirms the feasibility and effectiveness of the ALBERT-based regression approach for scoring senior-grade elementary school essays. Future work will focus on expanding and diversifying the writing corpus and adding feedback mechanisms, with the aim of providing a more comprehensive tool to support automated essay scoring and opening new research directions for neural language models in education.
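The two metrics reported in the abstract, MAE and QWK, can be illustrated with a minimal sketch. The abstract does not state the rating scale or how continuous regression outputs are mapped to discrete ratings, so the integer scale bounds (`min_rating`, `max_rating`) and the round-and-clip step in `to_rating` are assumptions made only for illustration, and all function names are hypothetical rather than taken from the study.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between expert scores and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def to_rating(score: float, min_rating: int, max_rating: int) -> int:
    """Assumed mapping: round a regression output half-up, then clip to the scale."""
    rounded = math.floor(score + 0.5)
    return max(min_rating, min(max_rating, rounded))


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], min_rating: int, max_rating: int
) -> float:
    """Quadratic Weighted Kappa between two integer ratings on the same scale."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("the scale needs at least two rating levels")

    # Observed confusion matrix: rows are expert ratings, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Degenerate case: every rating from both sources is the same level.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Made-up numbers on an assumed 1-6 scale, purely to show the call pattern.
    expert = [4, 3, 5, 2, 4]
    raw_predictions = [3.6, 3.2, 4.4, 2.9, 4.1]
    predicted = [to_rating(s, 1, 6) for s in raw_predictions]
    print("MAE:", mean_absolute_error(expert, raw_predictions))
    print("QWK:", quadratic_weighted_kappa(expert, predicted, 1, 6))
```

MAE is computed here on the raw regression outputs, while QWK needs discrete ratings, which is why the rounding step appears only before the QWK call. Whether the study computed MAE before or after rounding is not stated in the abstract.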
Keywords
ALBERT model, regression, automated essay scoring, elementary senior-grade students' writing