Generative AI and item writing

A collection of resources on generative AI (such as ChatGPT) and its applications in producing language test items.
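To make concrete what "producing language test items" with a generative model can involve, below is a minimal sketch in Python using the OpenAI chat API. Everything in it (the model name, the prompt wording, the CEFR level, the item specification) is an illustrative assumption, not a method taken from any of the resources listed here.

    # Minimal sketch: drafting one multiple-choice reading item with an LLM.
    # Assumes the openai Python package (v1+) is installed and OPENAI_API_KEY
    # is set in the environment; the model name, prompt, and item spec are
    # illustrative assumptions, not a recipe from the papers below.
    from openai import OpenAI

    client = OpenAI()

    item_spec = (
        "Write a reading passage of about 120 words at CEFR B1 level, "
        "followed by one multiple-choice comprehension question with four "
        "options labelled A-D. Mark the key with an asterisk and keep the "
        "distractors plausible but clearly wrong."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model; an assumption here
        messages=[
            {"role": "system",
             "content": "You are an experienced language test item writer."},
            {"role": "user", "content": item_spec},
        ],
        temperature=0.7,  # some variability helps when drafting candidates
    )

    # The output is a draft for human review (fairness, difficulty,
    # distractor quality), not a finished test item.
    print(response.choices[0].message.content)

As several of the studies below stress, such machine-drafted items are raw material for human review and piloting, not finished test content.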

Rossi, O. (2023, September). Using technology to write language test items. Talk delivered at the IATEFL TEASIG online conference Developing Assessment Tasks for the Classroom. Download slides

Rossi, O. (2023, May). Using AI for test item generation: Opportunities and challenges. Webinar delivered as part of the EALTA webinar series. Download workshop slides. Watch webinar

Aryadoust, V., Zakaria, A., & Jia, Y. (2024). Investigating the affordances of OpenAI’s large language model in developing listening assessments. Computers and Education: Artificial Intelligence, 6. https://doi.org/10.1016/j.caeai.2024.100204 

Attali, Y., LaFlair, G., & Runge, A. (2023, March 31). A new paradigm for test development [Duolingo webinar series]. Watch webinar

Attali, Y., Runge, A., LaFlair, G.T., Yancey, K., Goodwin, S., Park, Y., & von Davier, A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5, 903077. https://doi.org/10.3389/frai.2022.903077

Belzak, W.C.M., Naismith, B., & Burstein, J. (2023). Ensuring fairness of human- and AI-generated test items. In N. Wang, G. Rebolledo-Mendez, V. Dimitrova, N. Matsuda, & O.C. Santos (Eds.), Artificial intelligence in education. Communications in Computer and Information Science, 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_108

Bezirhan, U., & von Davier, M. (2023). Automated reading passage generation with OpenAI's large language model. Preprint. https://doi.org/10.48550/arXiv.2304.04616

Bolender, B., Foster, C., & Vispoel, S. (2023). The criticality of implementing principled design when using AI technologies in test development. Language Assessment Quarterly, 20(4-5), 512-519. https://doi.org/10.1080/15434303.2023.2288266

Bulut, O., & Yildirim-Erbasli, S.N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9, 72-87. https://doi.org/10.21449/ijate.1124382

Choi, I., & Zu, J. (2022). The impact of using synthetically generated listening stimuli on test-taker performance: A case study with multiple-choice, single-selection items. ETS Research Report Series, 2022(1), 1–14. https://doi.org/10.1002/ets2.12347

Chung, H-L., Chan, Y-H., & Fan, Y-C. (2020). A BERT-based distractor generation scheme with multi-tasking and negative answer training strategies. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4390-4400). https://aclanthology.org/2020.findings-emnlp.393/

Dijkstra, R., Genç, Z., Kayal, S., & Kamps, J. (2022). Reading comprehension quiz generation using generative pre-trained transformers. Preprint. https://intextbooks.science.uu.nl/workshop2022/files/itb22_p1_full5439.pdf

Fei, Z., Zhang, Q., Gui, T., Liang, D., Wang, S., Wu, W., & Huang, X. (2022). CQG: A simple and effective controlled generation framework for multi-hop question generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://aclanthology.org/2022.acl-long.475

Felice, M., Taslimipoor, S., & Buttery, P. (2022). Constructing open cloze tests using generation and discrimination capabilities of transformers. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 1263–1273, Dublin, Ireland. Association for Computational Linguistics. https://arxiv.org/pdf/2204.07237.pdf

Ghanem, B., Coleman, L.L., Dexter, J. R., von der Ohe, S. M., & Fyshe, A. (2022). Question generation for reading comprehension assessment by modelling how and what to ask. https://doi.org/10.48550/arXiv.2204.02908

Kalpakchi, D., & Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. Paper presented at the 14th International Conference on Natural Language Generation (INLG 2021). https://arxiv.org/pdf/2108.03973.pdf

Kalpakchi, D., & Boye, J. (2023a). Quasi: A synthetic question-answering dataset in Swedish using GPT-3 and zero-shot learning. In T. Alumäe & M. Fishel (Eds.), Proceedings of the 24th Nordic Conference on Computational Linguistics (pp. 477–491). https://aclanthology.org/2023.nodalida-1.48/

Kalpakchi, D., & Boye, J. (2023b). Generation and evaluation of multiple-choice reading comprehension questions for Swedish. https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-329400

Khademi, A. (2023). Can ChatGPT and Bard generate aligned assessment items? A reliability analysis against human performance. Journal of Applied Learning & Teaching, 6(1), 75-80. https://doi.org/10.37074/jalt.2023.6.1.28

Liusie, A., Raina, V., & Gales, M. (2023). “World knowledge” in multiple choice reading comprehension. In Proceedings of the Sixth Fact Extraction and VERification Workshop (FEVER). Association for Computational Linguistics. https://aclanthology.org/2023.fever-1.5

O’Grady, S. (2023). An AI generated test of pragmatic competence and connected speech. Language Teaching Research Quarterly, 37, 188-203. https://doi.org/10.32038/ltrq.2023.37.10

Raina, V., & Gales, M. (2022). Multiple-choice question generation: Towards an automated assessment framework. https://doi.org/10.48550/arXiv.2209.11830

Raina, V., Liusie, A., & Gales, M. (2023a). Analyzing multiple-choice reading and listening comprehension tests. https://doi.org/10.48550/arXiv.2307.01076

Raina, V., Liusie, A., & Gales, M. (2023b). Assessing distractors in multiple-choice tests. https://doi.org/10.48550/arXiv.2311.04554

Rathod, A., Tu, T., & Stasaski, K. (2022). Educational multi-question generation for reading comprehension. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 216-223). https://aclanthology.org/2022.bea-1.26

Rodriguez-Torrealba, R., Garcia-Lopez, E., & Garcia-Cabot, A. (2022). End-to-end generation of multiple-choice questions using text-to-text transfer transformer models. Expert Systems with Applications, 208, 118258. https://doi.org/10.1016/j.eswa.2022.118258

Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43(1), 5-18. https://doi.org/10.1111/emip.12590

Shin, I., & Gierl, M. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22(3-4), 289-311. https://doi.org/10.1080/15305058.2022.2070755

Uto, M., Tomikawa, Y., & Suzuki, A. (2023). Difficulty-controllable neural question generation for reading comprehension using item response theory. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 119-129). https://aclanthology.org/2023.bea-1.10

von Davier, A. (2023, February 27). Generative AI for test development [Talk given for the Department of Education, University of Oxford]. Watch presentation

Wang, X., Liu, B., & Wu, L. (2023). SkillQG: Learning to generate question for reading comprehension assessment. https://doi.org/10.48550/arXiv.2305.04737

Yunjiu, L., Wei, W., & Zheng, Y. (2022). Artificial intelligence-generated and human expert-designed vocabulary tests: A comparative study. SAGE Open, 12(1). https://doi.org/10.1177/21582440221082130