Generative AI and item writing

A collection of resources on generative AI (such as ChatGPT) and its applications to producing language test items

Rossi, O. (2023). Using technology to write language test items. Talk delivered at IATEFL TEASIG online conference Developing Assessment Tasks for the Classroom. September 2023. Download slides

Rossi, O. (2023). Using AI for test item generation: Opportunities and challenges. Webinar delivered as part of the EALTA Webinar series. May 2023. Download workshop slides Watch webinar 

Aryadoust, V., Zakaria, A., & Jia, Y. (2024). Investigating the affordances of OpenAI’s large language model in developing listening assessments. Computers and Education: Artificial Intelligence, 6. 

Attali, Y., LaFlair, G., & Runge, A. (2023, March 31). A new paradigm for test development [Duolingo webinar series]. Watch webinar

Attali, Y., Runge, A., LaFlair, G.T., Yancey, K., Goodwin, S., Park, Y., & von Davier, A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5, 903077.

Belzak, W.C.M., Naismith, B., & Burstein, J. (2023). Ensuring fairness of human- and AI-generated test items. In N. Wang, G. Rebolledo-Mendez, V. Dimitrova, N. Matsuda, & O.C. Santos (Eds.), Artificial Intelligence in Education. Communications in Computer and Information Science, 1831. Springer, Cham.

Bezirhan, U., & von Davier, M. (2023). Automated reading passage generation with OpenAI's large language model. Preprint.

Bolender, B., Foster, C. & Vispoel, S. (2023). The criticality of implementing principled design when using AI technologies in test development. Language Assessment Quarterly, 20(4-5), 512-519. 

Bulut, O., & Yildirim-Erbasli, S.N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9, 72–87.

Choi, I., & Zu, J. (2022). The impact of using synthetically generated listening stimuli on test-taker performance: A case study with multiple-choice, single-selection items. ETS Research Report Series, 2022(1), 1–14.

Chung, H-L., Chan, Y-H., & Fan, Y-C. (2020). A BERT-based distractor generation scheme with multi-tasking and negative answer training strategies. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4390–4400.

Dijkstra, R., Genç, Z., Kayal, S., & Kamps, J. (2022). Reading comprehension quiz generation using generative pre-trained transformers. Preprint.

Fei, Z., Zhang, Q., Gui, T., Liang, D., Wang, S., Wu, W., & Huang, X. (2022). CQG: A simple and effective controlled generation framework for multi-hop question generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.

Felice, M., Taslimipoor, S., & Buttery, P. (2022). Constructing open cloze tests using generation and discrimination capabilities of transformers. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 1263–1273, Dublin, Ireland. Association for Computational Linguistics.

Ghanem, B., Coleman, L.L., Dexter, J. R., von der Ohe, S. M., & Fyshe, A. (2022). Question generation for reading comprehension assessment by modelling how and what to ask.

Kalpakchi, D., & Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. Paper presented at the 14th International Conference on Natural Language Generation (INLG 2021).

Kalpakchi, D., & Boye, J. (2023a). Quasi: A synthetic question-answering dataset in Swedish using GPT-3 and zero-shot learning. In T. Alumäe and M. Fishel (Eds.), Proceedings of the 24th Nordic Conference on Computational Linguistics.

Kalpakchi, D., & Boye, J. (2023b). Generation and evaluation of multiple-choice reading comprehension questions for Swedish.

Khademi, A. (2023). Can ChatGPT and Bard generate aligned assessment items? A reliability analysis against human performance. Journal of Applied Learning & Teaching, 6(1), 75–80.

Liusie, A., Raina, V., & Gales, M. (2023). “World knowledge” in multiple choice reading comprehension. In Proceedings of the Sixth Fact Extraction and VERification Workshop (FEVER). Association for Computational Linguistics.

O’Grady, S. (2023). An AI generated test of pragmatic competence and connected speech. Language Teaching Research Quarterly, 37, 188-203.

Raina, V., & Gales, M. (2022). Multiple-choice question generation: Towards an automated assessment framework.

Raina, V., Liusie, A., & Gales, M. (2023a). Analyzing multiple-choice reading and listening comprehension tests.

Raina, V., Liusie, A., & Gales, M. (2023b). Assessing distractors in multiple-choice tests.

Rathod, A., Tu, T., & Stasaski, K. (2022). Educational multi-question generation for reading comprehension. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications.

Rodriguez-Torrealba, R., Gracia-Lopez, E., & Garcia-Cabot, A. (2022). End-to-end generation of multiple-choice questions using text-to-text transfer transformer models. Expert Systems With Applications, 208, 118258.

Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43(1), 5–18.

Shin, I., & Gierl, M. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22(3-4), 289-311.

von Davier, A. (2023, February 27). Generative AI for test development [Talk given at the Department of Education, University of Oxford]. Watch presentation

Uto, M., Tomikawa, Y., & Suzuki, A. (2023). Difficulty-controllable neural question generation for reading comprehension using item response theory. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (pp.119-129).

Wang, X., Liu, B., & Wu, L. (2023). SkillQG: Learning to generate question for reading comprehension assessment.

Yunjiu, L., Wei, W., & Zheng, Y. (2022). Artificial intelligence-generated and human expert-designed vocabulary tests: A comparative study. SAGE Open, 12(1).