Generative AI and item writing

A collection of resources on generative AI (such as ChatGPT) and its applications to producing language test items


Resources by Olena Rossi

Rossi, O. (2024). Using ChatGPT to generate tasks for EAP reading and listening assessments. Workshop delivered as part of BALEAP Assessment Roadshow. July 2024, online. Download slides Watch workshop Part 1 Watch workshop Part 2

Rossi, O. (2024). Item writing with generative AI: Current issues and future directions. Presentation delivered at the inaugural meeting of the EALTA SIG Artificial Intelligence for Language Assessment. June 2024, Belfast. Download slides

Rossi, O. (2024). Assessment of language through AI: Opportunities, challenges, and future directions. Plenary talk delivered at 2024 International Conference Language Education 4.0: A Paradigm Shift towards Action-Oriented Approach, Artificial Intelligence Integration and Beyond. June 2024, Ankara. Download slides

Rossi, O. (2023). Using technology to write language test items. Talk delivered at IATEFL TEASIG online conference Developing Assessment Tasks for the Classroom. September 2023. Download slides

Rossi, O. (2023). Using AI for test item generation: Opportunities and challenges. Webinar delivered as part of the EALTA Webinar series. May 2023. Download workshop slides Watch webinar 


Resources by other authors

Aryadoust, V., Zakaria, A., & Jia, Y. (2024). Investigating the affordances of OpenAI’s large language model in developing listening assessments. Computers and Education: Artificial Intelligence, 6. https://doi.org/10.1016/j.caeai.2024.100204 

Attali, Y., LaFlair, G., & Runge, A. (2023, March 31). A new paradigm for test development [Duolingo webinar series]. Watch webinar

Attali, Y., Runge, A., LaFlair, G.T., Yancey, K., Goodwin, S., Park, Y., & von Davier, A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5, 903077. https://doi.org/10.3389/frai.2022.903077

Belzak, W.C.M., Naismith, B., & Burstein, J. (2023). Ensuring fairness of human- and AI-generated test items. In N. Wang, G. Rebolledo-Mendez, V. Dimitrova, N. Matsuda, & O.C. Santos (Eds.), Artificial Intelligence in education. Communications in Computer and Information Science, 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_108

Bezirhan, U., & von Davier, M. (2023). Automated reading passage generation with OpenAI’s large language model. Preprint. https://doi.org/10.48550/arXiv.2304.04616

Bolender, B., Foster, C., & Vispoel, S. (2023). The criticality of implementing principled design when using AI technologies in test development. Language Assessment Quarterly, 20(4-5), 512-519. https://doi.org/10.1080/15434303.2023.2288266

Bulut, O., & Yildirim-Erbasli, S.N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9, pp.72-87. https://doi.org/10.21449/ijate.1124382

Choi, I., & Zu, J. (2022). The impact of using synthetically generated listening stimuli on test-taker performance: A case study with multiple-choice, single-selection items. ETS Research Report Series, 2022(1), 1–14. https://doi.org/10.1002/ets2.12347

Chung, H-L., Chan, Y-H., & Fan, Y-C. (2020). A BERT-based distractor generation scheme with multi-tasking and negative answer training strategies. Findings of the Association for Computational Linguistics: EMNLP 2020, pp.4390-4400. https://aclanthology.org/2020.findings-emnlp.393/

Chun, J. Y. & Barley, N. (2024). A comparative analysis of multiple-choice questions: ChatGPT-generated items vs. human-developed items. In C. A. Chapelle, G. H. Beckett, and J. Ranalli (Eds.), Exploring AI in applied linguistics (pp.118-136). Iowa State University Digital Press. https://bit.ly/TSLL23openbook

Dijkstra, R., Genç, Z., Kayal, S., & Kamps, J. (2022). Reading comprehension quiz generation using generative pre-trained transformers. Pre-print. https://intextbooks.science.uu.nl/workshop2022/files/itb22_p1_full5439.pdf

Fei, Z., Zhang, Q., Gui, T., Liang, D., Wang, S., Wu, W., & Huang, X. (2022). CQG: A simple and effective controlled generation framework for multi-hop question generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://aclanthology.org/2022.acl-long.475

Felice, M., Taslimipoor, S., & Buttery, P. (2022). Constructing open cloze tests using generation and discrimination capabilities of transformers. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 1263–1273, Dublin, Ireland. Association for Computational Linguistics. https://arxiv.org/pdf/2204.07237.pdf

Ghanem, B., Coleman, L.L., Dexter, J. R., von der Ohe, S. M., & Fyshe, A. (2022). Question generation for reading comprehension assessment by modelling how and what to ask. https://doi.org/10.48550/arXiv.2204.02908

Kalpakchi, D., & Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. Paper presented at the 14th International Conference on Natural Language Generation INLG2021. https://arxiv.org/pdf/2108.03973.pdf

Kalpakchi, D., & Boye, J. (2023a). Quasi: A synthetic question-answering dataset in Swedish using GPT-3 and zero-shot learning. In T. Alumäe and M. Fishel (Eds.), Proceedings of the 24th Nordic Conference on Computational Linguistics (pp.477–491). https://aclanthology.org/2023.nodalida-1.48/

Kalpakchi, D., & Boye, J. (2023b). Generation and evaluation of multiple-choice reading comprehension questions for Swedish. https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-329400

Khademi, A. (2023). Can ChatGPT and Bard generate aligned assessment items? A reliability analysis against human performance. Journal of Applied Learning & Teaching, 6(1), pp.75-80. https://doi.org/10.37074/jalt.2023.6.1.28

Liusie, A., Raina, V., & Gales, M. (2023). “World knowledge” in multiple choice reading comprehension. In Proceedings of the Sixth Fact Extraction and VERification Workshop (FEVER). Association for Computational Linguistics. https://aclanthology.org/2023.fever-1.5

O’Grady, S. (2023). An AI generated test of pragmatic competence and connected speech. Language Teaching Research Quarterly, 37, 188-203. https://doi.org/10.32038/ltrq.2023.37.10

Raina, V., & Gales, M. (2022). Multiple-choice question generation: Towards an automated assessment framework. https://doi.org/10.48550/arXiv.2209.11830

Raina, V., Liusie, A., & Gales, M. (2023a). Analyzing multiple-choice reading and listening comprehension tests. https://doi.org/10.48550/arXiv.2307.01076

Raina, V., Liusie, A., & Gales, M. (2023b). Assessing distractors in multiple-choice tests. https://doi.org/10.48550/arXiv.2311.04554

Rathod, A., Tu, T., & Stasaski, K. (2022). Educational multi-question generation for reading comprehension. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (pp.216-223). https://aclanthology.org/2022.bea-1.26

Rodriguez-Torrealba, R., Gracia-Lopez, E., & Garcia-Cabot, A. (2022). End-to-end generation of multiple-choice questions using text-to-text transfer transformer models. Expert Systems With Applications, 208, 118258. https://doi.org/10.1016/j.eswa.2022.118258

Runge, A., Attali, Y., LaFlair, G. T., Park, Y., & Church, J. (2024). A generative AI-driven interactive listening assessment task. Frontiers in Artificial Intelligence, 7, 1474019. https://doi.org/10.3389/frai.2024.1474019

Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43(1), 5-18. https://doi.org/10.1111/emip.12590

Shin, I., & Gierl, M. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22(3-4), 289-311. https://doi.org/10.1080/15305058.2022.2070755

Shin, D., & Lee, J. H. (2024). AI-powered automated item generation for language testing. ELT Journal, ccae016. https://doi.org/10.1093/elt/ccae016

von Davier, A. (2023, February 27). Generative AI for test development [a talk given for the Department of Education, University of Oxford]. Watch presentation

Uto, M., Tomikawa, Y., & Suzuki, A. (2023). Difficulty-controllable neural question generation for reading comprehension using item response theory. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (pp.119-129). https://aclanthology.org/2023.bea-1.10

Wang, X., Liu, B., & Wu, L. (2023). SkillQG: Learning to generate question for reading comprehension assessment. https://doi.org/10.48550/arXiv.2305.04737

Yunjiu, L., Wei, W., & Zheng, Y. (2022). Artificial intelligence-generated and human expert-designed vocabulary tests: A comparative study. SAGE Open, 12(1). https://doi.org/10.1177/21582440221082130