Generative AI for Course Design: Writing Effective Prompts for Multiple Choice Question Development
How this will help
In higher education, developing strong multiple-choice questions can be a time-intensive part of the course design process. Developing such items requires subject-matter expertise and assessment literacy, and for faculty and designers who are creating and producing online courses, it can be difficult to find the capacity to craft quality multiple-choice questions.
At the University of Michigan Center for Academic Innovation, learning experience designers are using generative artificial intelligence to streamline the multiple-choice question development process and help ameliorate this issue. In this article, I summarize one of our projects that explored effective prompting strategies to develop multiple-choice questions with ChatGPT for our open course portfolio. We examined how structured prompting can improve the quality of AI-generated assessments, producing relevant comprehension and recall items and options that include plausible distractors.
Achieving this goal enables us to develop several ungraded practice opportunities, preparing learners for their graded assessments while also freeing up more time for course instructors and designers.
Prompt Design Matters
While it is possible to simply prompt a GenAI model to “write quiz questions,” our inquiry showed that the quality of GenAI-generated multiple-choice questions is tied to prompt structure. Our center team tested five prompt versions, each with lecture transcripts from the open online course “Applied Information Extraction in Python.” After comparing the resulting items, we found that prompts that combined learning objectives and explicit item-writing guidelines produced the most relevant questions and the most plausible distractors.
The Anatomy of an Effective Prompt
Our most successful prompt included four key layers of instruction:
- Module learning objectives: High-quality multiple-choice questions facilitate formative and summative assessment when they are aligned with learning objectives. This component helps anchor each question in the intended outcomes.
- Multiple choice question writing guidelines: Specifying construction rules for stems and options, such as avoiding trick questions and including feedback for each option, yields items that are relevant, well-written, and unambiguous (Haladyna & Rodriguez, 2013).
- Context: We provided textual transcripts of course videos and asked the model to strictly use the provided content. We also asked for feedback on the options so that learners can learn from their mistakes.
- Formatting instructions: The learning management system, in this case Coursera, accepts formatted multiple-choice question files and uses them to add the questions to course quizzes automatically. We included the LMS formatting in our prompt so the output could be uploaded directly.
Anatomy of a Multiple Choice Question
A multiple-choice question has three main parts:
- The stem – The question or incomplete statement that sets up the problem.
- The correct option – In single-response questions, this is the one correct answer among the options.
- The distractors – Incorrect options that are designed to be plausible.
Prompt Template
We have distilled our findings into a reusable template that instructors and instructional designers can adapt for their own use in generative AI models:
Writing multiple choice questions from lecture text
- Develop {number of items} recall and comprehension multiple choice questions from the following lecture script: {insert text}.
- Use these learning objectives: {insert objectives}.
- Follow these multiple-choice question development guidelines:
  - Single correct answer items: Each question should have one clear stem, one correct answer, and three plausible distractors. Avoid trick or opinion-based questions, and exclude phrases like “in the video” or “according to the instructor.”
  - Multiple correct answer items: Each question should have one clear stem, two correct answers, and two plausible distractors. Avoid trick or opinion-based questions, and exclude phrases like “in the video” or “according to the instructor.”
  - Provide item-specific feedback for each option using lesson material.
- Output in this format: {structured example}.
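For readers who assemble prompts programmatically, the template above can be filled in with a few lines of Python. This is a minimal sketch, not the tooling our team used: the function name `build_prompt`, its parameter names, and the exact template wording are assumptions made for illustration.

```python
from string import Template

# Hypothetical template text, paraphrased from the bullets above.
PROMPT_TEMPLATE = Template(
    "Develop $n_items recall and comprehension multiple choice questions "
    "from the following lecture script:\n$transcript\n\n"
    "Use these learning objectives:\n$objectives\n\n"
    "Follow these multiple-choice question development guidelines:\n"
    "$guidelines\n\n"
    "Output in this format:\n$output_format\n"
)


def as_bullets(items):
    """Render a list of strings as '- ' bullets, one per line."""
    return "\n".join(f"- {item.strip()}" for item in items)


def build_prompt(n_items, transcript, objectives, guidelines, output_format):
    """Fill the template; objectives and guidelines are lists of strings."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return PROMPT_TEMPLATE.substitute(
        n_items=n_items,
        transcript=transcript.strip(),
        objectives=as_bullets(objectives),
        guidelines=as_bullets(guidelines),
        output_format=output_format.strip(),
    )
```

Keeping the objectives and guidelines as lists makes it easy to reuse the same guidelines across modules while swapping in each module's objectives and transcript.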
Example using the Prompt Template
To illustrate this process, let’s apply the prompt template to the module “What Are Transformers?” from the course “Applied Information Extraction in Python.” The prompt begins by stating the task.
Task
Develop 12 multiple-choice questions for each of the lectures in the “What Are Transformers?” lesson of the open online course “Applied Information Extraction in Python.” Below are the module learning objectives, the item types to develop, the item writing guidelines, the desired item formatting, and the transcripts of the videos in this module, separated by title.
Learning objectives
- Explain what language models and large language models are.
- Describe transformer-based models and their applications.
- Articulate advances in deep neural network models for information extraction.
- Configure a deep neural network model to detect entities of interest.
Item types
Write two types of multiple-choice questions:
- Single correct answer, where only one option is correct.
- Multiple correct answers, where more than one option is correct.
Of the 12 questions, nine should be the single correct answer type.
Item writing guidelines
- Separate items for each lecture with the lecture title.
- Each question addresses one type of content.
- Questions are independent of each other.
- Avoid trick questions.
- Avoid opinion-based questions.
- Avoid “all of the above” and “none of the above.”
- Avoid True/False items.
- Use only options that are plausible and discriminating. Three options are usually sufficient.
- For single-correct-answer multiple-choice questions, only one option is the correct answer.
- Options should be independent of each other.
- Options should be worded positively.
- Do not include clues to the right option.
- Avoid “always” and “never.”
- Avoid obviously incorrect options. Distractors should be plausible.
- Write feedback for each option. Feedback should not reveal the correct answer.
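Some of these guidelines can be spot-checked automatically before human review. The sketch below is a simple phrase check, assuming each generated item has already been split into a stem and a list of option strings. The function name `lint_item` and the pattern list are hypothetical, and the “source reference” pattern comes from the prompt template rather than from this list. It cannot judge plausibility or relevance, so it supplements expert review rather than replacing it.

```python
import re

# Hypothetical patterns for guidelines that can be checked mechanically.
BANNED_PATTERNS = {
    "all/none of the above": re.compile(r"\b(all|none) of the above\b", re.I),
    "absolute term": re.compile(r"\b(always|never)\b", re.I),
    "source reference": re.compile(
        r"\b(in the video|according to the instructor)\b", re.I
    ),
    "true/false option": re.compile(r"^\s*(true|false)\.?\s*$", re.I),
}


def lint_item(stem, options):
    """Return (label, offending text) pairs found in the stem or options."""
    problems = []
    for text in [stem, *options]:
        for label, pattern in BANNED_PATTERNS.items():
            if pattern.search(text):
                problems.append((label, text))
    return problems
```

An empty list means none of the listed phrases were found. It does not mean the item is well written.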
Item formatting
Use the following format to write the single-correct-answer multiple-choice questions:
Question number – multiple choice shuffle
Question stem goes here
A: Incorrect option 1 goes here
Feedback: Add feedback about why this option is incorrect
*B: Correct answer goes here (add “*” to correct option)
Feedback: Add feedback about why this option is correct
C: Incorrect option 2 goes here
Feedback: Add feedback about why this option is incorrect
D: Incorrect option 3 goes here
Feedback: Add feedback about why this option is incorrect
Use the following format for multiple-correct-answer multiple-choice questions:
Question number – checkbox, shuffle, partial credit
Question stem goes here
A: Incorrect option 1 goes here
Feedback: Add feedback about why this option is incorrect
*B: Correct answer goes here (add “*” to correct option)
Feedback: Add feedback about why this option is correct
C: Incorrect option 2 goes here
Feedback: Add feedback about why this option is incorrect
*D: Correct option 2 goes here (add “*” to correct option)
Feedback: Add feedback about why this option is correct
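Because the format marks correct options with an asterisk and follows each option with a feedback line, a generated item can be checked for structural problems before upload. The following is a minimal sketch that assumes the model follows the layout above exactly (a header line, a stem line, then option and feedback lines). The name `check_item` is hypothetical, and the check is not part of Coursera's tooling.

```python
import re

# Matches option lines such as "A: text" or "*B: text"; "*" marks a correct option.
OPTION_RE = re.compile(r"^(\*?)([A-Z]): (.+)$")


def check_item(block, multiple_correct=False):
    """Return a list of structural problems for one item (empty if none)."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return ["missing header or stem"]
    options = []
    for ln in lines[2:]:  # line 0 is the header, line 1 is the stem
        match = OPTION_RE.match(ln)
        if match:
            options.append(
                {"correct": match.group(1) == "*",
                 "letter": match.group(2),
                 "feedback": False}
            )
        elif ln.startswith("Feedback:") and options:
            options[-1]["feedback"] = True  # attach to the preceding option
    problems = []
    n_correct = sum(opt["correct"] for opt in options)
    if multiple_correct and n_correct < 2:
        problems.append(f"expected at least 2 correct options, found {n_correct}")
    if not multiple_correct and n_correct != 1:
        problems.append(f"expected exactly 1 correct option, found {n_correct}")
    problems += [
        f"option {opt['letter']} has no feedback"
        for opt in options
        if not opt["feedback"]
    ]
    return problems
```

A check like this only catches formatting slips, such as a missing asterisk or feedback line. Whether the marked option is actually correct still has to be confirmed by a subject-matter expert.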
Upload transcripts
Lecture Video 1 Transcript Title (upload transcript here)
Lecture Video 2 Transcript Title (upload transcript here)
Lecture Video 3 Transcript Title (upload transcript here)
Lecture Video 4 Transcript Title (upload transcript here)
Results
With this prompt, the model generates questions similar to the following, which are organized for easy uploading to the learning platform:
Q1 – single choice
What is the main function of the encoder in a transformer model?
A. Generate the next token in a sequence.
Feedback: Incorrect – this describes the decoder stage.
*B. Create a representation of the input text.
Feedback: Correct – the encoder builds contextual embeddings used for downstream tasks like classification or NER.
C. Store labeled datasets for training.
Feedback: Incorrect – transformers train on unlabeled corpora.
D. Identify which tokens should receive more attention.
Feedback: Incorrect – the attention layer handles weighting across tokens.
Lessons Learned
Testing the use of GenAI to craft multiple-choice questions provided encouraging results and showed that, with thoughtful planning and rigorous review, instructors and course designers can prepare quality assignments and assessments for their learners while also freeing up valuable course planning and design time.
Anyone looking to use AI tools for building multiple-choice questions should be sure to:
- Include learning objectives. They significantly improve question relevance.
- Embed item-writing guidelines. This increases distractor plausibility and grammatical consistency.
- Avoid overly general instructions. Prompts that lack context produce meta-questions about “the course” rather than “the concept.”
- Iterate and use human review. Even with strong prompts, expert validation remains essential.
Responsible Scaling
As learning experience designers, we collaborate with faculty members and instructors to design and develop courses. We always ask our faculty partners to review the resulting questions before uploading them to the platform. Our faculty partners remove weak questions, adjust inaccurate options, and sometimes refine the feedback.
In our research project, our data scientist built a technical infrastructure that extracted video transcripts automatically and then linked them to the prompts for additional context. This workflow pairs AI-generated drafts with human oversight, allowing us to process hundreds of video transcripts efficiently. For more information on U-M resources for transcribing your videos, review this article or reach out to the Information and Technology Services team.
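To give a sense of what linking transcripts to prompts can look like, here is a small sketch that builds one prompt per lesson from transcript files on disk. It is not our data scientist's infrastructure. The folder layout (one subfolder per lesson, one `.txt` file per video), the function names, and the “Transcripts” heading are all assumptions made for illustration.

```python
from pathlib import Path


def transcript_section(folder):
    """Join every .txt transcript in `folder` into one block, titled by file name."""
    parts = []
    for path in sorted(Path(folder).glob("*.txt")):
        title = path.stem.replace("_", " ")
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty transcripts rather than prompting on nothing
            parts.append(f"{title}\n{text}")
    return "\n\n".join(parts)


def prompts_for_course(course_dir, instructions):
    """Return {lesson folder name: full prompt} for each lesson subfolder."""
    prompts = {}
    lessons = sorted(p for p in Path(course_dir).iterdir() if p.is_dir())
    for lesson in lessons:
        section = transcript_section(lesson)
        if section:  # lessons without transcripts get no prompt
            prompts[lesson.name] = (
                f"{instructions.strip()}\n\nTranscripts\n{section}"
            )
    return prompts
```

Each resulting prompt would still be sent to the model and its output reviewed by faculty, as described above.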
When utilizing GenAI to help create course activities or assessments, it is important to select the right source materials for the tool. Your course materials will provide key context to the tools as they work toward an output.
Looking Ahead
Our team at the center continues to expand its work on developing different assessment types with GenAI tools. Some of our current inquiries include difficulty calibration and question-bank automation across different learning management systems.
So far, we have learned that by embedding rigorous prompt design into course development workflows, we can focus less on manual item drafting and more on higher-level learning design.
Using GenAI tools as a time-saving resource while reinforcing accuracy through expert review and iteration by faculty and course designers has revealed exciting potential for future online course development.
The Center for Academic Innovation team included author Hedieh Najafi (learning experience designer senior), Sean Vucinich (solution developer lead), Weiyi Zhang (learning experience designer senior), Lyndsay Wing (learning experience designer senior), and former learning experience designer Melissa McCurry.
Practical Tips
- Include learning objectives. They significantly improve question relevance. If you need help writing learning objectives, review this article.
- If you have videos, include the transcript of those videos. For more information on U-M resources for transcribing your videos, review this article or reach out to the Information and Technology Services team.
References
Vucinich, S., Najafi, H., McCurry, M., Zhang, W., Dizon, L., & Wing, L. (2025). Effective Prompting to Generate Multiple Choice Questions with GPT-4o. University of Michigan Center for Academic Innovation.
Haladyna, T.M., & Rodriguez, M.C. (2013). Developing and Validating Test Items. Routledge.
Arif, T., Asthana, S., & Collins-Thompson, K. (2024). Generation and Assessment of Multiple-Choice Questions from Video Transcripts using Large Language Models. ACM Learning @ Scale.