
Effects of a long term faculty development program on improvement in quality of MCQs: an impact evaluation study

Abstract

Background

Faculty development programs associated with positive outcomes are those designed in accordance with organizational needs, based on a theoretical framework, targeting inter-professional learning, and using multiple teaching strategies to teach a single skill. Rather than being short, one-time events, these programs are long term, with contextual engagement of participants. To meet this challenge, a six-month certificate course in health professions education, based on constructivist theory, was designed. The objective of this study was to evaluate the impact of this focused training, at Kirkpatrick level II, on participants' learning of item writing skills.

Methods

This quasi-experimental study was conducted from January 2019 to June 2020 at the National University of Medical Sciences, Pakistan. A total of 133 faculty members were enrolled in the program. Of these, data from 124 participants (75 male and 49 female) who had passed our Certificate in Health Professions Education (CHPE) program were included in the study. The longitudinal engagement comprised three steps. In step 1 (pre-intervention), participants submitted 5 MCQs each, resulting in a total of 620 MCQs. For step 2 (intervention), a six-hour workshop on writing single-best-answer MCQs, with peer and faculty feedback, was conducted for all participants during the five-day face-to-face session. Subsequently, four courses covering learning theories, curriculum planning, teaching and learning, and assessment were delivered over the next six months, with a special focus on building participants' capacity in blueprinting so as to highlight the linkages between the various courses and their significance through a single snapshot document. An assignment on writing single-best-answer MCQs was given as part of the course on assessment, and faculty gave individual feedback to participants. Finally, in step 3 (post-intervention), the skills were tested in the end-of-program exam. A validated checklist was used to score the items on quality parameters. A total of 1800 MCQ items (600 at each step) were analyzed at these three points of the intervention, and the scores obtained were compared to assess whether item writing skills had improved.

Results

The average scores across the three steps showed an increasing trend. The Friedman ANOVA test indicated a statistically significant difference across the three time points, with a test statistic of χ²(2, n = 600) = 955.86, p < 0.05. The Wilcoxon signed-rank test showed significant differences between each pair of steps, supporting the finding of an improving trend in scores across all steps. Step 3 had the highest mean score (20.24 ± 0.05) and the highest median score (Md = 21). With Bonferroni correction, at the adjusted threshold of 0.0167, all three comparisons remained statistically significant.

Conclusion

Contextual engagement of participants through a longitudinal faculty development program, incorporating varied teaching techniques such as individual and group work, practice with prompt feedback, and peer review, improves participants' MCQ writing skills, such that they construct MCQs that assess higher cognitive skills with fewer item writing flaws.


Introduction

Background

Multiple Choice Questions (MCQs) are the most commonly used tool for the assessment of cognitive abilities. The ease of administration and computer marking, the ability to cover content widely, the response process, and psychometric evidence of robustness are a few of the advantages that have led to their near-universal acceptability and adoption [1, 2]. However, these advantages depend on item quality and adequate coverage of the content to be examined [1, 3,4,5]. A well-constructed MCQ consists of a stem (a clinical case scenario) and a lead-in (question), followed by three or four options, of which one is the correct/best answer and the remaining are distractors [6]. There is substantial evidence that items based on clinical vignettes or problem-based questions are better equipped to assess higher-level cognitive skills [3, 6, 7]. Unfortunately, the majority of examiners find it challenging to create higher-level MCQs [8]. One reason could be a lack of training, leading many institutions to continue using questions that test recall of knowledge [5]. The downside of this practice is the assessment of lower-order thinking skills and the promotion of memorization only [8].

One of the advantages of using MCQs for testing cognition is wide and proper coverage of content, which highlights the benefits of quality assurance procedures such as blueprinting. The literature is replete with evidence that blueprinting plays an important role in ensuring that assessments measure students' learning outcomes [3, 4, 5, 9]. It aligns course content with appropriate assessment modalities on the one hand and helps teachers select appropriate teaching strategies on the other [10, 11]. Despite its value, test blueprinting in medical education is still beset with issues, such as developers' lack of awareness of its importance and effectiveness. In addition, the methods used for blueprinting are not uniform, and selection of an assessment format aligned to the learning outcome is often missing [9].

Harden has identified twelve roles for teachers, and being a good assessor is a major role teachers have to perform. Developing good quality assessment is thus considered a key teaching competency, and providing adequate training to faculty to construct quality MCQs is a major responsibility of institutions. The call to accountability means that learning has to be supported by a quality-assured assessment process; considering the multiple roles faculty have to perform, and the fact that the success of institutions depends on them, institutions invest heavily in capacity-building initiatives [12,13,14,15]. The global drive towards Competency Based Medical Education (CBME) has led to the realization that a similar competency-based framework should be followed in designing and evaluating faculty development programs [16]. Unfortunately, global literature shows that the quality of MCQs produced by various medical institutions is still poor [1, 2, 8]. Item quality suffers from technical item flaws such as grammatical or spelling mistakes, faulty lead-in statements, and the use of extraneous details and jargon in the stem; studies have shown that post-exam indices of quality, such as item analysis, improve when items are technically correct [17, 18].
In the options, the use of absolute or vague terms, implausible distractors, overly long statements, "all of the above" or "none of the above", cues leading to the correct answer, word repeats, and non-homogeneous options are some of the major technical item writing flaws that can give an advantage to test-wise students and create unnecessary confusion for good students [15, 16]. These findings show that the majority of examiners find it challenging to create higher-level MCQs despite faculty development initiatives. One reason could be that the majority of MCQ studies report results from single, isolated workshops. It is now well established that professional development initiatives that are longer in duration, are structured, and employ group activities are more successful than fragmented one-time events [14, 15]. The findings about poor quality items may also be attributed to attrition of knowledge due to the absence of repetition and reinforcement, which is not possible in one-time MCQ workshops. Moreover, problems are encountered with long-term retention of knowledge: like all cognitive skills, item writing skills can decay without regular practice and be forgotten over time [1]. Strengthening the peer review process, allocating time for item writing, and pairing new writers with experienced ones to enhance writer engagement are some of the steps that can play a beneficial role in faculty's acquisition and retention of MCQ writing skills [15]. Additional features that are beneficial and may be added to faculty development programs are provision of adequate, prompt feedback after review of items and review by a peer/panel or committee of experts to remove errors [19]. Following global trends, Pakistan also employs MCQs for the assessment of cognitive skills, and many institutions have launched faculty training programs [20]. Unfortunately, the majority of these initiatives are random, one-time workshops whose long-term impact is neither measured nor taken into consideration during the planning stages. Most of the published literature is based on Kirkpatrick's first level, where immediate satisfaction is reported [19,20,21]. Only one study, published in 2023, reports short-term (Kirkpatrick level II) results for improved item writing skills [21].

The present study was planned when, during a review of MCQs in the university, the principal author, who is the assistant dean of assessment, found that the majority of MCQs were poorly constructed and showed item writing flaws, even though an earlier survey conducted with the 2019 cohort of the Certificate in Health Professions Education (CHPE) showed that all participants had attended at least one workshop on MCQ writing. A previous study published by the authors evaluated the impact of short-term training and feedback on item writing at Kirkpatrick levels I and II [22]. The Kirkpatrick model, comprising four levels, namely reaction, learning, behavior, and results, is extensively used for evaluating training programs [14]. In that study, the reaction of participants was measured by evaluating participants' satisfaction with a post-workshop feedback questionnaire. For level II, learning was measured by a self-made structured questionnaire given as a pre-test and post-test, which showed significant improvements in confidence in item writing skills (p = 0.001), recognizing the parts of an MCQ (p = 0.001), identifying item writing flaws (p = 0.001), and identifying the levels of Miller's pyramid and Bloom's taxonomy (p = 0.001) [22]. Based on the results of that study, a long-term initiative on MCQ writing, specifically incorporating peer review, prompt feedback by experts, continued hands-on practice, and a specific focus on developing participating faculty's skills in test construction through a hands-on blueprinting activity, was planned within a six-month Certificate in Health Professions Education (CHPE) program [23, 24]. A review of faculty development programs published in 2012 reported that most educational development interventions do not specifically target women, junior faculty, or senior faculty; they prioritize clinical faculty only. The review also mentioned that only a few studies (29%) described a conceptual or theoretical framework for their program [25]. Utilizing these observations, we chose the theoretical paradigm of constructivism for this program, in which knowledge and meaning are constructed from experiences. Secondly, the concept of test construction was used as a foundational concept. Often, written and practical examinations are conducted in a haphazard way without coordination between those writing the outcomes, those teaching, and the faculty who finally write the items or tasks for performance exams. Test construction refers to the process of creating high-quality assessments that accurately measure student learning outcomes; it entails selecting appropriate item types (e.g., multiple-choice, essay), writing and reviewing test items, ensuring content validity and relevance, and establishing reliability and consistency [16]. Designing a detailed blueprint is crucial to meeting this challenge, is a key aspect of test construction, and is a major source of content validity evidence [3, 4, 16]. The main objective of our study was to investigate the effect of focused training on item writing skills provided during this structured, six-month certificate course. We decided to evaluate our program at Kirkpatrick level II, where learning of participants is measured. Sustained feedback on the quality of MCQs was incorporated so that faculty could construct MCQs that assess higher cognitive skills with fewer item writing flaws.

Methodology

Settings

This study was conducted from January 2019 to June 2020 at NUMS and was approved by the Institutional Review Board (IRB) of the Office of Research, Innovation and Commercialization of the National University of Medical Sciences, Rawalpindi, Pakistan (Approval No. 06/R&D/NUMS, dated 19th February 2018). All procedures involving human participants were in accordance with the ethical standards of the institution and with the 1964 Helsinki Declaration and its later amendments. A total of 133 faculty members enrolled in the Certificate in Health Professions Education (CHPE) program, showing diversity in gender, age, academic discipline (from both basic and clinical sciences), and academic rank. Participants were grouped into six categories, with face-to-face sessions held in five cities. Data were analyzed for 124 participants (75 male and 49 female) who successfully completed the program. A universal sampling approach was used to include all eligible faculty members, and the final sample reflects those who completed the program. Attrition was minimal, with nine participants unable to finish due to personal scheduling conflicts or unforeseen obligations.

Recognizing the need for structured, long-term faculty development initiatives for effective learning, the National University of Medical Sciences (NUMS) initiated a comprehensive program. This program builds faculty capacity in three tiers, providing training on the major curricular aspects: assessment, teaching and learning, curriculum planning, and program evaluation. In the first tier, interactive workshops on the above-mentioned areas are conducted in a regular four-month cycle every year for NUMS faculty in constituent and affiliated colleges. In the second tier, a six-month certificate program, the CHPE, has been running since 2019, with over two thousand participants to date. In the third tier, a Master's program (MHPE) is envisioned to be launched in the coming years. The present article focuses on measuring the impact of the interventions in the second tier only.

Kirkpatrick’s model of educational outcomes, which comprises four levels of outcome, was used for our study. The first is the learners’ reaction to the educational experience (level I); the second is learning, which refers to changes in attitudes, knowledge, and skills (level II); the third is behavior, which refers to changes in practice and the application of learning to practice (level III); and the fourth is results, which refers to change at the level of the organization (level IV) [26,27,28]. Our study primarily evaluates the impact of a long-term faculty development program on MCQ writing skills using structured assessments at different stages. Kirkpatrick level II, which assesses learning typically through pre-tests and post-tests, is used in this study; learning is measured by comparing MCQs written at different time points within the training program.

In this hybrid program, a five-day physical face-to-face session is held, comprising interactive workshops on the major curriculum areas mentioned above. The four sequential e-courses (Adult Learning Theories and their Application, ALTA, 5 weeks; Curriculum Planning and Evaluation, CPE, 5 weeks; Evidence Based Teaching and Learning, EBTL, 6 weeks; and Assessment for Learning, AFL, 6 weeks) were designed on Moodle (an online learning management system). For the online component, the participants were divided into six distance learning groups, with one MHPE-qualified educationist assigned as the group tutor in charge of 14 to 25 students. To ensure standardization of scoring among all the Distance Learning Study Groups (DLSGs), a course manual was developed separately for each of the four courses. These manuals were shared with all tutors via email. A preliminary meeting to orient participants and faculty to the course was also conducted. The five-day face-to-face session comprised one day for introducing generic skills and a designated session for each course on the subsequent days. A major component of these sessions was the development of a blueprint as a hands-on training activity. A well-constructed blueprint overcomes two major threats to validity, namely Construct Underrepresentation (CU), which is the under-sampling of course content, and Construct Irrelevant Variance (CIV), which results when a construct other than the intended one is measured. Improper item formats, inappropriate test modalities, and too difficult or too easy questions are some of the causes of CIV [29]. Another important role of blueprinting is that, by ascertaining the cognitive levels, items of varying complexity requiring a wider range of cognitive abilities and skills are selected [29, 30]. This ensures that examiners write and select items that assess higher-order thinking skills and in-depth learning [29]. Writing learning outcomes was taught during CPE, designing various instructional strategies was taught during EBTL, and assessment strategies were taught during the AFL course. An inter-professional cohort, regardless of age and gender, representing disciplines from both basic and clinical sciences and a range of academic ranks, was a prominent feature of our first CHPE cohort. These participants were divided into six major groups, and their face-to-face sessions were planned in five different cities. The conceptual framework of this study is shown in Fig. 1.

Fig. 1 Conceptual Framework of the Study

Step 1-pre intervention

On the first day of the face-to-face session, the participants were requested to submit 5 MCQs each, including a cover sheet, candidate name, academic rank, specialization, and answer key, as per their routine practice. A total of 640 items were received at this step of our study, designated as Step 1. These MCQs were assessed and scored with the help of a validated checklist (Appendix A). This checklist was developed after a thorough literature search of the available guidelines on high-quality test item development [29, 31,32,33]. A total of twenty-one checklist items were finalized after expert review by a panel of five educationists, including authors 1, 2 and 5, who had extensive experience in item development and review. Some minor changes were made to the language of the checklist items as a result of this feedback. Of the submitted MCQs, 600 items were selected after discarding incomplete items.

Intervention

During the face-to-face session, participants were trained to make a Table of Specifications (TOS) and a blueprint for assessment. The importance of aligning their outcomes to assessment tools and assigning weightings to these tools was highlighted. A workshop on writing the single best answer MCQ was conducted for each group, and templates designed on the framework provided by the National Board of Medical Examiners (NBME) manual on constructing single best answer MCQs were provided to participants [15]. The participants of each group worked in teams to construct one MCQ for each of the two learning outcomes they had written for their course. We planned a moderation activity for all the groups: each MCQ was displayed on multimedia and assessed for item writing flaws according to a checklist provided to the participants. Prompt peer and expert feedback was provided, and corrections were made by the participants.
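To make the weighting idea concrete, the sketch below shows, under purely illustrative assumptions, how a table of specifications can translate topic weightings and a target cognitive-level split into item counts; the topic names, weights, split, and total of 60 items are hypothetical and are not the templates used in the program.

```python
# Hypothetical TOS sketch: allocate MCQ counts per topic and cognitive level
# from topic weightings (all numbers are illustrative, not the program's own).
topics = {                                # topic -> weight (share of teaching time/importance)
    "Cardiovascular physiology": 0.40,
    "Respiratory physiology": 0.35,
    "Renal physiology": 0.25,
}
cognitive_split = {"recall": 0.2, "application": 0.5, "analysis": 0.3}
total_items = 60                          # planned size of the MCQ paper

for topic, weight in topics.items():
    n_topic = round(weight * total_items)                 # items allotted to this topic
    per_level = {level: round(frac * n_topic)              # spread across cognitive levels
                 for level, frac in cognitive_split.items()}
    print(f"{topic}: {n_topic} items -> {per_level}")
```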

Step 2-post intervention

During the ‘Assessment for Learning’ course, participants were given an assignment to construct 10 MCQs. The tutors provided feedback on these MCQs within one week of assignment submission. A total of 1240 MCQs were submitted. Of these, 600 items were selected randomly from the corrected MCQs to match the sample size collected at Step 1.

Step 3-the final intervention

This step was based on the adage that “assessment drives learning.” In the End of Program (EOP) exam, held at the end of the six months, participants were tested with a task of constructing 5 MCQs. The same checklist was used to score all the items. A total of 620 MCQs were submitted, 5 each by the 124 participants. Of these, 600 items were selected randomly to match the sample size collected at Step 1. In total, 1800 MCQs were evaluated by the authors.

Results

A descriptive as well as empirical analysis was carried out on the data. Before proceeding to the exploratory data analysis, we checked that our scale (checklist) was reliable for this sample: Cronbach's alpha, a commonly used indicator of internal consistency, was calculated and found to be 0.861, suggesting very good internal consistency/reliability for the scale and indicating that the checklist scored the items consistently. All values above 0.3 in the item-total statistics indicate that each item is correlated with the total score. An exploratory data analysis was then conducted.
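As a minimal illustration of this reliability check (assuming, hypothetically, a 600 × 21 matrix of per-item checklist scores, which is not the study's actual data), Cronbach's alpha can be computed as follows:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of checklist items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Purely simulated demo: 600 MCQs scored yes/no (1/0) on a 21-item checklist.
rng = np.random.default_rng(0)
demo_scores = rng.integers(0, 2, size=(600, 21))
print(f"Cronbach's alpha (simulated data): {cronbach_alpha(demo_scores):.3f}")
```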

To measure the impact of the longitudinal faculty development program, the same checklist that had been used for training purposes was used to evaluate the quality of the MCQs. The data obtained were analyzed using STATA software (version 15.0). First, the normality of the data was checked, and the data were found to be non-normal. The non-parametric Friedman one-way repeated-measures ANOVA and post hoc analyses were then carried out to explore significant differences across the steps. The Friedman test is the non-parametric alternative to the one-way repeated-measures analysis of variance. Statistical significance was set at a p-value < 0.05 throughout the statistical analysis. Cohen's d was used to calculate effect sizes comparing the MCQ ratings obtained at the pre-, post-, and final-intervention steps of training. To explore the change from Step 1 to Step 2, and similarly from Step 2 to Step 3, the average scores are shown in Fig. 2.
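For readers who wish to reproduce this kind of analysis, the following is a sketch in Python with SciPy (the study itself used STATA); the score arrays are simulated placeholders rather than the study data, and the pooled-SD Cohen's d shown is one common variant, so the printed numbers are illustrative only.

```python
import numpy as np
from scipy import stats

# Simulated checklist totals (max 21) for 600 paired MCQ "slots" at the three steps;
# the means roughly mirror those reported in the paper, but the data are not the study's.
rng = np.random.default_rng(1)
step1 = rng.normal(13.2, 3.0, 600).clip(0, 21)
step2 = rng.normal(17.1, 2.2, 600).clip(0, 21)
step3 = rng.normal(20.2, 1.2, 600).clip(0, 21)

# Friedman test: non-parametric repeated-measures comparison across the three steps.
chi2, p = stats.friedmanchisquare(step1, step2, step3)
print(f"Friedman chi2(2) = {chi2:.2f}, p = {p:.3g}")

# Post hoc Wilcoxon signed-rank tests with Bonferroni correction (3 pairwise comparisons).
alpha_adj = 0.05 / 3                     # ~0.0167
pairs = {"Step 1 vs 2": (step1, step2),
         "Step 2 vs 3": (step2, step3),
         "Step 1 vs 3": (step1, step3)}
for name, (a, b) in pairs.items():
    w, p_pair = stats.wilcoxon(a, b)
    pooled_sd = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
    d = (b.mean() - a.mean()) / pooled_sd          # Cohen's d (pooled-SD form)
    print(f"{name}: W = {w:.0f}, p = {p_pair:.3g}, "
          f"significant at {alpha_adj:.4f}: {p_pair < alpha_adj}, d = {d:.2f}")
```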

Fig. 2 Average scores overall and across the components of the MCQ

A total of 1800 MCQ items were included in the analysis. Of these, 600 MCQs were assembled before the intervention at Step 1; 1210 MCQs were collected at Step 2 (of which 600 were selected randomly); and 620 MCQs were collected at Step 3 (of which 600 were selected randomly). The mean item scores per participant shown in Fig. 2 also show that the overall quality of the MCQs improved, with improvements observed in the stem, lead-in, and options as well.

The average scores across the three steps also show an increasing trend (see Table 1). The quality of MCQs at Step 1 was lower (13.24 ± 0.13), higher in the middle of the training at Step 2 (17.07 ± 0.09) with an effect size of 0.60, and much higher at Step 3 (20.24 ± 0.05) with an effect size of 0.68.

Table 1 Friedman ANOVA test across the steps

On the empirical side, the results of the Friedman ANOVA test indicated that there was a statistically significant difference across the three time points (Step 1, Step 2, Step 3), with a test statistic of χ²(2, n = 600) = 955.86, p < 0.05. An inspection of the mean ranks demonstrated that Step 3 had the highest scores compared with Step 1 and Step 2. Similarly, Step 3 recorded a higher median score (Md = 21) than the other two steps (Step 2, Md = 18; Step 1, Md = 13).

Having established a statistically significant difference among the three steps, a post hoc analysis was conducted to determine which specific comparisons were driving this effect. The Wilcoxon signed-rank test, with a Bonferroni-adjusted alpha value, was used to control for Type I error. Given that three pairwise comparisons were made, Step 1 (pre-intervention) vs. Step 2 (post-intervention), Step 2 vs. Step 3 (follow-up), and Step 1 vs. Step 3, the adjusted significance level was set at 0.05/3 ≈ 0.0167.

Effect size statistics were calculated for each Wilcoxon Signed-Rank Test to assess the magnitude of differences (see Table 1). The results indicated that all pairwise comparisons were statistically significant at the adjusted alpha level of 0.0167, suggesting meaningful improvements across the steps. These findings confirm that the intervention had a significant impact, with improvements continuing from post-intervention (Step 2) to the follow-up stage (Step 3).

Discussion

Recognizing the role played by faculty development programs (FDPs) in benefitting organizations by enhancing the capacity of their faculty, many institutions have launched FDPs. The results of these programs are, however, varied. Our study shows that sustained hands-on training of faculty, provision of peer and expert feedback by educational experts, a multidisciplinary group of participants, and the contextual engagement of learners through blueprinting result in improved item writing skills.

The need for and importance of providing training on MCQ writing can be ascertained from the fact that, in Kohan's systematic review of 119 studies, only 45 were FDPs related to assessment training, and the majority of these were evaluated at level 2 [16]. BEME Guide 19 states that only 12 studies used a quasi-experimental design in faculty development programs, and of these only two used delayed post-tests. In all these intervention studies, change in knowledge and skill, i.e., learning (Kirkpatrick level II), was assessed via questionnaires and interviews. None of the studies used a theoretical framework or used test scores to evaluate participants' learning. Moreover, the number of participants per intervention ranged from 5 to 107 (mean 23) per cohort. The smallest number of participants in the evaluation studies was 15 and the largest was 114 [25].

Our study is unique in many aspects. It used constructivism as a theoretical foundation and recruited participants, regardless of age and gender, across various disciplines such as medicine, basic health sciences, pharmacy, dentistry, physiotherapy, and nursing, targeting inter-professional learning. Of the 133 participants in our faculty development program, 124 were included in our study; 49 of these were women and 75 were men. Moreover, it used test scores of participants taken at three different steps, and the difference in participants' scores across these steps was used as a measure of learning, i.e., change in knowledge and skill.

Our program design agrees with Steinert's viewpoint regarding successful faculty development programs with positive outcomes. The findings show that successful programs use multiple instructional methods within a single intervention; incorporate experiential learning and reflective practice; include individual as well as group learning and group projects; and embed peer support, the development of communities of practice, mentorship, and institutional support [25]. In addition, successful FDPs consider the role of context, explore the value of extended programs and follow-up sessions, and promote the use of alternative practices including narrative approaches, peer coaching, and team development [25, 27].

To the best of our knowledge, there is no study that has demonstrated the long-term impact of training on the quality of MCQs using such a structured program with a theoretical foundation. This six-month faculty development program focuses on enhancing the quality of Multiple-Choice Questions (MCQs) through contextual engagement [31]. This was especially significant for our organization, as the university was moving towards an integrated, outcome-based curriculum with implementation of quality assurance mechanisms in assessment. Our participants explored four interconnected elements: theoretical foundations (understanding assessment and evaluation theories); curriculum planning (designing effective MCQs aligned with curriculum outcomes, incorporating test construction principles); teaching strategies (developing item-writing skills and best practices); and assessment and evaluation (analyzing and improving MCQ quality).

A key feature of successful faculty development activities is that they must be aligned with individual as well as organizational needs [27]. Despite faculty attending MCQ workshops many times, the continued construction of poor quality MCQs may be attributed to flaws in the conception and planning of training programs. The typically used format of short-term, isolated workshops fails to bring improvement in the knowledge or skill of teachers or the achievement of student or institutional goals. This has caused a paradigm shift towards structured, rigorous and thoughtfully designed faculty development programs of longer duration [14, 25, 27]. Studies from Pakistan continue to reveal a dearth of such structured programs [33]. Of special concern is the recent boom in two-year master's programs and six-month certificate courses (CHPE) in Health Professions Education (HPE). Many programs have failed to provide any tangible results in terms of faculty capacity building. Operating without checks and balances, and often resorting to didactic, non-evidence-based formats with no physical interaction between students and faculty, such programs raise serious questions about their effectiveness [34, 35].

One of the foundational concepts utilized in our certificate course was blueprinting. While designing this CHPE program, we ensured connectivity between the various elements of the curriculum by highlighting the significance of the TOS and the assessment blueprint. During the five-day face-to-face session, hands-on opportunities were provided to the participants, who worked in groups to develop their own TOS and assessment blueprint for their chosen modules, and feedback was provided. It is important to note that the selected modules were relevant to their organizational needs. A similar activity was also incorporated during the second course, CPE, where the participants learnt to write learning outcomes and were provided detailed, individual feedback. They then selected appropriate teaching and learning strategies in EBTL and constructed MCQs for their course as the assignment for AFL. We believe that this contextual engagement with the content of the module, and the understanding of the significance of aligning outcomes with teaching/learning strategies and assessment tools, helped our participants to develop high quality MCQs. Our intervention significantly improved the quality of MCQs as the course and its activities progressed, with median scores of 13 at Step 1, 18 at Step 2, and 21 at Step 3 (p < 0.001 for each comparison). It not only improved the overall quality of the MCQs; an improvement in the quality of the stem, lead-in and options was observed, too. Our participants were able to develop MCQs assessing the higher-order thinking skill of problem solving. In agreement with other studies, our results showed sustained improvement due to the periodic intervention, building capacity for testing vast content with ease of scoring and convenience [21, 22].

For writing good quality MCQs, reviewing each item with timely feedback (repetition) brings further quality improvement. Experts who review conscientiously and help remove potential technical item flaws ensure improved quality of MCQ-based written assessment [22]. Thus, timely, relevant, and constructive feedback should be considered while planning training. Multiple opportunities for feedback were built into our CHPE program. During the face-to-face session, a moderation activity, as mentioned in other studies [33, 34, 36, 37], was conducted, in which peer as well as expert feedback from educationists on technical item flaws was provided and participants were helped to correct their mistakes. Similarly, another round of feedback was built into the fourth course, AFL, in which participants were given an assignment of constructing ten new MCQs and were provided written, expert feedback within a week of assignment submission. Utilization of these varied but significant concepts helped to develop a learning experience that enabled participants to build their knowledge and skills in MCQ writing in a sequential, gradual and supported manner, so that they recognized the crucial link between course outcomes, teaching and learning strategies, and assessment through the carefully designed content of a six-month course. The Kirkpatrick level of evaluation used for our study was level II, as it demonstrates learning, where participants exhibit increased knowledge, improved skills, and/or a shift in attitude upon completion of training [26,27,28].

Conclusion

Our study details the processes underlying a successful long-term faculty development program designed in alignment with best-practice evidence, such as using a theoretical framework and multiple teaching strategies. It also utilized the foundational concept of the table of specifications, cementing newly acquired skills and their incorporation into routine practice through repeated opportunities for practice and feedback. It further highlights the benefits of contextual engagement of participants and the use of varied teaching techniques, such as individual and group work, practice and feedback, and peer review, in improving participants' MCQ writing skills, such that they construct MCQs that assess higher cognitive skills with fewer item writing flaws. The continued, contextual engagement of learners is especially helpful in preventing attrition of knowledge and skills.

Limitations

A major quality assurance mechanism in assessment is the post hoc analysis of items (item analysis), and our study does not report such results. This statistical analysis was not possible because our participants came from varied subjects and institutions, though it is planned in future for department-wise faculty from our constituent colleges trained through our program. This study also did not track whether faculty members continued to apply these improved MCQ writing skills in their professional settings beyond the program itself, which restricted the evaluation of our study to level II of Kirkpatrick's model. We acknowledge that impact studies demonstrating a long-term behavioral shift in faculty practices beyond the program are required.

Recommendations

The findings from our successful model can help other institutions develop their own faculty development programs for improving MCQ writing skills as well as for imparting many other related skills. This can help in creating a repository of quality-assured MCQs, which different institutions can then utilize for developing quality-assured high-stakes exams, saving considerable resources. The results of this study are based on an intervention of six months' duration; keeping the element of decay of knowledge in mind, it would be interesting to check participants' skills one or two years after training.

Data availability

The data that support the findings of this study are available from the corresponding author upon request, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available.

Abbreviations

MCQs:

Multiple-Choice Questions

CHPE:

Certificate in Health Professions Education

NUMS:

National University of Medical Sciences

ALTA:

Adult learning theories and their application

CPE:

Curriculum planning and Evaluation

EBTL:

Evidence Based Teaching and Learning

AFL:

Assessment for Learning

DLSGs:

Distance Learning Study Groups

IRB:

Institutional Review Board

TOS:

Table of Specifications

EOP:

End of Program

FDP:

Faculty development programs

HPE:

Health Professions Education

PM&DC:

Pakistan Medical and Dental Council

QA:

Quality assurance

References

  1. Abdulghani HM, Irshad M, Haque S, Ahmad T, Sattar K, Khalil MS. Effectiveness of longitudinal faculty development programs on MCQs items writing skills: A follow-up study. PLoS ONE. 2017;12(10):e0185895.


  2. Brown GT, Abdulnabi HH. Evaluating the quality of higher education instructor-constructed multiple-choice tests: Impact on student grades. InFrontiers in Education 2017 Jun 2 (Vol. 2, p. 24). Frontiers Media SA.

  3. Raymond MR, Grande JP. A practical guide to test Blueprinting. Med Teach. 2019;41(8):854–61.


  4. Patil SY, Gosavi M, Bannur HB, Ratnakar AV. Blueprinting in assessment: a tool to increase the validity of undergraduate written examinations in pathology. Int J Appl Basic Med Res. 2015;5:S76–S79.

  5. Gottlieb M, Bailitz J, Fix M, Shappell E, Wagner MJ. Educator’s blueprint: A how-to guide for developing high‐quality multiple‐choice questions. AEM Educ Train. 2023;7(1):e10836.


  6. Shumway JM, Harden RM. The assessment of learning outcomes for the competent and reflective physician. AMEE Guide No. 25.

  7. Rahim AF, Simok AA, Wahab SF. A guide for writing single best answer questions to assess Higher-Order thinking skills based on learning outcomes. Educ Med J. 2022;14(2).

  8. Xiromeriti M, Newton PM. Solving not answering. Validation of guidance for writing higher-order multiple-choice questions in medical science education. Med Sci Educ 2024 Aug 20:1–9.

  9. Abdellatif H. Test results with and without blueprinting: psychometric analysis using the Rasch model. Educación Médica. 2023;24(3):100802.


  10. Adkoli B, Deepak KK, Anshu ST. Blue printing in assessment. Principles of assessment in medical education. New Delhi: Jaypee; 2012. pp. 205–13.


  11. Singh T. Principles of assessment in medical education. Jaypee Brothers Medical; 2021. Oct 30.

  12. Harden RM, Crosby JR. AMEE guide 20: the good teacher is more than a lecturer - the twelve roles of the teacher. Med Teach. 2000;22:334–47.


  13. Chacko TV. Moving toward competency-based education: challenges and the way forward. Archives Med Health Sci. 2014;2(2):247–53.


  14. Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME guide 8. Med Teach. 2006;28(6):497–526.


  15. Karthikeyan S, O’Connor E, Hu W. Barriers and facilitators to writing quality items for medical school assessments–a scoping review. BMC Med Educ. 2019;19:1–1.


  16. Kohan M, Changiz T, Yamani N. A systematic review of faculty development programs based on the Harden Teacher’s role framework model. BMC Med Educ. 2023;23(1):910.


  17. Haladyna TM, Rodriguez MC, Mehta G, Mokhasi V. Item analysis of multiple choice questions: an assessment of the assessment tool. Int J Health Sci Res. 2014;4(7):197–202.

  18. Karthikeyan S, O’Connor E, Hu W. Motivations of assessment item writers in medical programs: a qualitative study. BMC Med Educ. 2020;20:1–0.


  19. Mukhtar F, Chaudhry AM. Faculty development in medical institutions: where do we stand in Pakistan? J Ayub Med Coll Abbottabad. 2010;22(3):210–3.


  20. Khan HF, Danish KF, Awan AS, Anwar M. Identification of technical item flaws leads to improvement of the quality of single best multiple choice questions. Pakistan J Med Sci. 2013;29(3):715.


  21. Rahim MF, Bham SQ, Khan S, Ansari T, Ahmed M. Improving the quality of MCQs by enhancing cognitive level and using psychometric analysis: improving the quality of MCQs by enhancing cognitive level. Pakistan J Health Sci. 2023 Apr;30:115–21.

  22. Kiran F, Ayub R, Rauf A, Qamar K. Evaluating the impact of faculty development programme initiative: are we really improving skills in MCQ writing? J Pak Med Assoc. 2021;71(10):2434–8.


  23. Abozaid H, Park YS, Tekian A. Peer review improves psychometric characteristics of multiple choice questions. Med Teach. 2017;39(sup1):S50–4.


  24. Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Measur Educ. 2002;15(3):309–33.


  25. Steinert Y, Naismith L, Mann K. Faculty development initiatives designed to promote leadership in medical education. A BEME systematic review: BEME guide 19. Med Teach. 2012;34(6):483–503.


  26. Kirkpatrick DL, Kirkpatrick JD. Evaluating training programs. 3rd ed. San Francisco, CA: Berrett-Koehler Publishers; 2006. ISBN 9781576753484.

  27. Steinert Y, Mann K, Anderson B, Barnett BM, Centeno A, Naismith L, Prideaux D, Spencer J, Tullo E, Viggiano T, Ward H. A systematic review of faculty development initiatives designed to enhance teaching effectiveness: A 10-year update: BEME guide 40. Med Teach. 2016;38(8):769–86.


  28. Alsalamah A, Callinan C. Adaptation of Kirkpatrick’s Four-Level model of training criteria to evaluate training programmes for head teachers. Educ Sci. 2021;11(3):116. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/educsci11030116


  29. Yudkowsky R, Park YS, Downing SM, editors. Assessment in health professions education. New York: Routledge; 2019. Jul 26.


  30. Lane S, Raymond MR, Haladyna TM, editors. Handbook of test development. New York, NY: Routledge; 2016.


  31. Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item writing guidelines for classroom assessment. Appl Meas Educ. 2002;15:309–34.


  32. Downing S, Haladyna T, Patel RM. Use of item analysis to improve quality of multiple choice questions in II MBBS. Journal of Education Technology in Health Sciences. 2017;4(1):22–9.

  33. Naeem N, van der Vleuten C, Alfaris EA. Faculty development on item writing substantially improves item quality. Adv Health Sci Education: Theory Pract. 2012;17(3):369–76. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10459-011-9315-2


  34. Nadeem N, Yasmin R. Faculty development practices in medical colleges of Lahore, Pakistan. Pakistan J Med Health Sci. 2018;12(1):66–72.


  35. Aly SM, Shamim MS. MHPE programs in Pakistan: concerns for quality. J Pak Med Assoc. 2016;66(4):366–7.


  36. Menon B, Miller J, DeShetler LM. Questioning the questions: methods used by medical schools to review internal assessment items. MedEdPublish. 2021;10.

  37. Smeby SS, Lillebo B, Gynnild V, Samstad E, Standal R, Knobel H, Vik A, Slørdahl TS. Improving assessment quality in professional higher education: could external peer review of items be the answer? Cogent Med. 2019;6(1):1659746.



Acknowledgements

We appreciate the support from all the faculty members of the department of Health Professions Education.

Funding

No funding was available for this research.

Author information

Authors and Affiliations

Authors

Contributions

Rukhsana Ayub (RA) and Faiza Kiran (FK) developed the main concept and wrote the original manuscript. Nadia Shabnam (NS) performed the analysis, wrote the results, and prepared the tables. Ayesha Rauf (AR) reviewed the manuscript, and Fozia Fatima (FF) prepared the figures.

Corresponding author

Correspondence to Nadia Shabnam.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Institutional Review Board (IRB) of the Office of Research, Innovation and Commercialization of the National University of Medical Sciences, Rawalpindi, Pakistan (Approval No. 06/R&D/NUMS, dated 19th February 2018). All procedures involving human participants were in accordance with the ethical standards of the institution and with the 1964 Helsinki Declaration and its later amendments.

Informed consent was obtained from all individual participants included in the study. Participation was voluntary. Participants were provided with detailed information regarding the study’s purpose and benefits.

Consent for publication

Not Applicable.

Competing interests

The authors declare no competing interests.


Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Ayub, R., Kiran, F., Shabnam, N. et al. Effects of a long term faculty development program on improvement in quality of MCQs: an impact evaluation study. BMC Med Educ 25, 541 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12909-025-07081-2


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12909-025-07081-2

Keywords