AI-powered standardised patients: evaluating ChatGPT-4o’s impact on clinical case management in intern physicians

Abstract

Background

Artificial Intelligence is currently being applied in healthcare for diagnosis, decision-making and education. ChatGPT-4o, with its advanced language and problem-solving capabilities, offers an innovative alternative as a virtual standardised patient in clinical training. Intern physicians are expected to develop clinical case management skills such as problem-solving, clinical reasoning and crisis management. In this study, ChatGPT-4o served as a virtual standardised patient and medical interns acted as physicians managing clinical cases. The study aimed to evaluate intern physicians’ competencies in clinical case management (problem-solving, clinical reasoning and crisis management) and to explore the impact and potential of ChatGPT-4o as a viable tool for assessing these competencies.

Methods

This study used a simultaneous triangulation design integrating quantitative and qualitative data. It was conducted at Aydın Adnan Menderes University with 21 sixth-year medical students; ChatGPT-4o simulated realistic patient interactions requiring clinical case management competencies (problem-solving, clinical reasoning and crisis management). Data were gathered through a self-assessment survey, semi-structured interviews and observations of the students and ChatGPT-4o during the process. Analyses included Pearson correlation, Chi-square and Kruskal-Wallis tests, and content analysis was conducted on the qualitative data using MAXQDA software for coding.

Results

According to the findings, observation and self-assessment survey scores of intern physicians’ clinical case management skills were positively correlated within each measurement type. There was a significant gap between participants’ self-assessment and actual performance, indicating discrepancies between self-perceived and real clinical competence. Participants reported feeling inadequate in their problem-solving and clinical reasoning competencies and experienced time pressure. They were satisfied with the Artificial Intelligence-powered standardised patient process and were willing to continue similar practices. Participants engaged with a uniform patient experience. Although participants were satisfied, the application process was sometimes negatively affected by disconnection problems and language processing challenges.

Conclusions

ChatGPT-4o successfully simulated patient interactions, providing a controlled environment for practising clinical case management without risking harm to real patients. Although some technological challenges limited its effectiveness, it was useful, cost-effective and accessible. Through varied clinical scenarios, this method is expected to better support intern physicians in acquiring clinical case management skills.

Clinical trial number

Not applicable.

Background

The history of Artificial Intelligence (AI) began in the 1950s with the British mathematician and logician Alan Turing. He was the first to ask the question “Can machines think?” in his paper “Computing Machinery and Intelligence”. He introduced the concept of machines simulating human intelligence and designed the “Turing Test”, which checks whether a machine can behave so much like a human that a person cannot tell whether they are interacting with a machine or a person [1]. Since then, the field of AI has evolved significantly, leading to the development of increasingly advanced systems. One notable example is ChatGPT, a large language model developed by OpenAI and launched in 2022, now powered by GPT-4o. ChatGPT-4o offers enhanced language understanding, a broader knowledge base, coding capabilities and the ability to handle long texts and verbal communication. This version generates more human-like responses, provides in-depth answers on a wide range of topics and can solve multi-stage problems. It has shown significant potential in healthcare, particularly in areas such as diagnosis, decision-making and data analysis [2, 3].

The use of standardized patients (SPs) has been a cornerstone of clinical education, widely utilized in medical training to simulate real-life clinical scenarios. This allows students to develop and improve their clinical skills, clinical reasoning (CR), decision-making, communication and diagnostic abilities in a controlled environment without interacting with real patients. Clinical case management (CCM), which incorporates clinical reasoning, problem-solving (PS) and crisis management (CM) competencies, is essential for physicians in healthcare settings: it enables them to make informed decisions, solve complex problems and respond effectively to crises while ensuring optimal patient care [4,5,6,7]. CR is essential, as it forms the basis for PS by involving the evaluation and interpretation of patient data to make accurate diagnoses and treatment decisions [4, 5]. Effective PS, both diagnostic and therapeutic, builds on this by identifying and addressing key medical issues [8]. In high-pressure situations, CM comes into play, requiring quick decision-making, leadership and coordination. Clear and compassionate communication ensures patients understand their diagnosis, treatment options and prognosis, fostering shared decision-making [9].

Access to SPs is important for students to develop these competencies, but it demands considerable time, labour and cost. In medical faculties, the increasing number of students limits opportunities for one-on-one practice with SPs and the individualized feedback essential for clinical training [10]. To overcome these limitations, virtual standardized patients offer an innovative, highly flexible, widely accessible, cost-effective and user-friendly solution for clinical training. With their ability to simulate realistic patient interactions, they provide personalized feedback and ensure equitable opportunities for students to develop clinical competencies [1, 5]. Early studies suggest that generative AI models can simulate standardized patients in various scenarios, supporting students’ clinical reasoning, decision-making and problem-solving skills while providing performance analysis and feedback. These advancements position AI as a valuable tool to complement traditional methods in nursing and medical education [1, 2, 4, 5, 8, 11, 12].

In this study, ChatGPT-4o served as a virtual standardised patient and medical interns acted as physicians managing clinical cases. The study aimed to evaluate intern physicians’ competencies in CCM (PS, CR and CM) and to explore the impact and potential of ChatGPT-4o as a viable tool for assessing these competencies. In this context, answers to the following research questions (RQ) were sought.

RQ1: Is there an association of self-perceived (PS-S, CR-S, CM-S) and observed performance (PS-O, CR-O and CM-O) of intern physicians with gender, motivation and learning preferences?

RQ2: Is there any relationship between intern physicians’ self-perceptions of various variables (PS-S, CR-S and CM-S) and their practice performance (PS-O, CR-O, CM-O)?

RQ3: What are the opinions of intern physicians about the implementation process?

RQ4: What are the researchers’ observations about the implementation process?

Methods

Study design and participants

This study employed a simultaneous triangulation design, a mixed-method approach in which quantitative and qualitative data are collected and analysed simultaneously and then compared during interpretation, allowing comprehensive reporting by combining diverse data types [13]. In this study, ChatGPT-4o was used as a virtual standardised patient to assess the participants’ CCM competencies. The quantitative data provided general inferences, while the qualitative data enabled deeper insights, resulting in a more thorough interpretation of the findings.

In Türkiye, undergraduate medical education is designed as a six-year program: the first three years cover the basic sciences, the fourth and fifth years constitute the clinical training period and the sixth year is the medical internship. By the time they become interns, students are expected to be competent in clinical case management, which involves a variety of skills essential for effective patient care such as PS, CR, CM, patient communication and time management. The National Core Curriculum 2020 (NCC-2020) is the guideline that aims to enhance the quality and standardization of medical education in Türkiye, focusing on the competencies a physician must have [14].

The study was conducted at Aydın Adnan Menderes University Faculty of Medicine between July and September 2024. Twenty-one intern physicians who had completed clinical training (fifth year) were included as participants. They were selected through convenience sampling of volunteers from clinical rotations suitable for a quick implementation process [15]. According to qualitative research standards, at least 10 participants are recommended [16]. All participants were informed about the study and selected based on their availability. Participants reported no issues with vision or hearing.

Data collection tools

Three different data collection tools were used in the study (Fig. 1).

Self-assessment Survey: The survey was developed by the researchers to evaluate the participants’ competencies. To ensure the content validity of the self-assessment survey, feedback was obtained from two clinicians (with expertise in the field) and a measurement and evaluation expert. Based on their input, the survey was revised and a pilot study was conducted with an intern physician. The intern physician provided feedback on the clarity of the questions and their alignment with the intended dimensions, leading to further revisions of the self-assessment survey. The first section of the survey collected demographic and personal information about the participants, while the second section, using a five-point Likert scale, focused on self-assessment, comprising 12 items for problem-solving (PS-S), 12 for clinical reasoning (CR-S) and 11 for crisis management skills (CM-S).

Interview Form: A semi-structured interview form was designed by the researchers to collect participants’ views on the practice. This method allows for in-depth insights in qualitative and mixed-method studies. Five questions were developed to explore problem solving, clinical reasoning and crisis management competencies, academic contributions and potential improvements.

Observation Form: A semi-structured observation form was used to assess relevant competencies on a five-point scale across case applications, focusing on problem-solving (PS-O), clinical reasoning (CR-O), and crisis management (CM-O). Two researchers documented external observations, achieving 83.33% inter-rater agreement, which meets the recommended 80% threshold.
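The inter-rater agreement reported above is a simple percent agreement (the share of items both observers scored identically). As a minimal computational sketch, assuming each observer’s ratings are stored as a Python list (the scores below are illustrative placeholders, not the study’s records):

```python
# Minimal sketch: percent agreement between two raters scoring the same items.
# The rating lists are illustrative placeholders, not study data.

def percent_agreement(rater_a: list[int], rater_b: list[int]) -> float:
    """Share of items on which both raters gave an identical score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

rater_1 = [4, 3, 5, 2, 4, 4]  # hypothetical scores from observer 1
rater_2 = [4, 3, 5, 3, 4, 4]  # hypothetical scores from observer 2

print(f"Inter-rater agreement: {percent_agreement(rater_1, rater_2):.2f}%")
# 83.33% in this toy example, which meets the 80% threshold mentioned above
```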

Procedure

The research process consisted of four steps: selection of clinical cases, preparation of data collection tools, implementation of the application and analysis of data (Fig. 1).

Fig. 1 The research process

The clinical cases development process

The study followed NCC-2020 for clinical case selection; two cases (hypertension and brucellosis) were selected from among the common diseases in Türkiye, based on the requirement that a physician should competently diagnose, treat and manage them. ChatGPT-4o was used to develop the clinical cases by providing the disease names, study objectives and difficulty levels. The cases were initially drafted as epicrises by a physician and revised by the researchers, incorporating additional physical and laboratory findings generated with the assistance of ChatGPT-4o. The cases were reviewed by clinical experts for medical accuracy and by instructional technology experts for educational appropriateness. After a pilot study with three physicians, the cases and prompts were adjusted based on these physicians’ feedback and the researchers’ observations. To make communication more challenging and enhance realism, the prompts included only information that a patient would likely know. The prompts were uploaded to ChatGPT-4o before each interview. Physical examination results were provided with instructions indicating they had been recorded by another physician, but laboratory results and the diagnosis were excluded to simulate real-life scenarios (Fig. 1).
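The study ran its cases directly in the ChatGPT-4o interface by uploading the prepared prompts before each interview, and the actual prompt texts are not reproduced in the article. Purely as an illustrative sketch, a comparable patient-role instruction could be scripted against the OpenAI chat API as shown below; the role description, case details and opening question are assumptions for illustration, not the study’s materials.

```python
# Illustrative sketch only: a virtual standardised patient driven through the
# OpenAI Python SDK. The case text is a hypothetical stand-in for the study's prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PATIENT_ROLE = (
    "You are a standardised patient in a medical training exercise. "
    "Stay in character and share only information a patient would plausibly know: "
    "complaints, history and lifestyle. Do not reveal laboratory results or a diagnosis. "
    "Physical examination findings were recorded by another physician and may be "
    "reported verbally if the physician asks about them."
)
CASE_DETAILS = "Hypothetical case: 55-year-old patient with recurrent headaches and dizziness."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"{PATIENT_ROLE}\n\nCase background:\n{CASE_DETAILS}"},
        {"role": "user", "content": "Hello, I am the physician on duty. What brings you in today?"},
    ],
)
print(response.choices[0].message.content)  # the AI patient's in-character reply
```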

Implementation

The observers gave the participants a briefing about the activity and the aim of the study, and the participants gave informed consent. Before the activity, the participants received printed patient examination forms designed by the researchers, modelled on the forms used in university hospital clinics, to reflect real clinical training.

The interaction was conducted in two clinical skills laboratory rooms equipped to enable interactive discussion and observation. Intern physicians participated in the study assuming the role of a physician, while ChatGPT-4o acted as a standardized patient in two consecutive clinical cases. Each intern physician was seated at a table with an 11-inch tablet providing reliable internet access and preloaded with the ChatGPT-4o scenarios. The interaction began with the intern verbally initiating communication with the AI patient to gather a detailed history. The patient’s complaints were presented by the AI in response to the physician’s questions. Physical examination findings, which were simulated as already performed, were provided verbally during the interaction. If laboratory tests were requested by the physician, printed results were promptly presented by the observer. The intern physicians were tasked with verbally sharing their diagnostic conclusions, formulating treatment plans and providing patient-specific recommendations within a five-minute time frame; participants were verbally warned by the observers as the five-minute mark approached but could continue the task. Throughout the process, the AI acted responsively, simulating realistic patient reactions. After completing the interaction, the participants filled in the self-assessment survey and then took part in an interview in which they were asked about their experience interacting with ChatGPT-4o and their performance as physicians.

The interactions were observed by three observers: two observers (who had previously calibrated their evaluation criteria for PS, CR and CM) concentrated on the participants’ performance according to the predefined criteria, while the third observer focused on ensuring the technical functionality of the intervention.

Analysis

Data analysis was divided into three main sections: statistical relationships between variables, correlations between observation and survey scores and participants’ opinions on the implementation process (Fig. 1). Because the observation and survey scores were normally distributed, Pearson correlation analysis was used; non-parametric tests, such as the Chi-square and Kruskal-Wallis tests, were applied to variables that were not normally distributed. A p-value < 0.05 was considered statistically significant.
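As a hedged illustration of this step, the normality screen (skewness and kurtosis within ±1, as reported later for the observation and survey totals) and a Pearson correlation could be computed with pandas and SciPy along the following lines; the column names and values are placeholders, not the study’s dataset.

```python
# Minimal sketch: normality screen and Pearson correlation between an observed
# score and a self-assessed score. The data are illustrative placeholders.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "PS_O": [38, 42, 35, 44, 40, 37, 41, 39],  # hypothetical observation totals
    "PS_S": [50, 47, 52, 55, 48, 51, 46, 53],  # hypothetical survey totals
})

for col in df.columns:
    skew = stats.skew(df[col])
    kurt = stats.kurtosis(df[col])  # excess kurtosis; 0 for a normal distribution
    print(f"{col}: skewness={skew:.2f}, kurtosis={kurt:.2f}")  # check both lie within +/- 1

r, p = stats.pearsonr(df["PS_O"], df["PS_S"])
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```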

The analysed variables were selected to explore potential factors that may influence medical students’ self-perceptions and practice performance in clinical case management. These factors, such as gender, resource preferences, voluntary choice of medicine and satisfaction with this choice, were hypothesized to be linked to students’ motivation and performance outcomes. By examining these relationships, we aimed to gain a deeper understanding of the contextual and individual factors that could shape the impact of AI tools like ChatGPT in supporting medical education. Since not all variables followed a normal distribution [17], non-parametric tests were applied. A chi-square analysis explored relationships between gender, voluntary choice of medicine, satisfaction with that choice and preferred resources.
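The chi-square tests on these categorical variables and the Kruskal-Wallis comparisons of score distributions could be run as in the sketch below; the contingency counts and group scores are invented examples, not a reproduction of the study’s analysis.

```python
# Minimal sketch: chi-square test on a 2x2 contingency table and a Kruskal-Wallis
# test comparing score distributions across groups. All values are illustrative.
import pandas as pd
from scipy import stats

# Illustrative counts: gender vs. whether textbooks were a preferred resource.
contingency = pd.DataFrame(
    [[8, 0],   # female: textbooks / no textbooks
     [6, 7]],  # male:   textbooks / no textbooks
    index=["female", "male"], columns=["textbooks", "no_textbooks"],
)
chi2, p, dof, expected = stats.chi2_contingency(contingency)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")

# Hypothetical observed problem-solving totals split by willingness to choose medicine.
willing = [42, 44, 39, 45, 41]
not_willing = [35, 33, 38, 36]
h, p_kw = stats.kruskal(willing, not_willing)
print(f"Kruskal-Wallis H = {h:.3f}, p = {p_kw:.3f}")
```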

Directed content analysis was used to analyse the technical observation data; in this method, codes were created before and during data analysis [18]. The interview data were analysed using conventional content analysis. Observation notes from the ChatGPT-4o application, including voice chat interactions, were also analysed with schematized codes reflecting frequency and consistency. Due to the nature of qualitative research, interviews were conducted until data saturation was reached. The interview data were transferred to MAXQDA and coded in this program using an inductive approach by the researcher and an expert. The resulting codes were classified and thematized according to their relationships with each other. An analytical strategy was used while analysing the data.
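The code-frequency tallies reported in Figs. 2 and 3 were produced in MAXQDA. As a rough illustration of the same bookkeeping outside that tool, coded interview segments could be tallied as follows; the participant IDs and codes are invented examples, not the study’s data.

```python
# Minimal sketch: tallying qualitative code frequencies from coded segments.
# The coded segments below are invented examples, not the study's interview data.
from collections import Counter

coded_segments = [
    ("P07", "feeling of inadequacy in PS competencies"),
    ("P21", "feeling of inadequacy in PS competencies"),
    ("P18", "feeling of time pressure"),
    ("P02", "systematic approach"),
    ("P15", "feeling of time pressure"),
]

code_frequencies = Counter(code for _, code in coded_segments)
for code, n in code_frequencies.most_common():
    print(f"{code}: n = {n}")
```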

Results

Participant characteristics

The study involved 21 participants (8 female, 13 male) with an average age of 24 ± 1.03 years (range 23–26). While 15 participants willingly chose the medical faculty, only 9 were still satisfied with their choice. When asked about their preferred resources for analysing cases in the clinic, participants reported using the internet (n = 17), lecture notes (n = 15), textbooks (n = 14), instructors (n = 14), peers (n = 11), residents (n = 10) and artificial intelligence (n = 8).

The analysis revealed a significant relationship between gender and preferred resources, as all female participants selected textbooks (χ²=6.462, p = 0.011). Furthermore, participants who preferred residents as a resource (χ²=4.295, p = 0.038) and those who preferred instructors (χ²=4.200, p = 0.040) were more likely to rely on lecture notes. However, gender did not show a significant association with any other variables analysed in this study, suggesting that its influence may be limited to specific resource preferences rather than broader aspects of clinical performance or motivation.

RQ1

Is there an association of self-perceived (PS-S, CR-S, CM-S) and observed performance (PS-O, CR-O and CM-O) of intern physicians with gender, motivation and learning preferences?

Table 1 shows the relationships between participant variables and their clinical case management performance measured through observation (PS-O, CR-O, CM-O) and self-assessment survey scores (PS-S, CR-S, CM-S). The independent-samples Kruskal-Wallis test was used to analyse relationships between demographic data, preferences, observation scores and survey scores. Post-hoc analysis revealed that participants who willingly chose medicine scored higher on PS-O and CR-O. Significant associations included willingness to choose medicine, satisfaction with this choice and preferred resources such as internet use and peer support, highlighting their impact on problem-solving, clinical reasoning and crisis management competencies (Table 1). Participants satisfied with their choice showed positive correlations with problem-solving, clinical reasoning and crisis management scores on the survey (Table 1).

Table 1 Analysis of the association of self-perceived (PS-S, CR-S, CM-S) and observed performance (PS-O, CR-O, CM-O) scores of intern physicians with gender, motivation and learning preferences

RQ2

Is there any relationship between intern physicians’ self-perceptions of various variables (PS-S, CR-S, CM-S) and their practice performance (PS-O, CR-O, CM-O)?

Pearson correlation was used to analyse the relationship between total scores from the observation forms (PS-O, CR-O, CM-O) and the surveys (PS-S, CR-S, CM-S), which reflect participants’ competencies and self-perceptions, respectively. Skewness and kurtosis values for all scores were within the ± 1 range, indicating normal distribution (Table 2).

Table 2 Descriptive statistics of observation and survey scores

The assumption of linearity was also satisfied. With these conditions met, the analysis was conducted. Table 3 shows no relationship between observation scores and survey scores; however, each score was related to the other scores within its own group. The correlation analysis demonstrated several significant relationships among the observed (O) and self-assessed (S) competencies. A very strong positive correlation was found between clinical reasoning (CR-O) and problem-solving (PS-O) competencies. Additionally, crisis management (CM-O) showed a strong positive correlation with both problem-solving (PS-O; r = 0.824, p < 0.001) and clinical reasoning (CR-O; r = 0.677, p = 0.001). In terms of self-assessed competencies, problem-solving (PS-S) was strongly correlated with clinical reasoning (CR-S; r = 0.858, p < 0.001) and crisis management (CM-S; r = 0.755, p < 0.001) (Table 3).

Table 3 The correlations between the tests

Similarly, clinical reasoning (CR-S) exhibited a strong correlation with crisis management (CM-S; r = 0.857, p < 0.001). Positive self-assessment correlates with better performance and improved clinical reasoning. Strong clinical management competencies align with reasoning competencies, while survey and observation scores do not directly correlate, though competency relationships may influence self-perception and overall outcomes.

RQ3

What are the opinions of participants about the implementation process?

The codes associated with PS, CR and CM categories are presented in Fig. 2. The most frequently mentioned code in the Problem-Solving category is the “feeling of inadequacy in PS competencies” (n = 16). Some participant quotes include:

P07: “My PS competencies were quite inadequate in case analysis.”

P21: “Due to the short time, I had to synthesise all my knowledge and experience. I wasn’t good enough, but it was useful for me as it showed me this”.

Fig. 2 Code frequencies of interview findings

For Clinical Reasoning competencies, the most frequently mentioned codes are “feeling of inadequacy in CR competencies” (n = 8), “systematic approach” (n = 5), “approach according to complaints” (n = 4) and “feeling of adequacy in CR competencies” (n = 4). Sample quotes:

P02: “I tried to make it systematic by first detailing the complaint and then the history and the findings added to the complaint.”

P07: “From the lectures, we could solve some problems, but since treatment and patient guidance weren’t emphasised, I felt stuck when it came to managing treatment or diagnosis.”

For Crisis Management competencies, the most common codes are “feeling of time pressure” (n = 10), “feeling of adequacy” (n = 9) and “feeling of inadequacy” (n = 9). Sample quotes:

P18: “The 5-minute limit stressed me. Seeing the time running while diagnosing made it harder.”

P21: “For the first time, meeting a real patient one-on-one created tension, but I managed to stay composed and used targeted questions effectively.”

P15: “I could not use the time effectively. I struggled to question the patient enough due to time constraints. While listening and taking notes simultaneously, I found it difficult to manage both tasks.”

RQ4

What are the researchers’ observations about the implementation process?

Researchers observed that participants’ PS, CR and CM competencies were generally rated as good (n = 13) or fair (n = 8). About half (n = 10) struggled with time management, exceeding the 5-minute limit. Two participants appeared highly anxious, panicked, and indecisive. Some had difficulty taking patient histories due to time pressure, while others failed to interpret lab results effectively. Overall, participants performed better in the second case, using the 5-minute limit more efficiently in patient management.

Technical observation notes were recorded during the applications by the researchers. The voice chat with ChatGPT-4o was analysed in detail (Fig. 3). The most frequent issue was ‘responding in different languages.’ Other common problems included ‘terminating the conversation prematurely,’ ‘server errors due to overload,’ ‘start up issues,’ and ‘slow response times.’ Notably, it was observed that participants were generally able to manage these communication challenges effectively, demonstrating adaptability in handling ChatGPT-4o’s limitations.

Fig. 3 Technical challenges in ChatGPT-4o voice interactions and frequencies

In the analysis of the text-based chat records of ChatGPT-4o (Fig. 4), it was observed that in some cases the model perceived the participants’ statements as being in a language other than Turkish, mostly English and rarely other languages. These language differences often caused ChatGPT-4o to misunderstand the message, although in some cases it responded correctly. Another issue was the misspelling of Turkish proper nouns or the misinterpretation of non-proper nouns as proper nouns.

Fig. 4 Text-based analysis of ChatGPT-4o chat records

Discussion

This study aimed to evaluate intern physicians’ competencies in clinical case management (PS, CR, CM) and to explore the impact and potential of ChatGPT-4o as a viable tool for assessing these competencies. We believe that integrating ChatGPT-4o into clinical education could provide an accessible, cost-effective approach to enhancing these skills, complementing traditional methods and improving learning outcomes.

The following discussion is based on the results of the research questions, explores their implications for medical education and examines the role and limitations of AI-based virtual standardised patients in medical training.

RQ1

In response to this research question, we analysed the association between self-perceived (PS-S, CR-S, CM-S) and observed performance (PS-O, CR-O and CM-O) of intern physicians with gender, motivation and learning preferences.

Among the participant characteristics, gender was not associated with competency scores. While female participants preferred multiple learning sources, particularly textbooks, this did not significantly influence their competencies in CCM, nor did it impact their use of AI-supported standardized patient applications. Female participants’ preference for structure and repetition aligns with prior research highlighting strategies like organization and rehearsal that support foundational knowledge and problem-solving skills [19]. AI tools like ChatGPT-4o can complement these strategies by offering personalized case simulations and feedback, thus enhancing the educational experience and accommodating diverse learning preferences. There was also a link between reliance on lecture notes and a preference for residents and instructors as resources. This suggests that students favour primary sources and instructor notes because they align with natural learning conditions [20]. The literature suggests that medical students can be supported in understanding complex instructor notes and navigating natural learning conditions through the diverse functions offered by ChatGPT. This aligns with findings indicating that students increasingly prefer ChatGPT as a primary resource for learning [21, 22]. By leveraging ChatGPT’s capabilities, students may better integrate AI into their learning processes, enhancing their ability to adapt to natural learning conditions and improving their overall educational experience.

The analysis revealed a positive association between the total scores of PS-O, CR-O, PS-S and CR-S among those who willingly chose medicine as their field of study. A review of the literature highlights that medical students’ internal motivation for choosing their department positively influences their satisfaction levels and performance [23,24,25]. Similarly, Papastavrou et al. identified a positive relationship between students’ satisfaction and their reasoning and problem-solving abilities [26]. Additionally, Park reported that satisfaction with their chosen field is associated with improved crisis management skills among medical students [27].

Developed PS skills are linked to greater career confidence and professional expectations [28]. CR skills, crucial for interns, were also positively correlated with these scores [7]. Additionally, participants who used the internet as a learning resource showed a positive correlation with CR-O scores; the literature likewise indicates that access to internet-based training supports the development of CR [29]. Peer help in learning also positively influenced PS-S, CR-S and CM-S scores, supporting findings that peer learning enhances PS, CR and CM development in medical students [30,31,32,33,34].

RQ2

It was observed that there was a very strong positive relationship between CR-O and PS-O and a strong relationship between CR-S and PS-S, which aligns with existing literature showing a connection between CR and PS [35, 36]. This suggests that improvements in one area may enhance the other. Additionally, a strong positive relationship was observed between CM and PS levels: CM-O and PS-O showed a strong positive correlation, as did CM-S and PS-S. This supports the findings by Nguyen et al., who emphasised that higher PS levels positively impact self-efficacy in crisis management [37]. CM-O and CR-O were also positively related, while CM-S and CR-S were strongly correlated. The literature suggests that linking communication skills with CR improves performance in crises [29].

These findings suggest that strengthening PS and CR skills, integral to medical practice, can enhance overall clinical competence. This supports educational frameworks such as Bloom’s Taxonomy and Miller’s Pyramid, which advocate for moving from knowledge to application in clinical settings. By integrating case-based learning, educators can foster both problem-solving and clinical reasoning skills, leading to more effective medical training [38, 39]. The relation between PS and CR involves both knowing the theory and applying that knowledge to clinical practice. This agrees with Miller’s claim that, for effective clinical education, knowledge alone is not enough; it must be applied in practice [40]. CM competencies were influenced by external factors like time pressure, which often hindered performance during simulated crises. As CM is essential in medical practice, training programs must simulate real-world stressors to prepare students better. In Türkiye, time pressure is a significant challenge for physicians during healthcare service delivery. In this study, we introduced a time limit to reflect this real-world pressure and enhance the simulation’s realism.

The study found a significant gap between participants’ self-assessment and actual performance, indicating discrepancies between self-perceived and real clinical competence. The correlation data in Table 3 reveal that, while PS, CR and CM scores were aligned within the observed performance and survey results, participants’ self-assessments did not consistently reflect their true competence, echoing concerns raised in the literature about the reliability of self-assessment in medical education. This may be because the intern physicians were at the beginning of their internship and lacked sufficient experience. The discrepancy can be explained by the Dunning-Kruger Effect, where novices overestimate their abilities due to a lack of experience [41, 42]. Enhancing self-assessment accuracy through reflective practice and structured feedback, along with developing metacognitive skills, could improve interns’ ability to evaluate their competencies.

RQ3

Most participants reported that the practice contributed to their academic success or provided valuable professional experience. The literature supports the use of virtual patients and case studies for enhancing clinical skills and reasoning, which is positively received by medical students [43, 44].

ChatGPT-4o successfully simulated patient interactions, providing a controlled environment for practising PS, CR and CM without risking harm to real patients. Scenarios could be tailored to match specific learning needs. It was cost-effective and accessible, and participants engaged with a uniform patient experience. During the study, some technological challenges, such as technical glitches, language processing difficulties and system overload, limited effectiveness, as the literature also reports [45]. Notably, the AI sometimes misperceived the language spoken by participants, confusing Turkish with English and generating incorrect translations. The literature confirms that ChatGPT can struggle with contextual understanding, leading to errors [41, 46]. Additionally, ChatGPT showed a bias in assuming the gender of the user, typically identifying them as male, likely due to patterns in its training data [46, 47]. Although the researchers had memberships, they were affected by limit overages and server overloads, which caused access problems during usage and made the system less accessible at times.

RQ4

Most participants reported that time pressure during problem-solving is challenging and that time plays a crucial role. They emphasised that experience enhances problem-solving skills and is essential for personal development. They particularly focused on clinical reasoning and a systematic approach, and attention to symptoms and laboratory findings was highlighted as crucial for diagnosis. Some participants felt competent in clinical reasoning, some highlighted the advantages of simple thinking, whereas others stated the need for more detailed inquiry. Most participants indicated a lack of experience in crisis management and a desire to develop this skill. Time pressure and effective time management were the most common challenges during crisis management. Additionally, staying calm and conducting effective investigations were considered necessary skills, although some participants mentioned not feeling confident enough in conducting investigations. They noted that they felt challenged by working alone with a real patient for the first time, even though they were actually talking with an AI.

Participants’ recommendations included resolving technological issues, standardizing artificial intelligence, adding a feedback system and incorporating imaging methods into the application. Incorporating more realistic and dynamic patient interactions into the simulation could introduce greater complexity and variability for training.

In medical education, integrating AI tools into traditional teaching methods, particularly in clinical training, can enrich the curriculum. AI serves as a bridge between theoretical knowledge and practical skills, providing dynamic simulations that reflect real-world scenarios. Tools like ChatGPT enable interactive patient simulations accessible to students without logistical limitations, supporting problem and case-based learning with personalized feedback, assessments and adaptive learning experiences. For successful AI integration, educators should provide training on the effective use of AI tools, ensure compliance with privacy regulations, and promote equal access to learning opportunities. AI-powered standardized patient applications, featuring a range of clinical cases, can help medical students develop essential competencies during clinical training and internships.

Limitations

This study provides a unique and innovative exploration of ChatGPT as a virtual standardized patient, highlighting its potential as a valuable tool for clinical training. However, the implementation of ChatGPT faced challenges, including delayed responses and communication glitches, which may have influenced intern physicians’ performance, particularly in high-pressure scenarios like CM. While the sample size is adequate and appropriate for the qualitative nature of the research, it is relatively small, potentially limiting the generalizability of the findings. Furthermore, focusing on two of the most commonly seen cases may have inadvertently heightened stress levels, particularly during CM scenarios. Future research should include larger and more diverse case samples and evaluate the long-term effects of repeated exposure to AI-based simulations, with an emphasis on integrating these tools into the medical curriculum.

Conclusions

The results highlighted a significant difference between self-assessment and actual performance and the need for more accurate self-assessment in medical education. Participants’ self-assessments did not reflect their competence, indicating the need for improved feedback and metacognitive strategies. The intern physicians felt inadequate in problem-solving and crisis management skills under time pressure but stated that this experience contributed to their academic success. AI-supported simulations such as ChatGPT-4o were found useful for standardized patient interactions and for practising clinical competencies, but technological issues, such as disconnection problems and language processing challenges, revealed the need for improvement. AI tools can make a valuable contribution to clinical education by offering economical support for the development of core competencies. Future studies should evaluate the effectiveness of AI simulations over the long term and in larger samples.

Data availability

Data is provided within the manuscript or supplementary information files.

Abbreviations

AI: Artificial Intelligence
CCM: Clinical Case Management
CM: Crisis Management
CR: Clinical Reasoning
NCC: The National Core Curriculum
O: Observation
PS: Problem Solving
SP: Standardized Patient
RQ: Research Question
S: Survey

References

  1. Jackson P, Ponath Sukumaran G, Babu C, Tony MC, Jack DS, Reshma V, et al. Artificial intelligence in medical education-perception among medical students. BMC Med Educ. 2024;24(1):804.

  2. Xu X, Chen Y, Miao J. Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review. J Educational Evaluation Health Professions. 2024;21:6.

  3. OpenAI. Introducing GPT-4o and more tools to ChatGPT free users. OpenAI; 2024 Available from: https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/

  4. Plackett R, Kassianos AP, Mylan S, Kambouri M, Raine R, Sheringham J. The effectiveness of using virtual patient educational tools to improve medical students’ clinical reasoning skills: a systematic review. BMC Med Educ. 2022;22(1):365.

  5. Ruczynski LI, van de Pol MH, Schouwenberg BJ, Laan RF, Fluit CR. Learning clinical reasoning in the workplace: a student perspective. BMC Med Educ. 2022;22(1):19.

  6. Flanagan B, Nestel D, Joseph M. Making patient safety the focus: crisis resource management in the undergraduate curriculum. Med Educ. 2004;38(1):56–66.

  7. Windish DM, Price EG, Clever SL, Magaziner JL, Thomas PA. Teaching medical students the important connection between communication and clinical reasoning. J Gen Intern Med. 2005;20(12):1108–13.

  8. Sauder M, Tritsch T, Rajput V, Schwartz G, Shoja MM. Exploring generative artificial intelligence-assisted medical education: assessing case-based learning for medical students. Cureus. 2024;16(1).

  9. Benfatah M, Marfak A, Saad E, Hilali A, Nejjari C, Youlyouz-Marfak I. Assessing the efficacy of ChatGPT as a virtual patient in nursing simulation training: A study on nursing students’ experience. Teaching and Learning in Nursing; 2024.

  10. Wang SY, Chen CH, Tsai TC. Learning clinical reasoning with virtual patients. Med Educ. 2020;54(5).

  11. Ba H, Zhang L, Yi Z. Enhancing clinical skills in pediatric trainees: a comparative study of ChatGPT-assisted and traditional teaching methods. BMC Med Educ. 2024;24(1):558.

  12. Wang C, Li S, Lin N, Zhang X, Han Y, Wang X, et al. Application of large Language models in Medical Training evaluation—using ChatGPT as a standardized patient: Multimetric Assessment. J Med Internet Res. 2025;27:e59435.

  13. Creswell JW, Poth CN. Qualitative inquiry and research design: choosing among five approaches. Sage; 2016.

  14. Grubu ÇUC. Medical Faculty-National Core Curriculum 2020. Tıp Eğitimi Dünyası. 2020;19(57– 1):1-146.

  15. Breakwell GM, Barnett J, Wright DB. Research methods in psychology. 2020.

  16. Sandelowski M. Sample size in qualitative research. Res Nurs Health. 1995;18(2):179–83.

  17. Tabachnick BG, Fidell LS, Ullman JB. Using multivariate statistics. Boston, MA: Pearson; 2013.

  18. Hsieh H-F, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–88.

  19. Ruffing S, Wach FS, Spinath FM, Brünken R, Karbach J. Learning strategies and general cognitive ability as predictors of gender- specific academic achievement. Front Psychol. 2015;6:1238.

  20. Slater DR, Davies R. Student preferences for Learning resources on a land-based Postgraduate Online Degree Program. Online Learn. 2020;24(1):140–61.

  21. Guo AA, Li J. Harnessing the power of ChatGPT in medical education. Med Teach. 2023;45(9):1063.

  22. Tao W, Yang J, Qu X. Utilization of, perceptions on, and intention to use AI chatbots among medical students in China: National Cross-sectional Study. JMIR Med Educ. 2024;10(1):e57132.

  23. Chawla S, Mithra P, Rekha T, Kumar N, Holla R, Dhawan N, et al. IJCM_411A: motivational factors associated with the choosing medicine as a profession among first year students. Indian J Community Med. 2024;49(Suppl 1):S118.

  24. Piumatti G, Abbiati M, Baroffio A, Gerbase MW. Associations between motivational factors for studying medicine, learning approaches and empathy among medical school candidates. Adv Health Sci Educ. 2019;24:287–300.

  25. Tartas M, Walkiewicz M, Majkowicz M, Budzinski W. Psychological factors determining success in a medical career: a 10-year longitudinal study. Med Teach. 2011;33(3):e163–72.

  26. Papastavrou E, Dimitriadou M, Tsangari H, Andreou C. Nursing students’ satisfaction of the clinical learning environment: a research study. BMC Nurs. 2016;15:1–10.

  27. Park M. The relationships among learning behaviors, major satisfaction, and study skills of first-year medical students. Korean J Med Educ. 2011;23(2):83–93.

  28. Odacı H, Çıkrıkçı N, İrem Değerli F. The role of problem-solving skills in career decision-making self-efficacy and vocational outcome expectations. Int J Educational Reform. 2023;32(4):448–63.

  29. Jamali R, Moslemi N, Khabaz Mafinejad M, Alizadeh M, Shariat Moharari R. Medical students’ satisfaction with a web-based Training Module of clinical reasoning. Strides Dev Med Educ. 2020;17(1):1–5.

  30. Alzaabi S, Nasaif M, Khamis AH, Otaki F, Zary N, Mascarenhas S. Medical students’ perception and perceived value of peer learning in undergraduate clinical skill development and assessment: mixed methods study. JMIR Med Educ. 2021;7(3):e25875.

  31. Chamberland M, Mamede S, St-Onge C, Setrakian J, Schmidt HG. Does medical students’ diagnostic performance improve by observing examples of self-explanation provided by peers or experts? Adv Health Sci Educ. 2015;20:981–93.

  32. Hamad SMS, Iqbal S, Alothri AM, Alghamadi MAA, Elhelow MKKA. To teach is to learn twice added value of peer learning among medical students during COVID-19 pandemic. MedEdPublish. 2020;9:127.

  33. Rastegar Kazerooni A, Amini M, Tabari P, Moosavi M. Peer mentoring for medical students during the COVID-19 pandemic via a social media platform. Med Educ. 2020;54(8).

  34. Tanveer MA, Mildestvedt T, Skjærseth IG, Arntzen HH, Kenne E, Bonnevier A et al. Peer teaching in undergraduate medical education: what are the learning outputs for the student-teachers? A systematic review. Adv Med Educ Pract. 2023;14:723–39.

  35. Elstein AS, Schwarz A. Clinical problem solving and diagnostic decision making: selective review of the cognitive literature. BMJ. 2002;324(7339):729–32.

  36. Koufidis C, Manninen K, Nieminen J, Wohlin M, Silén C. Grounding judgement in context: a conceptual learning model of clinical reasoning. Med Educ. 2020;54(11):1019–28.

  37. Nguyen NN, Le TT, Thi Nguyen B-P, Nguyen A. Examining effects of students’ innovative behaviour and problem-solving skills on crisis management self-efficacy: policy implications for higher education. Policy Futures Educ. 2024;22(1):1–20.

  38. Levin M, Cennimo D, Chen S, Lamba S. Teaching clinical reasoning to medical students: a case-based illness script worksheet approach. MedEdPORTAL. 2016;12:10445.

  39. Wu B, Wang M, Johnson JM, Grotzer TA. Improving the learning of clinical reasoning through computer-based cognitive representation. Med Educ Online. 2014;19:25940.

  40. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9):S63–7.

  41. Bradley CS, Dreifuerst KT, Johnson BK, Loomis A. More than a meme: the Dunning-Kruger effect as an opportunity for positive change in nursing education. Clin Simul Nurs. 2022;66:58–65.

  42. Dunning D. The Dunning–Kruger effect: On being ignorant of one’s own ignorance. Adv Exp Soc Psychol. 2011;44:247– 96.

  43. Gesundheit N, Brutlag P, Youngblood P, Gunning WT, Zary N, Fors U. The use of virtual patients to assess the clinical skills and reasoning of medical students: initial insights on student acceptance. Med Teach. 2009;31(8):739–42.

  44. Johnson G, Flagler S. Web-based unfolding cases: a strategy to enhance and evaluate clinical reasoning skills. J Nurs Educ. 2013;52(10):589–92.

  45. Chan KS, Zary N. Applications and Challenges of Implementing Artificial Intelligence in Medical Education: integrative review. JMIR Med Educ. 2019;5(1):e13930.

  46. Chowdhury MN-U-R, Haque A, editors. ChatGPT: its applications and limitations. 3rd International Conference on Intelligent Technologies (CONIT); 2023. IEEE.

  47. Biswas S. The function of Chat GPT in social media: according to ChatGPT. Available at SSRN 4405389. 2023.

Acknowledgements

The author would like to thank the participants for their participation in this study.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

SÖ: conception, design of the project, observation, manuscript writing; FT: technical observation, quantitative data analysis, manuscript writing; HHÜ: observation, qualitative data analysis; SÖ, FT and HHÜ: data collection, manuscript editing, critical revision of the article.

Corresponding author

Correspondence to Selcen Öncü.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Non-Interventional Ethics Committee of the Faculty of Medicine, Aydın Adnan Menderes University (Prot.No:2024/121). The participants were informed about the survey’s voluntary nature and use for research purposes before their participation and informed consent was obtained. The participants were assured that their findings would remain confidential. We confirm that all methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12909_2025_6877_MOESM1_ESM.doc

Supplementary Material 1: Self-assessment Survey on Intern Physicians’ Competencies in Clinical Case Management Using AI Assisted Case Analysis.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Öncü, S., Torun, F. & Ülkü, H.H. AI-powered standardised patients: evaluating ChatGPT-4o’s impact on clinical case management in intern physicians. BMC Med Educ 25, 278 (2025). https://doi.org/10.1186/s12909-025-06877-6
