
Table 5 Mean, standard deviation values, and Inter-Rater reliability level of evaluators

From: Is AI the future of evaluation in medical education?? AI vs. human evaluation in objective structured clinical examination

| Basic Life Support Criteria | Inter-rater reliability: α | Inter-rater reliability: κ | H1, M (Sd) | H2, M (Sd) | RT, M (Sd) | AI1, M (Sd) | AI2, M (Sd) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Checked the safety of the environment, themselves, and the patient (verbally stating this is sufficient). | 0.267 | 0.100 | 0.94 (0.9) | 1.51 (0.5) | 0.96 (0.9) | 2 (0.0) | 1.89 (0.4) |
| Gently touched the patient/injured person’s shoulders and asked, “How are you? Are you okay?” (verbally stating this is sufficient). | 0.042 | 0.038 | 1.87 (0.5) | 1.96 (0.2) | 1.94 (0.3) | 1.79 (0.4) | 1.98 (0.2) |
| If the patient is unconscious, called for help from the environment and gave the command “Call 112” to someone (verbally stating this is sufficient). | 0.017 | 0.161 | 1.87 (0.5) | 1.96 (0.2) | 1.77 (0.5) | 1.81 (0.4) | 1.89 (0.4) |
| Checked the mouth, opened the airway using the “head-tilt, chin-lift” maneuver. | 0.015 | 0.023 | 1.70 (0.6) | 1.94 (0.3) | 1.74 (0.5) | 1.72 (0.5) | 1.81 (0.5) |
| Assessed breathing for no more than 10 s using the “look, listen, feel” method and checked for pulse at the carotid artery. | 0.051 | 0.024 | 1.60 (0.7) | 1.96 (0.2) | 1.57 (0.7) | 1.87 (0.3) | 1.68 (0.6) |
| Performed effective and correct chest compressions (correct hand position, correct compression point, correct depth, correct speed, and allowing chest recoil). | 0.293 | -0.014 | 1.55 (0.6) | 2 (0.0) | 1.34 (0.7) | 1.66 (0.5) | 1 (0.6) |
| After 30 chest compressions, effectively gave 2 rescue breaths with proper head-tilt and chin-lift position (closed the patient’s nostrils while giving breaths). | 0.151 | 0.071 | 1.87 (0.3) | 1.98 (0.2) | 1.49 (0.5) | 1.55 (0.5) | 1.68 (0.5) |
| Minimized interruptions in chest compressions. | 0.046 | 0.042 | 1.53 (0.9) | 2 (0.0) | 1.64 (0.6) | 1.83 (0.4) | 1.70 (0.6) |
| Continued performing chest compressions and rescue breaths in a 30/2 ratio for two minutes (or stated that they should do so). | 0.143 | -0.033 | 1.68 (0.7) | 2 (0.0) | 1.77 (0.5) | 1.96 (0.2) | 1.36 (0.8) |
| Checked the patient’s breathing and pulse every two minutes (verbally stating this is sufficient). | 0.163 | -0.003 | 0.57 (0.8) | 1.45 (0.5) | 1.28 (0.8) | 1.62 (0.7) | 1.30 (0.9) |

Note: H1: human evaluating from video 1; H2: human evaluating from video 2; RT: real-time human evaluator; AI1: ChatGPT; AI2: Gemini Flash; M: mean; Sd: standard deviation; α: Krippendorff’s alpha; κ: Fleiss’ kappa.
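
For readers unfamiliar with the κ column, the following is a minimal Python sketch of how Fleiss' kappa is computed for one criterion. It is not the authors' analysis code. It assumes each criterion is scored 0, 1, or 2 (suggested by the means, none of which exceeds 2), and the ratings are made-up toy values; `fleiss_kappa` and `toy_ratings` are hypothetical names introduced only for illustration.

```python
def fleiss_kappa(ratings, categories=(0, 1, 2)):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds every rater's score for subject i.
    categories: the possible scores (ASSUMED here to be 0, 1, 2).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who gave subject i the j-th category
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over subjects
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of each category
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # All ratings fall in one category; kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (NOT from the study): 4 students, each scored by 5 evaluators
toy_ratings = [
    [2, 2, 2, 2, 2],
    [2, 2, 1, 2, 2],
    [2, 2, 2, 2, 2],
    [1, 2, 2, 2, 2],
]
print(round(fleiss_kappa(toy_ratings), 3))
```

In this toy data the evaluators agree on most scores, yet κ comes out slightly negative because almost every rating falls in the top category, so chance agreement is already very high. Whether the same effect contributes to the low values in Table 5 cannot be determined from the table alone.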