Table 4 Comparison of evaluators

From: Is AI the future of evaluation in medical education?? AI vs. human evaluation in objective structured clinical examination

| Analysis | Evaluator | N | M(Sd) | Test | p | Significant Difference |
| --- | --- | --- | --- | --- | --- | --- |
| First Analysis | H1 | 58 | 10.55(5.8) | 27.268* | < 0.001 | H1 < AI1; H1 < AI2; H2 < RT; H2 < AI1; H2 < AI2; RT < AI1 |
| | H2 | 58 | 8.33(7) | | | |
| | RT | 58 | 12.45(5.9) | | | |
| | AI1 | 58 | 17.10(0.9) | | | |
| | AI2 | 58 | 15.03(3.4) | | | |
| Second Analysis | (1) Human Evaluating from Video | 116 | 9.44(6.5) | 47.968* | < 0.001 | 1 < 2; 1 < 3; 2 < 3 |
| | (2) Real Time Human Evaluator | 58 | 12.45(5.9) | | | |
| | (3) Artificial Intelligence Evaluator | 116 | 16.07(2.7) | | | |
| Third Analysis | (1) Human Evaluating | 174 | 10.44(6.4) | -8.913** | < 0.001 | 1 < 2 |
| | (2) Artificial Intelligence Evaluator | 116 | 16.07(2.7) | | | |

H1: Human Evaluating from Video 1, H2: Human Evaluating from Video 2, RT: Real Time Human Evaluator, AI1: ChatGPT, AI2: Gemini Flash, M: Mean, Sd: Standard Deviation, *ANOVA Test, **Independent Sample T-Test
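
The test statistics in Table 4 can be roughly cross-checked from the reported N, M and Sd values alone. The sketch below is illustrative and is not the authors' analysis code. It assumes that "ANOVA Test" means a plain one-way (between-groups) ANOVA and that "Independent Sample T-Test" means Student's pooled-variance t-test; the table does not specify either. The function names `one_way_anova_f` and `pooled_t` are hypothetical helpers introduced here. Because the inputs are rounded to one or two decimals, the results should land near the reported 27.268, 47.968 and -8.913 without matching them exactly.

```python
import math


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA from (n, mean, sd) summaries.

    Assumption: independent groups, sd is the sample standard deviation.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Student's t statistic (equal variances assumed) from summaries."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_err


# (N, M, Sd) rows copied from Table 4.
first_analysis = [
    (58, 10.55, 5.8),   # H1
    (58, 8.33, 7.0),    # H2
    (58, 12.45, 5.9),   # RT
    (58, 17.10, 0.9),   # AI1
    (58, 15.03, 3.4),   # AI2
]
second_analysis = [
    (116, 9.44, 6.5),   # (1) Human Evaluating from Video
    (58, 12.45, 5.9),   # (2) Real Time Human Evaluator
    (116, 16.07, 2.7),  # (3) Artificial Intelligence Evaluator
]

f_first = one_way_anova_f(first_analysis)
f_second = one_way_anova_f(second_analysis)
# Third analysis: (1) Human Evaluating vs (2) Artificial Intelligence Evaluator.
t_third = pooled_t(174, 10.44, 6.4, 116, 16.07, 2.7)

print(f"First analysis  F = {f_first:.2f} (reported 27.268)")
print(f"Second analysis F = {f_second:.2f} (reported 47.968)")
print(f"Third analysis  t = {t_third:.2f} (reported -8.913)")
```

Working from summary statistics keeps the check independent of the raw scores, which are not part of this table. The small gaps to the reported values are consistent with rounding of M and Sd, though they do not by themselves confirm which test variants were used.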