Table 2 Comparison of evaluators

From: Is AI the future of evaluation in medical education?? AI vs. human evaluation in objective structured clinical examination

| Analysis | Evaluator | N | M(Sd) | Test statistic | p | Significant Difference |
|---|---|---|---|---|---|---|
| First Analysis | H1 | 43 | 24.33(3.2) | 15.023* | < 0.001 | H1 < AI1; H1 < AI2; H2 < AI1; RT < AI1 |
| | H2 | 43 | 25.56(3.4) | | | |
| | RT | 43 | 25.86(2.7) | | | |
| | AI1 | 43 | 29.02(1.3) | | | |
| | AI2 | 43 | 27.44(4.1) | | | |
| Second Analysis | (1) Human Evaluating from Video | 86 | 24.94(3.3) | 24.665* | < 0.001 | 1 < 3; 2 < 3 |
| | (2) Real Time Human Evaluator | 43 | 25.86(2.7) | | | |
| | (3) Artificial Intelligence Evaluator | 86 | 28.23(3.1) | | | |
| Third Analysis | (1) Human Evaluating | 129 | 25.25(3.2) | -6.822** | < 0.001 | 1 < 2 |
| | (2) Artificial Intelligence Evaluator | 86 | 28.23(3.1) | | | |

H1: Human Evaluating from Video 1, H2: Human Evaluating from Video 2, RT: Real Time Human Evaluator, AI1: ChatGPT, AI2: Gemini Flash, M: Mean, Sd: Standard Deviation, *: ANOVA test (F statistic), **: Independent-samples t-test (t statistic)
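As a rough consistency check, the test statistics can be recomputed from the table's own N, M and Sd values. The sketch below is an illustration, not the authors' analysis: it assumes a between-groups one-way ANOVA (treating each evaluator's scores as an independent group) and shows both Student's pooled-variance and Welch's t-test, since the table does not state which variant was used. The helper names `one_way_anova_f` and `two_sample_t` are hypothetical. Because the means and SDs are rounded, the results should only approximate the reported values.

```python
import math


def one_way_anova_f(groups):
    """Hypothetical helper: one-way ANOVA F from (n, mean, sd) per group.

    Assumes independent groups (between-subjects design)."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: weighted squared deviations of group means
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares recovered from each group's SD
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def two_sample_t(n1, m1, sd1, n2, m2, sd2, equal_var=True):
    """Hypothetical helper: independent-samples t from summary statistics."""
    if equal_var:
        # Student's t with pooled variance
        sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # Welch's t with unpooled variances
        se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se


# (n, mean, sd) taken from Table 2
first = [(43, 24.33, 3.2), (43, 25.56, 3.4), (43, 25.86, 2.7),
         (43, 29.02, 1.3), (43, 27.44, 4.1)]
second = [(86, 24.94, 3.3), (43, 25.86, 2.7), (86, 28.23, 3.1)]

f_first = one_way_anova_f(first)
f_second = one_way_anova_f(second)
t_student = two_sample_t(129, 25.25, 3.2, 86, 28.23, 3.1)
t_welch = two_sample_t(129, 25.25, 3.2, 86, 28.23, 3.1, equal_var=False)
print(f"{f_first:.2f} {f_second:.2f} {t_student:.2f} {t_welch:.2f}")
# prints: 14.93 25.02 -6.77 -6.82
```

These values land close to the reported 15.023, 24.665 and -6.822; the small gaps are consistent with rounding in the published means and SDs. The pooled groups also check out arithmetically: the Second Analysis "Human Evaluating from Video" row (N = 86, M = 24.94) is the combination of H1 and H2, and the Third Analysis human row (N = 129) combines H1, H2 and RT.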