
Table 6 Comparison of evaluators

From: Is AI the future of evaluation in medical education?? AI vs. human evaluation in objective structured clinical examination

| Analysis | Evaluator | N | M(Sd) | Test | p | Significant Difference |
| --- | --- | --- | --- | --- | --- | --- |
| First Analysis | H1 | 47 | 15.19(2.6) | 20.052* | 0.001 | H1 < H2; H1 < AI1; H2 > RT; H2 > AI2; RT < AI1; AI1 > AI2 |
| | H2 | 47 | 18.74(1.0) | | | |
| | RT | 47 | 15.49(2.9) | | | |
| | AI1 | 47 | 17.81(0.9) | | | |
| | AI2 | 47 | 16.30(3.2) | | | |
| Second Analysis | (1) Human Evaluating from Video | 94 | 16.97(2.6) | 6.312* | 0.002 | 1 > 2; 2 < 3 |
| | (2) Real Time Human Evaluator | 47 | 15.49(2.9) | | | |
| | (3) Artificial Intelligence Evaluator | 94 | 17.05(2.5) | | | |
| Third Analysis | (1) Human Evaluating | 141 | 16.48(2.8) | -1.620** | 0.107 | None |
| | (2) Artificial Intelligence Evaluator | 94 | 17.05(2.5) | | | |

H1: Human Evaluating from Video 1; H2: Human Evaluating from Video 2; RT: Real Time Human Evaluator; AI1: ChatGPT; AI2: Gemini Flash; M: Mean; Sd: Standard Deviation; *ANOVA Test; **Independent Sample T-Test
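
The test statistics in Table 6 can be roughly cross-checked from the published N, mean, and Sd values alone. The sketch below is not the authors' analysis code. It assumes a one-way between-groups ANOVA and a pooled-variance (Student) independent-samples t-test, neither of which the table states explicitly, and the function names `anova_f_from_summary` and `pooled_t_from_summary` are hypothetical helpers introduced only for illustration.

```python
from math import sqrt


def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) triples.

    Assumes independent groups; this is an assumption, not something
    the table confirms.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pooled_t_from_summary(group_a, group_b):
    """Student t statistic (pooled variance) from two (n, mean, sd) triples."""
    (n1, m1, s1), (n2, m2, s2) = group_a, group_b
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# (N, M, Sd) values copied from Table 6.
first_analysis = [
    (47, 15.19, 2.6),  # H1
    (47, 18.74, 1.0),  # H2
    (47, 15.49, 2.9),  # RT
    (47, 17.81, 0.9),  # AI1
    (47, 16.30, 3.2),  # AI2
]
second_analysis = [
    (94, 16.97, 2.6),  # (1) human, from video
    (47, 15.49, 2.9),  # (2) real-time human
    (94, 17.05, 2.5),  # (3) AI
]
human_all = (141, 16.48, 2.8)
ai_all = (94, 17.05, 2.5)

f_first = anova_f_from_summary(first_analysis)
f_second = anova_f_from_summary(second_analysis)
t_third = pooled_t_from_summary(human_all, ai_all)

print(f"First analysis  F = {f_first:.2f} (reported 20.052)")
print(f"Second analysis F = {f_second:.2f} (reported 6.312)")
print(f"Third analysis  t = {t_third:.2f} (reported -1.620)")
```

Under these assumptions the recomputed values come out at about 20.07, 6.33, and -1.59. That is close to the reported statistics, and the small gaps are consistent with the means and Sd values having been rounded for publication.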