Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study

Table 1 Performance of GPT-4o, GPT-4, GPT-3.5 and Google Bard on the USMLE, PLAB, HKMLE and NMLE.

Examination	Section	GPT-4o (n/N, %)	GPT-4 (n/N, %)	GPT-3.5 (n/N, %)	Google Bard (n/N, %)
Overall		538/592 (90.9%)	515/591 (87.1%)	364/542 (67.2%)	314/516 (60.9%)
USMLE	Step 1 (119)	108/118 (91.5%)	109/117 (93.2%)	61/93 (65.6%)	73/114 (64.0%)
	Step 2CK (120)	113/120 (94.2%)	114/120 (95.0%)	78/109 (71.6%)	50/90 (55.6%)
	Step 3 (137)	127/137 (92.7%)	126/137 (92.0%)	85/124 (68.5%)	61/105 (58.1%)
PLAB (30)		28/30 (93.3%)	26/30 (86.7%)	24/30 (80.0%)	13/24 (54.2%)
HKMLE (48)		44/48 (91.7%)	43/48 (89.6%)	32/47 (68.1%)	33/46 (71.7%)
NMLE (139)		118/139 (84.9%)	97/139 (69.8%)	84/139 (60.4%)	84/137 (61.3%)

USMLE = United States Medical Licensing Examination; CK = Clinical Knowledge; PLAB = Professional and Linguistic Assessments Board; HKMLE = Hong Kong Medical Licensing Examination; NMLE = National Medical Licensing Examination
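The Overall row is a pooled accuracy: the per-section numerators and denominators sum exactly to the overall n/N for every model, so the overall percentage is not an average of the section percentages. The minimal Python sketch below recomputes the pooled figures from the counts transcribed from Table 1. The `TABLE1` layout and the `pct` and `pooled_accuracy` helper names are illustrative assumptions, not part of the study.

```python
# Counts (n correct, N) transcribed from Table 1, in the order:
# USMLE Step 1, USMLE Step 2CK, USMLE Step 3, PLAB, HKMLE, NMLE.
# The data layout and helper names are illustrative, not from the study.
TABLE1 = {
    "GPT-4o": [(108, 118), (113, 120), (127, 137), (28, 30), (44, 48), (118, 139)],
    "GPT-4": [(109, 117), (114, 120), (126, 137), (26, 30), (43, 48), (97, 139)],
    "GPT-3.5": [(61, 93), (78, 109), (85, 124), (24, 30), (32, 47), (84, 139)],
    "Google Bard": [(73, 114), (50, 90), (61, 105), (13, 24), (33, 46), (84, 137)],
}


def pct(n, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * n / total, 1)


def pooled_accuracy(rows):
    """Sum n and N across sections first, then divide (pooled accuracy)."""
    n = sum(correct for correct, _ in rows)
    total = sum(denom for _, denom in rows)
    return n, total, pct(n, total)


if __name__ == "__main__":
    for model, rows in TABLE1.items():
        n, total, p = pooled_accuracy(rows)
        print(f"{model}: {n}/{total} ({p}%)")
```

Running it prints one line per model matching the Overall row, for example `GPT-4o: 538/592 (90.9%)`. Pooling weights each section by its number of questions, so the larger USMLE and NMLE sections influence the overall figure more than the 30-question PLAB set.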

ISSN: 1472-6920