A cross sectional investigation of ChatGPT-like large language models application among medical students in China

BMC Medical Education

Table 3 Perceptions of medical students towards the shortcomings of large language models

	Gender¹		Understanding of large language models²			Degree of trust in the information provided by large language models²
	Male	Female	Never heard	Heard but never used	Heard and used	Low trust	Moderate trust	High trust
Offering ineffective assistance
Agree	320 (39.0)	261 (29.1)	94	256	231	132	414	35
Disagree	500 (61.0)	637 (70.9)	207	568	362	126	912	99
χ²	18.90		10.68			42.22
p-value	< 0.001		0.005			< 0.001
Lacking the ability to reason through complex issues
Agree	462 (56.3)	472 (52.6)	125	454	355	169	720	45
Disagree	358 (43.7)	426 (47.4)	176	370	238	89	606	89
χ²	2.47		27.40			36.23
p-value	0.12		< 0.001			< 0.001
Results lack interpretability
Agree	426 (52.0)	420 (46.8)	117	426	303	167	633	46
Disagree	394 (48.0)	478 (53.2)	184	398	290	91	693	88
χ²	4.60		15.76			37.88
p-value	0.03		< 0.001			< 0.001
Fabricating content haphazardly
Agree	285 (34.8)	224 (24.9)	78	227	204	122	353	34
Disagree	535 (65.2)	674 (75.1)	223	597	389	136	973	100
χ²	19.79		10.18			45.50
p-value	< 0.001		0.006			< 0.001
Limited intelligence and understanding of things
Agree	484 (59.0)	456 (50.8)	112	447	381	166	719	55
Disagree	336 (41.0)	442 (49.2)	189	377	212	92	607	79
χ²	11.76		59.06			19.88
p-value	< 0.001		< 0.001			< 0.001
Superficial content with a strong sense of patchiness
Agree	477 (58.2)	459 (51.1)	113	450	373	172	720	44
Disagree	343 (41.8)	439 (48.9)	188	374	220	86	606	90
χ²	8.61		51.79			40.78
p-value	0.003		< 0.001			< 0.001
Unable to grasp core issues within context
Agree	525 (64.0)	507 (56.5)	131	517	383	178	797	56
Disagree	296 (36.1)	391 (43.5)	170	307	210	80	529	78
χ²	9.90		41.84			27.21
p-value	0.002		< 0.001			< 0.001
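
As a rough illustration of how the chi-square statistics in Table 3 can be obtained from the counts, the sketch below runs a Pearson chi-square test of independence on one row block of the table: the "Offering ineffective assistance" counts split by understanding of large language models. It assumes an uncorrected Pearson test, which the table itself does not state, and the function names `chi_square` and `p_value` are hypothetical helpers introduced only for this example. Under those assumptions the statistic should agree with the reported value of 10.68 up to rounding.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction).

    `table` is a list of rows of observed counts.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail chi-square probability, using closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Counts taken from Table 3, "Offering ineffective assistance".
# Columns: never heard, heard but never used, heard and used.
ineffective_by_understanding = [
    [94, 256, 231],   # Agree
    [207, 568, 362],  # Disagree
]

stat, df = chi_square(ineffective_by_understanding)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value(stat, df):.4f}")
```

The same two functions apply to the gender comparisons (2 x 2 tables, df = 1) and to the trust comparisons (2 x 3 tables, df = 2). For tables with more degrees of freedom, a statistics library would be the more practical choice.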

ISSN: 1472-6920