
Table 1. Performance metrics for all evaluated strategies on the EUNACOM exam. Mean scores, standard deviations (SD), API calls, and mean completion time (in seconds) are shown.

From: Performance of single-agent and multi-agent language models in Spanish language medical competency exams

| Category | Strategy | Accuracy (Mean % ± SD) | API Calls | Time (s) |
|---|---|---|---|---|
| Single-agent | CoT + Few-Shot | 87.67% ± 0.12% | 1.00 | 1.74 |
| | Few-Shot | 86.88% ± 0.40% | 1.00 | 1.61 |
| | CoT | 86.86% ± 0.37% | 1.00 | 2.26 |
| | MEDPROMPT | 86.96% ± 0.44% | 1.00 | 2.95 |
| | SELF-REFLECTION | 85.38% ± 0.22% | 2.65 | 4.15 |
| | ZERO-SHOT | 85.90% ± 0.32% | 1.00 | 1.53 |
| Multi-agent | MDAGENTS | 89.97% ± 0.56% | 21.14 | 192.44 |
| | MEDAGENTS | 87.99% ± 0.49% | 17.00 | 63.95 |
| | VOTING | 87.22% ± 0.31% | 6.00 | 12.51 |
| | BORDA COUNT | 86.70% ± 0.18% | 6.00 | 13.03 |
| | Weighted Voting | 86.68% ± 0.18% | 6.00 | 12.43 |