Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis

BMC Oral Health

Table 2 Pairwise comparisons between large language models for all questions

	Gemini 1.5	Gemini 2	Copilot	Deepseek	Claude	ChatGPT 4o	ChatGPT 4	ChatGPT o1
Gemini 1.5	-	0.856	0.002	0.856	0.577	0.724	0.050	0.001
Gemini 2	0.856	-	0.000	0.573	0.425	0.592	0.033	0.002
Copilot	0.002	0.000	-	0.001	0.000	0.004	0.150	0.000
Deepseek	0.856	0.573	0.001	-	0.707	0.592	0.024	0.001
Claude	0.577	0.425	0.000	0.707	-	0.363	0.012	0.005
ChatGPT 4o	0.724	0.592	0.004	0.592	0.363	-	0.107	0.000
ChatGPT 4	0.050	0.033	0.150	0.024	0.012	0.107	-	0.000
ChatGPT o1	0.001	0.002	0.000	0.001	0.005	0.000	0.000	-
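The caption does not say what the entries are. They read like p-values from pairwise tests, but that is an assumption here, and so is the 0.05 threshold used below. Under those assumptions, the following minimal Python sketch stores only the upper triangle of the symmetric matrix (28 unique pairs for 8 models) and lists the pairs that fall strictly below the threshold. The helper names `pairwise`, `lookup` and `below` are hypothetical. Entries printed as 0.000 are stored as 0.0.

```python
# Model order as in the table header.
MODELS = ["Gemini 1.5", "Gemini 2", "Copilot", "Deepseek",
          "Claude", "ChatGPT 4o", "ChatGPT 4", "ChatGPT o1"]

# Upper triangle of Table 2, row by row, diagonal omitted.
# Assumption: entries are p-values (the caption does not say so).
UPPER = [
    [0.856, 0.002, 0.856, 0.577, 0.724, 0.050, 0.001],  # Gemini 1.5
    [0.000, 0.573, 0.425, 0.592, 0.033, 0.002],         # Gemini 2
    [0.001, 0.000, 0.004, 0.150, 0.000],                # Copilot
    [0.707, 0.592, 0.024, 0.001],                       # Deepseek
    [0.363, 0.012, 0.005],                              # Claude
    [0.107, 0.000],                                     # ChatGPT 4o
    [0.000],                                            # ChatGPT 4
]


def pairwise(models, upper):
    """Expand the upper triangle into a dict keyed by (row model, column model)."""
    pairs = {}
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + 1 + offset  # column index to the right of the diagonal
            pairs[(models[i], models[j])] = value
    return pairs


def lookup(pairs, a, b):
    """Return the entry for a pair in either order (the matrix is symmetric)."""
    return pairs[(a, b)] if (a, b) in pairs else pairs[(b, a)]


def below(pairs, alpha):
    """Pairs whose entry is strictly below alpha (hypothetical threshold)."""
    return sorted(pair for pair, value in pairs.items() if value < alpha)


if __name__ == "__main__":
    pairs = pairwise(MODELS, UPPER)
    flagged = below(pairs, 0.05)
    print(len(pairs), len(flagged))  # → 28 15
    for a, b in flagged:
        print(f"{a} vs {b}: {lookup(pairs, a, b):.3f}")
```

The comparison is strict, so the Gemini 1.5 vs ChatGPT 4 entry (0.050) is not flagged. Whether a multiple-comparison correction should apply across the 28 pairs is not stated in the table and is left open here.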

ISSN: 1472-6831