
Table 4 Pairwise comparisons of large language models for case-based questions

From: Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis

|            | Gemini 1.5 | Gemini 2 | Copilot | Deepseek | Claude | ChatGPT 4o | ChatGPT 4 | ChatGPT o1 |
|------------|------------|----------|---------|----------|--------|------------|-----------|------------|
| Gemini 1.5 | -          | 0.525    | 0.315   | 0.195    | 0.687  | 0.315      | 0.019     | 0.389      |
| Gemini 2   | 0.525      | -        | 0.517   | 0.345    | 0.446  | 0.517      | 0.043     | 0.227      |
| Copilot    | 0.315      | 0.517    | -       | 0.765    | 0.164  | 1.000      | 0.162     | 0.070      |
| Deepseek   | 0.195      | 0.345    | 0.765   | -        | 0.094  | 0.764      | 0.269     | 0.037      |
| Claude     | 0.687      | 0.446    | 0.164   | 0.094    | -      | 0.164      | 0.002     | 0.640      |
| ChatGPT 4o | 0.315      | 0.517    | 1.000   | 0.764    | 0.164  | -          | 0.162     | 0.070      |
| ChatGPT 4  | 0.019      | 0.043    | 0.162   | 0.269    | 0.002  | 0.162      | -         | 0.002      |
| ChatGPT o1 | 0.389      | 0.227    | 0.070   | 0.037    | 0.640  | 0.070      | 0.002     | -          |

*p < 0.0031 (Bonferroni correction)
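As a quick check, the pairwise p-values above can be screened against the Bonferroni-corrected significance threshold given in the footnote (p < 0.0031). The sketch below is illustrative only: the matrix is transcribed from Table 4, and the variable names (`models`, `p`, `ALPHA`) are our own, not from the original study.

```python
# Flag significant pairwise comparisons at the Bonferroni-corrected
# threshold reported in the table footnote (p < 0.0031).
models = ["Gemini 1.5", "Gemini 2", "Copilot", "Deepseek",
          "Claude", "ChatGPT 4o", "ChatGPT 4", "ChatGPT o1"]

# Symmetric p-value matrix transcribed from Table 4 (None on the diagonal).
p = [
    [None,  0.525, 0.315, 0.195, 0.687, 0.315, 0.019, 0.389],
    [0.525, None,  0.517, 0.345, 0.446, 0.517, 0.043, 0.227],
    [0.315, 0.517, None,  0.765, 0.164, 1.000, 0.162, 0.070],
    [0.195, 0.345, 0.765, None,  0.094, 0.764, 0.269, 0.037],
    [0.687, 0.446, 0.164, 0.094, None,  0.164, 0.002, 0.640],
    [0.315, 0.517, 1.000, 0.764, 0.164, None,  0.162, 0.070],
    [0.019, 0.043, 0.162, 0.269, 0.002, 0.162, None,  0.002],
    [0.389, 0.227, 0.070, 0.037, 0.640, 0.070, 0.002, None],
]

ALPHA = 0.0031  # Bonferroni-corrected significance level from the footnote

# The matrix is symmetric, so keep each unordered pair once (upper triangle).
significant = [(models[i], models[j], p[i][j])
               for i in range(len(models))
               for j in range(i + 1, len(models))
               if p[i][j] < ALPHA]

for a, b, val in significant:
    print(f"{a} vs {b}: p = {val}")
```

At this threshold only the ChatGPT 4 comparisons with Claude and with ChatGPT o1 (both p = 0.002) fall below 0.0031; nominally small values such as 0.019 and 0.037 do not survive the correction.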