
Table 5 Pairwise comparisons of large language models for knowledge-based questions

From: Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis

 

| Model | Gemini 1.5 | Gemini 2 | Copilot | Deepseek | Claude | ChatGPT 4o | ChatGPT 4 | ChatGPT o1 |
|---|---|---|---|---|---|---|---|---|
| Gemini 1.5 | - | 0.673 | 0.002 | 0.271 | 0.673 | 0.835 | 0.432 | 0.001 |
| Gemini 2 | 0.673 | - | 0.001 | 0.494 | 1.000 | 0.831 | 0.228 | 0.002 |
| Copilot | 0.002 | 0.001 | - | 0.000 | 0.000 | 0.001 | 0.023 | 0.000 |
| Deepseek | 0.271 | 0.494 | 0.000 | - | 0.494 | 0.370 | 0.061 | 0.016 |
| Claude | 0.673 | 1.000 | 0.000 | 0.494 | - | 0.831 | 0.228 | 0.002 |
| ChatGPT 4o | 0.835 | 0.831 | 0.001 | 0.370 | 0.831 | - | 0.320 | 0.001 |
| ChatGPT 4 | 0.432 | 0.228 | 0.023 | 0.061 | 0.228 | 0.320 | - | 0.000 |
| ChatGPT o1 | 0.001 | 0.002 | 0.000 | 0.016 | 0.002 | 0.001 | 0.000 | - |

*p < 0.0031 (Bonferroni correction)
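The footnote's significance threshold follows from a Bonferroni correction, which divides the family-wise significance level by the number of tests performed. A minimal sketch of that calculation, assuming an alpha of 0.05 and 16 tests (these counts are assumptions, not stated on this page: the reported 0.0031 matches 0.05/16, whereas an all-pairs design among the 8 models here would give 28 tests and a threshold near 0.0018):

```python
# Bonferroni correction: divide the family-wise alpha by the number of
# hypothesis tests, so the chance of any false positive stays below alpha.
alpha = 0.05    # assumed family-wise significance level
n_tests = 16    # assumed test count; 0.05 / 16 reproduces the 0.0031 threshold

threshold = alpha / n_tests
print(round(threshold, 4))  # 0.0031
```

Under this threshold, a table entry such as Copilot vs. Gemini 1.5 (p = 0.002) would count as significant, while Deepseek vs. ChatGPT o1 (p = 0.016) would not.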