Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis

BMC Oral Health

Table 7 Pairwise comparisons of large language models for mucosal and tongue diseases

	Gemini 1.5	Gemini 2	Copilot	Deepseek	Claude	ChatGPT 4o	ChatGPT 4	ChatGPT o1
Gemini 1.5	-	0.688	0.015	0.640	0.640	0.688	0.053	0.076
Gemini 2	0.688	-	0.037	0.389	0.389	1.000	0.117	0.038
Copilot	0.015	0.037	-	0.002	0.002	0.037	0.584	<0.001
Deepseek	0.640	0.389	0.002	-	1.000	0.389	0.020	0.150
Claude	0.640	0.389	0.002	1.000	-	0.389	0.020	0.150
ChatGPT 4o	0.688	1.000	0.037	0.389	0.389	-	0.117	0.038
ChatGPT 4	0.053	0.117	0.584	0.020	0.020	0.117	-	0.001
ChatGPT o1	0.076	0.038	<0.001	0.150	0.150	0.038	0.001	-

ISSN: 1472-6831