
Table 7 Pairwise comparisons of large language models for mucosal and tongue diseases

From: Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis

 

| | Gemini 1.5 | Gemini 2 | Copilot | Deepseek | Claude | ChatGPT 4o | ChatGPT 4 | ChatGPT o1 |
|---|---|---|---|---|---|---|---|---|
| Gemini 1.5 | - | 0.688 | 0.015 | 0.640 | 0.640 | 0.688 | 0.053 | 0.076 |
| Gemini 2 | 0.688 | - | 0.037 | 0.389 | 0.389 | 1.000 | 0.117 | 0.038 |
| Copilot | 0.015 | 0.037 | - | 0.002 | 0.002 | 0.037 | 0.584 | 0.000 |
| Deepseek | 0.640 | 0.389 | 0.002 | - | 1.000 | 0.389 | 0.020 | 0.150 |
| Claude | 0.640 | 0.389 | 0.002 | 1.000 | - | 0.389 | 0.020 | 0.150 |
| ChatGPT 4o | 0.688 | 1.000 | 0.037 | 0.389 | 0.389 | - | 0.117 | 0.038 |
| ChatGPT 4 | 0.053 | 0.117 | 0.584 | 0.020 | 0.020 | 0.117 | - | 0.001 |
| ChatGPT o1 | 0.076 | 0.038 | 0.000 | 0.150 | 0.150 | 0.038 | 0.001 | - |

*p < 0.0031 (Bonferroni correction)
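
To make the footnote's threshold concrete, the following is a minimal sketch that scans the upper triangle of the table and lists the model pairs whose p-value falls below 0.0031. The names `MODELS`, `UPPER`, `THRESHOLD`, and `significant_pairs` are hypothetical and chosen only for illustration. The values are copied from the table as printed, so a reported 0.000 is treated as the number 0.0 even though it is presumably a rounded value.

```python
# Model order as it appears in the table header.
MODELS = [
    "Gemini 1.5", "Gemini 2", "Copilot", "Deepseek",
    "Claude", "ChatGPT 4o", "ChatGPT 4", "ChatGPT o1",
]

# Upper triangle of the table: row i holds the p-values for
# MODELS[i] versus MODELS[i + 1], MODELS[i + 2], and so on.
UPPER = [
    [0.688, 0.015, 0.640, 0.640, 0.688, 0.053, 0.076],
    [0.037, 0.389, 0.389, 1.000, 0.117, 0.038],
    [0.002, 0.002, 0.037, 0.584, 0.000],
    [1.000, 0.389, 0.020, 0.150],
    [0.389, 0.020, 0.150],
    [0.117, 0.038],
    [0.001],
]

# Threshold stated in the table footnote (Bonferroni-corrected).
THRESHOLD = 0.0031


def significant_pairs(models, upper, threshold):
    """Return (model_a, model_b, p) for every pair with p below threshold."""
    pairs = []
    for i, row in enumerate(upper):
        for offset, p in enumerate(row, start=1):
            if p < threshold:
                pairs.append((models[i], models[i + offset], p))
    return pairs


if __name__ == "__main__":
    for model_a, model_b, p in significant_pairs(MODELS, UPPER, THRESHOLD):
        print(f"{model_a} vs {model_b}: p = {p:.3f}")
```

With the values as printed, four pairs fall below the threshold: Copilot vs Deepseek, Copilot vs Claude, Copilot vs ChatGPT o1, and ChatGPT 4 vs ChatGPT o1.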