BMC Oral Health

Table 1 Performance of large language models on DUS oral pathology questions

From: Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis

Model	Correct	Incorrect	P*
Gemini 1.5	81	19	<0.001
Gemini 2	82	18
Copilot	61	39
Deepseek	82	18
Claude	84	16
ChatGPT 4o	79	21
ChatGPT 4	69	31
ChatGPT o1	96	4

*Pearson chi-square test. The statistical significance level was set at P ≤ 0.05.
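
The Pearson chi-square statistic behind the footnote can be recomputed from the counts in the table. The sketch below is illustrative only. It assumes the reported P value comes from a single omnibus test on the 8 × 2 table of models by correct/incorrect answers, which the table does not state explicitly. The function name `pearson_chi_square` and the critical-value lookup are introduced here for illustration and are not part of the article.

```python
# Counts copied from Table 1: (correct, incorrect) per model.
counts = {
    "Gemini 1.5": (81, 19),
    "Gemini 2": (82, 18),
    "Copilot": (61, 39),
    "Deepseek": (82, 18),
    "Claude": (84, 16),
    "ChatGPT 4o": (79, 21),
    "ChatGPT 4": (69, 31),
    "ChatGPT o1": (96, 4),
}


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of model and outcome.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


chi2, df = pearson_chi_square(list(counts.values()))

# Standard chi-square critical values for 7 degrees of freedom.
CRITICAL_DF7 = {0.05: 14.067, 0.001: 24.322}

print(f"chi-square = {chi2:.2f}, df = {df}")
for alpha, critical in CRITICAL_DF7.items():
    verdict = "exceeds" if chi2 > critical else "does not exceed"
    print(f"alpha = {alpha}: statistic {verdict} critical value {critical}")
```

Under that assumption the statistic comes out at roughly 46.2 on 7 degrees of freedom, above the 0.001 critical value, which is consistent with the P value reported in the table. An omnibus test of this kind indicates only that accuracy differs somewhere among the models; it does not identify which pairs differ.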

ISSN: 1472-6831
