Evaluation of the performance of large language models in clinical decision-making in endodontics

BMC Oral Health

Table 3 Mean and standard deviation of the Likert scale scores

Language Model	N	Mean	StDev	Misinformation Rate
Google Bard^A	40	1.3000	0.5164	25%
ChatGPT-3.5^{A, B}	40	1.4500	0.5524	15%
ChatGPT-4^B	40	1.7000	0.5164	10%

ISSN: 1472-6831