From: Evaluation of the performance of large language models in clinical decision-making in endodontics
Language Model | N | Mean | StDev | Misinformation Rate | Grouping
---|---|---|---|---|---
Google Bard | 40 | 1.3000 | 0.5164 | 25% | A
ChatGPT-3.5 | 40 | 1.4500 | 0.5524 | 15% | A, B
ChatGPT-4 | 40 | 1.7000 | 0.5164 | 10% | B