Okan Turgut, H Melike Bayram* and Emre Bayram
Associate Professor, Tokat Gaziosmanpasa University, Faculty of Dentistry, Department of Endodontics, Tokat, Turkiye
*Corresponding Author: Huda Melike Bayram, Associate Professor, Tokat Gaziosmanpasa University, Faculty of Dentistry, Department of Endodontics, Tokat, Turkiye.
Received: August 25, 2025; Published: September 10, 2025
Aim: This study aimed to evaluate the performance of three large language models (LLMs), Grok, ChatGPT, and DeepSeek, in managing traumatic dental injuries (TDIs) based on their alignment with the International Association of Dental Traumatology (IADT) 2020 clinical guidelines.
Materials and Methods: Twenty open-ended prompts were constructed to reflect real-life TDI scenarios, aligned with the 2020 IADT guidelines. Each model was queried once per prompt with no re-prompting or interaction refinement. Responses were evaluated by a trained rater using a five-criteria rubric: scientific accuracy, reliability of information, comprehensibility, level of detail, and clinical applicability. Scoring was performed using a 3-point ordinal scale. One-way ANOVA and post-hoc comparisons were applied for statistical analysis.
Results: Grok outperformed both ChatGPT and DeepSeek in scientific accuracy, level of detail, and reliability of information (p < 0.001). ChatGPT and DeepSeek scored higher than Grok in comprehensibility (p = 0.007). For clinical applicability, only the Grok-DeepSeek comparison was statistically significant (p = 0.016). Differences in total scores were statistically significant across all model pairs (p < 0.001).
Conclusion: Large language models exhibit distinct strengths across clinical performance metrics. Grok appears more suitable for guideline-based clinical decision support in TDI management, whereas ChatGPT and DeepSeek may be better suited for educational and communicative purposes. Purpose-driven model selection and continuous performance monitoring are recommended for safe and effective clinical integration.
Keywords: Artificial Intelligence; Large Language Models; Traumatic Dental Injuries; Clinical Decision Support; Guideline Adherence; IADT Guidelines
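As an illustration of the statistical step named in Materials and Methods, the following is a minimal sketch of a one-way ANOVA F-statistic computed over per-prompt total rubric scores. The model labels and all score values are hypothetical placeholders, not data from the study; the 1-to-3 coding of the 3-point scale (totals from 5 to 15) is also an assumption. The sketch stops at the F-statistic and does not reproduce the p-values or the post-hoc comparisons, since the abstract does not name the post-hoc test used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per model.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation explained by differences between group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total scores (five criteria, each assumed to be scored 1-3),
# four prompts per model. These values are invented for illustration only.
scores = {
    "model_a": [14, 15, 13, 14],
    "model_b": [11, 12, 10, 11],
    "model_c": [12, 11, 12, 13],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the critical value for the given degrees of freedom indicates that at least one model's mean score differs from the others; pairwise post-hoc tests are then needed to identify which pairs differ.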
Citation: H Melike Bayram., et al. “Comparative Evaluation of Artificial Intelligence Models for Traumatic Dental Injuries Based on Clinical Guideline Adherence”. Acta Scientific Dental Sciences 9.10 (2025): 09-14.
Copyright: © 2025 H Melike Bayram., et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
© 2024 Acta Scientific, All rights reserved.