Accuracy and Consistency of Gemini Responses Regarding the Management of Traumatized Permanent Teeth.
Google Gemini
artificial intelligence
dental trauma
endodontic management
large language models
tooth injuries
Journal
Dental Traumatology: Official Publication of the International Association for Dental Traumatology
ISSN: 1600-9657
Abbreviated title: Dent Traumatol
Country: Denmark
NLM ID: 101091305
Publication information
Publication date: 26 Oct 2024
History:
received: 24 Aug 2024
revised: 27 Sep 2024
accepted: 29 Sep 2024
medline: 26 Oct 2024
pubmed: 26 Oct 2024
entrez: 26 Oct 2024
Status: ahead of print
Abstract
The aim of this cross-sectional observational analytical study was to assess the accuracy and consistency of responses provided by Google Gemini (GG), a free-access high-performance multimodal large language model, to questions related to the European Society of Endodontology position statement on the management of traumatized permanent teeth (MTPT). Three academic endodontists developed a set of 99 yes/no questions covering all areas of the MTPT. Nine general dentists and 22 endodontic specialists evaluated these questions for clarity and comprehension through an iterative process. Two academic dental trauma experts categorized the knowledge required to answer each question into three levels. The three academic endodontists submitted the 99 questions to GG, yielding 297 responses, which were then assessed for accuracy and consistency. Accuracy was evaluated using the Wald binomial method, while the consistency of GG responses was assessed using Fleiss' kappa coefficient with a 95% confidence interval. A chi-squared test at the 5% significance level was used to evaluate the influence of each question's level of knowledge on accuracy and consistency. The responses generated by Gemini showed an overall moderate accuracy of 80.81%, with no significant differences between the responses obtained by the academic endodontists. Overall consistency was high (95.96%), with no significant differences between GG responses across the three accounts. The analysis also revealed no correlation between question level of knowledge and accuracy or consistency. These results could inform the potential use of Gemini as a free-access source of information for clinicians in the MTPT.
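The two statistics named in the abstract can be sketched in a few lines of Python. The figures below are illustrative assumptions only: 240/297 correct responses is one count consistent with the reported 80.81% accuracy, and the small `ratings` table of yes/no tallies across three accounts is invented for demonstration, not the study's raw data.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Wald binomial confidence interval for a proportion (95% by default)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def fleiss_kappa(ratings):
    """Fleiss' kappa for agreement among a fixed number of raters.

    ratings: one row per item; each row holds the count of raters who
    chose each category (every row must sum to the same rater count).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Share of all assignments falling in each category
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters)
           for j in range(n_cats)]
    # Observed agreement per item, then averaged
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_items
    # Chance agreement expected from the category shares
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical yes/no tallies from three accounts over ten questions
ratings = [[3, 0]] * 5 + [[0, 3]] * 4 + [[2, 1]]
lo, hi = wald_ci(240, 297)  # 240/297 correct ~ the reported 80.81%
print(f"accuracy CI: ({lo:.3f}, {hi:.3f}), kappa: {fleiss_kappa(ratings):.3f}")
```

Note that the Wald interval is the simplest binomial CI and can misbehave near 0 or 1; at a proportion around 0.81 with n = 297 it is a reasonable choice.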
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2024 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.