Assessing data gathering of chatbot based symptom checkers - a clinical vignettes study.
Keywords
Artificial intelligence
Chatbots
Computer-assisted diagnosis
Data-gathering
Diagnosis
Medical interview
Symptom checker
Telemedicine
Triage
Journal
International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057
Publication information
Publication date: Dec 2022
History:
received: 2022-05-02
revised: 2022-10-09
accepted: 2022-10-10
pubmed: 2022-10-29
medline: 2022-11-11
entrez: 2022-10-28
Status: ppublish
Abstract
BACKGROUND
The burden on healthcare systems is mounting steadily owing to population growth and aging, overuse of medical services, and the recent COVID-19 pandemic. This overload is also degrading healthcare quality and outcomes. One solution gaining momentum is the integration of intelligent self-assessment tools, known as symptom checkers, into healthcare providers' systems. To the best of our knowledge, no study so far has investigated how well these tools gather data, a capability that is crucial for simulating a doctor's medical-interview skills.
OBJECTIVES
The goal of this study was to evaluate the data-gathering function of currently available chatbot symptom checkers.
METHODS
We evaluated 8 symptom checkers using 28 clinical vignettes from the MSD Manual case-study repository. The mean number of predefined pertinent findings per case was 31.8 ± 6.8. Three medical students entered the vignettes into the platforms, simulating the role of the patient. For each conversation, we recorded the number of pertinent findings retrieved and the number of questions asked. We then calculated recall rates (pertinent findings retrieved out of all predefined pertinent findings) and efficiency rates (pertinent findings retrieved out of the number of questions asked) for data gathering, and compared them between platforms.
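The record does not include the authors' analysis code; the sketch below shows, as an assumption, how each rate and its 95% CI could be computed with a normal-approximation (Wald) binomial interval, which reproduces the recall interval reported in the Results. The exact interval method used in the paper is not stated.

```python
import math

def rate_with_ci(successes: int, trials: int, z: float = 1.96):
    """Binomial proportion with a normal-approximation (Wald) 95% CI."""
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return p, max(0.0, p - half), min(1.0, p + half)

# Recall: pertinent findings retrieved / predefined pertinent findings.
# Overall counts from the Results: 2,280 retrieved of 7,112 predefined.
print("recall     %.2f (%.2f-%.2f)" % rate_with_ci(2280, 7112))

# Efficiency: pertinent findings retrieved / questions asked (4,877 overall).
# This prints 0.47; the abstract reports 0.46, so the paper's aggregation
# (e.g. averaging per-platform rates) may differ slightly.
print("efficiency %.2f (%.2f-%.2f)" % rate_with_ci(2280, 4877))
```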
RESULTS
The overall recall rate across all symptom checkers was 0.32 (2,280/7,112; 95% CI 0.31-0.33) for all pertinent findings, 0.37 (1,110/2,992; 95% CI 0.35-0.39) for present findings, and 0.28 (1,140/4,120; 95% CI 0.26-0.29) for absent findings. Among the symptom checkers, the Kahun platform had the highest recall rate at 0.51 (450/889; 95% CI 0.47-0.54). Out of 4,877 questions asked overall, 2,280 findings were gathered, yielding an efficiency rate of 0.46 (95% CI 0.45-0.48) across all platforms. Kahun was the most efficient tool at 0.74 (95% CI 0.70-0.77), without a statistically significant difference from Your.MD at 0.69 (95% CI 0.65-0.73).
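The abstract does not name the significance test behind "without a statistically significant difference"; a two-proportion z-test is one conventional choice for comparing two rates. In the sketch below the denominators are back-calculated from the published rates and CIs and are purely illustrative, not published figures.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# Hypothetical counts: 450 findings / ~608 questions gives Kahun's 0.74;
# ~355 / ~514 gives Your.MD's 0.69. Only the rates are published.
z, p = two_proportion_z(450, 608, 355, 514)
print(f"z = {z:.2f}, p = {p:.3f}")  # p ~ 0.07 > 0.05: not significant
```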
CONCLUSION
The data-gathering performance of currently available symptom checkers is questionable. Among the tools evaluated, Kahun demonstrated the best overall performance.
Identifiers
pubmed: 36306653
pii: S1386-5056(22)00211-8
doi: 10.1016/j.ijmedinf.2022.104897
pmc: PMC9595333
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
104897
Comments and corrections
Type: CommentIn
Copyright information
Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.