Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation.

Keywords: Artificial intelligence; ChatGPT; Eligibility prescreening; Human–computer collaboration; Large language models

Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
30 Apr 2024
History:
received: 12 Nov 2023
revised: 3 Apr 2024
accepted: 29 Apr 2024
medline: 3 May 2024
pubmed: 3 May 2024
entrez: 2 May 2024
Status: ahead of print

Abstract

Automated identification of eligible patients is a bottleneck of clinical research. We propose Criteria2Query (C2Q) 3.0, a system that leverages GPT-4 to semi-automatically transform clinical trial eligibility criteria text into executable clinical database queries. C2Q 3.0 integrates three GPT-4 prompts for concept extraction, SQL query generation, and reasoning, each designed and evaluated separately. The concept extraction prompt was benchmarked against manual annotations of 20 clinical trials by two evaluators, who later also measured SQL generation accuracy and identified errors in GPT-generated SQL queries from 5 clinical trials. The reasoning prompt was assessed by three evaluators on four metrics (readability, correctness, coherence, and usefulness), using corrected SQL queries and an open-ended feedback questionnaire. On 518 concepts from 20 clinical trials, GPT-4 achieved an F1-score of 0.891 for concept extraction. For SQL generation, 29 errors spanning seven categories were detected, with logic errors being the most common (n = 10; 34.48%). Reasoning evaluations yielded a high mean coherence score of 4.70 but relatively lower readability, with a mean of 3.95; mean correctness and usefulness scores were 3.97 and 4.37, respectively. GPT-4 significantly improves the accuracy of extracting clinical trial eligibility criteria concepts in C2Q 3.0. Continued research is warranted to ensure the reliability of large language models.
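
The abstract describes a three-prompt GPT-4 pipeline (concept extraction, SQL generation, reasoning) but does not reproduce the prompts themselves. The sketch below is a minimal illustration of such a pipeline, assuming the OpenAI Python SDK (v1+); the prompt wording, the OMOP CDM query target, the sample criteria, and the helper ask_gpt4 are all illustrative assumptions, not the authors' implementation.

# Minimal illustrative sketch of a three-prompt pipeline like the one the
# abstract describes (concept extraction -> SQL generation -> reasoning).
# Assumes the OpenAI Python SDK (v1+); prompt wording, schema target, and
# helper names are placeholders, NOT the paper's actual prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_gpt4(system_prompt: str, user_text: str) -> str:
    """Send one prompt to GPT-4 and return the text of its reply."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    )
    return resp.choices[0].message.content

criteria = "Inclusion: adults aged 18-65 with type 2 diabetes. Exclusion: pregnancy."

# Prompt 1: extract eligibility concepts (entities, attributes, values).
concepts = ask_gpt4(
    "Extract each eligibility concept from the clinical trial criteria below "
    "as a JSON list of {concept, domain, value} objects.",
    criteria,
)

# Prompt 2: translate the extracted concepts into an executable SQL query
# (here assumed to run against an OMOP CDM database).
sql = ask_gpt4(
    "Write a SQL query over the OMOP CDM that selects patients satisfying "
    "these eligibility concepts:",
    concepts,
)

# Prompt 3: generate a plain-language explanation of the query, the kind of
# output the evaluators rated for readability, correctness, coherence, and
# usefulness.
explanation = ask_gpt4(
    "Explain step by step, in plain language, what this SQL query does:",
    sql,
)

For reference, the reported F1-score of 0.891 on the 518 extracted concepts is the standard harmonic mean of precision and recall, F1 = 2PR/(P + R), computed against the two evaluators' manual annotations.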

Identifiers

pubmed: 38697494
pii: S1532-0464(24)00067-4
doi: 10.1016/j.jbi.2024.104649

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

104649

Copyright information

Copyright © 2024. Published by Elsevier Inc.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Jimyung Park (J)

Department of Biomedical Informatics, Columbia University, New York, United States.

Yilu Fang (Y)

Department of Biomedical Informatics, Columbia University, New York, United States.

Casey Ta (C)

Department of Biomedical Informatics, Columbia University, New York, United States.

Gongbo Zhang (G)

Department of Biomedical Informatics, Columbia University, New York, United States.

Betina Idnay (B)

Department of Biomedical Informatics, Columbia University, New York, United States.

Fangyi Chen (F)

Department of Biomedical Informatics, Columbia University, New York, United States.

David Feng (D)

Department of Biomedical Informatics, Columbia University, New York, United States.

Rebecca Shyu (R)

Department of Biomedical Informatics, Columbia University, New York, United States.

Emily R Gordon (ER)

Columbia University Vagelos College of Physicians and Surgeons, New York, United States.

Matthew Spotnitz (M)

Department of Biomedical Informatics, Columbia University, New York, United States.

Chunhua Weng (C)

Department of Biomedical Informatics, Columbia University, New York, United States. Electronic address: cw2384@cumc.columbia.edu.
