Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients.


Journal

Thrombosis research
ISSN: 1879-2472
Abbreviated title: Thromb Res
Country: United States
NLM ID: 0326377

Publication information

Publication date:
Jan 2022
History:
received: 29 Sep 2021
revised: 17 Nov 2021
accepted: 18 Nov 2021
pubmed: 7 Dec 2021
medline: 27 Jan 2022
entrez: 6 Dec 2021
Status: ppublish

Abstract

BACKGROUND
Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement.
OBJECTIVE
To develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients.
METHODS
This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. We compared the accuracy of our NLP algorithms, the previously published "simpleNLP" tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the "gold standard" of manual review in a separate random sample of 4000 GIM hospitalizations.
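To make the idea of identifying PE from free-text reports concrete, the following is a minimal rule-based sketch in Python. It is not the authors' algorithm or the published rulesets: the patterns `POSITIVE` and `NEGATION` and the function `mentions_pe` are hypothetical names chosen for illustration, and the example assumes a simple approach of keyword matching with same-sentence negation.

```python
import re

# Hypothetical patterns for illustration only; NOT the published VTE rulesets.
POSITIVE = re.compile(
    r"\b(pulmonary embol(?:ism|us|i)|filling defect)\b", re.IGNORECASE
)
# A negation cue followed by no sentence-ending period up to the finding.
NEGATION = re.compile(
    r"\b(no|without|negative for|no evidence of)\b[^.]*$", re.IGNORECASE
)


def mentions_pe(report: str) -> bool:
    """Return True if any sentence asserts a PE finding without a preceding negation."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for match in POSITIVE.finditer(sentence):
            prefix = sentence[: match.start()]
            if not NEGATION.search(prefix):
                return True
    return False
```

A sketch this simple would misclassify sentences such as "no pneumothorax, but pulmonary embolism is present", which is one reason rule-based classifiers need validation against manual review.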
RESULTS
Our NLP algorithms were highly accurate for identifying DVT from US, with sensitivity 0.94, positive predictive value (PPV) 0.90, and Area Under the Receiver-Operating-Characteristic Curve (AUC) 0.96; and for identifying PE from CT, with sensitivity 0.91, PPV 0.89, and AUC 0.96. Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92).
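For readers less familiar with these metrics, sensitivity is the share of true cases that a method flags, and PPV is the share of flagged cases that are true. The sketch below shows how both could be computed against manual review labels. The function name `sensitivity_and_ppv` and the toy labels are assumptions made for illustration; they are not data from the study.

```python
def sensitivity_and_ppv(predicted, gold):
    """Compute sensitivity and PPV of binary predictions against gold-standard labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)        # flagged and truly positive
    fp = sum(1 for p, g in pairs if p and not g)    # flagged but truly negative
    fn = sum(1 for p, g in pairs if not p and g)    # missed true positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy labels, invented for illustration (not study data).
predicted = [True, True, True, True, False, False, False, False]
gold = [True, True, True, False, True, True, False, False]
sens, ppv = sensitivity_and_ppv(predicted, gold)
```

In this toy example there are 3 true positives, 1 false positive, and 2 false negatives, so sensitivity is 3/5 and PPV is 3/4.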
CONCLUSIONS
Administrative diagnosis codes are unreliable in identifying VTE in hospitalized patients. We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made the algorithms freely available to the academic community with a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).

Identifiers

pubmed: 34871982
pii: S0049-3848(21)00529-6
doi: 10.1016/j.thromres.2021.11.020

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

51-58

Copyright information

Copyright © 2021 Elsevier Ltd. All rights reserved.

Authors

Amol A Verma (AA)

St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada; Department of Medicine, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada. Electronic address: amol.verma@mail.utoronto.ca.

Hassan Masoom (H)

Department of Medicine, University of Toronto, Toronto, ON, Canada.

Chloe Pou-Prom (C)

St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada.

Saeha Shin (S)

St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada.

Michael Guerzhoy (M)

St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada.

Michael Fralick (M)

Department of Medicine, University of Toronto, Toronto, ON, Canada; Department of Medicine, Sinai Health System, Toronto, ON, Canada.

Muhammad Mamdani (M)

St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada; Department of Medicine, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Leslie Dan Faculty of Pharmacy, University of Toronto, Canada.

Fahad Razak (F)

St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada; Department of Medicine, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada.
