Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients.
Deep vein thrombosis
ICD codes
Natural language processing
Pulmonary embolism
Validity
Journal
Thrombosis research
ISSN: 1879-2472
Abbreviated title: Thromb Res
Country: United States
NLM ID: 0326377
Publication information
Publication date: Jan 2022
History:
received: 2021-09-29
revised: 2021-11-17
accepted: 2021-11-18
pubmed: 2021-12-07
medline: 2022-01-27
entrez: 2021-12-06
Status:
ppublish
Abstract
BACKGROUND
Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement.
OBJECTIVE
To develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients.
METHODS
This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. In a separate random sample of 4000 GIM hospitalizations, we compared the accuracy of our NLP algorithms, the previously published "simpleNLP" tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the "gold standard" of manual review.
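The abstract does not describe how the algorithms work internally. As a purely illustrative sketch of one common approach to this kind of task (pattern matching on report sentences with a simple check for a preceding negation cue), the Python below uses hypothetical patterns and function names. It is not the published rulesets and would miss many phrasings found in real radiology reports.

```python
import re

# Hypothetical patterns for illustration only; NOT the published VTE rulesets.
FINDING = re.compile(
    r"\b(pulmonary embol(?:ism|us|i)|deep (?:vein|venous) thrombosis|dvt)\b",
    re.IGNORECASE,
)
NEGATION = re.compile(
    r"\b(no|without|negative for|no evidence of)\b",
    re.IGNORECASE,
)


def sentence_is_positive(sentence: str) -> bool:
    """True if the sentence mentions a finding with no negation cue before it."""
    match = FINDING.search(sentence)
    if match is None:
        return False
    # Only text preceding the finding in the same sentence is checked.
    preceding = sentence[: match.start()]
    return NEGATION.search(preceding) is None


def report_is_positive(report: str) -> bool:
    """True if any sentence in the report contains a non-negated finding."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    return any(sentence_is_positive(s) for s in sentences)
```

Checking negation sentence by sentence keeps a cue such as "No pneumothorax" from suppressing a positive finding stated elsewhere in the same report.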
RESULTS
Our NLP algorithms were highly accurate in identifying DVT from US, with sensitivity 0.94, positive predictive value (PPV) 0.90, and area under the receiver operating characteristic curve (AUC) 0.96, and in identifying PE from CT, with sensitivity 0.91, PPV 0.89, and AUC 0.96. Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92).
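For readers less familiar with the metrics reported here, the following is a minimal sketch of how sensitivity and PPV can be computed once each hospitalization has a predicted label and a manual-review label. The function name and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(
    predicted: Sequence[bool], gold: Sequence[bool]
) -> Tuple[float, float]:
    """Compare predicted labels with gold-standard labels.

    sensitivity = TP / (TP + FN): share of true cases that were detected.
    PPV         = TP / (TP + FP): share of flagged cases that were true.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy labels for six hypothetical hospitalizations (not study data).
toy_predicted = [True, True, True, True, False, False]
toy_gold = [True, True, True, False, True, True]
toy_sensitivity, toy_ppv = sensitivity_and_ppv(toy_predicted, toy_gold)
```

In the toy labels there are 3 true positives, 1 false positive, and 2 false negatives, so sensitivity is 3/5 and PPV is 3/4. A low PPV, such as the 0.43 reported for ICD-10-CA codes for DVT, means most flagged cases were not confirmed on manual review.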
CONCLUSIONS
Administrative diagnosis codes are unreliable in identifying VTE in hospitalized patients. We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made the algorithms freely available to the academic community with a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).
Identifiers
pubmed: 34871982
pii: S0049-3848(21)00529-6
doi: 10.1016/j.thromres.2021.11.020
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
51-58
Copyright information
Copyright © 2021 Elsevier Ltd. All rights reserved.