Automating Ischemic Stroke Subtype Classification Using Machine Learning and Natural Language Processing.


Journal

Journal of stroke and cerebrovascular diseases : the official journal of National Stroke Association
ISSN: 1532-8511
Abbreviated title: J Stroke Cerebrovasc Dis
Country: United States
NLM ID: 9111633

Publication information

Publication date:
Jul 2019
History:
received: 20 08 2018
revised: 30 01 2019
accepted: 09 02 2019
pubmed: 20 5 2019
medline: 23 7 2019
entrez: 20 5 2019
Status: ppublish

Abstract

Manual adjudication of disease classification is time-consuming, error-prone, and difficult to scale to large datasets. In ischemic stroke (IS), subtype classification is critical for management and outcome prediction. This study sought to use natural language processing of electronic health records (EHR) combined with machine learning methods to automate IS subtyping. Among IS patients from an observational registry with TOAST subtyping adjudicated by board-certified vascular neurologists, we analyzed unstructured text-based EHR data, including neurology progress notes and neuroradiology reports, using natural language processing. We applied several feature selection methods to reduce the high dimensionality of the features and used 5-fold cross-validation to test the generalizability of our methods and minimize overfitting. We used several machine learning methods and calculated kappa values for agreement between each machine learning approach and manual adjudication. We then performed blinded testing of the best algorithm against a held-out subset of 50 cases. Compared to manual classification, the best machine-based classification achieved a kappa of .25 using radiology reports alone, .57 using progress notes alone, and .57 using combined data. Kappa values varied by subtype, being highest for cardioembolic (.64) and lowest for cryptogenic cases (.47). In the held-out test subset, machine-based classification agreed with rater classification in 40 of 50 cases (kappa .72). Automated machine learning approaches using textual data from the EHR show agreement with manual TOAST classification. The automated pipeline, if externally validated, could enable large-scale stroke epidemiology research.
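
The abstract reports agreement as kappa values. As a rough illustration of how such a statistic is computed, the sketch below implements unweighted Cohen's kappa for two sets of labels. It is not the authors' code, and the abstract does not say which kappa variant was used. The function name `cohen_kappa` and the label lists are hypothetical and invented for illustration only; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up toy labels (NOT study data): manual adjudication vs. machine output.
manual = ["cardioembolic", "cardioembolic", "large_artery",
          "small_vessel", "cryptogenic", "cardioembolic"]
machine = ["cardioembolic", "large_artery", "large_artery",
           "small_vessel", "cryptogenic", "cardioembolic"]

print(round(cohen_kappa(manual, machine), 3))
```

Raw agreement in the toy example is 5 of 6 cases, yet kappa is lower because part of that agreement is expected by chance given how often each label is used. This is why the 40 of 50 agreement in the held-out subset corresponds to a kappa below .80.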

Identifiers

pubmed: 31103549
pii: S1052-3057(19)30048-5
doi: 10.1016/j.jstrokecerebrovasdis.2019.02.004

Publication types

Journal Article; Observational Study; Validation Study

Languages

eng

Citation subsets

IM

Pagination

2045-2051

Copyright information

Copyright © 2019. Published by Elsevier Inc.

Authors

Ravi Garg (R)

Department of Neurology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois.

Elissa Oh (E)

Department of Neurology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois.

Andrew Naidech (A)

Department of Neurology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois.

Konrad Kording (K)

University of Pennsylvania, Philadelphia, Pennsylvania.

Shyam Prabhakaran (S)

Department of Neurology, Pritzker School of Medicine, University of Chicago, Chicago, Illinois. Electronic address: shyam1@uchicago.edu.
