Automated data preparation for

PET, cancer, data preprocessing, hybrid imaging, machine learning

Journal

Frontiers in oncology
ISSN: 2234-943X
Abbreviated title: Front Oncol
Country: Switzerland
NLM ID: 101568867

Publication information

Publication date:
2022
History:
received: 12 08 2022
accepted: 23 09 2022
entrez: 28 10 2022
pubmed: 29 10 2022
medline: 29 10 2022
Status: epublish

Abstract

This study proposes machine learning-driven data preparation (MLDP) for optimal data preparation (DP) prior to building prediction models for cancer cohorts. A collection of well-established DP methods was incorporated for building the DP pipelines for various clinical cohorts prior to machine learning. Evolutionary algorithm principles combined with hyperparameter optimization were employed to iteratively select the best-fitting subset of data preparation algorithms for the given dataset. The proposed method was validated for glioma and prostate single-center cohorts using a 100-fold Monte Carlo (MC) cross-validation scheme with an 80-20% training-validation split ratio. In addition, a dual-center diffuse large B-cell lymphoma (DLBCL) cohort was utilized, with Center 1 as the training dataset and Center 2 as the independent validation dataset, to predict cohort-specific clinical endpoints. Five machine learning (ML) classifiers were employed for building prediction models across all analyzed cohorts. Predictive performance was estimated by confusion matrix analytics over the validation sets of each cohort. The performance of each model with MLDP, without MLDP, and with manually defined DP was compared in each of the four cohorts. Sixteen of the twenty established predictive models showed an increase in area under the receiver operating characteristic curve (AUC) when MLDP was used. MLDP yielded the largest performance increases for the random forest (RF) (+0.16 AUC) and support vector machine (SVM) (+0.13 AUC) model schemes for predicting 36-month survival in the glioma cohort. Single-center cohorts resulted in complex DP pipelines (6-7 DP steps), with a high occurrence of outlier detection, feature selection and the synthetic minority oversampling technique (SMOTE). In contrast, the optimal DP pipeline for the dual-center DLBCL cohort included only outlier detection and SMOTE as DP steps.
This study demonstrates that data preparation prior to ML prediction model building in cancer cohorts should itself be ML-driven, yielding optimal prediction models in both single and multi-centric settings.
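
The validation scheme described in the abstract (100 repeated random splits, 80% training and 20% validation, with AUC as the reported metric) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `monte_carlo_splits`, `auc` and `evaluate`, the rank-based AUC calculation, and the single-feature toy "model" are all assumptions made for illustration, using only the Python standard library.

```python
import random
from statistics import mean


def monte_carlo_splits(n_samples, n_folds=100, train_fraction=0.8, seed=0):
    """Yield (train_idx, valid_idx) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_fraction))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when one class is absent
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(features, labels, n_folds=100, seed=0):
    """Mean validation AUC of a hypothetical one-feature model over all splits."""
    fold_aucs = []
    for train_idx, valid_idx in monte_carlo_splits(len(labels), n_folds, 0.8, seed):
        train_pos = [features[i] for i in train_idx if labels[i] == 1]
        train_neg = [features[i] for i in train_idx if labels[i] == 0]
        if not train_pos or not train_neg:
            continue  # cannot fit the toy model without both classes
        # Toy "model" (assumption): orient the feature using training class means only.
        sign = 1.0 if mean(train_pos) >= mean(train_neg) else -1.0
        fold_auc = auc([labels[i] for i in valid_idx],
                       [sign * features[i] for i in valid_idx])
        if fold_auc is not None:
            fold_aucs.append(fold_auc)
    return mean(fold_aucs)


if __name__ == "__main__":
    # Synthetic, made-up data: 50 samples, label 1 shifts the feature upward.
    rng = random.Random(1)
    labels = [i % 2 for i in range(50)]
    features = [y + rng.gauss(0.0, 1.0) for y in labels]
    print("mean validation AUC:", evaluate(features, labels))
```

Anything fitted to the data, including data preparation steps, would be fitted on the training indices only and then applied to the validation indices, so that each fold's AUC reflects unseen samples.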

Identifiers

pubmed: 36303841
doi: 10.3389/fonc.2022.1017911
pmc: PMC9595446

Publication types

Journal Article

Languages

eng

Pagination

1017911

Copyright information

Copyright © 2022 Krajnc, Spielvogel, Grahovac, Ecsedi, Rasul, Poetsch, Traub-Weidinger, Haug, Ritter, Alizadeh, Hacker, Beyer and Papp.

Conflict of interest statement

MH, LP, and TB are co-founders of Dedicaid GmbH, Austria. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors

Denis Krajnc (D)

QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria.

Clemens P Spielvogel (CP)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.
Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria.

Marko Grahovac (M)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.

Boglarka Ecsedi (B)

QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria.

Sazan Rasul (S)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.

Nina Poetsch (N)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.

Tatjana Traub-Weidinger (T)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.

Alexander R Haug (AR)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.
Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria.

Zsombor Ritter (Z)

Department of Medical Imaging, University of Pécs, Medical School, Pécs, Hungary.

Hussain Alizadeh (H)

1st Department of Internal Medicine, University of Pécs, Medical School, Pécs, Hungary.

Marcus Hacker (M)

Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.

Thomas Beyer (T)

QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria.

Laszlo Papp (L)

QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria.
Applied Quantum Computing group, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria.
