Multiple-instance learning of somatic mutations for the classification of tumour type and the prediction of microsatellite status.


Journal

Nature biomedical engineering
ISSN: 2157-846X
Abbreviated title: Nat Biomed Eng
Country: England
NLM ID: 101696896

Publication information

Publication date:
02 Nov 2023
History:
received: 14 04 2023
accepted: 30 09 2023
medline: 3 11 2023
pubmed: 3 11 2023
entrez: 3 11 2023
Status: aheadofprint

Abstract

Large-scale genomic data are well suited to analysis by deep learning algorithms. However, for many genomic datasets, labels are at the level of the sample rather than for individual genomic measures. Machine learning models leveraging these datasets generate predictions by using statically encoded measures that are then aggregated at the sample level. Here we show that a single weakly supervised end-to-end multiple-instance-learning model with multi-headed attention can be trained to encode and aggregate the local sequence context or genomic position of somatic mutations, hence allowing for the modelling of the importance of individual measures for sample-level classification and thus providing enhanced explainability. The model solves synthetic tasks that conventional models fail at, and achieves best-in-class performance for the classification of tumour type and for predicting microsatellite status. By improving the performance of tasks that require aggregate information from genomic datasets, multiple-instance deep learning may generate biological insight.
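The aggregation step described in the abstract can be illustrated with a minimal sketch of multi-headed attention pooling over a bag of instances. This is a generic formulation and not the authors' implementation: the function name `attention_mil_pool`, the weight matrices `w_hidden` and `w_heads`, the tanh scoring function, the softmax normalisation and the random toy data are all assumptions made for illustration. In a real model the instance encodings and the weights would be learned jointly from sample-level labels.

```python
import numpy as np


def attention_mil_pool(instances, w_hidden, w_heads):
    """Aggregate one bag of instance vectors into a sample-level vector.

    instances: (n_instances, d) array of encoded mutations (hypothetical encodings)
    w_hidden:  (d, h) projection used to score each instance (assumed, not from the paper)
    w_heads:   (h, k) one scoring column per attention head (assumed)
    Returns (pooled, weights): pooled has shape (k * d,), weights has shape (n_instances, k).
    """
    scores = np.tanh(instances @ w_hidden) @ w_heads        # (n, k) raw score per instance and head
    scores = scores - scores.max(axis=0, keepdims=True)     # subtract the max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over instances, separately per head
    pooled = weights.T @ instances                          # (k, d) one weighted sum per head
    return pooled.reshape(-1), weights


# Toy bag: 5 instances with 4 features each, scored by 2 attention heads.
rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 4))
w_hidden = rng.normal(size=(4, 8))
w_heads = rng.normal(size=(8, 2))
pooled, weights = attention_mil_pool(bag, w_hidden, w_heads)
```

Each column of `weights` sums to one across the instances, so it can be read as the relative importance that one head assigns to each mutation for the sample-level prediction. The pooled vector does not depend on the order of the instances, which suits an unordered set of somatic mutations.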

Identifiers

pubmed: 37919367
doi: 10.1038/s41551-023-01120-3
pii: 10.1038/s41551-023-01120-3

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2023. The Author(s).

Authors

Jordan Anaya (J)

Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.

John-William Sidhom (JW)

The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Bloomberg~Kimmel Institute for Cancer Immunotherapy, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA.

Faisal Mahmood (F)

Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.

Alexander S Baras (AS)

Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. baras@jhmi.edu.
The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA. baras@jhmi.edu.
Bloomberg~Kimmel Institute for Cancer Immunotherapy, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA. baras@jhmi.edu.

MeSH classifications