Multiple-instance learning of somatic mutations for the classification of tumour type and the prediction of microsatellite status.

Journal

Nature biomedical engineering

ISSN: 2157-846X

Abbreviated title: Nat Biomed Eng

Country: England

NLM ID: 101696896

Publication information

Publication date:
02 Nov 2023

History:

received: 14 Apr 2023

accepted: 30 Sep 2023

medline: 3 Nov 2023

pubmed: 3 Nov 2023

entrez: 3 Nov 2023

Status: ahead of print

Abstract

Large-scale genomic data are well suited to analysis by deep learning algorithms. However, for many genomic datasets, labels are at the level of the sample rather than for individual genomic measures. Machine learning models leveraging these datasets generate predictions by using statically encoded measures that are then aggregated at the sample level. Here we show that a single weakly supervised end-to-end multiple-instance-learning model with multi-headed attention can be trained to encode and aggregate the local sequence context or genomic position of somatic mutations, hence allowing for the modelling of the importance of individual measures for sample-level classification and thus providing enhanced explainability. The model solves synthetic tasks that conventional models fail at, and achieves best-in-class performance for the classification of tumour type and for predicting microsatellite status. By improving the performance of tasks that require aggregate information from genomic datasets, multiple-instance deep learning may generate biological insight.
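The aggregation step the abstract describes can be illustrated with a minimal sketch of multi-headed attention pooling over a bag of instances, loosely following the general formulation of attention-based multiple-instance learning (Ilse et al., listed in the references). This is not the authors' implementation: the function name `attention_mil_pool`, the parameters `V` and `w`, the dimensions and the random inputs are all assumptions chosen for illustration, and a real model would learn the instance encoder and these weights end to end.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mil_pool(instances, V, w):
    """Aggregate a bag of instance embeddings with multi-headed attention.

    instances: (n_instances, d) encoded instances, e.g. one row per mutation
    V:         (d, h) hidden projection (hypothetical parameter)
    w:         (h, n_heads) one scoring vector per head (hypothetical parameter)
    Returns the bag vector (n_heads * d,) and the attention weights
    (n_instances, n_heads).
    """
    scores = np.tanh(instances @ V) @ w   # one score per instance and head
    attention = softmax(scores, axis=0)   # normalise over instances, per head
    bag = attention.T @ instances         # weighted sum of instances per head
    return bag.reshape(-1), attention

# Toy bag: 5 instances with 4 features, 2 attention heads (arbitrary sizes).
rng = np.random.default_rng(0)
n, d, h, heads = 5, 4, 8, 2
instances = rng.normal(size=(n, d))
V = rng.normal(size=(d, h))
w = rng.normal(size=(h, heads))
bag, attention = attention_mil_pool(instances, V, w)
```

Each column of `attention` sums to one over the instances, so its entries can be read as the relative importance of each instance for the sample-level representation. In this sketch, that is where the explainability comes from.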

Identifiers

DOI: 10.1038/s41551-023-01120-3

PMID: 37919367

pii: 10.1038/s41551-023-01120-3

Publication types

Journal Article

Languages

eng

References

Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).

doi: 10.1098/rsif.2017.0387 pubmed: 29618526 pmcid: 5938574

Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

doi: 10.1038/nmeth.3547 pubmed: 26301843 pmcid: 4768299

Routhier, E. & Mozziconacci, J. Genomics enters the deep learning era. PeerJ 10, e13613 (2022).

doi: 10.7717/peerj.13613 pubmed: 35769139 pmcid: 9235815

Altman, N. S. & Krzywinski, M. The curse(s) of dimensionality. Nat. Methods 15, 399–400 (2018).

doi: 10.1038/s41592-018-0019-x pubmed: 29855577

Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

doi: 10.1038/s41586-021-03922-4 pubmed: 34552244 pmcid: 8514339

Dietterich, T. G., Lathrop, R. H. & Lozano-Pérez, T. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997).

doi: 10.1016/S0004-3702(96)00034-3

Amores, J. Multiple instance classification: review, taxonomy and comparative study. Artif. Intell. 201, 81–105 (2013).

doi: 10.1016/j.artint.2013.06.003

Carbonneau, M.-A., Cheplygina, V., Granger, E. & Gagnon, G. Multiple instance learning: a survey of problem characteristics and applications. Pattern Recognit. 77, 329–353 (2018).

doi: 10.1016/j.patcog.2017.10.009

Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

doi: 10.1038/s41551-020-00682-w pubmed: 33649564 pmcid: 8711640

Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

doi: 10.1038/s41586-021-03512-4 pubmed: 33953404

Chen, R. J. et al. Whole slide images are 2D point clouds: context-aware survival prediction using patch-based graph convolutional networks. In Medical Image Computing and Computer Assisted Intervention (MICCAI 2021) 339–349 (Springer International, 2021).

Kim, S., Lee, H., Kim, K. & Kang, J. Mut2Vec: distributed representation of cancerous mutations. BMC Med. Genet. 11, 33 (2018).

Palazzo, M., Beauseroy, P. & Yankilevich, P. A pan-cancer somatic mutation embedding using autoencoders. BMC Bioinform. 20, 655 (2019).

doi: 10.1186/s12859-019-3298-z

Peng, J., Zou, D., Gong, W., Kang, S. & Han, L. Deep neural network classification based on somatic mutations potentially predicts clinical benefit of immune checkpoint blockade in lung adenocarcinoma. Oncoimmunology 9, 1734156 (2020).

doi: 10.1080/2162402X.2020.1734156 pubmed: 32158626 pmcid: 7051190

Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

doi: 10.1038/nature12477 pubmed: 23945592 pmcid: 3776390

Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

doi: 10.1038/s41586-020-1943-3 pubmed: 32025018 pmcid: 7054213

Jiao, W. et al. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat. Commun. 11, 728 (2020).

doi: 10.1038/s41467-019-13825-8 pubmed: 32024849 pmcid: 7002586

Ilse, M., Tomczak, J. M. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).

Pavlidis, N. & Pentheroudakis, G. Cancer of unknown primary site. Lancet 379, 1428–1435 (2012).

doi: 10.1016/S0140-6736(11)61178-1 pubmed: 22414598

Salvadores, M., Mas-Ponte, D. & Supek, F. Passenger mutations accurately classify human tumors. PLoS Comput. Biol. 15, e1006953 (2019).

doi: 10.1371/journal.pcbi.1006953 pubmed: 30986244 pmcid: 6483366

Danyi, A., Jager, M. & de Ridder, J. Cancer type classification in liquid biopsies based on sparse mutational profiles enabled through data augmentation and integration. Life 12, 1 (2021).

doi: 10.3390/life12010001 pubmed: 35054395 pmcid: 8780455

Sanjaya, P. et al. Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping. Genome Med. 15, 47 (2023).

doi: 10.1186/s13073-023-01204-4 pubmed: 37420249 pmcid: 10326961

Kautto, E. A. et al. Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS. Oncotarget 8, 7452–7463 (2017).

doi: 10.18632/oncotarget.13918 pubmed: 27980218

Wang, C. & Liang, C. MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine. Sci. Rep. 8, 17546 (2018).

doi: 10.1038/s41598-018-35682-z pubmed: 30510242 pmcid: 6277498

Goodman, B. & Flaxman, S. European Union regulations on algorithmic decision-making and a ‘right to explanation’. AI Mag. 38, 50–57 (2017).

Gadermayr, M. & Tschuchnig, M. Multiple instance learning for digital pathology: a review on the state-of-the-art, limitations & future potential. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.04425 (2022).

Li, J. et al. A multi-resolution model for histopathology image classification and localization with multiple instance learning. Comput. Biol. Med. 131, 104253 (2021).

doi: 10.1016/j.compbiomed.2021.104253 pubmed: 33601084 pmcid: 7984430

Sharma, Y. et al. Cluster-to-conquer: a framework for end-to-end multi-instance learning for whole slide image classification. In International Conference on Medical Imaging with Deep Learning 682–698 (PMLR, 2021).

Yan, Y. et al. Deep multi-instance learning with dynamic pooling. In Proc. 10th Asian Conference on Machine Learning (eds Zhu, J. & Takeuchi, I.) 662–677 (PMLR, 2018).

Carlile, B., Delamarter, G., Kinney, P., Marti, A. & Whitney, B. Improving deep learning by inverse square root linear units (ISRLUs). Preprint at arXiv https://doi.org/10.48550/arXiv.1710.09967 (2017).

Tareen, A. & Kinney, J. B. Logomaker: beautiful sequence logos in Python. Bioinformatics 36, 2272–2274 (2020).

doi: 10.1093/bioinformatics/btz921 pubmed: 31821414

Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281 (2018).

doi: 10.1016/j.cels.2018.03.002 pubmed: 29596782 pmcid: 6075717

Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).

doi: 10.1038/nature11252

Levine, D. A. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).

doi: 10.1038/nature12113 pubmed: 23636398 pmcid: 3704730

Berger, A. C. et al. A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell 33, 690–705 (2018).

doi: 10.1016/j.ccell.2018.03.014 pubmed: 29622464 pmcid: 5959730

Liu, Y. et al. Comparative molecular analysis of gastrointestinal adenocarcinomas. Cancer Cell 33, 721–735 (2018).

doi: 10.1016/j.ccell.2018.03.010 pubmed: 29622466 pmcid: 5966039

Bonneville, R. et al. Landscape of microsatellite instability across 39 cancer types. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00073 (2017).

doi: 10.1200/PO.17.00073 pubmed: 29850653 pmcid: 5972025

Stovner, E. B. & Sætrom, P. PyRanges: efficient comparison of genomic intervals in Python. Bioinformatics 36, 918–919 (2020).

doi: 10.1093/bioinformatics/btz615 pubmed: 31373614

Authors

Jordan Anaya (J)

John-William Sidhom (JW)

Faisal Mahmood (F)

Alexander S Baras (AS)

MeSH classifications