Multiple-instance learning of somatic mutations for the classification of tumour type and the prediction of microsatellite status.
Journal
Nature biomedical engineering
ISSN: 2157-846X
Abbreviated title: Nat Biomed Eng
Country: England
NLM ID: 101696896
Publication information
Publication date:
02 Nov 2023
History:
received: 14 Apr 2023
accepted: 30 Sep 2023
medline: 03 Nov 2023
pubmed: 03 Nov 2023
entrez: 03 Nov 2023
Status:
aheadofprint
Abstract
Large-scale genomic data are well suited to analysis by deep learning algorithms. However, for many genomic datasets, labels are at the level of the sample rather than for individual genomic measures. Machine learning models leveraging these datasets generate predictions by using statically encoded measures that are then aggregated at the sample level. Here we show that a single weakly supervised end-to-end multiple-instance-learning model with multi-headed attention can be trained to encode and aggregate the local sequence context or genomic position of somatic mutations, hence allowing for the modelling of the importance of individual measures for sample-level classification and thus providing enhanced explainability. The model solves synthetic tasks that conventional models fail at, and achieves best-in-class performance for the classification of tumour type and for predicting microsatellite status. By improving the performance of tasks that require aggregate information from genomic datasets, multiple-instance deep learning may generate biological insight.
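The aggregation step described in the abstract can be illustrated with a minimal sketch of attention-based multiple-instance pooling, in the spirit of Ilse et al. (2018) from the reference list. This is not the authors' model: the function name `attention_pool`, the random weights `V` and `W`, and all dimensions are assumptions chosen for illustration. Each row of `instances` stands for one encoded somatic mutation, and each attention head assigns a weight to every mutation before the weighted sum yields a sample-level vector.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(instances, V, W):
    """Hypothetical multi-headed attention pooling over one bag.

    instances: (n, d) encoded instances (e.g. mutations) of one sample
    V:         (d, h) projection into a hidden attention space
    W:         (h, k) one scoring column per attention head
    Returns the concatenated per-head bag vector (k * d,) and the
    attention weights (n, k).
    """
    scores = np.tanh(instances @ V) @ W   # (n, k) unnormalised scores
    weights = softmax(scores, axis=0)     # each head sums to 1 over instances
    pooled = weights.T @ instances        # (k, d) weighted sums per head
    return pooled.reshape(-1), weights

# Toy bag: 5 instances with 8 features each, 2 attention heads.
# All values are random placeholders, not real mutation encodings.
rng = np.random.default_rng(0)
n_mutations, d, hidden, heads = 5, 8, 16, 2
instances = rng.normal(size=(n_mutations, d))
V = rng.normal(size=(d, hidden))
W = rng.normal(size=(hidden, heads))

bag_vector, weights = attention_pool(instances, V, W)
```

In a trained model the weights returned here are what would be inspected to see which instances drove the sample-level prediction. The pooling is invariant to the order of the instances, which suits an unordered set of mutations.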
Identifiers
pubmed: 37919367
doi: 10.1038/s41551-023-01120-3
pii: 10.1038/s41551-023-01120-3
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2023. The Author(s).
References
Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).
doi: 10.1098/rsif.2017.0387
pubmed: 29618526
pmcid: 5938574
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
doi: 10.1038/nmeth.3547
pubmed: 26301843
pmcid: 4768299
Routhier, E. & Mozziconacci, J. Genomics enters the deep learning era. PeerJ 10, e13613 (2022).
doi: 10.7717/peerj.13613
pubmed: 35769139
pmcid: 9235815
Altman, N. S. & Krzywinski, M. The curse(s) of dimensionality. Nat. Methods 15, 399–400 (2018).
doi: 10.1038/s41592-018-0019-x
pubmed: 29855577
Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).
doi: 10.1038/s41586-021-03922-4
pubmed: 34552244
pmcid: 8514339
Dietterich, T. G., Lathrop, R. H. & Lozano-Pérez, T. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997).
doi: 10.1016/S0004-3702(96)00034-3
Amores, J. Multiple instance classification: review, taxonomy and comparative study. Artif. Intell. 201, 81–105 (2013).
doi: 10.1016/j.artint.2013.06.003
Carbonneau, M.-A., Cheplygina, V., Granger, E. & Gagnon, G. Multiple instance learning: a survey of problem characteristics and applications. Pattern Recognit. 77, 329–353 (2018).
doi: 10.1016/j.patcog.2017.10.009
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
doi: 10.1038/s41551-020-00682-w
pubmed: 33649564
pmcid: 8711640
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
doi: 10.1038/s41586-021-03512-4
pubmed: 33953404
Chen, R. J. et al. Whole slide images are 2D point clouds: context-aware survival prediction using patch-based graph convolutional networks. In Medical Image Computing and Computer Assisted Intervention (MICCAI 2021) 339–349 (Springer International, 2021).
Kim, S., Lee, H., Kim, K. & Kang, J. Mut2Vec: distributed representation of cancerous mutations. BMC Med. Genet. 11, 33 (2018).
Palazzo, M., Beauseroy, P. & Yankilevich, P. A pan-cancer somatic mutation embedding using autoencoders. BMC Bioinform. 20, 655 (2019).
doi: 10.1186/s12859-019-3298-z
Peng, J., Zou, D., Gong, W., Kang, S. & Han, L. Deep neural network classification based on somatic mutations potentially predicts clinical benefit of immune checkpoint blockade in lung adenocarcinoma. Oncoimmunology 9, 1734156 (2020).
doi: 10.1080/2162402X.2020.1734156
pubmed: 32158626
pmcid: 7051190
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
doi: 10.1038/nature12477
pubmed: 23945592
pmcid: 3776390
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
doi: 10.1038/s41586-020-1943-3
pubmed: 32025018
pmcid: 7054213
Jiao, W. et al. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat. Commun. 11, 728 (2020).
doi: 10.1038/s41467-019-13825-8
pubmed: 32024849
pmcid: 7002586
Ilse, M., Tomczak, J. M. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).
Pavlidis, N. & Pentheroudakis, G. Cancer of unknown primary site. Lancet 379, 1428–1435 (2012).
doi: 10.1016/S0140-6736(11)61178-1
pubmed: 22414598
Salvadores, M., Mas-Ponte, D. & Supek, F. Passenger mutations accurately classify human tumors. PLoS Comput. Biol. 15, e1006953 (2019).
doi: 10.1371/journal.pcbi.1006953
pubmed: 30986244
pmcid: 6483366
Danyi, A., Jager, M. & de Ridder, J. Cancer type classification in liquid biopsies based on sparse mutational profiles enabled through data augmentation and integration. Life 12, 1 (2021).
doi: 10.3390/life12010001
pubmed: 35054395
pmcid: 8780455
Sanjaya, P. et al. Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping. Genome Med. 15, 47 (2023).
doi: 10.1186/s13073-023-01204-4
pubmed: 37420249
pmcid: 10326961
Kautto, E. A. et al. Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS. Oncotarget 8, 7452–7463 (2017).
doi: 10.18632/oncotarget.13918
pubmed: 27980218
Wang, C. & Liang, C. MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine. Sci. Rep. 8, 17546 (2018).
doi: 10.1038/s41598-018-35682-z
pubmed: 30510242
pmcid: 6277498
Goodman, B. & Flaxman, S. European Union regulations on algorithmic decision-making and a ‘right to explanation’. AI Mag. 38, 50–57 (2017).
Gadermayr, M. & Tschuchnig, M. Multiple instance learning for digital pathology: a review on the state-of-the-art, limitations & future potential. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.04425 (2022).
Li, J. et al. A multi-resolution model for histopathology image classification and localization with multiple instance learning. Comput. Biol. Med. 131, 104253 (2021).
doi: 10.1016/j.compbiomed.2021.104253
pubmed: 33601084
pmcid: 7984430
Sharma, Y. et al. Cluster-to-conquer: a framework for end-to-end multi-instance learning for whole slide image classification. In International Conference on Medical Imaging with Deep Learning 682–698 (PMLR, 2021).
Yan, Y. et al. Deep multi-instance learning with dynamic pooling. In Proc. 10th Asian Conference on Machine Learning (eds Zhu, J. & Takeuchi, I.) 662–677 (PMLR, 2018).
Carlile, B., Delamarter, G., Kinney, P., Marti, A. & Whitney, B. Improving deep learning by inverse square root linear units (ISRLUs). Preprint at arXiv https://doi.org/10.48550/arXiv.1710.09967 (2017).
Tareen, A. & Kinney, J. B. Logomaker: beautiful sequence logos in Python. Bioinformatics 36, 2272–2274 (2020).
doi: 10.1093/bioinformatics/btz921
pubmed: 31821414
Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281 (2018).
doi: 10.1016/j.cels.2018.03.002
pubmed: 29596782
pmcid: 6075717
Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
doi: 10.1038/nature11252
Levine, D. A. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
doi: 10.1038/nature12113
pubmed: 23636398
pmcid: 3704730
Berger, A. C. et al. A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell 33, 690–705 (2018).
doi: 10.1016/j.ccell.2018.03.014
pubmed: 29622464
pmcid: 5959730
Liu, Y. et al. Comparative molecular analysis of gastrointestinal adenocarcinomas. Cancer Cell 33, 721–735 (2018).
doi: 10.1016/j.ccell.2018.03.010
pubmed: 29622466
pmcid: 5966039
Bonneville, R. et al. Landscape of microsatellite instability across 39 cancer types. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00073 (2017).
doi: 10.1200/PO.17.00073
pubmed: 29850653
pmcid: 5972025
Stovner, E. B. & Sætrom, P. PyRanges: efficient comparison of genomic intervals in Python. Bioinformatics 36, 918–919 (2020).
doi: 10.1093/bioinformatics/btz615
pubmed: 31373614