Machine learning approach for the prediction of the number of sulphur atoms in peptides using the theoretical aggregated isotope distribution.


Journal

Rapid communications in mass spectrometry : RCM
ISSN: 1097-0231
Abbreviated title: Rapid Commun Mass Spectrom
Country: England
NLM ID: 8802365

Publication information

Publication date:
15 Mar 2023
History:
received: 07 Jul 2022
revised: 18 Nov 2022
accepted: 18 Dec 2022
medline: 05 Apr 2023
pubmed: 18 Feb 2023
entrez: 17 Feb 2023
Status: ppublish

Abstract

The observed isotope distribution is an important attribute for identifying peptides and proteins in mass spectrometry-based proteomics. Sulphur atoms have a very distinctive elemental isotope distribution, and the presence of sulphur atoms therefore has a substantial effect on the isotope distribution of biomolecules. Hence, knowledge of the number of sulphur atoms can improve the identification of peptides and proteins. In this paper, we conducted a theoretical investigation of the isotope properties of sulphur-containing peptides. We propose a gradient boosting approach to predict the number of sulphur atoms from the aggregated isotope distribution. We compared prediction accuracy and assessed the predictive power of the features using the mass and isotope abundance information from the first three, five and eight aggregated isotope peaks. Mass features alone are not sufficient to accurately predict the number of sulphur atoms. However, prediction is near-perfect when isotope abundance features are included. The abundance ratios of the eighth and the seventh, the fifth and the fourth, and the third and the second aggregated isotope peaks are the most important abundance features. The mass differences between the eighth, the fifth or the third aggregated isotope peak and the monoisotopic peak are the most predictive mass features. The validation analysis shows that predicting the number of sulphur atoms from the isotope profile fails because the isotope ratios are not measured accurately. These results indicate that future instrument development would benefit from improved spectral accuracy, so that the intensities of higher-order isotope peaks are measured more accurately.
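
The two feature families named in the abstract (abundance ratios of consecutive aggregated isotope peaks, and mass differences relative to the monoisotopic peak) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `isotope_features`, the feature naming scheme and the example peak values are all hypothetical, and the exact feature set used in the paper may differ. The resulting dictionary is the kind of input one might pass to a gradient boosting classifier.

```python
from typing import Dict, Sequence, Tuple

# One aggregated isotope peak: (mass, relative abundance).
Peak = Tuple[float, float]


def isotope_features(peaks: Sequence[Peak], n_peaks: int = 3) -> Dict[str, float]:
    """Compute illustrative features from the first n_peaks aggregated peaks.

    Assumption: peaks are ordered from the monoisotopic peak upwards.
    Feature names are hypothetical, not taken from the paper.
    """
    if n_peaks < 2:
        raise ValueError("at least two peaks are needed to form a feature")
    if len(peaks) < n_peaks:
        raise ValueError("fewer peaks supplied than requested")

    mono_mass = peaks[0][0]
    features: Dict[str, float] = {}
    for i in range(1, n_peaks):
        mass, abundance = peaks[i]
        prev_abundance = peaks[i - 1][1]
        if prev_abundance <= 0:
            raise ValueError("abundances must be positive to form a ratio")
        # Mass difference between peak i+1 and the monoisotopic peak.
        features[f"mass_diff_{i + 1}_1"] = mass - mono_mass
        # Abundance ratio of peak i+1 to peak i (consecutive peaks).
        features[f"ratio_{i + 1}_{i}"] = abundance / prev_abundance
    return features


# Hypothetical peak list; the values are for illustration only.
example_peaks = [(1000.50, 0.55), (1001.50, 0.30), (1002.50, 0.15)]
features = isotope_features(example_peaks, n_peaks=3)
```

Calling the function with `n_peaks` set to 3, 5 or 8 mirrors the three feature sets compared in the abstract, provided the peak list contains at least that many peaks.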

Identifiers

pubmed: 36798055
doi: 10.1002/rcm.9480

Chemical substances

Peptides 0
Proteins 0
Isotopes 0
Sulfur 70FD1KFU70

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e9480

Grants

Agency: Flanders AI research
Agency: Fonds Wetenschappelijk Onderzoek
ID: VS02819N

Copyright information

© 2023 John Wiley & Sons Ltd.

Authors

Annelies Agten (A)

Uhasselt, Data Science Institute (DSI), Agoralaan, Diepenbeek, Belgium.

Jurgen Claesen (J)

Epidemiology and Data Science, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.

Tomasz Burzykowski (T)

Uhasselt, Data Science Institute (DSI), Agoralaan, Diepenbeek, Belgium.
Department of Statistics and Medical Informatics, Medical University of Bialystok, Bialystok, Poland.

Dirk Valkenborg (D)

Uhasselt, Data Science Institute (DSI), Agoralaan, Diepenbeek, Belgium.
