Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
28 September 2020
History:
pubmed: 1 September 2020
medline: 22 June 2021
entrez: 1 September 2020
Status: ppublish

Abstract

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard data set of sufficient size to compare performance between models. We present a new data set for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network (CNN) models on this data set. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind data set, how performance improves by adding more lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of five densely connected CNNs, achieves a root mean squared error of 1.42 and Pearson
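The two metrics named in the abstract, root mean squared error and the Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function names `rmse` and `pearson_r` and the sample affinity values are hypothetical and chosen only for illustration.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Hypothetical predicted and experimental affinities, for illustration only.
    predicted_affinity = [6.1, 7.4, 5.0, 8.2]
    experimental_affinity = [6.5, 7.0, 4.2, 9.0]
    print("RMSE:", rmse(predicted_affinity, experimental_affinity))
    print("Pearson R:", pearson_r(predicted_affinity, experimental_affinity))
```

A lower RMSE indicates smaller absolute errors in predicted affinity, while a Pearson coefficient closer to 1 indicates that predictions track the ordering and scale of the measured values; the two can disagree, which is why both are typically reported.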

Identifiers

pubmed: 32865404
doi: 10.1021/acs.jcim.0c00411
pmc: PMC8902699
mid: NIHMS1784597

Chemical substances

Ligands 0

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

4200-4215

Grants

Agency: NIGMS NIH HHS
ID: R01 GM108340
Country: United States

Authors

Paul G Francoeur (PG)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

Tomohide Masuda (T)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

Jocelyn Sunseri (J)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

Andrew Jia (A)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

Richard B Iovanisci (RB)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

Ian Snyder (I)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

David R Koes (DR)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.
