Coherent Blending of Biophysics-Based Knowledge with Bayesian Neural Networks for Robust Protein Property Prediction.
Bayesian methodology
biophysical models
deep learning
machine learning
protein engineering
uncertainty quantification
Journal
ACS synthetic biology
ISSN: 2161-5063
Abbreviated title: ACS Synth Biol
Country: United States
NLM ID: 101575075
Publication information
Publication date:
17 Nov 2023
History:
pubmed: 27 Oct 2023
medline: 27 Oct 2023
entrez: 27 Oct 2023
Status:
ppublish
Abstract
Predicting properties of proteins is of interest for basic biological understanding and protein engineering alike. Increasingly, machine learning (ML) approaches are being used for this task. However, the accuracy of such ML models typically degrades as test proteins stray further from the training data distribution. On the other hand, models that are more data-free, such as biophysics-based models, are typically uniformly accurate over all of protein space, even if inferior for test points close to the training distribution. Consequently, being able to cohesively blend these two types of information within one model, as appropriate in different parts of the protein space, will improve overall performance. Herein, we tackle just this problem to yield a simple, practical, and scalable approach that can be easily implemented. In particular, we use a Bayesian formulation to integrate biophysical knowledge into neural networks. However, in doing so, a technical challenge arises: Bayesian neural networks (BNNs) enable the user to specify prior information only on the neural network weight parameters, rather than on the function values given to us from a typical biophysics-based model. Consequently, we devise a principled probabilistic method to overcome this challenge. Our approach yields intuitively pleasing results: predictions rely more heavily on the biophysical prior information when the BNN epistemic uncertainty (uncertainty arising from a lack of training data rather than sensor noise) is large, and more heavily on the neural network when the epistemic uncertainty is small. We demonstrate this approach on an illustrative synthetic example, on two examples of protein property prediction (fluorescence and binding), and for generality on one small molecule property prediction problem.
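The abstract does not spell out the probabilistic formulation, so the following is only an illustrative sketch of the qualitative behaviour it describes, not the authors' method. It assumes that both the neural network prediction and the biophysical prediction can be summarised as Gaussians and combines them with a standard precision-weighted average. The function name `blend_with_prior` and the parameters `nn_epistemic_var` and `prior_var` are hypothetical names chosen for this example.

```python
import numpy as np

def blend_with_prior(nn_mean, nn_epistemic_var, prior_mean, prior_var):
    """Precision-weighted blend of two Gaussian estimates (illustrative assumption).

    nn_mean, nn_epistemic_var: network prediction and its epistemic variance.
    prior_mean, prior_var: biophysical model prediction and an assumed variance.
    Returns the blended mean, blended variance, and the weight on the network.
    """
    nn_mean = np.asarray(nn_mean, dtype=float)
    nn_epistemic_var = np.asarray(nn_epistemic_var, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)

    # Weight on the network shrinks toward 0 as its epistemic variance grows.
    w_nn = prior_var / (prior_var + nn_epistemic_var)
    blended_mean = w_nn * nn_mean + (1.0 - w_nn) * prior_mean
    # Variance of the product of two Gaussians.
    blended_var = prior_var * nn_epistemic_var / (prior_var + nn_epistemic_var)
    return blended_mean, blended_var, w_nn

# Two hypothetical test proteins: one near the training data (low epistemic
# variance) and one far from it (high epistemic variance).
nn_mean = np.array([1.0, 1.0])
nn_var = np.array([0.01, 100.0])
prior_mean = np.array([0.0, 0.0])

blended, var, w = blend_with_prior(nn_mean, nn_var, prior_mean, prior_var=1.0)
print("weight on network:", w)
print("blended prediction:", blended)
```

In this toy setting the first point stays close to the network prediction and the second falls back almost entirely on the biophysical value, which mirrors the behaviour reported in the abstract under the stated Gaussian assumption.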
Identifiers
pubmed: 37888887
doi: 10.1021/acssynbio.3c00217
Publication types
Journal Article
Languages
eng
Citation subsets
IM