Coherent Blending of Biophysics-Based Knowledge with Bayesian Neural Networks for Robust Protein Property Prediction.

Bayesian methodology; biophysical models; deep learning; machine learning; protein engineering; uncertainty quantification

Journal

ACS synthetic biology
ISSN: 2161-5063
Abbreviated title: ACS Synth Biol
Country: United States
NLM ID: 101575075

Publication information

Publication date:
17 Nov 2023
History:
pubmed: 27 10 2023
medline: 27 10 2023
entrez: 27 10 2023
Status: ppublish

Abstract

Predicting properties of proteins is of interest for basic biological understanding and protein engineering alike. Increasingly, machine learning (ML) approaches are being used for this task. However, the accuracy of such ML models typically degrades as test proteins stray further from the training data distribution. On the other hand, models that are more data-free, such as biophysics-based models, are typically uniformly accurate over all of the protein space, even if inferior for test points close to the training distribution. Consequently, being able to cohesively blend these two types of information within one model, as appropriate in different parts of the protein space, will improve overall performance. Herein, we tackle just this problem to yield a simple, practical, and scalable approach that can be easily implemented. In particular, we use a Bayesian formulation to integrate biophysical knowledge into neural networks. However, in doing so, a technical challenge arises: Bayesian neural networks (BNNs) enable the user to specify prior information only on the neural network weight parameters, rather than on the function values provided by a typical biophysics-based model. Consequently, we devise a principled probabilistic method to overcome this challenge. Our approach yields intuitively pleasing results: predictions rely more heavily on the biophysical prior information when the BNN epistemic uncertainty (uncertainty arising from a lack of training data rather than sensor noise) is large and more heavily on the neural network when the epistemic uncertainty is small. We demonstrate this approach on an illustrative synthetic example, on two examples of protein property prediction (fluorescence and binding), and for generality on one small molecule property prediction problem.
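
The qualitative behavior described in the abstract (lean on the biophysical prior when epistemic uncertainty is large, and on the network when it is small) can be illustrated with a toy calculation. The sketch below is not the method of the paper, which is not detailed in this record. It assumes, purely for illustration, that the biophysical model and the network each give a Gaussian belief about a single function value, and it combines them by precision weighting. The names `Gaussian`, `blend`, and `network_weight`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """A Gaussian belief about one function value f(x)."""
    mean: float
    var: float


def blend(prior: Gaussian, network: Gaussian) -> Gaussian:
    """Precision-weighted product of two Gaussian beliefs (illustrative only)."""
    if prior.var <= 0 or network.var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior.var
    network_precision = 1.0 / network.var
    var = 1.0 / (prior_precision + network_precision)
    mean = var * (prior_precision * prior.mean + network_precision * network.mean)
    return Gaussian(mean, var)


def network_weight(prior: Gaussian, network: Gaussian) -> float:
    """Fraction of the blended mean that comes from the network."""
    return prior.var / (prior.var + network.var)


# Hypothetical numbers: the biophysical model predicts 1.0 everywhere with
# moderate uncertainty; the network predicts 3.0 with an epistemic variance
# that depends on how close the test point is to the training data.
biophysics = Gaussian(mean=1.0, var=0.5)
near_training = Gaussian(mean=3.0, var=0.01)     # small epistemic uncertainty
far_from_training = Gaussian(mean=3.0, var=50.0)  # large epistemic uncertainty

near = blend(biophysics, near_training)
far = blend(biophysics, far_from_training)

print(f"near training data: mean={near.mean:.3f}, "
      f"network weight={network_weight(biophysics, near_training):.3f}")
print(f"far from training data: mean={far.mean:.3f}, "
      f"network weight={network_weight(biophysics, far_from_training):.3f}")
```

Under these assumed numbers, the blended mean stays close to the network prediction near the training data and falls back toward the biophysical prediction far from it, which is the pattern the abstract reports for the full BNN approach.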

Identifiers

pubmed: 37888887
doi: 10.1021/acssynbio.3c00217

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

3242-3251

Authors

Hunter Nisonoff (H)

Center for Computational Biology, University of California, Berkeley, Berkeley, California 94720-3220, United States.

Yixin Wang (Y)

Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109-1107, United States.

Jennifer Listgarten (J)

Center for Computational Biology, University of California, Berkeley, Berkeley, California 94720-3220, United States.
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, California 94720-1776, United States.

MeSH classifications