A novel statistical method for decontaminating T-cell receptor sequencing data.
Bayesian model
Contamination detection
TCR sequencing
Journal
Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837
Publication information
Publication date: 20 Jul 2023
History:
received: 29 Dec 2022
revised: 16 May 2023
accepted: 30 May 2023
pmc-release: 19 Jun 2024
medline: 24 Jul 2023
pubmed: 20 Jun 2023
entrez: 20 Jun 2023
Status: ppublish
Abstract
The T-cell receptor (TCR) repertoire is highly diverse across the population and plays an essential role in initiating multiple immune processes. TCR sequencing (TCR-seq) has been developed to profile the T-cell repertoire. As in other high-throughput experiments, contamination can occur at several steps of TCR-seq, including sample collection, preparation and sequencing. Such contamination introduces artifacts into the data, leading to inaccurate or even biased results. Most existing methods take 'clean' TCR-seq data as their starting point and cannot handle contamination. Here, we develop a novel statistical model to systematically detect and remove contamination in TCR-seq data. We classify the observed contamination into two sources: pairwise and cross-cohort. For both sources, we provide visualizations and summary statistics to help users assess the severity of the contamination. Incorporating prior information from 14 existing TCR-seq datasets with minimal contamination, we develop a straightforward Bayesian model to statistically identify contaminated samples. We further provide strategies for removing the affected sequences so that downstream analysis can proceed without repeating experiments. In simulation studies, our proposed model detects contamination more robustly than several off-the-shelf detection methods. We illustrate the use of our method on two locally generated TCR-seq datasets.
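The following minimal Python sketch illustrates the general idea of combining a prior learned from clean data with an observed pairwise summary statistic. It is not the authors' model, whose likelihood and priors are not described in this record. It rests on several hypothetical assumptions. Pairwise contamination is taken to show up as an excess of clonotypes (e.g. CDR3 sequences) shared between two samples. The shared-clonotype rate between clean sample pairs is assumed to follow a Beta prior, fitted by the method of moments to rates from clean datasets; the `clean_rates` values below are invented. The check itself is an empirical-Bayes, beta-binomial upper-tail probability. All function names are illustrative.

```python
import math
from statistics import mean, pvariance


def overlap_counts(sample_a, sample_b):
    """Count unique clonotypes of sample_a that also appear in sample_b.

    Returns (shared, total); total is the number of unique clonotypes in sample_a.
    Hypothetical summary statistic for pairwise contamination.
    """
    a, b = set(sample_a), set(sample_b)
    return len(a & b), len(a)


def fit_beta_moments(rates):
    """Fit Beta(alpha, beta) to clean-pair overlap rates by the method of moments."""
    m = mean(rates)
    v = pvariance(rates)
    if not 0 < v < m * (1 - m):
        raise ValueError("rates are incompatible with a Beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def log_beta_fn(x, y):
    """log B(x, y) computed via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_logpmf(k, n, alpha, beta):
    """log P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta_fn(k + alpha, n - k + beta) - log_beta_fn(alpha, beta)


def upper_tail(k, n, alpha, beta):
    """P(K >= k): chance that a clean pair shares at least k clonotypes."""
    return sum(math.exp(betabinom_logpmf(j, n, alpha, beta)) for j in range(k, n + 1))


def flag_pair(sample_a, sample_b, clean_rates, cutoff=0.01):
    """Return (tail_probability, flagged) for one ordered sample pair."""
    shared, total = overlap_counts(sample_a, sample_b)
    alpha, beta = fit_beta_moments(clean_rates)
    p = upper_tail(shared, total, alpha, beta)
    return p, p < cutoff


if __name__ == "__main__":
    # Invented shared-clonotype rates between clean sample pairs.
    clean_rates = [0.010, 0.020, 0.015, 0.012, 0.018]
    sample_a = [f"CASS{i}" for i in range(100)]
    suspicious_b = [f"CASS{i}" for i in range(40)] + [f"CARX{i}" for i in range(200)]
    ordinary_b = ["CASS0"] + [f"CARX{i}" for i in range(200)]
    print(flag_pair(sample_a, suspicious_b, clean_rates))
    print(flag_pair(sample_a, ordinary_b, clean_rates))
```

A pair with a very small tail probability shares far more clonotypes than clean pairs would suggest and would be a candidate for closer inspection. The 0.01 cutoff is an arbitrary choice for illustration. Using a Beta prior rather than a single fixed rate lets the natural spread of overlap seen among clean datasets widen the tail, which makes the check less prone to false alarms.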
Identifiers
pubmed: 37337757
pii: 7202085
doi: 10.1093/bib/bbad230
pmc: PMC10359082
Chemical substances
Receptors, Antigen, T-Cell
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Grants
Agency: NCI NIH HHS
ID: R01 CA234629
Country: United States
Agency: NCI NIH HHS
ID: R03 CA270725
Country: United States
Copyright information
© The Author(s) 2023. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.