From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.
clustering
entropy
fitness
genomic surveillance
viral subtypes
viral variants
Journal
Journal of computational biology : a journal of computational molecular cell biology
ISSN: 1557-8666
Abbreviated title: J Comput Biol
Country: United States
NLM ID: 9433358
Publication information
Publication date:
2021-11
History:
pubmed: 2021-10-27
medline: 2021-11-30
entrez: 2021-10-26
Status:
ppublish
Abstract
The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute, United Kingdom) allows a more detailed study of the evolution, genomic diversity, and dynamics of a virus than ever before. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, the first time it has been used in this context. Our clustering approach reaches lower entropies than other methods, and we are able to improve on this even further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and is also able to detect the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.
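The abstract does not define clustering entropy, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes aligned, equal-length sequences, takes the entropy of a cluster to be the sum of per-position Shannon entropies (in bits), and takes the entropy of a clustering to be the size-weighted average over its clusters. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cluster_entropy(sequences):
    """Sum of per-position entropies over aligned, equal-length sequences."""
    return sum(column_entropy(col) for col in zip(*sequences))


def clustering_entropy(sequences, labels):
    """Size-weighted average of cluster entropies (assumed definition)."""
    clusters = {}
    for seq, label in zip(sequences, labels):
        clusters.setdefault(label, []).append(seq)
    n = len(sequences)
    return sum(len(members) / n * cluster_entropy(members)
               for members in clusters.values())


if __name__ == "__main__":
    seqs = ["AAAA", "AAAA", "CCCC", "CCCC"]
    # Labels that group identical sequences give homogeneous clusters.
    print(clustering_entropy(seqs, [0, 0, 1, 1]))
    # Labels that mix the two sequence types give heterogeneous clusters.
    print(clustering_entropy(seqs, [0, 1, 0, 1]))
```

Under this definition, a lower value means the sequences within each cluster agree at more positions, which is the sense in which a clustering with lower entropy is preferred.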
Identifiers
pubmed: 34698508
doi: 10.1089/cmb.2021.0302
pmc: PMC8819513
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1113-1129
Grants
Agency: NIBIB NIH HHS
ID: R01 EB025022
Country: United States