Robust barcoding and identification of Mycobacterium tuberculosis lineages for epidemiological and clinical studies.
Barcoding
Diagnostics
Mycobacteria tuberculosis complex
Profiling
SNPs
Tuberculosis
Journal
Genome medicine
ISSN: 1756-994X
Abbreviated title: Genome Med
Country: England
NLM ID: 101475844
Publication information
Publication date:
14 12 2020
History:
received: 19 07 2020
accepted: 03 12 2020
entrez: 15 12 2020
pubmed: 16 12 2020
medline: 6 11 2021
Status:
epublish
Abstract
BACKGROUND
Tuberculosis, caused by bacteria in the Mycobacterium tuberculosis complex (MTBC), is a major global public health burden. Strain-specific genomic diversity in the known lineages of MTBC is an important factor in pathogenesis that may affect virulence, transmissibility, host response and emergence of drug resistance. Fast and accurate tracking of MTBC strains is therefore crucial for infection control, and our previous work developed a 62-single nucleotide polymorphism (SNP) barcode to inform on the phylogenetic identity of 7 human lineages and 64 sub-lineages.
METHODS
To update this barcode, we analysed whole genome sequencing data from 35,298 MTBC isolates (~1 million SNPs) covering 9 main lineages and 3 similar animal-related species (M. tuberculosis var. bovis, M. tuberculosis var. caprae and M. tuberculosis var. orygis). The data were partitioned into training (N = 17,903, 50.7%) and test (N = 17,395, 49.3%) sets and analysed using an integrated phylogenetic tree and population differentiation (F_ST) methodology.
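As a rough illustration of the population differentiation idea mentioned above, the sketch below computes a simple Wright-style F_ST for one biallelic SNP between a candidate clade and all other isolates. The abstract does not state which estimator the authors used, so the formula, the function name `fst_biallelic` and the example counts are assumptions made for illustration only.

```python
def fst_biallelic(p_in, n_in, p_out, n_out):
    """Simple F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    p_in, p_out: alternate-allele frequency inside / outside the clade.
    n_in, n_out: number of isolates inside / outside the clade.
    Illustrative estimator only; not necessarily the one used in the paper.
    """
    n = n_in + n_out
    p_bar = (p_in * n_in + p_out * n_out) / n      # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                  # total heterozygosity
    if h_t == 0:
        return 0.0                                 # monomorphic site: no differentiation
    # Size-weighted mean of within-group heterozygosity
    h_s = (n_in * 2 * p_in * (1 - p_in) + n_out * 2 * p_out * (1 - p_out)) / n
    return (h_t - h_s) / h_t


# A SNP fixed in the clade and absent elsewhere is fully differentiating.
print(fst_biallelic(1.0, 10, 0.0, 90))
# A SNP at equal frequency in both groups carries no clade signal.
print(fst_biallelic(0.5, 10, 0.5, 10))
```

Under this sketch, a SNP with F_ST equal to 1 for a given clade is a natural barcode candidate, because the allele is present in every member of the clade and in no other isolate.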
RESULTS
By constructing a phylogenetic tree on the training MTBC isolates, we characterised 90 lineages, sub-lineages or species, of which 30 are new, and identified 421 robust barcoding mutations, of which a minimal set of 90 was selected that included 20 markers from the 62-SNP barcode. The barcoding SNPs (90 and 421) perfectly discriminated the 86 MTBC isolate (sub-)lineages in the test set and could accurately reconstruct the clades across the combined 35k samples.
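To make the barcoding idea concrete, the sketch below shows how a set of clade-defining SNPs could be used to label an isolate from its allele calls. The positions, alleles and clade names in `BARCODE` are invented placeholders, not markers from the paper, and `assign_clades` is a hypothetical helper rather than part of TB-Profiler.

```python
# Hypothetical barcode: genome position -> (clade-defining allele, clade name).
# Positions, alleles and clade names are invented for illustration only.
BARCODE = {
    1000: ("A", "cladeX"),
    2000: ("T", "cladeX.1"),
    3000: ("G", "cladeY"),
}


def assign_clades(calls, barcode=BARCODE):
    """Return clades whose marker allele is observed, least to most specific.

    calls: dict mapping genome position -> called base for one isolate.
    Specificity is approximated by the number of dots in the clade name.
    """
    hits = [clade for pos, (allele, clade) in barcode.items()
            if calls.get(pos) == allele]
    return sorted(hits, key=lambda c: (c.count("."), c))


calls = {1000: "A", 2000: "T", 3000: "C"}
print(assign_clades(calls))  # ['cladeX', 'cladeX.1']
```

Because sub-clade markers sit on top of their parent clade's markers, an isolate in a sub-lineage is expected to match both levels; the last element of the returned list is then the most specific assignment.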
CONCLUSIONS
The validated 90 SNPs can be used for the rapid diagnosis and tracking of MTBC strains to assist public health surveillance and control. To facilitate this, the SNP markers have now been incorporated into the TB-Profiler informatics platform ( https://github.com/jodyphelan/TBProfiler ).
Identifiers
pubmed: 33317631
doi: 10.1186/s13073-020-00817-3
pii: 10.1186/s13073-020-00817-3
pmc: PMC7734807
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
114
Grants
Agency: Foundation for the National Institutes of Health
ID: D43TW009127
Agency: Medical Research Council (GB)
ID: MR/N010469/1
Agency: Medical Research Council
ID: MR/M01360X/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R025576/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/N010469/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/R013063/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R020973/1
Country: United Kingdom
Agency: FIC NIH HHS
ID: D43 TW009127
Country: United States