Genetic substructure and complex demographic history of South African Bantu speakers.
Journal
Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
07 04 2021
History:
received: 10 07 2020
accepted: 10 02 2021
entrez: 8 4 2021
pubmed: 9 4 2021
medline: 23 4 2021
Status:
epublish
Abstract
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Identifiers
pubmed: 33828095
doi: 10.1038/s41467-021-22207-y
pii: 10.1038/s41467-021-22207-y
pmc: PMC8027885
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2080
Grants
Agency: Wellcome Trust
ID: 069683/Z/02/Z
Country: United Kingdom
Agency: Wellcome Trust
ID: 085477/Z/08/Z
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U54 HG006938
Country: United States
Agency: Wellcome Trust
ID: 058893/Z/99/A
Country: United Kingdom
Agency: Wellcome Trust
ID: 085477/B/08/Z
Country: United Kingdom