A comprehensive framework for detecting copy number variants from single nucleotide polymorphism data: 'rCNV', a versatile R package for paralogue and CNV detection.
CNVs
GBS
R statistics
SNPs
paralogues
Journal
Molecular ecology resources
ISSN: 1755-0998
Abbreviated title: Mol Ecol Resour
Country: England
NLM ID: 101465604
Publication information
Publication date:
Nov 2023
History:
revised: 4 Jul 2023
received: 17 Oct 2022
accepted: 7 Jul 2023
medline: 4 Oct 2023
pubmed: 29 Jul 2023
entrez: 29 Jul 2023
Status:
ppublish
Abstract
Recent studies have highlighted the significant role of copy number variants (CNVs) in phenotypic diversity, environmental adaptation and species divergence across eukaryotes. The presence of CNVs also has the potential to introduce genotyping biases, which can pose challenges to accurate population and quantitative genetic analyses. However, detecting CNVs in genomes, particularly in non-model organisms, presents a formidable challenge. To address this issue, we have developed a statistical framework and an accompanying R software package that leverage allelic-read depth from single nucleotide polymorphism (SNP) data for accurate CNV detection. Our framework capitalises on two key principles. First, it exploits the distribution of allelic-read depth ratios in heterozygotes for individual SNPs by comparing it against an expected distribution based on binomial sampling. Second, it identifies SNPs exhibiting an apparent excess of heterozygotes under Hardy-Weinberg equilibrium. By employing multiple statistical tests, our method not only enhances sensitivity to sampling effects but also effectively addresses reference biases, resulting in optimised SNP classification. Our framework is compatible with various next-generation sequencing (NGS) technologies (e.g. RADseq, Exome-capture). This versatility enables CNV calling from genomes of diverse complexities. To streamline the analysis process, we have implemented our framework in the user-friendly R package 'rCNV', which automates the entire workflow seamlessly. We trained our models using simulated data and validated their performance on four datasets derived from different sequencing technologies: RADseq (Chinook salmon, Oncorhynchus tshawytscha), Rapture (American lobster, Homarus americanus), Exome-capture (Norway spruce, Picea abies) and WGS (malaria mosquito, Anopheles gambiae).
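The two principles named in the abstract can be illustrated with a minimal sketch. This is not the rCNV implementation or its API; the function names `binom_two_sided_p` and `het_excess` are hypothetical, the read and genotype counts are made up for illustration, and the sketch tests a single heterozygote rather than the full distribution across samples that the abstract describes. It assumes a diploid, biallelic SNP where each allele is expected to contribute half of the reads in a heterozygote.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k reference reads out of n.

    Sums the probability of every outcome that is no more likely than
    the observed one (hypothetical helper, not part of rCNV).
    """
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))

def het_excess(n_ref_hom, n_het, n_alt_hom):
    """Observed minus Hardy-Weinberg expected heterozygote proportion.

    A positive value means more heterozygotes than 2pq predicts.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    expected = 2 * p * (1 - p)
    return n_het / n - expected

# Illustrative counts (assumed, not from the paper):
# a heterozygote with 10 reference reads out of 40 is far from a 1:1 ratio.
skewed_p = binom_two_sided_p(10, 40)
balanced_p = binom_two_sided_p(20, 40)

# A SNP where every sampled individual is called heterozygous.
excess_all_het = het_excess(0, 100, 0)
excess_hwe = het_excess(25, 50, 25)
```

A SNP that shows both a skewed read ratio in heterozygotes and a positive heterozygote excess would be a candidate for a duplicated or multi-copy region under this simplified view; how the published framework combines and thresholds its tests is not specified in this record.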
Identifiers
pubmed: 37515483
doi: 10.1111/1755-0998.13843
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1772-1789
Copyright information
© 2023 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.