High-quality wild barley genome assemblies and annotation with Nanopore long reads and Hi-C sequencing data.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date: 2023-08-10
History:
received: 2023-04-21
accepted: 2023-07-31
medline: 2023-08-14
pubmed: 2023-08-11
entrez: 2023-08-10
Status: epublish
Abstract
Wild barley populations from "Evolution Canyon" (EC) on Mount Carmel, Israel, are ideal models for studying cereal chromosome evolution. The wild barley EC_S1 comes from the south slope, which has higher daily temperatures and more drought, while EC_N1 comes from the north slope, which has a cooler climate and higher relative humidity; these contrasting environments impose divergent selection on the two populations. We assembled a 5.03 Gb genome with a contig N50 of 3.53 Mb for EC_S1 and a 5.05 Gb genome with a contig N50 of 3.45 Mb for EC_N1, using 145 Gb and 160.0 Gb of Illumina sequencing data, 295.6 Gb and 285.35 Gb of Nanopore sequencing data, and 555.1 Gb and 514.5 Gb of Hi-C sequencing data, respectively. BUSCO and CEGMA evaluations indicated that both assemblies are highly complete. Using full-length transcriptome data, we predicted 39,179 and 38,373 high-confidence genes in EC_S1 and EC_N1, of which 93.6% and 95.2% were functionally annotated, respectively. We also annotated repetitive elements and non-coding RNAs. These two wild barley genome assemblies will provide a rich gene pool for domesticated barley.
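The contig N50 and data volumes quoted in the abstract can be illustrated with a small calculation. The following is a minimal sketch, not the authors' pipeline: the function names `n50` and `fold_coverage` and the toy contig lengths are hypothetical, and only the Nanopore data volumes and assembly sizes are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_coverage(data_gb, genome_gb):
    """Approximate sequencing depth as data volume divided by genome size."""
    return data_gb / genome_gb


# Hypothetical toy contig lengths (not from the paper): total is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_contigs))

# Nanopore data volumes and assembly sizes as stated in the abstract.
print("EC_S1 Nanopore depth: %.1fx" % fold_coverage(295.6, 5.03))
print("EC_N1 Nanopore depth: %.1fx" % fold_coverage(285.35, 5.05))
```

Under these assumptions, the Nanopore data correspond to roughly 59-fold and 57-fold depth relative to the assembled genome sizes; the depth relative to the true genome size may differ slightly.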
Identifiers
pubmed: 37563167
doi: 10.1038/s41597-023-02434-2
pii: 10.1038/s41597-023-02434-2
pmc: PMC10415357
Publication types
Dataset
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
535
Grants
Agency: National Natural Science Foundation of China
Grant ID: 31471496
Agency: Grains Research and Development Corporation
Grant ID: 9176507
Copyright information
© 2023. Springer Nature Limited.
References
Liu, M. et al. The draft genome of a wild barley genotype reveals its enrichment in genes related to biotic and abiotic stresses compared to cultivated barley. Plant Biotechnol. J. 18, 443–456 (2020).
pubmed: 31314154
doi: 10.1111/pbi.13210
Brassac, J. & Blattner, F. R. Species-level phylogeny and polyploid relationships in Hordeum (Poaceae) inferred by next-generation sequencing and in silico cloning of multiple nuclear loci. Syst. Biol. 64, 792–808 (2015).
Mayer, K. F. X. et al. Unlocking the barley genome by chromosomal and comparative genomics. Plant Cell. 23, 1249–1263 (2011).
pubmed: 21467582
pmcid: 3101540
doi: 10.1105/tpc.110.082537
Luo, M. C. et al. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature. 551, 498–502 (2017).
doi: 10.1038/nature24486
Palmgren, M. G. et al. Are we ready for back-to-nature crop breeding? Trends Plant Sci. 20, 155–164 (2015).
pubmed: 25529373
doi: 10.1016/j.tplants.2014.11.003
Fairbairn, A. The origins and spread of domesticated plants in Southwest Asia and Europe. Environ. Archaeol. 15, 99–100 (2010).
Mascher, M. et al. A chromosome conformation capture ordered sequence of the barley genome. Nature. 544, 426–433 (2017).
doi: 10.1038/nature22043
Zeng, X. Q. et al. The draft genome of Tibetan hulless barley reveals adaptive patterns to the high stressful Tibetan Plateau. Proc. Natl. Acad. Sci. USA 112, 1095–1100 (2015).
doi: 10.1073/pnas.1423628112
Mayer, K. F. X. et al. A physical, genetic and functional sequence assembly of the barley genome. Nature. 491, 711–716 (2012).
pubmed: 23075845
doi: 10.1038/nature11543
Mascher, M. et al. Long-read sequence assembly: a technical evaluation in barley. Plant Cell. 33, 1888–1906 (2021).
pubmed: 33710295
pmcid: 8290290
doi: 10.1093/plcell/koab077
Dai, F. et al. Assembly and analysis of a qingke reference genome demonstrate its close genetic relation to modern cultivated barley. Plant Biotechnol. J. 16, 760–770 (2018).
pubmed: 28871634
doi: 10.1111/pbi.12826
Jayakodi, M. et al. The barley pan-genome reveals the hidden legacy of mutation breeding. Nature. 588, 284–289 (2020).
pubmed: 33239781
pmcid: 7759462
doi: 10.1038/s41586-020-2947-8
Zhang, W. et al. Genome architecture and diverged selection shaping pattern of genomic differentiation in wild barley. Plant Biotechnol. J. (2022).
Belton, J. M. et al. Hi-C: A comprehensive technique to capture the conformation of genomes. Methods. 58, 268–276 (2012).
pubmed: 22652625
doi: 10.1016/j.ymeth.2012.05.001
Chen, S. et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34, i884–i890 (2018).
doi: 10.1093/bioinformatics/bty560
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764–770 (2011).
pubmed: 21217122
pmcid: 3051319
doi: 10.1093/bioinformatics/btr011
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
pubmed: 32188846
pmcid: 7080791
doi: 10.1038/s41467-020-14998-3
Li, Z. Y. et al. Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph. Brief Funct. Genomics. 11, 25–37 (2012).
pubmed: 22184334
doi: 10.1093/bfgp/elr035
Myers, G. Building fragment assembly string graphs. Bioinformatics. 21, ii79–ii85 (2005).
doi: 10.1093/bioinformatics/bti1114
Vaser, R. et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
pubmed: 28100585
pmcid: 5411768
doi: 10.1101/gr.214270.116
Simao, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31, 3210–3212 (2015).
pubmed: 26059717
doi: 10.1093/bioinformatics/btv351
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 23, 1061–1067 (2007).
pubmed: 17332020
doi: 10.1093/bioinformatics/btm071
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 1303, 1–3 (2013).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
pubmed: 29750242
pmcid: 6137996
doi: 10.1093/bioinformatics/bty191
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259–270 (2015).
pubmed: 26619908
pmcid: 4665391
doi: 10.1186/s13059-015-0831-x
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9, 357–359 (2012).
pubmed: 22388286
pmcid: 3322381
doi: 10.1038/nmeth.1923
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
pubmed: 24185095
pmcid: 4117202
doi: 10.1038/nbt.2727
Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
pubmed: 29373581
pmcid: 5802927
doi: 10.1371/journal.pcbi.1005944
He, W. et al. NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes. Bioinformatics. 39, btad121 (2023).
doi: 10.1093/bioinformatics/btad121
Wang, X. W. & Wang, L. GMATA: An integrated software package for genome-scale SSR mining, marker development and viewing. Front. Plant Sci. 7, 1350 (2016).
pubmed: 27679641
pmcid: 5020087
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
doi: 10.1093/nar/27.2.573
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics. 25, 1–14 (2009).
doi: 10.1002/0471250953.bi0410s25
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Zhou, Y. et al. Chromosome genome assembly and annotation of the yellowbelly pufferfish with PacBio and Hi-C sequencing data. Sci. Data 6, 267–275 (2019).
pubmed: 31704938
pmcid: 6841922
doi: 10.1038/s41597-019-0279-z
Keilwagen, J. et al. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods Mol. Biol. 1962, 161–177 (2019).
pubmed: 31020559
doi: 10.1007/978-1-4939-9173-0_9
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15–21 (2012).
pubmed: 23104886
pmcid: 3530905
doi: 10.1093/bioinformatics/bts635
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
pubmed: 25690850
pmcid: 4643835
doi: 10.1038/nbt.3122
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
pubmed: 14500829
pmcid: 206470
doi: 10.1093/nar/gkg770
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
doi: 10.1093/nar/gkl200
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
pubmed: 18190707
pmcid: 2395244
doi: 10.1186/gb-2008-9-1-r7
TransposonPSI. http://transposonpsi.sourceforge.net/
Bairoch, A. The swiss-prot protein sequence database user manual. Nucleic Acids Res. 28, 45–48 (2000).
pubmed: 10592178
pmcid: 102476
doi: 10.1093/nar/28.1.45
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics. 10, 421–430 (2009).
pubmed: 20003500
pmcid: 2803857
doi: 10.1186/1471-2105-10-421
Kanehisa, M. et al. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
pubmed: 26585406
doi: 10.1016/j.jmb.2015.11.006
Tatusov, R. L. et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28 (2001).
pubmed: 11125040
pmcid: 29819
doi: 10.1093/nar/29.1.22
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
pubmed: 10802651
pmcid: 3037419
doi: 10.1038/75556
Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001).
pubmed: 11590104
doi: 10.1093/bioinformatics/17.9.847
Altschul, S. F. et al. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
pubmed: 2231712
doi: 10.1016/S0022-2836(05)80360-2
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
pubmed: 9023104
pmcid: 146525
doi: 10.1093/nar/25.5.955
Nawrocki, E. P. et al. Infernal 1.0: inference of RNA alignments. Bioinformatics. 25, 1335–1337 (2009).
pubmed: 19307242
pmcid: 2732312
doi: 10.1093/bioinformatics/btp157
Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
doi: 10.1093/nar/gki081
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
doi: 10.1093/nar/gkm160
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_029782615.1 (2023).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_029783385.1 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP329205 (2023).
Pan, R. Wild barley genome annotation. Figshare https://doi.org/10.6084/m9.figshare.23501529.v1 (2023).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
pubmed: 19505943
pmcid: 2723002
doi: 10.1093/bioinformatics/btp352
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27, 2987–2993 (2011).
pubmed: 21903627
pmcid: 3198575
doi: 10.1093/bioinformatics/btr509