High-quality wild barley genome assemblies and annotation with Nanopore long reads and Hi-C sequencing data.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
10 08 2023
History:
received: 21 04 2023
accepted: 31 07 2023
medline: 14 8 2023
pubmed: 11 8 2023
entrez: 10 8 2023
Status: epublish

Abstract

Wild barley populations from "Evolution Canyon" (EC) in Mount Carmel, Israel, are ideal models for studying cereal chromosome evolution. The accession EC_S1 comes from the south slope, which has higher daily temperatures and drought, whereas EC_N1 comes from the north slope, which has a cooler climate and higher relative humidity; these contrasting environments impose differentiated selection. We assembled a 5.03 Gb genome with a contig N50 of 3.53 Mb for EC_S1 and a 5.05 Gb genome with a contig N50 of 3.45 Mb for EC_N1, using 145 Gb and 160.0 Gb of Illumina sequencing data, 295.6 Gb and 285.35 Gb of Nanopore sequencing data, and 555.1 Gb and 514.5 Gb of Hi-C sequencing data, respectively. BUSCO and CEGMA evaluations suggested that both assemblies are highly complete. Using full-length transcriptome data, we predicted 39,179 and 38,373 high-confidence genes in EC_S1 and EC_N1, of which 93.6% and 95.2% were functionally annotated, respectively. We also annotated repetitive elements and non-coding RNAs. These two wild barley genome assemblies will provide a rich gene pool for domesticated barley.
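
For readers unfamiliar with the contig N50 statistic quoted in the abstract, the following is a minimal Python sketch of how N50 is conventionally computed from a list of contig lengths. The function names `n50` and `mean_depth` are hypothetical, and the toy contig lengths are illustrative, not taken from the paper. The depth figures are simple ratios of the data volumes and assembly sizes stated in the abstract; they are not values reported by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_depth(sequenced_gb, genome_size_gb):
    """Approximate mean sequencing depth as data volume / genome size."""
    return sequenced_gb / genome_size_gb


# Toy contig lengths (illustrative only).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))

# Approximate Nanopore depth from the figures in the abstract.
print(round(mean_depth(295.6, 5.03), 1))   # EC_S1
print(round(mean_depth(285.35, 5.05), 1))  # EC_N1
```

By this rough estimate, the Nanopore data correspond to a little under 60-fold coverage of each assembly.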

Identifiers

pubmed: 37563167
doi: 10.1038/s41597-023-02434-2
pii: 10.1038/s41597-023-02434-2
pmc: PMC10415357

Publication types

Dataset; Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

535

Grants

Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: 31471496
Agency: Grains Research and Development Corporation (Grains Research & Development Corporation)
ID: 9176507

Copyright information

© 2023. Springer Nature Limited.

Authors

Rui Pan (R)

Research Center of Crop Stresses Resistance Technologies, Yangtze University, Jingzhou, 434025, China.

Haifei Hu (H)

Western Crop Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, 6155, Australia.
Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High-Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China.

Yuhui Xiao (Y)

Grandomics Biotechnology Co., Ltd, Wuhan, 430076, China.

Le Xu (L)

Research Center of Crop Stresses Resistance Technologies, Yangtze University, Jingzhou, 434025, China.
Hubei Collaborative Innovation Centre for Grain Industry, Yangtze University, Jingzhou, 434025, China.

Yanhao Xu (Y)

Research Center of Crop Stresses Resistance Technologies, Yangtze University, Jingzhou, 434025, China.
Hubei Collaborative Innovation Centre for Grain Industry, Yangtze University, Jingzhou, 434025, China.

Kai Ouyang (K)

Grandomics Biotechnology Co., Ltd, Wuhan, 430076, China.

Chengdao Li (C)

Western Crop Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, 6155, Australia.
Department of Primary Industries and Regional Development, South Perth, WA, 6155, Australia.

Tianhua He (T)

Western Crop Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, 6155, Australia. tianhua.he@murdoch.edu.au.

Wenying Zhang (W)

Research Center of Crop Stresses Resistance Technologies, Yangtze University, Jingzhou, 434025, China. wyzhang@yangtzeu.edu.cn.
MARA Key Laboratory of Sustainable Crop Production in the Middle Reaches of the Yangtze River (Co-construction by Ministry and Province), Yangtze University, Jingzhou, 434025, China. wyzhang@yangtzeu.edu.cn.
