Extending rnaSPAdes functionality for hybrid transcriptome assembly.
De novo assembly
Hybrid assembly
Iso-seq
Oxford nanopores
RNA-Seq
Transcriptome assembly
Transcriptomics
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
24 Jul 2020
History:
received: 14 Jun 2020
accepted: 18 Jun 2020
entrez: 25 Jul 2020
pubmed: 25 Jul 2020
medline: 28 Aug 2020
Status:
epublish
Abstract
De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is unavailable or poorly annotated. However, because Illumina reads are short, it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged ability to generate long RNA reads with technologies such as PacBio and Oxford Nanopore may dramatically improve assembly quality, and thus the downstream analysis. While reference-based tools for analysing long RNA reads have recently been developed, there is no established pipeline for de novo assembly of such data. In this work we present a novel method that enables high-quality de novo transcriptome assembly by combining the accuracy and reliability of short reads with the exon structure information carried by long error-prone reads. The algorithm incorporates the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapts it for transcriptomic data. To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms than assemblies built from short-read data alone.
Identifiers
pubmed: 32703149
doi: 10.1186/s12859-020-03614-2
pii: 10.1186/s12859-020-03614-2
pmc: PMC7379828
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
302