Extending rnaSPAdes functionality for hybrid transcriptome assembly.

Keywords: De novo assembly; Hybrid assembly; Iso-seq; Oxford nanopores; RNA-Seq; Transcriptome assembly; Transcriptomics

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
24 Jul 2020
History:
received: 14 Jun 2020
accepted: 18 Jun 2020
entrez: 25 Jul 2020
pubmed: 25 Jul 2020
medline: 28 Aug 2020
Status: epublish

Abstract

De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or is poorly annotated. However, because Illumina reads are short, it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged ability to generate long RNA reads, such as PacBio and Oxford Nanopore reads, may dramatically improve assembly quality and thus the subsequent analysis. While reference-based tools for analysing long RNA reads have recently been developed, there is no established pipeline for de novo assembly of such data. In this work we present a novel method that enables high-quality de novo transcriptome assembly by combining the accuracy and reliability of short reads with exon structure information extracted from long error-prone reads. The algorithm incorporates the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapts it to transcriptomic data. To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms than assemblies built from short-read data alone.
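The abstract describes the hybrid approach only at a high level. As a rough illustration of the general idea, the toy Python sketch below treats long-read alignments to a short-read assembly graph as ordered edge hits, fills small gaps between consecutive hits with graph paths, and keeps paths supported by several reads as candidate isoforms. This is not the rnaSPAdes or hybridSPAdes implementation: the graph, the hit format, the helpers `connect`, `read_to_path` and `supported_paths`, and the `min_support` threshold are all hypothetical, and a real tool would also have to handle sequencing errors and choose between competing gap-filling paths.

```python
from collections import Counter, deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy assembly graph: edge id -> (start vertex, end vertex).
Graph = Dict[str, Tuple[str, str]]
# Hypothetical long-read hit: (edge id, position of the hit within the read).
Hit = Tuple[str, int]


def connect(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Shortest edge path from vertex src to vertex dst (breadth-first search)."""
    if src == dst:
        return []  # the two edges are already adjacent
    out_edges: Dict[str, List[str]] = {}
    for edge, (start, _end) in graph.items():
        out_edges.setdefault(start, []).append(edge)
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        vertex, path = queue.popleft()
        for edge in sorted(out_edges.get(vertex, [])):
            nxt = graph[edge][1]
            if nxt == dst:
                return path + [edge]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [edge]))
    return None  # dst is unreachable from src


def read_to_path(graph: Graph, hits: List[Hit]) -> Optional[Tuple[str, ...]]:
    """Order a read's hits by read position and join them into one graph path."""
    ordered = [edge for edge, _pos in sorted(hits, key=lambda hit: hit[1])]
    if not ordered:
        return None
    path = [ordered[0]]
    for edge in ordered[1:]:
        if edge == path[-1]:
            continue  # repeated hit on the same edge
        gap = connect(graph, graph[path[-1]][1], graph[edge][0])
        if gap is None:
            return None  # hits are inconsistent with the graph: discard the read
        path.extend(gap + [edge])
    return tuple(path)


def supported_paths(graph: Graph, reads: List[List[Hit]],
                    min_support: int = 2) -> Dict[Tuple[str, ...], int]:
    """Count reads per path and keep paths backed by at least min_support reads."""
    paths = (read_to_path(graph, hits) for hits in reads)
    counts = Counter(path for path in paths if path is not None)
    return {path: n for path, n in counts.items() if n >= min_support}


# Exon 2 (edge e2) can be skipped via the junction edge j13.
GRAPH: Graph = {"e1": ("a", "b"), "e2": ("b", "c"), "j13": ("b", "c"),
                "e3": ("c", "d"), "e4": ("d", "f")}
READS: List[List[Hit]] = [
    [("e1", 0), ("e2", 150), ("e3", 300), ("e4", 450)],
    [("e4", 460), ("e1", 3), ("e2", 148)],  # no hit on e3: gap filled from graph
    [("e1", 0), ("j13", 120), ("e3", 160), ("e4", 310)],
    [("e1", 2), ("j13", 118), ("e3", 158), ("e4", 305)],
    [("e2", 0), ("e1", 50)],  # order contradicts the graph: discarded
]

if __name__ == "__main__":
    print(supported_paths(GRAPH, READS))
    # {('e1', 'e2', 'e3', 'e4'): 2, ('e1', 'j13', 'e3', 'e4'): 2}
```

Both the exon-inclusion and the exon-skipping paths survive, each backed by two reads. Breadth-first search keeps the gap filling simple and deterministic, but it returns the path with the fewest edges, which is only a reasonable choice when the gap is unambiguous, as for edge e3 here.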


Identifiers

pubmed: 32703149
doi: 10.1186/s12859-020-03614-2
pii: 10.1186/s12859-020-03614-2
pmc: PMC7379828

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

302


Authors

Andrey D Prjibelski (AD)

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia. a.przhibelsky@spbu.ru.

Giuseppe D Puglia (GD)

Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy.

Dmitry Antipov (D)

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.

Elena Bushmanova (E)

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.

Daniela Giordano (D)

Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy.

Alla Mikheenko (A)

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.

Domenico Vitale (D)

Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy.

Alla Lapidus (A)

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.
