Correcting PCR amplification errors in unique molecular identifiers to generate accurate numbers of sequencing molecules.


Journal

Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604

Publication information

Publication date:
Mar 2024
History:
received: 08 04 2023
accepted: 04 01 2024
medline: 13 3 2024
pubmed: 6 2 2024
entrez: 5 2 2024
Status: ppublish

Abstract

Unique molecular identifiers are random oligonucleotide sequences that remove PCR amplification biases. However, the impact that PCR-associated sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We show that PCR errors are a source of inaccuracy in both bulk and single-cell sequencing data, and that synthesizing unique molecular identifiers using homotrimeric nucleotide blocks provides an error-correcting solution that allows absolute counting of sequenced molecules.
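
To illustrate how homotrimeric blocks can carry error-correcting information, here is a minimal sketch in Python. It assumes that each UMI base is synthesized three times in a row and that a read is decoded by a majority vote within each block of three. The function names, the "N" placeholder for unresolved blocks, and the restriction to substitution errors (no insertions or deletions) are assumptions made for illustration. This is not the authors' published implementation.

```python
from collections import Counter


def encode_homotrimer(umi: str) -> str:
    """Repeat every base three times, e.g. 'AC' -> 'AAACCC'."""
    return "".join(base * 3 for base in umi)


def decode_homotrimer(read: str, ambiguous: str = "N") -> str:
    """Collapse each block of three bases to its majority base.

    A block with at least two identical bases is resolved to that base.
    A block with three different bases cannot be resolved and is
    reported as `ambiguous` (a hypothetical placeholder).
    """
    if len(read) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")
    decoded = []
    for i in range(0, len(read), 3):
        base, count = Counter(read[i:i + 3]).most_common(1)[0]
        decoded.append(base if count >= 2 else ambiguous)
    return "".join(decoded)
```

Under these assumptions, a single substitution inside a block is outvoted by the two unchanged copies, so reads that would otherwise look like distinct UMIs collapse back to the same identifier before molecules are counted.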

Identifiers

pubmed: 38317008
doi: 10.1038/s41592-024-02168-y
pii: 10.1038/s41592-024-02168-y
pmc: PMC10927542

Chemical substances

Nucleotides 0
Oligonucleotides 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

401-405

Grants

Agency: RCUK | Medical Research Council (MRC)
ID: MR/V010182/1

Copyright information

© 2024. The Author(s).

Authors

Jianfeng Sun (J)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK.

Martin Philpott (M)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK.

Danson Loi (D)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK.

Shuang Li (S)

Gene Center, Ludwig-Maximilians University of Munich, Munich, Germany.

Pablo Monteagudo-Mesas (P)

Gene Center, Ludwig-Maximilians University of Munich, Munich, Germany.

Gabriela Hoffman (G)

ATDBio Ltd (now part of Biotage), Magdalen Centre, Oxford Science Park, Oxford, UK.

Jonathan Robson (J)

ATDBio Ltd (now part of Biotage), Magdalen Centre, Oxford Science Park, Oxford, UK.

Neelam Mehta (N)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK.

Vicki Gamble (V)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK.

Tom Brown (T)

ATDBio Ltd (now part of Biotage), Magdalen Centre, Oxford Science Park, Oxford, UK.

Tom Brown (T)

Chemistry Research Laboratory, Department of Chemistry, University of Oxford, Oxford, UK.

Stefan Canzar (S)

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.
Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA.

Udo Oppermann (U)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK.
Oxford Centre for Translational Myeloma Research, University of Oxford, Oxford, UK.

Adam P Cribbs (AP)

Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK. adam.cribbs@ndorms.ox.ac.uk.
Oxford Centre for Translational Myeloma Research, University of Oxford, Oxford, UK. adam.cribbs@ndorms.ox.ac.uk.
