Evaluating stably expressed genes in single cells.

gene expression variability; housekeeping genes; scRNA-seq; single cells; stably expressed genes

Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
01 09 2019
History:
received: 22 11 2018
revised: 22 05 2019
accepted: 09 08 2019
entrez: 19 9 2019
pubmed: 19 9 2019
medline: 18 3 2020
Status: ppublish

Abstract

BACKGROUND
Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, on the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed in different cell and tissue types. It is therefore critical to ask whether stably expressed genes (SEGs) can be identified on the single-cell level and, if so, how their expression stability can be assessed. We have previously proposed a computational framework for ranking expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform detailed evaluation and characterization of SEGs derived from this framework.
RESULTS
Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting their potential role in sustaining essential functions in individual cells.
CONCLUSIONS
SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.
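
As a rough illustration of what ranking genes by expression stability can look like, the sketch below orders genes by two simple statistics: the fraction of cells with zero expression and the coefficient of variation. This is not the scSEGIndex computation, whose details are not given in this record; the function name `rank_stable_genes`, the choice of statistics, and the rank averaging are all assumptions made for illustration only.

```python
import numpy as np


def rank_stable_genes(log_expr, gene_names):
    """Rank genes from most to least stable (hypothetical helper).

    log_expr: array-like of shape (n_genes, n_cells) holding
    log-normalized expression values.
    """
    log_expr = np.asarray(log_expr, dtype=float)

    # Fraction of cells in which each gene is not detected.
    zero_frac = (log_expr == 0).mean(axis=1)

    # Coefficient of variation; genes with zero mean get an infinite CV.
    mean = log_expr.mean(axis=1)
    sd = log_expr.std(axis=1)
    cv = np.divide(sd, mean, out=np.full_like(mean, np.inf), where=mean > 0)

    # Turn each statistic into a rank (0 = most stable), then average.
    zero_rank = zero_frac.argsort().argsort()
    cv_rank = cv.argsort().argsort()
    score = (zero_rank + cv_rank) / 2.0

    order = np.argsort(score, kind="stable")
    return [gene_names[i] for i in order]


# Toy data: three genes measured in four cells (made-up values).
toy_expr = [
    [5.0, 5.1, 4.9, 5.0],  # consistently expressed
    [0.0, 8.0, 0.0, 1.0],  # bursty, frequently undetected
    [0.0, 0.0, 0.0, 0.0],  # never detected
]
toy_genes = ["stable", "noisy", "silent"]
ranking = rank_stable_genes(toy_expr, toy_genes)
```

For real analyses, the scSEGIndex function in the scMerge package referenced above is the appropriate tool; this sketch only conveys the idea of scoring each gene on several stability statistics and combining the results.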

Identifiers

pubmed: 31531674
pii: 5570567
doi: 10.1093/gigascience/giz106
pmc: PMC6748759

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Grants

Agency: NIDCD NIH HHS
ID: R21 DC015107
Country: United States

Copyright information

© The Author(s) 2019. Published by Oxford University Press.

Authors

Yingxin Lin (Y)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.

Shila Ghazanfar (S)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

Dario Strbenac (D)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.

Andy Wang (A)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
Sydney Medical School, University of Sydney, Sydney, NSW 2006, Australia.

Ellis Patrick (E)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia.

David M Lin (DM)

Department of Biomedical Sciences, Cornell University, Ithaca, NY 14853, USA.

Terence Speed (T)

Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia.
Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia.

Jean Y H Yang (JYH)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.

Pengyi Yang (P)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia.
