Human housekeeping genes are compact

Eli Eisenberg and Erez Y. Levanon

Compugen Ltd., 72 Pinchas Rosen Street, Tel Aviv 69512, Israel

arXiv:q-bio/0309020v1 [q-bio.GN] 30 Sep 2003

Abstract

We identify a set of 575 human genes that are expressed in all conditions tested in a publicly
available database of microarray results. Based on this common occurrence, the set is expected
to be rich in “housekeeping” genes, showing constitutive expression in all tissues. We compare
selected aspects of their genomic structure with a set of background genes. We find that the
introns, untranslated regions and coding sequences of the housekeeping genes are shorter, indicating
a selection for compactness in these genes.

The amazing diversity of the human body stems from the different expression patterns of
genes in different tissues. Although most genes show constitutive expression in only a subset
of tissues, some gene products are required for the maintenance of the basal cellular function
and are constitutively found in all human cells. These genes are called housekeeping genes
(HK genes) [1]. HK genes can be used to calibrate measurements of gene expression [2].
They might also help to define the minimal gene complement needed for a human cell [1].
Several attempts have been made recently to define the complete set of HK genes [3, 4].
Microarrays are often used to identify sets of genes that are expressed either ubiquitously
or in specific tissues or conditions. However, the technique is technically demanding and
prone to artifacts, so independent evidence is often required to confirm the results. In
principle, identifying the set of HK genes using microarray data is straightforward; one
need only look for genes that are expressed in all tissues and all experimental conditions.
Employing such an approach has so far resulted in two lists of HK genes [3, 4]. However,
problems in probe design, measurement noise and other artifacts introduce inevitable errors
in such lists. Because a northern blot experiment for each gene in each tissue is impractical,
an independent test is needed to validate any list of HK genes. Here, we report a validation
test that uses a recently discovered property of highly expressed genes.
The transcription process is both slow and costly: transcribing a single nucleotide takes
approximately 50 milliseconds [5, 6] and two ATP molecules [7]. This might be expected
to provide selective pressure to make genes as short as functionally possible. The more
copies of a gene required for the organism, the stronger this pressure should be. The first
demonstration of this principle [8] showed that genes with a large number of expressed
sequence tags (ESTs) in public libraries (and hence, presumably, the most abundant mRNAs)
have a significantly shorter average intron length than those with fewer ESTs.
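
As a rough illustration of the cost argument above, the per-nucleotide figures quoted in the text (about 50 ms and two ATP molecules) can be scaled to a whole primary transcript. The sketch below is only back-of-the-envelope arithmetic; the function name and the example transcript lengths are hypothetical and are not taken from the paper.

```python
# Per-nucleotide costs quoted in the text (approximate values).
MS_PER_NUCLEOTIDE = 50   # milliseconds to transcribe one nucleotide
ATP_PER_NUCLEOTIDE = 2   # ATP molecules spent per nucleotide


def transcription_cost(transcript_length_nt):
    """Return (seconds, ATP molecules) needed to transcribe one copy."""
    seconds = transcript_length_nt * MS_PER_NUCLEOTIDE / 1000
    atp = transcript_length_nt * ATP_PER_NUCLEOTIDE
    return seconds, atp


# Hypothetical primary-transcript lengths, chosen only for illustration.
for length in (10_000, 50_000):
    seconds, atp = transcription_cost(length)
    print(f"{length} nt: {seconds:.0f} s, {atp} ATP per transcript")
```

Under these assumptions the cost per copy grows linearly with transcript length, and the total cost grows again with the number of copies made, which is why the pressure should be strongest on highly expressed genes.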
Here, an implication of this principle is used to validate a set of HK genes. The HK
genes, which are transcribed in all somatic cells and under all circumstances, are by nature
highly expressed, and therefore should be selected to have shorter introns. We used a
recently published database of microarray experiments [9] to identify a set of HK genes. As
a further validation step, we checked the Gene Ontology (GO) annotation of these genes.
We compared the structure of the HK genes with that of all other genes and found that not
only the introns but all parts of the HK genes were, on average, shorter than those of other
genes. In particular, the untranslated regions and the translated proteins are shorter in the
HK genes.

FIG. 1: The distribution of 7500 RefSeq genes represented on the microarray as a function of the
number of tissues in which they are expressed. Each bin gives the number of genes expressed in M
out of 47 different tissues. The M=47 bin corresponds to the (assumed) housekeeping genes,
expressed in all tissues; the remaining bins are the background genes. (x-axis: number of tissues
expressed in; y-axis: number of genes.)

I. ASSIGNMENT OF HOUSEKEEPING GENES

A recently published database provides microarray expression data for the Affymetrix U95A
chip, which contains 12,600 probes, hybridized to 101 different samples from 47 different
human tissues and cell lines [9]. These samples are mainly from the normal human physiological
state, and therefore this dataset provides a description of the normal mammalian
transcriptome.
We calculated the distribution of the number of different tissues in which a gene is expressed.
Discarding probes for which the associated gene was not represented in the RefSeq
database [10], and unifying all probes measuring the same gene (ignoring the potential
differences among splice variants), yielded probes representing 7500 human genes. The
experiments measuring replicates of the same biological condition were averaged to reduce the
measurement noise, resulting in 47 data points per probe. We considered a probe to be
expressed in a given condition if its average reading was above a cutoff value. The
results were not sensitive to the exact cutoff value, and we chose 200 standard Affymetrix
average difference units, considered to be a conservative cutoff value for determining gene
presence [9]. This is also the trimmed average expression level in each tissue in accordance
with the standard Affymetrix normalization procedure [11, 12]. Thus, our HK genes are
expressed in all tissues at an above-average level.
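
The assignment procedure described above (average the replicates, apply the cutoff, count tissues) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the input layout (gene to tissue to list of replicate readings) and the toy readings are all hypothetical; only the cutoff of 200 units comes from the text.

```python
from collections import Counter

CUTOFF = 200  # Affymetrix average difference units, as chosen in the text


def tissue_averages(replicates_by_tissue):
    """Average replicate readings so that each tissue gives one data point."""
    return {
        tissue: sum(values) / len(values)
        for tissue, values in replicates_by_tissue.items()
    }


def tissues_expressed(replicates_by_tissue, cutoff=CUTOFF):
    """Count the tissues whose averaged reading is above the cutoff."""
    averages = tissue_averages(replicates_by_tissue)
    return sum(1 for value in averages.values() if value > cutoff)


def assign_housekeeping(data, n_tissues, cutoff=CUTOFF):
    """Return (genes expressed in all tissues, histogram over M)."""
    counts = {gene: tissues_expressed(reps, cutoff) for gene, reps in data.items()}
    histogram = Counter(counts.values())  # M -> number of genes expressed in M tissues
    housekeeping = {gene for gene, m in counts.items() if m == n_tissues}
    return housekeeping, histogram


# Toy input: hypothetical genes, tissues and readings (not real data).
toy = {
    "geneA": {"liver": [900, 1100], "brain": [400], "heart": [250, 350]},
    "geneB": {"liver": [50, 70], "brain": [800], "heart": [120]},
    "geneC": {"liver": [10], "brain": [20, 40], "heart": [30]},
}
hk, hist = assign_housekeeping(toy, n_tissues=3)
print(hk)    # genes above the cutoff in every tissue
print(hist)  # number of genes expressed in exactly M tissues
```

With the real dataset, `n_tissues` would be 47 and the histogram would correspond to the distribution plotted in Fig. 1.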
A histogram (Fig. 1) of the number of genes expressed in exactly M of the 47 tissues
shows a clear tendency for frequency to decrease as M increases. However, a substantial
number of genes (575) belong to the class of genes that are expressed in all tissues. Because
their number is far greater than expected based on the general trend described above, we
assumed this class to be rich in HK genes, and considered it to be the set of HK genes.

FIG. 2: Histogram of the total length of introns (x-axis: total intron length). Green bars, HK
genes; blue bars, non-HK genes.

It is noteworthy that the genes in our HK list tend to have an average expression significantly
higher than that of other genes; the geometric mean expression of our HK genes is 1200
in Affymetrix average difference units, whereas that of other genes is 150. The difference
cannot be accounted for by the cutoff used to define the HK genes, and is not a result of a
bias due to inclusion of genes expressed in a few tissues only (data not shown).
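
For readers unfamiliar with the statistic quoted above, the geometric mean is the exponential of the mean of the logarithms, which makes it less sensitive than the arithmetic mean to a few very large readings. The sketch below uses hypothetical readings, and the function name is an assumption for illustration only.

```python
import math


def geometric_mean(values):
    """Geometric mean: exponential of the mean of the logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


readings = [100, 10_000]  # hypothetical expression readings
print(geometric_mean(readings))       # close to 1000
print(sum(readings) / len(readings))  # arithmetic mean, 5050.0
```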
Two additional tests were conducted to validate this set. First, a study of the GO
annotation [13] of these genes revealed the set is rich in metabolic proteins (24%) and
RNA-interacting proteins (19%, mostly ribosomal proteins). Second, we compiled a list
of 18 well-established HK genes commonly used for quantitative PCR calibration [14, 15],
and checked our list against it. We found 13 of the 18 genes in our list, and the other
five were not represented on the microarray (see Table in Supplementary Information at
http://www.compugen.co.il/supp_info/Housekeeping_genes.html).

II. LENGTH ANALYSIS OF HK GENES

Table 1 compares the lengths of various parts of the HK genes and the background genes.
The alignment data were taken from the UCSC genome browser (http://genome.ucsc.edu)
[16]. We excluded 322 genes that do not have a unique alignment, as well as 1242 genes that
were not expressed in any tissue (to avoid potential problems because of defective probes).

TABLE I: Human housekeeping genes are compact. Comparison of structure of housekeeping
(HK) genes versus non-HK genes. For each case the first line gives the average value ± s.e.m., and
the second line gives the median. For the average intron and exon lengths, all introns and exons
belonging to the relevant set were included; the number appears in parentheses. The P-value was
calculated using the Mann-Whitney test. UTR, untranslated region.

                            HK genes (n=532)       non-HK (n=5404)        P-value

Average intron length       2573 ± 145 (n=4353)    5025 ± 71 (n=57447)    4 × 10^-130
                            672                    1365
Total intron length         21050 ± 1781           53418 ± 1425           7 × 10^-28
                            9293                   20804
Average exon length         212 ± 5 (n=4885)       240 ± 2 (n=62851)      9 × 10^-5
                            672                    1365
5’ UTR length               135 ± 8                173 ± 3                4 × 10^-7
                            79                     106
3’ UTR length               599 ± 30               846 ± 13               3 × 10^-13
                            333                    552
Coding sequence length      1211 ± 44              1770 ± 26              3 × 10^-26
                            928                    1322
Number of introns           8.2 ± 0.3              10.6 ± 0.2             6 × 10^-7
                            6                      8
Intron bps per coding bp    20 ± 2                 31.8 ± 0.8             2 × 10^-11
                            9.9                    15.6

FIG. 3: Histogram of the length of the 5’ untranslated regions (UTR) (x-axis: 5’ UTR length).
Green bars, HK genes; blue bars, non-HK genes.

This left 532 HK genes and 5404 non-HK genes. The histograms in Figs. 2-4 compare HK
genes with the other genes by total intron length, 5’ UTR length and coding sequence length.
Remarkably, there was a statistically significant difference between HK and non-HK genes
in all aspects of gene structure. Average intron length is shorter for the HK genes than for
the background genes (2573 bp versus 5025 bp, respectively); total intron length is shorter
(21,050 bp versus 53,418 bp); average exon length is shorter (212 bp versus 240 bp); average
lengths of both 5’ and 3’ untranslated regions (UTRs) are shorter (5’: 135 bp versus 173
bp; 3’: 599 bp versus 846 bp); and, most notably, the translated proteins are shorter as well
(403 amino acids versus 590 amino acids). Accordingly, the number of intron bp per unit of
coding sequence length is lower for the HK genes (20 versus 32). We studied the structure
of each gene as a function of the number of tissues it is expressed in and verified that the
results are not due to bias of the non-HK genes by tissue-specific genes (data not shown).
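
The P-values in Table I were calculated with the Mann-Whitney test, which compares the ranks of two samples rather than their raw values and is therefore suited to skewed length distributions. The sketch below is a simplified, hypothetical implementation using the normal approximation (ties receive average ranks, with no tie or continuity correction); it is not the authors' code, the sample values are made up, and for so few values the approximation is only rough.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test via the normal approximation.

    Returns (U statistic of sample x, approximate two-sided P-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += average_rank * in_x
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value


# Hypothetical intron lengths in bp (illustrative values only).
hk_lengths = [300, 450, 700, 900, 1200]
other_lengths = [800, 1500, 2300, 4000, 9000, 12000]
u, p = mann_whitney_u(hk_lengths, other_lengths)
print(u, p)
```

A small U for the first sample means its values tend to rank below those of the second sample; a production analysis would normally rely on an established statistics library instead.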
The pronounced statistical characteristics of the HK gene set further support its assignment
as a unique set. Our findings confirm and extend previous research showing that
the introns of highly expressed genes are shorter [8]. As mentioned above, the expression
levels of the HK genes are high, and the fact that they have to be expressed in all cells at all
times makes them even more costly to transcribe. Previously [8], the high abundance of a
given gene in EST libraries was taken as an indication that the gene was highly expressed in
the human body. It was pointed out [8], however, that this method is prone to bias due to the
inclusion of normalized and tumor libraries and overrepresentation of certain tissues. Our
approach overcomes this difficulty and confirms the previous result. Moreover, we find here
that UTRs and even the encoded proteins are shorter for the HK genes. The magnitude
of the difference is greater for the introns than for the exons and proteins (Table 1), which
makes sense because the coding sequences and the UTRs are less susceptible to change.
It should be mentioned that intronless genes were included in our analysis after verifying
that their inclusion or exclusion had no effect on the results. It also must be noted that the
UTRs are not always fully sequenced, and thus their actual lengths might be longer. This
bias was found to have no effect on the length of the coding sequences, and in any case the
effect would be the same for both HK and non-HK genes.
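
The claim that the difference is larger for introns than for exons and proteins can be checked directly from the mean values in Table I. The short sketch below computes the HK to non-HK ratio for each feature and converts the mean coding sequence length to an approximate protein length (three nucleotides per amino acid). The dictionary and function names are hypothetical; the numbers are the means reported in Table I.

```python
# Mean lengths in bp from Table I: (HK genes, non-HK genes).
TABLE_I_MEANS = {
    "average intron length": (2573, 5025),
    "average exon length": (212, 240),
    "5' UTR length": (135, 173),
    "3' UTR length": (599, 846),
    "coding sequence length": (1211, 1770),
}


def hk_to_background_ratio(feature):
    """Ratio of the HK mean to the non-HK mean for one feature."""
    hk, background = TABLE_I_MEANS[feature]
    return hk / background


for feature in TABLE_I_MEANS:
    print(f"{feature}: {hk_to_background_ratio(feature):.2f}")

# Approximate protein length in amino acids: coding length divided by 3.
hk_cds, background_cds = TABLE_I_MEANS["coding sequence length"]
print(hk_cds // 3, background_cds // 3)  # 403 590
```

The intron ratio is the smallest of the five, so introns show the largest relative shortening, while exons show the smallest.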

FIG. 4: Histogram of the length of the coding region (x-axis: total CDS length). Green bars, HK
genes; blue bars, non-HK genes.

It has been noted that codon usage bias in nonmammalian organisms is correlated with
the expression level and with the gene length [17, 18, 19]. These results led to the conjecture
of selective pressure on highly expressed genes resulting in shorter proteins [19]. However, no
evidence for this selection was found [18], possibly because of a lack of high quality databases
for these organisms. Recent work has suggested that there is no selection for codon usage
bias in humans [20], and thus our results demonstrate that the expression-length correlation
is not related to the expression-codon bias correlation.
It could be argued that selection towards shorter genes should have eliminated the introns
in highly expressed genes. However, it is known that introns do have important roles, such
as splicing regulation. Therefore, there is a balance between the advantageous contribution
of the introns and the selective pressure for shortening.
Finally, when we compared our results with two (largely overlapping) published sets of
HK genes, we found that roughly half of the genes in the intersection of those sets were
present in our set. We used the genomic structure to test the remaining genes, and found
a statistically significant difference between them and our HK gene set. The differences
between our results and those of earlier studies [3, 4] could be due to the fact that the
database we used was based on more advanced chip technology and included many more
different tissues, giving it more discriminative power to identify HK genes.
In conclusion, we have identified a set of HK genes. The set is publicly available at
http://www.compugen.co.il/supp_info/Housekeeping_genes.html and can be used for calibration
of microarrays, toxicity evaluation and quantitative PCR experiments. Furthermore,
we show that HK genes have shorter introns, UTRs and coding sequences, attesting to the
strong selection for compactness in these genes.

Acknowledgments

We thank Andrew Su for helpful discussion and for providing us with the RefSeq mapping.
Gady Cojocaru and Rotem Sorek are acknowledged for comments on the manuscript and
insightful discussion.

[1] Butte, A.J. et al. (2001) Physiol. Genomics 7, 95-96.
[2] Gibson, U.E. et al. (1996) Genome Res. 6, 995-1001.
[3] Warrington, J.A. et al. (2000) Physiol. Genomics 2, 143-147.
[4] Hsiao, L.L. et al. (2001) Physiol. Genomics 7, 97-104.
[5] Ucker, D.S. and Yamamoto, K.R. (1984) J. Biol. Chem. 259, 7416-7420.
[6] Izban, M.G. and Luse, D.S. (1992) J. Biol. Chem. 267, 13647-13655.
[7] Lehninger, A.L. et al. (1982) Biochemistry, 615-644.
[8] Castillo-Davis, C.I. et al. (2002) Nat. Genet. 31, 415-418.
[9] Su, A.I. et al. (2002) Proc. Natl. Acad. Sci. U.S.A. 99, 4465-4470.
[10] Pruitt, K.D. et al. (2000) Trends Genet. 16, 44-47.
[11] Lockhart, D.J. et al. (1996) Nat. Biotechnol. 14, 1675-1680.
[12] Wodicka, L. et al. (1997) Nat. Biotechnol. 15, 1359-1367.
[13] Gene Ontology Consortium (2001) Genome Res. 11, 1425-1433.
[14] Hamalainen, H.K. et al. (2001) Anal. Biochem. 299, 63-70.
[15] Lee, P.D. (2002) Genome Res. 12, 292-297.
[16] Karolchik, D. et al. (2003) Nucleic Acids Res. 31, 51-54.
[17] Akashi, H. (2001) Curr. Opin. Genet. Dev. 11, 660-666.
[18] Duret, L. and Mouchiroud, D. (1999) Proc. Natl. Acad. Sci. U.S.A. 96, 4482-4487.
[19] Moriyama, E.N. and Powell, J.R. (1998) Nucleic Acids Res. 26, 3188-3193.
[20] Urrutia, A.O. and Hurst, L.D. (2001) Genetics 159, 1191-1199.
