Housekeeping genes are widely used as internal controls in a variety of study types, including real-time RT-PCR, microarrays, Northern analysis and RNase protection assays. However, even commonly used housekeeping genes may vary in stability depending on the cell type or disease being studied. Thus, it is necessary to identify additional housekeeping-type genes that show sample-independent stability. Here, we used statistical analysis to examine a large human microarray database, seeking genes that were stably expressed in various tissues, disease states and cell lines. We further selected genes that were expressed at different levels, because reference and target genes should be present in similar copy numbers to achieve reliable quantitative results. Real-time RT-PCR amplification of three newly identified reference genes, CGI-119, CTBP1 and GOLGA1, alongside three well-known housekeeping genes, B2M, GAPD and TUBB, confirmed that the newly identified genes were more stably expressed in individual samples with similar ranges. These results collectively suggest that statistical analysis of microarray data can be used to identify new candidate housekeeping genes showing consistent expression across tissues and diseases. Our analysis identified three novel candidate housekeeping genes (CGI-119, GOLGA1 and CTBP1) that could prove useful for normalization across a variety of RNA-based techniques.
Proposed method
Analysis of the respective copy numbers of these genes allowed us to select B2M, CGI-119, CTBP1 and GOLGA1 as possible reference genes, as they showed low expression variation across individual tissue samples and a good range of copy numbers (Fig. 1A and B). Of these, B2M is a previously known housekeeping gene, whereas the other three have not previously been identified as housekeeping genes.
Vandesompele et al. (2002) developed the GeNorm program, which uses geometric means to calculate the correct normalizing factor from existing housekeeping genes. Here, we utilized statistical tools, such as geometric mean, standard deviation and linear regression, to search a large microarray database for novel, stably expressed genes. We further screened for reference genes that are expressed at different levels, as it is beneficial for the reference genes and the genes of interest to be within similar ranges of expression.
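The geometric-averaging idea can be illustrated with a minimal sketch: a per-sample normalization factor is taken as the geometric mean of several reference genes, and a target gene is divided by that factor. This is not the GeNorm program itself; the function names, the gene labels (`REF_A`, `REF_B`) and all expression values below are hypothetical.

```python
from statistics import geometric_mean


def normalization_factors(reference_expression):
    """Per-sample geometric mean across reference genes.

    reference_expression maps a gene label to a list of expression
    values, one value per sample, in the same sample order.
    """
    per_sample = zip(*reference_expression.values())
    return [geometric_mean(values) for values in per_sample]


def normalize(target_values, factors):
    """Divide each target-gene value by the factor of the same sample."""
    return [t / f for t, f in zip(target_values, factors)]


# Hypothetical data: two reference genes measured in two samples.
references = {"REF_A": [2.0, 8.0], "REF_B": [8.0, 2.0]}
factors = normalization_factors(references)
normalized_target = normalize([8.0, 12.0], factors)
```

Using a geometric rather than an arithmetic mean keeps a single highly expressed reference gene from dominating the factor.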
These profiles were originally generated using high-density oligonucleotide microarray analysis (HG-U133; Affymetrix) of 281 normal tissue samples from 17 different organs, including breast (27), cervix (5), colon (26), duodenum (10), endometrium (9), esophagus (14), kidney (29), liver (21), lung (32), lymph node (5), myometrium (5), ovary (19), pancreas (19), prostate (15), rectum (18), skin (5), and stomach (22) (numbers in parentheses indicate the number of normal tissue samples analyzed). Normalized signals (expression values) were obtained using the Microarray Suite 5.0 software (Affymetrix), which discards the largest 2% and the smallest 2% of values as outliers and uses the mean of the remaining values (trimmed mean) to compute the scale factor (SF = 100/trimmed mean).
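The scaling step can be sketched as follows, assuming that 2% of the values are dropped from each end of the sorted signals before averaging. It illustrates SF = 100/trimmed mean only and is not the Microarray Suite 5.0 implementation; the function names and the signal values are invented for the example.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Mean after discarding the lowest and highest trim_fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_signals(values, target=100.0, trim_fraction=0.02):
    """Return (scale factor, scaled values) so the trimmed mean equals target."""
    sf = target / trimmed_mean(values, trim_fraction)
    return sf, [v * sf for v in values]


# Hypothetical array: 100 signals, one of them an extreme outlier.
raw = [float(v) for v in range(1, 101)]
raw[-1] = 10_000.0
sf, scaled = scale_signals(raw)
```

Because the outlier falls in the discarded top 2%, it does not influence the scale factor, although it is still rescaled along with every other value.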
The expression levels of GOLGA1, CTBP1, B2M, CGI-119, GAPD and TUBB in the different tissues were then analyzed by linear regression to identify the most stable gene in each cancer type, regardless of the sampled tissue. GOLGA1, CTBP1, B2M and CGI-119 showed higher stabilities than GAPD and TUBB in each cancer type (data not shown).
The results of our data analysis identified several genes having low mean times standard deviation values and high R²/slope values.
The GeNorm program results indicated that the three novel genes showed high stability using the same data set (Table 1). To further examine the tissue-specific effect, we generated cDNA from several laboratory cell lines originating from different tissues, and tested the expression levels of the selected genes by real-time quantitative analysis. Again, although the relative order of the stabilities differed from those in the microarray analysis, the stabilities of GOLGA1, CTBP1 and CGI-119 were consistently better than those of the commonly used housekeeping genes, TUBB and GAPD.
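For orientation, the stability measure reported by GeNorm (M) can be sketched as the average, over all other candidate genes, of the standard deviation of the log2 expression ratio across samples, with lower M indicating higher stability. The sketch below follows our reading of Vandesompele et al. (2002) and is not the GeNorm program; the gene labels and values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def genorm_m_values(expression):
    """Return a stability value M per gene (lower means more stable).

    expression maps a gene label to a list of positive expression
    values, one per sample, in the same sample order for every gene.
    """
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise_variation = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [
                math.log2(a / b) for a, b in zip(expression[j], expression[k])
            ]
            # Assumption: sample standard deviation (n - 1 denominator).
            pairwise_variation.append(statistics.stdev(log_ratios))
        m_values[j] = statistics.fmean(pairwise_variation)
    return m_values


# Hypothetical data: GENE_A and GENE_B are proportional, GENE_C is erratic.
expression = {
    "GENE_A": [10.0, 20.0, 40.0, 80.0],
    "GENE_B": [5.0, 10.0, 20.0, 40.0],
    "GENE_C": [10.0, 80.0, 5.0, 40.0],
}
m = genorm_m_values(expression)
most_stable_first = sorted(m, key=m.get)
```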
To validate these array-based analyses, we next performed real-time quantitative RT-PCR analysis. We used gene-specific primers (see Materials and Methods) to amplify cDNA from independent sets of normal, cirrhotic and cancerous liver tissues.
Using the linear regression model, we further analyzed the expression profiles of the identified stable genes (GOLGA1, CTBP1, B2M and CGI-119) in the same tissues under different disease states (Fig. 3B). Although the relative rankings of their stabilities were somewhat altered, the four tested genes were more stable in the 23 different cancers than were the commonly used housekeeping genes, GAPD and TUBB (Fig. 3C).
We obtained the genomic expression patterns of 281 normal tissue samples from the 17 different organs, available from the Oncology Database of Gene Logic, and used two statistical methods to screen these data for novel reference genes. First, we used means and standard deviations.
Since the data had been divided by their minimum values, corrected values near 1 indicated that the genes showed little variation, as did small standard deviations. We then analyzed the data by multiplying the mean values by the corresponding standard deviations, which increased the reliability of the analysis. Thus, lower values (mean times standard deviation) indicated genes with lower variations in expression level.
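A minimal sketch of this score is given below. It assumes that each gene's values are divided by that gene's own minimum and that the sample standard deviation is used; neither detail is specified here, and the gene labels and expression values are hypothetical.

```python
import statistics


def mean_times_sd(expression_values):
    """Score a gene by mean x SD of its minimum-corrected expression values."""
    minimum = min(expression_values)
    corrected = [v / minimum for v in expression_values]  # smallest value becomes 1
    mean = statistics.fmean(corrected)
    sd = statistics.stdev(corrected)  # assumption: sample SD (n - 1 denominator)
    return mean * sd


# Hypothetical expression values for two genes across four samples.
profiles = {
    "stable_gene": [100.0, 105.0, 110.0, 102.0],
    "variable_gene": [100.0, 300.0, 50.0, 800.0],
}
scores = {gene: mean_times_sd(values) for gene, values in profiles.items()}
ranked = sorted(scores, key=scores.get)  # lowest score (least variation) first
```

Multiplying the two quantities penalizes a gene that has either a mean far above 1 or a wide spread, so a gene ranks well only when both are small.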
Data processing
Briefly, for each reference gene, the Fold Value to Minimum (FVM) of each tissue sample set was obtained by dividing the geometric mean of the sample set by the minimum among the 17 mean values. Linear regression analysis was performed using the FVMs to generate slope and R² values. A lower slope value indicated less variation in the expression of a given gene across the 17 tissue sample sets. For individual tissue samples, linear regression analysis was performed with FVMs calculated by dividing the expression value of each tissue sample by the minimum among the 281 expression values.
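The FVM and regression steps can be sketched as follows. The sketch assumes that the FVMs are sorted in ascending order and regressed against their rank (1, 2, ..., n); the x-axis used for the regression is our assumption. The tissue labels and expression values are hypothetical, and only three sample sets are shown instead of 17.

```python
from statistics import geometric_mean


def fold_values_to_minimum(sample_sets):
    """Sorted FVMs: each set's geometric mean divided by the lowest set mean."""
    means = [geometric_mean(values) for values in sample_sets.values()]
    lowest = min(means)
    return sorted(m / lowest for m in means)


def slope_and_r2(ys):
    """Least-squares fit of ys against rank 1..n (assumed x-axis)."""
    n = len(ys)
    xs = list(range(1, n + 1))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    slope = sxy / sxx
    # R^2 is undefined when every FVM is identical; report NaN in that case.
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, r2


# Hypothetical expression values for two genes in three tissue sample sets.
stable = {"liver": [100.0, 110.0], "lung": [105.0, 115.0], "colon": [120.0, 125.0]}
variable = {"liver": [100.0, 100.0], "lung": [200.0, 200.0], "colon": [400.0, 400.0]}

stable_slope, stable_r2 = slope_and_r2(fold_values_to_minimum(stable))
variable_slope, variable_r2 = slope_and_r2(fold_values_to_minimum(variable))
```

In this toy data the gene whose set means stay close to the minimum yields the smaller slope, in line with the interpretation that a lower slope reflects less variation across sample sets.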
Theory/model
Novel reference genes exhibiting little variation across the 17 tissue sample sets were identified by comparing the geometric means of the expression values in each sample set, using the GeneExpress 2000 Software Contrast Analysis and Electronic Northern Analysis tools. Contrast analysis was used to find genes that were similarly expressed across sample sets, while electronic Northern analysis allowed us to infer the range of expression levels for each gene in each sample set (Schmitt et al., 1999).
Performance/effect
Although the relative rankings of their stabilities were somewhat altered, the four tested genes were more stable in the 23 different cancers than were the commonly used housekeeping genes, GAPD and TUBB (Fig. 3C). The expression levels of GOLGA1, CTBP1, B2M, CGI-119, GAPD and TUBB in the different tissues were then analyzed by linear regression to identify the most stable gene in each cancer type, regardless of the sampled tissue.
Furthermore, simple linear regression of the sorted data revealed that CGI-119, B2M, CTBP1 and GOLGA1 had lower slope values than GAPD and TUBB (Fig. 3). The results of these two separate statistical methods confirmed the same three novel genes (CGI-119, CTBP1 and GOLGA1) as good candidate housekeeping genes that may be more stably expressed than the commonly used housekeeping genes, GAPD and TUBB (Figs. 1, 2 and 3).
We used gene-specific primers (see Materials and Methods) to amplify cDNA from independent sets of normal, cirrhotic and cancerous liver tissues. The stabilities of GOLGA1, CTBP1 and CGI-119 were high in these tissues, while B2M was less stable than GAPD in this analysis (Table 1). This highlights the limitations of commonly used housekeeping genes such as B2M, which may show disease-specific effects.
Again, although the relative order of the stabilities differed from those in the microarray analysis, the stabilities of GOLGA1, CTBP1 and CGI-119 were consistently better than those of the commonly used housekeeping genes, TUBB and GAPD. To be consistent with our liver tissue data (Table 1), we further validated our newly identified housekeeping genes by using GOLGA1, CTBP1 and CGI-119 as references to normalize the expression levels of genes having different ranges of copy number, and found that the use of a reference with a similar copy number yielded better normalization results (data not shown).
References (21)
Chen, X., Cheung, S. T., So, S., Fan, S. T., Barry, C., Higgins, J., Lai, K. M., Ji, J., Dudoit, S., Ng, I. O., Van De Rijn, M., Botstein, D. and Brown, P. O. (2002) Gene expression patterns in human liver cancers. Mol. Biol. Cell 13, 1929-1939.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
Graveel, C. R., Jatkoe, T., Madore, S. J., Holt, A. L. and Farnham, P. J. (2001) Expression profiling and identification of novel genes in hepatocellular carcinomas. Oncogene 20, 2704-2712.
Hamadeh, H. K., Bushel, P. R., Jayadev, S., DiSorbo, O., Bennett, L., Li, L., Tennant, R., Stoll, R., Barrett, J. C., Paules, R. S., Blanchard, K. and Afshari, C. A. (2002) Prediction of compound signature using high density gene expression profiling. Toxicol. Sci. 67, 232-240.
Khimani, A. H., Mhashilkar, A. M., Mikulskis, A., O'Malley, M., Liao, J., Golenko, E. E., Mayer, P., Chada, S., Killian, J. B. and Lott, S. T. (2005) Housekeeping genes in cancer: normalization of array data. Biotechniques 38, 739-745.
Kim, J. W. and Wang, X. W. (2003) Gene expression profiling of preneoplastic liver disease and liver cancer: a new era for improved early detection and treatment of these deadly diseases? Carcinogenesis 24, 363-369.
Kim, S., Shi, H., Lee, D. K. and Lis, J. T. (2003) Specific SR protein-dependent splicing substrates identified through genomic SELEX. Nucleic Acids Res. 31, 1955-1961.
Lee, J. S. and Thorgeirsson, S. S. (2002) Functional and genomic implications of global gene expression profiles in cell lines from human hepatocellular cancer. Hepatology 35, 1134-1143.
Schena, M., Shalon, D., Davis, R. W. and Brown, P. O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.
Schmitt, A. O., Specht, T., Beckmann, G., Dahl, E., Pilarsky, C. P., Hinzmann, B. and Rosenthal, A. (1999) Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res. 27, 4251-4260.
Suh, Y. J., Yang, M. H., Yoon, S. J. and Park, J. H. (2006) GEDA: new knowledge base of gene expression in drug addiction. J. Biochem. Mol. Biol. 39, 441-447.
Szabo, A., Perou, C. M., Karaca, M., Perreard, L., Quackenbush, J. F. and Bernard, P. S. (2004) Statistical modeling for selecting housekeeper genes. Genome Biol. 5, 59.
Thellin, O., Zorzi, W., Lakaye, B., De Borman, B., Coumans, B., Hennen, G., Grisar, T., Igout, A. and Heinen, E. (1999) Housekeeping genes as internal standards: use and limits. J. Biotechnol. 75, 291-295.
Vandesompele, J., De Preter, K., Pattyn, F., Poppe, B., Van Roy, N., De Paepe, A. and Speleman, F. (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, 34.
Warrington, J. A., Nair, A., Mahadevappa, M. and Tsyganskaya, M. (2000) Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics 2, 143-147.