[Paper] Identification of Novel Universal Housekeeping Genes by Statistical Analysis of Microarray Data

Lee, Se-Ram; Jo, Min-Joung; Lee, Jung-Eun; Koh, Sang-Seok; Kim, So-Youn

doi:10.5483/bmbrep.2007.40.2.226

Journal of Biochemistry and Molecular Biology = 한국생화학회지, v.40 no.2, 2007, pp.226-231

Lee, Se-Ram (Department of Chemistry, Dongguk University); Jo, Min-Joung (Department of Chemistry, Dongguk University); Lee, Jung-Eun (Department of Chemistry, Dongguk University); Koh, Sang-Seok (LG Life Sciences Ltd.); Kim, So-Youn (Department of Chemistry, Dongguk University)

Abstract

Housekeeping genes are widely used as internal controls in a variety of study types, including real time RT-PCR, microarrays, Northern analysis and RNase protection assays. However, even commonly used housekeeping genes may vary in stability depending on the cell type or disease being studied. Thus, it is necessary to identify additional housekeeping-type genes that show sample-independent stability. Here, we used statistical analysis to examine a large human microarray database, seeking genes that were stably expressed in various tissues, disease states and cell lines. We further selected genes that were expressed at different levels, because reference and target genes should be present in similar copy numbers to achieve reliable quantitative results. Real time RT-PCR amplification of three newly identified reference genes, CGI-119, CTBP1 and GOLGA1, alongside three well-known housekeeping genes, B2M, GAPD, and TUBB, confirmed that the newly identified genes were more stably expressed in individual samples with similar ranges. These results collectively suggest that statistical analysis of microarray data can be used to identify new candidate housekeeping genes showing consistent expression across tissues and diseases. Our analysis identified three novel candidate housekeeping genes (CGI-119, GOLGA1, and CTBP1) that could prove useful for normalization across a variety of RNA-based techniques.

AI summary of the main text

* These passages were identified automatically by AI and may include unsuitable sentences; please use them with care.

Proposed method

Analysis of the respective copy numbers of these genes allowed us to select B2M, CGI-119, CTBP1 and GOLGA1 as possible reference genes, as they showed low expression variation across individual tissue samples and a good range of copy numbers (Fig. 1A and B). Of these, B2M is a previously known housekeeping gene, whereas the other three have not previously been identified as housekeeping genes.
Vandesompele et al. (2002) developed the GeNorm program, which uses geometric means to calculate the correct normalizing factor from existing housekeeping genes. Here, we utilized statistical tools, such as geometric mean, standard deviation and linear regression, to search a large microarray database for novel, stably expressed genes. We further screened for reference genes that are expressed at different levels, as it is beneficial for the reference genes and the genes of interest to be within similar ranges of expression.
These profiles were originally generated using high-density oligonucleotide microarray analysis (HG-U133; Affymetrix) of 281 normal tissue samples from 17 different organs, including breast (27), cervix (5), colon (26), duodenum (10), endometrium (9), esophagus (14), kidney (29), liver (21), lung (32), lymph node (5), myometrium (5), ovary (19), pancreas (19), prostate (15), rectum (18), skin (5), and stomach (22) (numbers in parentheses indicate the number of normal tissue samples analyzed). Normalized signals (expression values) were obtained using the Microarray Suite 5.0 software (Affymetrix), which discards the largest 2% and the smallest 2% of values as outliers and uses the mean of the remaining values (trimmed mean) to compute the scale factor (SF = 100/trimmed mean).
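The scaling step described above can be sketched as follows. This is a minimal illustration that assumes a plain list of raw signals for one array; the function names `trimmed_mean` and `scale_signals` are hypothetical, and the actual Microarray Suite 5.0 implementation may differ in detail.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Mean after dropping the lowest and highest `trim_fraction` of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_signals(values, target=100.0, trim_fraction=0.02):
    """Return the scale factor SF = target / trimmed mean and the scaled signals."""
    sf = target / trimmed_mean(values, trim_fraction)
    return sf, [v * sf for v in values]
```

After scaling, the trimmed mean of every array equals the target (100 here), which is what makes expression values comparable between arrays.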
The expression levels of GOLGA1, CTBP1, B2M, CGI-119, GAPD and TUBB in the different tissues were then analyzed by linear regression to identify the most stable gene in each cancer type, regardless of the sampled tissue. GOLGA1, CTBP1, B2M and CGI-119 showed higher stabilities than GAPD and TUBB in each cancer type (data not shown).
The results of our data analysis identified several genes having low mean times standard deviation values and high R²/slope values.
The GeNorm program results indicated that the three novel genes showed high stability using the same data set (Table 1). To further examine tissue-specific effects, we generated cDNA from several laboratory cell lines originating from different tissues, and tested the expression levels of the selected genes by real-time quantitative analysis. Again, although the relative order of the stabilities differed from those in the microarray analysis, the stabilities of GOLGA1, CTBP1 and CGI-119 were consistently better than those of the commonly used housekeeping genes, TUBB and GAPD.
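For readers unfamiliar with the stability measure reported by GeNorm, the sketch below follows the published definition of the M value (Vandesompele et al., 2002): for each gene, the average of the standard deviations of its log2 expression ratios against every other candidate gene. It is not the GeNorm program itself; the function name `genorm_m_values`, the input layout, and the use of the sample standard deviation are assumptions made for illustration.

```python
from math import log2
from statistics import mean, stdev


def genorm_m_values(expression):
    """expression: dict mapping gene -> list of relative expression values,
    one per sample, with samples in the same order for every gene.
    Returns a dict mapping gene -> M (lower M means more stable)."""
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            # log2 ratio of the two genes in each sample
            ratios = [log2(a / b) for a, b in zip(expression[gene], expression[other])]
            pairwise_sd.append(stdev(ratios))
        m_values[gene] = mean(pairwise_sd)
    return m_values
```

Two genes whose ratio is constant across samples contribute a pairwise value of zero to each other, so co-varying stable genes end up with low M.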
To validate these array-based analyses, we next performed real time quantitative RT-PCR analysis. We used gene-specific primers (see Materials and Methods) to amplify cDNA from independent sets of normal, cirrhotic and cancerous liver tissues.
Using the linear regression model, we further analyzed the expression profiles of the identified stable genes (GOLGA1, CTBP1, B2M and CGI-119) in the same tissues under different disease states (Fig. 3B). Although the relative rankings of their stabilities were somewhat altered, the four tested genes were more stable in the 23 different cancers than were the commonly used housekeeping genes, GAPD and TUBB (Fig. 3C).
We obtained the genomic expression patterns of 281 normal tissue samples from the 17 different organs, available from the Oncology Database of Gene Logic, and used two statistical methods to screen these data for novel reference genes. First, we used mean and standard deviations.
Since the data had been divided by their minimum values, corrected values near 1 indicated that the genes showed little variation, as did small standard deviations. We then analyzed the data by multiplying the mean values with the corresponding standard deviations, which increased the reliability of the analysis. Thus, lower values (mean times standard deviation) indicated genes with lower variations in expression level.
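The mean times standard deviation screen can be sketched as below. This is an illustration only: it assumes the score is computed per gene from a list of positive expression values, and the function name `mean_times_sd` and the choice of the sample standard deviation are assumptions, since the passage does not specify them.

```python
from statistics import mean, stdev


def mean_times_sd(expression_values):
    """Divide each value by the gene's minimum, then multiply the mean of the
    corrected values by their standard deviation. Lower scores suggest less
    variation in expression."""
    minimum = min(expression_values)
    if minimum <= 0:
        raise ValueError("expression values must be positive")
    corrected = [v / minimum for v in expression_values]
    return mean(corrected) * stdev(corrected)
```

A gene with identical values in every sample has corrected values of exactly 1 and a score of 0; genes with a wide spread are penalized twice, through both the mean and the standard deviation.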

Data processing

Briefly, for each reference gene, the Fold Value to Minimum (FVM) of each tissue sample set was obtained by dividing the geometric mean of the sample set by the minimum among the 17 mean values. Linear regression analysis was performed using the FVMs to generate slope and R² values. A lower slope value indicated less variation in the expression of a given gene across the 17 tissue sample sets.
For individual tissue samples, linear regression analysis was performed with FVMs calculated by dividing the expression value of each tissue sample by the minimum among the 281 expression values.
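A minimal sketch of the FVM calculation and the regression step follows. The passage does not state what the FVMs are regressed against, so the sketch assumes the FVMs are sorted and fitted against their rank (1, 2, ..., n); that choice, along with the function names, is an assumption for illustration.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of positive values."""
    return exp(sum(log(v) for v in values) / len(values))


def fold_values_to_minimum(sample_sets):
    """sample_sets: dict mapping tissue -> expression values of one gene.
    Returns tissue -> geometric mean divided by the smallest geometric mean."""
    means = {tissue: geometric_mean(vals) for tissue, vals in sample_sets.items()}
    lowest = min(means.values())
    return {tissue: m / lowest for tissue, m in means.items()}


def slope_and_r2(fvms):
    """Least-squares fit of the sorted FVMs against rank 1..n (assumed x-axis).
    Needs at least two values; R² is NaN when all FVMs are identical."""
    ys = sorted(fvms)
    n = len(ys)
    xs = list(range(1, n + 1))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, r2
```

For the per-sample analysis, the same `slope_and_r2` function could be applied to the 281 individual values after dividing each by the overall minimum.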

Theory/model

Novel reference genes exhibiting little variation across the 17 tissue sample sets were identified by comparing the geometric means of the expression values in each sample set, using the GeneExpress 2000 Software Contrast Analysis and Electronic Northern Analysis tools. Contrast analysis was used to find genes that were similarly expressed across sample sets, while electronic Northern analysis allowed us to infer the range of expression levels for each gene in each sample set (Schmitt et al., 1999).

Performance/effect

Although the relative rankings of their stabilities were somewhat altered, the four tested genes were more stable in the 23 different cancers than were the commonly used housekeeping genes, GAPD and TUBB (Fig. 3C). The expression levels of GOLGA1, CTBP1, B2M, CGI-119, GAPD and TUBB in the different tissues were then analyzed by linear regression to identify the most stable gene in each cancer type, regardless of the sampled tissue.
[...] (Fig. 2A-F). Furthermore, simple linear regression of the sorted data revealed that CGI-119, B2M, CTBP1 and GOLGA1 had lower slope values than GAPD and TUBB (Fig. 3). The results of these two separate statistical methods confirmed the same three novel genes (CGI-119, CTBP1 and GOLGA1) as good candidate housekeeping genes that may be more stably expressed than the commonly used housekeeping genes, GAPD and TUBB (Figs. 1, 2 and 3).
We used gene-specific primers (see Materials and Methods) to amplify cDNA from independent sets of normal, cirrhotic and cancerous liver tissues. The stabilities of GOLGA1, CTBP1 and CGI-119 were high in these tissues, while B2M was less stable than GAPD in this analysis (Table 1). This highlights the limitations of commonly used housekeeping genes such as B2M, which may show disease-specific effects.
To be consistent with our liver tissue data (Table 1), we further validated our newly identified housekeeping genes by using GOLGA1, CTBP1 and CGI-119 as references to normalize the expression levels of genes having different ranges of copy number, and found that the use of a reference with a similar copy number yielded better normalization results (data not shown).

References (21)

Chen, X., Cheung, S. T., So, S., Fan, S. T., Barry, C., Higgins, J., Lai, K. M., Ji, J., Dudoit, S., Ng, I. O., Van De Rijn, M., Botstein, D. and Brown, P. O. (2002) Gene expression patterns in human liver cancers. Mol. Biol. Cell 13, 1929-1939.

Gibson, U. E., Heid, C. A. and Williams, P. M. (1996) A novel method for real time quantitative RT-PCR. Genome Res. 6, 995-1001.

Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.

Graveel, C. R., Jatkoe, T., Madore, S. J., Holt, A. L. and Farnham, P. J. (2001) Expression profiling and identification of novel genes in hepatocellular carcinomas. Oncogene 20, 2704-2712.

Hamadeh, H. K., Bushel, P. R., Jayadev, S., DiSorbo, O., Bennett, L., Li, L., Tennant, R., Stoll, R., Barrett, J. C., Paules, R. S., Blanchard, K. and Afshari, C. A. (2002) Prediction of compound signature using high density gene expression profiling. Toxicol. Sci. 67, 232-240.

Heid, C. A., Stevens, J., Livak, K. J. and Williams, P. M. (1996) Real time quantitative PCR. Genome Res. 6, 986-994.

Khimani, A. H., Mhashilkar, A. M., Mikulskis, A., O'Malley, M., Liao, J., Golenko, E. E., Mayer, P., Chada, S., Killian, J. B. and Lott, S. T. (2005) Housekeeping genes in cancer: normalization of array data. Biotechniques 38, 739-745.

Kim, J. W. and Wang, X. W. (2003) Gene expression profiling of preneoplastic liver disease and liver cancer: a new era for improved early detection and treatment of these deadly diseases? Carcinogenesis 24, 363-369.

Kim, S. and Kim, T. (2003) Selection of optimal internal controls for gene expression profiling of liver disease. Biotechniques 35, 456-460.

Kim, S. and Park, Y. M. (2005) Specific gene expression patterns in liver cirrhosis. Biochem. Biophys. Res. Commun. 334, 681-688.

Kim, S., Shi, H., Lee, D. K. and Lis, J. T. (2003) Specific SR protein-dependent splicing substrates identified through genomic SELEX. Nucleic Acids Res. 31, 1955-1961.

Lee, J. S. and Thorgeirsson, S. S. (2002) Functional and genomic implications of global gene expression profiles in cell lines from human hepatocellular cancer. Hepatology 35, 1134-1143.

Schena, M., Shalon, D., Davis, R. W. and Brown, P. O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.

Schmitt, A. O., Specht, T., Beckmann, G., Dahl, E., Pilarsky, C. P., Hinzmann, B. and Rosenthal, A. (1999) Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res. 27, 4251-4260.

Suh, Y. J., Yang, M. H., Yoon, S. J. and Park, J. H. (2006) GEDA: new knowledge base of gene expression in drug addiction. J. Biochem. Mol. Biol. 39, 441-447.

Suzuki, T., Higgins, P. J. and Crawford, D. R. (2000) Control selection for RNA quantitation. Biotechniques 29, 332-337.

Szabo, A., Perou, C. M., Karaca, M., Perreard, L., Quackenbush, J. F. and Bernard, P. S. (2004) Statistical modeling for selecting housekeeper genes. Genome Biol. 5, 59.

Thellin, O., Zorzi, W., Lakaye, B., De Borman, B., Coumans, B., Hennen, G., Grisar, T., Igout, A. and Heinen, E. (1999) Housekeeping genes as internal standards: use and limits. J. Biotechnol. 75, 291-295.

Vandesompele, J., De Preter, K., Pattyn, F., Poppe, B., Van Roy, N., De Paepe, A. and Speleman, F. (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, 34.
Warrington, J. A., Nair, A., Mahadevappa, M. and Tsyganskaya, M. (2000) Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics 2, 143-147.

Zhong, J., Wang, Y., Qiu, X., Mo, X., Liu, Y., Li, T., Song, Q., Ma, D. and Han, W. (2006) Characterization and expression profile of CMTM3/CKLFSF3. J. Biochem. Mol. Biol. 39, 537-545.
