[National R&D Report] Development of Various Statistical Methods for Gene Set Analysis
Various statistical algorithms for analyzing gene sets and their core genes

Report information
Lead research institution	MyongJi University
Report type	Final report
Country of publication	Republic of Korea
Language	Korean
Publication date	2014-05
Project start year	2013
Supervising ministry	Ministry of Science, ICT and Future Planning
Project management agency	National Research Foundation of Korea
Registration number	TRKO201500004079
Project ID	1711005847
Program name	Mid-career Researcher Support
Database build date	2015-05-23
Keywords	Copy number variation, gene set, length bias, false discovery rate, FDRseq, RNA-Seq, separate FDR procedure, weighted FDR procedure, wFDR
DOI	https://doi.org/10.23000/TRKO201500004079

Abstract (translated from Korean)

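Purpose and content of the research
This research proceeded in two main directions. For roughly the past decade, microarrays were the essential choice for gene expression analysis. In recent years, however, RNA-Seq has become a highly competitive technology that is replacing microarrays, and demand for it is expected to grow rapidly. Accordingly, the first direction was to develop various statistical methods for individual-gene analysis and gene set analysis of RNA-Seq data. The second direction was to develop a statistical method for estimating the locations of copy number variations (CNVs). CNVs found in healthy individuals are usually short and far apart from one another, whereas those found in cancer patients are relatively long and in some cases span an entire chromosome. Longer and more numerous CNVs are typical of many cancer types and are thought to drive the development of some cancers. To locate such cancer-related CNVs, this research developed a circular binary segmentation method based on the Bayesian information criterion (BIC). The method is computationally efficient and very simple to apply when estimating CNV locations.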
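The abstract does not give the details of the BIC-based circular binary segmentation, so the following is only a minimal sketch of the general idea under stated assumptions: Gaussian data with a common variance, a BIC of the form n·log(RSS/n) + k·log(n), and parameter counts of 2 (no change) and 5 (two means, one variance, two change points). The function names `best_arc` and `segment` and the `min_len` parameter are hypothetical and are not taken from the report or its R programs.

```python
import math
from itertools import accumulate

def best_arc(x):
    """Find the arc x[i:j] whose mean differs most from the rest of the data.

    The rest of the data is treated as one piece, as if the two ends of the
    sequence were joined into a circle. Returns (BIC gain, i, j).
    """
    n = len(x)
    csum = [0.0] + list(accumulate(x))   # prefix sums for fast arc totals
    total = csum[-1]
    total_sq = sum(v * v for v in x)
    floor = 1e-12                        # avoids log(0) on constant data
    # Assumed BIC of the no-change model: one mean plus one variance.
    rss0 = max(total_sq - total * total / n, floor)
    bic0 = n * math.log(rss0 / n) + 2 * math.log(n)
    best = (-math.inf, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            if m == n:                   # the arc must leave something outside
                continue
            s_in = csum[j] - csum[i]
            s_out = total - s_in
            rss1 = total_sq - s_in * s_in / m - s_out * s_out / (n - m)
            rss1 = max(rss1, floor)
            # Assumed BIC of the change model: two means, one variance,
            # two change points.
            bic1 = n * math.log(rss1 / n) + 5 * math.log(n)
            if bic0 - bic1 > best[0]:
                best = (bic0 - bic1, i, j)
    return best

def segment(x, offset=0, min_len=5):
    """Recursively split x and return the sorted change-point indices."""
    n = len(x)
    if n < 2 * min_len:
        return []
    gain, i, j = best_arc(x)
    if gain <= 0:                        # BIC prefers the no-change model
        return []
    cuts = [c for c in (i, j) if 0 < c < n]
    bounds = [0] + cuts + [n]
    found = [offset + c for c in cuts]
    for a, b in zip(bounds[:-1], bounds[1:]):
        found += segment(x[a:b], offset + a, min_len)
    return sorted(found)
```

On a noise-free step signal such as `[0.0] * 50 + [2.0] * 30 + [0.0] * 40`, `segment` returns `[50, 80]`. The double loop costs O(n²) per split, which is acceptable for a sketch but not for genome-scale data.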
Research results
The results of this research were published in BMC Bioinformatics (2012), Evolutionary Bioinformatics (2013), and Computational Statistics (in press), and various R programs were developed to put these results to use. The wFDR program, freely available at http://home.mju.ac.kr/home/index.action?siteId=tyang, was developed for accurate significance testing of RNA-Seq genes, and the FDRseq program is intended for RNA-Seq gene set analysis.
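Plans for using the research results
In addition to the theoretical work, various related R programs were developed. Over the next six months, more useful and easier-to-use R programs will be developed and distributed free of charge to genetic informatics researchers worldwide through Bioconductor, a bioinformatics website.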

Abstract

Purpose and contents
This research addressed two main topics. Over the past decade, microarrays have been the primary choice for genome-wide gene expression analysis, but in recent years RNA-Seq has become a very competitive alternative. First, we present a new gene set analysis method for RNA-Seq, called FDRseq, which accurately calculates the statistical significance of a gene-set enrichment score using the grouped false-discovery rate. In addition, to determine an accurate list of significant genes in RNA-Seq, we propose two multiple-testing procedures, one based on a weighted-FDR approach and one on a separate-FDR approach. Both methods use prior information on differences in gene length while keeping the false-discovery rate (FDR) controlled at an appropriate level. Second, variation in DNA copy number, caused by gains and losses of chromosome segments, is common. A main step in analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure based on a sequence of nested hypothesis tests, each using the Bayesian information criterion.
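The abstract names a weighted-FDR and a separate-FDR procedure without specifying them. As a rough illustration of what such procedures generally look like, the sketch below uses the standard Benjamini-Hochberg (BH) step-up rule, a weighted variant that applies BH to p-values divided by weights normalized to mean 1, and a separate variant that applies BH within each group. How the report derives weights or groups from gene length is not stated here, so the `weights` and `groups` arguments are hypothetical inputs, and the function names are illustrative rather than the API of the wFDR program.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Standard BH step-up rule; returns True where a hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= alpha * rank / m:
            cutoff = rank                # largest rank meeting the BH bound
    reject = [False] * m
    for k in order[:cutoff]:
        reject[k] = True
    return reject

def weighted_bh(pvals, weights, alpha=0.05):
    """BH applied to p / w, with the weights rescaled to average 1.

    The weights are an assumed input, for example larger for genes where
    prior information (such as length) suggests more power.
    """
    mean_w = sum(weights) / len(weights)
    scaled = [min(1.0, p * mean_w / w) for p, w in zip(pvals, weights)]
    return benjamini_hochberg(scaled, alpha)

def separate_bh(pvals, groups, alpha=0.05):
    """BH applied independently within each group (e.g. gene-length bins)."""
    reject = [False] * len(pvals)
    for g in set(groups):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        sub = benjamini_hochberg([pvals[k] for k in idx], alpha)
        for k, r in zip(idx, sub):
            reject[k] = r
    return reject
```

With p-values `[0.001, 0.012, 0.035, 0.2, 0.5]`, plain BH at level 0.05 rejects the first two hypotheses. Giving the third gene a weight of 2, or grouping the first three genes together, leads to the third one being rejected as well, which shows how prior information can change the list of significant genes.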
Results
The results of this research have been published in BMC Bioinformatics (2012), Evolutionary Bioinformatics (2013), and Computational Statistics (in press). Free R programs are available at http://home.mju.ac.kr/home/index.action?siteId=tyang.
In detail, the wFDR program was developed to determine an accurate list of significant genes in RNA-Seq, and the FDRseq program accurately calculates the statistical significance of a gene-set enrichment score using the grouped false-discovery rate.
Expected Contribution
We have developed various statistical tools for RNA-Seq and copy number variation. In addition, we provide several free R programs that are useful for researchers in bioinformatics and statistical genetics. In the near future, we will develop more user-friendly R programs and distribute them via Bioconductor.

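Table of contents

Final report for the core research program (for evaluation)
Table of contents
Summary of the research plan
Summary of the research results
Korean summary
SUMMARY
Research content and results
1. Overview of the research and development project
2. Status of technology development in Korea and abroad
3. Research conducted and results
4. Achievement of objectives and contribution to related fields
5. Plans for using the research results
6. Overseas science and technology information collected during the research
7. Representative research achievements of the principal investigator
8. References
9. Research outcomes
10. Other matters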

Tables/Figures (1)

Table: Single nucleotide polymorphism (SNP) - a genetic change or variation in which a single base (A, T, G, C) differs in a DNA sequence
