[Report] Predicting the Function of Genetic Variants in Alzheimer’s Disease Using Machine Learning Theory

이영희

Machine learning-based approaches for annotating genetic variants in Alzheimer’s disease

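Report information
Lead research institution	University of Utah Asia Campus
Principal investigator	이영희
Report type	Final report
Country of publication	Republic of Korea
Language	Korean
Publication date	2022-03
Project start year	2020
Supervising ministry	Ministry of Science and ICT
Registration number	TRKO202200015028
Project ID	1711145856
Program	Individual Basic Research (Ministry of Science and ICT) (R&D)
Database entry date	2022-11-09
Keywords	Genetic variants; Alternative splicing; Alzheimer’s disease; Genetic variant annotation; Machine learning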

Summary

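□ Research overview
Whole genome sequencing (WGS) reads the complete nucleotide sequence of a genome so that genetic variants can be analyzed, identified, and interpreted. WGS not only generates big data that can be used to discover and develop new therapies in genomics, but is also of great value in clinical applications, including as a tool for disease diagnosis. However, methods for determining the function of the genetic variants found by WGS are still under development. This study therefore sets out to answer the following questions in variant prediction methodology:
1. What are the functions of synonymous variants and of variants located in introns?
2. How can rare and low-frequency variants be detected?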

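□ Research results against objectives
Qualitative R&D outcomes
- Predictive variable refinement
- Targeted variable refinement
- Completed the calculation of prediction performance using six machine learning methods
- Confirmed the reproducibility of model prediction performance by applying two different Alzheimer’s disease datasets
Detailed quantitative R&D outcomes
- Two papers related to this project published; one more under review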
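
As a rough illustration of the evaluation pattern described above (several candidate models scored by cross-validation, then checked on a second dataset for reproducibility), the following is a minimal sketch using only the Python standard library. The two toy models, the synthetic cohorts, and all names (`k_fold_indices`, `fit_nearest_centroid`, `make_toy_cohort`, `cohort_A`, `cohort_B`) are assumptions made for illustration; the report does not list its six methods or its features here, and this is not the authors' pipeline.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k (train, test) folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits

def fit_majority(X, y):
    """Baseline model: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda row: label

def fit_nearest_centroid(X, y):
    """Toy model: predict the class whose feature centroid is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
    return predict

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)

def make_toy_cohort(n, seed):
    """Synthetic stand-in for a cohort: label 1 shifts both features upward."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y

# Hypothetical model registry and cohorts (not the report's data).
models = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
cohorts = {
    "cohort_A": make_toy_cohort(100, seed=1),
    "cohort_B": make_toy_cohort(100, seed=2),
}

# Score every model on every cohort with the same 5-fold protocol.
results = {
    (name, cohort): cross_val_accuracy(fit, X, y)
    for name, fit in models.items()
    for cohort, (X, y) in cohorts.items()
}
```

Comparing `results[("nearest_centroid", "cohort_A")]` with `results[("nearest_centroid", "cohort_B")]` gives a simple reproducibility check: a model whose score holds up on the second cohort is less likely to have overfit the first.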

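□ Plans for use and expected impact of the R&D outcomes (significance of the results)
1. The function of synonymous or intronic genetic variants can be predicted, and functional variants can be identified.
2. The transcriptome can be predicted from genetic variant data.
3. The study provides bioinformatic data for understanding the molecular mechanisms of Alzheimer’s disease and presents machine learning methods that can be used in strategies for developing diagnostics and biomarkers for new treatments.
4. Follow-up project 1: the work is expected to develop into a method that predicts the transcriptome from genetic variants alone and then predicts phenotypes (such as disease) while taking biological factors (age, sex, etc.) into account.
5. Follow-up project 2: the developed machine learning methods will be applied to genetic variant prediction in other diseases, with the methods refined to account for biological context (for example, disease-specific and tissue-specific differences from healthy individuals).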

(Source: Summary of research results, p. 2)

Abstract

1. Overview of the R&D project
Alternative splicing (AS) is a key mechanism that can generate unusual pre-mRNA splicing isoforms. Many of these isoforms are associated with human diseases because they are missing particular exons or retain undesired intron sequences, which prevents the isoform from encoding a key functional part of the protein product. Importantly, splicing events provide a potential basis for developing new therapeutic opportunities (e.g., antisense oligonucleotides, spliceosome-mediated RNA trans-splicing, siRNAs, and splicing-switch oligonucleotides). In addition, the comprehensive interpretation of complex disease pathology, especially in Alzheimer’s disease (AD), requires multi-layer analyses that can account for complicated biological mechanisms. Multi-omics data, such as genomics and transcriptomics, and their integration can broaden the understanding of the mechanisms underlying complex disease. For instance, extensive transcriptional alterations occur in the course of AD progression, and AD progression can be affected by disruption of the balance of transcriptional isoform expression via AS. In fact, brain tissue has one of the most tissue-specific AS patterns, in which AS contributes to nervous system development, including neuronal migration and axon guidance.
Reduced costs of whole-genome sequencing (WGS) have encouraged researchers to apply it to complex medical diagnoses, which in turn has paved the way for the application of artificial intelligence in genomics. The field of genomics generates large data sets that can be used for the discovery and development of potential new therapeutics. Machine learning algorithms are therefore highly valuable for accelerating analysis and reducing the time it takes to get from information to insight, and recent research supports their usefulness for the analysis of complex omics data. Accordingly, our working hypothesis is that sequence-based features associated with splicing machinery will be informative variables for the functional annotation of genetic variants using machine learning techniques.
Our overarching hypothesis is that analysis of alternative splicing can improve our detection power for identifying and functionally annotating various types of genetic variants without constraining frequency (somatic and rare germline variants), functionality (synonymous variants), or location (intronic variants); a greater number of variants will be captured, and more of them are potentially functional. This is a molecular mechanism-driven approach that utilizes expression data, which will be useful for summing genetic variants within a biologically informed unit. To study AD, we will utilize WGS, RNA-Seq, and clinical information from the Synapse database (www.synapse.org). Data were generated from three independent cohorts in the Accelerating Medicines Partnership-Alzheimer’s Disease (AMP-AD) project: Mayo Clinic, ROS and MAP (ROSMAP), and the Mount Sinai Brain Bank (MSBB). The data include seven brain regions: cerebellum, temporal cortex, dorsolateral prefrontal cortex, frontal pole, inferior frontal gyrus, parahippocampal gyrus, and superior temporal gyrus. With these data, our aims will be:
Aim 1. Develop a machine learning model to predict genetic variants based on sequence-driven splicing features.
Aim 2. Evaluate the clinical value of the predicted genetic variants using AD phenotypic data.
The proposed comprehensive and translational study, which targets alternative splicing as a strategy for integrating multi-omics and AD-related endophenotype data, is highly innovative in concept and approach; it has the potential to enable us to gain deeper mechanistic insight into the molecular mechanisms of AD and ultimately to help identify new therapeutic targets and diagnostic/biomarker strategies.

(Source: main text, 1. Overview of the R&D project, p. 4)

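Contents

COVER ... 1
Summary of research results ... 2
Contents ... 3
1. Overview of the R&D project ... 4
2. Conduct and content of the R&D project ... 4
3. Results of the R&D project and level of goal achievement ... 7
1) Qualitative R&D outcomes (R&D results) ... 7
2) Detailed quantitative R&D outcomes ... 8
3) Level of goal achievement ... 9
4) Analysis of causes where goals were not met ... 9
4. Contribution of the R&D outcomes to related fields (significance of the results) ... 10
5. Plans for managing and using the R&D outcomes ... 10
6. References ... 11
[Appendix 1] Detailed quantitative R&D outcomes ... 15
[Appendix 2] Principal investigator’s representative research achievements and supporting documents (summaries and copies) ... 16
End of Page ... 28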

Tables/Figures (4)

Table: Aggregation of low-frequency variants based on splicing machinery in the TREM2 gene
Table: XGBoost feature importance using Shapley Additive Explanations: A) AD cases; B) CN cases
Table: Evaluation results of prediction models on the ROSMAP dataset using 5-fold cross-validation, in the case of K-means clustering without oversampling
Table: Model comparisons using ROC curves on ROSMAP data: A) AD cases; B) CN cases. The models are trained using 70% of the data, and the performance values are computed using 30% as test data.
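
The last caption describes a 70%/30% train/test split scored with ROC curves. As a minimal sketch of that protocol, the code below draws a shuffled 70/30 index split and computes the area under the ROC curve from its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative). The function names and the example labels and scores are assumptions for illustration, not values from the report.

```python
import random

def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Return shuffled (train, test) index lists with the given train fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out labels (1 = case, 0 = control) and model scores.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly

train_idx, test_idx = train_test_split(10)
```

The pairwise form is quadratic in the number of samples, which is fine for a small test set; for large cohorts a sort-based implementation would be the usual choice.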
