[Thesis] Comparison of Analysis Models for Risk Factors Associated with Colonic Adenoma (대장 선종(大腸腺腫)과 관련이 있는 위험 인자들에 대한 분석모형 비교)

김수연 (Graduate School of Chung-Ang University, Department of Statistics, Statistics major; domestic master's thesis)

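Abstract (Korean version)

One of the important roles of medical (bio)statistics is to use statistical thinking and methods to find associations between medical outcomes, such as diseases or symptoms, and the factors that cause them. This study covered 830 subjects. The dependent variable of the dataset was discrete, taking the values colonic adenoma group and normal group. Among the independent variables, Sex and CaFH (family history of cancer) were discrete, while Age, BMI (body mass index), Tchol (total cholesterol), and CEA (carcinoembryonic antigen) were continuous. Using R (version 3.2.2), discriminant analysis, probit analysis, logistic regression analysis, and neural network analysis were applied to examine goodness of fit, the correct classification rate (accuracy) for each group, sensitivity, specificity, adenoma predictability, and normalcy predictability. Before linear discriminant analysis, kurtosis and skewness were computed to check the normality of the dataset, and Box's M test was performed to test whether the variance-covariance matrices of the groups defined by the dependent variable were equal. Because the multivariate normality assumption required for linear discriminant analysis was not satisfied, quadratic discriminant analysis was used instead; its accuracy, comparing the actual and predicted values for each group, was 71.33%. Unlike linear discriminant analysis, the probit and logistic regression models require neither normality of the data nor equal variances for the variables being compared; the accuracy of both the probit model and the logistic regression model on this dataset was 74.46%. Probit and logistic regression analyses were also run with two-way interaction terms of the independent variables added to the main effects, but in both models none of the interaction coefficients was statistically significant. Finally, neural network models were built with the number of hidden nodes varied from 2 to 14. The models with 6 to 8 hidden nodes, considered the appropriate range, had accuracies of 78.55% to 80.96%. Across the models, sensitivity ranged from 37.109% to 64.844%, specificity from 74.216% to 91.115%, adenoma predictability from 52.866% to 72.477%, and normalcy predictability from 76.462% to 83.987%. Sensitivity is thought to be not very high because the given independent variables alone cannot explain all causes of colonic adenoma: genetic causes and social and environmental factors may also influence its occurrence. Specificity was above 90% in every model except the quadratic discriminant analysis model, so a high proportion of normal subjects without adenoma were classified as normal. In the between-group comparison, the adenoma group also differed significantly from the normal group on every independent variable at the 0.05 significance level, so the independent variables in this study are all judged to be variables that can influence the occurrence of colonic adenoma. Taking these results together, the neural network model with 8 hidden nodes is judged to be the most suitable classification and prediction model for the given dataset of 830 subjects. Medical data often fail to satisfy the normality assumption and often mix discrete and continuous variables, so the most important task in statistical analysis is to find a model that can fit the actual data in a way consistent with the medical evidence. In addition, adding other independent variables that affect the occurrence of colonic adenoma and increasing the sample size is expected to yield more accurate classification and prediction.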
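The five evaluation measures named in the abstract can all be read off a 2x2 table of actual versus predicted group. The following is a minimal sketch in Python (not the R code used in the thesis). The function name and dictionary keys are illustrative, and "adenoma predictability" and "normalcy predictability" are taken here to mean the positive and negative predictive values. The counts are an assumption: the abstract does not report them, and they were back-calculated so that the resulting percentages match the reported 71.33% and 74.46% accuracies and the ends of the reported ranges.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return percentages for a 2x2 table where 'positive' means adenoma.

    tp: adenoma subjects predicted as adenoma
    fn: adenoma subjects predicted as normal
    tn: normal subjects predicted as normal
    fp: normal subjects predicted as adenoma
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "adenoma_predictability": 100.0 * tp / (tp + fp),
        "normalcy_predictability": 100.0 * tn / (tn + fn),
    }


# ASSUMPTION: these counts are not given in the abstract. They are
# back-calculated so that each table sums to 830 subjects and reproduces
# percentages that the abstract reports.
ILLUSTRATIVE_COUNTS = {
    "consistent with 71.33% accuracy": dict(tp=166, fn=90, tn=426, fp=148),
    "consistent with 74.46% accuracy": dict(tp=95, fn=161, tn=523, fp=51),
}

for label, counts in ILLUSTRATIVE_COUNTS.items():
    print(label)
    for name, value in classification_metrics(**counts).items():
        print(f"  {name}: {value:.3f}%")
```

Both illustrative tables imply 256 adenoma and 574 normal subjects. That split is an inference from the reported percentages, not a figure stated in the abstract.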

Abstract (English version)

One of the important roles of medical (bio)statistics is to find associations between medical outcomes, such as diseases and symptoms, and the factors behind them, based on statistical thinking and methodologies. This study uses data on 830 participants. The dependent variable is discrete, indicating whether a participant belongs to the colonic adenoma group or the normal group without abnormal developments. Of the independent variables, Sex and CaFH are discrete, while Age, BMI, Tchol, and CEA are continuous. R was used to examine goodness of fit, accuracy, sensitivity, specificity, adenoma predictability, and normalcy predictability through discriminant analysis, probit analysis, logistic regression analysis, and artificial neural network analysis. Before discriminant analysis, the kurtosis and skewness of the data set were determined, followed by Box's M test to verify whether the variance-covariance matrices of the groups defined by the dependent variable were equal. However, the multivariate normality assumption required for linear discriminant analysis was not satisfied. Thus, a quadratic discriminant analysis was employed; its accuracy, based on each group's actual and predicted values, was 71.33%. Unlike linear discriminant analysis, the probit model and the logistic regression model do not assume normality of the data or homoscedasticity of the compared variables. The accuracy of both the probit model and the logistic regression model for the given data set was 74.46%. In addition, both the probit analysis and the logistic regression analysis were run with two-way interaction terms of the independent variables added to the main effects. However, none of the regression coefficients of the interaction terms in the two models was statistically significant. In the neural network model, examined last, the number of hidden nodes was varied from 2 to 14. With six to eight hidden nodes, the range considered suitable, the accuracy of the neural network model was between 78.55% and 80.96%. Across all the models, sensitivity ranged from 37.109% to 64.844%, specificity from 74.216% to 91.115%, adenoma predictability from 52.866% to 72.477%, and normalcy predictability from 76.462% to 83.987%. Sensitivity is not particularly high, probably because the given independent variables alone cannot explain all the causes of colonic adenomas. Factors outside these variables, including genetic causes and social and environmental factors, can influence the occurrence of colonic adenomas. With the exception of the quadratic discriminant analysis model, the models had a specificity of over 90%, meaning a high rate of correctly classifying healthy participants. In the comparison across groups, the adenomatous polyp group and the normal group also differed significantly on each of the independent variables at the 5% level of significance. Thus, this study's independent variables are all judged to be variables that can influence the occurrence of colonic adenomatous polyps. From the above results, the neural network model with eight hidden nodes is believed to be the most suitable model for classification and prediction on the given data set of 830 participants. Medical data often do not satisfy the assumption of a normal distribution, and the variables to be analyzed are often a mix of discrete and continuous types. Therefore, what matters most is finding a statistical analysis model that can fit the actual data in a way consistent with the medical evidence. In addition, if further independent variables that influence the occurrence of colonic adenomatous polyps are included and the sample size is increased, more accurate classification and prediction rates may be obtained.
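The identical accuracy reported for the probit and logistic regression models fits a general property of the two link functions: after rescaling, the standard normal CDF and the logistic CDF are nearly the same curve, so the two models often classify the same observations the same way. The following is a minimal sketch in Python (not the R code used in the thesis). The function names are illustrative, and the scaling constant 1.702 is a standard textbook approximation, not a value from the thesis.

```python
import math


def logistic_cdf(x: float) -> float:
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x: float) -> float:
    """Inverse probit link: the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# ASSUMPTION: 1.702 is the usual constant for matching the logistic curve
# to the normal CDF; it does not come from the thesis.
SCALE = 1.702


def max_link_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest absolute gap between the two curves on an evenly spaced grid."""
    gap = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(normal_cdf(x) - logistic_cdf(SCALE * x)))
    return gap


print(f"largest gap on the grid: {max_link_gap():.4f}")
```

Both curves cross 0.5 at a linear predictor of zero, so a 0.5 cutoff puts the decision boundary at the same place on each model's own linear predictor. This is offered only as a plausible reading of the matching 74.46% figure; the abstract itself does not give a reason.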

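Thesis information

Author	김수연
Degree-granting institution	Graduate School of Chung-Ang University
Degree type	Master's (domestic)
Department	Department of Statistics, Statistics major
Advisor	박상규
Year of publication	2016
Total pages	iii, 50 leaves
Language	kor
Full-text URL	http://www.riss.kr/link?id=T14021245&outLink=K
Source	Korea Education and Research Information Service (KERIS)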
