[Paper] Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake

이다솜; 이은지; 조성일; 최태련

doi:10.5351/kjas.2020.33.1.025


The Korean Journal of Applied Statistics (응용통계연구), v.33 no.1, 2020, pp.25-46

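이다솜, 이은지 (Department of Statistics, Korea University), 조성일 (Department of Statistics (Institute of Applied Statistics), Jeonbuk National University), 최태련 (Department of Statistics, Korea University)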

Abstract (Korean)

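This paper examines Bayesian ordinal probit semiparametric regression models based on the Bayesian spectral analysis regression (BSAR) methodology. Ordinal probit regression models ordered categorical data: it links the probability of each category to the covariates through the probit link, the inverse of the normal cumulative distribution function, and thereby models the probability of the response. The Bayesian probit regression model introduces a normally distributed latent variable, which makes the posterior distribution easier to derive, and the responses are categorized according to where the latent variable falls relative to the cut-off points. In this paper, we extend this latent variable approach and, building on the BSAR methodology, study Bayesian binary and ordinal probit semiparametric regression models that can incorporate shape constraints such as monotone increase or decrease. Through simulation studies, we compare the fit of the binary probit semiparametric regression model with that of other existing models, and we compare the fits of the ordinal probit semiparametric regression model under different shape constraints. In addition, using data from the first year of the seventh Korean National Health and Nutrition Examination Survey (KNHANES, 2016), we apply the binary and ordinal probit semiparametric regression models considered here to an empirical analysis of the relationship between smoking behavior and coffee intake.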
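The latent variable mechanism described in the abstract can be illustrated with a minimal Python sketch. It assumes a standard normal error, a linear predictor value `eta`, and a list of increasing cut-off points; the function names (`norm_cdf`, `category_probs`, `categorize`) are hypothetical and are not taken from the paper or from any package.

```python
from math import erf, sqrt, inf


def norm_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), with explicit handling of +/- infinity."""
    if x == inf:
        return 1.0
    if x == -inf:
        return 0.0
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def category_probs(eta: float, cutpoints: list[float]) -> list[float]:
    """P(y = k) = Phi(c_k - eta) - Phi(c_{k-1} - eta), with c_0 = -inf and c_K = +inf."""
    bounds = [-inf, *cutpoints, inf]
    return [norm_cdf(hi - eta) - norm_cdf(lo - eta)
            for lo, hi in zip(bounds, bounds[1:])]


def categorize(z: float, cutpoints: list[float]) -> int:
    """Map a latent value z to its 0-based ordinal category (count of cut-offs below z)."""
    return sum(z > c for c in cutpoints)


if __name__ == "__main__":
    cuts = [0.0, 1.0]                    # illustrative cut-off points (assumed values)
    print(category_probs(0.3, cuts))     # three probabilities that sum to one
    print(categorize(0.5, cuts))         # latent value between the two cut-offs
```

With a single cut-off at zero the same functions reduce to the binary probit case.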

Abstract (English)

This paper presents ordinal probit semiparametric regression models based on the Bayesian spectral analysis regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by linking the probability of falling into each category to a combination of the available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method, which is based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, so that ordinal responses can be modeled and outcomes predicted more flexibly. We illustrate the proposed methods with simulation studies that compare them with existing methods, and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016 that investigates the nonparametric relationship between smoking behavior and coffee intake.
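As a rough illustration of a spectral (cosine basis) representation with an optional shape constraint, the following Python sketch expands a function on [0, 1] in cosine basis functions and builds a monotone increasing function by integrating the square of such an expansion. Everything here is an assumption made for illustration: the basis scaling, the geometrically decaying prior scale (`tau`, `gamma`), and the squared-integrand construction are in the spirit of shape-restricted Gaussian process models but are not taken from the paper, and the parameterisation used by the authors or by bsamGP may differ.

```python
import math
import random


def cosine_basis(j: int, x: float) -> float:
    """j-th cosine basis function on [0, 1]; j = 0 is the constant function."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)


def draw_coefficients(n_basis: int, tau: float = 1.0, gamma: float = 0.5,
                      seed: int = 0) -> list[float]:
    """Draw spectral coefficients with a decaying scale (assumed smoothing prior)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, tau * math.exp(-0.5 * gamma * j))
            for j in range(n_basis + 1)]


def unconstrained(theta: list[float], x: float) -> float:
    """Unconstrained function: Z(x) = sum_j theta_j * phi_j(x)."""
    return sum(t * cosine_basis(j, x) for j, t in enumerate(theta))


def monotone_increasing(theta: list[float], grid: list[float]) -> list[float]:
    """Cumulative trapezoid integral of Z(s)^2 over the grid, starting at 0.

    Each increment is non-negative, so the result is non-decreasing along the grid.
    """
    out = [0.0]
    prev = unconstrained(theta, grid[0]) ** 2
    for a, b in zip(grid, grid[1:]):
        cur = unconstrained(theta, b) ** 2
        out.append(out[-1] + 0.5 * (b - a) * (prev + cur))
        prev = cur
    return out


if __name__ == "__main__":
    theta = draw_coefficients(n_basis=5)
    grid = [i / 50 for i in range(51)]
    f_mono = monotone_increasing(theta, grid)
    print(f_mono[0], f_mono[-1])
```

A monotone decreasing version would follow by negating the result; other shapes would need a different construction.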

Tables/Figures (15)

Table 3.1. Numerical summary of model comparison for three binary probit regression models
Table 3.2. Numerical summary of LPML, WAIC, and RMSE for BSAR with/without shape restrictions
Table 3.3. Numerical summary of model assessment for JAGAM and BSAR with/without shape restrictions under functions (2) and (3)
Figure 3.1. Receiver operating characteristic curves of three binary probit regression models under function (3).
Figure 3.2. Simulation data (x, z) with n = 500 for ordinal probit regression model with shape restrictions.
Figure 3.3. Model fitting of simulation data with function (1) based on ordinal probit Bayesian spectral analysis regression.
Figure 3.4. Model fitting of simulation data with function (2) based on ordinal probit Bayesian spectral analysis regression.
Figure 3.5. Model fitting of simulation data with function (3) based on ordinal probit Bayesian spectral analysis regression.
Table 3.4. Summary of LPML and WAIC for ordinal probit BSAR with/without shape restrictions
Table 4.1. Frequency table of coffee intake and smoking behavior
Table 4.2. Frequency table of smoking behavior with ordinal covariates
Figure 4.1. Model fitting with binary probit BSAR with shape restrictions: coffee intake and smoking behavior with gender.
Table 4.3. Posterior estimates of β with binary probit BSAR (monotone increasing)
Figure 4.2. Model fitting with ordinal probit BSAR with shape restrictions: coffee intake and smoking behavior with gender.
Table 4.4. Posterior estimates of β with ordinal probit BSAR (monotone increasing)

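Q&A

Keyword	Question	Answer extracted from the paper
	What is BART?	Among nonparametric Bayesian methods for nominal variables, the Bayesian additive regression tree (BART) of Chipman et al. (2010) is a representative method. BART uses a regularization prior and expresses the probability of the response variable, through the probit link, as a sum of m weak trees; it can be fitted with the pbart function of the R package BART. Besides the nonparametric approach via BART, Wood (2017) made it possible to fit GAM models in a Bayesian way through the jagam function of the mgcv package.
	What does classifying a binary response through a latent variable make easier?	Instead of applying the probit link directly to the binomial probability, Albert and Chib (1993) adopted an approach that classifies the binary response according to the value of a latent variable explained by a linear regression model. Because of the conjugacy between the prior and posterior distributions, this approach makes the posterior distribution easy to derive, and it became the foundation of Bayesian categorical data analysis. Among nonparametric Bayesian methods for nominal variables, the Bayesian additive regression tree (BART) of Chipman et al. (2010) is a representative method.
	Why is the BSAR method judged to be very useful?	A spline regression model for ordinal categories can be fitted with the ocat family of gam in Wood (2017), but because only the logit link is currently available, it has some practical limitations. In contrast, although the BSAR method is a somewhat complex fitting approach based on Gaussian processes and a cosine basis, it can incorporate a variety of shape constraints, not only monotone increasing and decreasing but also concave, convex, U-shaped, and S-shaped (Lenk and Choi, 2017), and the R package bsamGP (Jo et al., 2019) provides a variety of model fits. It is therefore judged to be very useful for real data analyses such as the one in this paper.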
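The latent variable approach of Albert and Chib (1993) mentioned in the answers above can be sketched as a small Gibbs sampler for a binary probit model. This is a simplified illustration under stated assumptions, not the sampler used in the paper: it uses a single covariate with no intercept, a flat prior on the coefficient, and simulated data; the function names (`sample_truncated_latent`, `gibbs_probit`, `simulate`) are hypothetical. Given the latent values the coefficient has a normal full conditional, which is the conjugacy the answer refers to.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_truncated_latent(mu: float, y: int, rng: random.Random) -> float:
    """Draw z ~ N(mu, 1) restricted to z > 0 if y == 1, else z <= 0 (inverse-CDF method)."""
    p0 = STD_NORMAL.cdf(-mu)                       # P(z <= 0)
    u = rng.uniform(p0, 1.0) if y == 1 else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)            # numerical guard for inv_cdf
    return mu + STD_NORMAL.inv_cdf(u)


def gibbs_probit(x, y, n_iter=800, burn_in=200, seed=1):
    """Alternate z | beta, y (truncated normal) and beta | z (normal, flat prior)."""
    rng = random.Random(seed)
    sxx = sum(xi * xi for xi in x)
    beta, draws = 0.0, []
    for it in range(n_iter):
        z = [sample_truncated_latent(xi * beta, yi, rng) for xi, yi in zip(x, y)]
        mean = sum(xi * zi for xi, zi in zip(x, z)) / sxx
        beta = rng.gauss(mean, (1.0 / sxx) ** 0.5)
        if it >= burn_in:
            draws.append(beta)
    return draws


def simulate(n=200, beta=1.0, seed=0):
    """Simulated data (assumption): y = 1 when x * beta + standard normal noise > 0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if xi * beta + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    draws = gibbs_probit(x, y)
    print(sum(draws) / len(draws))   # posterior mean estimate of beta
```

An ordinal version would additionally restrict each latent value to the interval between its two neighbouring cut-off points and update the cut-off points themselves.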

References (32)

Agresti, A. (2013). Categorical Data Analysis (3rd ed), John Wiley & Sons, NJ.
Ahn, H. J., Gwak, J. I., Yun, S. J., Choi, H. J., Nam, J. W., and Shin, J. S. (2017). The influence of coffee consumption for smoking behavior, Korean Journal of Family Practice, 7, 218-222.
Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association, 88, 669-679.
Carmody, T. P., Brischetto, C. S., Matarazzo, J. D., O'Donnell, R. P., and Connor, W. E. (1985). Cooccurrent use of cigarettes, alcohol, and coffee in healthy, community-living men and women, Health Psychology, 4, 323.
Chen, M. H. and Dey, D. K. (2000). Bayesian analysis for correlated ordinal data models. In Generalized Linear Models: A Bayesian Perspective (volume 5, pages 133-157), Dekker, New York.
Chipman, H. A., George, E. I., and McCulloch, R. E. (2010). BART: Bayesian additive regression trees, The Annals of Applied Statistics, 4, 266-298.
Cho, K. S. (2013). Prevalence of hardcore smoking and its associated factors in Korea, Health and Social Welfare Review, 33, 603-628.
Clark, A., Georgellis, Y., and Sanfey, P. (2001). Scarring: The psychological impact of past unemployment, Economica, 68, 221-241.
Cowles, M. K., Carlin, B. P., and Connett, J. E. (1996). Bayesian tobit modeling of longitudinal ordinal clinical trial compliance data with nonignorable missingness, Journal of the American Statistical Association, 91, 86-98.
Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection, Journal of the American Statistical Association, 74, 153-160.
Harris, M. N. and Zhao, X. (2007). A zero-inflated ordered probit model, with an application to modelling tobacco consumption, Journal of Econometrics, 141, 1073-1099.
Hasegawa, H. (2010). Analyzing tourists' satisfaction: a multivariate ordered probit approach, Tourism Management, 31, 86-97.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models, Monographs on Statistics and Applied Probability (Vol 43), Chapman and Hall, London.
Jara, A., Hanson, T. E., and Lesaffre, E. (2009). Robustifying generalized linear mixed models using a new class of mixtures of multivariate Polya trees, Journal of Computational and Graphical Statistics, 18, 838-860.
Jo, S., Choi, T., Park, B., and Lenk, P. (2019). bsamGP: An R package for Bayesian spectral analysis models using Gaussian process priors, Journal of Statistical Software, 90, 1-41.
Jung, K. W., Won, Y. J., Kong, H. J., Lee, E. S., and Community of Population-Based Regional Cancer Registries (2018). Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2015, Cancer Research and Treatment: Official Journal of Korean Cancer Association, 50, 303-316.
Kang, E., Lee, J. A., and Cho, H. J. (2017). Characteristics of hardcore smokers in South Korea from 2007 to 2013, BMC Public Health, 17, 521.
Kim, M. (2015). Semiparametric approach to logistic model with random intercept, Korean Journal of Applied Statistics, 28, 1121-1131.
Kockelman, K. M. and Kweon, Y. J. (2002). Driver injury severity: an application of ordered probit models, Accident Analysis & Prevention, 34, 313-321.
Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian Econometric Methods (Econometric Exercises), Cambridge University Press, Cambridge.
Korean Centers for Disease Control and Prevention (2016). The Seventh Korea National Health and Nutrition Examination Survey (KNHANES VII-1).
Lee, J. H. and Heo, T. Y. (2014). A study of effect on the smoking status using multilevel logistic model, Korean Journal of Applied Statistics, 27, 89-102.
Lenk, P. J. and Choi, T. (2017). Bayesian analysis of shape-restricted functions using Gaussian process priors, Statistica Sinica, 27, 43-69.
Moon, S. (2016). Types of smoking statuses and associated factors among Korean wageworkers, Journal of Korean Public Health Nursing, 30, 495-511.
Nelder, J. A. and Wedderburn, R. W. (1972). Generalized linear models, Journal of the Royal Statistical Society. Series A (General), 135, 370-384.
Park, J. C., Kim, M. H., and Lee, J. Y. (2018). Nomogram comparison conducted by logistic regression and naive Bayesian classifier using type 2 diabetes mellitus (T2D), Korean Journal of Applied Statistics, 31, 573-585.
Seok, H. E., Bang, H. J., and Kim, S. Y. (2017). Bayesian analysis of KBSID-III adaptive behavior data using a zero-inflated ordered probit model, Korean Journal of Psychology: General, 36, 215-239.
Sha, N. and Dechi, B. O. (2019). A Bayes inference for ordinal response with latent variable approach, Stats, 2, 321-331.
Tan, Y. V. and Roy, J. (2019). Bayesian additive regression trees and the general BART model, Statistics in Medicine, 38, 5048-5069.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory, Journal of Machine Learning Research, 11, 3571-3594.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R (2nd ed), CRC Press, Florida.
Xie, Y., Zhang, Y., and Liang, F. (2009). Crash injury severity analysis using Bayesian ordered probit models, Journal of Transportation Engineering, 135, 18-25.
