[Paper] Analysis of Missing Data Using an Empirical Bayesian Method

Yoon, Y. H. (윤용화); Choi, B. (최보승)

doi:10.5351/kjas.2014.27.6.1003

Analysis of Missing Data Using an Empirical Bayesian Method (경험적 베이지안 방법을 이용한 결측자료 연구)

The Korean Journal of Applied Statistics (응용통계연구), v.27 no.6, 2014, pp.1003-1016

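Abstract (translated from the Korean)

When analyzing data collected through surveys, appropriate imputation of missing values is a very important step toward obtaining more accurate results. This study deals with model-based imputation of missing data and with model estimation. In particular, a Bayesian method was applied to resolve the boundary solution problem that can arise when maximum likelihood estimation is used. Predictions were then made from the fitted results, and their accuracy was compared across missing-data mechanisms in order to address the choice of missingness model. Prediction accuracy was measured with the MWPE (modified within precinct error) proposed by Bautista et al. (2007). The methods presented in this study were applied to data from the exit poll conducted on the day of the 18th presidential election, held in 2012. The analysis showed that results under the missing at random assumption had higher prediction accuracy than results under the missing not at random assumption.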

Abstract (English)

Proper imputation of missing data is an important step toward obtaining accurate results when analyzing survey data. This paper deals with both a model-based imputation method and a model estimation method. We used a Bayesian method to solve the boundary solution problem that can arise when maximum likelihood estimation is applied. We also address the selection of a missing-mechanism model by comparing the accuracy of forecasts made under each model. We used the MWPE (modified within precinct error) of Bautista et al. (2007) to measure prediction accuracy. We applied the proposed ML and Bayesian methods to the Korean presidential election exit poll data of 2012. In this analysis, predictions under the missing at random mechanism were more accurate than predictions under the missing not at random mechanism.

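Questions and answers

Keyword	Question	Answer extracted from the paper
	What is the boundary solution problem?	When maximum likelihood estimation is used to estimate parameters for missing data under an assumed nonignorable missing mechanism, a boundary solution problem can arise. This is the problem in which the parameter estimate falls on the boundary of the parameter space and attains a local maximum there (Baker and Laird, 1988). When a boundary solution occurs, the estimates become unstable and the variance diverges (Park and Brown, 1994; Choi et al., 2009).
	What happens when a boundary solution occurs?	This is the problem in which the parameter estimate falls on the boundary of the parameter space and attains a local maximum there (Baker and Laird, 1988). When a boundary solution occurs, the estimates become unstable and the variance diverges (Park and Brown, 1994; Choi et al., 2009). In addition, the estimated missing counts for count data take the value 0 or take one-sidedly large values. Baker and Laird (1988) gave the conditions under which a boundary solution occurs for simple 2 × 2 contingency tables.
	What is the basic purpose of conducting opinion polls?	The basic purpose of conducting opinion polls is to understand and predict what the people being surveyed think. In pre-election surveys conducted ahead of an election in particular, prediction is the more important purpose.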
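
The boundary solution behaviour described in the answers above can be illustrated with a small sketch. It is not the paper's model or data: the 2 × 2 table, the assumption that response depends only on the outcome Y, the function name `em_incomplete_2x2`, and the fixed Beta(a, a) prior are all hypothetical choices made for illustration (the paper's empirical Bayesian prior is not reproduced here). With `a = 1` the EM iteration is plain maximum likelihood and, for these made-up counts, the response probability for one category drifts to 1, so the estimated missing counts for that category go to 0. With `a = 2` the posterior mode stays inside the parameter space.

```python
def em_incomplete_2x2(y, u, a=1.0, n_iter=500):
    """EM for a 2x2 table (X by Y) where Y is unobserved for u[i] units in row i.

    Assumed model (illustrative only): P(respond | Y = j) = rho[j], so the
    missingness is nonignorable and depends on Y alone.
    `a` is the parameter of a hypothetical Beta(a, a) prior on each rho[j]:
    a = 1 gives maximum likelihood, a > 1 gives a posterior mode.
    """
    # Start by splitting each row's missing count evenly over the two Y categories.
    m = [[u[i] / 2.0, u[i] / 2.0] for i in range(2)]
    rho = [0.5, 0.5]
    for _ in range(n_iter):
        # M-step: cell probabilities P(Y = j | X = i) from the completed table.
        pi = [[(y[i][j] + m[i][j]) / (sum(y[i]) + u[i]) for j in range(2)]
              for i in range(2)]
        # M-step: response probabilities (mode under the Beta(a, a) prior).
        rho = []
        for j in range(2):
            observed = y[0][j] + y[1][j]
            missing = m[0][j] + m[1][j]
            rho.append((observed + a - 1.0) / (observed + missing + 2.0 * a - 2.0))
        # E-step: allocate each row's missing count in proportion to
        # P(Y = j | X = i) * P(nonresponse | Y = j).
        for i in range(2):
            w = [pi[i][j] * (1.0 - rho[j]) for j in range(2)]
            total = w[0] + w[1]
            m[i] = [u[i] * w[j] / total for j in range(2)]
    return rho, m


# Made-up counts: y[i][j] are respondents with X = i, Y = j; u[i] are nonrespondents.
y = [[100.0, 50.0], [50.0, 100.0]]
u = [60.0, 10.0]

rho_ml, m_ml = em_incomplete_2x2(y, u, a=1.0)        # maximum likelihood
rho_bayes, m_bayes = em_incomplete_2x2(y, u, a=2.0)  # posterior mode, Beta(2, 2) prior

print("ML    rho:", rho_ml, "missing counts:", m_ml)
print("Bayes rho:", rho_bayes, "missing counts:", m_bayes)
```

The counts were chosen so that no nonnegative allocation of the missing units can match both rows exactly, which is the situation in which the ML estimate ends up on the boundary. Other priors, or a prior estimated from the data, would change the numbers but not the qualitative contrast.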

References (30)

Agresti, A. (2002). Categorical Data Analysis, Second edition, John Wiley & Sons Inc., New Jersey.
Baker, S. G. and Laird, N. M. (1988). Regression analysis for categorical variables with outcome subject to nonignorable nonresponse, Journal of the American Statistical Association, 83, 62-69.
Baker, S. G., Rosenberger, W. F. and DerSimonian, R. (1992). Closed-form estimates for missing counts in two-way contingency tables, Statistics in Medicine, 11, 643-657.
Bautista, R., Callegaro, M., Vera, J. A. and Abundis, F. (2007). Studying nonresponse in Mexican exit polls, International Journal of Public Opinion Research, 19, 492-503.
Chib, S. (1995). Marginal likelihood from the Gibbs output, Journal of the American Statistical Association, 90, 1313-1321.
Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output, Journal of the American Statistical Association, 96, 270-281.
Choi, B., Choi, J. W. and Park, Y. S. (2009). Bayesian methods for an incomplete two-way contingency table with application to the Ohio (Buckeye) State Polls, Survey Methodology, 35, 37-51.
Choi, B. and Kim, G. M. (2012). A model selection method using EM algorithm for missing data, Journal of the Korean Data Analysis Society, 14, 767-779.
Choi, B., Kim, D. Y., Kim, K. W. and Park, Y. S. (2008). Nonignorable nonresponse imputation and rotation group bias estimation on the rotation sample survey, The Korean Journal of Applied Statistics, 21, 361-375.
Choi, B., Park, Y. S. and Lee, D. H. (2007). Election forecasting using pre-election survey data with nonignorable nonresponse, Journal of the Korean Data Analysis Society, 9, 2321-2333.
Clarke, P. S. (2002). On boundary solutions and identifiability in categorical regression with non-ignorable non-response, Biometrical Journal, 44, 701-717.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39, 1-38.
Forster, J. J. and Smith, P. W. (1998). Model-based inference for categorical survey data subject to nonignorable non-response, Journal of the Royal Statistical Society, Series B, 60, 57-70.
Green, P. E. and Park, T. (2003). A Bayesian hierarchical model for categorical data with nonignorable nonresponse, Biometrics, 59, 886-896.
Ibrahim, J. G., Zhu, H. and Tang, N. (2008). Model selection criteria for missing-data problems using the EM algorithm, Journal of the American Statistical Association, 103, 1648-1658.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, Second edition, Wiley, New York.
Kim, S. Y. and Kwon, S. P. (2009). The effect of survey refusal and noncontact on nonresponse error: For economically active population survey, The Korean Journal of Applied Statistics, 22, 667-676.
Kim, Y. W. and Nam, S. J. (2009). Forming weighting adjustment cells for unit-nonresponse in sample surveys, Communications for Statistical Applications and Methods, 16, 103-113.
Kwak, J. and Choi, B. (2014). A comparison study for accuracy of exit poll based on nonresponse model, Journal of the Korean Data & Information Science Society, 25, 53-64.
Pak, G. D. and Shin, K. I. (2010). Non-response imputation for panel data, Communications for Statistical Applications and Methods, 17, 899-907.
Park, J. S., Kang, C., and Kim, K. K. (2013). A simulation study of imputation methods for transportation corporation's survey data, Journal of the Korean Data Analysis Society, 15, 1903-1912.
Park, T. and Brown, M. B. (1994). Models for categorical data with nonignorable nonresponse, Journal of the American Statistical Association, 89, 44-52.
Park, T. (1998). An approach to categorical data with nonignorable nonresponse, Biometrics, 54, 1579-1590.
Park, T. S. and Lee, S. Y. (1998). Analysis of categorical data with nonresponses, The Korean Journal of Applied Statistics, 11, 83-95.
Park, Y. S., Kim, K. H., and Choi, B. (2013). Dynamic Bayesian analysis for irregularly and incompletely observed contingency tables, Journal of the Korean Statistical Society, 42, 277-289.
Park, Y. S. and Choi, B. (2010). Bayesian analysis for incomplete multi-way contingency tables with nonignorable nonresponse, Journal of Applied Statistics, 37, 1439-1453.
Rubin, D. B., Stern, H. S. and Vehovar, V. (1995). Handling "Don't know" survey responses: The case of the Slovenian Plebiscite, Journal of the American Statistical Association, 90, 822-828.
Song, J. (2011). Selection of variables to form imputation classes in Hotdeck imputation, Journal of the Korean Data Analysis Society, 13, 1321-1329.
Song, J. (2014). A comparison of imputation methods for multiple response questions, Journal of the Korean Data Analysis Society, 16, 691-701.
Yoon, Y. H. and Choi, B. (2012). Model selection method for categorical data with non-response, Journal of the Korean Data & Information Science Society, 23, 627-641.
