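To determine how well environmental data satisfy the normality assumption required by various statistical estimation procedures, goodness-of-fit tests for normality were performed with 18 methods on 3,000 data sets of the KOEM (Korea Marine Environment Management Corporation) harbor environment monitoring data (50 observation stations, 30 water quality parameters distinguished by surface and bottom layers, and 2 seasons, winter and summer), and the test methods were compared and evaluated. In addition, the Shapiro-Wilk method was selected as the reference test for evaluating the effects of data transformation and outliers. Using the selected test, the degree of normality rejection before and after the Box-Cox transformation, a representative normalizing transformation, and the degree of normality rejection when Rosner's outlier diagnosis is applied were estimated and analyzed. For a single water quality parameter, the number of stations at which normality was rejected decreased sharply from 24-28 before the Box-Cox transformation to about 3-4 after it. When the data diagnosed as outliers were excluded, the number of rejections decreased from about 6-9 before the Box-Cox transformation to about 1 after it. Therefore, when statistical estimation is performed with coastal environmental data that do not follow a normal distribution, both an outlier test and the Box-Cox transformation should be applied.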
A normality test (hereafter NT) is strongly recommended before statistical estimation, because the normality assumption on the data is basic and essential. NT was carried out using the KOEM harbor water quality monitoring data, which comprise a total of 3,000 data sets (50 stations, 30 water quality parameters including surface and bottom layers, and two seasons, summer and winter). A comparative analysis of normality was carried out using a total of 18 methods supported by R packages. In addition, the Shapiro-Wilk test was selected as the reference method in this study for a detailed analysis of the effects of data transformation and outliers. The numbers of normality assumption rejections (NAR) were estimated and compared before and after applying the Box-Cox (BC) transformation, with and without Rosner's outlier test. Without outlier exclusion, the NAR per water quality parameter decreased from 24-28 stations before the BC transformation to 3-4 stations after it. With outlier exclusion, the NAR decreased from 6-9 before the BC transformation to about one after it. Thus, for coastal water quality monitoring data that do not come from a normal distribution, applying the Box-Cox transformation after an outlier test is highly recommended for suitable statistical estimation and inference.
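The rejection count before and after a Box-Cox transformation can be sketched in Python with SciPy. This is a minimal illustration under stated assumptions, not the authors' R workflow: the station data below are simulated log-normal values standing in for the KOEM data, the significance level of 0.05 is assumed, and the helper names `rejects_normality` and `count_rejections` are hypothetical. `scipy.stats.boxcox` estimates λ by maximum likelihood when no λ is supplied, and it requires strictly positive data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level

def rejects_normality(x, alpha=ALPHA):
    """True if the Shapiro-Wilk test rejects normality at level alpha."""
    _, p = stats.shapiro(x)
    return bool(p < alpha)

def count_rejections(stations, alpha=ALPHA):
    """Count stations rejecting normality before and after a Box-Cox transform.

    `stations` is a list of 1-D arrays of strictly positive values, one per
    station, for a single water quality parameter and season (hypothetical layout).
    """
    before = after = 0
    for x in stations:
        before += int(rejects_normality(x, alpha))
        y, _lmbda = stats.boxcox(x)  # lambda fitted by maximum likelihood, per station
        after += int(rejects_normality(y, alpha))
    return before, after

# Hypothetical data: 50 stations, 30 right-skewed (log-normal) values each.
rng = np.random.default_rng(0)
stations = [rng.lognormal(mean=0.0, sigma=1.0, size=30) for _ in range(50)]
before, after = count_rejections(stations)
print(f"NAR before Box-Cox: {before}/50, after: {after}/50")
```

Here each station receives its own λ; the abstract does not state how λ was fitted in the study. With real monitoring data, non-positive values (for example, concentrations reported as zero) would have to be handled before `boxcox` can be applied.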
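Rosner's outlier test is the generalized extreme studentized deviate (ESD) procedure. Up to k of the most extreme values are removed one at a time, each statistic R_i = max|x - x̄|/s is compared with a critical value λ_i derived from the t distribution, and the number of outliers is the largest i with R_i > λ_i. The sketch below implements that procedure directly with NumPy and SciPy. The function name `generalized_esd`, the upper bound `max_outliers`, α = 0.05, and the demo data are illustrative assumptions. In R, the EnvStats package (Millard, 2013) provides this test as `rosnerTest()`.

```python
import numpy as np
from scipy import stats

def generalized_esd(x, max_outliers, alpha=0.05):
    """Return indices of values flagged as outliers by the generalized ESD test."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 0 < max_outliers < n - 2:
        raise ValueError("max_outliers must be between 1 and n - 3")
    remaining = np.arange(n)  # indices still in the reduced sample
    removed = []              # most extreme value at each step, in removal order
    n_outliers = 0
    for i in range(1, max_outliers + 1):
        sub = x[remaining]
        dev = np.abs(sub - sub.mean())
        j = int(np.argmax(dev))
        r_i = dev[j] / sub.std(ddof=1)  # R_i on the reduced sample
        # lambda_i = (n-i) t_{p,n-i-1} / sqrt((n-i-1+t^2)(n-i+1)), p = 1 - alpha/(2(n-i+1))
        p = 1.0 - alpha / (2.0 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam_i = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        removed.append(int(remaining[j]))
        remaining = np.delete(remaining, j)
        if r_i > lam_i:
            n_outliers = i  # keep the largest i with R_i > lambda_i
    return removed[:n_outliers]

# Synthetic example: 40 evenly spaced normal quantiles plus two gross errors.
base = stats.norm.ppf((np.arange(1, 41) - 0.5) / 40)
x = np.concatenate([base, [10.0, -9.0]])
idx = generalized_esd(x, max_outliers=5)
clean = np.delete(x, idx)
print("flagged:", sorted(x[idx].tolist()))
print("Shapiro-Wilk p, raw:", stats.shapiro(x)[1])
print("Shapiro-Wilk p, outliers removed:", stats.shapiro(clean)[1])
```

Testing normality on `clean` rather than `x` corresponds to the outlier-exclusion condition described in the abstract; for positive-valued water quality data, the Box-Cox step would follow on the reduced sample.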
Barnett, V. and Lewis, T. (1994). Outliers in statistical data, John Wiley & Sons.
Cho, H.Y., Lee, K.S. and Ahn, S.M. (2016). Impact of outliers on the statistical measures of the environmental monitoring data in Busan coastal sea, Note. Ocean and Polar Research, 38(2), 149-159.
Frosini, B.V. (1987). On the distribution and power of a goodness-of-fit statistic with parametric and nonparametric applications. In "Goodness-of-Fit" (edited by Revesz, P., Sarkadi, K. and Sen, P.K.), 133-154.
Gavrilov, I. and Pusev, R. (2014). normtest: Tests for Normality. R package version 1.1. https://CRAN.R-project.org/package=normtest.
Geary, R.C. (1935). The ratio of the mean deviation to the standard deviation as a test of normality. Biometrika, 27, 310-332.
Gross, J. and Ligges, U. (2015). nortest: Tests for Normality. R package version 1.0-4. https://CRAN.R-project.org/package=nortest.
Hegazy, Y.A.S. and Green, J.R. (1975). Some new goodness-of-fit tests using order statistics. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24, 299-308.
Jarque, C.M. and Bera, A.K. (1987). A test for normality of observations and regression residuals. International Statistical Review, 55, 163-172.
Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York. ISBN 978-1-4614-8455-4, https://www.springer.com.
Ministry of Oceans and Fisheries (2012). Marine Environment Information (System) Portal (2021). https://www.meis.go.kr [accessed 2021.02.26.].
Pohlert, T. (2020). ppcc: Probability Plot Correlation Coefficient Test. R package version 1.2. https://CRAN.R-project.org/package=ppcc.
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Razali, N.M. and Wah, Y.B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.
Royston, P. (1993). A pocket-calculator algorithm for the Shapiro-Francia test for non-normality: an application to medicine. Statistics in Medicine, 12, 181-184.
Royston, P. (1995). Remark AS R94: A remark on Algorithm AS 181: The W test for normality. Applied Statistics, 44, 547-551. doi:10.2307/2986146.
Shapiro, S.S., Wilk, M.B. and Chen, H.J. (1968). A comparative study of various tests for normality. Journal of the American Statistical Association, 63, 1343-1372.
Stephens, M.A. (1986). Tests based on EDF statistics. In Goodness-of-Fit Techniques (edited by D'Agostino, R.B. and Stephens, M.A.). Marcel Dekker, New York.
Thode, Jr., H.C. (2002). Testing for Normality. Marcel Dekker, New York.
Urzua, C.M. (1996). On the correct use of omnibus tests for normality. Economics Letters, 53, 247-251.
Weisberg, S. and Bingham, C. (1975). An approximate analysis of variance test for non-normality suitable for machine calculation. Technometrics, 17, 133-134.