An Ensemble Machine Learning from Spatio-temporal Kriging for Imputation of PM10 in Seoul, Korea

Journal of the Korean Geographical Society, v.53 no.3, 2018, pp. 427-444  
Insang Song (송인상), Chang-Ro Lee (이창로), Key-Ho Park (박기호)

Missing values in spatio-temporal data can introduce defects that contaminate the results of spatio-temporal analyses. However, imputation methods that account for the inherent spatio-temporal dependence of such data have received little attention. We propose an imputation algorithm based on ensemble spatio-temporal kriging for particulate matter (PM10) measurements collected from 2010 to 2014 at 54 monitoring stations near the metropolitan city of Seoul, Korea. We review previous studies on imputation methods for spatio-temporal data and then explain why our approach is needed. Our approach applies resampling techniques to limited, short-term spatio-temporal data and aims to improve imputation accuracy by taking the ensemble of the imputation results from the resampled sub-datasets. To examine this improvement, we run experiments under different conditions, varying the number of resamples, the neighborhood ratio, and the ratio of artificially generated missing values. Results show that our approach outperforms both spatio-temporal kriging on the whole dataset (by 1.32% to 11.36%) and a linear regression-based imputation algorithm (by 52% on average). These results indicate that a resampling-based learning approach remains effective for spatio-temporal kriging in a data-limited setting, as does an algorithm that accounts for the inherent dependence among the data. However, the considerably lower accuracy relative to the machine learning-based algorithm indicates that the effect of spatio-temporal dependence in such algorithms requires further examination.
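The abstract does not specify the kriging model, the resampling scheme, or how the ensemble is combined. The following is a minimal sketch of the general idea only, not the authors' implementation. It assumes a separable exponential space-time covariance with fixed, hypothetical parameters (`range_s`, `range_t`, `nugget`) instead of a fitted spatio-temporal variogram, random subsampling without replacement controlled by a hypothetical `subsample_ratio` (not necessarily the paper's neighborhood ratio), and a plain mean as the ensemble rule. The demo scores imputations on artificially removed values of a synthetic field, not on the Seoul PM10 data.

```python
import numpy as np


def separable_cov(d_space, d_time, sill=1.0, range_s=10.0, range_t=3.0):
    """Hypothetical separable covariance C(h, u) = sill * exp(-h/range_s) * exp(-u/range_t)."""
    return sill * np.exp(-d_space / range_s) * np.exp(-d_time / range_t)


def st_ordinary_kriging(coords, times, values, target_xy, target_t, nugget=1e-6, **cov_kw):
    """Ordinary kriging prediction at one space-time point from observed data."""
    n = len(values)
    ds = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    dt = np.abs(times[:, None] - times[None, :])
    # Kriging system; the extra row/column is a Lagrange multiplier forcing sum(weights) == 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = separable_cov(ds, dt, **cov_kw) + nugget * np.eye(n)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = separable_cov(np.linalg.norm(coords - target_xy, axis=1),
                          np.abs(times - target_t), **cov_kw)
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ values)


def ensemble_st_kriging(coords, times, values, target_xy, target_t,
                        n_resamples=20, subsample_ratio=0.7, seed=0, **cov_kw):
    """Average kriging predictions over random subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = len(values)
    m = max(2, int(round(subsample_ratio * n)))
    preds = []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        preds.append(st_ordinary_kriging(coords[idx], times[idx], values[idx],
                                         target_xy, target_t, **cov_kw))
    return float(np.mean(preds))


def rmse(pred, truth):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data only: 20 stations x 8 time steps of a smooth field plus noise.
    stations = rng.uniform(0, 30, size=(20, 2))
    coords = np.repeat(stations, 8, axis=0)
    times = np.tile(np.arange(8, dtype=float), 20)
    truth = 40 + 0.5 * coords[:, 0] + 3 * np.sin(times) + rng.normal(0, 1, len(times))
    # Artificially remove 10% of the values, then impute them from the rest.
    missing = rng.choice(len(truth), size=len(truth) // 10, replace=False)
    observed = np.setdiff1d(np.arange(len(truth)), missing)
    single = [st_ordinary_kriging(coords[observed], times[observed], truth[observed],
                                  coords[i], times[i]) for i in missing]
    ensemble = [ensemble_st_kriging(coords[observed], times[observed], truth[observed],
                                    coords[i], times[i]) for i in missing]
    print("RMSE, whole-data kriging:", rmse(single, truth[missing]))
    print("RMSE, ensemble kriging:  ", rmse(ensemble, truth[missing]))
```

Subsampling without replacement is used here because a bootstrap sample with replacement repeats observations, which makes the kriging matrix singular unless the nugget is large. Which variant wins on the synthetic demo depends on the seed and parameters; the sketch makes no claim about the paper's reported improvements.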

