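[Paper] A study on developing an ensemble machine learning model that considers the characteristics of water quality data and on interpreting the model results using explainable artificial intelligence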

Park, Jungsu

doi:10.11001/jksww.2022.36.4.239

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence

上下水道學會誌 = Journal of Korean Society of Water and Wastewater, v.36 no.4, 2022, pp. 239-248

Park, Jungsu (Department of Civil and Environmental Engineering, Hanbat National University)

Abstract

The prediction of algal blooms is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of an algal bloom. In recent years, advanced machine learning algorithms have increasingly been used to predict algal blooms. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model that predicts Chl-a in a reservoir. Daily observations of water quality and climate data were used to train and test the model. In the first step of the study, the input data were clustered into two groups (a low-value group and a high-value group) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of these four water quality items, two XGB models were developed, each trained only on the data in one clustered group (Model 1). Their results were compared with the predictions of a single XGB model developed using the entire dataset before clustering (Model 2). Model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSR values of 0.503, 0.477, and 0.493, respectively, compared with an RSR of 0.521 for Model 2. For TOC, on the other hand, Model 2 performed better than Model 1, whose RSR was 0.532. Explainable artificial intelligence (XAI) is an active area of machine learning research. Shapley value analysis, an XAI method, was also used to quantitatively interpret the performance of the XGB models developed in this study.
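
The abstract combines three computational steps:

1. Splitting the data into low- and high-value groups on one water quality variable.
2. Scoring predictions with RSR.
3. Attributing a prediction to its input variables with Shapley values.

The Python sketch below illustrates each step using only the standard library. It is not the author's code, and several choices in it are assumptions:

- The abstract does not name the clustering algorithm, so a one-dimensional k-means with k = 2 is assumed. The reference list does include k-means work by Ahmad and Dey (2007) and by Song (2017).
- RSR follows the RMSE / SD(observed) definition of Moriasi et al. (2007).
- The Shapley routine enumerates every coalition exactly against a single baseline. It does not use the SHAP library's tree explainer.
- The function names, the toy model, and all numbers are hypothetical.
- Training the XGB models themselves is omitted.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def rsr(observed, predicted):
    """RMSE-observations standard deviation ratio, RSR = RMSE / SD(observed).

    The population SD is used, so this equals sqrt(SSE) / sqrt(sum((obs - mean)^2)).
    0 is a perfect fit; lower values are better.
    """
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / pstdev(observed)


def two_means_threshold(values, max_iter=100):
    """1-D k-means with k = 2 (an assumed clustering method); returns the group boundary.

    Assumes the values are not all identical (otherwise a group would be empty).
    """
    lo, hi = min(values), max(values)  # start the two centroids at the extremes
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving: converged
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2  # values at or below this are nearer the low centroid


def split_rows(rows, key, threshold):
    """Partition rows (dicts) into low/high groups on one variable, e.g. TEMP."""
    low = [r for r in rows if r[key] <= threshold]
    high = [r for r in rows if r[key] > threshold]
    return low, high


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Features outside a coalition take their baseline value; this single-baseline
    game is a simplification of what the SHAP library estimates for tree models.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        return model({k: (x[k] if k in coalition else baseline[k]) for k in names})

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi[i] = total
    return phi


# Illustrative toy data only; these are not values from the study.
temps = [4.0, 5.0, 6.0, 24.0, 25.0, 26.0]
cut = two_means_threshold(temps)
low_rows, high_rows = split_rows([{"TEMP": t} for t in temps], "TEMP", cut)
print(cut, len(low_rows), len(high_rows))

print(round(rsr([10.0, 12.0, 15.0, 20.0, 30.0], [11.0, 12.5, 14.0, 21.0, 28.0]), 3))


# Hypothetical stand-in for a trained model: an additive TEMP term plus a TN x TP interaction.
def toy_model(f):
    return 2 * f["TEMP"] + f["TN"] * f["TP"]


x = {"TEMP": 20.0, "TN": 2.0, "TP": 0.5}
base = {"TEMP": 10.0, "TN": 1.0, "TP": 0.1}
phi = shapley_values(toy_model, x, base)
print(phi)  # contributions sum to toy_model(x) - toy_model(base)
```

Under these assumptions, Model 1 in the abstract would presumably correspond to fitting one model to each group returned by split_rows and predicting each sample with its own group's model. Model 2 would be one model fitted to all rows. Because lower RSR is better, comparing the two RSR values (for example, 0.477 for the TN-based Model 1 versus 0.521 for Model 2) indicates which scheme fits the data more closely.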

References (31)

Ahmad, A., and Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data, Data Knowl. Eng., 63, 503-527.
Bennett, N.D., Croke, B.F., Guariso, G., Guillaume, J.H., Hamilton, S.H., Jakeman, A.J., Marsili-Libelli, S., Newham, L.T., Norton, J.P. and Perrin, C. (2013). Characterising performance of environmental models, Environ. Modell. Softw., 40, 1-20.
Chen, T. and Guestrin, C. (2016). "XGBoost: A scalable tree boosting system", In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13-17 August, San Francisco, CA, USA. Association for Computing Machinery.
Dietterich, T.G. (2000). Ensemble methods in machine learning, In International Workshop on Multiple Classifier Systems, June, Berlin, Heidelberg, 1-15.
Ekmekcioglu, O., Koc, K., Ozger, M., and Isik, Z. (2022). Exploring the additional value of class imbalance distributions on interpretable flash flood susceptibility prediction in the Black Warrior River basin, Alabama, United States, J. Hydrol., 610, 127877.
Friedman, J.H. (2001). Greedy function approximation: a gradient boosting machine, Ann. Stat., 1189-1232.
Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., and Yang, G.-Z. (2019). XAI-Explainable artificial intelligence, Sci. Robot., 4(37).
Hollister, J.W., Milstead, W.B. and Kreakie, B.J. (2016). Modeling lake trophic state: A random forest approach, Ecosphere, 7, e01321.
KMA (Korea Meteorological Administration), Open MET Data Portal, https://www.data.kma.go.kr/ (accessed April 1, 2022).
Kwak, J. (2021). A study on the 3-month prior prediction of Chl-a concentration in the Daechong lake using hydrometeorological forecasting data, J. Wetl. Res., 23(2), 144-153.
K-water, Mywater, https://www.water.or.kr/ (accessed June 1, 2022).
Kwon, Y.S., Baek, S.H., Lim, Y.K., Pyo, J., Ligaray, M., Park, Y. and Cho, K.H. (2018). Monitoring coastal chlorophyll-a concentrations in coastal areas using machine learning models, Water, 10(8), 1020.
Liu, M., and Lu, J. (2014). Support vector machine-an alternative to artificial neuron network for water quality forecasting in an agricultural nonpoint source polluted river?, Environ. Sci. Pollut. R., 21, 11036-11053.
Lundberg, S.M., Erion, G.G., and Lee, S.I. (2018). Consistent individualized feature attribution for tree ensembles, https://arxiv.org/abs/1802.03888
Lundberg, S.M. and Lee, S.I. (2017). "A unified approach to interpreting model predictions", Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768-4777.
Ma, X., Sha, J., Wang, D., Yu, Y., Yang, Q. and Niu, X. (2018). Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning, Electron Commer. Res. Appl., 31, 24-39.
Mangalathu, S., Hwang, S.H., and Jeon, J.S. (2020). Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach, Eng. Struct., 219, 110927.
Moriasi, D.N., Arnold, J.G., Van Liew, M.W., Bingner, R.L., Harmel, R.D. and Veith, T.L. (2007). Model evaluation guidelines for systematic quantification of accuracy in watershed simulations, Trans. ASABE, 50(3), 885-900.
NIER (National Institute of Environmental Research), Real-time water information system, http://www.koreawqi.go.kr/index_web.jsp (accessed April 1, 2022).
Park, J. (2021). The effect of input variables clustering on the characteristics of ensemble machine learning model for water quality prediction, J. Korean Soc. Water Environ., 37(5), 335-343.
Park, J., Lee, W.H., Kim, K.T., Park, C.Y., Lee, S. and Heo, T.Y. (2022). Interpretation of ensemble learning to predict water quality using explainable artificial intelligence, Sci. Total Environ., 832, 155070.
Park, J., Park, J.H., Choi, J.S., Joo, J.C., Park, K., Yoon, H.C., Park, C.Y., Lee, W.H., and Heo, T.Y. (2020). Ensemble Model Development for the Prediction of a Disaster Index in Water Treatment Systems, Water, 12, 3195.
Park, Y., Cho, K.H., Park, J., Cha, S.M. and Kim, J.H. (2015). Development of early-warning protocol for predicting chlorophyll-a concentration using machine learning models in freshwater and estuarine reservoirs, Korea, Sci. Total Environ., 502, 31-41.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R. and Dubourg, V. (2011). Scikit-learn: Machine learning in Python, J. Mach. Learn. Res., 12, 2825-2830.
Ribeiro, M.T., Singh, S. and Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144.
Shin, C.M., Min, J.H., Park, S.Y., Choi, J., Park, J.H., Song, Y.S. and Kim, K. (2017). Operational water quality forecast for the Yeongsan river using EFDC model, J. Korean Soc. Water Environ., 33(2), 219-229.
Shin, Y., Kim, T., Hong, S., Lee, S., Lee, E., Hong, S., Lee, C., Kim, T., Park, M.S., and Park, J. (2020). Prediction of chlorophyll-a concentrations in the Nakdong River using machine learning methods, Water, 12, 1822.
Shrikumar, A., Greenside, P., Shcherbina, A., and Kundaje, A. (2016). Not just a black box: learning important features through propagating activation differences, arXiv preprint arXiv:1605.01713.
Singh, K.P., Basant, N., and Gupta, S. (2011). Support vector machines in water quality management, Anal. Chim. Acta., 703, 152-162.
Song, J. (2017). K-Means cluster analysis for missing data, J. Korean Data Anal. Soc., 19, 689-697.
Zhang, D., Qian, L., Mao, B., Huang, C., Huang, B. and Si, Y. (2018). A data-driven design for fault detection of wind turbines using random forests and XGboost, IEEE Access, 6, 21020-21031.