A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs

Journal of Intelligence and Information Systems (지능정보연구), v.25 no.1, 2019, pp.127-137

원하람 (Graduate School of Business IT, Kookmin University), 심재승 (Graduate School of Business IT, Kookmin University), 안현철 (Graduate School of Business IT, Kookmin University)

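Abstract (translated from the Korean)

Recidivism prediction has been studied steadily by experts since before the 1970s, and its importance is growing as crimes committed by repeat offenders have steadily increased in recent years. Research on recidivism prediction became especially active from the 1990s, when the United States and Canada adopted recidivism risk assessment reports as a decisive criterion in trials and parole reviews; around the same time, empirical research on recidivism factors began in Korea as well. So far, most recidivism prediction research has concentrated on analyzing recidivism factors or on improving prediction accuracy. However, because recidivism prediction has an asymmetric error cost structure, research that minimizes the misclassification cost while maximizing prediction accuracy is also meaningful in some cases. In general, the cost of misclassifying a person who will not reoffend as one who will is lower than the cost of misclassifying a person who will reoffend as one who will not. The former only adds monitoring costs, whereas the latter incurs enormous social and economic costs from the resulting crime. Reflecting the cost economics of this asymmetry, this study proposes an XGBoost-based recidivism prediction model that considers asymmetric error costs. In the first stage of the model, XGBoost, an ensemble technique that has recently attracted attention in data mining for its high performance, was applied, and its results were compared with various prediction techniques such as logistic regression analysis, decision trees, artificial neural networks, and support vector machines. In the next stage, classification threshold optimization minimizes the total misclassification cost, which is a weighted average of FNE (False Negative Error) and FPE (False Positive Error). Finally, to verify its usefulness, the model was applied to a real recidivism prediction dataset, which confirmed that the XGBoost model not only shows better prediction accuracy than the other comparison models but also lowers the misclassification cost most effectively.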

Abstract

Recidivism prediction has been a subject of constant research by experts since the early 1970s, but it has become more important as crimes committed by recidivists steadily increase. In particular, in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criter...


AI Full-Text Summary

* These sentences were identified automatically by AI and may include unsuitable ones; please use them with care.

Proposed Method

  • After the XGB results are compared, an experiment is conducted to reflect the asymmetric error cost in the model. The method of the experiment is as follows.
  • Bagging is a method developed by Breiman that creates multiple copies of a learning algorithm, trains each of them, and combines the results (Breiman, 1996). Boosting is a method for constructing a committee of weak learners that lowers the error rate in classification and the prediction error in regression. Boosting works by iteratively constructing weak learners whose training set is conditioned on the performance of the previous members of the ensemble (Sharkey, 1999).
  • From a theoretical point of view, this study contributes by reflecting the asymmetric error cost in recidivism prediction and by applying XGB, a recent classification method, to recidivism prediction with social cost in mind. From a practical point of view, the proposed model can be used as a reference for criminal judgments or parole reviews, making it possible to respond proactively to the potential problem of recidivism.
  • In the field of data analysis, crime prediction is a very interesting subject in that it takes a scientific approach to a wide variety of data. There are various research fields in crime prediction, but this study focuses on the prediction of criminal recidivism. There is no consistent definition of recidivism, but it is generally defined as "reengaging in criminal behavior after receiving a sanction or intervention" (King and Elderbroom, 2014).
  • As shown in this table, the XGB model outperformed LOGIT and SVM at the 5% statistical significance level, and surpassed DT and ANN at the 1% statistical significance level. Therefore, the XGB model was verified to be the optimal model, and the experiments reflecting the asymmetric error costs were performed using the XGB model.
  • In that sense, proactive prevention is a much more efficient and effective method than responding after the fact. Therefore, this paper investigated crime prediction, which is a proactive response through data analysis, and focused on recidivism prediction among crime predictions. In addition, in view of the social cost of recidivism, the study focused on the asymmetric error cost, which is mainly used in research on detection models.
  • The dataset was split into training and validation datasets after preprocessing, and these were applied to XGBoost (XGB). To validate XGB performance, the results are compared with a statistical model, Logistic Regression Analysis (LOGIT), and with Decision Trees (DT), Artificial Neural Networks (ANN), and Support Vector Machine (SVM), which are machine learning models.
  • This study proposed a novel recidivism prediction model that considers the asymmetric error cost structure. Using an open dataset from the ICPSR, we applied the XGB model to recidivism prediction and compared it with other statistical and machine learning classification methods to verify that XGB is the best model in terms of recidivism prediction accuracy. We then searched for the optimal classification threshold that minimizes the total cost, which is a weighted average of FPE and FNE.
  • Using the 15 final variables selected in [Table 2], we applied the XGB model and compared the results with the LOGIT, DT, ANN, and SVM models presented above. In [Table 3], which shows the results of the classification models, the validation data set accuracy of the XGB model was the highest at 69.
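
The threshold search described above can be illustrated with a minimal sketch. It assumes that a classifier has already produced a recidivism score between 0 and 1 for each person, that FPE and FNE are measured as rates, and that the total cost is their weighted average. The function names, the toy scores, the 0.01 threshold grid, and the cost weights (1:1 and 5:1) are hypothetical choices for illustration only; they are not values taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def error_rates(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FPE, FNE) when score >= threshold is classified as recidivist (1)."""
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    negatives = sum(1 for y in y_true if y == 0)
    positives = sum(1 for y in y_true if y == 1)
    fpe = fp / negatives if negatives else 0.0
    fne = fn / positives if positives else 0.0
    return fpe, fne


def total_cost(y_true: Sequence[int], scores: Sequence[float], threshold: float,
               w_fn: float, w_fp: float) -> float:
    """Weighted average of FNE and FPE (the weights are assumptions)."""
    fpe, fne = error_rates(y_true, scores, threshold)
    return (w_fn * fne + w_fp * fpe) / (w_fn + w_fp)


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   w_fn: float, w_fp: float,
                   grid: Optional[List[float]] = None) -> float:
    """Grid-search the threshold with the lowest total cost (first one on ties)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # hypothetical 0.01 step
    return min(grid, key=lambda t: total_cost(y_true, scores, t, w_fn, w_fp))


# Toy validation data: 1 = recidivist, 0 = non-recidivist (made up for illustration).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.50, 0.30, 0.60, 0.70, 0.90]

t_equal = best_threshold(y_true, scores, w_fn=1.0, w_fp=1.0)
t_asym = best_threshold(y_true, scores, w_fn=5.0, w_fp=1.0)
print(t_equal, t_asym)
```

On this toy data, weighting false negatives more heavily moves the selected threshold downward, so more people are flagged as potential recidivists in exchange for fewer missed recidivists.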

Target Data

  • The data used in this project consisted of information on prisoners released from North Carolina prisons in the United States from July 1, 1978 to June 30, 1979, and were collected from the ICPSR (Inter-university Consortium for Political and Social Research) website. To build the model, a total of 13,002 records were assembled at a 1:1 ratio (6,501:6,501) of recidivists to non-recidivists.
  • First, a dataset was built at a 1:1 ratio of recidivist and non-recidivist data. The dataset was split into training and validation datasets after preprocessing, and these were applied to XGBoost (XGB). To validate XGB performance, the results are compared with a statistical model, Logistic Regression Analysis (LOGIT), and with Decision Trees (DT), Artificial Neural Networks (ANN), and Support Vector Machine (SVM), which are machine learning models.
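
The data preparation described above (a 1:1 class ratio followed by a training/validation split) might look like the sketch below. The summary does not say how the 1:1 ratio was obtained or what split ratio was used, so random undersampling of the larger class and an 80/20 split are assumptions, and the record layout and the `recidivist` field name are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, int]


def balance_one_to_one(records: List[Record], label_key: str = "recidivist",
                       seed: int = 0) -> List[Record]:
    """Randomly undersample the larger class so both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in records if r[label_key] == 1]
    neg = [r for r in records if r[label_key] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


def train_validation_split(records: List[Record], validation_fraction: float = 0.2,
                           seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle and split into (training, validation); the fraction is an assumption."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy records: 10 recidivists and 20 non-recidivists (made up for illustration).
records = [{"id": i, "recidivist": 1 if i % 3 == 0 else 0} for i in range(30)]
balanced = balance_one_to_one(records)
train, validation = train_validation_split(balanced)
print(len(balanced), len(train), len(validation))
```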

Theory/Model

  • XGBoost, short for "eXtreme Gradient Boosting", which was used in this study, is a decision-tree-based algorithm that uses a boosting method to reduce the error by combining several CARTs (classification and regression trees).
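
To make the boosting idea concrete, here is a minimal sketch of plain gradient boosting for squared error, where each round fits a depth-1 regression tree (a stump) to the residuals of the current ensemble. This is not XGBoost itself: XGBoost additionally uses a regularized objective, second-order gradient information, and other optimizations. The single feature, the toy data, and the `n_rounds` and `learning_rate` values are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]        # (threshold, left_value, right_value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a one-split regression tree to the residuals by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # no valid split: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return (best[1], best[2], best[3])


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs: Sequence[float], ys: Sequence[float],
                n_rounds: int = 20, learning_rate: float = 0.5) -> Model:
    """Each new stump is trained on what the previous ensemble still gets wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_boosted(model: Model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    return sum((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data: one feature, binary outcome treated as a regression target.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]

mse_0 = mse(fit_boosted(xs, ys, n_rounds=0), xs, ys)
mse_1 = mse(fit_boosted(xs, ys, n_rounds=1), xs, ys)
mse_20 = mse(fit_boosted(xs, ys, n_rounds=20), xs, ys)
print(mse_0, mse_1, mse_20)
```

The training error shrinks as rounds are added, which is the sense in which boosting reduces error by combining several weak trees.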

References (13)

  1. Breiman, L., "Bagging Predictors," Machine Learning, Vol.24, No.2(1996), 123-140.

  2. Chen, T., and C. Guestrin, "XGBoost: A scalable tree boosting system," Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2016).

  3. Joo, D., Hong, T., and I. Han, "The neural network models for IDS based on the asymmetric costs of false negative errors and false positive errors," Expert Systems with Applications, Vol.25(2003), 69-75.

  4. Jung, S., "A Study on the Use of Big data in Criminal Law," Journal of Public Policy Studies, Vol.29, No.2(2012), 161-184.

  5. King, R. S., and B. Elderbroom, Improving recidivism as a performance measure, Washington, DC: Urban Institute, 2014.

  6. Lee, H.-U., and H. Ahn, "An intelligent intrusion detection model based on support vector machines and the classification threshold optimization for considering the asymmetric error cost," Journal of Intelligence and Information Systems, Vol.17, No.4(2011), 157-173.

  7. Nam, S., and S. Park, "Study on recidivism factors of prisoners," Corrections Review, Vol.50(2011), 115-139.

  8. New York Times, Recidivism's high cost and a way to cut it, 2011, Available at https://www.nytimes.com/2011/04/28/opinion/28thu3.html (Accessed 21 January 2019).

  9. Prison Education News, The Cost of Recidivism: Victims, the Economy, and American Prisons, 2014, Available at https://prisoneducation.com/prison-education-news/the-cost-of-recidivism-victims-the-economy-and-american-pris-html (Accessed 21 January 2019).

  10. Schmidt, P., and A. D. Witte, "Predicting criminal recidivism using 'Split Population' survival time models," Journal of Econometrics, Vol.40, No.1(1989), 141-159.

  11. Seong, H. G., "Methods and tasks in the prediction of criminal recidivism," Proceeding of the 2006 Annual Conference of Korean Psychological Association, (2006), 404-405.

  12. Sharkey, A. J. (Ed.), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springer Science & Business Media, 2012.

  13. Turgut, O., "Predicting recidivism through machine learning," Ph.D. dissertation, University of Texas at Dallas, 2017.


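Open Access (OA) Type

BRONZE

A paper that is available on the website of the publisher or academic society because access was allowed through a temporary special promotion or after a certain period had elapsed.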
