

데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games


In this research, we applied a range of data mining techniques to build models that predict wins and losses in Korean professional baseball games. Historical data on players and teams were obtained from the official materials provided by the KBO website. From the collected raw data, we prepared two additional datasets, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three datasets (raw, ratio, and binary): decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets $\times$ 7 techniques) prediction scenarios, the most accurate model was the random forest built on the binary dataset, with a prediction accuracy of 84.14%. We also observed that the ratio and binary datasets yielded better prediction models than the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and walks (bases on balls) are important factors in winning a game. This research differs from existing studies in that it uses three different types of data and a variety of data mining techniques for win-loss prediction in Korean professional baseball games.
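
The dataset construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the field names and sample values are hypothetical, and the binary encoding (1 when the away-team value exceeds the home-team value, otherwise 0) is an assumption, since the abstract says only that the record values were compared. The sketch also assumes the home-team values are nonzero so that the ratio is defined.

```python
from itertools import product

# Hypothetical season records for one matchup; names and values are illustrative only.
away = {"era": 4.20, "strikeouts": 850.0, "walks": 410.0}
home = {"era": 3.50, "strikeouts": 900.0, "walks": 410.0}


def ratio_features(away, home):
    """Ratio dataset: away-team record divided by the corresponding home-team record."""
    return {key: away[key] / home[key] for key in away}


def binary_features(away, home):
    """Binary dataset (assumed encoding): 1 if the away value exceeds the home value, else 0."""
    return {key: int(away[key] > home[key]) for key in away}


# The three dataset types and seven classification techniques named in the abstract.
DATASETS = ["raw", "ratio", "binary"]
TECHNIQUES = [
    "decision tree",
    "random forest",
    "logistic regression",
    "neural network",
    "support vector machine",
    "linear discriminant analysis",
    "quadratic discriminant analysis",
]

# Every (dataset, technique) pair is one prediction scenario: 3 x 7 = 21 in total.
scenarios = list(product(DATASETS, TECHNIQUES))

if __name__ == "__main__":
    print(ratio_features(away, home))
    print(binary_features(away, home))
    print(len(scenarios))
```

Each scenario would then be trained and evaluated separately. The abstract reports that the (binary, random forest) pair gave the highest accuracy.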

