[Paper] A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction

Duksan Ryu (류덕산); Jongmoon Baik (백종문)

doi:10.3745/ktsde.2018.7.6.205


KIPS Transactions on Software and Data Engineering, v.7 no.6, 2018, pp.205-220

Duksan Ryu (KAIST, School of Computing), Jongmoon Baik (KAIST, School of Computing)

Abstract

Software defect prediction helps allocate valuable project resources effectively for software quality assurance activities by focusing effort on the modules identified as fault-prone. If sufficient historical data has been collected within a company, Within-Project Defect Prediction (WPDP) can be used to predict fault-prone modules accurately. If a company has not maintained historical data, it may be helpful to build a fault-predicting classifier based on Cross-Project Defect Prediction (CPDP). Because CPDP builds its classifier from project data collected by other organizations, the main obstacle to an accurate classifier is that the distributions of the source and target projects differ. Addressing this problem requires identifying effective similarity measure techniques, so in this paper we apply various similarity measure techniques to CPDP models and compare their performance. We evaluate the effectiveness of the similarity weights calculated by each technique and verify the results with a statistical significance test and an effect size test. The results show that the k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers, and that the predictive performance of CPDP using these three methods is comparable to that of WPDP.
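As a rough illustration of what a similarity weight can look like, the sketch below scores each source-project instance by how close it lies to the target project's data, using the mean Euclidean distance to its k nearest target instances. This is only a minimal sketch under stated assumptions: the abstract does not give the weighting formula, so the `1 / (1 + distance)` mapping, the function name `knn_similarity_weights`, and the toy metric vectors are all hypothetical and are not the authors' method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_similarity_weights(source, target, k=3):
    """Return one weight per source instance (hypothetical helper).

    Assumption: a source instance is "similar" to the target project when
    its k nearest target instances are close. The mean distance d to those
    neighbors is mapped to a weight in (0, 1] via 1 / (1 + d).
    """
    if not target:
        raise ValueError("target must contain at least one instance")
    k = min(k, len(target))  # cannot use more neighbors than exist
    weights = []
    for s in source:
        nearest = sorted(euclidean(s, t) for t in target)[:k]
        mean_dist = sum(nearest) / k
        weights.append(1.0 / (1.0 + mean_dist))
    return weights


# Toy data: two source instances, three target instances (made-up values).
source = [[1.0, 1.0], [10.0, 10.0]]
target = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
weights = knn_similarity_weights(source, target, k=2)
```

In this toy setup the first source instance sits inside the target data and receives a much larger weight than the second, which is far away. A CPDP learner could then use such weights to emphasize source instances that resemble the target project during training.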

