[Thesis] Semantic Similarity-based Code Search Engine

Kisub Kim

Semantic Similarity-based Code Search Engine

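Kisub Kim (Chungbuk National University, Department of Computer Engineering (Graduate School), Computer Engineering major, domestic master's thesis)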

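Abstract (translated from Korean)

Source code search accounts for a large share of the work in software development projects: developers use it to find the modules and code examples they need, which increases development efficiency or improves software quality. Developers also frequently modify and extend their own or others' code, and in those cases they again rely on source code search to decide how to modify or extend it. Most developers simply search for code with natural-language input, and they sometimes spend a great deal of time searching.
This thesis proposes the Semantic similarity-based COde seArCH engine (S_Coach), a code search engine based on semantic similarity that takes a code fragment as input and retrieves code examples. To extract large numbers of code samples and user query examples, the approach builds its indices on data from GitHub and Stackoverflow, the largest code hosting site and Q&A community, respectively. It also improves search performance by analyzing the user's input code and mapping it to structural code elements. To take semantic similarity into account, it adds a process that, rather than searching for code directly, automatically retrieves similar questions through natural-language analysis of the Q&A community posts related to the user's input code.
To demonstrate the usefulness of S_Coach, code fragments extracted from the answers to the most frequently searched questions on Stackoverflow were used as input. The accuracy of the results returned by S_Coach and by commercial code search engines was analyzed, and their structural and semantic similarity were compared.
In conclusion, S_Coach showed structural similarity 50 and 23 percentage points higher than Krugle and SearchCode, respectively, and it was also able to retrieve code that is semantically similar but uses different syntax or libraries.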

Abstract

Source code search accounts for a large share of the effort in software development projects. Developers obtain the modules and code examples they need through code search, which lets them improve development efficiency or software quality. Developers also constantly modify and extend their own or others' code, and here too they search for source code to guide the modification or extension. Most developers simply search for code with free-form queries, and the search sometimes takes a lot of time.
In this thesis, I propose the Semantic similarity-based COde seArCH engine (S_Coach), which takes a code fragment as input and retrieves code examples based on semantic similarity. To extract a large number of code samples and user query examples, its indices are built on data from GitHub and Stackoverflow, the largest code hosting site and Q&A forum, respectively. Analyzing the user's input code and mapping it to structural code elements improves search performance. To account for semantic similarity, S_Coach does not search for code directly; instead, it adds a process that automatically searches for similar questions through natural language analysis of the Q&A forum posts related to the user's input code.
To evaluate the effectiveness of S_Coach, code fragments extracted from the answers to the most frequently searched questions on Stackoverflow were used as input, and the results were compared with those of commercial code search engines in terms of structural and semantic similarity.
As a result, S_Coach showed structural similarity 50 and 23 percentage points higher than two commercial code search engines, Krugle and searchcode, respectively. The results also indicate that S_Coach can find semantically similar code that lacks syntactic similarity or uses different libraries.
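The two-stage idea in the abstract (match the input code against Q&A posts, then follow natural-language similarity between questions rather than code similarity) can be illustrated with a toy sketch. This is not S_Coach's implementation: the in-memory `POSTS` corpus, the function names, and the choice of Jaccard and bag-of-words cosine similarity are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for indexed Q&A posts: (question title, answer code).
POSTS = [
    ("how to read a text file line by line in java",
     "BufferedReader br = new BufferedReader(new FileReader(path)); br.readLine();"),
    ("read a file line by line with java 8 streams",
     "Files.lines(Paths.get(path)).forEach(System.out::println);"),
    ("how to sort a list of integers in java",
     "Collections.sort(numbers);"),
]

def code_tokens(code):
    """Identifier-like tokens, a crude stand-in for structural code elements."""
    return set(re.findall(r"[A-Za-z_]\w*", code))

def jaccard(a, b):
    """Set overlap between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two natural-language strings."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def search(fragment, top_k=2):
    """Return answer code from the questions most similar to the one matching `fragment`."""
    query = code_tokens(fragment)
    # Stage 1: find the post whose answer code best matches the input fragment.
    anchor_question, _ = max(POSTS, key=lambda p: jaccard(query, code_tokens(p[1])))
    # Stage 2: rank all posts by natural-language similarity to that post's question.
    ranked = sorted(POSTS, key=lambda p: cosine(anchor_question, p[0]), reverse=True)
    return [code for _, code in ranked[:top_k]]

results = search(
    "BufferedReader reader = new BufferedReader(new FileReader(file)); reader.readLine();"
)
for code in results:
    print(code)
```

In this sketch the second result uses `Files.lines` and shares no identifiers with the input fragment; it is reached only through the similarity of the two question titles, which mirrors the abstract's claim about finding semantically similar code that uses different syntax or libraries.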

Thesis information

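Author	Kisub Kim
Degree-granting institution	Chungbuk National University
Degree type	Master's (domestic)
Department	Department of Computer Engineering (Graduate School), Computer Engineering major
Advisor	서영훈
Year of publication	2017
Total pages	91
Keywords	Semantic Code Search Open Source
Language	eng
Full-text URL	http://www.riss.kr/link?id=T14437495&outLink=K
Source	Korea Education and Research Information Service (KERIS)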
