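[Thesis] Enhanced Harmful Sites Classification System using Whitelist Construction based on Machine Learning (기계학습에 기반한 화이트리스트 구축을 통한 강화된 유해사이트 분류 시스템)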

김병진


김병진 (Soongsil University Graduate School, Convergence Software (일원), System Software, domestic master's degree)

Abstract (translated from the Korean original)

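Internet usage rises steadily every year. The growth of the Internet has made it possible to obtain information quickly and easily, but it has also made harmful information such as pornography, prostitution, gambling, murder, and suicide easy to reach. In addition, in some communities, harmful information such as invasions of personal privacy or incitement to illegal acts is exposed indiscriminately not only to rational adults but also to adolescents who are not yet mature. Because of these problems, the government is working to block harmful sites, but it faces difficulties because the number of harmful sites is vast and there is no precise standard for identifying them. Several methods for identifying harmful sites have been proposed, but they either require large amounts of money and time or have low accuracy. A previously proposed method that identifies harmful sites using the link relationships between websites helped reduce the cost and time of identifying harmful sites at large scale, but it did not solve the accuracy problem. A new method that improves the accuracy of harmful-site identification is therefore needed.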

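To solve the accuracy problem of the existing link-relationship-based harmful-site identification method, this thesis first proposes a machine-learning-based harmful-site classification system. The proposed method clusters the harmful-site database built with the existing link-relationship-based identification method and preprocesses the per-cluster information to generate metadata. Using the website metadata, it builds a classification prediction algorithm and trains on the predicted values and metadata values to produce a harmful-site classification model. Among the several types of sites classified by the proposed model, the non-harmful sites are used to build a whitelist, and the whitelist is then expanded with a whitelist construction method that uses the link relationships between websites. In addition, the proposed method offers harmful-site classification criteria for Korea, which has no clear criteria for classifying harmful sites, and uses the whitelist to improve the accuracy of the existing link-relationship-based harmful-site identification method.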

Abstract

Internet usage rises steadily every year. The spread of the Internet has made information easy and quick to obtain, but it has also made harmful information, such as prostitution, gambling, pornography, suicide, and murder, easy to access. In some communities, harmful information such as invasions of personal privacy and the promotion of illegal activities is indiscriminately exposed not only to rational adults but also to immature adolescents. Because of these problems, the government has made efforts to block harmful sites, but this has been difficult because the number of harmful sites is huge and there is no precise standard for identifying them. Several methods have been proposed to identify harmful sites, but they are either costly and time-consuming or inaccurate. A previously proposed method that identifies harmful sites using the link relationships between websites has helped reduce the cost and time required to identify harmful sites at large scale, but it did not solve the accuracy problem. Therefore, a new method is needed to improve the accuracy of harmful-site identification.

In this thesis, we propose a machine-learning-based method for classifying harmful sites in order to solve the accuracy problem of the existing link-based method. The proposed method clusters the harmful-site database constructed by the existing link-based identification method and generates metadata by preprocessing the information for each cluster. We create a classification prediction algorithm from the website metadata and generate the harmful-site classification model by training on the predicted values and the metadata values. We construct a whitelist from the non-harmful sites among the various types of sites classified by this model, and expand the whitelist with a whitelist construction method that uses the links between websites. In addition, the proposed method offers a classification scheme for harmful sites in Korea, which lacks clear classification criteria, and uses the whitelist to improve the accuracy of the existing link-based harmful-site identification method.
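
The whitelist step can be illustrated with a minimal sketch. The abstract does not specify the expansion rule, so everything here is an assumption made for illustration: the function names `expand_whitelist` and `filter_candidates`, the `max_hops` parameter, the rule that a site linked from a whitelisted site is itself whitelisted, and the example domains are all hypothetical and are not taken from the thesis.

```python
from collections import deque


def expand_whitelist(seed_sites, links, max_hops=1):
    """Grow a whitelist by following outgoing links from seed sites.

    Assumption (not from the thesis): any site reachable from a seed
    within `max_hops` outgoing links is treated as non-harmful.
    `links` maps a site to the list of sites it links to.
    """
    whitelist = set(seed_sites)
    frontier = deque((site, 0) for site in seed_sites)
    while frontier:
        site, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for neighbor in links.get(site, ()):
            if neighbor not in whitelist:
                whitelist.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return whitelist


def filter_candidates(harmful_candidates, whitelist):
    """Drop whitelisted sites from the link-based harmful-site candidates."""
    return [site for site in harmful_candidates if site not in whitelist]


if __name__ == "__main__":
    # Hypothetical link graph and classifier output, for illustration only.
    links = {
        "news.example": ["weather.example", "blog.example"],
        "blog.example": ["forum.example"],
        "casino.example": ["bets.example"],
    }
    seeds = ["news.example"]  # sites the classifier labeled non-harmful
    whitelist = expand_whitelist(seeds, links, max_hops=1)
    candidates = ["blog.example", "casino.example", "bets.example"]
    print(filter_candidates(candidates, whitelist))
```

In this sketch the whitelist only removes false positives from the link-based candidate list. A naive hop-based rule would also whitelist a harmful site that a non-harmful site happens to link to, so a real system would need a stricter expansion criterion than the one assumed here.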

Thesis information

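Author	김병진
Degree-granting institution	Soongsil University Graduate School
Degree	Master's (domestic)
Department	Convergence Software (일원), System Software
Advisor	이상준
Year of publication	2019
Total pages	41
Keywords	big data, machine learning, database
Language	kor
Full-text URL	http://www.riss.kr/link?id=T15305565&outLink=K
Source	Korea Education and Research Information Service (한국교육학술정보원)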
