[Dissertation] Particle Swarm Optimization Algorithm for Text Document Clustering

Suganya Selvaraj (Kookmin University Graduate School, Information Convergence Security major, domestic doctorate)



Abstract

Text document clustering is the unsupervised classification of text documents into clusters based on content similarity. It can be applied to tasks such as finding similar documents and organizing large document collections. Extracting relevant information from the data is a challenging task, which motivates the development of fast, high-quality document clustering algorithms. Swarm intelligence (SI) is an artificial intelligence technique in which simple, unintelligent individuals follow simple rules to accomplish very complex tasks. By mapping the features of a problem to the parameters of an SI algorithm, SI algorithms can reach solutions in a flexible, robust, decentralized, and self-organized manner. Compared with traditional clustering algorithms, these solving mechanisms make swarm algorithms suitable for complex document clustering problems. SI also offers scalability, adaptability, collective robustness, and individual simplicity. However, SI algorithms have issues with time-critical applications, parameter tuning, and stagnation, and further study is needed to overcome them. In this thesis, we studied SI algorithms and proposed variants of them to improve the results of text document clustering.
Chapter 1 gives a general introduction to text document clustering and SI algorithms, and describes the motivation and contributions of this study.
In Chapter 2, we study several popular SI algorithms in detail to identify their important control parameters and random distributions. We also review and summarize performance comparisons of SI algorithms across different applications. Each SI algorithm performs differently in different domains, depending on its strengths and weaknesses.
In Chapter 3, we compare particle swarm optimization (PSO), the bat algorithm (BA), grey wolf optimization (GWO), and K-means on six data sets of various sizes, created from BBC sports news and 20 Newsgroups, to find the best-performing standard SI algorithm for text document clustering. Based on the experimental results, we discuss how the features of the text document clustering problem relate to the nature of SI algorithms. We conclude that PSO and GWO are better than K-means and that, among these algorithms, PSO performs best at finding the optimal solution.
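To make the idea of PSO-based clustering concrete, the following is a minimal sketch of a standard global-best PSO in which each particle encodes k cluster centroids. It is not the thesis's implementation: the Euclidean fitness, the parameter values (`w`, `c1`, `c2`), the function names, and the toy term-count matrix are all assumptions chosen for illustration.

```python
import numpy as np

def fitness(position, docs, k):
    """Mean distance from each document to its nearest centroid (lower is better)."""
    centroids = position.reshape(k, docs.shape[1])
    # Pairwise Euclidean distances, shape (n_docs, k)
    dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def pso_cluster(docs, k, n_particles=20, n_iter=100,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over flattened centroid vectors (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    n_docs, dim = docs.shape
    # Each particle starts as k distinct documents, flattened into one vector
    idx = np.array([rng.choice(n_docs, size=k, replace=False)
                    for _ in range(n_particles)])
    pos = docs[idx].reshape(n_particles, k * dim).astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, docs, k) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal best + pull toward global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p, docs, k) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    centroids = gbest.reshape(k, dim)
    dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), gbest_fit

# Hypothetical term-count vectors: three documents per topic, four vocabulary terms
docs = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 4],
    [0, 0, 3, 3],
], dtype=float)
labels, score = pso_cluster(docs, k=2)
```

Real document collections would normally be represented by TF-IDF vectors and often compared with cosine similarity; plain counts and Euclidean distance are used here only to keep the sketch short.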
In Chapter 4, we aim to improve the PSO algorithm for text document clustering through optimal initialization. Several algorithms were combined with PSO to initialize it optimally. Among the compared algorithms, comb-GWO-PSO performs best, with the same execution time as the standard SI algorithms and the other combinations. These results show that properly initializing PSO with GWO provides good exploration in the initial stage and good exploitation in the later stage, which leads to the optimum solution.
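The abstract does not say how comb-GWO-PSO hands over from GWO to PSO, so the sketch below only illustrates one plausible reading: run the standard GWO position update for a few iterations and use the resulting pack as the starting swarm for PSO. The function name `gwo_positions`, the bounds, the iteration count, and the sphere test objective are assumptions, and for brevity the three leaders are re-selected from the current pack at every iteration.

```python
import numpy as np

def gwo_positions(objective, dim, n_wolves=20, n_iter=30,
                  bounds=(-5.0, 5.0), seed=0):
    """Run a basic grey wolf optimization loop and return the final pack positions."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(n_iter):
        fit = np.array([objective(p) for p in pos])
        # The three best wolves (alpha, beta, delta) guide the rest of the pack
        leaders = pos[np.argsort(fit)[:3]].copy()
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        new_pos = np.zeros_like(pos)
        for leader in leaders:
            A = 2.0 * a * rng.random(pos.shape) - a   # step scale in [-a, a]
            C = 2.0 * rng.random(pos.shape)           # leader weight in [0, 2]
            D = np.abs(C * leader - pos)              # distance to the weighted leader
            new_pos += leader - A * D
        # Each wolf moves to the average of the three leader-guided candidates
        pos = np.clip(new_pos / 3.0, lo, hi)
    return pos

def sphere(x):
    """Toy objective (sum of squares), standing in for a clustering fitness."""
    return float(np.sum(x ** 2))

# The returned array could replace random initialization of a PSO swarm
seeds = gwo_positions(sphere, dim=4)
```

Because `a` shrinks over the run, early iterations take large exploratory steps and later ones contract around the leaders, which matches the exploration-then-exploitation behavior the chapter summary describes.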
In Chapter 5, we propose a dynamic sub-swarm approach for PSO to further improve text document clustering results. We compare it with the standard SI algorithms and K-means. The results show that the proposed dynamic two sub-swarm PSO (subswarm-PSO) performs best among the compared algorithms and takes less execution time than PSO.

Dissertation information

Author	Suganya Selvaraj
Degree-granting institution	Kookmin University Graduate School
Degree type	Domestic doctorate
Department	Information Convergence Security
Advisor	최은미
Year of publication	2021
Total pages	vii, 124
Keywords	artificial intelligence algorithms; swarm intelligence algorithms; particle swarm optimization algorithm; subswarm PSO; text document clustering
Language	eng
Full-text URL	http://www.riss.kr/link?id=T16065862&outLink=K
Source	Korea Education and Research Information Service (한국교육학술정보원)
