[Thesis] 신경망과 앙상블의 조합을 이용한 한국어 감성분석

심유정

Korean Sentiment Analysis using Neural Network and Ensemble Combination

심유정 (Master's thesis, Department of Smart Systems, Graduate School of Smart Convergence, Kwangwoon University)

Abstract (Korean, translated)

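Online review data on social media is attracting attention as a marketing indicator that captures users' subjective opinions most accurately. Sentiment analysis, which determines from such reviews whether users feel positive or negative, is used in many fields. Sentiment analysis is the analysis of subjective data, such as opinions, emotions, inclinations, and attitudes, toward the entities expressed in text and their attributes. As a natural language processing technique, it is also called sentiment mining or opinion mining, and it proceeds in three stages. The first is the data collection stage, which gathers data from the Internet. The second is the subjectivity detection stage, which cleans the collected data. The third and final stage is the polarity detection stage, which classifies the extracted data as positive or negative.
Because sentiment analysis mostly relies on online reviews, experiments are generally conducted on large datasets. Surveys run by small organizations, however, do not produce many reviews, so sentiment analysis that works on small-scale data is needed.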
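This thesis proposes a sentiment analysis model that improves performance on small-scale data. Three datasets are used in the experiments: small-scale movie review data, large-scale movie review data, and general review data. The first dataset consists of 15,000 crawled Naver movie reviews, the second of 200,000 reviews from NSMC (Naver Sentiment Movie Corpus), and the third of 200,000 Naver shopping reviews. This corresponds to the data collection stage of sentiment analysis. For the first and third datasets, which require labeling, star ratings are converted into positive and negative labels based on a statistical analysis. The labeled datasets are then cleaned by removing duplicate entries, non-Korean special characters, emoticons, and null values. This is the subjectivity detection stage. Finally, the cleaned datasets are used to decide whether each review is positive or negative, which is the polarity detection stage.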
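The abstract states that star ratings are converted into positive and negative labels according to a statistical analysis, but it does not give the cutoffs. The Python sketch below assumes a simple two-threshold rule. Ratings at or above `pos_min` become positive, ratings at or below `neg_max` become negative, and anything in between is dropped as ambiguous. The function names and the example thresholds (4 and 8 on a 10-point scale) are hypothetical and are not the thesis's actual values.

```python
from typing import List, Optional, Sequence, Tuple


def rating_to_label(rating: float, neg_max: float, pos_min: float) -> Optional[int]:
    """Map a star rating to 1 (positive), 0 (negative), or None (ambiguous).

    neg_max and pos_min are hypothetical cutoffs; the thesis derives its own
    from a statistical analysis that the abstract does not detail.
    """
    if neg_max >= pos_min:
        raise ValueError("neg_max must be strictly below pos_min")
    if rating >= pos_min:
        return 1
    if rating <= neg_max:
        return 0
    return None  # ratings between the two cutoffs are treated as ambiguous


def label_reviews(
    reviews: Sequence[Tuple[str, float]], neg_max: float, pos_min: float
) -> List[Tuple[str, int]]:
    """Turn (text, rating) pairs into (text, label) pairs, dropping ambiguous ones."""
    labeled = []
    for text, rating in reviews:
        label = rating_to_label(rating, neg_max, pos_min)
        if label is not None:
            labeled.append((text, label))
    return labeled


if __name__ == "__main__":
    sample = [("재밌다", 9), ("그냥 그래", 5), ("별로", 2)]
    print(label_reviews(sample, neg_max=4, pos_min=8))  # [('재밌다', 1), ('별로', 0)]
```

Dropping the middle band gives up some data in exchange for cleaner labels. On a very small dataset, a single cutoff that keeps every review may be the better trade-off.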
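This thesis proposes a sentiment analysis model for small-scale data and validates it experimentally. Specifically, it proposes Bagging-Bi-GRU, which combines two components. The first is Bi-GRU, which trains a GRU (a variant of LSTM known for strong performance on sequential data) in both directions. The second is bagging, one of the ensemble learning methods. To verify the proposed model, it is applied to both small-scale and large-scale data and compared against the existing Bi-GRU algorithm. The results show that the proposed model improves performance not only on small-scale data but also on large-scale data.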

Abstract (English)

Online review data on social media is attracting attention as a marketing indicator that captures users' subjective opinions most accurately. Sentiment analysis, which determines whether users feel positive or negative about a target as expressed in their reviews, is used in many fields. Sentiment analysis is the analysis of subjective data, such as opinions, emotions, inclinations, and attitudes, toward the entities expressed in text and their attributes. As a natural language processing technique, it is also called emotion mining or opinion mining, and it proceeds in three steps. The first is the data collection step, which gathers data from the Internet. The second is the subjectivity detection step, which cleans the collected data. The third and final step is the polarity detection step, which classifies the extracted data as positive or negative.
Because most sentiment analysis uses online reviews, experiments are generally conducted on large amounts of data. Surveys conducted by small organizations, however, yield relatively few reviews, so sentiment analysis on small-scale data is needed. In this paper, we propose a sentiment analysis model that improves performance on small-scale data. Three datasets are used in the experiments: small-scale movie review data, large-scale movie review data, and shopping review data (that is, non-movie review data). The first dataset consists of 15,000 crawled Naver movie reviews, the second of 200,000 NSMC reviews, and the third of 200,000 Naver shopping reviews. This is the data collection stage of sentiment analysis. For the first and third datasets, which require labeling, star ratings are converted into positive/negative labels based on a statistical analysis. Duplicate entries, all characters other than Korean, and null values are then removed from the labeled datasets. This is the subjectivity detection stage. Polarity detection is then performed on the cleaned data to classify each review as positive or negative; this is the polarity detection stage. Finally, the proposed model is evaluated on each cleaned dataset.
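The cleaning step removes duplicates, non-Korean characters, emoticons, and null values. It can be sketched in Python with a single regular expression. The expression keeps only Hangul syllables, Hangul compatibility jamo (so that expressions such as ㅋㅋ survive), and whitespace. This is a minimal sketch under assumptions: the helper names are hypothetical. Deduplicating on the cleaned text and keeping the first occurrence is one reasonable choice, not necessarily the one the thesis makes.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Keep Hangul syllables (가-힣), Hangul compatibility jamo (ㄱ-ㅎ, ㅏ-ㅣ) and
# whitespace; everything else (Latin letters, digits, punctuation, emoji) is removed.
NON_KOREAN = re.compile(r"[^가-힣ㄱ-ㅎㅏ-ㅣ\s]")
WHITESPACE = re.compile(r"\s+")


def clean_text(text: Optional[str]) -> str:
    """Strip non-Korean characters and collapse whitespace; None becomes ''."""
    if text is None:
        return ""
    text = NON_KOREAN.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


def clean_dataset(rows: Iterable[Tuple[Optional[str], int]]) -> List[Tuple[str, int]]:
    """Clean each review, then drop empty (null) and duplicate entries."""
    seen = set()
    cleaned = []
    for text, label in rows:
        text = clean_text(text)
        if not text:  # null, or nothing left after cleaning
            continue
        if text in seen:  # duplicate review text; keep the first occurrence
            continue
        seen.add(text)
        cleaned.append((text, label))
    return cleaned


if __name__ == "__main__":
    rows = [("최고!", 1), (None, 0), ("최고", 1), ("LOL", 0), ("별로... 👎", 0)]
    print(clean_dataset(rows))  # [('최고', 1), ('별로', 0)]
```

Cleaning before deduplicating means that "최고!" and "최고" count as the same review. Deduplicating on the raw text first would keep both.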
In this paper, we propose a model that performs polarity detection on small-scale data and verify it through experiments. The proposed Bagging-Bi-GRU combines Bi-GRU with Bagging, one of the ensemble learning methods. Bi-GRU trains a GRU, a variant of LSTM that performs well on sequential data, in both directions.
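To make the bidirectional idea concrete, here is a dependency-free Python sketch of a Bi-GRU forward pass. One GRU reads the token sequence left to right and a second reads it right to left. Their final hidden states are concatenated and fed to a logistic output. The sketch uses the standard GRU update with an update gate z, a reset gate r, and a candidate state (biases omitted). The weights are random and untrained, and plain float lists stand in for token embeddings. The class names are hypothetical. A practical implementation would use a deep learning framework's bidirectional GRU layer and train it with backpropagation, which this sketch leaves out.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell (hypothetical helper); biases omitted, weights untrained."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random) -> None:
        self.hidden_size = hidden_size
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x: Vector, h: Vector) -> Vector:
        def pre(name: str, hidden: Vector) -> Vector:
            # Pre-activation W_g x + U_g hidden for gate g.
            return [a + b for a, b in zip(matvec(self.W[name], x), matvec(self.U[name], hidden))]

        z = [sigmoid(v) for v in pre("z", h)]          # update gate
        r = [sigmoid(v) for v in pre("r", h)]          # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]    # r ⊙ h
        n = [math.tanh(v) for v in pre("n", reset_h)]  # candidate state
        # New state interpolates between the previous state and the candidate.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, seq: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_size
        for x in seq:
            h = self.step(x, h)
        return h


class BiGRU:
    """Hypothetical Bi-GRU classifier: forward GRU + backward GRU + logistic output."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.forward_cell = GRUCell(input_size, hidden_size, rng)
        self.backward_cell = GRUCell(input_size, hidden_size, rng)
        self.out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_size)]

    def encode(self, seq: List[Vector]) -> Vector:
        # Concatenate final states of the left-to-right and right-to-left passes.
        return self.forward_cell.run(seq) + self.backward_cell.run(seq[::-1])

    def predict_proba(self, seq: List[Vector]) -> float:
        """Probability of the positive class (meaningless until trained)."""
        features = self.encode(seq)
        return sigmoid(sum(w * f for w, f in zip(self.out, features)))


if __name__ == "__main__":
    # Three 3-dimensional "token embeddings" stand in for a tokenized review.
    review = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.3, 0.3, 0.3]]
    model = BiGRU(input_size=3, hidden_size=4, seed=42)
    print(len(model.encode(review)), model.predict_proba(review))
```

Reading the sequence in both directions means each review's representation reflects both its opening and its closing tokens. This matters for Korean, where the predicate that carries sentiment often comes at the end of the sentence.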
Bagging trains multiple models in parallel and combines their outputs, which smooths out differences among individual models and lowers the generalization error. The proposed model is compared with the existing machine learning algorithm, Bi-GRU. The experimental results show that the proposed method outperforms Bi-GRU on both small-scale and large-scale data. It also performs better on non-movie review data.
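The bagging step can be sketched as follows. Each base model is trained on a bootstrap sample, drawn with replacement and the same size as the training set. The models' positive-class probabilities are then averaged. In Bagging-Bi-GRU the base learner would be a Bi-GRU. Here, `train_fn` is any callable that returns a probability function. The toy `train_majority` learner is a hypothetical stand-in so that the example runs without a neural network. The abstract does not say whether the thesis uses soft voting (averaging) or hard majority voting, so soft voting is an assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[object, int]           # (input, label), label 1 = positive
Predictor = Callable[[object], float]  # returns P(positive | input)
Trainer = Callable[[List[Example]], Predictor]


def bootstrap_sample(data: Sequence[Example], rng: random.Random) -> List[Example]:
    """Draw len(data) examples uniformly with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def train_bagging(data: Sequence[Example], train_fn: Trainer,
                  n_models: int, seed: int = 0) -> List[Predictor]:
    """Train n_models base learners, each on its own bootstrap sample.

    The models are independent, so this loop could run in parallel.
    """
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models: List[Predictor], x: object, threshold: float = 0.5) -> int:
    """Soft voting (assumed): average positive-class probabilities, then threshold."""
    avg = sum(model(x) for model in models) / len(models)
    return 1 if avg >= threshold else 0


def train_majority(sample: List[Example]) -> Predictor:
    """Hypothetical toy base learner: always predicts the sample's positive rate."""
    rate = sum(label for _, label in sample) / len(sample)
    return lambda x: rate


if __name__ == "__main__":
    data = [("good", 1)] * 7 + [("bad", 0)] * 3
    models = train_bagging(data, train_majority, n_models=5, seed=1)
    print(bagging_predict(models, "any review"))
```

Each bootstrap sample sees roughly 63% of the distinct training examples, so the base models differ from one another. Averaging their outputs reduces variance, which is the property that makes bagging attractive when training data is scarce.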

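Thesis Information

Author	심유정
Degree-granting institution	Graduate School of Smart Convergence, Kwangwoon University
Degree	Master's (domestic)
Department	Department of Smart Systems
Advisor	이종용
Year	2021
Language	kor (Korean)
Full-text URL	http://www.riss.kr/link?id=T15959474&outLink=K
Source	Korea Education and Research Information Service (KERIS)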
