The Study of Developing Korean SentiWordNet for Big Data Analytics : Focusing on Anger Emotion

최석재; 권오병

doi:10.7838/jsebs.2014.19.4.001


The Journal of Society for e-Business Studies (한국전자거래학회지), v.19 no.4, 2014, pp.1-19

최석재 (School of Management, Kyung Hee University), 권오병 (School of Management, Kyung Hee University)

Abstract (Korean)

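Efforts to extract the emotional information present in big data, and thereby understand how users perceive a particular target, are actively under way. The aim is to analyze sentences about products, movies, social issues, and the like, measure what views people hold on those topics, and derive concrete preferences. Obtaining the degree of emotion expressed in a sentence requires an emotion lexicon that supplies a list of emotion words and their degree values, so this study addresses how to discover emotion words and how to determine their degree values. The basic approach is to collect the list of basic emotion words and their degree values from prior research results and a direct survey, and to infer the extended list and its degree values from dictionary glosses (the explanatory part of each headword entry). The emotion words found in this way fall into four types: basic emotion words, which are prototypical; extended stratum-1 level-1 emotion words, which are used in the glosses of basic emotion words; extended stratum-2 level-1 emotion words, which are non-emotion words whose glosses contain basic or extended emotion words; and extended stratum-2 level-2 emotion words, for which a basic or extended emotion word appears in the gloss of a gloss. The degree values of the extended emotion words were obtained by applying sentence-pattern weights and an emphasis multiplier to the degree values of the basic emotion words. Experiments indicate that the AND and OR patterns carry a weight that averages the emotion degree values of the words they contain, while the Multiply pattern carries a weight of 1.2 to 1.5 depending on the type of degree adverb. The NOT pattern is presumed to reverse the emotion degree of the word used while lowering it by a certain amount. The emphasis multiplier applied to extended words is expected to be 2 at level 1 and 3 at level 2.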
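The four word types described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy dictionary `TOY_GLOSSES`, the seed set `BASIC`, the label strings, and the function name `classify` do not come from the paper. It also treats every gloss token as a candidate, so a real implementation would need filtering that the abstract does not describe, and it reads "extended emotion words" in the stratum-2 definitions as the stratum-1 words.

```python
# Toy dictionary: headword -> content words of its gloss.
# All entries are invented for illustration; they are not the paper's data.
TOY_GLOSSES = {
    "angry": ["furious", "displeased"],
    "furious": ["angry"],
    "displeased": ["annoyed"],
    "glare": ["look", "angry"],
    "scowl": ["glare"],
    "look": ["see"],
    "table": ["furniture"],
}
BASIC = {"angry"}  # assumed seed list of prototypical (basic) emotion words


def classify(glosses, basic):
    """Label words with one of four types following the abstract's description."""
    labels = {word: "basic" for word in basic}

    # Stratum 1, level 1: words used in the gloss of a basic emotion word.
    for word in basic:
        for token in glosses.get(word, []):
            labels.setdefault(token, "ext-1-1")

    seeds = set(labels)  # basic words plus stratum-1 words

    # Stratum 2, level 1: other headwords whose own gloss contains a seed word.
    level1 = {
        head
        for head, gloss in glosses.items()
        if head not in labels and seeds & set(gloss)
    }
    for head in level1:
        labels[head] = "ext-2-1"

    # Stratum 2, level 2: remaining headwords with a seed word in the gloss
    # of one of their gloss words (the "gloss of a gloss").
    for head, gloss in glosses.items():
        if head in labels:
            continue
        if any(seeds & set(glosses.get(token, [])) for token in gloss):
            labels[head] = "ext-2-2"

    return labels


labels = classify(TOY_GLOSSES, BASIC)
```

In this toy data, "glare" is reached through its own gloss and "scowl" only through the gloss of "glare", which is the distinction between the two stratum-2 levels; "look" and "table" stay unlabeled.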

Abstract

Efforts to identify how users perceive particular targets from the emotional information in big data are actively under way. Such work tries to measure people's views on products, movies, and social issues by analyzing statements posted on Internet bulletin boards or SNS. This study therefore addresses how to find emotional vocabulary and how to determine its degree values. For the basic emotional vocabulary and its degrees, the method uses the results of previous studies and a direct survey; for the extended emotional vocabulary, it infers them from dictionary glosses. The result is four lists of emotional words: the basic emotional list, the extended stratum-1 level-1 list drawn from the glosses of the basic vocabulary, the extended stratum-2 level-1 list drawn from the glosses of non-emotional words, and the extended stratum-2 level-2 list drawn from the glosses of glosses. The emotional degrees of the extended lists were obtained by applying sentence-pattern weights and emphasis multiplier values to the degrees of the basic emotional list. Experiments indicate that AND and OR sentences carry a weight that averages the degrees of the included words, and that MULTIPLY sentences carry a weight of 1.2 to 1.5 depending on the type of adverb. The NOT sentence is presumed to reverse the original word's emotional degree while reducing it by a certain amount. The emphasis multiplier is expected to be 2 at level 1 and 3 at level 2.
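As a rough illustration of the sentence-pattern weights, the following Python sketch combines degree values under stated assumptions. The abstract gives only the 1.2 to 1.5 range for MULTIPLY and says NOT lowers the degree "by a certain amount", so the adverb table `ADVERB_WEIGHT`, the damping value `NOT_DAMPING`, the signed-degree convention (positive for pleasant, negative for unpleasant), and all function names are hypothetical. The emphasis multiplier is left out because the abstract does not say how it is applied.

```python
from statistics import mean

# Hypothetical adverb weights; only the 1.2-1.5 range comes from the abstract.
ADVERB_WEIGHT = {"somewhat": 1.2, "very": 1.5}

# Assumed damping factor; the abstract gives no number for the NOT pattern.
NOT_DAMPING = 0.5


def combine_and_or(degrees):
    """AND / OR pattern: average the degree values of the included words."""
    return mean(degrees)


def apply_multiply(degree, adverb):
    """MULTIPLY pattern: scale a degree by the weight of its degree adverb."""
    return degree * ADVERB_WEIGHT[adverb]


def apply_not(degree, damping=NOT_DAMPING):
    """NOT pattern: reverse the sign and weaken the degree."""
    return -degree * damping


# Assumed signed degrees: positive = pleasant, negative = unpleasant.
not_glad = apply_not(0.5)                # weaker, reversed form of "glad"
very_angry = apply_multiply(-0.5, "very")
mixed = combine_and_or([-0.5, -1.0])     # two anger words joined by AND / OR
```

Under these assumptions a negated joy word yields a mild negative degree rather than a full-strength anger value, which matches the abstract's description of NOT as reversing and reducing at the same time.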

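Q&A

Keyword	Question	Answer extracted from the paper
	What is the primary reason SentiWordNet's accuracy is low?	The primary reason SentiWordNet's accuracy is low is that the glosses used as the basis for computing degree values are those of WordNet. WordNet is chiefly intended to establish relations between words, so it does not give detailed explanations the way a general dictionary does.
	What is an example showing that the emotion degree value is not preserved as-is when the NOT operator is used?	However, the emotion degree value will not be preserved as-is when the NOT operator is used. When a person is displeased or upset, the reason they negate a joy word, an indirect means, instead of directly saying 'displeased' (불쾌하다) or 'upset' (언짢다), is that their degree of anger is not that great. Strictly speaking, the expression means only 'it is not that I am glad.' As another example, 'not bad' (나쁘지 않다) does not mean 'good' to the same degree that 'bad' is bad; it means roughly 'average.'
	What should be considered first when selecting emotion words?	The first thing to consider when selecting emotion words is that the number of emotion words satisfying all the linguistic criteria is limited. For practical use, therefore, the word list needs to be extended on the basis of the prototypical emotion words that satisfy the conditions well, namely the basic emotion words.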
