[Thesis] A Text Mining Study on 『난장이가 쏘아올린 작은 공』: Interpreting Thematic Discourse through Hierarchical Clustering

유민

A Text Mining Study on 『The Dwarf』: focused on an interpretation of subject discussion by hierarchical clustering

유민 (Yonsei University Graduate School of Education, Korean Language Education major, domestic master's thesis)

Abstract (translated from Korean)

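This study offers a present-day answer to the question "Can a computer find the theme of a modern novel the way a human does?" In short, it can, though not as subtly as a human. To show this, the twelve linked stories of Cho Se-hee's 『난장이가 쏘아올린 작은 공』 (hereafter 『The Dwarf』) were text-mined with the open-source language R to derive their thematic discourses. The twelve stories contain several thematic discourses. To find them, the similarity between individual stories was measured with a mathematical algorithm, and the stories were then grouped by hierarchical clustering. The main keywords extracted from each cluster were combined, and the result resembled the themes a human reader perceives in 『The Dwarf』. The thesis is organized as follows.
Chapter 1 (Introduction) explains why a big-data study of the modern novel 『The Dwarf』 is warranted. Chapter 2 (Literature Review and Theoretical Background) surveys prior research on 『The Dwarf』 and on text mining, and explains the theoretical concepts needed to understand text mining. Chapter 3 (Methods and Procedure) describes in detail the preprocessing required for a computer to read 『The Dwarf』. Chapter 4 (Results) sets out the protagonists and thematic discourses of 『The Dwarf』 as reported by the computer. Finally, Chapter 5 summarizes the findings and suggests directions for further work.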
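The findings can be summarized as follows.
First, the narrative of 『The Dwarf』 unfolds around its characters. This is shown statistically by computing character appearance frequencies from the document-term matrix (DTM) of the series.
Second, the focal character identified by the computer agrees with the one judged by a human reader in 75% of the stories: 9 of the 12 match and 3 do not. The mismatches were related to the computer's inability to grasp "point of view".
Third, hierarchical clustering of 『The Dwarf』 yields seven clusters. Three clusters reveal the differing circumstances and positions of individual characters against the backdrop of the "factory". Another three present the circumstances and concerns of middle-class characters. The remaining cluster, which joins 『뫼비우스의 띠』 (Möbius Strip) and 『에필로그』 (Epilogue), showed a peculiarity of form that makes a thematic discourse hard to derive.
Fourth, the thematic discourses derived by the mathematical algorithm are abstract. Frequency of occurrence can yield the main keywords, but assembling those keywords into sentence-level statements is, at the current level of technology, still left to humans. In other words, a human must compare the keywords with the text of the novel to judge whether a thematic discourse is valid, so human interpretation is still needed as a support.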
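The clustering step described above can be sketched roughly as follows. The thesis itself used R; this is a minimal Python illustration only, not the author's code. The toy documents, the function names, and the choice of complete linkage are all assumptions (complete linkage is the default of R's `hclust`, but the abstract does not say which linkage was used). Euclidean distance is taken from the thesis keywords.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Toy stand-ins for tokenized stories (hypothetical data, not the thesis corpus).
docs = {
    "story_a": ["factory", "worker", "machine", "factory"],
    "story_b": ["factory", "worker", "union", "factory"],
    "story_c": ["house", "lawyer", "garden", "house"],
    "story_d": ["house", "lawyer", "garden", "car"],
}

def build_dtm(docs):
    """Return (vocabulary, {doc name: term-count vector}) as a document-term matrix."""
    vocab = sorted({t for tokens in docs.values() for t in tokens})
    dtm = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        dtm[name] = [counts[t] for t in vocab]
    return vocab, dtm

def complete_linkage(c1, c2, dtm):
    """Cluster distance = largest Euclidean distance between any two members (assumed linkage)."""
    return max(dist(dtm[a], dtm[b]) for a in c1 for b in c2)

def agglomerate(dtm, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in sorted(dtm)]
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: complete_linkage(pair[0], pair[1], dtm))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

def top_keywords(cluster, vocab, dtm, k=2):
    """Return the k most frequent terms across the documents in one cluster."""
    totals = [sum(dtm[d][i] for d in cluster) for i in range(len(vocab))]
    ranked = sorted(zip(vocab, totals), key=lambda p: (-p[1], p[0]))
    return [term for term, _ in ranked[:k]]

vocab, dtm = build_dtm(docs)
clusters = agglomerate(dtm, n_clusters=2)
for cluster in clusters:
    print(sorted(cluster), top_keywords(cluster, vocab, dtm))
```

On this toy data the two factory stories and the two household stories end up in separate clusters, and the per-cluster keywords are only word lists. Turning them into a stated theme remains a human task, which is the limitation the fourth finding describes.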

Abstract

This study offers an answer to the question "Can computers find the subject of a modern novel as humans do?" In short, they can, although their answer is less detailed than a human's. To show this, the study text-mines the twelve-story series 『The Dwarf』 by Cho, Se-hee with the open-source language R in order to derive its subject discourses. The series contains several subject discourses. To find them, the study measures the similarity among the individual stories with a mathematical algorithm and groups them by hierarchical clustering. It then extracts the key words of each cluster and combines them, and the result resembles readers' own appreciation of the work. The study is organized as follows.
Chapter 1 explains the need for big-data research on the modern novel 『The Dwarf』. Chapter 2, the literature review and theoretical background, covers prior studies of 『The Dwarf』 and of text mining, together with the theoretical concepts needed to understand text mining. Chapter 3 describes in detail the pre-processing steps a computer needs in order to read 『The Dwarf』. Chapter 4 presents the results: the protagonists and subject discourses of 『The Dwarf』 as reported by the computer. Finally, Chapter 5 summarizes the conclusions of the study and suggests directions for further work.
The results of the study are summarized as follows.
First, the narrative of 『The Dwarf』 is centered on its characters. The study supports this with statistical figures, calculating the characters' appearance frequencies from the DTM of the series.
Second, the main character identified by the computer matches the main character chosen by a reader in only 75% of the stories: 9 of the 12 stories match and 3 do not. The reason is related to the fact that the computer does not understand the concept of "point of view".
Third, hierarchical clustering yields 7 groups in 『The Dwarf』. Three groups show the different situations and positions of individual characters against the background of the "factory". Another three groups show the lives and concerns of middle-class characters. The last group consists of 『Mobius Strip』 and 『Epilogue』, and its distinctive form makes it difficult to derive a subject discourse.
Fourth, the study finds that the subject discourses derived by the mathematical algorithm are abstract. Main key words can be extracted from their frequency of appearance, but at the current level of technology only humans can assemble those key words into sentences. In other words, humans have to judge the validity of a subject discourse by comparing the main key words with the text of the novel, so human interpretation is still needed to assist the process.
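The first two findings rest on simple counting, which the following Python sketch illustrates. The thesis used R, and none of this is the author's code. The token list, the character names, and the story labels are hypothetical. The idea that a first-person narrator is mostly referred to as "I" rather than by name is offered only as one plausible reading of the "point of view" problem.

```python
from collections import Counter

def focal_character(tokens, characters):
    """Return the character name that occurs most often in one story's tokens."""
    counts = Counter(t for t in tokens if t in characters)
    return counts.most_common(1)[0][0] if counts else None

def agreement_rate(computed, human):
    """Share of stories where the computed focal character equals the human label."""
    matches = sum(computed[s] == human[s] for s in human)
    return matches / len(human)

# Hypothetical first-person story: the narrating son says "I", so his name is rare
# and the frequency count picks the father instead.
characters = {"son", "father"}
tokens = ["I", "father", "I", "son", "father", "I"]
print(focal_character(tokens, characters))

# Hypothetical labels arranged to reproduce the reported ratio: 9 of 12 stories agree.
human = {f"story_{i}": "X" for i in range(1, 13)}
computed = dict(human)
for s in ("story_1", "story_2", "story_3"):
    computed[s] = "Y"
print(agreement_rate(computed, human))
```

With 9 matching stories out of 12 the rate is 9 / 12 = 0.75, the 75% figure given in the abstract.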

Thesis Information

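Author	유민
Degree-granting institution	Yonsei University Graduate School of Education
Degree type	Master's (domestic)
Department	Korean Language Education
Advisor	정희모
Year of publication	2019
Total pages	viii, 123p.
Keywords	난장이가 쏘아올린 작은 공, big data, R, character, theme, discourse, text mining, preprocessing, hierarchical clustering, word cloud, Euclidean distance, frequency, document-term matrix, similarity, distance, cluster, factory
Language	kor
Original URL	http://www.riss.kr/link?id=T15014689&outLink=K
Source	한국교육학술정보원 (Korea Education and Research Information Service, KERIS)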


