[논문]제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템

윤태진; 조환규; 정우근

제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템
A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments 원문보기

정보과학회논문지. Journal of KIISE. 소프트웨어 및 응용, v.37 no.10, 2010년, pp.788 - 801

윤태진 (부산대학교 정보컴퓨터공학부) , 조환규 (부산대학교 정보컴퓨터공학부) , 정우근 (부산대학교 정보컴퓨터공학부)

초록
AI-Helper

모바일 기기가 발전함에 따라 입력 수단에 대한 연구는 중요한 이슈이다 키패드, 쿼티키패드, 터치, 음성인식 등 다양한 입력장치가 사용되고 있으나 아직 데스크톱 입력장치에 비해 편의성이 떨어져서 입력 시의 오타나 탈자 등의 오류가 포함되는 경우가 많다. 이러한 입력 오류는 문자 메시지 등 사람과의 의사소통에는 문제를 일으키지 않으나 사전, 주소록 등의 데이터베이스 검색에는 치명적인 오류로서 원하는 검색 결과를 얻지 못하게 된다. 특히 한글의 경우 자음과 모음의 조합을 통해 글자를 생성하는 특성상 1만자가 넘는 글자의 조합이 가능하여 영문에 비하여 오류의 빈도가 높다. 기존의 검색 시스템은 Suffix Tree등을 이용하여 입력 오류를 처리하지만 다양한 오류에 대응하기에는 한계가 있다. 본 논문에서는 오자, 탈자 등의 입력 오류를 허용하면서 빠른 검색이 가능한 근사 한글 단어 검색 시스템을 제안하고자 한다. 이 시스템은 기존의 알파벳에 적용된 근사 문자열 검색(Approximate String Searching)을 한글에 효과적으로 적용할 수 있는 여러 가지 알고리즘과 기법이 포함되어 있다. 그리고 제안된 시스템을 이용한 변형 욕설 필터링 시스템의 개발에 대해 이야기하고자 한다. 이 시스템은 유저의 각종 변형 욕설 입력에 대해 90% 이상의 필터링 성능을 보였다.

Abstract ▼ AI-Helper

Advancing of mobile device is remarkable, so the research on mobile input device is getting more important issue. There are lots of input devices such as keypad, QWERTY keypad, touch and speech recognizer, but they are not as convenient as typical keyboard-based desktop input devices so input strings usually contain many typing errors. These input errors are not trouble with communication among person, but it has very critical problem with searching in database, such as dictionary and address book, we can not obtain correct results. Especially, Hangeul has more than 10,000 different characters because one Hangeul character is made by combination of consonants and vowels, frequency of error is higher than English. Generally, suffix tree is the most widely used data structure to deal with errors of query, but it is not enough for variety errors. In this paper, we propose fast approximate Korean word searching system, which allows variety typing errors. This system includes several algorithms for applying general approximate string searching to Hangeul. And we present profanity filters by using proposed system. This system filters over than 90% of coined profanities.

주제어

참고문헌 (21)

A. Apostolico. The myriad virtues of subword trees. Combinatorial Algorithms on Words, pp.85-96, 1985.
W. A. Burkhard and R. M. Keller. Some approaches to best-match file searching. Commun. ACM, vol.16, no.4, pp.230-236, 1973.

상세보기
Chang-Keon Ryu, Hyong-Jun Kim, Seung-Hyun Ji, Gyun Woo, and Hwan-Gue Cho. Detecting and tracing plagiarized documents by reconstruction plagiarism-evolution tree. Computer and Information Technology, 2008. CIT 2008. 8th IEEE International Conference on, pp.119-124, July 2008.
Hyong-Jun Kim, Chang-Keon Ryu, and Hwan-Gue Cho. A detecting and tracing algorithm for unauthorized internet-news plagiarism using spatiotemporal document evolution model. In SAC '09: Proceedings of the 2009 ACM symposium on Applied Computing, pp.863-868, New York, NY, USA, 2009. ACM.
Chang-Keon Ryu, Hyong-Jun Kim, and Hwan-Gue Cho. Reconstructing evolution process of documents in spatio-temporal analysis. In ICCIT '08: Proceedings of the 2008 Third International Conference on Convergence and Hybrid Information Technology, pp.136-142, Washington, DC, USA, 2008. IEEE Computer Society.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, and D.J. Basic local alignment search tool. Journal of Molecular Biology., 215, 1990.
K. M. Chao and L. Zhang, Sequence Comparison Theory and Methods, Springer, 2009.
Sreenivas Gollapudi and Rina Panigrahy. A dictionary for approximate string search and longest prefix search. In CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management, pp.768-775, New York, NY, USA, 2006. ACM.
Trinh N. D. Huynh, Wing-Kai Hon, Tak-Wah Lam, and Wing-Kin Sung. Approximate string matching using compressed suffix arrays. Theoretical Computer Science, vol.352, no.1, pp.240-249, 2006.

상세보기
Marios Hadjieleftheriou, Nick Koudas, and Divesh Srivastava. Incremental maintenance of length normalized indexes for approximate string matching. In SIGMOD '09: Proceedings of the 35th SIGMOD international conference on Management of data, pp.429-440, New York, NY, USA, 2009. ACM.
Gonzalo Navarro and Edgar Chavez. A metric index for approximate string matching. Theoretical Computer Science, vol.352, no.1, pp.266-279, 2006.

상세보기
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The r*-tree: an efficient and robust access method for points and rectangles. In SIGMOD '90: Proceedings of the 1990 ACM SIGMOD international conference on Management of data, pp.322-331, New York, NY, USA, 1990. ACM.
Antonin Guttman. R-trees: a dynamic index structure for spatial searching. Readings in database systems, pp.599-609, 1988.
http://en.wikipedia.org/wiki/swearfilter.
Korea Game Industry Agency, Sound Game language guide research, 2008.
Shekhar Dhupelia. Designing a vulgarity filtering system. In Game Programming Gems 5. Charles River Media, 2005.
Lai C. An empirical study of three machine learning methods for spam filtering. Know-Based System, vol.20, no.3, pp.249-254, 2007.

상세보기
Kyo Hyeon Park and Jee Hyong Lee. Developing a vulgarity filtering system for online games using svm. In Proceedings of the Korean Institute of Information Scientists and Engineers Autumn, 2006.
Ramachandran A Feamster N and Vempala S. Filtering spam with behavioral blacklisting. In Proceedings of the 14th ACM Conference on Computer and Communications Security (Alexandria, Virginia), pp.342-351, 2001.
Imoxion, Lewdness/Profanity Filtering System Using Syllable Information, patent, In 2001-0067853, 2001.
http://en.wikipedia.org/wiki/scunthorpe problem.

저자의 다른 논문 :

LOADING...

내보내기 구분	파일저장 인쇄 메일전송
구성항목	기본정보 상세정보 관리번호, 논문명, 저널/프로시딩명, 저자 , 발행년, 권, 호, 시작페이지, 끝페이지, 발행기관 관리번호, 논문명, 대등논문명, 저자 , 저널/프로시딩명, 발행기관, 발행년, 발행언어, 권, 호, 시작페이지, 끝페이지, ISBN, ISSN, 주제분야, 키워드, 초록(한글), 초록(영문), 저자(소속기관)
저장형식	Text(ASCII format) Excel format RefWorks Direct Export RIS format (for Reference Manager, ProCite, EndNote), Scholar's Aids, Mendeley
메일정보	받는사람 (필수) @ 보내는사람 (선택) @ 제목 내용 KISTI 검색결과 이메일 서비스
안내	총 건의 자료가 검색되었습니다. 다운받으실 자료의 인덱스를 입력하세요. (1-10,000) 검색결과의 순서대로 최대 10,000건 까지 다운로드가 가능합니다. 데이타가 많을 경우 속도가 느려질 수 있습니다.(최대 2~3분 소요) 다운로드 파일은 UTF-8 형태로 저장됩니다. 파일의 내용이 제대로 보이지 않을실 때는 웹브라우저 상단의 보기 -> 인코딩 -> 자동선택 여부를 확인하십시오. ~ Text(ASCII format) Excel format

연합인증

제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템
A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments 원문보기

초록
AI-Helper

Abstract ▼ AI-Helper

주제어

참고문헌 (21)

이 논문을 인용한 문헌

저자의 다른 논문 :

연구과제 타임라인

관련 콘텐츠

원문 보기

원문 URL 링크

이 논문과 함께 이용한 콘텐츠

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

선택된 텍스트

연합인증

제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템 A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments 원문보기

초록 용어보기논문에서 용어와 풀이말을 자동 추출한 결과로, 시범 서비스 중입니다. AI-Helper

Abstract ▼ AI-Helper

주제어

참고문헌 (21)

이 논문을 인용한 문헌

저자의 다른 논문 :

윤태진 (1) 조환규 (80) 정우근 (2)

연구과제 타임라인

전체(0) 논문(0) 특허(0) 보고서(0)

전체(0) 논문(0) 특허(0) 보고서(0)

관련 콘텐츠

원문 보기

원문 URL 링크

이 논문과 함께 이용한 콘텐츠

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

선택된 텍스트

제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템
A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments 원문보기

초록
AI-Helper