[Thesis] SKSpark: A Distributed Spatial Computing Framework Supporting Spatial Keyword Queries

양평우

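양평우 (Kunsan National University (群山大學校), Department of Computer and Information Engineering, Computer and Information Engineering major; domestic doctorate)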

Korean Abstract

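The spread of mobile devices with built-in GPS has made location-based services easy for ordinary users to use. Users can tag location information when they post content such as text or photos on services such as SNS or microblogs, and companies can advertise and market based on location information. Content that includes location information in this way is called a spatial web object. As interest in location-based services grows and the number of users of services such as SNS increases, the volume of spatial web objects being generated is growing rapidly. To search these spatial web objects, a spatial keyword query is used, which searches spatial and textual information together. Existing studies on spatial keyword queries assume centralized processing and are therefore not suited to large volumes of data. Research is therefore needed on ways to process large volumes of spatial web objects quickly.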
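To process large volumes of spatial web objects quickly, this thesis designs and implements SKSpark, a distributed spatial computing framework that supports efficient spatial keyword queries and search in a distributed processing environment.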
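SKSpark is a distributed spatial computing framework that supports in-memory spatial queries and spatial partitioning on top of a distributed processing environment. SKSpark provides a SpatialKeyword Data Layer, a GeoPartition Layer, and an SKIndex Layer to support spatial data and spatial web objects efficiently. The SpatialKeyword Data Layer transforms raw data into a form that can store both spatial and textual information. The GeoPartition Layer partitions the whole dataset by spatial information so that each cluster processes a similar amount of data and the clusters are used efficiently. The SKIndex Layer keeps one index per partition so that queries run against that index. It can use spatial indexes and spatial keyword indexes, and can process several kinds of queries, including spatial queries, spatial keyword queries, and word queries.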
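To use the distributed system efficiently, the proposed framework partitions the whole dataset by spatial information to build a global grid whose cells serve as partitions, and has each cluster process one grid cell. A local index is also built on each cluster to make search more efficient. The global grid can use spatial partitioning schemes such as STR and BSP, and the local index built on a cluster can be either an index that uses spatial information only or a spatial keyword index that uses both spatial and textual information.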
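Existing spatial keyword indexing techniques have mostly used the R-tree and inverted files. The R-tree is the most widely used spatial index, but keeping its nodes balanced is costly. In addition, because an inverted file is created for every node of the base index, the tree becomes large. This thesis therefore proposes the QP-tree, which reduces index size while performing efficiently in an in-memory computing environment. The QP-tree is a hybrid index that uses a Quad-tree, chosen for its efficiency in in-memory computing, together with a Patricia-trie for textual information: the internal nodes of the Quad-tree store spatial information only, and each leaf node uses a Patricia-trie for the textual information it contains.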
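This thesis demonstrates the advantages of the proposed framework through experiments on real social microblog data. The framework can be used effectively for services that rely on spatial keyword queries, such as a social network service that searches for interests based on the user's location, or a region-by-region survey of preferences for election candidates. It can also serve as a base system for efficiently analyzing the genetic information of animals and plants by supporting fast search over microarray data, in which thousands to tens of thousands of kinds of gene string sequences are spatially distributed on a surface of about 1 cm².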

Abstract

The spread of mobile devices with built-in GPS makes it easy for ordinary users to take advantage of location-based services. Users can tag the content they post on SNS or microblogging services with location information, and companies can advertise or market their products based on location information. Content that includes location information is called a spatial web object. As interest in location-based services grows and the number of SNS users increases, the volume of spatial web objects is growing explosively. To search for spatial web objects, a spatial keyword query is used, which searches textual and spatial information together. Existing studies on spatial keyword queries are not suited to massive data because they assume a centralized processing system. It is therefore necessary to study how to process large numbers of spatial web objects quickly.
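To make the query type concrete, here is a minimal sketch of a Boolean range spatial keyword query evaluated by a linear scan on a single machine. It illustrates only the query semantics and the centralized baseline; it is not the thesis implementation. The names `SpatialWebObject` and `range_keyword_query`, the sample coordinates, and the choice of "rectangle plus all keywords" semantics are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialWebObject:
    """Hypothetical record: a location plus the keywords of the posted content."""
    obj_id: int
    x: float              # e.g. longitude
    y: float              # e.g. latitude
    keywords: frozenset   # lower-cased terms


def range_keyword_query(objects, rect, query_words):
    """Return objects inside rect = (xmin, ymin, xmax, ymax) that contain every query word."""
    xmin, ymin, xmax, ymax = rect
    wanted = frozenset(w.lower() for w in query_words)
    return [
        o for o in objects
        if xmin <= o.x <= xmax and ymin <= o.y <= ymax and wanted <= o.keywords
    ]


# Made-up sample data, not taken from the thesis.
posts = [
    SpatialWebObject(1, 126.71, 35.97, frozenset({"coffee", "harbor"})),
    SpatialWebObject(2, 126.74, 35.95, frozenset({"coffee", "museum"})),
    SpatialWebObject(3, 127.10, 37.50, frozenset({"coffee"})),
]
hits = range_keyword_query(posts, (126.6, 35.9, 126.8, 36.0), ["coffee"])
```

Every query touches every object, which is the cost that partitioning and indexing aim to avoid as the dataset grows.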
This paper designs and implements SKSpark, a distributed spatial computing framework that supports spatial keyword queries in a distributed processing environment, in order to process and search spatial web objects efficiently.
SKSpark is a distributed spatial computing framework that supports spatial queries and spatial partitioning on an in-memory distributed processing environment. The proposed framework supports spatial data and spatial web objects efficiently through three layers: the SpatialKeyword Data Layer, the GeoPartition Layer, and the SKIndex Layer. The SpatialKeyword Data Layer transforms raw data into a form that can store spatial and textual information. The GeoPartition Layer partitions the whole dataset based on spatial information so that each cluster processes a similar amount of data and the clusters are used efficiently. The SKIndex Layer keeps one index in each partition and runs queries against that index. It can use spatial indexes and spatial keyword indexes, and can process various queries such as spatial queries, spatial keyword queries, and word queries.
To use the distributed system efficiently, the proposed framework divides the entire dataset based on spatial information to generate a global grid, uses the grid cells as partitions, and has all clusters process the same number of partitions. In addition, each cluster has a local index to improve search efficiency. The global grid can be built with spatial partitioning schemes such as STR and BSP, and the local index constructed in a cluster can be either an index with spatial information only or a spatial keyword index with spatial and textual information.
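The idea of a balanced global partitioning with partition pruning can be sketched as follows. This is a simplified STR-style tiling (sort by x into vertical slabs, then sort each slab by y into tiles) over plain Python lists, written under the assumption that STR here refers to Sort-Tile-Recursive. The function names are hypothetical, and in a Spark setting each tile would correspond to a partition rather than a list.

```python
import math


def str_partition(points, num_partitions):
    """Split (x, y) points into roughly num_partitions tiles holding similar numbers of points."""
    if not points:
        return []
    slabs = max(1, math.ceil(math.sqrt(num_partitions)))
    by_x = sorted(points)                          # order by x (ties broken by y)
    slab_size = math.ceil(len(by_x) / slabs)
    tiles = []
    for i in range(0, len(by_x), slab_size):
        slab = sorted(by_x[i:i + slab_size], key=lambda p: p[1])   # order the slab by y
        tile_size = math.ceil(len(slab) / slabs)
        for j in range(0, len(slab), tile_size):
            tiles.append(slab[j:j + tile_size])
    return tiles


def bounding_box(tile):
    xs = [p[0] for p in tile]
    ys = [p[1] for p in tile]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(tiles, rect):
    """Scan only the tiles whose bounding box intersects rect, then filter their points."""
    hits = []
    for tile in tiles:
        if intersects(bounding_box(tile), rect):
            hits.extend(p for p in tile
                        if rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3])
    return sorted(hits)


# Made-up data: an 8 x 8 lattice of points split into 4 tiles.
points = [(x, y) for x in range(8) for y in range(8)]
tiles = str_partition(points, 4)
found = range_query(tiles, (0, 0, 1, 1))
```

In this sketch the per-tile scan stands in for the local index; replacing it with a spatial or spatial keyword index per tile gives the two-level structure the abstract describes.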
Conventional spatial keyword indexes use the R-tree and inverted files. The R-tree is the most commonly used spatial indexing technique, but keeping its nodes balanced is costly. In addition, because an inverted file is generated for each node of the base index, the size of the tree increases. This paper therefore proposes the QP-tree, which reduces the size of the index while performing better than the conventional technique. The QP-tree is a hybrid index that uses a Quad-tree for spatial information and a Patricia-trie for textual information. A non-leaf node of the Quad-tree stores only spatial information; a leaf node stores spatial information and keeps its textual information in a Patricia-trie.
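The hybrid structure described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the thesis code: internal quad-tree nodes hold only their bounding box, and each leaf holds its points plus a compressed character trie that stands in for the Patricia-trie. The class names, the leaf capacity, the depth limit, and the single-keyword query are all assumptions.

```python
import os


class RadixTrie:
    """Compressed (Patricia-style) trie mapping a keyword to the ids of objects containing it."""
    def __init__(self):
        self.children = {}   # edge label -> RadixTrie
        self.ids = set()     # ids whose keyword ends at this node

    def insert(self, word, obj_id):
        node = self
        while word:
            for label in list(node.children):
                k = len(os.path.commonprefix([label, word]))
                if k == 0:
                    continue
                child = node.children[label]
                if k < len(label):                 # split the edge at the shared prefix
                    mid = RadixTrie()
                    mid.children[label[k:]] = child
                    del node.children[label]
                    node.children[label[:k]] = mid
                    child = mid
                node, word = child, word[k:]
                break
            else:                                  # no edge shares a prefix: add a new edge
                leaf = RadixTrie()
                node.children[word] = leaf
                node, word = leaf, ""
        node.ids.add(obj_id)

    def lookup(self, word):
        node = self
        while word:
            for label, child in node.children.items():
                if word.startswith(label):
                    node, word = child, word[len(label):]
                    break
            else:
                return set()
        return set(node.ids)


class QPTree:
    """Point quad-tree: internal nodes keep only a box, leaves keep points and a trie."""
    def __init__(self, box, capacity=4, depth=0, max_depth=16):
        self.box = box                  # (xmin, ymin, xmax, ymax)
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points = {}                # leaf only: obj_id -> (x, y, keywords)
        self.trie = RadixTrie()         # leaf only
        self.children = None            # internal only: [SW, SE, NW, NE]

    def _child_for(self, x, y):
        xmin, ymin, xmax, ymax = self.box
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        return self.children[(1 if x >= cx else 0) + (2 if y >= cy else 0)]

    def insert(self, obj_id, x, y, keywords):
        if self.children is not None:
            self._child_for(x, y).insert(obj_id, x, y, keywords)
            return
        self.points[obj_id] = (x, y, keywords)
        for w in keywords:
            self.trie.insert(w, obj_id)
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        xmin, ymin, xmax, ymax = self.box
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        boxes = [(xmin, ymin, cx, cy), (cx, ymin, xmax, cy),
                 (xmin, cy, cx, ymax), (cx, cy, xmax, ymax)]
        self.children = [QPTree(b, self.capacity, self.depth + 1, self.max_depth)
                         for b in boxes]
        old, self.points, self.trie = self.points, {}, None   # node becomes internal
        for obj_id, (x, y, kws) in old.items():
            self._child_for(x, y).insert(obj_id, x, y, kws)

    def query(self, rect, word):
        """Ids of objects inside rect whose keywords include word."""
        b = self.box
        if rect[0] > b[2] or rect[2] < b[0] or rect[1] > b[3] or rect[3] < b[1]:
            return set()
        if self.children is not None:
            return set().union(*(c.query(rect, word) for c in self.children))
        return {i for i in self.trie.lookup(word)
                if rect[0] <= self.points[i][0] <= rect[2]
                and rect[1] <= self.points[i][1] <= rect[3]}


# Made-up sample data.
tree = QPTree((0.0, 0.0, 100.0, 100.0), capacity=2)
tree.insert(1, 10, 10, {"coffee", "harbor"})
tree.insert(2, 20, 15, {"coffee"})
tree.insert(3, 80, 80, {"cola"})
tree.insert(4, 12, 40, {"museum", "coffee"})
result = tree.query((0, 0, 30, 30), "coffee")
```

Because text is attached only at the leaves, a query first prunes by space and then consults the trie of each surviving leaf, with no per-node inverted file on internal nodes.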
Experiments on real social microblog data demonstrate the performance of the system implemented with the proposed framework. The framework can be used effectively for services that rely on spatial keyword queries, such as a social network service that searches for interests based on a user's location or a regional preference survey for election candidates. It can also serve as an infrastructure system for efficiently analyzing the genetic information of plants and animals by supporting fast search over microarray data, in which thousands to tens of thousands of kinds of gene sequences are spatially distributed on a surface of about 1 cm².

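Thesis information

Author	양평우
Degree-granting institution	Kunsan National University (群山大學校)
Degree	Doctorate (domestic)
Department	Department of Computer and Information Engineering, Computer and Information Engineering major (컴퓨터情報工學科 컴퓨터情報工學專攻)
Advisor	南光祐
Year of publication	2017
Total pages	vi, 98 leaves
Language	kor
Full-text URL	http://www.riss.kr/link?id=T14617404&outLink=K
Source	Korea Education and Research Information Service (한국교육학술정보원)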
