[Paper] Classification of Web Search Engines and Necessity of a Hybrid Search Engine

백주련

doi:10.9728/dcs.2018.19.4.719


Journal of Digital Contents Society, Vol. 19, No. 4, 2018, pp. 719-729

Abstract (Korean version)

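As of 2017, Google was the search engine with an overwhelming share of more than 90% on both desktop and mobile, so most people probably assume that the area Google searches is the entire web. According to web research, however, only about 10% of all web data is searchable by Google. Most of the web is called the deep web and is searched by search engines of a different kind from Google. These engines build their own deep-web databases and then apply specialized algorithms to deliver highly accurate, domain-specific search results. None of the search engines currently in use searches the entire web. The best way to perform searches that are broad, valid, accurate, and fast is to apply a general search engine such as Google and deep-web search engines simultaneously and derive the results from all of them. This paper calls such a search engine a hybrid search engine, examines its differences from and characteristics relative to existing search engines, and then presents an outline framework for it.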

Abstract (English)

In 2017, Google was reported to hold more than 90% of the search-engine market share on desktop and mobile. Most people may assume that Google searches the entire web. Surprisingly, however, many studies of web data indicate that Google searches less than 10% of it. Most of the remainder is called the Deep Web, and it is indexable by special search engines, which differ from Google because they focus on a specific segment of interest. Those engines build their own deep-web databases and run particular algorithms to provide accurate, specialized search results. Currently, no search engine indexes the entire Web. The best way to search as broadly and efficiently as possible is to use several search engines together. This paper defines such a search engine as a Hybrid Search Engine and describes its characteristics and its differences from conventional search engines, along with a framework for a hybrid search engine.
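The abstract describes the hybrid search engine only at the framework level: the same query is sent to a general engine and to several deep-web engines at once, and their results are combined into one list. The Python sketch below illustrates that flow under stated assumptions. Each engine is a hypothetical callable that returns a ranked list of URLs (no real search API is used), and the ranked lists are merged with reciprocal rank fusion (RRF), a standard rank-merging technique swapped in here for illustration. It is not necessarily the merging method the paper proposes; the paper's reference list surveys several result-merging strategies for metasearch engines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical engine interface: query string -> ranked list of result IDs (e.g. URLs).
# Stand-ins for a general engine and deep-web engines; not a real search API.
Engine = Callable[[str], List[str]]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists with reciprocal rank fusion (RRF).

    Each result earns 1 / (k + rank) from every list it appears in (rank
    starts at 1). Results are sorted by total score, with ties broken by
    identifier so the output is deterministic.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Send the query to every engine concurrently, then merge the ranked lists."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Submit all engine calls first so they run in parallel.
        futures = [pool.submit(engine, query) for engine in engines.values()]
        rankings = [f.result() for f in futures]
    return reciprocal_rank_fusion(rankings)


if __name__ == "__main__":
    # Hypothetical stub engines for illustration only.
    engines = {
        "general": lambda q: ["a.com", "b.com", "c.com"],
        "medical_deep_web": lambda q: ["pubmed/1", "b.com"],
        "legal_deep_web": lambda q: ["case/7"],
    }
    print(hybrid_search("aspirin", engines))
```

With these stubs the merged list is `["b.com", "a.com", "case/7", "pubmed/1", "c.com"]`: `b.com` is ranked first because two engines return it. The constant `k = 60` is the value commonly used with RRF. A production system would also need per-engine timeouts and deduplication of equivalent URLs.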


Q&A

Keyword	Question	Answer extracted from the paper
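	Why can search engines not access the deep web?	The problem is that the deep web cannot be accessed with the search engines we use so readily, such as Google, Yahoo, and Bing. This is because most of the data sources that make up the deep web are built as specialized databases: unauthorized users cannot access them at all, and their data do not permit indexing by general search engines. Given both the constantly changing surface-web data and the deep-web data, whose volume cannot even be measured, the ability to extract information that meets users' needs from the web as accurately, as quickly, and in as large a quantity as possible is arguably an essential capability for search engines in a data-driven big-data world.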
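	What is deep-web data?	Deep-web data, which makes up more than 90% of all web information (generally considered to be about 96%; footnote 2), can be described as domain-specific data that conveys high-level knowledge far more specialized and accurate than the data on the surface web. Figure 1 also shows a broad classification of the specialized domains that make up the deep web.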
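	What is the problem with the growth of deep-web data?	Moreover, the deep web is the fastest-growing area of the entire web. The problem is that it cannot be accessed with the search engines we use so readily, such as Google, Yahoo, and Bing. This is because most of the data sources that make up the deep web are built as specialized databases: unauthorized users cannot access them at all, and their data do not permit indexing by general search engines.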

References (26)

Total number of websites. Available: http://www.internetlivestats.com/total-number-of-websites/
C. Asselin, Discover and exploit the invisible web for competitive intelligence, Digimind, New York, 2006.
M. K. Bergman (2001, August). "The deep web: surfacing hidden value," The Journal of Electronic Publishing [Online]. 7(1). Available: https://brightplanet.com/2012/06/the-deep-web-surfacing-hidden-value/
B. He, M. Patel, Z. Zhang, and K. C.-C. Chang, "Accessing the deep web," Communications of the ACM, Vol. 50, No. 5, pp. 95-101, May 2007.
Y. Ru and E. Horowitz, "Indexing the invisible web: a survey," Online Information Review, Vol. 29, No. 3, pp. 249-265, 2005.
A. Alba, V. Bhagwan, and T. Grandison, "Accessing the deep web: when good ideas go bad," in Proceedings Companion to the 23rd ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, Nashville, USA, pp. 815-818, October 19-23, 2008.
A. Ghani (2017, March). How to access the deep web safely [Internet]. Available: http://www.techglows.com/access-deep-web-safely/.
S. Lawrence and C. L. Giles, "Searching the world wide web," Science Magazine, Vol. 280, No. 5360, pp. 98-100, April 1998.
A. Gulli and A. Signorini, "The indexable web is more than 11.5 billion pages," in Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan, pp. 902-903, May 10-14, 2005.
W. B. Croft, D. Metzler, and T. Strohman, Search Engines: Information Retrieval in Practice, Pearson, pp. 1-28, 2009.
ATLAS Research & Consulting, The global trends for the post-Google and the requirements for the next generation of the search engines, DigiEco, June 2008. Available: http://digieco.co.kr/KTFront/report/report_issue_trend_view.action?board_idstrategy&board_seq756&sort_ordernew&list_page#
B. A. Galitsky and B. Kovalerchuk, "Building a repository of background knowledge using semantic skeletons," in Proceedings of AAAI Spring Symposium 2006 - Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, CA, USA, pp. 22-27, March 27-29, 2006.
A. McCallum, K. Nigam, J. Rennie, and K. Seymore, "A machine learning approach to building domain-specific search engines," in Proceedings of the 16th International Joint Conference on Artificial Intelligence, Vol.2, Stockholm, Sweden, pp. 662-667, July 31 - August 6, 1999.
E. J. Glover, S. Lawrence, W. P. Birmingham, and C. L. Giles, "Architecture of a metasearch engine that supports user information needs," in Proceedings of the 8th International Conference on Information and Knowledge Management, Missouri, USA, pp. 210-216, November 2-6, 1999.
M. Nanoj and J. Elizabeth, "Information retrieval on internet using meta-search engines: a review," Journal of Scientific & Industrial Research, Vol. 67, No. 10, pp. 739-746, October 2008.
R. Shettar and R. Bhuptani, "A vertical search engine - based on domain classifier," International Journal of Computer Science and Security, Vol. 2, No. 4, pp. 18-27, November 2008.
M. Cui and S. Hu, "Search engine optimization research for website promotion," in Proceedings of International Conference on Information Technology, Computer Engineering and Management Sciences, Jiangsu, China, pp. 100-103, September 24-25, 2011.
G. Luo, C. Tang, H. Yang, and X. Wei, "MedSearch: a specialized search engine for medical information retrieval," in Proceedings of the 17th ACM Conference on Information and Knowledge Management, California, USA, pp. 143-152, October 26-30, 2008.
X. Y. Xu and D. Zhao, "Research on the development of vertical search engines," Advances in Future Computer and Control Systems, Vol. 1, pp. 579-584, 2012.
Dogpile.com, University of Pittsburgh, and Pennsylvania State University, Different Engines, Different Results, DOGPILE, 30 pages, April 2007.
F. Yuan and J. Wang, "An implemented rank merging algorithm for meta search engine," in Proceedings of International Conference on Research Challenges in Computer Science, Shanghai, China, pp. 191-193, December 28-29, 2009.
Y. Lu, W. Meng, L. Shu, C. Yu, and K.-L. Liu, "Evaluation of result merging strategies for metasearch engines," Lecture Notes in Computer Science, Vol. 3860, pp. 53-66, 2005.
D. Sheldon, M. Shokouhi, M. Szummer, and N. Craswell, "LambdaMerge: merging the results of query reformulations," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, Hong Kong, China, pp. 795-804, February 2009.
H. Jadidoleslamy, "Search result merging and ranking strategies in meta-search engines: a survey," International Journal of Computer Science Issues, Vol. 9, No. 3, pp. 239-251, July 2012.
M. Khaled Abd El-Fatah, Merging multiple search results approach for meta-search engines, Doctoral Dissertation, University of Pittsburgh, School of Information Sciences, PA, January 2006.
S. Oh and B. Kim, "Query processing model for internet ontology data change," Journal of Digital Contents Society, Vol. 17, No. 1, pp. 11-22, February 2016.