[Patent] Expanded inverted index

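IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition)	G06F-017/30 G06F-015/16
Application number	UP-0606804 (2006-11-29)
Registration number	US-7634468 (2009-12-24)
Inventor / Address	Stephan, Wolfgang
Applicant / Address	SAP AG
Agent / Address	Mintz, Levin, Cohn, Ferris, Glovsky and Popeo, P.C.
Citation information	Times cited: 44; Patents cited: 4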

Abstract

Indexing documents is accomplished by generating an inverted index for a collection of one or more documents. The inverted index includes an inverted list for an index term appearing in one or more of the documents in the collection, and one or more postings. A posting includes a document identifier identifying a document in the collection of documents, a position identifier identifying a position of the index term in the document, and proximity information specifying whether the index term is positioned in a predefined proximal relationship to a second index term in the document.
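The posting layout described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Posting` and `build_index`, the whitespace tokenizer, and the choice of "immediately before or after a common term" as the predefined proximal relationship are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Posting:
    """One occurrence of an index term (hypothetical layout)."""
    doc_id: int                 # document identifier
    position: int               # term offset from the start of the document
    prev_common: Optional[str]  # common term immediately before, if any
    next_common: Optional[str]  # common term immediately after, if any


def build_index(docs, common_terms):
    """Map each term to a list of postings carrying proximity information."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        terms = text.lower().split()  # assumed tokenizer: split on whitespace
        for pos, term in enumerate(terms):
            prev_t = terms[pos - 1] if pos > 0 else None
            next_t = terms[pos + 1] if pos + 1 < len(terms) else None
            index[term].append(Posting(
                doc_id,
                pos,
                prev_t if prev_t in common_terms else None,
                next_t if next_t in common_terms else None,
            ))
    return dict(index)


index = build_index({1: "the cat sat on the mat"}, {"the", "on"})
print(index["cat"])
```

Each posting here records a single occurrence; a per-document posting with an occurrence count, as in the claims, would group these entries by document.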

Representative claims

What is claimed is:

1. A computer-implemented method comprising:
parsing, by at least one of one or more processors, each document in a collection of documents to create a vocabulary comprising a plurality of index terms that occur in the collection of documents, the plurality of index terms comprising commonly occurring terms and infrequently occurring terms, the commonly occurring terms differing from the infrequently occurring terms;
identifying, by at least one of the one or more processors, the commonly occurring terms and assigning a unique common term identifier value to each of the commonly occurring terms;
generating, by at least one of the one or more processors, a plurality of expanded inverted lists, each expanded inverted list of the plurality of expanded inverted lists corresponding to one of the plurality of index terms and comprising a posting for each document in the collection of documents that includes the corresponding one of the plurality of index terms, each posting comprising: an identification of the document of the collection of documents in which the corresponding one of the plurality of index terms appears, an indication of whether the corresponding one of the plurality of index terms in the one of the collection of documents is one of the infrequently occurring terms and whether the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms, a number of times the corresponding one of the plurality of index terms occurs in the one of the collection of documents, a location offset value specifying where in the document the corresponding one of the plurality of index terms occurs relative to a reference location in the document, and the unique common term identifier assigned to the commonly occurring one of the plurality of index terms if the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms;
creating, by at least one of the one or more processors, an expanded inverted index that comprises the vocabulary and the expanded inverted list corresponding to each of the plurality of index terms;
parsing, by at least one of the one or more processors, a search query that comprises a phrase query of two or more search terms that must appear in a specified order, the parsing comprising identifying a sequence of the search terms that includes a first one of the commonly occurring terms that is immediately followed or immediately preceded by one of the infrequently occurring terms and also identifying a second one of the search terms that is a second of the infrequently occurring terms and that is not immediately adjacent to one of the commonly occurring terms;
retrieving, by at least one of the one or more processors, two or more of the plurality of expanded inverted lists from the expanded inverted index, the two or more of the plurality of expanded inverted lists comprising expanded inverted lists corresponding to the first and the second infrequently occurring index terms identified in the parsing; and
returning, by at least one of the one or more processors, one or more documents of the collections of documents that appear in all of the retrieved two or more of the plurality of expanded inverted lists after evaluating the search query using only the retrieved two or more of the plurality of expanded inverted lists.

2. A method as in claim 1, wherein the indication of whether the corresponding one of the plurality of index terms in the one of the collection of documents is one of the infrequently occurring terms and that the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms indicates that the corresponding one of the plurality of index terms occurs immediately after one of the commonly occurring terms.

3. A method as in claim 1, wherein the indication of whether the corresponding one of the plurality of index terms in the one of the collection of documents is one of the infrequently occurring terms and that the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms indicates that the corresponding one of the plurality of index terms occurs immediately before one of the commonly occurring terms.

4. A method as in claim 1, wherein each posting further comprises information about whether the corresponding one of the plurality of index terms in the one of the collection of documents is one of the infrequently occurring terms and that the corresponding one of the plurality of index terms occurs both immediately after one of the commonly occurring terms and immediately before another one of the commonly occurring terms; and wherein the parsing further comprises identifying an additional sequence of the search terms that includes one of the commonly occurring terms that is both immediately followed and immediately preceded by one of the infrequently occurring terms.

5. A method as in claim 1, wherein the documents in the collections of documents comprise one or more of source code, binary files, tables of genetic code, text documents, structured documents, and unstructured documents.

6. A method as in claim 1, further comprising compressing the expanded inverted index using an integer compression scheme.

7. A method as in claim 1, wherein the vocabulary comprises all index terms that occur in the collection of documents.

8. A method as in claim 1, wherein the reference location is a beginning of the document and wherein the location offset value indicates how many terms from the beginning of the document the corresponding term is.

9. A computer-implemented method comprising:
parsing, by at least one of one or more processors, a search query that comprises a phrase query of two or more search terms that must appear in a specified order, the parsing comprising identifying a sequence of the search terms that includes a first one of a plurality of commonly occurring terms that is immediately adjacent to one of a plurality of infrequently occurring terms and also identifying a second of the search terms that is a second one of the plurality of infrequently occurring terms and that is not immediately adjacent to any of the commonly occurring terms;
retrieving, by at least one of the one or more processors, two or more expanded inverted lists of a plurality of expanded inverted lists that comprise an expanded inverted index, the expanded inverted index comprising a vocabulary that comprises a plurality of index terms that occur in a collection of documents that are searched in response to the search query, each of the plurality of expanded inverted lists corresponding to one of the plurality of index terms, the plurality of index terms comprising commonly occurring terms and infrequently occurring terms, the commonly occurring terms and infrequently occurring terms differing from each other, each expanded inverted list comprising a posting for each document in the collection of documents that includes the corresponding index term, each posting in each expanded inverted list comprising: an identification of the document of the collection of documents in which the corresponding index term appears, an indication of whether the corresponding index term in the one of the collection of documents is one of the infrequently occurring terms and whether the corresponding index term occurs immediately adjacent to one of the commonly occurring terms, a number of times the corresponding index term occurs in the one of the collection of documents, a location offset value specifying where in the document the corresponding term occurs relative to a reference location in the document, and a unique common term identifier assigned to the commonly occurring term if the corresponding index term occurs immediately adjacent to one of the commonly occurring terms, the retrieved two or more of the plurality of expanded inverted lists corresponding to the first and the second infrequently occurring index terms identified in the parsing;
returning, by at least one of the one or more processors, one or more documents of the collections of documents that appear in all of the two or more of the plurality of expanded inverted lists after evaluating the search query using only the retrieved two or more of the plurality of expanded inverted lists.

10. An apparatus comprising: one or more processors that perform functions comprising:
parsing each document in a collection of documents to create a vocabulary comprising a plurality of index terms that occur in the collection of documents, the plurality of index terms comprising commonly occurring terms and infrequently occurring terms, the commonly occurring terms differing from the infrequently occurring terms;
identifying the commonly occurring terms and assigning a unique common term identifier value to each of the commonly occurring terms;
generating a plurality of expanded inverted lists, each expanded inverted list of the plurality of expanded inverted lists corresponding to one of the plurality of index terms and comprising a posting for each document in the collection of documents that includes the corresponding one of the plurality of index terms, each posting comprising: an identification of the document of the collection of documents in which the corresponding one of the plurality of index terms appears, an indication of whether the corresponding one of the plurality of index terms in the one of the collection of documents is one of the infrequently occurring terms and whether the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms, a number of times the corresponding one of the plurality of index terms occurs in the one of the collection of documents, a location offset value specifying where in the document the corresponding one of the plurality of index terms occurs relative to a reference location in the document, and the unique common term identifier assigned to the commonly occurring one of the plurality of index terms if the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms;
creating an expanded inverted index that comprises the vocabulary and the expanded inverted list corresponding to each of the plurality of index terms;
parsing a search query that comprises a phrase query of two or more search terms that must appear in a specified order, the parsing comprising identifying a sequence of the search terms that includes a first one of the commonly occurring terms that is immediately followed or immediately preceded by one of the infrequently occurring terms and also identifying a second one of the search terms that is a second of the infrequently occurring terms and that is not immediately adjacent to one of the commonly occurring terms;
retrieving two or more of the plurality of expanded inverted lists from the expanded inverted index, the two or more of the plurality of expanded inverted lists comprising expanded inverted lists corresponding to the first and the second infrequently occurring index terms identified in the parsing; and
returning one or more documents of the collections of documents that appear in all of the retrieved two or more of the plurality of expanded inverted lists after evaluating the search query using only the retrieved two or more of the plurality of expanded inverted lists.
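The phrase-query evaluation recited in the claims can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: `build_index`, `phrase_search`, the whitespace tokenizer, and the tuple layout `(doc_id, offset, prev_id, next_id)` are hypothetical, postings are stored per occurrence rather than per document with a count, and the sketch rejects any query containing a common term that has no infrequent neighbor.

```python
from collections import defaultdict


def build_index(docs, common_ids):
    """term -> list of (doc_id, offset, prev_common_id, next_common_id)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        terms = text.lower().split()  # assumed tokenizer
        for pos, term in enumerate(terms):
            prev_id = common_ids.get(terms[pos - 1]) if pos > 0 else None
            next_id = (common_ids.get(terms[pos + 1])
                       if pos + 1 < len(terms) else None)
            index[term].append((doc_id, pos, prev_id, next_id))
    return dict(index)


def phrase_search(index, common_ids, phrase):
    """Evaluate a phrase query using only the lists of infrequent terms."""
    q = phrase.lower().split()
    rare = [i for i, t in enumerate(q) if t not in common_ids]
    if not rare:
        raise ValueError("sketch needs at least one infrequent term")
    for i, t in enumerate(q):
        if t in common_ids and (i - 1) not in rare and (i + 1) not in rare:
            raise ValueError("common term without an infrequent neighbor")

    starts = None  # candidate (doc_id, phrase start offset) pairs
    for i in rare:
        # Common-term identifiers the neighbors must carry, if any.
        want_prev = common_ids.get(q[i - 1]) if i > 0 else None
        want_next = common_ids.get(q[i + 1]) if i + 1 < len(q) else None
        found = set()
        for doc_id, pos, prev_id, next_id in index.get(q[i], []):
            if want_prev is not None and prev_id != want_prev:
                continue
            if want_next is not None and next_id != want_next:
                continue
            found.add((doc_id, pos - i))
        starts = found if starts is None else starts & found
    return sorted({doc_id for doc_id, _ in starts})


common_ids = {"the": 1, "on": 2, "a": 3}  # unique common term identifiers
docs = {
    1: "the cat sat on the mat",
    2: "a cat on a mat sat",
    3: "the dog sat on the mat",
}
index = build_index(docs, common_ids)
print(phrase_search(index, common_ids, "sat on the mat"))  # [1, 3]
print(phrase_search(index, common_ids, "dog sat on"))      # [3]
```

In both queries the lists for "on" and "the" are never read: the adjacency identifiers stored with the infrequent terms stand in for them, which is the saving the claims describe for long lists of common terms.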

Patents cited by this patent (4)

Rubin Bradley Scott, Object oriented information retrieval framework mechanism.
Spencer Graham, System and method for accelerated query evaluation of very large full-text databases.
Vogel Claude, FRX, Text processing and retrieval system and method.
Henderson Richard D. (San Jose CA); Barbarino Michael J. (Moss Beach CA), Text retrieval method and system using signature of nearby words.

Patents citing this patent (44)

Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Adaptive pattern recognition based controller apparatus and method and human-interface therefore.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Adding information or functionality to a rendered document via association with an electronic counterpart.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Adding value to a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Aggregate analysis of text captures performed by multiple users from rendered documents.
King, Martin; Grover, Dale; Kushler, Clifford; Stafford-Fraser, James; Mannby, Claes-Fredrik, Archive of text captures from rendered documents.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Automatic modification of web pages.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Automatically capturing information, such as capturing information using a document-aware device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J.; Daley-Watson, Christopher J., Automatically providing content associated with captured information, such as information captured in real-time.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplement information.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplemental information.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Contextual dynamic advertising based upon captured rendered text.
Qin, Jian, Creation of inverted index system, and data processing method and apparatus.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Data capture from rendered documents using handheld device.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Data capture from rendered documents using handheld device.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Determining actions involving captured information and electronic content associated with rendered documents.
Hornkvist, John M.; Koebler, Eric R., Directory tree search.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Document enhancement system and method.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Establishing an interactive environment for rendered documents.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Identifying a document by performing spectral analysis on the contents of the document.
King, Martin T.; Mannby, Claes-Fredrik; Smith, Michael J., Image search using text-based elements within the contents of images.
Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Internet appliance system and method.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Method and system for character recognition.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Method and system for character recognition.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Method and system for character recognition.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Methods and systems for initiating application processes by data capture from rendered documents.
King, Martin T.; Mannby, Claes-Fredrik; Arends, Thomas C.; Bajorins, David P.; Fox, Daniel C., Optical scanners, such as hand-held optical scanners.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Processing techniques for text capture from a rendered document.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Processing techniques for text capture from a rendered document.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Publishing techniques for adding value to a rendered document.
Gutlapalli, Hari K.; Kothari, Shirish K.; Mehta, Suhas R.; Pak, Wai, Push-model based index updating.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
King, Martin Towle; Stafford-Fraser, James Quentin; Kushler, Clifford A.; Grover, Dale L., System and method for information gathering utilizing form identifiers.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Mannby, Claes-Fredrik; Valenti, William, Using gestalt information to identify locations in printed information.
