IPC Classification Information
Country/Type | United States (US) Patent, Granted
Application Number | US-0510054 (2000-02-22)
Applicant / Address | International Business Machines Corporation
Citation Information | Times cited: 30; Patents cited: 13
Abstract
Metadata files representing Web document content are parsed in accordance with a specification file, with a specification file being generated for each class of documents, e.g., HTML pages, newsgroup articles, and JAVA programs. Each specification file has the same format, i.e., schema, as a metadata file for the associated document class. Within each specification file, each element in the hierarchy is associated with a weight. When a metadata file is received, both the metadata file and the specification file are walked through top-down to parse data out of the metadata file into an index file in accordance with the weights in the specification file, e.g., a data element having a weight of zero is not written to the index file, an element with a weight of two is written out twice to the index file, and so on. Importantly, the tags in the metadata file are not written out to the index file. The index file is then used by an index engine to build an index, which can then be accessed by a query executor to respond to a user query for Web documents without having to search through an index containing tags and other data that is irrelevant to the search.
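The weighted parse described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical: the XML layout of the specification file, the `weight` attribute name, a default weight of 1 for spec elements without one, matching child elements by tag name, and the helper names `weighted_walk` and `build_index_file`. The sketch walks a metadata tree and its specification tree together, top-down. It writes each element's text n times, where n is the weight from the specification, and it never emits the tags themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification file for an "HTML page" document class.
# It mirrors the metadata schema; each element carries a weight attribute.
SPEC_XML = """<page weight="1">
  <url weight="0"/>
  <title weight="2"/>
  <body weight="1"/>
</page>"""

# Hypothetical metadata file for one crawled page, same hierarchy as the spec.
METADATA_XML = """<page>
  <url>http://example.com/a.html</url>
  <title>Metadata indexing</title>
  <body>Weighted parsing of metadata</body>
</page>"""


def weighted_walk(meta: ET.Element, spec: ET.Element, out: list[str]) -> None:
    """Walk the metadata and specification trees together, top-down.

    The text of each metadata element is appended to `out` n times, where n
    is the weight of the matching spec element. Tags are never written.
    Assumes element-only content (no text mixed between child tags).
    """
    # Assumption: a spec element with no weight attribute counts as weight 1.
    weight = int(spec.get("weight", "1"))
    text = (meta.text or "").strip()
    if text:
        out.extend([text] * weight)  # weight 0 writes nothing, weight 2 writes twice
    # Match children by tag name; metadata elements absent from the spec are skipped.
    spec_children = {child.tag: child for child in spec}
    for child in meta:
        child_spec = spec_children.get(child.tag)
        if child_spec is not None:
            weighted_walk(child, child_spec, out)


def build_index_file(metadata_xml: str, spec_xml: str) -> str:
    """Return the tag-free, weight-expanded text to hand to an index engine."""
    out: list[str] = []
    weighted_walk(ET.fromstring(metadata_xml.strip()),
                  ET.fromstring(spec_xml.strip()), out)
    return "\n".join(out)


index_text = build_index_file(METADATA_XML, SPEC_XML)
print(index_text)
```

With these hypothetical inputs, the URL (weight 0) is dropped and the title (weight 2) appears twice. The body text appears once. None of the metadata tags reach the output.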
Representative Claims
1. A computer system, comprising: a general purpose computer; logic executable by the computer for undertaking method acts comprising: receiving metadata representing at least one document accessible via a wide area computer network, the metadata including plural elements; weighting at least some elements in accordance with a weighting scheme to render weighted metadata; and providing the weighted metadata to an index engine.

2. The system of claim 1, comprising the index engine, the index engine generating an index based on the metadata.

3. The system of claim 2, comprising a crawler sending the metadata to the logic and a query executor accessing the index to execute queries for documents.

4. The system of claim 1, wherein the method acts undertaken by the logic further include: generating at least one specification file for at least one respective metadata document class defining a metadata hierarchy, the specification file defining a specification hierarchy matching the metadata hierarchy.

5. The system of claim 4, wherein the method acts undertaken by the logic further include generating plural specification files for respective plural classes.

6. The system of claim 4, wherein the specification file includes at least one higher element having an associated higher weight and at least one lower element, the lower element being hierarchically lower than the higher element, the lower element having an associated weight attribute, the lower element having a default weight equal to the higher weight when the weight attribute is null, the lower element otherwise having a weight equal to a value in the weight attribute.

7. The system of claim 6, wherein the metadata is arranged in a hierarchical metadata file having plural tags with associated metadata elements, and the weighting act undertaken by the logic further includes: for each metadata element in the metadata file, accessing a corresponding weight in the specification file; and writing out metadata elements but not tags in accordance with the respective weights, wherein a metadata element is not written out if its respective weight is zero and a metadata element is written out twice if its respective weight is two.

8. A computer-implemented method for indexing documents, comprising: generating a specification file for each of a plurality of document classes defining respective metadata hierarchies; receiving at least one metadata file representative of at least one document; parsing the metadata file in accordance with the specification file to write out data to an index file in markup language in accordance with weights defined by the specification file; and sending the index file to an indexing engine of a Web search engine, the indexing engine being selected from the group including full text indexing engines, value indexing engines, and path expression indexing engines.

9. The method of claim 8, further comprising indexing data using the index file to render an index and then using the index to execute a query for Web-based documents.

10. The method of claim 8, wherein the specification file includes at least one higher element having an associated higher weight and at least one lower element, the lower element being hierarchically lower than the higher element, the lower element having an associated weight attribute, the lower element having a default weight equal to the higher weight when the weight attribute is empty, the lower element otherwise having a weight equal to a value in the weight attribute.

11. The method of claim 10, wherein the metadata file has plural tags with associated metadata elements, and the method further includes: for each metadata element in the metadata file, accessing a corresponding weight in the specification file; and writing out metadata elements but not tags in accordance with the respective weights, wherein a metadata element is not written out if its respective weight is zero and a metadata element is written out twice if its respective weight is two.

12. A computer program device comprising: a computer program storage device readable by a digital processing apparatus; and a program on the program storage device and including instructions executable by the digital processing apparatus, the program comprising: computer readable code means for receiving at least one metadata file representative of a document on the World Wide Web, the metadata file including tags and associated data elements; and computer readable code means for writing only data elements to an index file, a data element being written “n” times to the index file, wherein “n” is a weight associated with the data element.

13. The device of claim 12, wherein “n” is zero for first elements, one for second elements, and at least two for third elements.

14. The device of claim 12, further comprising computer readable code means for indexing information in the metadata file using the index file to generate an index.

15. The device of claim 14, further comprising computer readable code means for accessing the index to execute a keyword query for Web documents.

16. The device of claim 12, further comprising computer readable code means for generating at least one specification file for at least one respective metadata document class defining a metadata hierarchy, the specification file defining a specification hierarchy matching the metadata hierarchy.

17. The device of claim 16, wherein the specification file includes at least one higher element having an associated higher weight and at least one lower element, the lower element being hierarchically lower than the higher element, the lower element having an associated weight attribute, the lower element having a default weight equal to the higher weight when the weight attribute is empty, the lower element otherwise having a weight equal to a value in the weight attribute.

18. The device of claim 17, wherein the metadata is arranged in a hierarchical metadata file having plural tags with associated metadata elements, and the device further comprises: computer readable code means for, for each metadata element in the metadata file, accessing a corresponding weight in the specification file; and computer readable code means for writing out metadata elements but not tags in accordance with the respective weights, wherein a metadata element is not written out if its respective weight is zero and a metadata element is written out twice if its respective weight is two.

19. The system of claim 1, wherein a weight of at least a first element is a function of a weight of at least a second element.

20. The method of claim 8, wherein a weight of at least a first element is a function of a weight of at least a second element.

21. The device of claim 12, wherein a weight of at least a first element is a function of a weight of at least a second element.
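Claims 6, 10 and 17 define a default rule for lower elements. When a lower element's weight attribute is null or empty, its weight defaults to the weight of the higher element above it; otherwise the attribute's value is used. A minimal Python sketch of that resolution rule follows. The specification layout, the `weight` attribute name, the default of 1 for a root element that has no weight, and the function name `resolve_weights` are assumptions made for illustration. Inheriting the parent's weight can be read as one simple instance of a weight being "a function of a weight of" another element, as in claims 19 to 21.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification file for a newsgroup-article class.
# Only some elements carry a weight attribute; one carries an empty one.
SPEC_XML = """<article weight="1">
  <header weight="2">
    <subject/>
    <from weight="0"/>
  </header>
  <body>
    <quoted weight=""/>
  </body>
</article>"""


def resolve_weights(spec: ET.Element, parent_weight: int = 1,
                    path: str = "") -> dict[str, int]:
    """Return the effective weight of every element path in the spec tree.

    A lower element whose weight attribute is missing (null) or empty takes
    the weight of its higher (parent) element; otherwise its own value is used.
    Assumption: a root element with no weight defaults to 1.
    """
    raw = spec.get("weight")
    weight = parent_weight if raw is None or raw.strip() == "" else int(raw)
    here = f"{path}/{spec.tag}"
    weights = {here: weight}
    for child in spec:
        # Each child inherits from this element's *effective* weight.
        weights.update(resolve_weights(child, weight, here))
    return weights


weights = resolve_weights(ET.fromstring(SPEC_XML))
for element_path, w in weights.items():
    print(element_path, w)
```

In this hypothetical spec, `subject` has no weight attribute and inherits 2 from `header`. `quoted` has an empty attribute and inherits 1 from `body`, which itself inherits 1 from `article`. `from` keeps its explicit 0, so under claims 7, 11 and 18 its data would never be written to the index file.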