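Country / Type | United States (US) Patent, Granted |
---|---|
International Patent Classification (IPC, 7th edition) | |
Application number | US-0831909 (2010-07-07) |
Registration number | US-8713021 (2014-04-29) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation information | Cited by: 11; Patents cited: 533 |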
According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.
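The grouping and designation steps described in the abstract can be illustrated with a minimal sketch. It assumes the document vectors have already been mapped into the LSM space, uses cosine distance as the closeness measure (the text does not fix one), and treats the "hypersphere diameter" as a plain distance threshold from the centroid. The names `cosine_distance`, `greedy_clusters`, and `min_size` are hypothetical and are not taken from the patent.

```python
import math


def cosine_distance(u, v):
    """Assumed closeness measure: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def greedy_clusters(vectors, diameter, min_size=2, distance=cosine_distance):
    """Repeatedly designate the largest hypersphere group as a cluster.

    vectors:  document vectors already mapped into the LSM space
    diameter: distance threshold around each candidate centroid
    min_size: stop once the largest remaining group is smaller than this
    Returns a list of (centroid_index, member_indices) pairs.
    """
    remaining = set(range(len(vectors)))
    clusters = []
    while remaining:
        best_centroid, best_group = None, []
        # Treat every remaining vector as a candidate centroid.
        for c in sorted(remaining):
            group = [i for i in sorted(remaining)
                     if distance(vectors[c], vectors[i]) <= diameter]
            if len(group) > len(best_group):
                best_centroid, best_group = c, group
        if len(best_group) < min_size:
            break
        # The group with the most members becomes a cluster; its members
        # are removed before the next iteration.
        clusters.append((best_centroid, best_group))
        remaining -= set(best_group)
    return clusters


vecs = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.05],
        [0.0, 1.0], [0.1, 0.99], [0.7, 0.7]]
print(greedy_clusters(vecs, 0.05))
```

With these toy vectors the first three documents form one cluster and the next two form another; the last vector is left unclustered because its group falls below `min_size`. Recomputing every group from scratch in each pass is the simplest way to account for removed vectors, at a cost quadratic in the number of remaining documents per iteration.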
1. A computer-implemented method for clustering documents, comprising: at a device comprising one or more processors and memory: generating a latent semantic mapping (LSM) space from a collection of a plurality of documents, the LSM space includes a plurality of document vectors, each representing one of the documents in the collection; identifying a plurality of centroid document vectors from the plurality of document vectors; forming a plurality of document groups each including a respective group of document vectors in the LSM space that are within a predetermined hypersphere diameter from a respective one of the plurality of centroid document vectors, wherein the predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space; and selectively designating a particular document group from the plurality of document groups as a document cluster based on the particular document group containing a maximum number of document vectors among the plurality of document groups.
2. The method of claim 1, further comprising: removing one or more document vectors in the designated document group from the plurality of document vectors in the LSM space; and repeating the forming and the selectively designating using document vectors still remaining in the LSM space.
3. The method of claim 2, wherein removing and repeating are iteratively performed until the designated document group in the latest iteration contains a number of document vectors that are fewer than a predetermined number of document vectors.
4. The method of claim 2, further comprising compensating one or more groups of document vectors that are overlapped with the designated document group in view of the removed one or more document vectors during the repeating.
5. The method of claim 2, wherein the predetermined hypersphere diameter is selected from a range of hypersphere diameters having incremental size in sequence, and wherein the predetermined hypersphere diameter is identified when a difference in numbers of document vectors in two adjacent hypersphere diameters in the range reaches the maximum.
6. The method of claim 2, further comprising: in response to a new document, mapping the new document into a new document vector in the LSM space; determining a closeness measure between the new document vector and each of the document clusters that have been designated in the LSM space; and classifying the new document as a member of one or more of the document clusters based on the determined closeness measure.
7. The method of claim 6, wherein the closeness measure is determined by measuring a distance between the new document vector and a respective centroid document vector of each document cluster that has been designated in the LSM space.
8. The method of claim 6, further comprising reevaluating the one or more document clusters in view of the new document as a part of the collection of the plurality of documents.
9. A non-transitory machine-readable storage medium having instructions stored thereon, which when executed by a machine, cause the machine to perform a method for clustering documents, the method comprising: generating a latent semantic mapping (LSM) space from a collection of a plurality of documents, the LSM space includes a plurality of document vectors, each representing one of the documents in the collection; identifying a plurality of centroid document vectors from the plurality of document vectors; forming a plurality of document groups each including a respective group of document vectors in the LSM space that are within a predetermined hypersphere diameter from a respective one of the plurality of centroid document vectors, wherein the predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space; and selectively designating a particular document group from the plurality of groups as a document cluster based on the particular document group containing a maximum number of document vectors among the plurality of document groups.
10. The machine-readable storage medium of claim 9, wherein the method further comprises: removing one or more document vectors in the designated document group from the plurality of document vectors in the LSM space; and repeating the forming and the selectively designating using document vectors still remaining in the LSM space.
11. The machine-readable storage medium of claim 10, wherein removing and repeating are iteratively performed until the designated document group in the latest iteration contains a number of document vectors that are fewer than a predetermined number of document vectors.
12. The machine-readable storage medium of claim 10, wherein the method further comprises compensating one or more groups of document vectors that are overlapped with the designated document group in view of the removed one or more document vectors during the repeating.
13. The machine-readable storage medium, wherein the predetermined hypersphere diameter is selected from a range of hypersphere diameters having incremental size in sequence, and wherein the predetermined hypersphere diameter is identified when a difference in numbers of document vectors in two adjacent hypersphere diameters in the range reaches the maximum.
14. The machine-readable storage medium of claim 10, wherein the method further comprises: in response to a new document, mapping the new document into a new document vector in the LSM space; determining a closeness measure between the new document vector and each of the document clusters that have been designated in the LSM space; and classifying the new document as a member of one or more of the document clusters based on the determined closeness measure.
15. The machine-readable storage medium of claim 14, wherein the closeness measure is determined by measuring a distance between the new document vector and a respective centroid document vector of each document cluster that has been designated in the LSM space.
16. The machine-readable storage medium of claim 14, wherein the method further comprises reevaluating the one or more document clusters in view of the new document as a part of the collection of the plurality of documents.
17. A data processing system, comprising: one or more processors; and a memory coupled to the one or more processors and storing instructions, which when executed by the one or more processors, cause the processors to: generate a latent semantic mapping (LSM) space from a collection of a plurality of documents, the LSM space includes a plurality of document vectors, each representing one of the documents in the collection, identify a plurality of centroid document vectors from the plurality of document vectors; form a plurality of document clusters each including a respective group of document vectors in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector, wherein the predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space, and selectively designate a particular document group from the plurality of document groups as a document cluster based on the particular document group containing a maximum number of document vectors among the plurality of document groups.
18. A computer-implemented method for classifying a document, comprising: at a device comprising one or more processors and memory: in response to receiving a new document to be classified, mapping the new document into a new document vector in a latent semantic mapping (LSM) space, the LSM space having one or more semantic anchors representing one or more document clusters, wherein each of the one or more document clusters is generated based on a respective iteration of an iterative process performed on a given collection of document vectors, wherein, during the respective iteration, a particular document group from a plurality of document groups is selectively designated as the document cluster based on the particular document group containing a maximum number of document vectors among the plurality of document groups, and wherein each of the plurality of document groups includes a respective group of document vectors within a predetermined closeness measure of a respective one of a plurality of centroid document vectors in the LSM space; determining a closeness distance between the new document vector and each of the semantic anchors in the LSM space; and classifying the new document as a member of one or more of the document clusters if the closeness distance between the new document vector and one or more corresponding semantic anchors is within a predetermined threshold.
19. The method of claim 18, wherein the one or more document clusters are reevaluated in view of the new document which is considered as a part of the given collection of documents.
20. A computer-implemented method for clustering documents, comprising: at a device comprising one or more processors and memory: selecting a hypersphere diameter as a current hypersphere diameter from a range of a plurality of hypersphere diameters in a latent semantic mapping (LSM) space, the LSM space having a plurality of document vectors, each representing one of a plurality of documents of a collection; and for each of the document vectors in the LSM space considered as a centroid document vector, iteratively performing the following: identifying a document group in the LSM space, the document group including a respective group of document vectors that are within the current hypersphere diameter from the centroid document vector, calculating a ratio between a first number of document vectors of the identified document group associated with the current hypersphere diameter and a second number of document vectors of a document group associated with a previous hypersphere diameter, adjusting the current hypersphere diameter by a predetermined value, repeating the identifying and calculating operations one or more times to form a plurality of document groups, and selectively designating a particular document group associated with a maximum ratio among the calculated plurality of ratios as an initial cluster candidate.
21. The method of claim 20, further comprising: selectively designating a particular initial cluster candidate of a plurality of initial cluster candidates as a final cluster candidate based on the final cluster candidate having the maximum number of document vectors among the plurality of initial cluster candidates; removing one or more document vectors of the final cluster candidate from the plurality of document vectors in the LSM space; and repeating operations of the selecting a hypersphere diameter, the identifying a document group, the calculating a ratio, the adjusting the current hypersphere diameter, the selectively designating a particular document group as an initial cluster candidate, and the selectively designating a particular initial cluster candidate as a final cluster candidate, to form one or more document clusters.
22. The method of claim 21, wherein removing one or more document vectors and repeating the operations are iteratively performed until the final cluster candidate in the latest iteration contains a number of document vectors that are fewer than a predetermined number of document vectors.
23. The method of claim 21, further comprising: in response to a new document, mapping the new document into a new document vector in the LSM space; determining a closeness measure between the new document vector and each of the document clusters in the LSM space; and classifying the new document as a member of one or more of the document clusters based on the determined closeness measure.
24. The method of claim 23, further comprising reevaluating the one or more document clusters in view of the new document as a part of the collection of the plurality of documents.
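Two steps recited in the claims lend themselves to a short illustration: sweeping a range of hypersphere diameters around one centroid and keeping the group with the largest growth ratio (claim 20), and assigning a new document to clusters whose semantic anchor lies within a threshold (claim 18). The sketch below is one possible reading, not the patented implementation. It assumes Euclidean distance as the closeness measure and an ascending list of diameters, and the names `euclidean`, `best_group_by_ratio`, and `classify` are hypothetical.

```python
import math


def euclidean(u, v):
    """Assumed closeness measure; the claims do not name a specific metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def best_group_by_ratio(vectors, centroid, diameters, distance=euclidean):
    """Sweep ascending diameters around one centroid.

    For each diameter after the first, compute the ratio of the current
    group size to the group size at the previous diameter, and return the
    (diameter, member_indices) pair with the largest ratio, or None if
    fewer than two diameters are supplied.
    """
    best_ratio, best = 0.0, None
    prev_count = None
    for d in diameters:
        group = [i for i, v in enumerate(vectors)
                 if distance(vectors[centroid], v) <= d]
        if prev_count:  # a ratio needs a previous group to compare against
            ratio = len(group) / prev_count
            if ratio > best_ratio:
                best_ratio, best = ratio, (d, group)
        prev_count = len(group)
    return best


def classify(new_vector, anchors, threshold, distance=euclidean):
    """Return the name of every cluster whose anchor is within threshold."""
    return [name for name, anchor in anchors.items()
            if distance(new_vector, anchor) <= threshold]


docs = [[0.0], [0.1], [0.2], [1.0], [1.1]]
print(best_group_by_ratio(docs, 0, [0.05, 0.25, 0.6, 1.2]))
print(classify([0.3], {"a": [0.1], "b": [1.05]}, 0.25))
```

In the toy data the group around the first document jumps from one member to three when the diameter grows from 0.05 to 0.25, so that diameter wins the sweep. `classify` returns a list because the claims allow a new document to belong to more than one cluster.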