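IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition) |
Application number | US-0991990 (2004-11-18)
Registration number | US-7444387 (2008-10-28)
Inventors / Address |
- Douceur, John R.
- Theimer, Marvin M.
- Adya, Atul
- Bolosky, William J.
Applicant / Address |
Agent / Address |
Citation information | Times cited: 35; Patents cited: 97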
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
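The partitioning step described in the abstract can be illustrated with a minimal sketch. Several details here are assumptions and are not taken from the patent: SHA-256 as the hash, integer computer identifiers, the helper names (object_info, bit_width, low_bits, recipients), and the rule that the bit count is roughly log2 of the number of known computers divided by the desired average number of recipients.

```python
import hashlib


def object_info(data: bytes) -> int:
    """Semi-unique value derived from object content (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def bit_width(num_known: int, avg_recipients: int = 1) -> int:
    """Hypothetical sizing rule: floor(log2(num_known // avg_recipients)) bits."""
    ratio = max(1, num_known // max(1, avg_recipients))
    return ratio.bit_length() - 1


def low_bits(value: int, width: int) -> int:
    """Return the `width` least significant bits of `value`."""
    return value & ((1 << width) - 1)


def recipients(info: int, machine_ids: list[int], avg_recipients: int = 1) -> list[int]:
    """Computers whose low identifier bits equal the low bits of the object information."""
    width = bit_width(len(machine_ids), avg_recipients)
    target = low_bits(info, width)
    return [m for m in machine_ids if low_bits(m, width) == target]


if __name__ == "__main__":
    machines = list(range(8))            # eight known computers with IDs 0..7
    info = object_info(b"example file contents")
    print(recipients(info, machines))    # computers that receive this object's info
```

Because identical content yields identical object information, two computers holding the same file send its information to the same recipients, which is where a potential duplicate can be noticed. Using fewer bits (a larger avg_recipients) widens the recipient set.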
Representative Claims
The invention claimed is:
1. A method, implemented in a computer that is one of a plurality of computers in a network, the method comprising: grouping the plurality of computers into a plurality of groups, wherein the grouping is based at least in part on the number of the plurality of computers in the network of which the computer is aware; selecting a portion of object information corresponding to an object, wherein the portion of object information comprises a set of least significant bits of the object information and a number of bits in the set of least significant bits is based at least in part on the number of other computers in the network of which the computer is aware; and identifying to which of the plurality of computers to communicate the object information for identification of potentially identical objects stored on the plurality of computers, wherein the identifying is based at least in part on comparing at a bit level the selected portion of the object information to a portion of a computer identifier of one or more of the plurality of computers such that the bits in the portion of the computer identifier are compared to the bits in the selected portion of the object information.
2. A method as recited in claim 1, wherein the object is stored at the computer.
3. A method as recited in claim 1, wherein the object is stored at another of the plurality of computers.
4. A method as recited in claim 1, further comprising: receiving object information corresponding to another object; comparing the received object information to an object information database; checking whether the received object information matches any of the object information in the database; and determining that two potentially identical objects exist if the received object information matches any of the object information in the database.
5. A method as recited in claim 1, wherein a size of the portion of the object information is further based at least in part on an average number of computers that a particular object identifier should be communicated to.
6. A method as recited in claim 1, wherein the object information is a semi-unique value based at least in part on the data in the object.
7. A method as recited in claim 1, wherein the size of the portion of the object information is the same as the size of the portions of the computer identifiers.
8. A method as recited in claim 1, wherein the grouping comprises identifying as a first group one or more of the plurality of computers having computer identifiers each with a portion that is the same as a portion of a computer identifier of the computer.
9. A method as recited in claim 8, wherein the identifying comprises: checking whether all of the bits in the portion of the computer identifier match all of the bits in the selected portion of the object information; and identifying, as computers to communicate the object information to, the computers in the first group if all of the bits in the portion of the computer identifier match all of the bits in the selected portion of the object information.
10. A method as recited in claim 1, wherein the grouping comprises: identifying as a first group one or more of the plurality of computers having computer identifiers each with a portion that is the same as a portion of a computer identifier of the computer; and identifying as a second group one or more of the plurality of computers that do not have computer identifiers each with the portion that is the same as the portion of the computer identifier of the computer, but that do have computer identifiers each with a first subset of the portion that is the same as a first subset of the portion of the computer identifier of the computer.
11. A method as recited in claim 10, wherein the first subset of each portion comprises the even bits of the portion.
12. A method as recited in claim 1, wherein the grouping comprises: identifying as a first group one or more of the plurality of computers having computer identifiers each with a portion that is the same as a portion of a computer identifier of the computer; identifying as a second group one or more of the plurality of computers that do not have computer identifiers each with the portion that is the same as the portion of the computer identifier of the computer, but that do have computer identifiers each with a first subset of the portion that is the same as a first subset of the portion of the computer identifier of the computer; and identifying as a third group one or more of the plurality of computers that do not have computer identifiers each with the first subset of the portion that is the same as the first subset of the portion of the computer identifier of the computer, but that do have computer identifiers each with a second subset of the portion that is the same as a second subset of the portion of the computer identifier of the computer.
13. A method as recited in claim 12, wherein the first subset of each portion comprises the even bits of the portion and the second subset of each portion comprises the odd bits of the portion.
14. A method as recited in claim 1, wherein the grouping comprises: identifying as one group one or more of the plurality of computers that have computer identifiers each with a subset of the portion of the computer identifier that is the same as a subset of the portion of the computer identifier of the computer.
15. A method as recited in claim 14, wherein the identifying comprises: checking whether each bit in another subset of the portion of the computer identifier matches a corresponding bit in the selected portion of the object information; and if each bit in the other subset of the portion of the computer identifier matches the corresponding bit in the selected portion of the object information, then identifying, as computers to communicate the object information to, the computers in the one group having a portion of their computer identifiers matching the corresponding bits in the selected portion of the object information.
16. A method as recited in claim 14, wherein the identifying comprises: checking whether each bit in another subset of the portion of the computer identifier matches a corresponding bit in the selected portion of the object information; and if one or more bits in the other subset of the portion of the computer identifier do not match the corresponding bit in the selected portion of the object information, then identifying, as computers to communicate the object information to, the computers in another group having the bits of a subset of their computer identifiers matching the selected portion of the object information.
17. A method as recited in claim 1, wherein the grouping comprises: identifying as a first group one or more of the plurality of computers having computer identifiers each with a portion that is the same as a portion of a computer identifier of the computer; identifying as a second group one or more of the plurality of computers that do not have computer identifiers each with the portion that is the same as the portion of the computer identifier of the computer, but that do have computer identifiers each with a first subset of the portion that is the same as a first subset of the portion of the computer identifier of the computer; identifying as a third group one or more of the plurality of computers that do not have computer identifiers each with the portion that is the same as the portion of the computer identifier of the computer, but that do have computer identifiers each with a second subset of the portion that is the same as a second subset of the portion of the computer identifier of the computer; checking whether all of the bits in the portion of the computer identifier match all of the bits in the selected portion of the object information; if all of the bits in the portion of the computer identifier match all of the bits in the selected portion of the object information, then identifying, as computers to communicate the object information to, the computers in the first group; and if all of the bits in the portion of the computer identifier do not match all of the bits in the selected portion of the object information, then, checking whether the bits in the second subset of the portion of the object information match the bits in the second subset of the computer identifier, if the bits in the second subset of the portion of the object information match the bits in the second subset of the computer identifier, then identifying, as computers to communicate the object information to, the computers in the third group having their computer identifiers matching the selected portion of the object information, and if the bits in the second subset of the portion of the object information do not match the bits in the second subset of the computer identifier, then identifying, as computers to communicate the object information to, the computers in the second group having the subset of bits in their computer identifiers matching the corresponding bits in the selected portion of the object information.
18. One or more computer storage media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is one of a plurality of computers in a network, causes the one or more processors to perform the following acts: selecting ones of the plurality of computers to populate a plurality of groups, wherein the selecting is based at least in part on the number of computers in the network that the computer is aware of; selecting a plurality of bits of file information corresponding to a file, wherein the plurality of bits of file information comprises a set of least significant bits of the file information and a number of bits in the set of least significant bits is based at least in part on the number of other computers in the network of which the computer is aware; and identifying which of the selected ones of the plurality of computers to communicate the file information to for identification of potentially identical files on the plurality of computers, wherein the identifying is based at least in part on comparing the selected plurality of bits of the file information to a corresponding plurality of bits of a computer identifier of one or more of the selected ones of the plurality of computers.
19. One or more computer storage media as recited in claim 18, wherein the file is stored at the computer.
20. One or more computer storage media as recited in claim 18, wherein the file is stored at another of the plurality of computers.
21. One or more computer storage media as recited in claim 18, wherein the plurality of instructions further cause the one or more processors to perform the following acts: receiving file information corresponding to another file; comparing the received file information to a file information database; checking whether the received file information matches any of the file information in the database; and determining that two potentially identical files exist if the received file information matches any of the file information in the database.
22. One or more computer storage media as recited in claim 18, wherein the selecting comprises identifying as a first group one or more of the plurality of computers having computer identifiers each with a plurality of bits that is the same as a plurality of bits of a computer identifier of the computer.
23. One or more computer storage media as recited in claim 22, wherein the identifying comprises: checking whether all of the plurality of bits of the computer identifier match all of the plurality of bits of the file information; and identifying, as computers to communicate the file information to, the computers in the first group if all of the plurality of bits of the computer identifier match all of the plurality of bits of the file information.
24. One or more computer storage media as recited in claim 18, wherein the selecting comprises: identifying as a first group one or more of the plurality of computers having computer identifiers each with a plurality of bits that is the same as a plurality of bits of a computer identifier of the computer; and identifying as a second group one or more of the plurality of computers that do not have computer identifiers each with a plurality of bits that is the same as the plurality of bits of the computer identifier of the computer, but that do have computer identifiers each with a first subset of the plurality of bits that is the same as a first subset of the plurality of bits of the computer identifier of the computer.
25. A computing device that facilitates locating potentially identical objects across a plurality of computing devices, wherein the computing device is one of the plurality of computing devices, the computing device comprising: a processing unit; a memory coupled to the processing unit; a distributed file system interface connecting the computing device to a network, wherein the network comprises the plurality of computing devices; a grouping module stored in the memory and executed on the processing unit to determine a plurality of groups of the plurality of computing devices, wherein criteria for determining the plurality of groups comprises: determining a number of computing devices of which the computing device is aware; and determining a number of the plurality of groups of the plurality of computing devices as a function of the number of computing devices of which the computing device is aware; a file information generation module stored in the memory and executed on the processing unit to generate object information for use in locating potentially identical objects across the plurality of computing devices; a forwarding location determination module stored in the memory and executed on the processing unit to perform a bit level comparison of the object information and a computer identifier representing a particular one of the plurality of computing devices; and in an event that the object information matches the computer identifier: identifying a particular one of the plurality of groups to which the particular one of the plurality of computing devices belongs; facilitating each of the plurality of computing devices that belongs to the particular one of the plurality of groups to determine potentially identical objects.
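The three-group forwarding decision recited in claims 12, 13, and 17 can be read as routing in two steps, correcting one bit subset at a time. The sketch below is one possible reading and not the patented implementation: the function names (subset_masks, next_hops), the use of integer identifiers, and the choice to pass the bit width in as a parameter are all assumptions made for illustration.

```python
def subset_masks(width: int) -> tuple[int, int]:
    """Masks for the even-position and odd-position bits of a `width`-bit portion."""
    even = sum(1 << i for i in range(0, width, 2))
    odd = sum(1 << i for i in range(1, width, 2))
    return even, odd


def next_hops(self_id: int, known_ids: list[int], info: int, width: int) -> list[int]:
    """Pick the computers to which this computer forwards the object information."""
    full = (1 << width) - 1
    even, odd = subset_masks(width)
    me = self_id & full        # portion of this computer's identifier
    target = info & full       # selected least significant bits of the object info

    # First group: same full portion as this computer.
    first = [m for m in known_ids if (m & full) == me]
    # Second group: different portion, but same even bits (first subset).
    second = [m for m in known_ids
              if (m & full) != me and (m & even) == (me & even)]
    # Third group: different even bits, but same odd bits (second subset).
    third = [m for m in known_ids
             if (m & even) != (me & even) and (m & odd) == (me & odd)]

    if me == target:
        return first
    if (target & odd) == (me & odd):
        # Odd bits already agree: deliver to third-group computers matching the portion.
        return [m for m in third if (m & full) == target]
    # Otherwise hand off to second-group computers whose odd bits agree with the target.
    return [m for m in second if (m & odd) == (target & odd)]


if __name__ == "__main__":
    ids = list(range(16))                 # sixteen known computers, 4-bit portions
    hop = next_hops(0b0110, ids, 0b1001, 4)
    print(hop)                            # intermediate computers
    print(next_hops(hop[0], ids, 0b1001, 4))  # second hop from the first intermediate
```

Under this reading a computer only needs to know peers that share its even bits or its odd bits, and any object information reaches a computer with a fully matching portion in at most two forwarding steps.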