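IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th edition) |
Application number | US-0145342 (2008-06-24)
Registration number | US-8219524 (2012-07-10)
Inventor / Address |
Applicant / Address |
Agent / Address |
Citation information | Times cited: 218; Cited patents: 98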
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more files or data objects to which to apply a storage operation. For each file or data object, the storage system determines if the file or data object contains data that matches another file or data object to which the storage operation was previously applied, based on awareness of the application that created the data object. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation with respect to the particular file or data object.
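The flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class name `SingleInstanceStore`, the in-memory dictionary, and the choice of SHA-256 as the matching test are all assumptions made for illustration, and the application-aware matching the abstract mentions is left out here.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct data object.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self):
        # Maps a content digest to the stored bytes.
        self.objects = {}

    def store(self, data: bytes) -> bool:
        """Store data unless a matching object was stored before.

        Returns True if the object was stored, False if it was skipped.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.objects:
            # Match found: avoid performing the storage operation again.
            return False
        # No match: perform the storage operation in the usual manner.
        self.objects[digest] = data
        return True
```

Calling `store` twice with the same bytes keeps a single copy; the second call returns `False` and writes nothing.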
Representative Claims
1. A method of storing application-specific data objects included within a file in a data storage system, the method comprising: receiving a request to store data contained in a file generated by an application, wherein the data includes multiple discrete application-specific data objects having differing sizes; determining the application that generated the file that includes the multiple discrete application-specific data objects; based on the determination of the application, identifying at least some of the multiple discrete application-specific data objects within the data, wherein the identifying includes parsing the file using an already existing module particular to the application that generated the file, and wherein the module does not expose the format of the file; and for at least a first one of the identified multiple discrete application-specific data objects: generating a substantially unique identifier that represents the first discrete application-specific data object; based on the generated substantially unique identifier, determining whether an instance of the first discrete application-specific data object is already stored in a data storage system; and if an instance of the first discrete application-specific data object is not already stored in the data storage system, then storing the first discrete application-specific data object in the data storage system.
2. The method of claim 1 wherein the file is a data file created by an electronic mail server application, and the multiple discrete application-specific data objects are electronic mail messages included within the data file.
3. The method of claim 1 wherein generating a substantially unique identifier includes using a cryptographic hash function to generate a hash of the first discrete application-specific data object, and wherein determining whether an instance of the first discrete application-specific data object is already stored in a data storage system includes comparing the generated hash to another hash stored by the data storage system.
4. The method of claim 1 wherein generating a substantially unique identifier includes applying a cryptographic hash algorithm to only the portion of the file corresponding to the first discrete application-specific data object to generate a hash value of the first discrete application-specific data object.
5. The method of claim 1, further comprising: if an instance of the first discrete application-specific data object is already stored in the data storage system, incrementing a reference count corresponding to the first discrete application-specific data object.
6. The method of claim 1, further comprising: if an instance of the first discrete application-specific data object is already stored in the data storage system: identifying metadata associated with the first discrete application-specific data object; and storing the identified metadata.
7. The method of claim 1, further comprising: for the first discrete application-specific data object: determining a first timestamp of the first discrete application-specific data object; determining a second timestamp of an instance of the first discrete application-specific data object already stored in the data storage system; comparing the first and second timestamps; and if the first timestamp exceeds the second timestamp by a threshold amount, then storing the first discrete application-specific data object in the data storage system.
8. The method of claim 7, further comprising if the first timestamp exceeds the second timestamp by a threshold amount, then removing the instance of the first discrete application-specific data object from the data storage system.
9. The method of claim 1, wherein the first discrete application-specific data object is encrypted, and wherein the method further comprises: determining an encryption scheme of the encrypted first discrete application-specific data object; determining an encryption scheme of an encrypted instance of the first discrete application-specific data object already stored in the data storage system; comparing the two encryption schemes; and if the two encryption schemes are identical, then storing the encrypted first discrete application-specific data object in the data storage system.
10. A system for managing application-generated data objects, the system comprising: a processor; a storage operation manager component, coupled to the processor, configured to receive a request to perform a storage operation on a logical data container, wherein the logical data container includes data objects generated by one or more applications; a data object identification component configured to identify the application-generated data objects included within the logical data container; an application data extraction component configured to extract the identified application-generated data objects from the logical data container; an identifier generation component configured to generate substantially unique identifiers for the extracted application-generated data objects; an index configured to store substantially unique identifiers; an identifier comparison component configured to determine whether the generated substantially unique identifiers are already stored in the index; and a single instance data store configured to communicate with the identifier comparison component and store a subset of the extracted application-generated data objects, the subset including the extracted application-generated data objects whose substantially unique identifiers were not determined to be stored in the index, wherein only a single instance of an extracted application-generated data object is stored in the single instance data store; and wherein the data object identification component is further configured to determine the application that created the logical data container; and wherein the data object identification component is further configured to utilize the results of the determination made by the data object identification component to invoke an already existing module to parse the logical data container in order to identify the application-generated data objects included within the logical data container, wherein the module is particular to the application that created the logical data container and wherein the module is configured to avoid exposing the format of the logical data container.
11. The system of claim 10: wherein the storage operation manager component is further configured to receive a request to perform a storage operation on a first logical data container and a second, different, logical data container, wherein the first and second logical data containers each include an instance of an identical data object generated by an application, and wherein the first and second logical data containers are files or databases; wherein the identifier generation component is further configured to generate an identical substantially unique identifier for the instances of the identical data object; and wherein the single instance data store is further configured to store only a single instance of the identical data object.
12. The system of claim 10 wherein the storage operation manager is further configured to receive requests to continuously replicate data from one or more client computer systems to a data storage system.
13. The system of claim 10 wherein the application data extraction component is further configured to invoke the already-existing module particular to the application to parse the logical data container in order to extract the application-generated data objects from the logical data container.
14. The system of claim 10 wherein: the application data extraction component is further configured to extract metadata associated with the application-generated data objects; and the single instance data store is further configured to store the extracted metadata.
15. A non-transitory computer-readable storage medium whose contents cause a computer system to perform operation of storing application-specific data objects, the operation comprising: receiving a first file that was created by a first application in a first format, wherein the first file contains multiple data objects; receiving a second file that was created by a second, different application in a second format, wherein the second file contains multiple data objects, and wherein the second format differs from the first format; determining the application that created the first file and based on that determination, selecting an object model that is particular to the first application; determining the application that created the second file and based on that determination, selecting an already created module that is particular to the second application; utilizing the object model that is particular to the first application to identify the data objects within the first file; utilizing the already created module that is particular to the second application to parse the second file in order to identify the data objects within the second file, wherein the already created module differs from the object model, and wherein the module avoids exposing the second format of the second file; generating substantially unique identifiers for the identified data objects within the first and second files; determining whether the identified data objects in the first and second files are already stored in a single instance data store; for each of the identified data objects in the first and second files that are already stored in the single instance data store, adding a reference in an index to the already stored data object; utilizing the object model that is particular to the first application to extract the identified data objects in the first file that are not already stored in the single instance data store from the first file; utilizing the already created object model that is particular to the second application module to extract the identified data objects in the second file that are not already stored in the single instance data store from the second file; and storing the extracted data objects in the single instance data store.
16. The non-transitory computer-readable storage medium of claim 15 wherein: the first and second files each include an instance of an identical data object; generating substantially unique identifiers for the instances of the identical data object includes generating an identical substantially unique identifier for the instances of the identical data object; and storing the extracted data objects in the single instance data store includes storing a single instance of the identical data object in the single instance data store.
17. The non-transitory computer-readable storage medium of claim 15 wherein the first and second files are first and second database files, and wherein: identifying the data objects includes identifying entries within tables within the first and second database files; generating substantially unique identifiers includes generating substantially unique identifiers for the data within the entries; determining whether the data objects in the first and second files are already stored includes determining whether the data within the entries is already stored in the single instance data store; extracting the data objects includes extracting the data within the entries from the first and second database files that is not already stored in the single instance data store; and storing the extracted data objects includes storing the extracted data in the single instance data store.
18. The non-transitory computer-readable storage medium of claim 15 wherein determining whether the data objects in the first and second files are already stored includes generating a digest of each data object and comparing the generated digest with one or more previously stored digests.
19. The non-transitory computer-readable storage medium of claim 15 wherein the operation further comprises: extracting metadata from the extracted data objects; and storing the extracted metadata in the single instance data store.
20. The non-transitory computer-readable storage medium of claim 15 wherein the operation further comprises: receiving an indication of a size threshold, wherein data objects whose size does not exceed the size threshold are to be stored in the single instance data store; determining the sizes of the data objects within the first and second files; and comparing the sizes of the data objects with the size threshold, wherein determining whether the data objects in the first and second files are already stored in a single instance data store includes determining that the data objects within the first and second files whose size does not exceed the size threshold are not already stored in a single instance data store, thereby causing such data objects to be stored in the single instance data store.
21. The non-transitory computer-readable storage medium of claim 15 wherein the operation further comprises: receiving an indication of a category or type, wherein data objects of the indicated category or type are not to be stored in the single instance data store; determining the categories or types of the data objects within the first and second files; and comparing the categories or types of the data objects with the indicated category or type, wherein determining whether the data objects in the first and second files are already stored in a single instance data store includes determining that the data objects within the first and second files of the indicated category or type are already stored in a single instance data store.
22. The non-transitory computer-readable storage medium of claim 15 wherein the object model enumerates data objects within the first file.
23. The non-transitory computer-readable storage medium of claim 15 wherein the object model or a storage operation manager component understands a format used by the first application for the first file.
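The steps of claims 1, 3, and 5 (determine the application, parse the file with an application-specific module, hash each object, store only new objects, and count references to existing ones) might be sketched as follows. Everything named here is hypothetical: the `PARSERS` registry, the toy `parse_mail_archive` function with its `\n---\n` delimiter, and the `DedupIndex` class are stand-ins for illustration, and starting the reference count at 1 on first storage is an assumption the claims do not state.

```python
import hashlib
from typing import Callable, Dict, List


def parse_mail_archive(data: bytes) -> List[bytes]:
    """Toy stand-in for an application-specific module.

    Assumes messages are separated by the delimiter b"\n---\n"; a real
    mail store would be parsed by the mail application's own module.
    """
    return [msg for msg in data.split(b"\n---\n") if msg]


# Hypothetical registry: application name -> module that parses its files.
PARSERS: Dict[str, Callable[[bytes], List[bytes]]] = {
    "mail": parse_mail_archive,
}


class DedupIndex:
    """Toy single-instance store with per-object reference counts."""

    def __init__(self):
        self.objects: Dict[str, bytes] = {}
        self.ref_counts: Dict[str, int] = {}

    def store_file(self, application: str, data: bytes) -> int:
        """Store the objects in one file; return how many were new."""
        parser = PARSERS[application]  # select module by application
        newly_stored = 0
        for obj in parser(data):
            # Hash only the bytes of this object, not the whole file.
            digest = hashlib.sha256(obj).hexdigest()
            if digest in self.objects:
                # Already stored: record another reference instead.
                self.ref_counts[digest] += 1
            else:
                self.objects[digest] = obj
                self.ref_counts[digest] = 1  # assumed initial count
                newly_stored += 1
        return newly_stored
```

For example, storing two archives that share one message keeps three objects in total, with the shared message's reference count at 2.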