Systems and methods for managing single instancing data
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-007/00
G06F-017/30
G06F-011/14
Application number
US-0565576
(2009-09-23)
Registration number
US-9015181
(2015-04-21)
Inventors / Address
Kottomtharayil, Rajiv
Attarde, Deepak R.
Vijayan, Manoj K.
Applicant / Address
CommVault Systems, Inc.
Agent / Address
Perkins Coie LLP
Citation information
Times cited: 7
Cited patents: 186
Abstract
Described in detail herein are systems and methods for managing single instancing data. Using a single instance database and other constructs (e.g. sparse files), data density on archival media (e.g. magnetic tape) is improved, and the number of files per storage operation is reduced. According to one aspect of a method for managing single instancing data, for each storage operation, a chunk folder is created on a storage device that stores single instancing data. The chunk folder contains three files: 1) a file that contains data objects that have been single instanced; 2) a file that contains data objects that have not been eligible for single instancing; and 3) a metadata file used to track the location of data objects within the other files. A second storage operation subsequent to a first storage operation contains references to data objects in the chunk folder created by the first storage operation instead of the data objects themselves.
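The chunk folder layout described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the file names (`si_data`, `non_si_data`, `metadata.json`), the function `run_storage_operation`, the shared `known` table, and the use of SHA-256 to recognize repeated data are all assumptions made for illustration.

```python
import hashlib
import json
import os
import tempfile


def run_storage_operation(root, name, objects, known):
    """Create one chunk folder holding three files (hypothetical layout).

    objects: list of (payload_bytes, eligible_for_single_instancing) tuples.
    known:   dict mapping digest -> [folder, offset, length]; shared across
             operations so that later operations can reference earlier data.
    Returns the index that is also written to the metadata file.
    """
    folder = os.path.join(root, name)
    os.makedirs(folder)
    index = []
    with open(os.path.join(folder, "si_data"), "wb") as si, \
         open(os.path.join(folder, "non_si_data"), "wb") as non_si:
        for payload, eligible in objects:
            if not eligible:
                # Ineligible objects are always written in full.
                index.append({"folder": name, "file": "non_si_data",
                              "offset": non_si.tell(), "length": len(payload)})
                non_si.write(payload)
                continue
            digest = hashlib.sha256(payload).hexdigest()
            if digest not in known:
                # First instance: store the data once and remember where.
                known[digest] = [name, si.tell(), len(payload)]
                si.write(payload)
            # Whether new or already stored, the index records only a location.
            src_folder, offset, length = known[digest]
            index.append({"folder": src_folder, "file": "si_data",
                          "offset": offset, "length": length})
    with open(os.path.join(folder, "metadata.json"), "w") as meta:
        json.dump(index, meta)
    return index


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    known = {}
    run_storage_operation(root, "op1", [(b"A" * 10, True), (b"hdr", False)], known)
    second = run_storage_operation(root, "op2", [(b"A" * 10, True), (b"B" * 5, True)], known)
    for entry in second:
        print(entry)
```

In this sketch the second operation's index points back into the `op1` folder for the repeated object, so the `op2` folder stores only the data that is new.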
Representative claims
1. A method of deduplicating data performed by one or more computing systems, each computing system having a processor and memory, the method comprising: receiving an indication of a first storage operation; creating a first directory associated with the first storage operation, wherein the first directory includes three files, wherein the first file of the first directory stores data associated with the first storage operation, and wherein the data in the first file is de-duplicated, wherein the second file of the first directory stores data associated with the first storage operation, and wherein the data in the second file is not de-duplicated, wherein the third file of the first directory stores a first data structure that includes information identifying file locations of data within the first file corresponding to data in the second file, and wherein the data stored in the first and second files of the first directory is not tracked by file systems of the one or more computing systems; receiving a first set of multiple, discrete data objects associated with the first storage operation, wherein each of the multiple data objects of the first set include a header portion and a payload portion; determining, by the one or more computing systems, that the payload portion of a first data object of the first set has already been stored in the first file of the first directory, and updating the first data structure to track the location of the payload portion of the first data object; storing the header portion of the first data object in the second file of the first directory; and determining, by the one or more computing systems, that the payload portion of a second data object of the first set has not already been stored in the first file of the first directory, and both storing the payload portion of the second data object in the first file of the first directory and updating the first data structure to track the location of the payload portion of the second data object; and storing the header portion of the second data object in the second file of the first directory.
2. The method of claim 1, further comprising: receiving an indication of a second storage operation; creating a second directory associated with the second storage operation, wherein the second directory includes three files, wherein the first file of the second directory stores data associated with the second storage operation that is de-duplicated, wherein the second file of the second directory stores data associated with the second storage operation that is not de-duplicated, wherein the third file of the second directory stores a second data structure that includes information identifying locations of data in the second directory, and wherein the data stored in the first and second files of the second directory is not tracked by file systems of the one or more computing systems; receiving a second set of multiple data objects associated with the second storage operation, each of the multiple data objects of the second set including a header portion and a payload portion; determining that the payload portion of a third data object of the second set has already been stored in the first file of the first directory or in the first file of the second directory and updating the second data structure to track the location of the payload portion of the third data object; and determining that the payload portion of a fourth data object of the second set has not already been stored in the first file of the first directory or in the first file of the second directory and both storing the payload portion of the fourth data object in the first file of the second directory and updating the second data structure to track the location of the payload portion of the fourth data object.
3. The method of claim 1, further comprising: determining that a size of the payload portion of a third data object of the first set does not exceed a threshold; and storing the payload portion of the third data object in the second file of the first directory, regardless of whether the payload portion of the third data object has already been stored in the second file of the first directory.
4. The method of claim 1, wherein the first file is configured as a sparse file.
5. The method of claim 1, wherein the first directory is located on one or more sequentially accessed media.
6. The method of claim 1, wherein the multiple data objects of the first set are blocks of data that comprise one or more files.
7. A computer-implemented method for copying multiple files to a secondary storage device, wherein the secondary storage device is coupled to a computer executing a file system, the method comprising: receiving a copy operation request to copy n number of files to the secondary storage device, wherein each of the n number of files includes metadata and data, and wherein the n number of files exceeds a number of files that the file system can operate on without system degradation; receiving by the computer the n number of files to copy; and processing by the computer the n number of files by: copying the metadata of each of the n number of files to a first file in a directory, wherein the copying is performed without deduplicating the metadata, and wherein the first file does not contain the data for the n number of files; copying and deduplicating at least a portion of the data for the n number of files into a second file in the directory, wherein the second file is separate from the first file, and wherein the second file does not include the metadata for the n number of files; and updating a data structure in the directory, wherein the data structure: tracks, for each of the n number of files, a location of the metadata for that file in the first file, and tracks, for the at least a portion of the data for the n number of files, a location of the data in the second file.
8. A method of single instancing a large number of data files during a single copy operation that is performed by one or more computing systems, each computing system including a processor and memory, the method comprising: receiving multiple data files to be copied during a single copy operation; removing user or access control metadata from at least some of the multiple data files; for each of the multiple data files, determining if an instance of data in the data file has already been stored; for each of the multiple data files, if an instance of the data in the data file has not already been stored, then storing the data in a single data file, wherein the single data file includes the data for each of the multiple data files to be copied during the single copy operation, without duplicate instances for data in the multiple data files that have already been stored; creating, by the one or more computing systems, an index file associated with the single data file, wherein the index file includes information identifying, for each of the multiple data files, a location of the data in the single data file; for each of the multiple data files from which user or access control information was removed, storing the removed user or access control metadata in a one and the same user metadata file; and storing in a single directory associated with the single copy operation, the single data file, the index file, and the user metadata file.
9. The method of claim 8, wherein the single copy operation is a regularly scheduled data archive or backup job, wherein the determining includes, for each of the multiple data files, generating a substantially unique identifier for the data in the data file, and comparing the substantially unique identifier to a stored table of substantially unique identifiers for previously stored data from previously copied data files.
10. The method of claim 8, further comprising: before the determining, and for each of the multiple data files, determining if the data in the data file satisfies at least one criterion, and wherein the criterion is whether the data in the data file exceeds a data size threshold.
11. The method of claim 8, further comprising: at a second time that differs from the first time, at a later time, repeating the receiving, removing, determining, storing, and creating for new data files to be copied during another single copy operation, and storing another single data file with another metadata file in another directory at the later time and based on the repeated receiving, removing, determining, storing, and creating.
12. A method, performed by computing systems, of avoiding the storage of duplicate data, wherein each computing system includes a processor and memory, the method comprising: receiving an indication to perform a storage operation; receiving a set of data objects involved in the storage operation; for each of the data objects in the set, by the one or more computing systems: determining if the data object satisfies at least one criterion; if the data object satisfies the at least one criterion, then: generating an identifier for the data object; determining, based on the identifier, if an instance of the data object has already been stored; if an instance of the data object has already been stored, then: determining the location of the instance of the data object; and storing a reference to the location of the instance of the data object in a first file in a directory, wherein the first file is configured to store multiple references, and wherein each reference refers to a location of an instance of a data object; and if an instance of the data object has not already been stored, then storing the data object in a second file in the directory, wherein the second file is configured to store only a single instance of each data object; and if the data object does not satisfy the at least one criterion, then storing the data object in a third file in the directory, wherein the third file is configured to store multiple instances of data objects.
13. The method of claim 12, wherein the data object has a type, and wherein determining if the data object satisfies at least one criterion includes determining whether the type of the data object is data.
14. The method of claim 12, wherein the data object has a size, and wherein determining if the data object satisfies at least one criterion includes determining whether the size of the data object exceeds a predetermined size.
15. The method of claim 12, wherein the data object has a size in kilobytes, and wherein determining if the data object satisfies at least one criterion includes determining whether the size of the data object exceeds 64 kilobytes.
16. The method of claim 12, wherein the directory is located on sequential media.
17. The method of claim 12, wherein the directory is a first directory, and wherein the method further comprises: if an instance of the data object has already been stored, then: determining the location of the instance of the data object includes determining that the instance of the data object has already been stored in a third file in a second directory, wherein the third file is configured to store only a single instance of each data object; and storing a reference to the location of the instance of the data object includes storing a reference to the third file in the first file in the first directory.
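The decision flow recited in claims 12 to 17 can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not the claimed method itself: the `Directory` class models the three files as lists, `store` and the shared `seen` table are hypothetical names, SHA-256 is an assumed choice of identifier, and the criterion is fixed to the 64-kilobyte size test of claim 15.

```python
import hashlib
from dataclasses import dataclass, field

SIZE_THRESHOLD = 64 * 1024  # 64 kilobytes, the example criterion in claim 15


@dataclass
class Directory:
    """In-memory stand-in for one directory with three files (assumed model)."""
    references: list = field(default_factory=list)        # "first file"
    single_instances: list = field(default_factory=list)  # "second file"
    other_objects: list = field(default_factory=list)     # "third file"


def store(directory, dir_name, data, seen):
    """Route one data object and return which file received an entry.

    seen maps identifier -> (directory name, position) for instances already
    stored, possibly by an earlier storage operation in another directory.
    """
    if len(data) <= SIZE_THRESHOLD:
        # Criterion not satisfied: store as-is, duplicates allowed.
        directory.other_objects.append(data)
        return "third"
    identifier = hashlib.sha256(data).hexdigest()
    if identifier in seen:
        # An instance exists: store only a reference to its location.
        directory.references.append(seen[identifier])
        return "first"
    # No instance yet: store the single instance and record its location.
    seen[identifier] = (dir_name, len(directory.single_instances))
    directory.single_instances.append(data)
    return "second"


if __name__ == "__main__":
    seen = {}
    first_dir, second_dir = Directory(), Directory()
    big = b"x" * (SIZE_THRESHOLD + 1)
    print(store(first_dir, "dir1", big, seen))       # new instance
    print(store(first_dir, "dir1", b"tiny", seen))   # below the threshold
    print(store(second_dir, "dir2", big, seen))      # reference into dir1
    print(second_dir.references)
```

The last call shows the cross-directory case of claim 17: the second directory holds no copy of the large object, only a reference to where the first directory stored it.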
Patents cited by this patent (186)
Tarui Toshiaki (Kokubunji JPX) Sukegawa Naonobu (Kokubunji JPX) Fujii Hiroaki (Hadano JPX) Kitai Katsuyoshi (Kokubunji JPX), Access control method for a shared main memory in a multiprocessor based upon a directory held at a storage location of.
Yuval Ofek ; Zoran Cakeljic ; Samuel Krikler IL; Sharon Galtzur IL; Michael Hirsch IL; Dan Arnon ; Peter Kamvysselis, Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size.
Griffin David (Maynard MA) Campbell Jonathan (Acton MA) Reilly Michael (Sterling MA) Rosenbaum Richard (Pepperell MA), Arrangement with cooperating management server node and network service node.
Dile, James Michael; Nguyen, Joanne T.; Piletski, Vadzim Ivanovich; Smith, James Patrick, Backing-up and restoring files including files referenced with multiple file names.
Nakano Toshio (Odawara JPX) Nozawa Masafumi (Odawara JPX) Kurano Akira (Odawara JPX) Hisano Kiyoshi (Odawara JPX) Hoshino Masayuki (Odawara JPX), Backup control method and system in data processing system using identifiers for controlling block data transfer.
Kitajima Hiroyuki (Yokohama) Yamamoto Akira (Yokohama) Doi Takashi (Hadano) Nozawa Masafumi (Odawara JPX), Buffered peripheral system and method for backing up and retrieving data to and from backup memory device.
Worley ; Jr. William S. (Saratoga CA) Bryg William R. (Saratoga CA) Baum Allen (Palo Alto CA), Cache memory consistency control with explicit software instructions.
Cole Leo J. (Raleigh NC) Frantz Curtis J. (Durham NC) Lee Jeannette (Raleigh NC) Ordanic Zvonimir (Raleigh NC) Plank Larry K. (Rochester MN), Centralized management in a computer network.
Carpenter Kelly S. (Fremont CA) Dearing Gerard M. (San Jose CA) Nick Jeffrey M. (Fishkill NY) Strickland Jimmy P. (Saratoga CA) Swanson Michael D. (Poughkeepsie NY) Wilkinson Wendell W. (Hyde Park NY, Coherence controls for store-multiple shared data coordinated by cache directory entries in a shared electronic storage.
Senator Steven T. ; Fuller Billy J., Computer system method and apparatus providing for various versions of a file without requiring data copy or log operati.
Fecteau Jean G. (Toronto NY CAX) Gdaniec Joseph M. (Vestal NY) Hennessy James P. (Endicott NY) MacDonald John F. (Vestal NY) Osisek Damian L. (Vestal NY), Computer system which supports asynchronous commitment of data.
Reed Drummond Shattuck ; Heymann Peter Earnshaw ; Mushero Steven Mark ; Jones Kevin Benard ; Oberlander Jeffrey Todd ; Banay Dan, Computer-based communication system and method using metadata defining a control structure.
Midgely Christopher W. (Framingham MA) Holland Charles J. (Northboro MA) Webb John W. (Sutton MA) Gonsalves Manuel (Brookline MA), Continuously-snapshotted protection of computer files.
Dunphy William E. (Westminster CO) Halladay Steven M. (Louisville CO) Moy Michael E. (Lafayette CO) Munro Frederick G. (Broomfield CO), Data storage and protection system.
Yanai Moshe (Framingham MA) Vishlitzky Natan (Brookline MA) Alterescu Bruno (Newton MA) Castel Daniel (Framingham MA) Shklarsky Gadi (Brookline MA), Data storage system controlled remote data mirroring with respectively maintained data indices.
Hagerstrom, Carl F.; Hutchinson, Thomas Dixon; Bharthulwar, Shridhar; Tinius, Paul E., Detecting and managing orphan files between primary and secondary data stores.
Fortier Richard W. (Acton MA) Mastors Robert M. (Ayer MA) Taylor Tracy M. (Upton MA) Wallace John J. (Franklin MA), Digital data processor with improved backup storage.
Kenley Gregory (Northboro MA) Ericson George (Schrewsbury MA) Fortier Richard (Acton MA) Holland Chuck (Northboro MA) Mastors Robert (Ayer MA) Pownell James (Natick MA) Taylor Tracy (Upton MA) Wallac, Digital data storage system with improved data migration.
Christenson,Nikolai Paul; Fritchie,Scott Ernest Lystig; Larson,James Stephen, Electronic mail system with methodology providing distributed message store.
Alam Salim ; Bhalerao Vinayak A. ; Wu Charles ; Hu George ; Ferrell John I., File object synchronization between a desktop computer and a mobile device.
Xu Yikang ; Vahalia Uresh K. ; Jiang Xiaoye ; Gupta Uday ; Tzelnic Percy, File server system using file system storage, data movers, and an exchange of meta data among data movers for file locking and direct access to shared file systems.
Bates, Allen K.; Haustein, Nils; Klein, Craig A.; Krick, Frank; Troppens, Ulf; Winarski, Daniel, File system with internal deduplication and management of data blocks.
Lagueux, Jr., Richard A.; Stave, Joel H.; Yeaman, John B.; Stevens, Brian E.; Higgins, Robert M.; Collins, James M., Graphical user interface for configuration of a storage system.
Urevig Paul D. ; Malnati James R. ; Ethen Donald J. ; Weber Herbert L., Grouping shared resources into one or more pools and automatically re-assigning shared resources from where they are not currently needed to where they are needed.
Barney Rock D. ; Schwols Keith ; Nelson Ellen M., Integration of a database into file management software for protecting, tracking and retrieving data.
Douceur,John R.; Theimer,Marvin M.; Adya,Atul; Bolosky,William J., Locating potentially identical objects across multiple computers based on stochastic partitioning of workload.
Douceur,John R.; Theimer,Marvin M.; Adya,Atul; Bolosky,William J., Locating potentially identical objects across multiple computers based on stochastic partitioning of workload.
Martin Charles W. (Richardson TX) Reid Fredrick S. (Plano TX) Forbus Gary L. (Dallas TX) Adams Steve M. (Plano TX) Shannon C. Patrick (Garland TX) Pirpich Eric A. (Garland TX), Mass data storage and retrieval system.
Kedem Nadav,ILX, Mass storage subsystem and backup arrangement for digital data processing system which permits information to be backed up while host computer(s) continue(s) operating in connection with information .
Long Robert M., Media element library with non-overlapping subset of media elements and non-overlapping subset of media element drives accessible to first host and unaccessible to second host.
Kullick Steven E. ; Spirakis Charles S. ; Titus Diane J., Method and apparatus for transferring archival data among an arbitrarily large number of computer devices in a networked.
Archibald, Jr., John Edward; McKean, Brian Dennis, Method and apparatus for using extended disk sector formatting to assist in backup and hierarchical storage management.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Kern Ronald M. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated backup copy ordering in a time zero backup copy session.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Micka William F. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated termination and resumption in a time zero backup copy process.
Walter A. Hubis ; William G. Deitz, Method and system for controlling access share storage devices in a network environment by configuring host-to-volume mapping data structures in the controller memory for granting and denying access .
Chefalas, Thomas E.; Mastrianni, Steven J., Method and system for processing backup data associated with application, querying metadata files describing files accessed by the application.
Chron, Edward Gustav; Menon, Jaishankar Moothedath, Method and system for providing consistent data modification information to clients in a storage system.
Aoyama Yuki,JPX ; Takahashi Toru,JPX ; Wakayama Satoshi,JPX, Method of and an apparatus for displaying version information and configuration information and a computer-readable recording medium on which a version and configuration information display program i.
Wolfgang, John Jay; Boyd, Kenneth Wayne; Day, III, Kenneth Fairclough; Doatmas, Philip Matthew; Dahman, Kirby Grant, Method, system, and program for data synchronization between a primary storage device and a secondary storage device by determining whether a first identifier and a second identifier match, where a unique identifier is associated with each portion of data.
Palliyil, Sudarshan; Venkateshamurthy, Shivakumara; Vijayaraghavan, Srinivas Belur; Aswathanarayana, Tejasvi, Methods, apparatus and computer programs for enhanced access to resources within a network.
MacHardy, Earle; Harvey, David; Duprey, Dennis, Methods, systems, and computer program products for mapped logical unit (MLU) replications, storage, and retrieval in a redundant array of inexpensive disks (RAID) environment.
Crescenti,John; Kavuri,Srinivas; Oshinsky,David Alan; Prahlad,Anand, Modular backup and retrieval system used in conjunction with a storage area network.
Pisello Thomas (De Bary FL) Crossmier David (Casselberry FL) Ashton Paul (Oviedo FL), Network management system having virtual catalog overview of files distributively stored across network domain.
Faibish, Sorin; Whitney, William; Brashers, Per; Cotter, Gerald E., Object classification and indexing of very large name spaces using grid technology.
Sawdon, Wayne A.; Haskin, Roger L.; Schmuck, Frank B.; Wyllie, James C., Plurality of file systems using weighted allocation to allocate space on one or more storage devices.
Bruce, Buford L.; Kim, Peter C.; Levi, Michael; Silliman, Albert; Wissmann, Joseph T.; Zaremba, Christopher, Providing archiving of individual mail content while maintaining a single copy mail store.
Prahlad, Anand; May, Andreas; Lunde, Norman R.; Zhou, Lixin; Kumar, Avinash; Ngo, David, Snapshot storage and management system with indexing and user interface.
Crockett Robert N. (Tucson AZ) Kern Ronald M. (Tucson AZ) Micka William F. (Tucson AZ), Software directed microcode state save for distributed storage controller.
Ting, Daniel; Zheng, Ling; Manley, Stephen L.; DeStefano, John Frederick, System and method for managing data deduplication of storage systems utilizing persistent consistency point images.
Mutalik Madhav ; Senie Faith M., System and method for performing file-handling operations in a digital data processing system using an operating system-independent file map.
Huang,Jau Hsiung; Tseng,Wei Hsin; Chou,Hung Te; Weng,Yung Chiuan, System and method for providing access to computer files across computer operating systems.
Moulton, Gregory Hagan, System and method for unorchestrated determination of data sequences using sticky byte factoring to determine breakpoints in digital sequences.
Huai ReiJane (Old Brookville NY) Daly Robert (Ronkonkoma NY) Curti Walter (Dix Hills NY) Mohan Deepak (Huntington NY) Chueh James Kuang-Ru (Bayside NY) Louie Larry (Forest Hills NY), System and parallel streaming and data stripping to back-up a network.
Frasier, Lawrence Martin; Resino, Robert George, System for adjusting resource allocation to a logical partition based on rate of page swaps and utilization by changing a boot configuration file.
Stoppani ; Jr. Peter (Woodinville WA), System for allocating storage spaces based upon required and optional service attributes having assigned piorities.
Flynn Rex A. (Belmont MA) Anick Peter G. (Marlboro MA), System for reconstructing prior versions of indexes using records indicating changes between successive versions of the.
Morris Robert J. T. (Los Gatos CA), System for reducing storage requirements and transmission loads in a backup subsystem in client-server environment by tr.
Saether Christian D. (Seattle WA) Stoppani ; Jr. Peter (Woodinville WA), System of device independent file directories using a tag between the directories and file descriptors that migrate with.
Prahlad, Anand; Schwartz, Jeremy A.; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for classifying and transferring information in a storage network.
Senthilnathan, Muthusamy; Thati, Ravi; Kumarasamy, Paramasivam; Mishra, Hemant, Predicting scale of data migration between production and archive storage systems, such as for enterprise customers having large and/or numerous files.