Systems and methods for byte-level or quasi byte-level single instancing
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-017/30
G06F-011/14
출원번호
US-0276622
(2014-05-13)
등록번호
US-9158787
(2015-10-13)
발명자
/ 주소
Klose, Michael F.
출원인 / 주소
Commvault Systems, Inc
대리인 / 주소
Perkins Coie LLP
인용정보
피인용 횟수 :
1인용 특허 :
228
초록▼
Described in detail herein are systems and methods for deduplicating data using byte-level or quasi byte-level techniques. In some embodiments, a file is divided into multiple blocks. A block includes multiple bytes. Multiple rolling hashes of the file are generated. For each byte in the file, a sea
Described in detail herein are systems and methods for deduplicating data using byte-level or quasi byte-level techniques. In some embodiments, a file is divided into multiple blocks. A block includes multiple bytes. Multiple rolling hashes of the file are generated. For each byte in the file, a searchable data structure is accessed to determine if the data structure already includes an entry matching a hash of a minimum sequence length. If so, this indicates that the corresponding bytes are already stored. If one or more bytes in the file are already stored, then the one or more bytes in the file are replaced with a reference to the already stored bytes. The systems and methods described herein may be used for file systems, databases, storing backup data, or any other use case where it may be useful to reduce the amount of data being stored.
대표청구항▼
1. A method of deduplicating data performed by one or more computing systems which are coupled to one or more storage devices via a network, wherein the one or more computing system each comprise at least one processor and memory, and the one or more storage devices include a searchable data structu
1. A method of deduplicating data performed by one or more computing systems which are coupled to one or more storage devices via a network, wherein the one or more computing system each comprise at least one processor and memory, and the one or more storage devices include a searchable data structure and a first set of data, the method comprising: receiving a second set of data;dividing the second set of data into at least one block, wherein the one block includes a total number of bytes;accessing the searchable data structure;determining whether one or more bytes of the one block are included in a portion of the first set of data in the searchable data structure, wherein the number of the one or more bytes is less than the total number of bytes of the one block;replacing the one or more bytes with a reference to the portion of the first set of data when the one or more bytes of the one block are included in the portion of the first set of data in the searchable data structure; andcausing the one block to be stored using the one or more storage devices. 2. The method of claim 1, further comprising generating powers of 2 rolling hashes for the second set of data. 3. The method of claim 1, wherein the searchable data structure includes a hierarchical data structure that includes multiple nodes, andwherein a first node can reference data in any other node excepting nodes that are descendants of the first node. 4. The method of claim 1, further comprising compressing the second set of data. 5. A system for deduplicating data, the system comprising: at least one processor;memory communicatively coupled to the at least one processor;means for receiving a file including multiple bytes;means for accessing at least some of multiple blocks of data in a data structure: wherein the multiple blocks of data have a first size,wherein the multiple blocks of data represent a set of data having a second size that is greater than the first size,wherein a block of data is associated with multiple first identifiers, andwherein the multiple blocks of data are identified in the data structure by the associated multiple first identifiers;means for determining whether one or more of the multiple bytes are already stored based at least partly upon accessing of the data structure, wherein the number of the one or more bytes is less than a number of bytes in a block of data; andmeans for causing bytes that are not already stored using a storage device to be stored. 6. The system of claim 5, further comprising means for generating multiple second identifiers for the file,wherein the multiple second identifiers for the file include powers of 2 rolling hashes. 7. The system of claim 5, further comprising means for generating multiple second identifiers for the file,wherein the multiple second identifiers include powers of 2 rolling hashes, andwherein means for determining whether the one or more of the multiple bytes are already stored includes means for comparing the multiple second identifiers for the file with the multiple first identifiers associated with the multiple blocks of data. 8. The system of claim 5, wherein the data structure includes a hierarchical data structure that includes multiple nodes, andwherein a first node can reference data in any other node excepting nodes that are descendants of the first node. 9. The system of claim 5 wherein at least some of the multiple blocks of data are compressed. 10. A method of deduplicating data performed by a computing system having a processor and memory, the method comprising: dividing a first set of data into at least one block, wherein the one block includes a total number of bytes;accessing a searchable data structure, wherein the searchable data structure includes a second set of data;determining whether one or more bytes of the one block are included in a portion of the second set of data in the searchable data structure, wherein the number of the one or more bytes is less than the total number of bytes of the one block;replacing the one or more bytes with a reference to the portion of the second set of data, when the one or more bytes of the one block are included in the portion of the second set of data in the searchable data structure; andcausing the block to be stored. 11. The method of claim 10, further comprising generating powers of 2 rolling hashes. 12. The method of claim 10, wherein the searchable data structure includes a hierarchical data structure that includes multiple nodes, andwherein a first node can reference data in any other node excepting nodes that are descendants of the first node. 13. The method of claim 10, further comprising compressing data. 14. A method of building a search data structure for deduplicating data, wherein the method is performed by a computing system having a processor and memory, the method comprising: receiving a set of data;dividing the set of data into at least a first block, wherein the first block includes a number of bytes;accessing a search data structure containing zero or more nodes, wherein each node represents a block including the number of bytes or represents a portion of a block, andwherein each node contains a reference to a stored block, to a portion of a stored block, or to another node;first determining whether the first block is identical to a second block represented by a first node in the search data structure;when a result of the first determining is positive, creating a node in the search data structure representing the first block and containing a reference to the first node; andwhen a result of the first determining is negative: second determining whether a portion of the first block is identical to a portion of a third block represented by a second node in the search data structure;when a result of the second determining is positive: storing a portion of the first block that is not identical to the portion of the third block;creating a third node in the search data structure representing the portion of the third block and containing a reference to the stored portion of the third block; andcreating a node in the search data structure representing the first block and containing a reference to the third node and a reference to the stored portion of the first block; andwhen a result of the second determining is negative, storing the first block; creating a node in the search data structure representing the first block and containing a reference to the stored first block. 15. The method of claim 14, wherein the search data structure is implemented as a skipped list or a red-black tree. 16. The method of claim 14, wherein a reference to a portion of a stored block comprises a reference to the stored block, an offset within the stored block, and a length. 17. The method of claim 14, wherein each node in the search data structure contains metadata, including a number of references to the node or a hash code. 18. The method claim 17, wherein when a node contains a reference to a stored block or a portion of a stored block, the hash code is computed from the stored block or the portion of the stored block, andwherein when a node contains a reference to another node, the hash code is computed from a stored block or a portion of a stored block to which the reference is resolved. 19. The method of claim 18, wherein the first determining is performed based on the hash code contained in the first block and the hash code contained in the second block, andwherein the second determining is performed using the hash code contained in the first block and the hash code contained in the third block. 20. The method of claim 17, wherein the number of references to a third node is updated when a node that contains a reference to the third node is added to or deleted from the search data structure.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (228)
Tarui Toshiaki (Kokubunji JPX) Sukegawa Naonobu (Kokubunji JPX) Fujii Hiroaki (Hadano JPX) Kitai Katsuyoshi (Kokubunji JPX), Access control method for a shared main memory in a multiprocessor based upon a directory held at a storage location of.
Bates, Allen Keith; Haustein, Nils; Klein, Craig Anthony; Troppens, Ulf; Winarski, Daniel James, Apparatus and method to select a deduplication protocol for a data storage library.
Yuval Ofek ; Zoran Cakeljic ; Samuel Krikler IL; Sharon Galtzur IL; Michael Hirsch IL; Dan Arnon ; Peter Kamvysselis, Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size.
Griffin David (Maynard MA) Campbell Jonathan (Acton MA) Reilly Michael (Sterling MA) Rosenbaum Richard (Pepperell MA), Arrangement with cooperating management server node and network service node.
Dile, James Michael; Nguyen, Joanne T.; Piletski, Vadzim Ivanovich; Smith, James Patrick, Backing-up and restoring files including files referenced with multiple file names.
Nakano Toshio (Odawara JPX) Nozawa Masafumi (Odawara JPX) Kurano Akira (Odawara JPX) Hisano Kiyoshi (Odawara JPX) Hoshino Masayuki (Odawara JPX), Backup control method and system in data processing system using identifiers for controlling block data transfer.
Kitajima Hiroyuki (Yokohama) Yamamoto Akira (Yokohama) Doi Takashi (Hadano) Nozawa Masafumi (Odawara JPX), Buffered peripheral system and method for backing up and retrieving data to and from backup memory device.
Worley ; Jr. William S. (Saratoga CA) Bryg William R. (Saratoga CA) Baum Allen (Palo Alto CA), Cache memory consistency control with explicit software instructions.
Cole Leo J. (Raleigh NC) Frantz Curtis J. (Durham NC) Lee Jeannette (Raleigh NC) Ordanic Zvonimir (Raleigh NC) Plank Larry K. (Rochester MN), Centralized management in a computer network.
Carpenter Kelly S. (Fremont CA) Dearing Gerard M. (San Jose CA) Nick Jeffrey M. (Fishkill NY) Strickland Jimmy P. (Saratoga CA) Swanson Michael D. (Poughkeepsie NY) Wilkinson Wendell W. (Hyde Park NY, Coherence controls for store-multiple shared data coordinated by cache directory entries in a shared electronic storage.
Senator Steven T. ; Fuller Billy J., Computer system method and apparatus providing for various versions of a file without requiring data copy or log operati.
Fecteau Jean G. (Toronto NY CAX) Gdaniec Joseph M. (Vestal NY) Hennessy James P. (Endicott NY) MacDonald John F. (Vestal NY) Osisek Damian L. (Vestal NY), Computer system which supports asynchronous commitment of data.
Reed Drummond Shattuck ; Heymann Peter Earnshaw ; Mushero Steven Mark ; Jones Kevin Benard ; Oberlander Jeffrey Todd ; Banay Dan, Computer-based communication system and method using metadata defining a control structure.
Prahlad, Anand; Agrawal, Vijay H., Continuous data protection over intermittent connections, such as continuous data backup for laptops or wireless devices.
Prahlad, Anand; Agrawal, Vijay H., Continuous data protection over intermittent connections, such as continuous data backup for laptops or wireless devices.
Midgely Christopher W. (Framingham MA) Holland Charles J. (Northboro MA) Webb John W. (Sutton MA) Gonsalves Manuel (Brookline MA), Continuously-snapshotted protection of computer files.
Dunphy William E. (Westminster CO) Halladay Steven M. (Louisville CO) Moy Michael E. (Lafayette CO) Munro Frederick G. (Broomfield CO), Data storage and protection system.
Yanai Moshe (Framingham MA) Vishlitzky Natan (Brookline MA) Alterescu Bruno (Newton MA) Castel Daniel (Framingham MA) Shklarsky Gadi (Brookline MA), Data storage system controlled remote data mirroring with respectively maintained data indices.
Hagerstrom, Carl F.; Hutchinson, Thomas Dixon; Bharthulwar, Shridhar; Tinius, Paul E., Detecting and managing orphan files between primary and secondary data stores.
Fortier Richard W. (Acton MA) Mastors Robert M. (Ayer MA) Taylor Tracy M. (Upton MA) Wallace John J. (Franklin MA), Digital data processor with improved backup storage.
Kenley Gregory (Northboro MA) Ericson George (Schrewsbury MA) Fortier Richard (Acton MA) Holland Chuck (Northboro MA) Mastors Robert (Ayer MA) Pownell James (Natick MA) Taylor Tracy (Upton MA) Wallac, Digital data storage system with improved data migration.
Christenson,Nikolai Paul; Fritchie,Scott Ernest Lystig; Larson,James Stephen, Electronic mail system with methodology providing distributed message store.
Alam Salim ; Bhalerao Vinayak A. ; Wu Charles ; Hu George ; Ferrell John I., File object synchronization between a desktop computer and a mobile device.
Xu Yikang ; Vahalia Uresh K. ; Jiang Xiaoye ; Gupta Uday ; Tzelnic Percy, File server system using file system storage, data movers, and an exchange of meta data among data movers for file locking and direct access to shared file systems.
Bates, Allen K.; Haustein, Nils; Klein, Craig A.; Krick, Frank; Troppens, Ulf; Winarski, Daniel, File system with internal deduplication and management of data blocks.
Lagueux, Jr., Richard A.; Stave, Joel H.; Yeaman, John B.; Stevens, Brian E.; Higgins, Robert M.; Collins, James M., Graphical user interface for configuration of a storage system.
Urevig Paul D. ; Malnati James R. ; Ethen Donald J. ; Weber Herbert L., Grouping shared resources into one or more pools and automatically re-assigning shared resources from where they are not currently needed to where they are needed.
Barney Rock D. ; Schwols Keith ; Nelson Ellen M., Integration of a database into file management software for protecting, tracking and retrieving data.
Douceur,John R.; Theimer,Marvin M.; Adya,Atul; Bolosky,William J., Locating potentially identical objects across multiple computers based on stochastic partitioning of workload.
Douceur,John R.; Theimer,Marvin M.; Adya,Atul; Bolosky,William J., Locating potentially identical objects across multiple computers based on stochastic partitioning of workload.
Martin Charles W. (Richardson TX) Reid Fredrick S. (Plano TX) Forbus Gary L. (Dallas TX) Adams Steve M. (Plano TX) Shannon C. Patrick (Garland TX) Pirpich Eric A. (Garland TX), Mass data storage and retrieval system.
Kedem Nadav,ILX, Mass storage subsystem and backup arrangement for digital data processing system which permits information to be backed up while host computer(s) continue(s) operating in connection with information .
Long Robert M., Media element library with non-overlapping subset of media elements and non-overlapping subset of media element drives accessible to first host and unaccessible to second host.
Kullick Steven E. ; Spirakis Charles S. ; Titus Diane J., Method and apparatus for transferring archival data among an arbitrarily large number of computer devices in a networked.
Archibald, Jr., John Edward; McKean, Brian Dennis, Method and apparatus for using extended disk sector formatting to assist in backup and hierarchical storage management.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Kern Ronald M. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated backup copy ordering in a time zero backup copy session.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Micka William F. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated termination and resumption in a time zero backup copy process.
Walter A. Hubis ; William G. Deitz, Method and system for controlling access share storage devices in a network environment by configuring host-to-volume mapping data structures in the controller memory for granting and denying access .
Chefalas, Thomas E.; Mastrianni, Steven J., Method and system for processing backup data associated with application, querying metadata files describing files accessed by the application.
Chron, Edward Gustav; Menon, Jaishankar Moothedath, Method and system for providing consistent data modification information to clients in a storage system.
Aoyama Yuki,JPX ; Takahashi Toru,JPX ; Wakayama Satoshi,JPX, Method of and an apparatus for displaying version information and configuration information and a computer-readable recording medium on which a version and configuration information display program i.
Wahlert, Brian M; Berkowitz, Brian T; van Ingen, Catharine; Rangegowda, Dharshan; Jazayeri, Mike, Method, system, and apparatus for creating saved searches and auto discovery groups for a data protection system.
Wolfgang, John Jay; Boyd, Kenneth Wayne; Day, III, Kenneth Fairclough; Doatmas, Philip Matthew; Dahman, Kirby Grant, Method, system, and program for data synchronization between a primary storage device and a secondary storage device by determining whether a first identifier and a second identifier match, where a unique identifier is associated with each portion of data.
Palliyil, Sudarshan; Venkateshamurthy, Shivakumara; Vijayaraghavan, Srinivas Belur; Aswathanarayana, Tejasvi, Methods, apparatus and computer programs for enhanced access to resources within a network.
MacHardy, Earle; Harvey, David; Duprey, Dennis, Methods, systems, and computer program products for mapped logical unit (MLU) replications, storage, and retrieval in a redundant array of inexpensive disks (RAID) environment.
Crescenti,John; Kavuri,Srinivas; Oshinsky,David Alan; Prahlad,Anand, Modular backup and retrieval system used in conjunction with a storage area network.
Pisello Thomas (De Bary FL) Crossmier David (Casselberry FL) Ashton Paul (Oviedo FL), Network management system having virtual catalog overview of files distributively stored across network domain.
Sawdon, Wayne A.; Haskin, Roger L.; Schmuck, Frank B.; Wyllie, James C., Plurality of file systems using weighted allocation to allocate space on one or more storage devices.
Bruce, Buford L.; Kim, Peter C.; Levi, Michael; Silliman, Albert; Wissmann, Joseph T.; Zaremba, Christopher, Providing archiving of individual mail content while maintaining a single copy mail store.
Prahlad, Anand; May, Andreas; Lunde, Norman R.; Zhou, Lixin; Kumar, Avinash; Ngo, David, Snapshot storage and management system with indexing and user interface.
Prahlad, Anand; May, Andreas; Lunde, Norman R.; Zhou, Lixin; Kumar, Avinash; Ngo, David, Snapshot storage and management system with indexing and user interface.
Crockett Robert N. (Tucson AZ) Kern Ronald M. (Tucson AZ) Micka William F. (Tucson AZ), Software directed microcode state save for distributed storage controller.
Ting, Daniel; Zheng, Ling; Manley, Stephen L.; DeStefano, John Frederick, System and method for managing data deduplication of storage systems utilizing persistent consistency point images.
Ting, Daniel; Zheng, Ling; Manley, Stephen L.; DeStefano, John Frederick, System and method for managing data deduplication of storage systems utilizing persistent consistency point images.
Mutalik Madhav ; Senie Faith M., System and method for performing file-handling operations in a digital data processing system using an operating system-independent file map.
Huang,Jau Hsiung; Tseng,Wei Hsin; Chou,Hung Te; Weng,Yung Chiuan, System and method for providing access to computer files across computer operating systems.
Prahlad, Anand; Gokhale, Parag; Kottomtharayil, Rajiv; Retnamma, Manoj K. Vijayan; Attarde, Deepak R., System and method for storing redundant information.
Prahlad, Anand; Gokhale, Parag; Kottomtharayil, Rajiv; Vijayan Retnamma, Manoj K.; Attarde, Deepak R., System and method for storing redundant information.
Moulton, Gregory Hagan, System and method for unorchestrated determination of data sequences using sticky byte factoring to determine breakpoints in digital sequences.
Patel, Sujal M.; Mikesell, Paul A., System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system.
Huai ReiJane (Old Brookville NY) Daly Robert (Ronkonkoma NY) Curti Walter (Dix Hills NY) Mohan Deepak (Huntington NY) Chueh James Kuang-Ru (Bayside NY) Louie Larry (Forest Hills NY), System and parallel streaming and data stripping to back-up a network.
Frasier, Lawrence Martin; Resino, Robert George, System for adjusting resource allocation to a logical partition based on rate of page swaps and utilization by changing a boot configuration file.
Stoppani ; Jr. Peter (Woodinville WA), System for allocating storage spaces based upon required and optional service attributes having assigned piorities.
Sim-Tang, Siew Yong; Fraisl, Daniel J., System for moving real-time data events across a plurality of devices in a network for simultaneous data protection, replication, and access services.
Flynn Rex A. (Belmont MA) Anick Peter G. (Marlboro MA), System for reconstructing prior versions of indexes using records indicating changes between successive versions of the.
Morris Robert J. T. (Los Gatos CA), System for reducing storage requirements and transmission loads in a backup subsystem in client-server environment by tr.
Saether Christian D. (Seattle WA) Stoppani ; Jr. Peter (Woodinville WA), System of device independent file directories using a tag between the directories and file descriptors that migrate with.
Prahlad, Anand; Schwartz, Jeremy A.; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for classifying and transferring information in a storage network.
Prahlad, Anand; Schwartz, Jeremy A.; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for classifying and transferring information in a storage network.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.