Deriving encryption rules based on file content
IPC Classification Information
Country/Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-021/62
H04L-009/00
G06N-099/00
H04L-029/06
H04L-029/08
Application Number
US-0489222
(2014-09-17)
Patent Number
US-9405928
(2016-08-02)
Inventors / Address
Amarendran, Arun Prasad
Chatterjee, Tirthankar
Yuan, Yun
Liu, Yongtao
Applicant / Address
Commvault Systems, Inc.
Agent / Address
Knobbe, Martens, Olson & Bear, LLP
Citation Information
Times cited: 9
Cited patents: 171
Abstract
Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the files, as determined by using natural language processing algorithms, to generate the encryption rules. Moreover, systems are disclosed that are capable of automatically determining whether to encrypt a file based on the generated encryption rules. The content of the file may be determined using natural language processing algorithms and then the encryption rules may be applied to the content of the file to determine whether to encrypt the file.
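The decision step in the last two sentences of the abstract (determine a file's content, then apply the encryption rules to that content) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a regular-expression word tokenizer stands in for the natural language processing step, and `EncryptionRule`, `required_tokens`, and `should_encrypt` are hypothetical names. A rule is assumed to fire when every one of its tokens appears in the file.

```python
import re
from dataclasses import dataclass

# Assumption: a crude regex tokenizer stands in for the NLP content analysis.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def extract_tokens(text: str) -> set[str]:
    """Return the set of lowercase word tokens (hyphenated runs kept whole)."""
    return set(TOKEN_RE.findall(text.lower()))


@dataclass(frozen=True)
class EncryptionRule:
    """Hypothetical rule: matches a file that contains every required token."""
    name: str
    required_tokens: frozenset[str]

    def matches(self, tokens: set[str]) -> bool:
        return self.required_tokens <= tokens


def should_encrypt(text: str, rules: list[EncryptionRule]) -> bool:
    """Return True if any rule matches the file's extracted tokens."""
    tokens = extract_tokens(text)
    return any(rule.matches(tokens) for rule in rules)


rules = [
    EncryptionRule("ssn", frozenset({"social", "security", "number"})),
    EncryptionRule("salary", frozenset({"salary", "confidential"})),
]
print(should_encrypt("Employee social security number: 123-45-6789", rules))  # True
print(should_encrypt("Lunch menu for Friday", rules))  # False
```

Subset matching keeps the sketch easy to follow; a real content analyzer would likely weigh entities, context, and partial matches rather than requiring exact tokens.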
Claims
1. A data storage system comprising:
a content analyzer comprising computer hardware, the content analyzer configured to:
access a set of training files that include content designated as sensitive information; and
use one or more processing algorithms with respect to the set of training files to obtain a set of data tokens for each training file, each of the data tokens from the set of data tokens comprising a portion of a training file from the set of training files, the portion of the training file comprising content included in the training file, at least some of the training files including at least some of the sensitive information;
an encryption rules generator comprising computer hardware, the encryption rules generator configured to:
use one or more algorithms to generate a set of encryption rules based on the set of data tokens obtained for each training file, wherein at least some of the set of encryption rules are configured to identify a file to encrypt based at least in part on a correspondence between portions of the file and at least some of the set of data tokens;
generate a prospective encryption rule based on an aggregated set of data tokens, the aggregated set of data tokens based on the set of data tokens for each training file;
perform the prospective encryption rule using the set of training files:
determine a number of training files from the set of training files identified for encryption based on the prospective encryption rule; and
responsive, at least in part, to the number of training files identified for encryption satisfying a threshold, adding the prospective encryption rule to the set of encryption rules; and
an encryption processor comprising computer hardware, the encryption processor configured to encrypt the file based at least in part on one of the encryption rules from the set of encryption rules.

2. The data storage system of claim 1, further comprising an encryption rules repository configured to store the set of encryption rules, wherein the encryption rules repository is accessible by one or more computing systems.

3. The data storage system of claim 1, wherein the encryption rules generator is further configured to:
determine a context condition for an encryption rule of the set of encryption rules, the context condition identifying when to apply the encryption rule to the file; and
associate the context condition with the encryption rule.

4. The data storage system of claim 3, wherein the context condition comprises at least one of an identity of a user, an identity of a department that includes the user within an entity, a geographic location of a computing device storing the file, a network location of a computing device storing the file, and a device type of the computing device.

5. The data storage system of claim 1, wherein the encryption rules generator is configured to determine an encryption rule based on the set of data tokens obtained for a plurality of training files.

6. The data storage system of claim 1, wherein the encryption rules generator is further configured to:
present the prospective encryption rule to a user;
receive an input from the user responsive to presenting the prospective encryption rule to the user; and
determine whether to include the prospective encryption rule in the set of encryption rules based at least in part on the input received from the user.

7. The data storage system of claim 1, wherein the content analyzer is further configured to remove a data token from a set of data tokens of a training file based on an identified set of non-sensitive data tokens.

8. The data storage system of claim 1, further comprising:
a file monitor configured to monitor creation of the file; and
an encryption rules engine configured to determine whether the file satisfies an encryption rule from the set of encryption rules.

9. A method of automatically generating encryption rules using machine learning techniques, the method comprising:
accessing, by a rules generation system comprising computer hardware, a set of one or more training files that include content designated as sensitive information;
applying, by the rules generation system, one or more processing algorithms to each training file included in the set of training files to obtain a set of data tokens for each training file, wherein each of the set of data tokens for a training file corresponds to a portion of the training file, the portion of the training file comprising content included in the training file, at least some of the training files including at least some of the sensitive information, wherein applying the one or more processing algorithms to the set of data tokens comprises:
generating a prospective encryption rule based on the set of data tokens;
performing the prospective encryption rule with respect to the set of training files;
determining a percentage of training files from the set of training files identified for encryption using the prospective encryption rule; and
responsive to the percentage of training files identified for encryption satisfying a threshold, adding the prospective encryption rule to the set of encryption rules;
applying, by the rules generation system, one or more algorithms to the set of data tokens for each training file to generate a set of encryption rules for identifying files with sensitive information, wherein at least some of the set of encryption rules are configured to identify a file to encrypt based at least in part on a correspondence between portions of the file and at least some of the set of data tokens; and
storing the set of encryption rules in an encryption rules repository accessible for one or more systems for determining whether to encrypt the file.

10. The method of claim 9, wherein the one or more processing algorithms comprise natural language processing algorithms.

11. The method of claim 9, wherein the one or more algorithms comprise heuristic algorithms.

12. The method of claim 9, wherein at least one of the one or more processing algorithms comprises a natural language processing algorithm and wherein applying the one or more processing algorithms comprises performing at least one of the following natural language processing tasks: automatic summarization, coreference resolution, discourse analysis, machine translation, morphological segmentation, named entity recognition, natural language understanding, optical character recognition, part-of-speech tagging, parsing, relationship extraction, sentence boundary disambiguation, sentiment analysis, topic segmentation and recognition, word segmentation, word sense disambiguation, singular value decomposition, latent semantic analysis, latent Dirichlet allocation, pachinko allocation, and probabilistic latent semantic analysis.

13. The method of claim 9, wherein applying the one or more algorithms to the set of data tokens for each training file comprises applying the one or more algorithms on a file-by-file basis, separately to each set of data tokens.

14. The method of claim 9, wherein applying the one or more algorithms to the set of data tokens for each training file comprises applying the one or more algorithms to a cumulative set of data tokens formed by combining the sets of data tokens from a plurality of training files.

15. The method of claim 9, further comprising presenting the set of encryption rules to a user for confirmation, wherein storing the set of encryption rules comprises storing encryption rules from the set of encryption rules confirmed by the user.

16. The method of claim 9, further comprising filtering data tokens identified as non-sensitive by a user from the set of data tokens for each training file prior to applying the one or more algorithms.

17. The method of claim 9, further comprising:
monitoring file creation and/or file modification activity;
in response to detecting a file creation and/or modification event with respect to the file, determining whether the file satisfies an encryption rule from the set of encryption rules; and
in response to determining that the file satisfies the encryption rule from the set of encryption rules, identifying the file as protected.

18. The method of claim 17, further comprising:
determining whether the file satisfies a context condition associated with the encryption rule; and
in response to determining that the context condition is satisfied, encrypting the file.
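Claims 1 and 9 describe generating a prospective encryption rule from aggregated data tokens, performing it against the training files, and keeping it only if the number or percentage of training files it flags satisfies a threshold; claims 7 and 16 add removal of tokens identified as non-sensitive. A minimal sketch of that loop follows. Every concrete choice is an assumption, not the patented method: a regex tokenizer substitutes for natural language processing, candidate rules are pairs of tokens that each occur in at least `min_support` training files, and `generate_rules`, `threshold`, `rule_size`, and `min_support` are hypothetical names.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a crude regex tokenizer stands in for the NLP step.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_tokens(text: str) -> set[str]:
    """Return the set of lowercase alphanumeric tokens in text."""
    return set(TOKEN_RE.findall(text.lower()))


def generate_rules(training_texts, non_sensitive=frozenset(), threshold=0.5,
                   rule_size=2, min_support=2):
    """Hypothetical rule generator; each returned rule is a frozenset of tokens."""
    # Per-file token sets with user-designated non-sensitive tokens removed.
    token_sets = [extract_tokens(t) - set(non_sensitive) for t in training_texts]
    if not token_sets:
        return []
    # Aggregate across files: number of training files containing each token.
    doc_freq = Counter(tok for tokens in token_sets for tok in tokens)
    vocab = sorted(tok for tok, n in doc_freq.items() if n >= min_support)

    rules = []
    for candidate in combinations(vocab, rule_size):
        prospective = frozenset(candidate)
        # "Perform" the prospective rule: it flags a file containing all its tokens.
        hits = sum(prospective <= tokens for tokens in token_sets)
        # Keep it only if the fraction of flagged training files meets the threshold.
        if hits / len(token_sets) >= threshold:
            rules.append(prospective)
    return rules


training = [
    "Patient diagnosis: type 2 diabetes. Prescribed insulin.",
    "Patient diagnosis pending; insulin dosage under review.",
    "Invoice for patient visit, diagnosis code attached.",
]
print(generate_rules(training, non_sensitive={"patient"}, threshold=0.6))
```

With "patient" filtered out, the sketch keeps a single rule, {diagnosis, insulin}, which flags two of the three training files. Without the filter it also keeps {diagnosis, patient} and {insulin, patient}. Because every training file is sensitive, there are no negative examples, so a token common to all of them passes the threshold easily and could produce rules that also flag ordinary files; that is one reason the user-driven filtering of claims 7 and 16 and the confirmation step of claims 6 and 15 (not modeled here) are useful.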
Patents Cited by This Patent (171)
Yuval Ofek ; Zoran Cakeljic ; Samuel Krikler IL; Sharon Galtzur IL; Michael Hirsch IL; Dan Arnon ; Peter Kamvysselis, Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size.
Kitamura, Yuuji, Apparatus, method, and program product for secure data formatting and retriving, and computer readable transportable data recording medium storing the program product.
Griffin David (Maynard MA) Campbell Jonathan (Acton MA) Reilly Michael (Sterling MA) Rosenbaum Richard (Pepperell MA), Arrangement with cooperating management server node and network service node.
Nakano Toshio (Odawara JPX) Nozawa Masafumi (Odawara JPX) Kurano Akira (Odawara JPX) Hisano Kiyoshi (Odawara JPX) Hoshino Masayuki (Odawara JPX), Backup control method and system in data processing system using identifiers for controlling block data transfer.
Kitajima Hiroyuki (Yokohama) Yamamoto Akira (Yokohama) Doi Takashi (Hadano) Nozawa Masafumi (Odawara JPX), Buffered peripheral system and method for backing up and retrieving data to and from backup memory device.
Myers James J. (San Francisco CA) Wang Pong-Sheng (San Jose CA), CPU implemented method for backing up modified data sets in non-volatile store for recovery in the event of CPU failure.
Cole Leo J. (Raleigh NC) Frantz Curtis J. (Durham NC) Lee Jeannette (Raleigh NC) Ordanic Zvonimir (Raleigh NC) Plank Larry K. (Rochester MN), Centralized management in a computer network.
Carpenter Kelly S. (Fremont CA) Dearing Gerard M. (San Jose CA) Nick Jeffrey M. (Fishkill NY) Strickland Jimmy P. (Saratoga CA) Swanson Michael D. (Poughkeepsie NY) Wilkinson Wendell W. (Hyde Park NY), Coherence controls for store-multiple shared data coordinated by cache directory entries in a shared electronic storage.
Senator Steven T. ; Fuller Billy J., Computer system method and apparatus providing for various versions of a file without requiring data copy or log operati.
Fecteau Jean G. (Toronto NY CAX) Gdaniec Joseph M. (Vestal NY) Hennessy James P. (Endicott NY) MacDonald John F. (Vestal NY) Osisek Damian L. (Vestal NY), Computer system which supports asynchronous commitment of data.
Prahlad, Anand; Muller, Marcus S.; Kottomtharayil, Rajiv; Kavuri, Srinivas; Gokhale, Parag; Vijayan, Manoj, Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites.
Dunphy William E. (Westminster CO) Halladay Steven M. (Louisville CO) Moy Michael E. (Lafayette CO) Munro Frederick G. (Broomfield CO), Data storage and protection system.
Yanai Moshe (Framingham MA) Vishlitzky Natan (Brookline MA) Alterescu Bruno (Newton MA) Castel Daniel (Framingham MA) Shklarsky Gadi (Brookline MA), Data storage system controlled remote data mirroring with respectively maintained data indices.
Fortier Richard W. (Acton MA) Mastors Robert M. (Ayer MA) Taylor Tracy M. (Upton MA) Wallace John J. (Franklin MA), Digital data processor with improved backup storage.
Kenley Gregory (Northboro MA) Ericson George (Schrewsbury MA) Fortier Richard (Acton MA) Holland Chuck (Northboro MA) Mastors Robert (Ayer MA) Pownell James (Natick MA) Taylor Tracy (Upton MA) Wallac, Digital data storage system with improved data migration.
Xu Yikang ; Vahalia Uresh K. ; Jiang Xiaoye ; Gupta Uday ; Tzelnic Percy, File server system using file system storage, data movers, and an exchange of meta data among data movers for file locking and direct access to shared file systems.
Lagueux, Jr., Richard A.; Stave, Joel H.; Yeaman, John B.; Stevens, Brian E.; Higgins, Robert M.; Collins, James M., Graphical user interface for configuration of a storage system.
Urevig Paul D. ; Malnati James R. ; Ethen Donald J. ; Weber Herbert L., Grouping shared resources into one or more pools and automatically re-assigning shared resources from where they are not currently needed to where they are needed.
Prahlad, Anand; Kavuri, Srinivas; Madeira, Andre Duque; Lunde, Norman R.; Bunte, Alan G.; May, Andreas; Schwartz, Jeremy, Hierarchical systems and methods for providing a unified view of storage information.
Dechant Thomas E. (Bainbridge OH) Glaser Edward L. (Santa Monica CA) Pitt Paul E. (Santa Monica CA) Way Frederick (Cleveland Heights OH), Information storage and retrieval system.
Barney Rock D. ; Schwols Keith ; Nelson Ellen M., Integration of a database into file management software for protecting, tracking and retrieving data.
Oshinsky, David Alan; Ignatius, Paul; Prahlad, Anand; May, Andreas, Logical view and access to data managed by a modular data and storage management system.
Oshinsky, David Alan; Ignatius, Paul; Prahlad, Anand; May, Andreas, Logical view and access to data managed by a modular data and storage management system.
Ignatius, Paul; Theisen, Marjorie H.; Oshinsky, David Alan; Kavuri, Srinivas, Logical view and access to physical storage in modular data and storage management system.
Prahlad, Anand; De Meno, Randy; Schwartz, Jeremy A.; McGuigan, James J., Logical view with granular access to exchange data managed by a modular data and storage management system.
Prahlad, Anand; Meno, Randy De; Schwartz, Jeremy A.; McGuigan, James J., Logical view with granular access to exchange data managed by a modular data and storage management system.
Martin Charles W. (Richardson TX) Reid Fredrick S. (Plano TX) Forbus Gary L. (Dallas TX) Adams Steve M. (Plano TX) Shannon C. Patrick (Garland TX) Pirpich Eric A. (Garland TX), Mass data storage and retrieval system.
Kedem Nadav,ILX, Mass storage subsystem and backup arrangement for digital data processing system which permits information to be backed up while host computer(s) continue(s) operating in connection with information .
Long Robert M., Media element library with non-overlapping subset of media elements and non-overlapping subset of media element drives accessible to first host and unaccessible to second host.
Hori, Yoshihiro; Kanai, Yuichi; Ohno, Ryoji; Ohishi, Takeo; Tada, Kenichiro; Hirai, Tatsuya; Tsuru, Masafumi; Hasebe, Takayuki, Method and apparatus for encrypting data to be secured and inputting/outputting the same.
Kullick Steven E. ; Spirakis Charles S. ; Titus Diane J., Method and apparatus for transferring archival data among an arbitrarily large number of computer devices in a networked.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Kern Ronald M. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated backup copy ordering in a time zero backup copy session.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Micka William F. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated termination and resumption in a time zero backup copy process.
Walter A. Hubis ; William G. Deitz, Method and system for controlling access share storage devices in a network environment by configuring host-to-volume mapping data structures in the controller memory for granting and denying access .
Prahlad, Anand; Schwartz, Jeremy A.; Ngo, David; Brockway, Brian; Muller, Marcus S.; Gokhale, Parag; Kottomtharayil, Rajiv, Method and system for offline indexing of content and classifying stored data.
Aoyama Yuki,JPX ; Takahashi Toru,JPX ; Wakayama Satoshi,JPX, Method of and an apparatus for displaying version information and configuration information and a computer-readable recording medium on which a version and configuration information display program i.
Crescenti, John; Kavuri, Srinivas; Oshinsky, David Alan; Prahlad, Anand, Modular backup and retrieval system used in conjunction with a storage area network.
Pisello Thomas (De Bary FL) Crossmier David (Casselberry FL) Ashton Paul (Oviedo FL), Network management system having virtual catalog overview of files distributively stored across network domain.
Prahlad, Anand; Kottomtharayil, Rajiv; Kavuri, Srinivas; Gokhale, Parag; Vijayan, Manoj, Performing data storage operations in a cloud storage environment, including searching, encryption and indexing.
Crockett Robert N. (Tucson AZ) Kern Ronald M. (Tucson AZ) Micka William F. (Tucson AZ), Software directed microcode state save for distributed storage controller.
Retnamma, Manoj Vijayan; Amarendran, Arun; Kottomtharayil, Rajiv, System and method for combining data streams in pipelined storage operations in a storage network.
Vogl, Norbert George; Purdy, Geoffrey Hale; Flavin, Robert Alan; Feng, Yuan; Clarke, Jr., Edward Payson, System and method for dispatching and scheduling network transmissions with feedback.
Kottomtharayil, Rajiv; Gokhale, Parag; Prahlad, Anand; Vijayan Retnamma, Manoj Kumar; Ngo, David; Devassy, Varghese, System and method for dynamically performing storage operations in a computer network.
Kottomtharayil, Rajiv; Gokhale, Parag; Prahlad, Anand; Vijayan Retnamma, Manoj Kumar; Ngo, David; Devassy, Varghese, System and method for dynamically sharing media in a computer network.
Mutalik Madhav ; Senie Faith M., System and method for performing file-handling operations in a digital data processing system using an operating system-independent file map.
Kottomtharayil, Rajiv; Gokhale, Parag; Prahlad, Anand; Vijayan Retnamma, Manoj Kumar; Ngo, David; Devassy, Varghese, System and method for performing storage operations in a computer network.
Ignatius, Paul; Prahlad, Anand; Tyagarajan, Mahesh; Vijayan Retnamma, Manoj; Amarendran, Arun; Kottomtharayil, Rajiv, System and method for providing encryption in a storage network by storing a secured encryption key with encrypted archive data in an archive storage device.
Ignatius, Paul; Prahlad, Anand; Tyagarajan, Mahesh; Retnamma, Manoj Vijayan; Amarendran, Arun; Kottomtharayil, Rajiv, System and method for providing encryption in storage operations in a storage network, such as for use by application service providers that provide data storage services.
Ignatius, Paul; Prahlad, Anand; Tyagarajan, Mahesh; Vijayan Retnamma, Manoj; Amarendran, Arun; Kottomtharayil, Rajiv, System and method for providing encryption in storage operations in a storage network, such as for use by application service providers that provide data storage services.
Huai ReiJane (Old Brookville NY) Daly Robert (Ronkonkoma NY) Curti Walter (Dix Hills NY) Mohan Deepak (Huntington NY) Chueh James Kuang-Ru (Bayside NY) Louie Larry (Forest Hills NY), System and parallel streaming and data stripping to back-up a network.
Stoppani ; Jr. Peter (Woodinville WA), System for allocating storage spaces based upon required and optional service attributes having assigned piorities.
Capozzi ; Anthony J. ; Cordi ; Vincent A. ; Edson ; Bruce A., System for facilitating the copying back of data in disc and tape units of a memory hierarchial system.
Flynn Rex A. (Belmont MA) Anick Peter G. (Marlboro MA), System for reconstructing prior versions of indexes using records indicating changes between successive versions of the.
Saether Christian D. (Seattle WA) Stoppani ; Jr. Peter (Woodinville WA), System of device independent file directories using a tag between the directories and file descriptors that migrate with.
Kottomtharayil, Rajiv; Gokhale, Parag; Prahlad, Anand; Vijayan Retnamma, Manoj Kumar; Ngo, David; Devassy, Varghese, Systems and methods for performing storage operations in a computer network.
Kottomtharayil, Rajiv; Gokhale, Parag; Prahlad, Anand; Vijayan Retnamma, Manoj Kumar; Ngo, David; Devassy, Varghese, Systems and methods for sharing media in a computer network.
Kottomtharayil, Rajiv; Gokhale, Parag; Prahlad, Anand; Retnamma, Manoj Kumar Vijayan; Ngo, David; Devassy, Varghese, Systems and methods for sharing media in a computer network.
Prahlad, Anand; Schwartz, Jeremy Alan; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for using metadata to enhance data identification operations.
Prahlad, Anand; Schwartz, Jeremy Alan; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for using metadata to enhance data management operations.
Mourad, Magda M.; Munson, Jonathan P.; Nadeem, Tamer; Pacifici, Giovanni; Pistoia, Marco; Youssef, Alaa S., Transparent digital rights management for extendible content viewers.
Prahlad, Anand; Schwartz, Jeremy Alan; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for using metadata to enhance data identification operations.