Systems and methods for probabilistic data classification
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G06F-007/00
G06F-017/30
G06F-011/14
Application number
US-0968719
(2015-12-14)
Registration number
US-9740764
(2017-08-22)
Inventor / Address
Lunde, Norman R.
Applicant / Address
Commvault Systems, Inc.
Agent / Address
Knobbe, Martens, Olson & Bear, LLP
Citation information
Times cited: 0
Cited patents: 134
Abstract
A system for performing data classification operations. In one embodiment, the system comprises a file system configured to store a plurality of computer files and a scanning agent configured to traverse the file system and compile data regarding the attributes and content of the plurality of computer files. The system also comprises an index configured to store the data regarding attributes and content of the plurality of computer files and a file classifier configured to analyze the data regarding the attributes and content of the plurality of computer files and to classify the plurality of computer files into one or more categories based on the data regarding the attributes and content of the plurality of computer files. Results of the file classification operations can be used to set appropriate security permissions on files which include sensitive information or to control the way that a file is backed up or the schedule according to which it is archived.
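The scanning step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `IndexEntry` record, the `scan_directory` function, and the regular-expression tokenizer are all hypothetical names and choices made for illustration, and an in-memory list stands in for the separately stored index.

```python
import os
import re
import tempfile
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical record a scanning agent might compile for one file."""
    path: str
    name: str
    size: int
    modified: float
    keyword_counts: dict = field(default_factory=dict)


def scan_directory(root):
    """Walk `root` and compile attribute and keyword-frequency data per file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            stat = os.stat(full_path)
            # Assumption: files are text; undecodable bytes are ignored.
            with open(full_path, "r", encoding="utf-8", errors="ignore") as handle:
                words = re.findall(r"[a-z]+", handle.read().lower())
            entries.append(
                IndexEntry(
                    path=full_path,
                    name=filename,
                    size=stat.st_size,
                    modified=stat.st_mtime,
                    keyword_counts=dict(Counter(words)),
                )
            )
    return entries


def build_demo_index():
    """Scan a throwaway directory holding one sample file."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "memo.txt"), "w", encoding="utf-8") as handle:
            handle.write("Salary review: salary bands are confidential.")
        return scan_directory(root)
```

Calling `build_demo_index()` returns one entry whose `keyword_counts` holds a frequency count for each word in the sample file. A classifier could then work from these entries alone, without reopening the files.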
Representative claim
1. A computer system comprising: a file system configured to store electronic files; a plurality of file system scanning agents configured to access the electronic files, the file system scanning agents comprising computer hardware with one or more processors and configured to compile, based on the electronic files, index data usable for classifying the electronic files, and transmit the index data over a network to be stored in one or more indexes stored separately from the file system; and a file classification server configured to, without directly accessing the content of the electronic files, access the index data previously compiled by the plurality of file system scanning agents from the one or more indexes stored separately from the file system, and classify the electronic files based on the index data, wherein classifying the electronic files comprises assigning one or more labels to the electronic files based at least in part on a set of user-defined rules and the index data previously compiled by the plurality of file system scanning agents, and wherein the file classification server is further configured to determine a probability that one or more of the electronic files should be classified as members of a category, determine that the probability is within a threshold amount from a probability threshold for classifying the one or more of the electronic files as the members of the category, and mark the one or more of the electronic files as being questionable members of the category.
2. The system of claim 1, wherein the file system comprises one or more data storage devices coupled to a plurality of client computers via a Storage Area Network (SAN), a Network Attached Storage (NAS) unit, or some combination of the two.
3. The system of claim 1, wherein the index data comprises data indicating file size, name, path, type, or date of creation or modification of the electronic files.
4. The system of claim 1, wherein the index data comprises data indicating at least one classification category that the one or more of the electronic files have been identified as being the members of.
5. The system of claim 4, wherein the system is configured to alter security access restrictions of the one or more of the electronic files based upon the at least one classification category.
6. The system of claim 4, wherein the system is configured to alter a data backup schedule of the one or more of the electronic files based upon the at least one classification category.
7. The system of claim 4, wherein the system is configured to alter a data migration plan of the one or more of the electronic files based upon the at least one classification category.
8. The system of claim 1, wherein the one or more indexes include a file content index, the plurality of file system scanning agents configured to store, for each electronic file, a list of keywords in the electronic file and a frequency count for each keyword in the file content index.
9. The system of claim 1, wherein the content comprises a keyword present in an electronic file.
10. A method comprising: with a plurality of file system scanning agents, accessing electronic files stored in a file system; compiling index data usable for classifying the electronic files; and transmitting the index data over a network to be stored in one or more indexes stored separately from the file system; and with a file classification server separate from the plurality of file system scanning agents, classifying the electronic files without directly accessing the content of the electronic files by assigning one or more labels to the electronic files based at least in part on a set of user-defined rules and the index data previously compiled by the plurality of file system scanning agents; determining a probability that one or more of the electronic files should be classified as members of a category; determining that the probability is within a threshold amount from a probability threshold for classifying the one or more of the electronic files as the members of the category; and marking the one or more of the electronic files as being questionable members of the category.
11. The method of claim 10, wherein the index data comprises data indicating file size, name, path, type, or date of creation or modification of the electronic files.
12. The method of claim 10, wherein the index data comprises data indicating at least one classification category that the one or more of the electronic files have been identified as being the members of.
13. The system of claim 12, wherein the system is configured to alter security access restrictions of the one or more of the electronic files based upon the at least one classification category.
14. The system of claim 12, wherein the system is configured to alter a data backup schedule of the one or more of the electronic files based upon the at least one classification category.
15. The system of claim 12, wherein the system is configured to alter a data migration plan of the one or more of the electronic files based upon the at least one classification category.
16. The method of claim 10, wherein the one or more indexes include a file content index, the method further comprising, with the plurality of file system scanning agents, storing, for each electronic file, a list of keywords in the electronic file and a frequency count for each keyword in the file content index.
17. A computer system comprising: means for accessing electronic files stored in a file system; means for compiling index data usable for classifying the electronic files; means for transmitting the index data over a network to be stored in one or more indexes stored separately from the file system; means for classifying the electronic files without directly accessing the content of the electronic files by assigning one or more labels to the electronic files based at least in part on a set of user-defined rules and the index data previously compiled and transmitted to the one or more indexes; means for determining a probability that one or more of the electronic files should be classified as members of a category; means for determining that the probability is within a threshold amount from a probability threshold for classifying the one or more of the electronic files as the members of the category; and means for marking the one or more of the electronic files as being questionable members of the category.
18. The system of claim 17, wherein the means for classifying comprises a Naïve Bayes classifier.
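The probability steps recited in the claims (estimate a membership probability, compare it with a probability threshold, and mark near-threshold files as questionable) can be sketched as follows. This is one possible reading, not the patent's implementation. The function names, the two-class multinomial Naïve Bayes model with Laplace smoothing, the default `threshold` of 0.5, and the default `margin` of 0.1 are all assumptions. Treating probabilities on either side of the threshold as questionable is also an assumption.

```python
import math
from collections import Counter


def train(labeled_counts):
    """Fit a two-class model from (keyword_counts, is_member) pairs."""
    totals = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for counts, is_member in labeled_counts:
        totals[is_member].update(counts)
        docs[is_member] += 1
    vocabulary = set(totals[True]) | set(totals[False])
    return totals, docs, vocabulary


def membership_probability(model, counts):
    """Return P(member | keyword counts) under the smoothed Naive Bayes model."""
    totals, docs, vocabulary = model
    n_docs = docs[True] + docs[False]
    log_scores = {}
    for label in (True, False):
        # Laplace-smoothed class prior.
        score = math.log((docs[label] + 1) / (n_docs + 2))
        denominator = sum(totals[label].values()) + len(vocabulary)
        for word, count in counts.items():
            if word in vocabulary:  # keywords unseen in training are skipped
                score += count * math.log((totals[label][word] + 1) / denominator)
        log_scores[label] = score
    # Normalise in a numerically stable way.
    peak = max(log_scores.values())
    weights = {label: math.exp(score - peak) for label, score in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


def label_file(probability, threshold=0.5, margin=0.1):
    """Mark files whose probability lies within `margin` of `threshold`."""
    if abs(probability - threshold) <= margin:
        return "questionable"
    return "member" if probability > threshold else "non-member"
```

The classifier consumes only keyword counts of the kind a scanning agent would place in a content index, so it never needs to open the files themselves. Files labeled `questionable` could then be routed to a person for review instead of being acted on automatically.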
Patents cited by this patent (134)
Crouse Donald D. ; Coverston Harriet G. ; Cychosz Joseph M., Archiving file system for data servers in a distributed network environment.
Griffin David (Maynard MA) Campbell Jonathan (Acton MA) Reilly Michael (Sterling MA) Rosenbaum Richard (Pepperell MA), Arrangement with cooperating management server node and network service node.
Nakano Toshio (Odawara JPX) Nozawa Masafumi (Odawara JPX) Kurano Akira (Odawara JPX) Hisano Kiyoshi (Odawara JPX) Hoshino Masayuki (Odawara JPX), Backup control method and system in data processing system using identifiers for controlling block data transfer.
Kitajima Hiroyuki (Yokohama) Yamamoto Akira (Yokohama) Doi Takashi (Hadano) Nozawa Masafumi (Odawara JPX), Buffered peripheral system and method for backing up and retrieving data to and from backup memory device.
Cole Leo J. (Raleigh NC) Frantz Curtis J. (Durham NC) Lee Jeannette (Raleigh NC) Ordanic Zvonimir (Raleigh NC) Plank Larry K. (Rochester MN), Centralized management in a computer network.
Carpenter Kelly S. (Fremont CA) Dearing Gerard M. (San Jose CA) Nick Jeffrey M. (Fishkill NY) Strickland Jimmy P. (Saratoga CA) Swanson Michael D. (Poughkeepsie NY) Wilkinson Wendell W. (Hyde Park NY), Coherence controls for store-multiple shared data coordinated by cache directory entries in a shared electronic storage.
J. Paul Dourish ; John O. Lamping ; Thomas Rodden GB, Collaborative document management system with customizable filing structures that are mutually intelligible.
Eric C. Peters ; Stanley Rabinowitz ; Herbert R. Jacobs ; Richard Baker Gillett, Jr. ; Peter J. Fasciano, Computer system and process for transferring multiple high bandwidth streams of data between multiple storage units and multiple applications in a scalable and reliable manner.
Senator Steven T. ; Fuller Billy J., Computer system method and apparatus providing for various versions of a file without requiring data copy or log operati.
Fecteau Jean G. (Toronto NY CAX) Gdaniec Joseph M. (Vestal NY) Hennessy James P. (Endicott NY) MacDonald John F. (Vestal NY) Osisek Damian L. (Vestal NY), Computer system which supports asynchronous commitment of data.
Prahlad, Anand; Schwartz, Jeremy Alan; Ngo, David; Brockway, Brian; Muller, Marcus S., Data classification systems and methods for organizing a metabase.
Koseki, Michihiko; Yokoyama, Mamoru; Sumi, Masashi; Yamaguchi, Satoru; Taniwaki, Sadayoshi; Hamanaka, Seishiro, Data processing system with mechanism for restoring file systems based on transaction logs.
Dunphy William E. (Westminster CO) Halladay Steven M. (Louisville CO) Moy Michael E. (Lafayette CO) Munro Frederick G. (Broomfield CO), Data storage and protection system.
Yanai Moshe (Framingham MA) Vishlitzky Natan (Brookline MA) Alterescu Bruno (Newton MA) Castel Daniel (Framingham MA) Shklarsky Gadi (Brookline MA), Data storage system controlled remote data mirroring with respectively maintained data indices.
Fortier Richard W. (Acton MA) Mastors Robert M. (Ayer MA) Taylor Tracy M. (Upton MA) Wallace John J. (Franklin MA), Digital data processor with improved backup storage.
Kenley Gregory (Northboro MA) Ericson George (Schrewsbury MA) Fortier Richard (Acton MA) Holland Chuck (Northboro MA) Mastors Robert (Ayer MA) Pownell James (Natick MA) Taylor Tracy (Upton MA) Wallac, Digital data storage system with improved data migration.
Xu Yikang ; Vahalia Uresh K. ; Jiang Xiaoye ; Gupta Uday ; Tzelnic Percy, File server system using file system storage, data movers, and an exchange of meta data among data movers for file locking and direct access to shared file systems.
Lagueux, Jr., Richard A.; Stave, Joel H.; Yeaman, John B.; Stevens, Brian E.; Higgins, Robert M.; Collins, James M., Graphical user interface for configuration of a storage system.
Urevig Paul D. ; Malnati James R. ; Ethen Donald J. ; Weber Herbert L., Grouping shared resources into one or more pools and automatically re-assigning shared resources from where they are not currently needed to where they are needed.
Leighton,F. Thomson; Lewin, legal representative,Anne E.; Lewin, deceased,Daniel M., HTML delivery from edge-of-network servers in a content delivery network (CDN).
Ito Hiromichi,JPX ; Arai Masato,JPX ; Nakata Yukio,JPX ; Ito Toshiya,JPX ; Mori Mitsuru,JPX, Information processing system enabling access to different types of files, control method for the same and storage mediu.
Barney Rock D. ; Schwols Keith ; Nelson Ellen M., Integration of a database into file management software for protecting, tracking and retrieving data.
Oshinsky, David Alan; Ignatius, Paul; Prahlad, Anand; May, Andreas, Logical view and access to data managed by a modular data and storage management system.
Ignatius, Paul; Theisen, Marjorie H.; Oshinsky, David Alan; Kavuri, Srinivas, Logical view and access to physical storage in modular data and storage management system.
Martin Charles W. (Richardson TX) Reid Fredrick S. (Plano TX) Forbus Gary L. (Dallas TX) Adams Steve M. (Plano TX) Shannon C. Patrick (Garland TX) Pirpich Eric A. (Garland TX), Mass data storage and retrieval system.
Kedem Nadav,ILX, Mass storage subsystem and backup arrangement for digital data processing system which permits information to be backed up while host computer(s) continue(s) operating in connection with information .
Long Robert M., Media element library with non-overlapping subset of media elements and non-overlapping subset of media element drives accessible to first host and unaccessible to second host.
Amundson Daniel L. ; Halley Donald Ray ; Koeller Paul Douglas ; Koser Leonard William ; Smith Lynda Marie, Method and apparatus for data backup and recovery.
Kullick Steven E. ; Spirakis Charles S. ; Titus Diane J., Method and apparatus for transferring archival data among an arbitrarily large number of computer devices in a networked.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Kern Ronald M. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated backup copy ordering in a time zero backup copy session.
Eastridge Lawrence E. (Tucson AZ) Kern Robert F. (Tucson AZ) Micka William F. (Tucson AZ) Mikkelsen Claus W. (Morgan Hill CA) Ratliff James M. (Tucson AZ), Method and system for automated termination and resumption in a time zero backup copy process.
Walter A. Hubis ; William G. Deitz, Method and system for controlling access share storage devices in a network environment by configuring host-to-volume mapping data structures in the controller memory for granting and denying access .
Aoyama Yuki,JPX ; Takahashi Toru,JPX ; Wakayama Satoshi,JPX, Method of and an apparatus for displaying version information and configuration information and a computer-readable recording medium on which a version and configuration information display program i.
Biettron, Laurent; Pallu, Frédéric; Tricot, Sylvie, Method of thematic classification of documents, themetic classification module, and search engine incorporating such a module.
Crescenti,John; Kavuri,Srinivas; Oshinsky,David Alan; Prahlad,Anand, Modular backup and retrieval system used in conjunction with a storage area network.
Pisello Thomas (De Bary FL) Crossmier David (Casselberry FL) Ashton Paul (Oviedo FL), Network management system having virtual catalog overview of files distributively stored across network domain.
Crockett Robert N. (Tucson AZ) Kern Ronald M. (Tucson AZ) Micka William F. (Tucson AZ), Software directed microcode state save for distributed storage controller.
Thomas Michael W. ; Allard James E. ; Howard Michael ; Chung Sophia ; Ferroni Cameron ; Henbenthal Douglas C. ; Ludeman John ; Stebbens Kim ; Sanders ; II Henry L. ; Treadwell ; III David R., System and method for administering a meta database as an integral component of an information server.
Kottomtharayil,Rajiv; Gokhale,Parag; Prahlad,Anand; Vijayan Retnamma,Manoj Kumar; Ngo,David; Devassy,Varghese, System and method for dynamically performing storage operations in a computer network.
Diaz Perez, Milton, System and method for managing, converting and displaying video content on a video-on-demand platform, including ads used for drill-down navigation and consumer-generated classified ads.
Richard J. Huebsch ; Robert J. Prieve ; Leonard Kampa, System and method for multiplexed data back-up to a storage tape and restore operations using client identification tags.
Mutalik Madhav ; Senie Faith M., System and method for performing file-handling operations in a digital data processing system using an operating system-independent file map.
Huai ReiJane (Old Brookville NY) Daly Robert (Ronkonkoma NY) Curti Walter (Dix Hills NY) Mohan Deepak (Huntington NY) Chueh James Kuang-Ru (Bayside NY) Louie Larry (Forest Hills NY), System and parallel streaming and data stripping to back-up a network.
Stoppani ; Jr. Peter (Woodinville WA), System for allocating storage spaces based upon required and optional service attributes having assigned piorities.
Flynn Rex A. (Belmont MA) Anick Peter G. (Marlboro MA), System for reconstructing prior versions of indexes using records indicating changes between successive versions of the.
Saether Christian D. (Seattle WA) Stoppani ; Jr. Peter (Woodinville WA), System of device independent file directories using a tag between the directories and file descriptors that migrate with.
Prahlad, Anand; Schwartz, Jeremy Alan; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for using metadata to enhance data management operations.
Prahlad, Anand; Schwartz, Jeremy Alan; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for using metadata to enhance storage operations.
Horvitz, Eric J.; Kadie, Carl M.; Ozer, Stuart; Wong, Curtis G., Training, inference and user interface for guiding the caching of media content on local stores.