최소 단어 이상 선택하여야 합니다.
최대 10 단어까지만 선택 가능합니다.
다음과 같은 기능을 한번의 로그인으로 사용 할 수 있습니다.
NTIS 바로가기다음과 같은 기능을 한번의 로그인으로 사용 할 수 있습니다.
DataON 바로가기다음과 같은 기능을 한번의 로그인으로 사용 할 수 있습니다.
Edison 바로가기다음과 같은 기능을 한번의 로그인으로 사용 할 수 있습니다.
Kafe 바로가기국가/구분 | United States(US) Patent 등록 |
---|---|
국제특허분류(IPC7판) |
|
출원번호 | US-0686338 (2012-11-27) |
등록번호 | US-9323794 (2016-04-26) |
발명자 / 주소 |
|
출원인 / 주소 |
|
대리인 / 주소 |
|
인용정보 | 피인용 횟수 : 0 인용 특허 : 313 |
Disclosed herein is a method and system for accelerating the generation of pattern indexes. In exemplary embodiments, regular expression pattern matching can be performed at high speeds on data to determine whether a pattern is present in the data. Pattern indexes can then be built based on the resu
Disclosed herein is a method and system for accelerating the generation of pattern indexes. In exemplary embodiments, regular expression pattern matching can be performed at high speeds on data to determine whether a pattern is present in the data. Pattern indexes can then be built based on the results of such regular expression pattern matching. Reconfigurable logic such a field programmable gate arrays (FPGAs) can be used to hardware accelerate these operations.
1. An apparatus comprising: a field programmable gate array (FPGA); and a bus through which streaming data is delivered to the FPGA at a bus bandwidth rate;the FPGA having firmware logic deployed thereon, the firmware logic configured to(1) receive the streaming data,(2) perform regular expression p
1. An apparatus comprising: a field programmable gate array (FPGA); and a bus through which streaming data is delivered to the FPGA at a bus bandwidth rate;the FPGA having firmware logic deployed thereon, the firmware logic configured to(1) receive the streaming data,(2) perform regular expression pattern matching on the streaming data with respect to a plurality of patterns to detect whether any pattern matches exist within the streaming data,and (3) concurrently build, at the bus bandwidth rate, a plurality of pattern indexes for the streaming data based on detected pattern matches, wherein each pattern index corresponds to a pattern against which regular expression pattern matching was performed and comprises location data for detected pattern matches. 2. The apparatus of claim 1 wherein the firmware logic is further configured to perform the regular expression pattern matching on the streaming data at hardware speeds. 3. The apparatus of claim 1 wherein the bus is a PCI-Express bus. 4. The apparatus of claim 1 further comprising a network interface, the network interface configured to receive the streaming data from a network and deliver the streaming data to the FPGA through the bus at the bus bandwidth rate. 5. The apparatus of claim 4 further comprising a processor, the processor configured to manage a flow of the streaming data to the FPGA via the network interface and the bus. 6. The apparatus of claim 1 further comprising a disk controller, the disk configured to receive the streaming data from a disk and deliver the streaming data to the FPGA through the bus at the bus bandwidth rate. 7. The apparatus of claim 6 further comprising a processor, the processor configured to manage a flow of the streaming data to the FPGA via the network interface and the bus. 8. The apparatus of claim 1 wherein the firmware logic comprises a plurality of regular expression pattern matching modules in parallel, each regular expression pattern matching module being keyed with a pattern; each regular expression pattern matching module being configured to (1) receive the streaming data, (2) perform a regular expression pattern matching operation on the streaming data to detect any pattern matches that exist within the streaming data with respect to its keyed pattern, and (3) operate simultaneously in parallel with the other regular expression pattern matching modules. 9. The apparatus of claim 8 wherein each regular expression pattern matching module is keyed with a different pattern. 10. The apparatus of claim 8 wherein the streaming data comprises a plurality of unstructured data objects; wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the data stream for which a pattern match was detected, wherein each pointer identifies where at least one data object related to the term associated with that pointer can be located in a computer system;wherein the firmware logic is further configured to (1) pre-process the streaming data by (i) parsing the data objects into a plurality of words, (ii) generating a data object identifier that corresponds to each data object, wherein each data object identifier is indicative of its corresponding data object's position within the streaming data, and (iii) generating a position identifier that corresponds to each parsed word, wherein each position identifier is indicative of its corresponding word's position within the streaming data, and (2) build the pattern indexes by generating the pointers for the pattern indexes based at least in part on the generated data object identifiers and the generated position identifiers for the detected pattern matches. 11. The apparatus of claim 10 wherein the firmware logic is further configured to build a general index from the pre-processed streaming data using the generated data object identifiers and the generated position identifiers such that (1) the general index's terms comprise the words within the pre-processed streaming data and (2) the terms' associated pointers comprise the data object identifiers and position identifiers corresponding to those words. 12. The apparatus of claim 8 wherein the firmware logic further comprises an exact matching module in parallel with the parallel regular expression pattern matching modules, the exact matching module being keyed with a dictionary, the dictionary comprising a plurality of words; wherein the exact matching module is configured to perform an exact matching operation on the streaming data to thereby detect any exact matches that exist within the streaming data with respect to any of the dictionary words; andwherein the firmware logic is further configured to build a dictionary index for the streaming data based on the detected exact matches. 13. The apparatus of claim 12 wherein the exact matching module and the parallel regular expression pattern matching modules are configured to operate on the streaming data at hardware speeds. 14. The apparatus of claim 8 wherein the firmware logic further comprises an approximate matching module in parallel with the parallel regular expression pattern matching modules, the approximate matching module being keyed with a dictionary, the dictionary comprising a plurality of words; wherein the approximate matching module is configured to perform an approximate matching operation on the streaming data to thereby detect any approximate matches that exist within the streaming data with respect to any of the dictionary words; andwherein the firmware logic is further configured to build a dictionary index for the streaming data based on the detected approximate matches. 15. The apparatus of claim 14 wherein the approximate matching module and the parallel regular expression pattern matching modules are configured to operate on the streaming data at hardware speeds. 16. The apparatus of claim 8 wherein the streaming data comprises a plurality of data objects, and wherein the firmware logic is further configured to, in response to a detection from by a regular expression pattern matching operation that data within a data object is a pattern match with respect to a pattern, tag that data object as belonging to a class. 17. The apparatus of claim 16 wherein the firmware logic is further configured to build a class index based on the tagged data objects, the class index being associated with the class and configured to identify the data objects that belong to the associated class. 18. The apparatus of claim 1 wherein at least one of the patterns comprises a credit card number pattern. 19. The apparatus of claim 1 wherein at least one of the patterns comprises a social security number pattern. 20. The apparatus of claim 1 wherein at least one of the patterns comprises an email address pattern. 21. The apparatus of claim 1 wherein at least one of the patterns comprises a telephone number pattern. 22. The apparatus of claim 1 wherein at least one of the patterns comprises an Internet uniform resource locator (URL) pattern. 23. The apparatus of claim 1 wherein the streaming data comprises a plurality of unstructured data objects; wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the streaming data for which a pattern match was detected, wherein each pointer identifies where at least one data object related to the term associated with that pointer can be located in a computer system;wherein the firmware logic is further configured to (1) pre-process the streaming data by (i) parsing the data objects into a plurality of words, (ii) generating a data object identifier that corresponds to each data object, wherein each data object identifier is indicative of its corresponding data object's position within the streaming data, and (iii) generating a position identifier that corresponds to each parsed word, wherein each position identifier is indicative of its corresponding word's position within the streaming data, and (2) build the pattern indexes by generating the pointers for the pattern indexes based at least in part on the generated data object identifiers and the generated position identifiers for the detected pattern matches. 24. A method comprising: delivering streaming data to a field programmable gate array (FPGA) via a bus at a bus bandwidth rate;the FPGA receiving the streaming data, the FPGA having firmware logic deployed thereon;the firmware logic performing regular expression pattern matching on the streaming data with respect to a plurality of patterns to detect whether any pattern matches exist within the streaming data; andthe firmware logic concurrently building, at the bus bandwidth rate as the streaming data continues to be delivered to the FPGA, a plurality of pattern indexes for the streaming data based on detected pattern matches, wherein each pattern index corresponds to a pattern against which regular expression pattern matching was performed and comprises location data for detected pattern matches. 25. The method of claim 24 wherein the performing step comprises the firmware logic performing the regular expression pattern matching on the streaming data at hardware speeds as the streaming data continues to be received by the FPGA. 26. The method of claim 24 wherein the bus is a PCI-Express bus. 27. The method of claim 26 wherein the delivering step comprises delivering the streaming data to the FPGA via the PCI-Express bus from a network source. 28. The method of claim 26 wherein the delivering step comprises delivering the streaming data to the FPGA via the PCI-Express bus from a disk source. 29. The method of claim 24 wherein at least one of the patterns comprises a credit card number pattern. 30. The method of claim 24 wherein at least one of the patterns comprises a social security number pattern. 31. The method of claim 24 wherein at least one of the patterns comprises an email address pattern. 32. The method of claim 24 wherein at least one of the patterns comprises a telephone number pattern. 33. The method of claim 24 wherein at least one of the patterns comprises an Internet uniform resource locator (URL) pattern. 34. The method of claim 24 wherein the streaming data comprises a plurality of web pages, and wherein each pattern index comprises a plurality of pointers to the web pages which contain the pattern corresponding to that pattern index. 35. The method of claim 34 further comprising: an Internet search engine receiving a pattern query; andthe Internet search engine looking up the web pages that are responsive to the pattern query based on at least one of the pattern indexes; andthe Internet search engine providing search results for the pattern query in response to the looking up step. 36. The method of claim 35 further comprising merging the pattern indexes into a plurality of operational pattern indexes, and wherein the looking up step is performed on the operational indexes. 37. The method of claim 35 wherein the performing step comprises the firmware logic performing the regular expression pattern matching on the streaming data at hardware speeds as the streaming data continues to be received by the FPGA. 38. The method of claim 24 wherein the firmware logic comprises a plurality of regular expression pattern matching modules in parallel, each regular expression pattern matching module being keyed with a pattern, and wherein the performing step comprises: each regular expression pattern matching module (1) receiving the streaming data, (2) performing a regular expression pattern matching operation on the streaming data to detect any pattern matches that exist within the streaming data with respect to its keyed pattern, and (3) operating simultaneously in parallel with the other regular expression pattern matching modules. 39. The method of claim 38 wherein each regular expression pattern matching module is keyed with a different pattern. 40. The method of claim 38 wherein the streaming data comprises a plurality of unstructured data objects, wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the data stream for which a pattern match was detected, wherein each pointer identifies where at least one data object related to the term associated with that pointer can be located in a computer system, the method further comprising: the firmware logic pre-processing the streaming data by (i) parsing the data objects into a plurality of words, (ii) generating a data object identifier that corresponds to each data object, wherein each data object identifier is indicative of its corresponding data object's position within the streaming data, and (iii) generating a position identifier that corresponds to each parsed word, wherein each position identifier is indicative of its corresponding word's position within the streaming data; andwherein the building step comprises the firmware logic building the pattern indexes by generating the pointers for the pattern indexes based at least in part on the generated data object identifiers and the generated position identifiers for the detected pattern matches. 41. The method of claim 40 further comprising the firmware logic building a general index from the pre-processed streaming data using the generated data object identifiers and the generated position identifiers such that (1) the general index's terms comprise the words within the pre-processed streaming data and (2) the terms' associated pointers comprise the data object identifiers and position identifiers corresponding to those words. 42. The method of claim 38 wherein the firmware logic further comprises an exact matching module in parallel with the parallel regular expression pattern matching modules, the exact matching module being keyed with a dictionary, the dictionary comprising a plurality of words, the method further comprising: the exact matching module performing an exact matching operation on the streaming data to thereby detect any exact matches that exist within the streaming data with respect to any of the dictionary words; andthe firmware logic building a dictionary index for the streaming data based on the detected exact matches. 43. The method of claim 42 wherein the exact matching module and the parallel regular expression pattern matching modules are operating at hardware speeds on the streaming data as the streaming data continues to be received by the FPGA. 44. The method of claim 38 wherein the firmware logic further comprises an approximate matching module in parallel with the parallel regular expression pattern matching modules, the approximate matching module being keyed with a dictionary, the dictionary comprising a plurality of words, the method further comprising: the approximate matching module performing an approximate matching operation on the streaming data to thereby detect any approximate matches that exist within the streaming data with respect to any of the dictionary words; andthe firmware logic is further building a dictionary index for the streaming data based on the detected approximate matches. 45. The method of claim 44 wherein the approximate matching module and the parallel regular expression pattern matching modules are operating at hardware speeds on the streaming data as the streaming data continues to be received by the FPGA. 46. The method of claim 38 wherein the streaming data comprises a plurality of data objects, the method further comprising: the firmware logic, in response to the regular expression pattern matching operation detecting that data within a data object is a pattern match with respect to a pattern, tagging that data object as belonging to a class. 47. The method of claim 46 further comprising: the firmware logic building a class index based on the tagged data objects, the class index being associated with the class and configured to identify the data objects that belong to the associated class. 48. The method of claim 24 wherein the streaming data comprises a plurality of unstructured data objects, wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the streaming data for which a pattern match was detected, wherein each pointer identifies where at least one data object related to the term associated with that pointer can be located in a computer system, the method further comprising: the firmware logic pre-processing the streaming data by (i) parsing the data objects into a plurality of words, (ii) generating a data object identifier that corresponds to each data object, wherein each data object identifier is indicative of its corresponding data object's position within the streaming data, and (iii) generating a position identifier that corresponds to each parsed word, wherein each position identifier is indicative of its corresponding word's position within the streaming data; andwherein the building step comprises the firmware logic building the pattern indexes by generating the pointers for the pattern indexes based at least in part on the generated data object identifiers and the generated position identifiers for the detected pattern matches. 49. A method comprising: delivering streaming data to a field programmable gate array (FPGA) via a bus at a bus bandwidth rate, wherein the streaming data comprises a plurality of web pages;the FPGA receiving the streaming data, the FPGA having firmware logic deployed thereon;the firmware logic performing regular expression pattern matching on the streaming data with respect to a plurality of patterns to detect whether any pattern matches exist within the streaming data;the firmware logic concurrently building, at the bus bandwidth rate as the streaming data continues to be delivered to the FPGA, a plurality of pattern indexes for the streaming data based on detected pattern matches, wherein each pattern index corresponds to a pattern against which regular expression pattern matching was performed and comprises a plurality of pointers to the web pages which contain the pattern corresponding to that pattern index;an Internet search engine receiving a pattern query;the Internet search engine looking UP the web pages that are responsive to the pattern query based on at least one of the pattern indexes; andthe Internet search engine providing search results for the pattern query in response to the looking UP step. 50. The method of claim 49 wherein the bus is a PCI-Express bus. 51. An apparatus comprising: a processor;a coprocessor; anda bus over which streaming data flows to the coprocessor at a bus bandwidth rate;wherein the processor is configured to manage a flow of the streaming data to the coprocessor;wherein the coprocessor comprises a plurality of regular expression pattern matching modules in parallel, each regular expression pattern matching module being keyed with a pattern;each regular expression pattern matching module being configured to(1) receive the streaming data,(2) perform a regular expression pattern matching operation on the streaming data to detect any pattern matches that exist within the streaming data with respect to its keyed pattern, and(3) operate simultaneously in parallel with the other regular expression pattern matching modules; andwherein the coprocessor is further configured to concurrently build, at the bus bandwidth rate, a plurality of pattern indexes for the streaming data based on detected pattern matches, wherein each pattern index corresponds to a pattern against which regular expression pattern matching was performed and comprises location data for detected pattern matches. 52. The apparatus of claim 51 wherein the coprocessor comprises a reconfigurable logic device, the regular expression pattern matching modules being implemented by the reconfigurable logic device. 53. The apparatus of claim 51 wherein the coprocessor comprises a graphics processor unit (GPU), the regular expression pattern matching modules being implemented by the GPU. 54. The apparatus of claim 51 wherein the coprocessor comprises a chip multi-processor (CMP), the regular expression pattern matching modules being implemented by the CMP. 55. The apparatus of claim 51 wherein the streaming data comprises a plurality of unstructured data objects; wherein the pattern indexes comprise a plurality of terms and a plurality of pointers associated with the terms, the terms corresponding to portions of the streaming data for which a pattern match was detected, wherein each pointer identifies where at least one data object related to the term associated with that pointer can be located in a computer system;wherein the coprocessor is further configured to (1) pre-process the streaming data by (i) parsing the data objects into a plurality of words, (ii) generating a data object identifier that corresponds to each data object, wherein each data object identifier is indicative of its corresponding data object's position within the streaming data, and (iii) generating a position identifier that corresponds to each parsed word, wherein each position identifier is indicative of its corresponding word's position within the streaming data, and (2) build the pattern indexes by generating the pointers for the pattern indexes based at least in part on the generated data object identifiers and the generated position identifiers for the detected pattern matches.
Copyright KISTI. All Rights Reserved.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.