System and method for keyword spotting using representative dictionary
IPC classification information
Country/Type: United States (US) Patent
Status: Granted
International Patent Classification (IPC, 7th edition): G06F-017/27, G06F-017/30, G06F-021/55
Application number: US-0704702 (2017-09-14)
Patent number: US-10198427 (2019-02-05)
Priority: IL-224482 (2013-01-29)
Inventor: Yishay, Yitshak
Applicant: VERINT SYSTEMS LTD.
Agent: Meunier Carlin & Curfman LLC
Citation information
Times cited: 0
Patents cited: 69
Abstract
Methods and systems for keyword spotting, i.e., for identifying textual phrases of interest in input data. In the embodiments described herein, the input data comprises communication packets exchanged in a communication network. The disclosed keyword spotting techniques can be used, for example, in applications such as Data Leakage Prevention (DLP), Intrusion Detection Systems (IDS) or Intrusion Prevention Systems (IPS), and spam e-mail detection. A keyword spotting system holds a dictionary of textual phrases for searching input data. In a communication analytics system, for example, the dictionary defines textual phrases to be located in communication packets—such as e-mail addresses or Uniform Resource Locators (URLs).
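As a point of reference for the dictionary search described above, a naive keyword spotter scans each packet payload for every phrase in the dictionary. The sketch below is a hypothetical illustration only: the function name `spot_keywords` and the sample payloads are invented, and packet reassembly is ignored. Every lookup here touches the full dictionary, which is what the cache-resident representative dictionary in the claims is meant to avoid.

```python
def spot_keywords(packets, dictionary):
    """Return (packet index, byte offset, phrase) for every dictionary hit.

    Hypothetical baseline: every phrase is searched in every payload.
    """
    hits = []
    for pkt_idx, payload in enumerate(packets):
        for phrase in dictionary:
            start = payload.find(phrase)
            while start != -1:  # report every occurrence, not just the first
                hits.append((pkt_idx, start, phrase))
                start = payload.find(phrase, start + 1)
    return hits


# Invented sample data: an e-mail address and a URL host, as in the abstract.
packets = [b"To: alice@example.com", b"GET http://bad.example/ HTTP/1.1"]
dictionary = [b"alice@example.com", b"bad.example"]
print(spot_keywords(packets, dictionary))
# [(0, 4, b'alice@example.com'), (1, 11, b'bad.example')]
```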
Representative claims
1. A method for searching input data for textual phrases, the method comprising: providing a system having an external memory containing a first dictionary of first textual phrases and a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases; receiving input data using the system; searching the input data with the second dictionary; in response to identifying in the input data a second textual phrase from the second dictionary, locating in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase; and using the located first textual phrase to perform one of data leakage prevention, intrusion detection, intrusion prevention, spam e-mail detection, or detection of inappropriate content.
2. The method according to claim 1, wherein each first textual phrase in the first dictionary corresponds to at least one of the second textual phrases in the second dictionary.
3. The method according to claim 1, wherein the first textual phrases are strings of characters that include wildcard characters.
4. The method according to claim 3, wherein a string of characters corresponds to a data communication packet.
5. The method according to claim 4, wherein the second dictionary comprises rectangles, wherein each rectangle comprises a list of sub-strings.
6. The method according to claim 5, wherein each sub-string in a rectangle has the same number of characters.
7. The method according to claim 1, wherein a plurality of first textual phrases in the first dictionary correspond to a single second textual phrase in the second dictionary.
8. The method according to claim 1, wherein the first textual phrases include commonly found sub-strings that are common to a majority of the first textual phrases, and wherein the second textual phrases do not include the commonly found sub-strings.
9. The method according to claim 1, wherein the cache memory is large enough to contain the second dictionary but is too small to contain the first dictionary.
10. A system for searching input data for textual phrases, the system comprising: an external memory containing a first dictionary of first textual phrases; a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases; a network interface card (NIC) that receives input data from a network; and a processor that is communicatively coupled to the external memory, the cache memory, and the NIC, wherein the processor is configured by software to: receive the input data from the NIC, search the input data with the second dictionary, in response to identifying in the input data a second textual phrase from the second dictionary, locating in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase, and using the located first textual phrase to perform one of data leakage prevention, intrusion detection, intrusion prevention, spam e-mail detection, or detection of inappropriate content.
11. The system according to claim 10, wherein the textual phrases comprise e-mail addresses and/or uniform resource locators (URLs).
12. The system according to claim 10, wherein each first textual phrase in the first dictionary corresponds to at least one of the second textual phrases in the second dictionary.
13. The system according to claim 10, wherein the first textual phrases are strings of characters that include wildcard characters.
14. The system according to claim 13, wherein a string of characters corresponds to a data communication packet.
15. The system according to claim 14, wherein the second dictionary comprises rectangles, wherein each rectangle comprises a list of sub-strings.
16. The system according to claim 15, wherein each sub-string in a rectangle has the same number of characters.
17. The system according to claim 10, wherein a plurality of first textual phrases in the first dictionary correspond to a single second textual phrase in the second dictionary.
18. The system according to claim 10, wherein the first textual phrases include commonly found sub-strings that are common to a majority of the first textual phrases, and wherein the second textual phrases do not include the commonly found sub-strings.
19. The system according to claim 10, wherein the cache memory is large enough to contain the second dictionary but is too small to contain the first dictionary.
20. The system according to claim 10, wherein the cache memory is a level-two (L2) cache of the processor.
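The two-tier lookup in claims 1 and 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The fixed key length `KEY_LEN`, the sample `COMMON` sub-strings, the choice of exactly one key per phrase, and the use of `?` as a single-character wildcard are all hypothetical. Each first-dictionary phrase is represented by one fixed-length sub-string that avoids wildcards and commonly found sub-strings (compare claims 3, 6 and 8). Phrases that share a key map to the same entry (claim 7). A key hit triggers verification of the full phrase at the implied offset. Python dictionaries do not model the cache versus external-memory split (claims 9 and 20), so the size benefit is only suggested by the short keys.

```python
from collections import defaultdict

# Hypothetical parameters; the patent text above does not fix these values.
KEY_LEN = 4                # every representative sub-string has this length
COMMON = ("www.", ".com")  # sub-strings assumed common to most phrases
WILDCARD = "?"             # assumed single-character wildcard


def _common_mask(phrase):
    """Flag character positions covered by any commonly found sub-string."""
    mask = [False] * len(phrase)
    for sub in COMMON:
        pos = phrase.find(sub)
        while pos != -1:
            for k in range(pos, pos + len(sub)):
                mask[k] = True
            pos = phrase.find(sub, pos + 1)
    return mask


def build_second_dictionary(first_dictionary):
    """Map each KEY_LEN-character key to the (phrase, offset) pairs it represents."""
    second = defaultdict(list)
    for phrase in first_dictionary:
        mask = _common_mask(phrase)
        for off in range(len(phrase) - KEY_LEN + 1):
            window = phrase[off:off + KEY_LEN]
            # Skip windows containing a wildcard or any common sub-string character.
            if WILDCARD not in window and not any(mask[off:off + KEY_LEN]):
                second[window].append((phrase, off))
                break
        else:
            raise ValueError(f"no usable key in {phrase!r}")
    return dict(second)


def _matches_at(phrase, data, start):
    """Check the full phrase against data at start, honoring the wildcard."""
    if start < 0 or start + len(phrase) > len(data):
        return False
    return all(p == WILDCARD or p == d
               for p, d in zip(phrase, data[start:start + len(phrase)]))


def spot(data, second):
    """Scan with the small dictionary; verify full phrases only on a key hit."""
    hits = []
    for i in range(len(data) - KEY_LEN + 1):
        for phrase, off in second.get(data[i:i + KEY_LEN], ()):
            if _matches_at(phrase, data, i - off):
                hits.append((i - off, phrase))
    return hits


first = ["www.example.com", "alice@corp.com", "alice@corp.org", "bob?@corp.com"]
second = build_second_dictionary(first)
print(sorted(second))  # equal-length keys: ['@cor', 'alic', 'exam']
print(spot("GET http://www.example.com/ from alice@corp.org and bobs@corp.com", second))
# [(11, 'www.example.com'), (33, 'alice@corp.org'), (52, 'bob?@corp.com')]
```

Because every key has the same length, `sorted(second)` plays the role of one "rectangle" in the sense of claims 5 and 6. That reading is an interpretation, not a definition taken from the patent. The key `alic` represents two different first-dictionary phrases, so only the full-phrase check tells them apart.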
Patents cited by this patent (69)
Baarman David W. (Zeeland MI) Richards David M. (Littleton CO 4), Apparatus and method for effecting data compression.
Bass Vance R. (Austin TX) Bonebrake Veronica A. (Leander TX) Garrison David A. (Austin TX) Landis James K. (Austin TX) Neff Mary S. (Montrose NY) Urquhart Robert J. (Austin ; both of TX) Williams Sus, Compound word spelling verification.
Bass Vance R. (Austin TX) Bonebrake Veronica A. (Leander TX) Garrison David A. (Austin TX) Landis James K. (Austin TX) Neff Mary S. (Montrose NY) Urquhart Robert J. (Austin TX) Williams Susan C. (Aus, Compound word suitability for spelling verification.
Cerna, Michael D.; Nagle, James C.; Ruan, Qing; Schmidt, Darren R.; Wenzel, Lothar, Identifying randomly distributed microparticles in images to sequence a polynucleotide.
Carus Alwin B. ; Good Kathleen, Method and apparatus for automatic identification of word boundaries in continuous text and computation of word boundary scores.
Chiang, Hui-Hwa; Lee, Kuo-Chun; Chen, Tsung-Yen (Eric); Han, Ching-Chih (Jason), Method and apparatus for automatically recording snapshots of a computer screen during a computer session for later playback.
Zolotov, Moshe, Method and system for creating real time integrated Call Details Record (CDR) databases in management systems of telecommunication networks.
Sykes, Mark; Baldock, George Ronald, Method for converting speech to text, performing natural language processing on the text output, extracting data values and matching to an electronic ticket form.
Potter Terry W. (Acton MA) Worrell Glen C. (Auburn MA), Parallel associative memory having improved selection and decision mechanisms for recognizing and sorting relevant patte.
Hermansen, John Christian; Shaefer, Jr., Leonard Arthur; McCallum-Bayliss, Heather; Lutz, Richard D., System and method for adaptive multi-cultural searching and matching of personal names.
Honig,Andrew; Howard,Andrew; Eskin,Eleazar; Stolfo,Salvatore J., System and methods for adaptive model generation for detecting intrusions in computer systems.