Device and method for full-text large-dictionary string matching using n-gram hashing
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-017/21
G06F-017/20
출원번호
US-0247318
(1999-02-10)
발명자
/ 주소
Cohen Jonathan Drew
출원인 / 주소
The United States of America as represented by the Director of the National Security Agency
대리인 / 주소
Bloor
인용정보
피인용 횟수 :
126인용 특허 :
10
초록▼
A method and apparatus providing full-text scanning for matches in a large dictionary is described. The invention is suitable for SDI (selective dissemination of information) systems, accommodating large dictionaries (10.sup.4 to 10.sup.5 entries) and rapid processing. A preferred embodiment employs
A method and apparatus providing full-text scanning for matches in a large dictionary is described. The invention is suitable for SDI (selective dissemination of information) systems, accommodating large dictionaries (10.sup.4 to 10.sup.5 entries) and rapid processing. A preferred embodiment employs a hardware primary test on a single commercially-available gate-array board hosted by a computer, in which a software secondary test is conducted. No delimiter cues such as spaces or punctuation are required.
대표청구항▼
[ I claim:] [1.] A dictionary string matching method for locating all matches of a keyword dictionary in a sample byte stream, said keyword dictionary consisting of d keywords, each of said keywords being composed of a sequence of bytes of a general nature, comprising the steps of:(a) initializing,
[ I claim:] [1.] A dictionary string matching method for locating all matches of a keyword dictionary in a sample byte stream, said keyword dictionary consisting of d keywords, each of said keywords being composed of a sequence of bytes of a general nature, comprising the steps of:(a) initializing, comprising the steps of building and clearing m hash tables, said m hash tables being composed of presence values, said presence values being binary entries of logical value, said clearing consisting of the setting the value of each of said binary entries to false;(b) thereafter, for each distinct length of keywords in the dictionary, choosing m n-gram selection positions;(c) thereafter, for each integer i taking values of 1 through d, inclusive, performing the steps of:(i) determining the length of the ith of the keywords;(ii) extracting m keyword n-grams from the ith keyword beginning at said m n-gram selection positions in the keyword, respectively;(iii) hashing each of said keyword n-grams, producing m keyword hash addresses, each of said keyword hash addresses corresponding to one of said keyword n-grams; said hashing being performed using m hashing functions; and(iv) recording, comprising the step of posting said m keyword n-grams in said m hash tables, performed by registering each of said m keyword n-grams in the corresponding one of said m hash tables, said registering within one of said m hash tables being performed by setting one of said presence values therein to the value of true, at the address identified by the corresponding one of said m keyword hash addresses;(d) thereafter, creating m presence values streams by performing, for each position in the sample byte stream, n, n+1, n+2, . . . , in order, the steps of:(i) extracting from the sample byte stream the sample n-gram consisting of the n consecutive bytes terminating in the byte at said position;(ii) computing m sample hash addresses for said sample n-gram, the jth of said sample hash addresses calculated by applying to said sample n-gram the jth of said m hashing functions;(iii) reading m sample presence values from said m hash tables, the jth of said m sample presence values obtained by reading the value from the jth of said m hash tables at the address identified by the jth of said m sample hash addresses; and(iv) appending said m sample presence values to said m presence values streams by appending the jth of said m sample presence values to the jth of said m presence values streams;(e) generating primary alarms, by performing, for each distinct length of the keywords in the dictionary, the steps of:(i) calculating m delaying amounts, the jth of said m delaying amounts being additively complementary to the jth of said m n-gram selection positions for said distinct length of the keywords in the dictionary;(i) forming m delayed presence values streams according to said m delaying amounts, the jth of said m delayed presence values streams being produced by delaying the jth of said m presence values streams by an amount equal to the jth of said m delaying amounts; and(ii) applying an alarm sensing method to said m delayed presence values streams, producing one of said primary alarms only when said alarm sensing method senses the coincidence of true values in each of said m delayed presence values streams; and(f) for each of said primary alarms, applying a secondary testing method to verify said primary alarm, resulting in a match report if verification holds.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (10)
Mayers Clay (San Diego CA) Whiting Douglas L. (Carlsbad CA), Data compression apparatus and method using matching string searching and Huffman encoding.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Adding information or functionality to a rendered document via association with an electronic counterpart.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Aggregate analysis of text captures performed by multiple users from rendered documents.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Automatically capturing information, such as capturing information using a document-aware device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J.; Daley-Watson, Christopher J., Automatically providing content associated with captured information, such as information captured in real-time.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplement information.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford Fraser, James Q., Capturing text from rendered documents using supplemental information.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplemental information.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device.
Milliken, Walter Clark; Strayer, William Timothy; Milligan, Stephen Douglas; Sanchez, Luis; Partridge, Craig, Hash-based systems and methods for detecting and preventing transmission of polymorphic network worms and viruses.
Milliken, Walter Clark; Strayer, William Timothy; Milligan, Stephen Douglas; Sanchez, Luis; Partridge, Craig, Hash-based systems and methods for detecting and preventing transmission of polymorphic network worms and viruses.
Milliken, Walter Clark; Strayer, William Timothy; Milligan, Stephen Douglas, Hash-based systems and methods for detecting and preventing transmission of unwanted e-mail.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
Parsons, Scott; Taylor, David E.; Schuehler, David Vincent; Franklin, Mark A.; Chamberlain, Roger D., High speed processing of financial information using FPGA devices.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Identifying a document by performing spectral analysis on the contents of the document.
Chamberlain, Roger D.; Brink, Benjamin M.; White, Jason R.; Franklin, Mark A.; Cytron, Ron K., Intelligent data storage and processing using FPGA devices.
Chamberlain, Roger D.; Franklin, Mark Allen; Indeck, Ronald S.; Cytron, Ron K.; Cholleti, Sharath R., Intelligent data storage and processing using FPGA devices.
Chamberlain, Roger D.; Franklin, Mark Allen; Indeck, Ronald S.; Cytron, Ron K.; Cholleti, Sharath R., Intelligent data storage and processing using FPGA devices.
Chamberlain, Roger D.; Franklin, Mark Allen; Indeck, Ronald S.; Cytron, Ron K.; Cholleti, Sharath R., Intelligent data storage and processing using FPGA devices.
Chamberlain, Roger D.; Franklin, Mark Allen; Indeck, Ronald S.; Cytron, Ron K.; Cholleti, Sharath R., Intelligent data storage and processing using FPGA devices.
Chamberlain, Roger D.; Franklin, Mark Allen; Indeck, Ronald S.; Cytron, Ron K.; Cholleti, Sharath R., Intelligent data storage and processing using FPGA devices.
Lancaster, Joseph M.; Henrichs, Michael John; Tidwell, Terry; St. John, Alex; Sprague, Kevin Brian, Method and apparatus for accelerated data translation using record layout detection.
Henrichs, Michael John; Lancaster, Joseph M.; Chamberlain, Roger Dean; White, Jason R.; Sprague, Kevin Brian; Tidwell, Terry, Method and apparatus for accelerated format translation of data in a delimited data format.
Henrichs, Michael John; Lancaster, Joseph M.; Chamberlain, Roger Dean; White, Jason R.; Sprague, Kevin Brian; Tidwell, Terry, Method and apparatus for accelerated format translation of data in a delimited data format.
Indeck, Ronald S.; Cytron, Ron Kaplan; Franklin, Mark Allen, Method and apparatus for approximate matching where programmable logic is used to process data being written to a mass storage medium and process data being read from a mass storage medium.
Dharmapurikar,Sarang; Krishnamurthy,Praveen; Sproull,Todd; Lockwood,John, Method and apparatus for detecting predefined signatures in packet payload using Bloom filters.
Taylor, David E.; Parsons, Scott; Whatley, Jeremy Walter; Bradley, Richard; Gyang, Kwame; DeWulf, Michael, Method and apparatus for high-speed processing of financial market depth data.
Taylor, David E.; Parsons, Scott; Whatley, Jeremy Walter; Bradley, Richard; Gyang, Kwame; DeWulf, Michael, Method and apparatus for high-speed processing of financial market depth data.
Taylor, David E.; Parsons, Scott; Whatley, Jeremy Walter; Bradley, Richard; Gyang, Kwame; DeWulf, Michael, Method and apparatus for high-speed processing of financial market depth data.
Buhler, Jeremy Daniel; Chamberlain, Roger Dean; Franklin, Mark Allen; Gyang, Kwame; Jacob, Arpith Chacko; Krishnamurthy, Praveen; Lancaster, Joseph Marion, Method and apparatus for performing similarity searching.
Buhler, Jeremy Daniel; Chamberlain, Roger Dean; Franklin, Mark Allen; Gyang, Kwame; Jacob, Arpith Chacko; Krishnamurthy, Praveen; Lancaster, Joseph Marion, Method and apparatus for performing similarity searching.
Indeck, Ronald S.; Cytron, Ron Kaplan; Franklin, Mark Allen; Chamberlain, Roger D., Method and apparatus for processing financial information at hardware speeds using FPGA devices.
Indeck, Ronald S.; Indeck, David Mark; Singla, Naveen; Taylor, David E., Method and system for high performance integration, processing and searching of structured and unstructured data.
Indeck, Ronald S.; Indeck, David Mark, Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors.
Indeck, Ronald S.; Indeck, David Mark, Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors.
Indeck, Ronald S.; Indeck, David Mark, Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors.
Indeck, Ronald S.; Indeck, David Mark, Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors.
Taylor, David E.; Indeck, Ronald S.; White, Jason R.; Chamberlain, Roger D., Method and system for high throughput blockwise independent encryption/decryption.
Taylor, David E.; Indeck, Ronald S.; White, Jason R.; Chamberlain, Roger D., Method and system for high throughput blockwise independent encryption/decryption.
Taylor, David E.; Indeck, Ronald S.; White, Jason R.; Chamberlain, Roger D., Method and system for high throughput blockwise independent encryption/decryption.
Taylor, David E.; Singla, Naveen; Brodie, Benjamin C.; McVicar, Nathaniel Sutton; Thiel, Justin Ryan; Indeck, Ronald S., Method and system for low latency basket calculation.
Dharmapurikar, Sarang; Krishnamurthy, Praveen; Taylor, David Edward, Method and system for performing longest prefix matching for network address lookup using bloom filters.
King,Martin T.; Grover,Dale L.; Kushler,Clifford A.; Stafford Fraser,James Q., Methods, systems and computer program products for data gathering in a digital and hard copy document environment.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Processing techniques for text capture from a rendered document.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
Kulig, Matthew P.; Brooks, Timmy L.; Lockwood, John W.; Reddick, David Kyle, System and method for controlling transmission of data packets over an information network.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.