IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
US-0046395
(2011-03-11)
|
등록번호 |
US-8515682
(2013-08-20)
|
발명자
/ 주소 |
- Buhler, Jeremy Daniel
- Chamberlain, Roger Dean
- Franklin, Mark Allen
- Gyang, Kwame
- Jacob, Arpith Chacko
- Krishnamurthy, Praveen
- Lancaster, Joseph Marion
|
출원인 / 주소 |
|
대리인 / 주소 |
|
인용정보 |
피인용 횟수 :
23 인용 특허 :
132 |
초록
▼
A system and method for performing similarity searching is disclosed. This includes a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a pluralit
A system and method for performing similarity searching is disclosed. This includes a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of data strings and a plurality of substrings of a query string. The pipeline may further include an ungapped extension prefilter stage located downstream from the matching stage, the prefilter stage being configured to shift through pattern matches between the data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user defined score will pass downstream from the prefilter stage. The matching stage may include at least one Bloom filter.
대표청구항
▼
1. A digital logic circuit comprising: a programmable logic device configured to include a pipeline that comprises a matching stage and a downstream ungapped extension prefilter stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a p
1. A digital logic circuit comprising: a programmable logic device configured to include a pipeline that comprises a matching stage and a downstream ungapped extension prefilter stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of data strings and a plurality of substrings of a query string, the ungapped extension prefilter stage being configured to (1) shift through pattern matches between the data strings and the plurality of substrings of a query string and (2) provide a score so that only pattern matches that exceed a user defined score will pass downstream from the ungapped extension prefilter stage. 2. The circuit of claim 1, wherein the programmable logic device comprises an FPGA. 3. The circuit of claim 1, wherein the data strings comprise biological sequence data strings. 4. The circuit of claim 3, wherein the matching stage comprises at least a portion of a first stage for BLAST, and wherein the ungapped extension prefilter stage comprises at least a portion of a second stage for BLAST. 5. The circuit of claim 3, wherein the BLAST first stage is selected from the group consisting of a first stage for BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX and the BLAST second stage is selected from the group consisting of a second stage for BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX. 6. The circuit of claim 1, wherein the matching stage comprises a Bloom filter stage. 7. The circuit of claim 6, wherein the Bloom filter stage comprises a plurality of Bloom filters. 8. The circuit of claim 6, wherein the matching stage further comprises a hashing stage downstream from the Bloom filter stage, the hashing stage being configured to map the possible matches to at least one position within the query string. 9. The circuit of claim 8, further comprising a redundancy filter downstream from the hashing stage, the redundancy filter being configured to receive data representative of possible matches from the hashing stage and remove possible matches that are redundant with previous possible matches within a selectable degree of redundancy. 10. The circuit of claim 1, wherein the ungapped extension prefilter stage comprises an extension controller, a window lookup module and a scoring module. 11. The circuit of claim 1, wherein the ungapped extension prefilter stage utilizes parameters selected from the group consisting of a matching score, a mismatching score and a cutoff threshold. 12. The circuit of claim 1, wherein the ungapped extension prefilter stage comprises an extension controller. 13. The circuit of claim 12, wherein the extension controller comprises at least one of hardware, firmware and software. 14. The circuit of claim 12, wherein the extension controller comprises a plurality of data inputs including at least one first input for plurality of possible pattern matches of a predetermined length between a plurality of the data strings and a plurality of substrings of the query string and at least one second input for the data strings. 15. The circuit of claim 14, wherein at least one second input can receive commands for controlling parameters selected from the group consisting of a match score, a mismatch score or a cutoff threshold. 16. The circuit of claim 12, wherein the extension controller parses the data strings received by the at least one second input to demultiplex a shared command and the pattern matches of a predetermined length prior to submission into the window lookup module. 17. The circuit of claim 1, wherein the ungapped extension prefilter stage comprises a window lookup module. 18. The circuit of claim 17, wherein the window lookup module can fetch at least one substring of the data strings and at least one substring of the query string to form a predefined window of data that can be utilized for analysis. 19. The circuit of claim 18, wherein the at least one substring of the data strings and at least one substring of the query string are buffered on a plurality of BRAMS. 20. The circuit of claim 17, wherein the window lookup module receives at least one substring of the data strings in a database buffer, at least one substring of the query string in a query buffer, an offset for at least one substring of the data strings and an offset for at least one substring of the query string, wherein the window lookup module calculates a beginning of a window that is passed to a plurality of buffer modules, wherein the plurality of buffer modules move the at least one substring of the query string and the at least one substring of the data strings to an output stage to create at least one predefined window of data that can be utilized for analysis. 21. The circuit of claim 1, wherein the ungapped extension prefilter stage comprises a scoring module. 22. The circuit of claim 21, wherein the scoring module is capable of fetching data strings within a predefined window. 23. The circuit of claim 22, wherein the scoring module utilizes at least one iteration of algorithm instructions. 24. The circuit of claim 21, wherein the scoring module further comprises a base comparator, a plurality of scoring stages and a threshold comparator. 25. The circuit of claim 21, wherein the scoring module further comprises a base comparator. 26. The circuit of claim 25, wherein the base comparator receives at least one pair of a substring of the data strings and a substring of the query string and compares these values and obtains at least one comparison score. 27. The circuit of claim 26, wherein the at least one comparison score includes a positive value for each matching pair and a negative score for each mismatching pair. 28. The circuit of claim 26, wherein the at least one comparison score is substituted from values located on a lookup table. 29. The circuit of claim 21, wherein the scoring module further comprises a plurality of scoring stages. 30. The circuit of claim 29, further comprising discarding at least one comparison score from each subsequent downstream scoring stage of the plurality of scoring stages. 31. The circuit of claim 29, wherein at least one scoring stage of the plurality of scoring stages receives input selected from the group of comparisons of the data strings in addition to data strings and query positions within a predetermined window of data and at least one scoring stage of the plurality of scoring stages receives input from at least one other scoring stage of the plurality of scoring stages, wherein the input is selected from the group consisting of scores of high-scoring segment pairs and endpoints of high-scoring segment pairs. 32. The circuit of claim 21, wherein the scoring module further comprises a threshold comparator. 33. The circuit of claim 32, wherein the threshold comparator obtains a pattern match of a predetermined length that includes at least one score and determines if the at least one score exceeds a predetermined threshold value and passes the pattern match downstream. 34. The circuit of claim 32, wherein the threshold comparator obtains a pattern match of a predetermined length and determines if a high scoring substring intersects a boundary of a predetermined window of data. 35. A method comprising: processing, by a programmable logic device, a data stream, the programmable logic device configured to include a pipeline that comprises a matching stage and a downstream ungapped extension prefilter stage, wherein the processing step comprises: the matching stage receiving the data stream, the data stream comprising a plurality of possible matches between a plurality of data strings and a plurality of substrings of a query string; andthe ungapped extension prefilter stage shifting through pattern matches between the data strings and the plurality of substrings of the query string and providing a score so that only pattern matches that exceed a user defined score will pass downstream from the ungapped extension prefilter stage. 36. The method of claim 35, wherein the programmable logic device comprises an FPGA. 37. The method of claim 35, wherein the data strings comprise biological sequence data strings. 38. The method of claim 37, wherein the matching stage includes utilizing at least a portion of a first stage for BLAST and the ungapped extension prefilter stage includes utilizing at least a portion of a second stage for BLAST. 39. The method of claim 38, wherein the BLAST first stage is selected from the group consisting of a first stage for BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX and the BLAST second stage is selected from the group consisting of a second stage for BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX. 40. The method of claim 35, comprises utilizing a Bloom filter stage in the matching stage of the pipeline. 41. The method of claim 39, wherein the Bloom filter stage includes a plurality of Bloom filters. 42. The method of claim 39, further comprising utilizing a hashing stage downstream from the Bloom filter stage and mapping the possible matches to at least one position within the query string with the hashing stage. 43. The method of claim 42, further comprising utilizing a redundancy filter downstream from the hashing stage, with the redundancy filter being configured to receive data representative of possible matches from the hashing stage and further comprising removing possible matches that are redundant with previous possible matches within a selectable degree of redundancy. 44. The method of claim 35, wherein the ungapped extension prefilter stage comprises an extension controller, a window lookup module and a scoring module. 45. The method of claim 35, further comprising utilizing parameters selected from the group consisting of a match score, a mismatch score or a cutoff threshold with the ungapped extension prefilter stage. 46. The method of claim 35, wherein the ungapped extension prefilter stage comprises an extension controller. 47. The method of claim 46, wherein the extension controller comprises at least one of hardware, firmware and software. 48. The method of claim 46, further comprising receiving a plurality of possible pattern matches of a predetermined length between a plurality of the data strings and a plurality of substrings of the query string as at least one first input and receiving the data strings with at least one second input. 49. The method of claim 46, further comprising receiving commands for controlling parameters selected from the group consisting of a match score, a mismatch score or a cutoff threshold with the at least one second input. 50. The method of claim 46, further comprising parsing the data received by the at least one second input to demultiplex the shared command and the pattern matches of a predetermined length prior to submission into the window lookup module. 51. The method of claim 35, wherein the ungapped extension prefilter stage comprises a window lookup module. 52. The method of claim 51, further comprising fetching at least one substring of the data strings and at least one substring of a query string to form a predefined window of biological data for analysis with the window lookup module. 53. The method of claim 52, further comprising buffering the at least one substring of the data strings and at least one substring of a query string on a plurality of BRAMS. 54. The method of claim 51, further comprising: receiving at least one substring of the data strings in a database buffer with the window lookup module;receiving at least one substring of the query string in a query buffer with the window lookup module;receiving an offset for the at least one substring of the data strings and an offset for the at least one substring of the query string that is utilized to calculate a beginning of a window;moving the offset for the at least one substring of the data strings and the offset for at least one substring of the query string with a plurality of buffer modules; andcreating at least one predefined window of data for analysis based on the at least one substring of the data strings, the at least one substring of the query string, the offset for the at least one substring of the data strings and the offset for at least one substring of the query string. 55. The method of claim 35, wherein the ungapped extension prefilter stage comprises a scoring module. 56. The method of claim 55, further comprising fetching the data strings within a predefined window with the scoring module. 57. The method of claim 55, further utilizing at least one iteration of algorithm instructions with the scoring module. 58. The method of claim 55, wherein the scoring module further comprises a base comparator, a plurality of scoring stages and a threshold comparator. 59. The method of claim 55, wherein the scoring module further comprises a base comparator. 60. The method of claim 59, further comprising receiving at least one pair of a substring of the data strings and a substring of the query string and comparing these values and obtaining at least one comparison score with the base comparator. 61. The method of claim 60, wherein the at least one comparison score includes a positive value for each matching pair and a negative score for each mismatching pair. 62. The method of claim 60, further comprising substituting the at least one comparison score from values located on a lookup table. 63. The method of claim 55, wherein the scoring module further comprises a plurality of scoring stages. 64. The method of claim 63, further comprising discarding at least one comparison score for each stage of the plurality of scoring stages. 65. The method of claim 63, further comprising receiving input selected from the group of comparisons of the data strings and query positions within a predetermined window of data with at least one scoring stage of the plurality of scoring stages and further comprising receiving input is selected from the group consisting of scores of high-scoring segment pairs and endpoints of high-scoring segment pairs from at least one other scoring stage of the plurality of scoring stages. 66. The method of claim 55, wherein the scoring module further comprises a threshold comparator. 67. The method of claim 66, further comprising obtaining a pattern match of a predetermined length that includes at least one score and determines if the at least one score exceeds a predetermined threshold value with the threshold comparator and passing the pattern match downstream. 68. The method of claim 66, further comprising obtaining a pattern match of a predetermined length and determines if a high scoring substring intersects a boundary of a predetermined window of data with the threshold comparator.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.