IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition) |
Application No. | US-0060313 (2013-10-22)
Registration No. | US-9633093 (2017-04-25)
Inventors / Address |
- Henrichs, Michael John
- Lancaster, Joseph M.
- Chamberlain, Roger Dean
- White, Jason R.
- Sprague, Kevin Brian
- Tidwell, Terry
Applicant / Address |
Agent / Address |
Citation Information | Times cited: 4; Patents cited: 183
Abstract
Various methods and apparatuses are described for performing high speed format translations of incoming data, where the incoming data is arranged in a delimited data format. As an example, the data in the delimited data format can be translated to a fixed field format using pipelined operations. A reconfigurable logic device can be used in exemplary embodiments as a platform for the format translation.
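The translation described in the abstract can be pictured with a minimal software sketch. This is not the patented hardware pipeline, only an illustration under stated assumptions: the field delimiter is a comma, the shield character is a double quote, a doubled shield inside a shielded field stands for one literal quote (a common CSV convention, assumed here), and values longer than the field width are truncated. The names `split_delimited` and `to_fixed_fields` are hypothetical.

```python
from typing import List


def split_delimited(line: str, delimiter: str = ",", shield: str = '"') -> List[str]:
    """Split one delimited record into fields, honoring shield characters."""
    fields: List[str] = []
    current: List[str] = []
    shielded = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == shield:
            # Assumption: a doubled shield inside a shielded field is data.
            if shielded and i + 1 < len(line) and line[i + 1] == shield:
                current.append(shield)
                i += 1
            else:
                shielded = not shielded  # shield itself is stripped
        elif ch == delimiter and not shielded:
            # An unshielded delimiter closes the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def to_fixed_fields(line: str, widths: List[int], pad: str = " ") -> str:
    """Arrange the fields of one record into fixed-size, padded fields."""
    fields = split_delimited(line)
    out = []
    for idx, width in enumerate(widths):
        value = fields[idx] if idx < len(fields) else ""
        # Assumption: overlong values are truncated to the field width.
        out.append(value[:width].ljust(width, pad))
    return "".join(out)
```

Because every field has a known width, a downstream consumer can reach a field by offset alone. With widths `[4, 5, 3]`, the second field of any record occupies characters 4 through 8, so `record[4:9]` returns it without inspecting the data.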
Representative Claims
1. A method comprising: receiving data in a delimited data format, wherein the received data in the delimited data format comprises (1) a plurality of data characters arranged in a plurality of fields, (2) a plurality of shield characters, and (3) a plurality of field delimiter characters, the field delimiter characters defining a plurality of boundaries between the fields; converting the received data to a fixed field format, wherein the converting step comprises the reconfigurable logic device (1) distinguishing between field delimiter characters and data characters in the received data based on the shield characters, (2) identifying the fields in the received data based on the field delimiter characters, and (3) arranging the data characters sharing the same identified fields into fixed-size fields such that the converted data comprises the data characters in the fixed-size fields stripped of the field delimiter characters and the shield characters; performing a plurality of processing operations on the converted data to generate processed data in the fixed field format; loading the processed data into a database; and wherein the converting step is performed by a reconfigurable logic device.
2. The method of claim 1 wherein the converted data comprises a plurality of data fields, each data field having a known field length, wherein the processing operations comprise a plurality of field-specific data processing operations, and wherein the performing step comprises targeting a specific field of the converted data for a field-specific processing operation without analyzing the data content of the data fields.
3. The method of claim 1 wherein the data processing operations comprise data quality checking operations as part of an extract, transfer, load (ETL) procedure.
4. The method of claim 1 wherein the at least one of the processing operations is performed by software executed by a processor.
5. The method of claim 1 wherein the converting step comprises converting a plurality of characters of the received data to the fixed field format per clock cycle.
6. The method of claim 1 wherein the at least one of the processing operations is performed by a reconfigurable logic device.
7. A method comprising: a reconfigurable logic device receiving an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, wherein the received byte stream comprises a plurality of data characters, a plurality of field delimiter characters, and a plurality of shield characters, the field delimiter characters defining a plurality of boundaries between the fields; the reconfigurable logic device processing the received byte stream, wherein the processing step comprises: the reconfigurable logic device distinguishing between field delimiter characters and data characters in the received byte stream based on the shield characters to identify the field delimiter characters that are present in the received byte stream; and the reconfigurable logic device identifying the fields in the received byte stream based on the identified field delimiter characters; and the reconfigurable logic device translating the received byte stream to an outgoing byte stream arranged in a fixed field format based on the identified field delimiter characters, the outgoing byte stream comprising a plurality of the data characters of the received byte stream arranged in a plurality of fixed-size fields, and wherein the translating step comprises the reconfigurable logic device arranging the data characters sharing the same identified field into the fixed-size fields such that the outgoing byte stream comprises the data characters in the fixed-size fields stripped of the field delimiter characters and the shield characters.
8. The method of claim 7 wherein the processing step further comprises the reconfigurable logic device identifying the shield characters that are present in the received byte stream.
9. The method of claim 8 wherein the translating step comprises the reconfigurable logic device removing the identified field delimiter characters from the outgoing byte stream.
10. The method of claim 9 wherein the translating step further comprises the reconfigurable logic device removing the identified shield characters from the outgoing byte stream.
11. The method of claim 8 further comprising the reconfigurable logic device converting the received byte stream to an internal format tagged with associated control data that identifies the boundaries between the fields.
12. The method of claim 11 wherein the converting step further comprises the reconfigurable logic device generating a shield character mask associated with the received byte stream to identify the bytes in the received byte stream that are eligible for consideration as to whether they contain a field delimiter character.
13. The method of claim 12 wherein the converting step further comprises the reconfigurable logic device processing the bytes of the received byte stream and the generated shield character mask to generate field delimiter flag data associated with the received byte stream, the field delimiter flag data being indicative of whether an associated byte corresponds to a field delimiter character.
14. The method of claim 13 wherein the incoming byte stream is further representative of a plurality of records, at least one of the records comprising at least one of the fields, the incoming byte stream further comprising a plurality of record delimiter characters, the record delimiter characters defining a plurality of boundaries between the records, and wherein the converting step further comprises the reconfigurable logic device processing the bytes of the received byte stream and the generated shield character mask to generate record delimiter flag data associated with the received byte stream, the record delimiter flag data being indicative of whether an associated byte corresponds to a record delimiter character.
15. The method of claim 14 wherein the converting step further comprises the reconfigurable logic device identifying any empty fields that exist within the received byte stream based on the field delimiter flag data and the record delimiter flag data.
16. The method of claim 15 wherein the converting step further comprises the reconfigurable logic device removing the field delimiter characters and the record delimiter characters from the internally formatted byte stream based on the field delimiter flag data and the record delimiter flag data.
17. The method of claim 16 wherein the converting step further comprises the reconfigurable logic device generating control data associated with the internally formatted byte stream, the control data comprising (1) a start of field flag, (2) an end of field flag, (3) a start of record flag, (4) an end of record flag, and (5) a field identifier.
18. The method of claim 11 wherein the shield character identifying step further comprises the reconfigurable logic device performing a shield character removal operation on the bytes of the received byte stream.
19. The method of claim 18 wherein the shield character removal performing step comprises the reconfigurable logic device (1) distinguishing between the data characters that match the shield character and the shield characters, and (2) removing the identified shield characters.
20. The method of claim 11 further comprising the reconfigurable logic device generating the outgoing byte stream in the fixed field format from the internally formatted byte stream and the associated control data.
21. The method of claim 20 wherein the generating step further comprises the reconfigurable logic device filling a register corresponding to a fixed length field with the data characters of a field of the internally formatted byte stream based on the associated control data.
22. The method of claim 21 wherein the generating step further comprises the reconfigurable logic device filling the register with padding characters if there are not enough data characters of the field of the internally formatted byte stream to complete the fixed length field.
23. The method of claim 18 further comprising: the reconfigurable logic device providing the outgoing byte stream to a data processing component for processing thereby; and the data processing component selectively targeting a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream.
24. The method of claim 23 further comprising: the reconfigurable logic device receiving processed data representative of the outgoing byte stream from the data processing component; and the reconfigurable logic device translating the processed data back to the delimited data format.
25. The method of claim 8 further comprising: the reconfigurable logic device converting the received byte stream to an internal format tagged with associated control data that identifies the boundaries between the fields; the reconfigurable logic device performing a shield character removal operation on the bytes of the received byte stream; and the reconfigurable logic device generating the outgoing byte stream in the fixed field format from the internally formatted byte stream and the associated control data; and wherein the reconfigurable logic device performs the converting step, the shield character removal performing step, and the generating step simultaneously with respect to each other in a pipelined fashion.
26. The method of claim 7 wherein the reconfigurable logic device performs the processing and translating steps for a plurality of characters in the byte stream per clock cycle.
27. The method of claim 7 wherein the delimited data format comprises a comma separated value (CSV) format.
28. The method of claim 7 further comprising: selectively targeting a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream; and performing a processing operation on the selectively targeted field; and wherein the selectively targeting and performing steps are performed by a processor downstream from the reconfigurable logic device.
29. The method of claim 7 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA).
30. An apparatus comprising: a reconfigurable logic device configured to (1) receive an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, wherein the received byte stream comprises a plurality of data characters, a plurality of field delimiter characters, and a plurality of shield characters, the field delimiter characters defining a plurality of boundaries between the fields, (2) process the received byte stream to (i) distinguish between field delimiter characters and data characters in the received byte stream based on the shield characters to identify the field delimiter characters that are present in the received byte stream, and (ii) identify the fields in the received byte stream based on the identified field delimiter characters, and (3) translate the received byte stream to an outgoing byte stream arranged in a fixed field format based on the identified field delimiter characters, the outgoing byte stream comprising a plurality of the data characters of the received byte stream arranged in a plurality of fixed-size fields, wherein as part of the translation the reconfigurable logic device is further configured to arrange the data characters sharing the same identified field into the fixed-size fields such that the outgoing byte stream comprises the data characters in the fixed-size fields stripped of the field delimiter characters and the shield characters.
31. The apparatus of claim 30 wherein the reconfigurable logic device comprises: a plurality of hardware logic circuits arranged in a pipeline, the hardware logic circuits configured to operate simultaneously in a pipelined fashion, the pipeline configured to perform the receive, process, and translate operations.
32. The apparatus of claim 31 wherein the pipeline comprises: a first hardware logic circuit configured to convert the received byte stream to an internal variable format having associated control data to identify records and fields in the data; a second hardware logic circuit downstream from the first hardware logic circuit, the second hardware logic circuit configured to remove shield characters from the data in the internal variable format; and a third hardware logic circuit downstream from the second hardware logic circuit, the third hardware logic circuit configured to convert the data in the variable format into the outgoing byte stream in the fixed field format.
33. The apparatus of claim 32 wherein the first hardware logic circuit is further configured to simultaneously test the same portion of the received byte stream to determine whether the tested stream portion comprises record delimiters or field delimiters.
34. The apparatus of claim 31 wherein the pipeline is further configured to ingest and process a plurality of characters of the received byte stream per clock cycle.
35. The apparatus of claim 32 wherein the received byte stream is further representative of a plurality of fields arranged in a plurality of records and further comprises a plurality of record delimiter characters, the record delimiter characters defining a plurality of boundaries between the records; and wherein the first hardware logic circuit is further configured to strip the field delimiter characters and record delimiter characters of the incoming stream from the converted data while preserving data characters of incoming fields in the converted data.
36. The apparatus of claim 35 wherein the first hardware logic circuit is further configured to simultaneously test the same characters of the incoming stream to determine whether the tested characters are record delimiter characters or field delimiter characters.
37. The apparatus of claim 30 wherein the reconfigurable logic device comprises: a variable record gate hardware logic circuit configured to convert the received byte stream to an internal variable format having associated control data to identify records and fields in the received byte stream.
38. The apparatus of claim 30 wherein the reconfigurable logic device comprises: a shield character masker hardware logic circuit configured to mask fields of the received byte stream that are wrapped by shield characters.
39. The apparatus of claim 38 wherein the reconfigurable logic device further comprises: a delimiter finder hardware logic circuit downstream from the shield character masker hardware logic circuit, the delimiter finder hardware logic circuit configured to detect delimiter characters in the received byte stream based on a mask generated by the shield character masker hardware logic circuit.
40. The apparatus of claim 39 wherein the delimiter characters comprise field delimiter characters.
41. The apparatus of claim 39 wherein the delimiter characters comprise record delimiter characters.
42. The apparatus of claim 30 wherein the reconfigurable logic device comprises: a delimiter finder hardware logic circuit configured to detect delimiter characters in the received byte stream.
43. The apparatus of claim 42 wherein the delimiter characters comprise field delimiter characters.
44. The apparatus of claim 42 wherein the delimiter characters comprise record delimiter characters.
45. The apparatus of claim 42 wherein the delimiter characters comprise field delimiter characters and record delimiter characters; and wherein the delimiter finder hardware logic circuit is further configured to simultaneously test whether the same portion of the received byte stream includes field delimiter characters or record delimiter characters.
46. The apparatus of claim 30 further comprising: a processor, the processor configured to (1) selectively target a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream, and (2) perform a processing operation on the selectively targeted field.
47. The apparatus of claim 30 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA).
48. A method comprising: a reconfigurable logic device receiving an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, wherein the received byte stream comprises a plurality of data characters, a plurality of field delimiter characters, and a plurality of shield characters, the field delimiter characters defining a plurality of boundaries between the fields; the reconfigurable logic device processing the received byte stream to identify the field delimiter characters that are present in the received byte stream, wherein the processing step comprises: the reconfigurable logic device distinguishing between field delimiter characters and data characters in the received byte stream based on the shield characters; and the reconfigurable logic device identifying the fields in the received byte stream based on the identified field delimiter characters; and the reconfigurable logic device translating the received byte stream to an outgoing byte stream based on the identified field delimiter characters, the outgoing byte stream arranged in a structured format and being representative of the data in the fields of the received byte stream, the outgoing byte stream comprising a plurality of the data characters of the received byte stream, the structured format being configured to permit a downstream processing component to jump from field to field in the outgoing byte stream without analyzing the data characters of the outgoing byte stream, and wherein the translating step comprises the reconfigurable logic device arranging the data characters sharing the same identified field into fields of the structured format such that the outgoing byte stream comprises the data characters in the fields of the structured format stripped of the field delimiter characters and the shield characters.
49. The method of claim 48 wherein the incoming byte stream is further representative of a plurality of records, at least one of the records comprising at least one of the fields, the incoming byte stream further comprising a plurality of record delimiter characters, the record delimiter characters defining a plurality of boundaries between the records; wherein the processing step further comprises the reconfigurable logic device identifying the record delimiter characters that are present in the received byte stream; and wherein the translating step further comprises the reconfigurable logic device translating the received byte stream to the outgoing byte stream having the structured format based on the identified field delimiter characters and the identified record delimiter characters.
50. The method of claim 48 wherein the structured format is further configured to permit the downstream processing component to jump from record to record in the outgoing byte stream without analyzing the data characters of the outgoing byte stream.
51. The method of claim 49 wherein the translating step further comprises the reconfigurable logic device removing the identified record delimiter characters from the outgoing byte stream.
52. The method of claim 48 wherein the processing step further comprises the reconfigurable logic device identifying the shield characters that are present in the received byte stream.
53. The method of claim 52 wherein the translating step further comprises the reconfigurable logic device removing the identified shield characters from the outgoing byte stream.
54. The method of claim 48 wherein the translating step further comprises removing the identified field delimiter characters from the outgoing byte stream.
55. The method of claim 48 further comprising the reconfigurable logic device providing the outgoing byte stream to the downstream processing component for processing thereby.
56. The method of claim 48 further comprising the downstream processing component performing a plurality of processing operations on the outgoing byte stream to generate processed data from the outgoing byte stream.
57. The method of claim 56 wherein the processing operations include a plurality of extract, transfer, and load (ETL) database operations.
58. The method of claim 56 wherein the processing operations comprise a plurality of data validation operations.
59. The method of claim 48 further comprising the reconfigurable logic device translating the processed data back to the delimited data format of the received byte stream.
60. The method of claim 48 wherein the downstream processing component is implemented on the reconfigurable logic device.
61. The method of claim 48 wherein the downstream processing component is implemented in software on a processor.
62. The method of claim 48 wherein the delimited data format comprises a comma separated value (CSV) format.
63. The method of claim 48 wherein the structured format comprises a fixed field format.
64. The method of claim 48 wherein the reconfigurable logic device performs the processing and translating steps for a plurality of characters in the byte stream per clock cycle.
65. The method of claim 48 further comprising: a processor downstream from the reconfigurable logic device (1) selectively jumping to a field in the outgoing byte stream without analyzing the data characters of the outgoing byte stream, (2) performing a processing operation on that field after jumping thereto, and (3) repeating the selectively jumping and performing steps for a plurality of the fields in the outgoing byte stream.
66. The method of claim 48 wherein the reconfigurable logic device comprises a field programmable gate array (FPGA).
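The shield character mask and the delimiter flag data recited in claims 12 to 14 can be illustrated with a short software sketch. It is sequential Python, not the multi-byte-per-clock hardware pipeline the claims describe, and it assumes a double quote as the shield character, a comma as the field delimiter, and a newline as the record delimiter. The function names `shield_mask`, `delimiter_flags`, and `split_by_flags` are hypothetical.

```python
from typing import List, Tuple


def shield_mask(data: bytes, shield: int = ord('"')) -> List[bool]:
    """Mark each byte that is eligible to be a delimiter.

    A byte is eligible when it lies outside any shielded region and is
    not itself a shield character.
    """
    mask: List[bool] = []
    shielded = False
    for b in data:
        if b == shield:
            shielded = not shielded
            mask.append(False)
        else:
            mask.append(not shielded)
    return mask


def delimiter_flags(
    data: bytes,
    mask: List[bool],
    field_delim: int = ord(","),
    record_delim: int = ord("\n"),
) -> Tuple[List[bool], List[bool]]:
    """Flag the eligible bytes that are field or record delimiters."""
    field_flags = [m and b == field_delim for b, m in zip(data, mask)]
    record_flags = [m and b == record_delim for b, m in zip(data, mask)]
    return field_flags, record_flags


def split_by_flags(
    data: bytes, field_flags: List[bool], record_flags: List[bool]
) -> List[List[bytes]]:
    """Drop flagged delimiters and group the remaining bytes by field.

    Shield characters are left in place here; removing them is treated
    as a separate later stage in this sketch.
    """
    records: List[List[bytes]] = []
    fields: List[bytes] = []
    current = bytearray()
    for i, b in enumerate(data):
        if record_flags[i]:
            fields.append(bytes(current))
            records.append(fields)
            fields, current = [], bytearray()
        elif field_flags[i]:
            fields.append(bytes(current))
            current = bytearray()
        else:
            current.append(b)
    if current or fields:  # final record without a trailing delimiter
        fields.append(bytes(current))
        records.append(fields)
    return records
```

Keeping the mask and the flags as separate per-byte arrays mirrors the staged structure in the claims: the delimiter test only needs the byte value and its mask bit, so each byte can in principle be tested independently of its neighbors once the mask is known. Empty fields fall out naturally as two consecutive flagged delimiters with no data bytes between them.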