Table boundary detection in data blocks for compression
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-007/00
G06F-017/30
H03M-007/30
Application number
US-0847478
(2015-09-08)
Registration number
US-9514179
(2016-12-06)
Inventors / Address
Amit, Jonathan
Demidov, Lilia
Halowani, Nir
Applicant / Address
INTERNATIONAL BUSINESS MACHINES CORPORATION
Agent / Address
Griffiths & Seaton PLLC
Citation information
Times cited: 0
Cited patents: 4
Abstract
Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information. A scanning operation is performed by searching a suffix of each of the sorted data streams for identifying a data sequence that includes a first symbol representing textual data, and a second symbol representing numerical data. The suffix tree for the converted data is then built.
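The scanning step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The symbol letters `T`, `N`, and `D`, the contents of `DELIMITERS` and `DIGITS`, and the function names `symbolize`, `scan`, and `suffix_trie` are all assumptions made for illustration, and a naive suffix trie stands in for a compact suffix tree.

```python
from collections import defaultdict
from itertools import groupby

# Assumed lists; the patent only says that a delimiters list and a digits list exist.
DELIMITERS = set(", ;|\t")
DIGITS = set("0123456789")


def classify(ch):
    """Map one character to a symbol: D (delimiter), N (numerical), T (textual)."""
    if ch in DELIMITERS:
        return "D"
    if ch in DIGITS:
        return "N"
    return "T"


def symbolize(line):
    """Collapse each run of same-class characters into a single symbol."""
    return "".join(symbol for symbol, _ in groupby(line, key=classify))


def scan(lines):
    """Group line indices by pattern, sorted by pattern.

    Lines whose pattern lacks either a textual or a numerical symbol
    (for example delimiter-only lines) are skipped.
    """
    groups = defaultdict(list)
    for index, line in enumerate(lines):
        pattern = symbolize(line)
        if "T" in pattern and "N" in pattern:
            groups[pattern].append(index)
    return {pattern: groups[pattern] for pattern in sorted(groups)}


def suffix_trie(text):
    """Build a naive suffix trie (nested dicts) over a symbol string."""
    root = {}
    for start in range(len(text)):
        node = root
        for symbol in text[start:]:
            node = node.setdefault(symbol, {})
    return root
```

With the assumed lists, `scan(["name,age", "alice,30", ",,,", "bob,25", "total 55"])` returns `{"TDN": [1, 3, 4]}`: the header line has no numerical symbol, the delimiter-only line is skipped, and the three remaining lines share one candidate table pattern.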
Representative claim
1. A system of identifying table boundaries in data blocks for compression in a computing environment, the system comprising: a processor device, operable in the computing environment, wherein the processor device: converting data into a minimized data representation using a suffix tree by sorting data streams according to a plurality of symbolic representations for building table boundary formation patterns, wherein the converted data is fully reversible for reconstruction while retaining minimal header information; and performing a scanning operation according to each of the following: searching a suffix of each of the sorted data streams for identifying a data sequence that includes a first symbol representing textual data and a second symbol representing numerical data, skips a data that only includes a third symbol until identifying a next data sequence that includes the first and the second symbol representing the textual and the numerical data, and building the suffix tree for the converted data.
2. The system of claim 1, wherein a delimiters used for separation is represented by the third symbol.
3. The system of claim 1, further including eliminating each scan-order not matching the searching and the skipping.
4. The system of claim 2, wherein the textual data is a sequence of characters not included in a delimiters list and a digits list, and the numerical data is a sequence of digit characters not included in the delimiters list.
5. The system of claim 1, further including, in conjunction with the sorting, matching together those of the table boundary formation patterns that are similar for identifying a longest minimized data representation the table boundary formation pattern.
6. The system of claim 5, further including, performing at least one of: reordering each of the table boundary formation patterns to form an output data file, and adding to a header of the output data file at least the table boundary formation patterns, a number of rows of the output data file, and the third symbol used for separation of the table boundary formation patterns.
7. The system of claim 5, further including, in conjunction with the matching, searching for node branches within the suffix tree.
8. The system of claim 1, further including, performing the converting for a plurality of data type blocks.
9. A computer program product for identifying table boundaries in data blocks for compression by a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: a first executable portion that converts data into a minimized data representation using a suffix tree by sorting data streams according to a plurality of symbolic representations for building table boundary formation patterns, wherein the converted data is fully reversible for reconstruction while retaining minimal header information; and a second executable portion that performs a scanning operation according to each of the following: searches a suffix of each of the sorted data streams for identifying a data sequence that includes a first symbol representing textual data and a second symbol representing numerical data, skips a data that only includes a third symbol until identifying the next data sequence that includes the first and the second symbol representing the textual and the numerical data, and builds the suffix tree for the converted data.
10. The computer program product of claim 9, wherein textual data is represented by a first symbol, numerical data is represented with a second symbol, and a delimiters used for separation is represented by the third symbol.
11. The computer program product of claim 9, further including a third executable portion that eliminates each scan-order not matching the searching and the skipping.
12. The computer program product of claim 10, wherein the textual data is a sequence of characters not included in a delimiters list and a digits list, and the numerical data is a sequence of digit characters not included in the delimiters list.
13. The computer program product of claim 9, further including the third executable portion that, in conjunction with the sorting, matches together those of the table boundary formation patterns that are similar for identifying a longest minimized data representation table boundary formation pattern.
14. The computer program product of claim 13, further including a fourth executable portion that: reorders each of the table boundary formation patterns to form an output data file, and adds to a header of the output data file at least the table boundary formation patterns, a number of rows of the output data file, and the third symbol used for separation of the table boundary formation patterns.
15. The computer program product of claim 13, further including the fourth executable portion that, in conjunction with the matching, searches for node branches within the suffix tree.
16. The computer program product of claim 9, further including the third executable portion that performs the converting for a plurality of data type blocks.
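Claims 6 and 14 describe reordering the detected patterns into an output file whose header carries the patterns, the row count, and the delimiter symbol. The Python sketch below shows one way such a reversible conversion could look. It is an illustration only: the column-major reordering, the dictionary header layout, and the names `convert` and `reconstruct` are assumptions, not details taken from the patent.

```python
def field_pattern(fields):
    """Per-field symbols: N for an all-digit field, T otherwise (assumed encoding)."""
    return "".join("N" if field.isdigit() else "T" for field in fields)


def convert(rows, delimiter=","):
    """Reorder same-pattern rows column by column and build a small header.

    Returns (header, body). The header keeps only what is needed to undo
    the reordering: the pattern, the number of rows, and the delimiter.
    """
    if not rows:
        raise ValueError("no rows to convert")
    table = [row.split(delimiter) for row in rows]
    pattern = field_pattern(table[0])
    if any(field_pattern(fields) != pattern for fields in table):
        raise ValueError("rows do not share one table pattern")
    header = {"pattern": pattern, "rows": len(rows), "delimiter": delimiter}
    # Column-major order places similar values (all names, then all numbers)
    # next to each other, which tends to help a downstream compressor.
    body = [fields[col] for col in range(len(pattern)) for fields in table]
    return header, body


def reconstruct(header, body):
    """Invert convert(): rebuild the original rows from header and body."""
    count = header["rows"]
    width = len(header["pattern"])
    return [
        header["delimiter"].join(body[col * count + row] for col in range(width))
        for row in range(count)
    ]
```

For example, `convert(["alice,30", "bob,25"])` yields the header `{"pattern": "TN", "rows": 2, "delimiter": ","}` and the body `["alice", "bob", "30", "25"]`, and `reconstruct` on that pair returns the original two rows.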
Cited patent: Iyer, Balakrishna R. (Mountain View, CA); Langdon, Jr., Glen G. (Aptos, CA); Zandi, Ahmad (Santa Cruz, CA). Sort order preserving method for data storage compression.