Enhanced data compression for sparse multidimensional ordered series data
IPC Classification Information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
H03M-007/34
H03M-007/30
H03M-007/24
H03M-007/42
Application Number
US-0200494
(2016-07-01)
Patent Number
US-9571122
(2017-02-14)
Inventor / Address
Kletter, Doron
Applicant / Address
Protein Metrics Inc.
Attorney / Address
Shay Glenn LLP
Citation Information
Times cited: 2
Cited patents: 28
Abstract
Disclosed are methods and systems for significantly compressing sparse multidimensional ordered series data comprised of indexed data sets, wherein each data set comprises an index, a first variable and a second variable. The methods and systems are particularly suited for compression of data recorded in double precision floating point format.
Representative Claims
1. A computer-implemented method of compressing a sparse multidimensional ordered series of spectroscopic data, the method comprising:
a) receiving the sparse multidimensional ordered series data containing values that fall within a dynamic range of less than 10 orders of magnitude, wherein the data comprise indexed data sets, each indexed data set comprising an index (n), a first variable (x_n) representing a mass to charge ratio (m/z), and a second variable (y_n) representing signal intensity;
b) defining a predictor that calculates each first variable (x_n);
c) assigning an amplitude code word to each y_n;
d) calculating a hop offset value (Δ_n) for each y_n;
e) assigning a hop code word to each Δ_n based on the value of the Δ_n; and
f) generating a compressed output, said compressed output comprising:
i) a decoder legend comprising: a reverse amplitude code word dictionary associated with y_n; and a reverse hop code word dictionary associated with Δ_n; and
ii) code word data comprising an amplitude code word and a hop code word for each y_n and each Δ_n.

2. The method of claim 1, wherein the sparse multidimensional ordered series data is in double precision floating point format.

3. The method of claim 1, wherein the sparse multidimensional ordered series data comprises a plurality of indexed x,y pairs.

4. The method of claim 1, wherein the predictor is a global predictor function.

5. The method of claim 4, wherein the global predictor is an nth order polynomial function.

6. The method of claim 5, wherein the function is $g(n) = a_0 + a_1 n + a_2 n^2 + a_3 n^3$.

7. The method of claim 1, wherein the predictor is a piecewise predictor.

8. The method of claim 1, wherein the predictor is a local predictor.

9. The method of claim 1, wherein the predictor further comprises an error correction mechanism.

10. The method of claim 1, wherein the second variable y_n data is comprised of a sequence of variable amplitude measurements interspaced with intervals of relatively quiet periods during which the y_n data remains moderately constant and primarily dominated by noise.

11. The method of claim 1, wherein the second variable y_n data is comprised of a non-uniform multi-modal distribution of amplitude ranges, where certain amplitude ranges that occur frequently are interspaced with other amplitude ranges that occur much less frequently.

12. The method of claim 11, wherein the second variable y_n data is comprised of a discrete set of observable amplitude ranges interspaced with intervals of amplitude ranges that are not observed in the data.

13. The method of claim 1, wherein assigning an amplitude code word to each y_n comprises:
i) generating a hash table for amplitude values;
ii) looking up each of the second variable (y_n) value in turn, wherein if the y_n value is not previously seen, then the y_n value is added to a list of amplitude values and an associated frequency occurrence is set to one, and wherein if the y_n is already present on the list of amplitude values, then the associated frequency occurrence is incremented by one;
iii) sorting the list of amplitude values by their associated frequency occurrence;
iv) assigning a unique amplitude code word to each unique amplitude value in the list of amplitude values, wherein the shortest code words are assigned to the most frequently occurring amplitude values.

14. The method of claim 13, wherein any second variable (y_n) value less than or equal to a baseline threshold is skipped.

15. The method of claim 1, wherein the sparse multidimensional ordered series data describe a non-uniform multi-modal distribution of hop Δ_n ranges, where certain hop ranges that are frequently and considerably more likely to occur are interspaced with other hop ranges that are much less likely to occur.

16. The method of claim 15, wherein the hop offset values are comprised of a discrete set of observable amplitude ranges interspaced with intervals of amplitude ranges that are not observed in the data.

17. The method of claim 1, wherein calculating a hop offset value (Δ_n) for each y_n comprises:
i) identifying an initial hop offset value (Δ_0) and entering the Δ_0 into a previous register as a previous peak location;
ii) feeding each index (n) into the previous register subtracting the previous peak location from the index (n) to calculate the hop offset value (Δ_n) and then replacing the previous peak location with the index (n);
iii) repeating step ii) for each index (n) in the sparse multidimensional ordered series data.

18. The method of claim 1, wherein calculating a hop offset value (Δ_n) for each y_n comprises:
i) identifying an initial hop offset value (Δ_0) and entering the Δ_0 into a previous register as a previous peak location;
ii) feeding each first variable (x_n) value into the previous register subtracting the previous peak location from the first variable (x_n) value to calculate the hop offset value (Δ_n) and then replacing the previous peak location with the first variable (x_n) value;
iii) repeating step ii) for each first variable (x_n) value in the sparse multidimensional ordered series data.

19. The method of claim 1, wherein assigning a hop code word to each Δ_n based on the value and frequency of the Δ_n comprises:
i) generating a hash table for hop offset values;
ii) looking up each hop offset value (Δ_n) value in turn, wherein if the Δ_n value is not previously seen, then the Δ_n value is added to a list of hop values and an associated frequency occurrence is set to one, and wherein if the Δ_n is already present on the list of hop values, then the associated frequency occurrence is incremented by one;
iii) sorting the list of hop values by their associated frequency occurrence;
iv) assigning a unique hop code word to each unique hop value in the list of hop values, wherein the shortest code words are assigned to the most frequently occurring hop values.
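Claims 4 to 6 describe a global predictor for the first variable, with claim 6 naming the cubic form g(n). The claims do not say how the coefficients a_0 to a_3 are obtained, so the sketch below assumes an ordinary least-squares fit of g(n) to the observed x_n values, solved through the normal equations in plain Python. The function names `fit_cubic_predictor` and `predict` are hypothetical, and treating the residuals x_n - g(n) as the input to the error-correction mechanism of claim 9 is only one plausible reading.

```python
from typing import List, Sequence


def fit_cubic_predictor(xs: Sequence[float]) -> List[float]:
    """Return [a0, a1, a2, a3] minimising sum_n (xs[n] - g(n))**2.

    Assumption: ordinary least squares; the claims only name the form
    g(n) = a0 + a1*n + a2*n**2 + a3*n**3, not how it is fitted.
    """
    size = 4  # number of coefficients in a cubic
    if len(xs) < size:
        raise ValueError("need at least four samples to fit a cubic")
    # Accumulate the normal equations (A^T A) a = A^T x with A[n][k] = n**k.
    ata = [[0.0] * size for _ in range(size)]
    atx = [0.0] * size
    for n, x in enumerate(xs):
        powers = [float(n) ** k for k in range(size)]
        for i in range(size):
            atx[i] += powers[i] * x
            for j in range(size):
                ata[i][j] += powers[i] * powers[j]
    # Forward elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, atx)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, size))
        coeffs[i] = (aug[i][size] - tail) / aug[i][i]
    return coeffs


def predict(coeffs: Sequence[float], n: int) -> float:
    """Evaluate g(n) = a0 + a1*n + a2*n**2 + a3*n**3 with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * n + a
    return result


if __name__ == "__main__":
    # Synthetic, slightly non-linear m/z axis (illustrative values only).
    xs = [100.0 + 0.5 * n + 0.01 * n * n for n in range(10)]
    coeffs = fit_cubic_predictor(xs)
    residuals = [x - predict(coeffs, n) for n, x in enumerate(xs)]
    print("coefficients:", coeffs)
    print("largest residual:", max(abs(r) for r in residuals))
```

Under this assumption, when the m/z axis follows a smooth calibration curve, four coefficients plus (at most) small residuals could stand in for one double-precision x_n per sample. Normal equations are the simplest route; a QR or orthogonal-polynomial fit would be better conditioned for long series.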
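Claims 13, 14, 17 and 19 describe the encoder core: a hash table counts how often each amplitude (or hop) value occurs, the most frequent values receive the shortest code words, and peak indices become hop offsets through a "previous register" seeded with Δ_0; claim 1 then packages reverse dictionaries as the decoder legend. The sketch below is one possible reading under stated assumptions. The claims do not name a code-word construction, so Huffman coding is swapped in as a scheme that satisfies "shortest code words for the most frequent values" (its heap plays the role of the frequency sort). Code words are kept as '0'/'1' strings instead of packed bits, amplitude and hop code words are interleaved per retained point, and the names `build_codebook`, `hop_offsets`, `compress` and `decompress`, the baseline of 0.0 and Δ_0 = 0 are all hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, List, Sequence, Tuple


def build_codebook(values: Sequence[Hashable]) -> Dict[Hashable, str]:
    """Map each distinct value to a prefix-free bit string (Huffman code)."""
    freq = Counter(values)  # hash table: value -> occurrence count
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs one bit
    order = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(order), {v: ""}) for v, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)  # two least frequent groups
        f2, _, high = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in low.items()}
        merged.update({v: "1" + c for v, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]


def hop_offsets(positions: Sequence[int], delta0: int = 0) -> List[int]:
    """Previous-register differencing: each hop is position - previous."""
    previous = delta0  # register seeded with the initial value
    hops = []
    for n in positions:
        hops.append(n - previous)
        previous = n
    return hops


def compress(ys: Sequence[float], baseline: float = 0.0,
             delta0: int = 0) -> Tuple[Tuple[dict, dict], str]:
    # Skip amplitudes at or below the baseline threshold (cf. claim 14).
    peaks = [(n, y) for n, y in enumerate(ys) if y > baseline]
    hops = hop_offsets([n for n, _ in peaks], delta0)
    amp_book = build_codebook([y for _, y in peaks])
    hop_book = build_codebook(hops)
    # Decoder legend: reverse dictionaries from code word back to value.
    legend = ({c: v for v, c in amp_book.items()},
              {c: v for v, c in hop_book.items()})
    bits = "".join(amp_book[y] + hop_book[h] for (_, y), h in zip(peaks, hops))
    return legend, bits


def _read(bits: str, pos: int, reverse: dict) -> Tuple[object, int]:
    """Consume one prefix-free code word starting at bits[pos]."""
    end = pos + 1
    while bits[pos:end] not in reverse:
        if end > len(bits):
            raise ValueError("truncated code word stream")
        end += 1
    return reverse[bits[pos:end]], end


def decompress(legend: Tuple[dict, dict], bits: str,
               delta0: int = 0) -> Dict[int, float]:
    """Rebuild {index: amplitude} for the retained (above-baseline) points."""
    rev_amp, rev_hop = legend
    peaks: Dict[int, float] = {}
    previous, pos = delta0, 0
    while pos < len(bits):
        y, pos = _read(bits, pos, rev_amp)
        hop, pos = _read(bits, pos, rev_hop)
        previous += hop  # undo the previous-register differencing
        peaks[previous] = y
    return peaks


if __name__ == "__main__":
    # Toy sparse intensity trace (illustrative values only).
    ys = [0.0, 5.0, 0.0, 5.0, 7.0, 0.0, 0.0, 5.0, 0.0, 5.0]
    legend, bits = compress(ys)
    print(bits, decompress(legend, bits))
```

In this reading, values at or below the baseline are not recoverable, which is consistent with claim 14 skipping them. Feeding x_n values instead of indices into `hop_offsets` would give the claim 18 variant, although floating-point hops would then need quantization before they repeat often enough to benefit from frequency-based code words.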
Patents Cited by This Patent (28)
Eastman, Willard L.; Lempel, Abraham; Ziv, Jacob; Cohn, Martin, Apparatus and method for compressing data signals and restoring the compressed data signals.
Subbaraman, Vignesh; Fuchs, Guillaume; Multrus, Markus; Rettelbach, Nikolaus; Gayer, Marc; Weiss, Oliver; Griebel, Christian; Warmbold, Patrick, Audio encoder, audio decoder, method for encoding and audio information, method for decoding an audio information and computer program using a hash table describing both significant state values and interval boundaries.
Qian, Shen-En; Hollinger, Allan B., Method and system for compressing a continuous data flow in real-time using recursive hierarchical self-organizing cluster vector quantization (HSOCVQ).
Koza, John R.; Rice, James P., Non-linear genetic process for data encoding and for solving problems using automatically defined functions.
Frossard, Pascal; Vandergheynst, Pierre; Verscheure, Olivier, System and method for encoding three-dimensional signals using a matching pursuit algorithm.