Enhanced data compression for sparse multidimensional ordered series data
IPC Classification Information
Country/Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
H03M-007/34
H03M-007/30
H03M-007/24
H03M-007/42
Application Number
US-0877875
(2015-10-07)
Registration Number
US-9385751
(2016-07-05)
Inventor / Address
Kletter, Doron
Applicant / Address
Protein Metrics Inc.
Attorney / Address
Shay Glenn LLP
Citation Information
Times cited: 3
Cited patents: 25
Abstract
Disclosed are methods and systems for significantly compressing sparse multidimensional ordered series data comprised of indexed data sets, wherein each data set comprises an index, a first variable and a second variable. The methods and systems are particularly suited for compression of data recorded in double precision floating point format.
Representative Claims
1. A computer-implemented method of compressing a sparse multidimensional ordered series of spectroscopic data, the method comprising: a) receiving the sparse multidimensional ordered series data, wherein the data comprise indexed data sets, each indexed data set comprising an index (n), a first variable (xn) representing a mass to charge ratio (m/z), and a second variable (yn) representing signal intensity; b) defining a predictor that calculates each first variable (xn) as a function of the index (n); c) assigning an amplitude code word to each yn based on the value and frequency of the yn; d) calculating a hop offset value (Δn) for each yn; e) assigning a hop code word to each Δn based on the value and frequency of the Δn; and f) generating a compressed output, said compressed output comprising: i) a decoder legend comprising: a description of the predictor; a reverse amplitude code word dictionary associated with yn; and a reverse hop code word dictionary associated with Δn; and ii) code word data comprising an amplitude code word and a hop code word for each yn and each Δn, respectively.
2. The method of claim 1, wherein the sparse multidimensional ordered series data is in double precision floating point format.
3. The method of claim 1, wherein the multidimensional ordered series data contain values that fall within a dynamic range of less than 10 orders of magnitude.
4. The method of claim 1, wherein the sparse multidimensional ordered series data comprises a plurality of indexed x,y pairs.
5. The method of claim 1, wherein the predictor is a global predictor function.
6. The method of claim 5, wherein the global predictor is an nth order polynomial function.
7. The method of claim 6, wherein the function is g(n) = a0 + a1*n + a2*n^2 + a3*n^3.
8. The method of claim 1, wherein the predictor is a piecewise predictor.
9. The method of claim 1, wherein the predictor is a local predictor.
10. The method of claim 1, wherein the predictor further comprises an error correction mechanism.
11. The method of claim 1, wherein the second variable yn data is comprised of a sequence of variable amplitude measurements interspaced with intervals of relatively quiet periods during which the yn data remains moderately constant and primarily dominated by noise.
12. The method of claim 1, wherein the second variable yn data is comprised of a non-uniform multi-modal distribution of amplitude ranges, where certain amplitude ranges that occur frequently are interspaced with other amplitude ranges that occur much less frequently.
13. The method of claim 12, wherein the second variable yn data is comprised of a discrete set of observable amplitude ranges interspaced with intervals of amplitude ranges that are not observed in the data.
14. The method of claim 1, wherein assigning an amplitude code word to each yn based on the value and frequency of the yn comprises: i) generating a hash table for amplitude values; ii) looking up each second variable (yn) value in turn, wherein if the yn value is not previously seen, then the yn value is added to a list of amplitude values and an associated frequency occurrence is set to one, and wherein if the yn is already present on the list of amplitude values, then the associated frequency occurrence is incremented by one; iii) sorting the list of amplitude values by their associated frequency occurrence; iv) assigning a unique amplitude code word to each unique amplitude value in the list of amplitude values, wherein the shortest code words are assigned to the most frequently occurring amplitude values.
15. The method of claim 14, wherein any second variable (yn) value less than or equal to a baseline threshold is skipped.
16. The method of claim 1, wherein the sparse multidimensional ordered series data describe a non-uniform multi-modal distribution of hop Δn ranges, where certain hop ranges that are frequently and considerably more likely to occur are interspaced with other hop ranges that are much less likely to occur.
17. The method of claim 16, wherein the hop offset values are comprised of a discrete set of observable amplitude ranges interspaced with intervals of amplitude ranges that are not observed in the data.
18. The method of claim 1, wherein calculating a hop offset value (Δn) for each yn comprises: i) identifying an initial hop offset value (Δ0) and entering the Δ0 into a previous register as a previous peak location; ii) feeding each index (n) into the previous register, subtracting the previous peak location from the index (n) to calculate the hop offset value (Δn), and then replacing the previous peak location with the index (n); iii) repeating step ii) for each index (n) in the sparse multidimensional ordered series data.
19. The method of claim 1, wherein calculating a hop offset value (Δn) for each yn comprises: i) identifying an initial hop offset value (Δ0) and entering the Δ0 into a previous register as a previous peak location; ii) feeding each first variable (xn) value into the previous register, subtracting the previous peak location from the first variable (xn) value to calculate the hop offset value (Δn), and then replacing the previous peak location with the first variable (xn) value; iii) repeating step ii) for each first variable (xn) value in the sparse multidimensional ordered series data.
20. The method of claim 1, wherein assigning a hop code word to each Δn based on the value and frequency of the Δn comprises: i) generating a hash table for hop offset values; ii) looking up each hop offset value (Δn) in turn, wherein if the Δn value is not previously seen, then the Δn value is added to a list of hop values and an associated frequency occurrence is set to one, and wherein if the Δn is already present on the list of hop values, then the associated frequency occurrence is incremented by one; iii) sorting the list of hop values by their associated frequency occurrence; iv) assigning a unique hop code word to each unique hop value in the list of hop values, wherein the shortest code words are assigned to the most frequently occurring hop values.
21. A non-transitory computer readable medium having instructions stored therein, which, when executed by a processor, cause the processor to perform operations, the operations comprising: receiving sparse multidimensional ordered series data, wherein the data comprise indexed data sets, each indexed data set comprising an index (n), a first variable (xn) and a second variable (yn); defining a predictor that calculates each first variable (xn) as a function of the index (n); assigning an amplitude code word to each yn based on the value and frequency of the yn; calculating a hop value (Δn) for each yn and assigning a hop code word to each Δn based on the value and frequency of the Δn; and generating a compressed output, said compressed output comprising: a decoder legend comprising: a description of the predictor; a reverse amplitude code word dictionary associated with yn; and a reverse hop code word dictionary associated with Δn; and code word data comprising an amplitude code word and a hop code word for each yn and each Δn, respectively.
22. An ordered series data encoder comprising a data receiver for receiving sparse multidimensional ordered series data, wherein the data comprise indexed data sets, each indexed data set comprising an index (n), a first variable (xn) and a second variable (yn); a predictor that predicts each first variable (xn) as a function of the index (n); an amplitude coder that assigns an amplitude code word to each yn based on the value and frequency of the yn; a hop coder that calculates a hop value (Δn) for each yn and assigns a hop code word to each Δn based on the value and frequency of the Δn; and an encoder that generates a compressed output, said compressed output comprising: a decoder legend comprising: a description of the predictor; a reverse amplitude code word dictionary associated with yn; and a reverse hop code word dictionary associated with Δn; and code word data comprising an amplitude code word and a hop code word for each yn and each Δn, respectively.
23. The ordered series encoder of claim 22, wherein the ordered series is a time series.
24. A non-transitory readable medium comprising compressed sparse multidimensional ordered series data, said compressed data comprising: a decoder legend comprising: a description of a predictor, wherein the predictor calculates a first variable (xn) as a function of an index (n); a reverse amplitude dictionary, wherein the reverse amplitude dictionary includes a plurality of amplitude code words, wherein each amplitude code word is associated with an amplitude value; and a reverse hop offset dictionary, wherein the reverse hop offset dictionary associates a hop code word to each Δn based on the value and frequency of the Δn; and code word data comprising a plurality of pairs of an amplitude code word and a hop code word, wherein each pair of an amplitude code word and a hop code word is capable of being decompressed to an index (n), a first variable (xn) and a second variable (yn).
25. The non-transitory medium of claim 24, wherein the decoder legend further comprises an initial hop offset value (Δ0).
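Claims 14, 15 and 20 build the amplitude and hop code books the same way. Occurrences are counted in a hash table, the distinct values are sorted by frequency, and the shortest code words go to the most frequent values. The claims do not name a particular code construction. The following Python sketch therefore assumes Elias gamma codes of the frequency rank, which are prefix-free and whose lengths never decrease as a value's rank drops. Huffman coding would be an equally reasonable substitute. The function names (`elias_gamma`, `build_codebook`, `decode_bits`), the bit-string representation, and the sample intensities are hypothetical illustrations, not taken from the patent.

```python
def elias_gamma(k):
    """Elias gamma code word for an integer k >= 1: (bit length - 1) zeros
    followed by k in binary, so code word length grows with k."""
    if k < 1:
        raise ValueError("Elias gamma is defined only for k >= 1")
    binary = bin(k)[2:]
    return "0" * (len(binary) - 1) + binary


def build_codebook(values, baseline=None):
    """Frequency-ranked code book modelled on claims 14, 15 and 20 (a sketch).

    Returns (forward, reverse): forward maps value -> code word for encoding;
    reverse maps code word -> value and plays the role of the 'reverse code
    word dictionary' stored in the decoder legend.
    """
    counts = {}  # hash table: value -> frequency of occurrence
    for v in values:
        if baseline is not None and v <= baseline:
            continue  # claim 15: skip values at or below the baseline threshold
        counts[v] = counts.get(v, 0) + 1  # new value starts at 1, else increment
    # Sort by descending frequency; ties broken by value for a deterministic result.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    # Rank 0 (most frequent) gets elias_gamma(1) == "1", the shortest code word.
    forward = {v: elias_gamma(rank + 1) for rank, v in enumerate(ranked)}
    reverse = {code: v for v, code in forward.items()}
    return forward, reverse


def decode_bits(bits, reverse):
    """Greedy decoder; valid because Elias gamma code words are prefix-free."""
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return decoded


# Hypothetical intensities; 0.1 falls below the baseline and is skipped.
ys = [12.5, 3.0, 12.5, 7.25, 12.5, 3.0, 0.1]
forward, reverse = build_codebook(ys, baseline=1.0)
# forward == {12.5: "1", 3.0: "010", 7.25: "011"}
bits = "".join(forward[y] for y in ys if y > 1.0)
decoded = decode_bits(bits, reverse)
# decoded == [12.5, 3.0, 12.5, 7.25, 12.5, 3.0]
```

The routine accepts any sortable, hashable values. The same function could therefore produce both the reverse amplitude dictionary (from yn) and the reverse hop dictionary (from Δn).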
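The hop offset calculation of claim 18, the cubic global predictor of claim 7, and the decoding described in claims 24 and 25 can be sketched together. The sketch makes several assumptions. It uses Δ0 = 0. It uses a hypothetical legend layout: a Python dict with keys `coeffs`, `delta0`, `reverse_amplitude` and `reverse_hop`. The coefficients and code words are made up. It also assumes the predictor reproduces xn exactly. Real m/z values stored in double precision would likely also need the error correction mechanism mentioned in claim 10, which is omitted here. All function names are illustrative only.

```python
def hop_offsets(indices, delta0=0):
    """Claim 18 sketch: the 'previous register' starts at delta0, and each
    hop is the current index minus the previously stored peak index."""
    previous = delta0
    hops = []
    for n in indices:
        hops.append(n - previous)
        previous = n  # replace the previous peak location with n
    return hops


def predictor(n, coeffs):
    """Global cubic predictor g(n) = a0 + a1*n + a2*n^2 + a3*n^3 (claim 7),
    evaluated in Horner form."""
    a0, a1, a2, a3 = coeffs
    return a0 + n * (a1 + n * (a2 + n * a3))


def decode_pairs(pairs, legend):
    """Claims 24-25 sketch: turn (amplitude code, hop code) pairs back into
    (n, xn, yn) triples using the decoder legend."""
    previous = legend["delta0"]  # initial hop offset value from the legend
    triples = []
    for amp_code, hop_code in pairs:
        n = previous + legend["reverse_hop"][hop_code]  # undo the hop offset
        previous = n
        x = predictor(n, legend["coeffs"])              # xn recomputed from n
        y = legend["reverse_amplitude"][amp_code]       # yn looked up directly
        triples.append((n, x, y))
    return triples


# Hypothetical legend and code word stream.
LEGEND = {
    "coeffs": (100.0, 0.5, 0.0, 0.0),
    "delta0": 0,
    "reverse_amplitude": {"1": 12.5, "010": 3.0},
    "reverse_hop": {"1": 1, "010": 3},
}
PAIRS = [("010", "010"), ("1", "1"), ("1", "010")]

hops = hop_offsets([3, 4, 7])          # [3, 1, 3]
decoded = decode_pairs(PAIRS, LEGEND)
# [(3, 101.5, 3.0), (4, 102.0, 12.5), (7, 103.5, 12.5)]
```

Only n and yn are recovered through dictionary lookups. xn is never stored per point, because the predictor regenerates it from n. This is where the scheme saves most of the space otherwise spent on a double-precision x value per point.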
Patents Cited by This Patent (25)
Eastman, Willard L.; Lempel, Abraham; Ziv, Jacob; Cohn, Martin, Apparatus and method for compressing data signals and restoring the compressed data signals.
Qian, Shen-En; Hollinger, Allan B., Method and system for compressing a continuous data flow in real-time using recursive hierarchical self-organizing cluster vector quantization (HSOCVQ).
Frossard, Pascal; Vandergheynst, Pierre; Verscheure, Olivier, System and method for encoding three-dimensional signals using a matching pursuit algorithm.