[특허]Method for searching content particularly for extracts common to two computer files

Method for searching content particularly for extracts common to two computer files 원문보기

IPC분류정보
국가/구분	United States(US) Patent 등록
국제특허분류(IPC7판)	G06F-007/00
출원번호	UP-0599655 (2005-03-18)
등록번호	US-7599927 (2009-10-20)
우선권정보	FR-04 03556(2004-04-05)
국제출원번호	PCT/FR05/000673 (2005-03-18)
§371/§102 date	20070717 (20070717)
국제공개번호	WO05/101292 (2005-10-27)
발명자 / 주소	Lebrat, François
출원인 / 주소	Francois Lebrat & TDF
대리인 / 주소	Miller, Matthias & Hull
인용정보	피인용 횟수 : 18 인용 특허 : 5

초록 ▼

The present invention relates to searching content, particularly for at least one extract common to a first data file and a second data file. The method comprises a preliminary step of preparing at least the first file by (a) dividing the first file into a series of data packets having a predetermined size, and identifying packet addresses in said file, (b) combining each packet address with a digital signature that defines one of three fuzzy logic states, namely true, false and indeterminate, and is the result of a combinatorial computation on data from said file; whereafter said method comprises performing an actual search for a common extract by (c) comparing the fuzzy logic states combined with each packet address of the first file with fuzzy logic states determined on the basis of data from the second file, and (d) removing from said common extract search the respective address pairs from the first and second files that have the respective logic states true and false or false and true, and retaining the other address pairs that identify data packets that may comprise said common extract.

대표청구항 ▼

The invention claimed is: 1. A method implemented by computer means of searching content, for at least one extract common to a first file and to a second file, in the form of binary data, being searched for, wherein the method comprises: a prior preparation of the first file at least, comprising the following steps: a) segmenting the first file into a succession of data packets, of chosen size, and identifying addresses of packets in said file, b) associating with the address of each packet a digital signature defining a fuzzy logic state from among at least three states: "true", "false" and "undetermined", said signature resulting from a combinatorial calculation on data emanating from said file, and the search for common extract, itself, comprising the following steps: c) comparing the fuzzy logic states associated with each packet address of the first file, with fuzzy logic states determined on the basis of data emanating from the second file, d) eliminating from said search for common extract, pairs of respective addresses of the first and second files whose respective logic states are "true" and "false" or "false" and "true", and preserving the other pairs of addresses identifying data packets liable to comprise said common extract. 2. The method as claimed in claim 1, wherein, in step b), a data packet is assigned the state: "true" if all the data of the packet satisfy a first condition, "false" if all the data of the packet satisfy a second condition, contrary to the first condition, and "undetermined" if certain data of the packet satisfy the first condition, while other data of the packet satisfy the second condition. 3. The method as claimed in claim 1, wherein a processing prior to step b) is applied to the data of a file, said processing comprising the following steps: a1) the data of the file are considered as a string of samples obtained at a predetermined sampling frequency Fe, and of values coded according to a binary representation code, and a2) a digital filter is applied to said samples fn, said filter being adapted to minimize a probability of obtaining the "undetermined" state for the digital signatures associated with the packets of samples. 4. The method as claimed in claim 3, wherein the application of said digital filter amounts to: applying a spectral transform to the sampled data, applying a low-pass filter to said spectral transform, and applying an inverse spectral transform after said low-pass filter. 5. The method as claimed in claim 4, wherein the low-pass filter operates on a frequency band comprising substantially the interval: [-Fe/2(k-1),+Fe/2(k-1)], where Fe is said sampling frequency, and k is the number of samples that a packet comprises. 6. The method as claimed in claim 4, wherein said digital filter comprises a predetermined number of coefficients of like value, and wherein the frequency response of the associated low-pass filter is expressed, as a function of frequency f, by an expression of the type: sin(PI·f·T)/(PI·f·T), where sin( ) is the sine function, and with: PI=3.1416, and T=(K-1)/Fe where K is said predetermined number of coefficients and Fe said sampling frequency. 7. The method as claimed in claim 6, wherein said predetermined number of coefficients of the filter (2K+1) is greater than or equal to 2k-1, where k is the number of samples that a packet comprises. 8. The method as claimed in claim 3, wherein: said digital filter is a mean value filter of a predetermined number (2K+1) of coefficients, the difference between two successive filtered samples (rn+1-rn) is proportional to the difference between two unfiltered samples (fn+K+1-fn-K), respectively of a first rank and of a second rank, said ranks being spaced apart by said predetermined number of coefficients, and the calculation of said filtered samples is performed by utilizing this relation to reduce the number of calculation operations to be performed. 9. The method as claimed in claim 3, wherein: the "true" state is assigned to the address of a packet if, for this packet, all the filtered samples have a value greater than a chosen reference value, the "false" state is assigned to the address of a packet if, for this packet, all the filtered samples have a value less than a chosen reference value, and the "undetermined" state is assigned to the address of a packet if, for this packet, the filtered samples have, for certain of them, a value less than said reference value, and, for other filtered samples, a value greater than said reference value. 10. The method as claimed in claim 9, wherein, for any filtered sample rn of given order n, said reference value Vref is calculated by averaging the values of the unfiltered samples fk over a chosen number of unfiltered consecutive samples Kref about an unfiltered sample fn of the same given order n. 11. The method as claimed in claim 10, wherein the values of the filtered samples are made relative, for comparison, to a zero threshold value, and wherein said filtered samples r'n are expressed by a sum of the type: fn+k are unfiltered samples obtained in step a1), K is the number of coefficients of the digital filter, preferably chosen to be even, and Kref is said number of unfiltered samples around an unfiltered sample fn, preferably chosen to be even and greater than said number of coefficients K. 12. The method as claimed in claim 11, wherein said sum is applied to the unfiltered samples fn a plurality of times, according to a processing performed in parallel, while respectively varying the number of coefficients K. 13. The method as claimed in claim 3, wherein each filtered sample rn is expressed as a sum of the type: f(n+i) are unfiltered samples, filteri are coefficients of a digital filter, integrating, as the case may be, a threshold value referred to zero, and wherein a number k of unfiltered samples that a packet comprises is chosen, at minimum equal to 2 and less than or equal to an expression of the type: (TEF-I1-I2+1)/2, where TEF is a desired minimum size of the common extracts searched for. 14. The method as claimed in claim 13, wherein: for a given value TEF of the desired minimum size of common extracts searched for, a span of usable values for said number k of unfiltered samples that a packet comprises is determined, and, for each usable value of the number k, an optimal size TES is determined of a succession of data of digital signatures, for which succession the detection of a common extract of size TEF is guaranteed, and wherein said optimal size TES is less than or equal to an expression of the type: E[(TEF-I1-I2+1)/k]-1, where E(X) designates the integer part of X. 15. The method as claimed in claim 3, wherein the method comprises, for the first file: the sampling at a chosen sampling frequency, the digital filtering corresponding to a low-pass filtering in the frequency space, and the combination of the filtered samples to obtain digital signatures in the "true", "false" or "undetermined" state, associated with the respective addresses of the first file, while the method comprises, for the second file: the sampling at a chosen sampling frequency, the digital filtering corresponding to a low-pass filtering in the frequency space, and the logic state associated with each packet of filtered samples is determined on the basis of the logic state associated with a single filtered sample chosen born each packet, in such a way as to obtain digital signatures comprising only "true" or "false" logic states and thus to improve the selectivity of the comparison of the digital signatures. 16. The method as claimed in claim 15, wherein, if the logic state associated with an address of the first file is "true" or "undetermined", while the logic state associated with an address of the second file is "true", the pair of said addresses is retained born the search of common extract, if the logic state associated with an address of the first file is "false" or "undetermined", while the logic state associated with an address of the second file is "false", the pair of said addresses is retained for the search for common extract, while the other pairs of addresses are excluded born the search. 17. The method as claimed in claim 1, wherein the fuzzy states associated with the first file at least are each coded on at least two bits. 18. The method as claimed in claim 17, wherein the fuzzy states determined for a least number of coefficients K are coded on least significant bits and the fuzzy states determined for a larger number of coefficients K are coded on subsequent bits, up to a chosen total number of bits. 19. The method as claimed in claim 1, the two files to be compared comprising data representative of alphanumeric characters, and in particular representative of a text and/or a computer or a genetic code, wherein the method comprises: a first group of steps comprising the formation of the digital signatures and their comparison, for a coarse search, and a second group of steps comprising an identicalness comparison in the spans of addresses satisfying the coarse comparison, and in that the data of a file are considered as a string of samples, with a chosen number k of samples per packet, and wherein the value of this chosen number k is optimized initially by searching for a minimum of comparison operations to be performed. 20. The method as claimed in claim 19, wherein, for the optimization of the chosen number k of samples per packet, account is taken of a total number: of operations of comparison of digital signatures to be performed, and of operations of identicalness comparison of data to be performed thereafter, and wherein said total number of operations is a minimum for a finite set of numbers k. 21. The method as claimed in claim 19, wherein an information relating to a minimum desired size of common extracts searched for is obtained, and used to optimize said chosen number k of samples per packet, and wherein the optimal number k of samples per packet varies substantially as said minimum size so that the larger desired minimum size of common extracts searched for, the shorter the duration of the search for common extract. 22. The method as claimed in claim 1, wherein it comprises the search for common extracts consists of a single group of steps comprising the formation of the digital signatures and their comparison, and wherein the number of data items per packet is optimized by initially fixing a confidence index characterizing an acceptable threshold of probability of false detection of common extracts. 23. The method as claimed in claim 22, the first and second files being files of samples of digitized signals, wherein the method comprises a step of preprocessing of the data and a taking into account of the data associated with portions of signal of level greater than a noise reference. 24. The method as claimed in claim 22, the first and second files are files of samples of digitized signals, wherein the method provides for a step of consolidation of the search results, preferably by adjustment of relative sizes of the packets of the first and second files, in such a way as to tolerate a discrepancy in respective speeds of retrieval of the first and second files. 25. The method as claimed in claim 1, wherein one at least of the first and second files is a data stream, and wherein the method of searching for common extracts is executed in real time. 26. A computer program product, stored in a memory of a central unit of a computer or on a removable medium to cooperate with a reader of said central unit, the program comprising instructions for conducting steps for searching at least one extract common to a first file and to a second file, in the form of binary data, said steps comprising: a prior preparation of the first file at least, including the following steps: a) segmenting the first file into a succession of data packets, of chosen size, and identifying addresses of packets in said file, b) associating with the address of each packet a digital signature defining a fuzzy logic state from among at least three states: "true", "false" and "undetermined", said signature resulting from a combinatorial calculation on data emanating from said file, and the search for common extract, itself, including the following steps: c) comparing the fuzzy logic states associated with each packet address of the first file, with fuzzy logic states determined on the basis of data emanating from the second file, d) eliminating from said search for common extract, pairs of respective addresses of the first and second files whose respective logic states are "true" and "false" or "false" and "true", and preserving the other pairs of addresses identifying data packets liable to comprise said common extract. 27. A computer device, comprising a memory for storing at least first and second files, for the search for at least one extract common to the first file and the second file, wherein it comprises a memory suitable for storing the instructions of a computer program product as claimed in claim 26. 28. A computer installation, comprising: a first computer entity suitable for storing a first file, a second computer entity suitable for storing a second file, and means of communications between the first and second computer units, wherein at least one of said entities comprises a memory suitable for storing the computer program product as claimed in claim 26, for the search of extract common to the first and second files. 29. The installation as claimed in claim 28, wherein the entity storing the computer program product is designed to perform a remote update of one of the first and second files with respect to the other of the first and second files. 30. A data structure stored on a computer-readable storage medium used for a search of at least one extract common to a first and a second file, the data structure being representative of the first file, the data structure being obtained by the implementation of the following steps: a) segmenting the first file into a succession of data packets of chosen size, and identifying addresses of packets in said file, b) associating with the address of each packet a digital signature defining a fuzzy logic state from among at least three states: "true", "false" and "undetermined", said signature resulting from a combinatorial calculation on data emanating from said file, the data structure thus comprising a succession of addresses identifying addresses of the first file, a fuzzy logic state from among the states: "true", "false" and "undetermined", being assigned to each of said addresses of the data structure, c) comparing the fuzzy logic states associated with each packet address of the first file, with fuzzy logic states determined on the basis of data emanating from the second file, d) eliminating from said search for common extract, pairs of respective addresses of the first and second files whose respective logic states are "true" and "false" or "false" and "true", and preserving the other pairs of addresses identifying data packets liable to comprise said common extract.

이 특허에 인용된 특허 (5)

Pai Hsin-Ying,TWX ; Hou Chien-Tzu, Computer system having a genetic code that cannot be directly accessed and a method of maintaining the same.
상세보기
Xu, Changsheng; Wu, Jiankang; Sun, Qibin; Xin, Kai; Li, Haizhou, Digital audio watermarking using content-adaptive, multiple echo hopping.
상세보기
Braudaway Gordon Wesley ; Mintzer Frederick Cole, Protecting images with an image watermark.
상세보기
Kirovski, Darko; Malvar, Henrique, Watermark detection via cardinality-scaled correlation.
상세보기
Girod Bernd,DEX ; Hartung Frank H. D.,DEX, Watermarking method and apparatus for compressed digital video.
상세보기

이 특허를 인용한 특허 (18)

Fraser, Jay; Weber, Lawrence, Authentication method and system.
상세보기
Cowburn, Russell Paul, Authenticity verification methods, products, and apparatuses.
상세보기
Cowburn, Russell Paul, Authenticity verification of articles using a database.
상세보기
Adams, Phillip M., Computerized, copy detection and discrimination apparatus and method.
상세보기
Cowburn, Russell Paul, Creating authenticatable printed articles and subsequently verifying them based on scattered light caused by surface structure.
상세보기
Hawkes, Philip; Gantman, Alexander; McDonald, Cameron A.; Wiggers De Vries, Miriam M.; Northway, Craig W.; Brown, Craig M., Method and apparatus for determining and utilizing value of digital assets.
상세보기
Tormasov, Alexander G., Method and system for data backup with capacity and traffic optimization.
상세보기
Cowburn, Russell Paul, Methods and apparatuses for creating authenticatable printed articles and subsequently verifying them.
상세보기
Cowburn, Russell Paul, Methods and apparatuses for creating authenticatable printed articles and subsequently verifying them.
상세보기
Cowburn, Russell Paul; Buchanan, James David Ralph; Seem, Peter Robert, Optical authentication.
상세보기
Cowburn, Russell Paul; Buchanan, James David Ralph, Optimisation.
상세보기
Cowburn, Russell Paul, Self-calibration.
상세보기
Cowburn, Russell Paul; Seem, Peter, Signature generation for use in authentication and verification using a non-coherent radiation source.
상세보기
Cowburn, Russell Paul, System and method for article authentication using blanket illumination.
상세보기
Cowburn, Russell Paul, System and method for article authentication using encoded signatures.
상세보기
Cowburn, Russell Paul, System and method for article authentication using signatures.
상세보기
Cowburn, Russell Paul, System and method for article authentication using thumbnail signatures.
상세보기
Cowburn, Russell Paul; Buchanan, James David Ralph, Verification of authenticity.
상세보기

IPC	Description
A	생활필수품
A62	인명구조; 소방(사다리 E06C)
A62B	인명구조용의 기구, 장치 또는 방법(특히 의료용에 사용되는 밸브 A61M 39/00; 특히 물에서 쓰이는 인명구조 장치 또는 방법 B63C 9/00; 잠수장비 B63C 11/00; 특히 항공기에 쓰는 것, 예. 낙하산, 투출좌석 B64D; 특히 광산에서 쓰이는 구조장치 E21F 11/00)
A62B-1/08	.. 윈치 또는 풀리에 제동기구가 있는 것

내보내기 구분	파일저장 인쇄 메일전송
구성항목	기본정보 상세정보 관리번호, 국가코드, 자료구분, 상태, 출원번호, 출원일자, 공개번호, 공개일자, 등록번호, 등록일자, 발명명칭(한글), 발명명칭(영문), 출원인(한글), 출원인(영문), 출원인코드, 대표IPC 관리번호, 국가코드, 자료구분, 상태, 출원번호, 출원일자, 공개번호, 공개일자, 공고번호, 공고일자, 등록번호, 등록일자, 발명명칭(한글), 발명명칭(영문), 출원인(한글), 출원인(영문), 출원인코드, 대표출원인, 출원인국적, 출원인주소, 발명자, 발명자E, 발명자코드, 발명자주소, 발명자 우편번호, 발명자국적, 대표IPC, IPC코드, 요약, 미국특허분류, 대리인주소, 대리인코드, 대리인(한글), 대리인(영문), 국제공개일자, 국제공개번호, 국제출원일자, 국제출원번호, 우선권, 우선권주장일, 우선권국가, 우선권출원번호, 원출원일자, 원출원번호, 지정국, Citing Patents, Cited Patents
저장형식	Text(ASCII format) Excel format PIAS분석(.xls)
메일정보	받는사람 (필수) @ 보내는사람 (선택) @ 제목 내용 KISTI 검색결과 이메일 서비스
안내	총 건의 자료가 검색되었습니다. 다운받으실 자료의 인덱스를 입력하세요. (1-10,000) 검색결과의 순서대로 최대 10,000건 까지 다운로드가 가능합니다. 데이타가 많을 경우 속도가 느려질 수 있습니다.(최대 2~3분 소요) 다운로드 파일은 UTF-8 형태로 저장됩니다. 파일의 내용이 제대로 보이지 않을실 때는 웹브라우저 상단의 보기 -> 인코딩 -> 자동선택 여부를 확인하십시오. ~ Text(ASCII format) Excel format

연합인증

Method for searching content particularly for extracts common to two computer files 원문보기

초록 ▼

대표청구항 ▼

연구과제 타임라인

이 특허에 인용된 특허 (5)

이 특허를 인용한 특허 (18)

관련 콘텐츠

특허 원문 보기

IPC 상위 출원인

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

선택된 텍스트

연합인증

Method for searching content particularly for extracts common to two computer files 원문보기

초록 ▼

대표청구항 ▼

연구과제 타임라인

전체(0) 논문(0) 특허(0) 보고서(0)

전체(0) 논문(0) 특허(0) 보고서(0)

이 특허에 인용된 특허 (5)

이 특허를 인용한 특허 (18)

관련 콘텐츠

특허 원문 보기

IPC 상위 출원인

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

선택된 텍스트