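Country / Type | United States (US) Patent, Granted |
---|---|
International Patent Classification (IPC, 7th edition) | |
Application No. | US-0839476 (2001-04-20) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation info | Times cited: 519; Patents cited: 13 |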
A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
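The recognition step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes integer fingerprints, integer landmark times, and a slope of exactly 1 between sample and file landmarks (no time stretching), and the names `build_index`, `lookup`, `recognize`, and `min_votes` are hypothetical. The index is kept sorted by fingerprint so that a lookup is a binary search, and the linear relation is detected as the peak of a histogram of landmark differences.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def build_index(files):
    """Flatten {file_id: [(landmark, fingerprint), ...]} into a list of
    (fingerprint, file_id, landmark) tuples sorted by fingerprint."""
    index = [(fp, file_id, t)
             for file_id, pairs in files.items()
             for t, fp in pairs]
    index.sort()
    return index


def lookup(index, fp):
    """Return all index entries whose fingerprint equals fp.
    Assumes integer fingerprints, so (fp + 1,) bounds the run of matches."""
    lo = bisect_left(index, (fp,))
    hi = bisect_left(index, (fp + 1,))
    return index[lo:hi]


def recognize(index, sample, min_votes=3):
    """sample: [(landmark, fingerprint), ...] for the unknown clip.
    Returns (file_id, offset, votes) for the best file, or None."""
    # One histogram of (file landmark - sample landmark) per candidate file.
    histograms = defaultdict(Counter)
    for t_sample, fp in sample:
        for _, file_id, t_file in lookup(index, fp):
            histograms[file_id][t_file - t_sample] += 1

    best = None
    for file_id, hist in histograms.items():
        # Many pairs sharing one offset lie on a diagonal line,
        # i.e. the corresponding landmarks are linearly related.
        offset, votes = hist.most_common(1)[0]
        if votes >= min_votes and (best is None or votes > best[2]):
            best = (file_id, offset, votes)
    return best


files = {
    "song_a": [(0, 11), (5, 22), (9, 33), (14, 44)],
    "song_b": [(0, 22), (3, 11), (20, 33)],
}
index = build_index(files)
# A clip that starts 5 time units into song_a.
sample = [(0, 22), (4, 33), (9, 44)]
print(recognize(index, sample))
```

In this toy data both files contain fingerprints 22 and 33, but only for `song_a` do the matches agree on a single offset, which is what separates a true match from coincidental fingerprint collisions.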
What is claimed is: 1. A method for recognizing a media entity from a media sample, comprising: computing a set of sample fingerprints, each sample fingerprint characterizing a particular sample landmark within said media sample; obtaining a set of file fingerprints, each file fingerprint characterizing a particular file landmark within a media entity to be identified; generating correspondences between said sample landmarks and said obtained file landmarks, wherein corresponding landmarks have equivalent fingerprints; and identifying said media entity if a plurality of said corresponding landmarks are substantially linearly related. 2. The method of claim 1 wherein said sample landmarks are computed in dependence on said media sample. 3. The method of claim 1 wherein said each sample fingerprint represents one or more features of said media sample at or near said particular sample landmark. 4. The method of claim 1 wherein said sample fingerprints and said file fingerprints have numerical values. 5. The method of claim 1 wherein values of said sample fingerprints specify a method for computing said sample fingerprints. 6. The method of claim 1 wherein said media sample is an audio sample. 7. The method of claim 6 wherein said sample landmarks are timepoints within said audio sample. 8. The method of claim 7 wherein said timepoints occur at local maxima of spectral Lp norms of said audio sample. 9. The method of claim 6 wherein said sample fingerprints are computed from a frequency analysis of said audio sample. 10. The method of claim 6 wherein said sample fingerprints are selected from the group consisting of spectral slice fingerprints, LPC coefficients, and cepstral coefficients. 11. The method of claim 6 wherein said sample fingerprints are computed from a spectrogram of said audio sample. 12. 
The method of claim 11 wherein salient points of said spectrogram comprise time coordinates and frequency coordinates, and wherein said sample landmarks are computed from said time coordinates, and said sample fingerprints are computed from said frequency coordinates. 13. The method of claim 12, further comprising linking at least one of said salient points to an anchor salient point, wherein one of said sample landmarks is computed from a time coordinate of said anchor salient point, and a corresponding fingerprint is computed from frequency coordinates of at least one of said linked salient points and said anchor point. 14. The method of claim 13, wherein said linked salient points fall within a target zone. 15. The method of claim 14, wherein said target zone is defined by a time range. 16. The method of claim 14, wherein said target zone is defined by a frequency range. 17. The method of claim 14, wherein said target zone is variable. 18. The method of claim 13 wherein said corresponding fingerprint is computed from a quotient between two of said frequency coordinates of said linked salient points and said anchor point, whereby said corresponding fingerprint is time-stretch invariant. 19. The method of claim 13 wherein said corresponding fingerprint is further computed from at least one time difference between said time coordinate of said anchor point and said time coordinates of said linked salient points. 20. The method of claim 19, wherein said corresponding fingerprint is further computed from a product of one of said time differences and one of said frequency coordinates of said linked salient points and said anchor point, whereby said corresponding fingerprint is time-stretch invariant. 21. 
The method of claim 6 wherein said sample landmarks and said sample fingerprints are computed from salient points of a multidimensional function of said audio sample, wherein at least one of said dimensions is a time dimension and at least one of said dimensions is a non-time dimension. 22. The method of claim 21 wherein said sample landmarks are computed from said time dimensions. 23. The method of claim 21 wherein said sample fingerprints are computed from at least one of said non-time dimensions. 24. The method of claim 21 wherein said salient points are selected from the group consisting of local maxima, local minima, and zero crossings of said multidimensional function. 25. The method of claim 6 wherein said sample fingerprints are time-stretch invariant. 26. The method of claim 6 wherein each sample fingerprint is computed from multiple timeslices of said audio sample. 27. The method of claim 26 wherein said multiple timeslices are offset by a variable amount of time. 28. The method of claim 27 wherein each fingerprint is computed in part from said variable amount. 29. The method of claim 1 wherein said identifying step comprises locating a diagonal line within a scatter plot of said corresponding landmarks. 30. The method of claim 29 wherein locating said diagonal line comprises forming differences between said corresponding landmarks. 31. The method of claim 30 wherein locating said diagonal line further comprises sorting said differences. 32. The method of claim 30 wherein locating said diagonal line further comprises calculating the peak of a histogram of said differences. 33. The method of claim 1 wherein said identifying step comprises computing one of a Hough transform and a Radon transform of said correspondences. 34. The method of claim 33 wherein said identifying step further comprises locating a peak of said Hough transform. 35. 
The method of claim 1 wherein said identifying step comprises determining whether a number of said correspondences exceeds a threshold value. 36. The method of claim 1 further comprising: obtaining from a database index additional fingerprints characterizing file locations of additional media entities to be identified; generating additional correspondences between said sample landmarks and file landmarks of said additional media entities, wherein corresponding landmarks have equivalent fingerprints; and identifying media entities for which a plurality of said corresponding landmarks are substantially linearly related. 37. The method of claim 36 further comprising selecting a winning media entity from said identified media entities, wherein said winning media entity has a largest plurality of substantially linearly related corresponding landmarks. 38. The method of claim 36 wherein the step of identifying said media entities for which a plurality of said corresponding landmarks are substantially linearly related further comprises searching a first subset of said additional media entities. 39. The method of claim 38 wherein additional media entities in said first subset have a higher probability of being identified than additional media entities that are not in said first subset. 40. The method of claim 39 wherein said probability of being identified is computed in dependence on a recency of previous identification. 41. The method of claim 39 wherein said probability of being identified is computed in dependence on a frequency of previous identification. 42. The method of claim 38 wherein the step of identifying said media entities for which a plurality of said corresponding landmarks are substantially linearly related further comprises searching a second subset of said additional media entities, wherein no media entities in said first subset are identified. 43. 
The method of claim 36, further comprising ranking said additional media entities according to a probability of being identified. 44. The method of claim 43 wherein said probability is computed in part in dependence on a recency of previous identification. 45. The method of claim 44 wherein said probability is computed in part by increasing a recency score of a particular media entity when said particular media entity is identified. 46. The method of claim 44 wherein said probability is computed in part by decreasing recency scores of said additional media entities at regular time intervals. 47. The method of claim 46 wherein said recency scores are decreased exponentially in time. 48. The method of claim 43 wherein the step of identifying said media entities for which a plurality of said corresponding landmarks are substantially linearly related further comprises searching said additional media entities according to said ranking. 49. The method of claim 36 wherein the step of identifying said media entities for which a plurality of said corresponding landmarks are substantially linearly related further comprises terminating said search at a media entity having a number of said substantially linearly related corresponding landmarks that exceeds a predetermined threshold. 50. The method of claim 1 wherein said method is implemented in a distributed system. 51. The method of claim 50 wherein said computing step is performed in a client device, said obtaining, generating, and identifying steps are performed in a central location, and the method further comprises transmitting said sample fingerprints from said client device to said central location. 52. The method of claim 1, further comprising repeating said computing, obtaining, generating, and identifying steps for sequentially growing size of said media sample. 53. 
The method of claim 1, further comprising performing said obtaining, generating, and identifying steps at periodic intervals on a rolling buffer storing said computed sample fingerprints. 54. The method of claim 1, further comprising obtaining said media sample and simultaneously performing said computing step. 55. A method for recognizing a media entity from a media sample, comprising: receiving a set of sample fingerprints, each sample fingerprint characterizing a particular sample landmark within said media sample; obtaining a set of file fingerprints, each file fingerprint characterizing a particular file landmark within a media entity to be identified; generating correspondences between said sample landmarks and said obtained file landmarks, wherein corresponding landmarks have equivalent fingerprints; and identifying said media entity if a plurality of said corresponding landmarks are substantially linearly related. 56. A method for recognizing a media sample, comprising: continually sampling into a sound buffer N seconds of said media sample; computing a set of sample fingerprints characterizing a segment of said media sample stored in said sound buffer, wherein said segment has one or more distinct landmarks occurring at reproducible locations of said media sample; storing said fingerprints in a rolling buffer; obtaining a set of matching fingerprints in a database index, each matching fingerprint characterizing at least one distinct landmark of a media file and is equivalent to at least one fingerprint in said rolling buffer; identifying at least one media file having a plurality of matching fingerprints; reporting presence of said at least one media file; and removing at least one sample fingerprint from said rolling buffer. 57. The method of claim 56, further comprising repeating said method for additional segments of said media sample. 58. 
The method of claim 56 wherein said computing, storing, and removing steps are performed in a client device and said locating and identifying steps are performed in a central location, and wherein the method further comprises transmitting said sample fingerprints from said client device to said central location. 59. The method of claim 56 wherein said computing step is performed in a client device and said storing, locating, identifying, and removing steps are performed in a central location, and wherein the method further comprises transmitting said fingerprints from said client device to said central location. 60. A computer system programmed to perform the method steps of claim 1. 61. The method of claim 56, wherein said reproducible locations and said sample fingerprints are computed simultaneously. 62. A program storage device accessible by a computer, tangibly embodying a program of instructions executable by said computer to perform method steps for recognizing a media entity from a media sample, said program of instructions comprising: code for computing a set of sample fingerprints, each sample fingerprint characterizing a particular sample landmark within said media sample; code for obtaining a set of file fingerprints, each file fingerprint characterizing a particular file landmark within a media entity to be identified; code for generating correspondences between said sample landmarks and said obtained file landmarks, wherein corresponding landmarks have equivalent fingerprints; and code for identifying said media entity if a plurality of said corresponding landmarks are substantially linearly related. 63. 
A system for recognizing a media entity from a media sample, comprising: a landmarking and fingerprinting object for computing a set of particular sample landmarks within said media sample and a set of sample fingerprints, each sample fingerprint characterizing one of said particular sample landmarks; a database index containing file landmarks and corresponding file fingerprints for at least one media entity to be identified; and an analysis object for: locating a set of matching fingerprints in said database index, wherein said matching fingerprints are equivalent to said sample fingerprints; generating correspondences between said sample landmarks and said file landmarks, wherein corresponding landmarks have equivalent fingerprints; and identifying at least one media entity for which a plurality of said corresponding landmarks are substantially linearly related. 64. A computer-implemented method for recognizing an audio sample, comprising: creating a database index of at least one audio file in a database, comprising: computing landmarks and fingerprints for each audio file, wherein each landmark occurs at a particular location within said audio file and is associated with a fingerprint; associating, for each audio file, said landmarks and fingerprints with an identifier; and storing said fingerprints, said landmarks, and said identifier in a memory. 65. The method of claim 64, further comprising sorting said database index by fingerprint value. 66. The method of claim 64 wherein said particular locations of each audio file are computed in dependence on said audio file. 67. The method of claim 64 wherein each fingerprint represents at least one feature of said audio file near said particular location. 68. The method of claim 64 wherein said fingerprints are numerical values. 69. The method of claim 64 wherein values of said fingerprints specify a method for computing said fingerprints. 70. 
The method of claim 64 wherein said particular locations are timepoints within said audio file. 71. The method of claim 70 wherein said timepoints occur at local maxima of spectral Lp norms of said audio file. 72. The method of claim 64 wherein said fingerprints are computed from a frequency analysis of said audio file. 73. The method of claim 64 wherein said fingerprints are selected from the group consisting of spectral slice fingerprints, LPC coefficients, and cepstral coefficients. 74. The method of claim 64 wherein said fingerprints are computed from a spectrogram of said audio file. 75. The method of claim 74 wherein salient points of said spectrogram comprise time coordinates and frequency coordinates, and wherein said particular locations are computed from said time coordinates, and said fingerprints are computed from said frequency coordinates. 76. The method of claim 75, further comprising linking at least one of said salient points to an anchor salient point, wherein one of said particular locations is computed from a time coordinate of said anchor salient point, and a corresponding fingerprint is computed from frequency coordinates of at least one of said linked salient points and said anchor point. 77. The method of claim 76, wherein said linked salient points fall within a target zone. 78. The method of claim 77, wherein said target zone is defined by a time range. 79. The method of claim 77, wherein said target zone is defined by a frequency range. 80. The method of claim 77, wherein said target zone is variable. 81. The method of claim 76, wherein said corresponding fingerprint is computed from a quotient between two of said frequency coordinates of said linked salient points and said anchor point, whereby said corresponding fingerprint is time-stretch invariant. 82. 
The method of claim 76, wherein said corresponding fingerprint is further computed from at least one time difference between said time coordinate of said anchor point and said time coordinates of said linked salient points. 83. The method of claim 82, wherein said corresponding fingerprint is further computed from a product of one of said time differences and one of said frequency coordinates of said linked salient points and said anchor point, whereby said corresponding fingerprint is time-stretch invariant. 84. The method of claim 64 wherein said particular locations and said fingerprints are computed from salient points of a multidimensional function of said audio file, wherein at least one of said dimensions is a time dimension and at least one of said dimensions is a non-time dimension. 85. The method of claim 84 wherein said particular locations are computed from said time dimensions. 86. The method of claim 84 wherein said fingerprints are computed from at least one of said non-time dimensions. 87. The method of claim 84 wherein said salient points are selected from the group consisting of local maxima, local minima, and zero crossings of said multidimensional function. 88. The method of claim 64 wherein said fingerprints are time-stretch invariant. 89. The method of claim 64 wherein each fingerprint is computed from multiple timeslices of said audio file. 90. The method of claim 89 wherein said multiple timeslices are offset by a variable amount of time. 91. The method of claim 90 wherein said fingerprints are computed in part from said variable amounts. 92. 
A method for recognizing a media entity from a media sample, comprising: generating correspondences between landmarks of said media sample and corresponding landmarks of a media entity to be identified, wherein said landmarks of said media sample and said corresponding landmarks of said media entity have equivalent fingerprints; and identifying said media sample and said media entity if a plurality of said correspondences have a linear relationship defined by $\text{landmark}^{*}_{n} = m \cdot \text{landmark}_{n} + \text{offset}$, where $\text{landmark}_{n}$ is a sample landmark, $\text{landmark}^{*}_{n}$ is a file landmark that corresponds to $\text{landmark}_{n}$, and $m$ represents slope. 93. A method for recognizing a media sample, comprising identifying media files that have file landmarks that are substantially linearly related to sample landmarks of said media sample; wherein said file landmarks and said sample landmarks have equivalent fingerprints; and wherein said file landmarks and said sample landmarks have a linear correspondence defined by $\text{landmark}^{*}_{n} = m \cdot \text{landmark}_{n} + \text{offset}$, where $\text{landmark}_{n}$ is a sample landmark, $\text{landmark}^{*}_{n}$ is a file landmark that corresponds to $\text{landmark}_{n}$, and $m$ represents slope. 94. A method for comparing an audio sample and an audio entity, comprising: for each of at least one audio entity to be identified, computing a plurality of entity fingerprints representing said audio entity; wherein each entity fingerprint characterizes one or more features of said audio entity at or near an entity landmark in at least one dimensions including time; computing a plurality of sample fingerprints representing said audio sample, wherein said sample fingerprints are invariant to time stretching of said audio sample; and identifying a matching audio entity that has at least a threshold number of said file fingerprints that are equivalent to said sample fingerprints. 95. The method of claim 94 wherein said sample fingerprints comprise quotients of frequency components of said audio sample. 96. 
The method of claim 94 wherein said sample fingerprints comprise products of frequency components of said audio sample and time differences between points in said audio sample. 97. A method of characterizing an audio sample, comprising computing at least one fingerprint from a spectrogram of said audio sample, wherein said spectrogram comprises an anchor salient point and linked salient points, and wherein said fingerprint is computed from frequency coordinates of said anchor salient point and at least one linked salient point. 98. The method of claim 97, wherein said linked salient points fall within a target zone. 99. The method of claim 98, wherein said target zone is defined by a time range. 100. The method of claim 98, wherein said target zone is defined by a frequency range. 101. The method of claim 98, wherein said target zone is variable. 102. The method of claim 97 wherein said fingerprint is computed from a quotient between two of said frequency coordinates of said linked salient points and said anchor point, whereby said fingerprint is time-stretch invariant. 103. The method of claim 97 wherein said fingerprint is further computed from at least one time difference between said time coordinate of said anchor point and said time coordinates of said linked salient points. 104. The method of claim 103, wherein said fingerprint is further computed from a product of one of said time differences and one of said frequency coordinates of said linked salient points and said anchor point, whereby said fingerprint is time-stretch invariant. 105. The method of claim 97 wherein said anchor salient point and said linked salient points are selected from the group consisting of local maxima, local minima, and zero crossings of said spectrogram. 106. 
A method for comparing an audio sample and an audio entity, comprising: for each of at least one audio entity to be identified, computing a plurality of entity landmark/fingerprint pairs representing said audio entity, wherein each landmark occurs at a particular location within said audio entity in at least one dimension including time, and wherein each fingerprint characterizes one or more features of said audio entity at or near said particular location; computing a plurality of sample landmark/fingerprint pairs representing said audio sample by obtaining time and frequency coordinates of at least one salient point of a spectrogram of said audio sample, wherein each salient point serves as an anchor point defining a sample landmark; and generating at least one multidimensional sample landmark/fingerprint pair from said at least one salient point, wherein sample landmarks of said audio sample are taken to be time coordinates and wherein corresponding sample fingerprints are computed from at least one of the remaining coordinates; and identifying a winning audio entity that has at least a threshold number of said file fingerprints that are equivalent to said sample fingerprints. 107. The method of claim 106, wherein said linked salient points fall within a target zone. 108. The method of claim 107, wherein said target zone is defined by a time range. 109. The method of claim 107, wherein said target zone is defined by a frequency range. 110. The method of claim 107, wherein said target zone is variable. 111. The method of claim 106 wherein said sample fingerprint is computed from a quotient between two of said frequency coordinates of said linked salient points and said anchor point, whereby said sample fingerprint is time-stretch invariant. 112. The method of claim 106 wherein said sample fingerprint is further computed from at least one time difference between said time coordinate of said anchor point and said time coordinates of said linked salient points. 113. 
The method of claim 112, wherein said sample fingerprint is further computed from a product of one of said time differences and one of said frequency coordinates of said linked salient points and said anchor point, whereby said sample fingerprint is time-stretch invariant. 114. The method of claim 106 wherein said anchor salient point and said linked salient points are selected from the group consisting of local maxima, local minima, and zero crossings of said spectrogram.
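The anchor-point and linked-point fingerprints recited in the claims can be sketched as follows. This is an illustrative reading, not the patentee's code: the function names, the target-zone bounds, and the rounding precision are all hypothetical, and "time stretch" is assumed to mean a playback-speed change that scales every time coordinate by a factor s and every frequency coordinate by 1/s. Under that assumption the quotient of two frequencies and the product of a time difference and a frequency are both unchanged by the stretch.

```python
def link_points(peaks, dt_range=(0.1, 2.0), df_range=(-500.0, 500.0)):
    """Pair each anchor salient point with later points inside its target zone.
    peaks: [(time_seconds, frequency_hz), ...] taken from a spectrogram.
    The target zone here is a time range plus a frequency range (assumed bounds)."""
    peaks = sorted(peaks)
    pairs = []
    for i, anchor in enumerate(peaks):
        for candidate in peaks[i + 1:]:
            dt = candidate[0] - anchor[0]
            df = candidate[1] - anchor[1]
            if dt_range[0] <= dt <= dt_range[1] and df_range[0] <= df <= df_range[1]:
                pairs.append((anchor, candidate))
    return pairs


def invariant_fingerprint(anchor, linked):
    """Fingerprint built from a frequency quotient and a time-frequency product.
    Both components survive scaling times by s and frequencies by 1/s."""
    (t0, f0), (t1, f1) = anchor, linked
    return (round(f1 / f0, 3), round((t1 - t0) * f0, 1))


def landmark_fingerprints(peaks, **zone):
    """Return [(landmark, fingerprint), ...]; the landmark is the anchor's time."""
    return [(anchor[0], invariant_fingerprint(anchor, linked))
            for anchor, linked in link_points(peaks, **zone)]


def stretch(point, s):
    """Simulate a speed change: time scales by s, frequency by 1/s."""
    t, f = point
    return (t * s, f / s)


anchor, linked = (1.0, 400.0), (1.5, 600.0)
original = invariant_fingerprint(anchor, linked)
stretched = invariant_fingerprint(stretch(anchor, 1.25), stretch(linked, 1.25))
print(original, stretched)  # the two tuples are equal
print(landmark_fingerprints([anchor, linked, (4.0, 500.0)]))
```

Note that the target zone in `link_points` is itself expressed in absolute seconds and hertz, so a strongly stretched sample could pair slightly different points; the sketch only demonstrates invariance of the fingerprint value for a given pair.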