Method, medium, and system detecting speech using energy levels of speech frames
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G10L-015/04
G10L-015/02
G10L-015/00
G10L-021/00
G10L-025/78
Application number
US-0882444
(2007-08-01)
Registration number
US-9009048
(2015-04-14)
Priority information
KR-10-2006-0073386 (2006-08-03)
Inventor / Address
Jang, Giljin
Kim, Jeongsu
Bridle, John S.
Hunt, Melvyn J.
Applicant / Address
Samsung Electronics Co., Ltd.
Agent / Address
Staas & Halsey LLP
Citation information
Times cited: 3
Cited patents: 25
Abstract
A speech recognition method, medium, and system. The method includes detecting an energy change of each frame making up signals including speech and non-speech signals, and identifying a speech segment corresponding to frames that include only speech signals from among the frames based on the detected energy change.
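The abstract's first step, detecting an energy change per frame, can be illustrated with a minimal sketch. The record does not state a frame length, an energy formula, or a change threshold, so the non-overlapping framing, the log-energy in dB, the 6 dB threshold, and the function names `frame_energies` and `energy_changes` are all assumptions made here for illustration and are not taken from the patent.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Split samples into non-overlapping frames and return each frame's
    mean-square energy in dB. Framing and dB scale are assumptions."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_changes(energies: Sequence[float], min_delta_db: float = 6.0) -> List[int]:
    """Return every index i where the energy of frame i differs from that of
    frame i - 1 by at least min_delta_db (a hypothetical threshold)."""
    return [
        i for i in range(1, len(energies))
        if abs(energies[i] - energies[i - 1]) >= min_delta_db
    ]


# Synthetic signal: quiet, loud, quiet (amplitudes chosen for illustration).
signal = [0.001] * 160 + [0.5] * 320 + [0.001] * 160
energies = frame_energies(signal, frame_len=80)
boundaries = energy_changes(energies)  # frame indices where energy jumps
```

In this toy signal the loud stretch occupies frames 2 through 5, so the detected changes fall at the frames where it starts and where the quiet resumes.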
Representative claims
1. A speech recognition method, comprising: detecting, using at least one processing device, energy changes between a plurality of frames distinguishing portions of a signal, each of the plurality of frames having time lengths less than a whole time length of the signal; and identifying speech segments and/or non-speech segments from the plurality of frames based on the detected energy changes between the plurality of frames by assigning a predetermined weight to a segment in which an energy level of a respective frame is changed and when an energy difference exists between two neighboring frames.
2. The method of claim 1, further comprising classifying each of the plurality of frames according to respective energy levels based on predetermined criteria, wherein in the detecting of the energy changes between the plurality of frames, detection of the energy change is based on differences in the respective classified energy levels.
3. The method of claim 2, wherein the identifying of the speech segment and/or non-speech segments comprises: repeatedly performing processes of assigning the predetermined weight to a segment in which an energy level of a respective frame is changed and calculating weights for all respective segments; and identifying a segment corresponding to a minimum weight, among the calculated weights, as being a speech segment, wherein the segment corresponding to the minimum weight has a lower energy level than the other speech segments.
4. The method of claim 2, wherein, in the classifying of the frames, frames are classified according to calculated energies of respective frames.
5. The method of claim 2, further comprising modifying a classified energy level of a frame by changing the classified energy level of the frame, wherein in the detecting the energy changes, a segment in which the classified energy level of the frame is changed is identified.
6. The method of claim 5, wherein the energy change includes a change between energy levels of neighboring frames and a change between an initial energy level of a frame and a changed energy level of the frame.
7. The method of claim 2, further comprising updating the predetermined criteria according to detected energies of the signal.
8. The method of claim 7, wherein frames are classified into three levels including high, medium, and low levels based on the detected energies.
9. The method of claim 1, further comprising combining the identified speech segments with other speech and/or non-speech segments of the signal.
10. The method of claim 1, wherein the non-speech segments include a burst noise which has a frequency characteristic that remarkably changes within a short period of time compared to the whole time length of the signal.
11. At least one non-transitory recording medium comprising computer readable code to control at least one processing element to implement a speech recognition method, comprising: detecting, using at least one processing device, energy changes between a plurality of frames distinguishing portions of a signal, each of the plurality of frames having time lengths less than a whole time length of the signal; and identifying speech segments and/or non-speech segments from the plurality of frames based on the detected energy changes between the plurality of frames by assigning a predetermined weight to a segment in which an energy level of a respective frame is changed and when an energy difference exists between two neighboring frames.
12. A speech recognition system including at least one processing device, the system comprising: a change detector to detect, using the at least one processing device, energy changes between a plurality of frames distinguishing portions of a signal, each of the plurality of frames having lengths less than a whole time length of the signal; and a determiner to identify speech segments and/or non-speech segments from the plurality of frames based on the detected energy changes between the plurality of frames by assigning a predetermined weight to a segment in which an energy level of a respective frame is changed and when an energy difference exists between two neighboring frames.
13. The system of claim 12, further comprising an energy level classifier to classify each of the plurality of frames according to respective energy levels based on predetermined criteria, wherein the change detector detects a segment in which respective energies of each frame are changed based on the classified energy level.
14. The system of claim 13, further comprising: an energy calculator to calculate energies of each frame; an energy level updater to update the predetermined criteria according to the energies of each signal; wherein the energy level classifier classifies frames into three levels including high, medium, and low levels.
15. The system of claim 13, further comprising a generator to modify an energy level of a frame by changing the classified energy level of the frame, wherein the change detector detects a segment in which the classified energy level of the frame is changed.
16. The system of claim 12, wherein the determiner repeatedly performs processes of assigning the predetermined weight to a segment in which an energy level of a respective frame is changed and calculating weights for all respective segments in order to identify a segment corresponding to a minimum weight, among the calculated weights, as being a speech segment, wherein the segment corresponding to the minimum weight has a lower energy level than the other speech segments.
17. The system of claim 12, further comprising a combiner to combine the identified speech segment with other speech and/or non-speech segments of the signal.
18. A speech recognition system, comprising: an A/D converter to convert an analog input signal including speech and/or non-speech signals transmitted through an audio transducer into a digital input signal; a frame generator to generate a plurality of frames corresponding to the digital input signal; a phoneme detector to generate a phoneme sequence from the frames; a vocabulary recognition device to extract a phoneme sequence most similar to the phoneme detector generated phoneme sequence from a dictionary that stores reference phoneme sequences; a speech segment detection device including a determiner to detect energy changes between the frames distinguishing portions of the signal, each of the frames having time lengths less than a whole time length of the signal, and to identify a speech segment from the frames based on the detected energy changes between the frames by assigning a predetermined weight to a segment in which an energy level of a respective frame is changed and when an energy difference exists between two neighboring frames; and a phoneme sequence editor to edit the phoneme detector generated phoneme sequence based on information on speech segments provided from the speech segment detection device.
19. The system of claim 18, wherein the speech segment detection device combines identified speech segments with other speech and/or non-speech segments and outputs a result of the combination to the phoneme sequence editor.
20. The system of claim 18, wherein the phoneme sequence editor removes phoneme sequences, except phoneme sequences corresponding to speech segments, based on information on the identified speech segment.
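The classification and weighting steps recited in claims 2 through 8 can be sketched as follows. This is one possible reading and not the patented procedure: the claims do not give threshold values, weight values, or a search strategy. The sketch assumes thresholds placed at one third and two thirds of the observed energy range (standing in for the updated criteria), a weight for each level change between neighboring frames, a weight for each frame whose level is changed from its initial classification, and a dynamic-programming search for the level sequence with the minimum total weight. All function names and parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple

LOW, MEDIUM, HIGH = 0, 1, 2


def classify_levels(energies_db: Sequence[float]) -> List[int]:
    """Classify frames as LOW/MEDIUM/HIGH. Thresholds sit at 1/3 and 2/3 of
    the observed energy range (an assumed form of 'updated criteria')."""
    lo, hi = min(energies_db), max(energies_db)
    t_low = lo + (hi - lo) / 3.0
    t_high = lo + 2.0 * (hi - lo) / 3.0
    return [HIGH if e >= t_high else MEDIUM if e >= t_low else LOW
            for e in energies_db]


def smooth_levels(levels: Sequence[int], switch_weight: float = 2.0,
                  relabel_weight: float = 1.0) -> List[int]:
    """Return the level sequence with minimum total weight, where the total is
    switch_weight per level change between neighboring frames plus
    relabel_weight per frame moved away from its initial level."""
    if not levels:
        return []
    states = (LOW, MEDIUM, HIGH)
    cost = [0.0 if s == levels[0] else relabel_weight for s in states]
    back = []
    for observed in levels[1:]:
        new_cost, pointers = [], []
        for s in states:
            # Cost of arriving in state s from each possible previous state.
            arrive = [cost[p] + (0.0 if p == s else switch_weight) for p in states]
            best_prev = min(states, key=lambda p: arrive[p])
            new_cost.append(arrive[best_prev]
                            + (0.0 if s == observed else relabel_weight))
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path backwards from the best final state.
    state = min(states, key=lambda s: cost[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def speech_segments(levels: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of non-LOW runs."""
    segments, start = [], None
    for i, level in enumerate(levels):
        if level != LOW and start is None:
            start = i
        elif level == LOW and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(levels)))
    return segments


initial = [LOW, LOW, HIGH, LOW, LOW, HIGH, HIGH, HIGH, HIGH, HIGH, LOW, LOW, LOW]
smoothed = smooth_levels(initial)
segments = speech_segments(smoothed)
```

With these assumed weights, the isolated high-energy frame at index 2 is cheaper to relabel than to keep, which is how a short burst could be separated from the sustained high-energy run. Treating every non-low run as speech is a further simplification of this sketch.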
Patents cited by this patent (25)
Graumann David L., Adaptive noise reduction technique for multi-point communication system.
Gupta Vishwa N. (Brossard CAX) Lennig Matthew (Montreal CAX) Kenny Patrick J. (Montreal CAX) Toulson Christopher K. (Dollard des Ormeaux CAX), Phoneme based speech recognition.
Rajasekaran Periagaram K. (Richardson TX) Yoshino Toshiaki (Tokyo JPX), Speaker-independent word recognition method and system based upon zero-crossing rate and energy measurement of analog sp.
Bahl Lalit Rai ; Gopalakrishnan Ponani ; Gopinath Ramesh Ambat ; Maes Stephane Herman ; Panmanabhan Mukund ; Polymenakos Lazaros, Transcription of speech data with segments from acoustically dissimilar environments.
Jacobs Paul E. (San Diego CA) Gardner William R. (San Diego CA) Lee Chong U. (San Diego CA) Gilhousen Klein S. (San Diego CA) Lam S. Katherine (San Diego CA) Tsai Ming-Chang (San Diego CA), Variable rate vocoder.