Presented herein are systems and methods for processing sound signals for use with electronic speech systems. Sound signals are temporally parsed into frames, and the speech system includes a speech codebook having entries corresponding to frame sequences. The system identifies speech sounds in an audio signal using the speech codebook.
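The temporal parsing step described above (and recited in claims 1 and 23) can be sketched as slicing the waveform into fixed-length, optionally overlapping frames. This is a minimal illustration only; the frame length, hop size, and sampling rate below are assumptions for the example, not values fixed by the patent:

```python
import numpy as np

def parse_into_frames(signal, frame_len=160, hop=80):
    """Temporally parse a sound signal into (possibly overlapping) frames.

    frame_len=160 and hop=80 are illustrative (20 ms frames with 10 ms
    overlap at an assumed 8 kHz sampling rate).
    """
    n = 1 + max(0, len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

# 1 second of synthetic audio at the assumed 8 kHz rate
signal = np.random.default_rng(0).standard_normal(8000)
frames = parse_into_frames(signal)
print(frames.shape)  # (99, 160)
```

Successive frames here overlap by half a frame, matching the overlapping-portions variants in claims 16 and 35; setting `hop = frame_len` instead would produce the temporally adjacent variant of claims 12 and 34.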
Representative Claims
1. A method for processing a signal, comprising the steps of:
receiving an input sound signal including speech and environmental noise;
temporally parsing the input sound signal into input frame sequences of at least three input frames, wherein an input frame represents a segment of a waveform of the input sound signal;
providing a speech codebook including a plurality of entries corresponding to speech spectral trajectories of reference frame sequences that include at least three reference frames,
wherein a reference frame represents a segment of a waveform of a reference sound signal,
wherein the reference frame sequences corresponding to the entries are derived from allowable sequences of at least three reference frames, and
wherein the speech codebook substantially lacks entries corresponding to (1) reference frame sequences that include a single unvoiced frame between a pair of voiced frames, and (2) reference frame sequences that include a single voiced frame between a pair of unvoiced frames;
identifying phones within the speech based on a comparison of an input frame sequence with a plurality of the speech spectral trajectories of reference frame sequences; and
encoding the phones.

2. The method of claim 1, wherein the segment of the waveform represented by an input frame is represented by a spectrum.

3. The method of claim 1, wherein the segment of the waveform represented by a reference frame is represented by a spectrum.

4. The method of claim 1, wherein an input frame includes the segment of the waveform of the input sound signal it represents.

5. The method of claim 1, wherein a reference frame includes the segment of the waveform of the reference sound signal that it represents.

6. The method of claim 1, comprising identifying pitch values of the at least two input frames.

7. The method of claim 6, comprising encoding the identified pitch values.

8. The method of claim 1, comprising:
providing a noise codebook including a plurality of noise codebook entries corresponding to frames of environmental noise;
selecting at least one noise sequence of noise codebook entries; and
identifying phones based on a comparison of at least one of the input frame sequences with the at least one noise sequence.

9. The method of claim 8, wherein the at least one noise sequence comprises a first noise codebook entry and a second noise codebook entry.

10. The method of claim 9, wherein the first noise codebook entry and the second noise codebook entry are the same noise codebook entry.

11. The method of claim 8, wherein selecting comprises:
calculating frame-level discriminant values for the noise codebook entries;
creating a matrix having a plurality of matrix entries including the frame-level discriminant values; and
identifying, in respective columns of the matrix, a matrix entry having the largest frame-level discriminant value.

12. The method of claim 1, wherein the at least two input frames are temporally adjacent portions of the input sound signal.

13. The method of claim 1, comprising determining the set of allowable sequences based on sequences of phones that are formable by the average human vocal tract.

14. The method of claim 1, comprising determining the set of allowable sequences based on sequences of phones that are permissible in a selected language.

15. The method of claim 14, wherein the selected language is English.

16. The method of claim 1, comprising creating the at least two input frames from temporally overlapping portions of the input sound signal.

17. The method of claim 1, comprising creating the reference spectral sequences from frames derived from overlapping portions of a speech signal.

18. The method of claim 1, wherein the parsing comprises parsing the input sound signal into variable length frames.

19. The method of claim 18, wherein at least one of the variable length frames corresponds to a phone.
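The selection step of claim 11 can be sketched as follows: compute a frame-level discriminant value for every noise codebook entry against every input frame, arrange the values in a matrix whose rows index noise codebook entries and whose columns index input frames, and pick the largest entry in each column. The discriminant function itself is not specified by the claim; the negative spectral distance below is an assumption made for illustration:

```python
import numpy as np

def select_noise_sequence(input_spectra, noise_codebook):
    """Sketch of the selection step of claim 11.

    Assumed discriminant: negative Euclidean distance between spectra
    (the claim leaves the discriminant function unspecified).

    input_spectra  : (num_frames, dim) spectra of the input frames
    noise_codebook : (num_entries, dim) spectra of noise codebook entries
    Returns the per-frame sequence of best-matching noise entries.
    """
    # Matrix of frame-level discriminant values:
    # rows index noise codebook entries, columns index input frames.
    d = -np.linalg.norm(noise_codebook[:, None, :] - input_spectra[None, :, :], axis=2)
    # In each column, identify the entry with the largest discriminant value.
    return d.argmax(axis=0)

rng = np.random.default_rng(1)
codebook = rng.standard_normal((4, 8))                       # 4 noise entries
frames = codebook[[2, 2, 0]] + 0.01 * rng.standard_normal((3, 8))
print(select_noise_sequence(frames, codebook))  # [2 2 0]
```

The column-wise maxima form the "at least one noise sequence" of claim 8; repeated indices in the result illustrate claim 10, where consecutive positions may select the same codebook entry.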
20. The method of claim 18, wherein at least one of the variable length frames corresponds to at least one of a phone and a transition between phones.

21. The method of claim 1, wherein the input sound signal is temporally parsed into frame sequences of one of at least 3 frames, at least 5 frames, at least 7 frames, at least 9 frames, and at least 12 frames.

22. The method of claim 1, wherein encoding the phones comprises encoding the identified phones as a digital signal having a bit rate of less than 2500 bits per second.

23. A device comprising:
a receiver for receiving an input sound signal including speech and environmental noise;
a first processor for temporally parsing the input sound signal into input frame sequences of at least three input frames, wherein an input frame represents a segment of a waveform of the input sound signal;
a first memory for storing a plurality of speech codebook entries corresponding to speech spectral trajectories of reference frame sequences that include at least three reference frames,
wherein a reference frame represents a segment of a waveform of a reference sound signal,
wherein the reference frame sequences corresponding to the entries are derived from allowable sequences of at least three reference frames, and
wherein the speech codebook substantially lacks entries corresponding to (1) reference frame sequences that include a single unvoiced frame between a pair of voiced frames, and (2) reference frame sequences that include a single voiced frame between a pair of unvoiced frames;
a second processor for identifying phones within the speech based on a comparison of an input frame sequence with a plurality of the speech spectral trajectories of reference frame sequences; and
a third processor for encoding the phones.

24. The device of claim 23, wherein at least two of the first processor, the second processor, and the third processor are the same processor.

25. The device of claim 23, wherein the segment of the waveform represented by an input frame is represented by a spectrum.

26. The device of claim 23, wherein the segment of the waveform represented by a reference frame is represented by a spectrum.

27. The device of claim 23, wherein an input frame includes the segment of the waveform of the input sound signal it represents.

28. The device of claim 23, wherein a reference frame includes the segment of the waveform of the reference sound signal that it represents.

29. The device of claim 23, comprising:
a second memory for storing a plurality of noise codebook entries corresponding to spectra of environmental noise;
a fourth processor for selecting at least one noise sequence of noise codebook entries; and
wherein the second processor identifies phones within the speech based on a comparison of the spectra corresponding to a frame sequence with the at least one noise sequence.

30. The device of claim 23, comprising a fourth processor for identifying pitch values of the at least two input frames.

31. The device of claim 23, wherein the allowable sequences are based on sequences of phones predetermined to be formable by the average human vocal tract.

32. The device of claim 23, wherein the allowable sequences are based on sequences of phones predetermined to be permissible in a selected language.

33. The device of claim 32, wherein the selected language is English.

34. The device of claim 23, wherein the first processor creates the at least two input frames from temporally adjacent portions of the input sound signal.

35. The device of claim 23, wherein the first processor creates the at least two input frames from temporally overlapping portions of the input sound signal.

36. The device of claim 23, wherein the reference frame sequences are from reference frames created from overlapping portions of a speech signal.

37. The device of claim 23, wherein the first processor parses the input sound signal into variable length input frames.

38. The device of claim 37, wherein at least one of the variable length input frames corresponds to a phone.

39. The device of claim 37, wherein at least one of the variable length input frames corresponds to at least one of a phone and a transition between phones.

40. The device of claim 23, wherein the first processor temporally parses the input sound signal into input frame sequences of one of at least 3 frames, at least 5 frames, at least 7 frames, at least 9 frames, and at least 12 frames.

41. The device of claim 23, wherein the third processor encodes phones as a digital signal having a bit rate of less than 2500 bits per second.

42. The method of claim 1, wherein non-allowable sequences are reference frame sequences that represent a waveform which is not typical of a speech signal.

43. The method of claim 1, wherein the comparison comprises determining a likelihood that the input frame sequence corresponds to one of the plurality of speech spectral trajectories of reference frame sequences.

44. The method of claim 1, further comprising generating a plurality of noise-corrupted versions of the plurality of the speech spectral trajectories of reference frame sequences using noise entries from a noise codebook, and wherein the comparison comprises comparing the input frame sequence with the noise-corrupted versions of the plurality of the speech spectral trajectories of reference frame sequences.

45. The device of claim 23, wherein the comparison comprises determining a likelihood that the input frame sequence corresponds to one of the plurality of speech spectral trajectories of reference frame sequences.

46.
The device of claim 23, further comprising a fourth processor for generating a plurality of noise-corrupted versions of the plurality of the speech spectral trajectories of reference frame sequences using noise entries from a noise codebook, and wherein the comparison comprises comparing the input frame sequence with the noise-corrupted versions of the plurality of the speech spectral trajectories of reference frame sequences.
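The comparison of claims 44 and 46 — scoring the input frame sequence against noise-corrupted versions of the stored speech spectral trajectories — can be sketched under two assumed models: corruption is additive in the spectral domain, and the likelihood of claims 43 and 45 is stood in for by a negative Euclidean distance. Neither choice is fixed by the claims:

```python
import numpy as np

def identify_best_entry(input_traj, speech_codebook, noise_codebook):
    """Sketch of the comparison in claims 44/46.

    Each speech spectral trajectory is corrupted with each noise entry
    (assumed additive model), and the input trajectory is scored against
    every corrupted version; the negative distance stands in for the
    unspecified likelihood of claims 43/45.
    """
    best = None
    for s_idx, traj in enumerate(speech_codebook):      # traj: (frames, dim)
        for n_idx, noise in enumerate(noise_codebook):  # noise: (dim,)
            corrupted = traj + noise                    # assumed additive corruption
            score = -np.linalg.norm(input_traj - corrupted)
            if best is None or score > best[0]:
                best = (score, s_idx, n_idx)
    return best[1], best[2]

rng = np.random.default_rng(2)
speech = rng.uniform(0.5, 2.0, size=(5, 3, 8))  # 5 entries, 3-frame trajectories
noise = rng.uniform(0.0, 0.5, size=(2, 8))      # 2 noise codebook entries
observed = speech[3] + noise[1] + 0.01 * rng.standard_normal((3, 8))
print(identify_best_entry(observed, speech, noise))  # (3, 1)
```

Matching against pre-corrupted trajectories, rather than denoising the input first, is what lets a single comparison jointly recover both the speech entry and the noise condition.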
Patents Cited in This Patent (54)
Porter, Jack E. (San Diego, CA), Apparatus and methods for the selective addition of noise to templates employed in automatic speech recognition systems.
Safdar M. Asghar; Lin Cong, Matrix quantization with vector quantization error compensation and neural network postprocessing for robust speech recognition.
Furui, Sadaoki; Zhang, Zhipeng; Horikoshi, Tsutomu; Sugimura, Toshiaki, Noise adaptation system of speech model, noise adaptation method, and noise adaptation program for speech recognition.
Goldenthal, William D. (Cambridge, MA); Glass, James R. (Arlington, MA), Segment-based apparatus and method for speech recognition by analyzing multiple speech unit frames and modeling both tem.
Rotola-Pukkila, Jani; Mikkola, Hannu; Vainio, Janne, Wideband speech codec using a higher sampling rate in analysis and synthesis filtering than in excitation searching.