Bibliographic Data

Country / Type: United States (US) Patent, granted
International Patent Classification (IPC, 7th ed.): (not listed)
Application No.: US-0580449 (filed 2000-05-30)
Priority: JP-0150284 (1999-05-28)
Inventors / Address:
- Kitazoe, Tetsuro
- Kim, Sung-Ill
- Ichiki, Tomoyuki
Agent / Address: (not listed)
Citation information: times cited: 5; patents cited: 11
Abstract
A method and system are provided for speech recognition. The speech recognition method includes the steps of preparing training data representing acoustic parameters of each of phonemes at each time frame; receiving an input signal representing a sound to be recognized and converting the input signal to input data; comparing the input data at each frame with the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; and processing the similarity measures obtained in the comparing step using a neural net model governing development of activities of plural cells to conduct speech recognition of the input signal. In the processing step, each cell is associated with one respective phoneme and one frame, a development of the activity of each cell at each frame in the neural net model is suppressed by the activities of other cells on the same frame corresponding to different phonemes, and the development of the activity of each cell at each frame being enhanced by the activities of other cells corresponding to the same phoneme at different frames. In the process, the phoneme of a cell that has developed the highest activity is determined as a winner at the corresponding frame to produce a list of winners at respective frames. A phoneme is outputted as a recognition result for the input signal in accordance with the list of the winners at the respective frames that have been determined in the step of processing.
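The competition described in the abstract (same-frame suppression between phonemes, cross-frame enhancement of the same phoneme, winner per frame) can be sketched as a small numerical integration. This is an illustrative sketch only: the coupling constants, time step, and one-frame neighborhood below are assumptions, not values from the patent.

```python
import numpy as np

# Sketch of the frame-wise phoneme competition: each cell (frame u,
# phoneme a) has an activity xi[u, a] driven by its similarity measure
# lam[u, a], suppressed by rival phonemes in the same frame, and
# enhanced by the same phoneme in adjacent frames.  A, B, D, dt, and
# the number of steps are illustrative placeholders.
def recognize(lam, A=1.0, B=1.5, D=0.5, tau=1.0, dt=0.05, steps=400):
    """lam: (n_frames, n_phonemes) array of similarity measures."""
    xi = np.zeros_like(lam)
    for _ in range(steps):
        # enhancement from the same phoneme in adjacent frames
        same = np.zeros_like(xi)
        same[1:] += xi[:-1]
        same[:-1] += xi[1:]
        # suppression from the other phonemes in the same frame
        rivals = xi.sum(axis=1, keepdims=True) - xi
        dxi = -xi + A * lam + D * same - B * rivals
        xi = np.maximum(xi + dt * dxi / tau, 0.0)  # activities stay non-negative
    # the phoneme whose cell developed the highest activity wins each frame
    return xi.argmax(axis=1)

lam = np.array([[0.9, 0.2, 0.1],
                [0.8, 0.3, 0.1],
                [0.2, 0.1, 0.9]])
print(recognize(lam))  # → [0 0 2]
```

With these toy similarities, the clearly dominant phoneme wins each frame, yielding the winner list [0, 0, 2].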
Representative Claims
1. A method for speech recognition, comprising the steps of: preparing training data representing acoustic parameters of each of phonemes at each time frame, the training data being represented by a distribution function; receiving an input signal representing a sound to be recognized and converting the input signal to input data; comparing the input data at each frame with the distribution function representing the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; processing the similarity measures obtained in the comparing step using neural net model differential equations governing development of time-dependent activities of plural cells to conduct speech recognition of the input signal, each cell being associated with one respective phoneme and one frame, the step including numerically and recurrently solving coupled differential pattern recognition equations to produce a cell having the highest activity at each of the frames, the coupled differential pattern recognition equations being such that a time-dependent development of the activity of each cell at each frame in the neural net model is dependent upon the similarity measure and is suppressed by the activities of other cells on the same frame corresponding to different phonemes, the time-dependent development of the activity of each cell at each frame is enhanced by the activities of other cells corresponding to the same phoneme at different frames, and the phoneme of a cell that has developed the highest activity is determined as a winner at the corresponding frame to produce a list of winners at respective frames; and outputting a phoneme as a recognition result for the input signal in accordance with the list of the winners at the respective frames that have been determined in the step of processing.

2.
The method according to claim 1, wherein the distribution function representing the training data is a Gaussian probability density function.

3. The method according to claim 1, wherein the step of processing includes numerically solving two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers to produce the cell having the highest activity at each of the frames.

4. The method according to claim 3, wherein the step of numerically solving two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers includes numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(α_u^a)

where λ_u^a represents the similarity measure between the input data at a certain frame u and a particular phoneme /a/, ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, A, B, D, τ1, and τ2 are constants, and f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

where w and h are constants.

5. The method according to claim 1, wherein the step of processing includes numerically solving three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers to produce the cell having the highest activity at each of the frames.

6.
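The two functions quoted in claim 4 translate directly into code. The claim leaves w and h as unspecified constants, so the defaults below are placeholders:

```python
import numpy as np

# f(x) = (tanh(w(x - h)) + 1) / 2 : a sigmoid rising from 0 to 1 around
# the threshold h, with steepness w (w and h are unspecified constants
# in the claim; the defaults here are placeholders).
def f(x, w=10.0, h=0.5):
    return (np.tanh(w * (x - h)) + 1.0) / 2.0

# g(u) = u+ = (u + |u|) / 2 : half-wave rectification, i.e. max(u, 0).
def g(u):
    return (u + np.abs(u)) / 2.0

print(f(0.5))          # 0.5, the midpoint of the sigmoid
print(g(-2.0), g(3.0)) # 0.0 3.0
```

f clamps a cell's drive into [0, 1], while g passes only non-negative activity between cells.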
The method according to claim 5, wherein the step of numerically solving three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers includes numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(β_u^a)
τ3 dβ_u^a/dt = −β_u^a + g(α_u^a) + g(ξ_u^a)

where λ_u^a represents the similarity measure between the input data at a certain frame u and a particular phoneme /a/, ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, β_u^a represents a cell in a middle layer, f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

and A, B, D, w, h, τ1, τ2, and τ3 are constants.

7. The method according to claim 1, further comprising the step of identifying at least one of a word and a sentence based upon the recognition result obtained in the step of outputting.

8. The method according to claim 1, wherein the step of outputting includes displaying the recognition result on a display monitor.

9. The method according to claim 1, wherein in the step of processing, only phonemes that have the highest to fifth highest similarity measures are processed.

10. The method according to claim 1, wherein the step of receiving the input signal includes receiving a continuous speech through a plurality of time windows each having a fixed time interval, and wherein the step of processing and the step of outputting are repeated for every time window.

11.
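The three-layered dynamics of claim 6 couple a middle layer β to both the input layer α and feedback from the output layer ξ. Only the ξ and β equations are legible in this record (the α-layer equation is garbled away), so the sketch below holds α fixed at its similarity input; time constants, dt, and the values of w and h are placeholder assumptions.

```python
import numpy as np

def f(x, w=10.0, h=0.5):
    # sigmoid from claim 6: (tanh(w(x - h)) + 1) / 2
    return (np.tanh(w * (x - h)) + 1.0) / 2.0

def g(u):
    # half-wave rectification: u+ = (u + |u|) / 2
    return (u + np.abs(u)) / 2.0

def three_layer_step(xi, beta, alpha, tau1=1.0, tau3=1.0, dt=0.05):
    # tau1 * dxi/dt   = -xi   + f(beta)           (output layer)
    # tau3 * dbeta/dt = -beta + g(alpha) + g(xi)  (middle layer)
    dxi = (-xi + f(beta)) / tau1
    dbeta = (-beta + g(alpha) + g(xi)) / tau3
    return xi + dt * dxi, beta + dt * dbeta

# single cell with a strong similarity input held constant
xi, beta, alpha = 0.0, 0.0, 0.9
for _ in range(1000):
    xi, beta = three_layer_step(xi, beta, alpha)
# the output-layer feedback g(xi) reinforces beta, so the output
# activity saturates near 1 for a strong input
```

Because f is bounded by 1, the positive feedback through g(ξ) cannot run away; the cell settles at ξ ≈ f(g(α) + ξ).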
A speech recognition device, comprising: a training unit for storing training data representing acoustic parameters of each of phonemes at each time frame, the training data being represented by a distribution function; a signal input unit for receiving an input signal representing a sound to be recognized and for converting the input signal to input data; a similarity calculation unit for comparing the input data at each frame with the distribution function representing the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; and a processing unit for processing the similarity measures obtained by the similarity calculation unit using neural net model differential equations governing development of time-dependent activities of plural cells to conduct speech recognition of the input signal, each cell being associated with one respective phoneme and one frame, the processing unit numerically and recurrently solving coupled differential pattern recognition equations to produce a cell having the highest activity at each of the frames, the coupled differential pattern recognition equations being such that a time-dependent development of the activity of each cell at each frame in the neural net model is dependent upon the similarity measure and is suppressed by the activities of other cells on the same frame corresponding to different phonemes, the time-dependent development of the activity of each cell at each frame is enhanced by the activities of other cells corresponding to the same phoneme at different frames, and the phoneme of a cell that has developed the highest activity is determined as a winner at the corresponding frame to produce a list of winners at respective frames, the processing unit outputting a phoneme as a recognition result for the input signal in accordance with the list of the winners at the respective frames that have been determined.

12.
The device according to claim 11, wherein the distribution function representing the training data is a Gaussian probability density function.

13. The device according to claim 11, wherein the processing unit is adapted to numerically solve two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers to produce the cell having the highest activity at each of the frames.

14. The device according to claim 13, wherein the processing unit is adapted to numerically solve two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers by numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(α_u^a)

where λ_u^a represents the similarity measure between the input data at a certain frame u and a particular phoneme /a/, ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, A, B, D, τ1, and τ2 are constants, and f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

where w and h are constants.

15. The device according to claim 11, wherein the processing unit is adapted to numerically solve three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers to produce the cell having the highest activity at each of the frames.

16.
The device according to claim 15, wherein the processing unit is adapted to numerically solve three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers by numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(β_u^a)
τ3 dβ_u^a/dt = −β_u^a + g(α_u^a) + g(ξ_u^a)

where ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, β_u^a represents a cell in a middle layer, f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

and A, B, D, w, h, τ1, τ2, and τ3 are constants.

17. The device according to claim 11, wherein the processing unit further identifies at least one of a word and a sentence based upon the recognition result.

18. The device according to claim 11, further comprising a display monitor for displaying the recognition result.

19. The device according to claim 11, wherein the processing unit processes only phonemes that have the highest to fifth highest similarity measures.

20. The device according to claim 11, wherein the signal input unit receives a continuous speech through a plurality of successive time windows each having a fixed time interval.