Bibliographic Data

Country / Type: United States (US) Patent, granted
International Patent Classification (IPC, 7th ed.): (not listed)
Application No.: US-0580449 (filed 2000-05-30)
Priority: JP-0150284 (1999-05-28)
Inventors / Address:
- Kitazoe, Tetsuro
- Kim, Sung-Ill
- Ichiki, Tomoyuki
Agent / Address: (not listed)
Citation information: times cited: 5; patents cited: 11
Abstract
A method and system are provided for speech recognition. The speech recognition method includes the steps of preparing training data representing acoustic parameters of each of phonemes at each time frame; receiving an input signal representing a sound to be recognized and converting the input signal to input data; comparing the input data at each frame with the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; and processing the similarity measures obtained in the comparing step using a neural net model governing development of activities of plural cells to conduct speech recognition of the input signal. In the processing step, each cell is associated with one respective phoneme and one frame, a development of the activity of each cell at each frame in the neural net model is suppressed by the activities of other cells on the same frame corresponding to different phonemes, and the development of the activity of each cell at each frame being enhanced by the activities of other cells corresponding to the same phoneme at different frames. In the process, the phoneme of a cell that has developed the highest activity is determined as a winner at the corresponding frame to produce a list of winners at respective frames. A phoneme is outputted as a recognition result for the input signal in accordance with the list of the winners at the respective frames that have been determined in the step of processing.
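The competition described in the abstract (same-frame suppression between phonemes, cross-frame enhancement of the same phoneme, winner per frame) can be sketched as a small numerical integration. This is an illustrative sketch only: the coupling constants, time step, and one-frame neighborhood below are assumptions, not values from the patent.

```python
import numpy as np

# Sketch of the frame-wise phoneme competition: each cell (frame u,
# phoneme a) has an activity xi[u, a] driven by its similarity measure
# lam[u, a], suppressed by rival phonemes in the same frame, and
# enhanced by the same phoneme in adjacent frames.  A, B, D, dt, and
# the number of steps are illustrative placeholders.
def recognize(lam, A=1.0, B=1.5, D=0.5, tau=1.0, dt=0.05, steps=400):
    """lam: (n_frames, n_phonemes) array of similarity measures."""
    xi = np.zeros_like(lam)
    for _ in range(steps):
        # enhancement from the same phoneme in adjacent frames
        same = np.zeros_like(xi)
        same[1:] += xi[:-1]
        same[:-1] += xi[1:]
        # suppression from the other phonemes in the same frame
        rivals = xi.sum(axis=1, keepdims=True) - xi
        dxi = -xi + A * lam + D * same - B * rivals
        xi = np.maximum(xi + dt * dxi / tau, 0.0)  # activities stay non-negative
    # the phoneme whose cell developed the highest activity wins each frame
    return xi.argmax(axis=1)

lam = np.array([[0.9, 0.2, 0.1],
                [0.8, 0.3, 0.1],
                [0.2, 0.1, 0.9]])
print(recognize(lam))  # → [0 0 2]
```

With these toy similarities, the clearly dominant phoneme wins each frame, yielding the winner list [0, 0, 2].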
Representative Claims
1. A method for speech recognition, comprising the steps of: preparing training data representing acoustic parameters of each of phonemes at each time frame, the training data being represented by a distribution function; receiving an input signal representing a sound to be recognized and converting the input signal to input data; comparing the input data at each frame with the distribution function representing the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; processing the similarity measures obtained in the comparing step using neural net model differential equations governing development of time-dependent activities of plural cells to conduct speech recognition of the input signal, each cell being associated with one respective phoneme and one frame, the step including numerically and recurrently solving coupled differential pattern recognition equations to produce a cell having the highest activity at each of the frames, the coupled differential pattern recognition equations being such that a time-dependent development of the activity of each cell at each frame in the neural net model is dependent upon the similarity measure and is suppressed by the activities of other cells on the same frame corresponding to different phonemes, the time-dependent development of the activity of each cell at each frame is enhanced by the activities of other cells corresponding to the same phoneme at different frames, and the phoneme of a cell that has developed the highest activity is determined as a winner at the corresponding frame to produce a list of winners at respective frames; and outputting a phoneme as a recognition result for the input signal in accordance with the list of the winners at the respective frames that have been determined in the step of processing.

2.
The method according to claim 1, wherein the distribution function representing the training data is a Gaussian probability density function.

3. The method according to claim 1, wherein the step of processing includes numerically solving two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers to produce the cell having the highest activity at each of the frames.

4. The method according to claim 3, wherein the step of numerically solving two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers includes numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(α_u^a)

where λ_u^a represents the similarity measure between the input data at a certain frame u and a particular phoneme /a/, ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, A, B, D, τ1, and τ2 are constants, and f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

where w and h are constants.

5. The method according to claim 1, wherein the step of processing includes numerically solving three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers to produce the cell having the highest activity at each of the frames.

6.
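The two functions quoted in claim 4 translate directly into code. The claim leaves w and h as unspecified constants, so the defaults below are placeholders:

```python
import numpy as np

# f(x) = (tanh(w(x - h)) + 1) / 2 : a sigmoid rising from 0 to 1 around
# the threshold h, with steepness w (w and h are unspecified constants
# in the claim; the defaults here are placeholders).
def f(x, w=10.0, h=0.5):
    return (np.tanh(w * (x - h)) + 1.0) / 2.0

# g(u) = u+ = (u + |u|) / 2 : half-wave rectification, i.e. max(u, 0).
def g(u):
    return (u + np.abs(u)) / 2.0

print(f(0.5))          # 0.5, the midpoint of the sigmoid
print(g(-2.0), g(3.0)) # 0.0 3.0
```

f clamps a cell's drive into [0, 1], while g passes only non-negative activity between cells.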
The method according to claim 5, wherein the step of numerically solving three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers includes numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(β_u^a)
τ3 dβ_u^a/dt = −β_u^a + g(α_u^a) + g(ξ_u^a)

where λ_u^a represents the similarity measure between the input data at a certain frame u and a particular phoneme /a/, ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, β_u^a represents a cell in a middle layer, f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

and A, B, D, w, h, τ1, τ2, and τ3 are constants.

7. The method according to claim 1, further comprising the step of identifying at least one of a word and a sentence based upon the recognition result obtained in the step of outputting.

8. The method according to claim 1, wherein the step of outputting includes displaying the recognition result on a display monitor.

9. The method according to claim 1, wherein in the step of processing, only phonemes that have the highest to fifth highest similarity measures are processed.

10. The method according to claim 1, wherein the step of receiving the input signal includes receiving a continuous speech through a plurality of time windows each having a fixed time interval, and wherein the step of processing and the step of outputting are repeated for every time window.

11.
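The three-layered dynamics of claim 6 couple a middle layer β to both the input layer α and feedback from the output layer ξ. Only the ξ and β equations are legible in this record (the α-layer equation is garbled away), so the sketch below holds α fixed at its similarity input; time constants, dt, and the values of w and h are placeholder assumptions.

```python
import numpy as np

def f(x, w=10.0, h=0.5):
    # sigmoid from claim 6: (tanh(w(x - h)) + 1) / 2
    return (np.tanh(w * (x - h)) + 1.0) / 2.0

def g(u):
    # half-wave rectification: u+ = (u + |u|) / 2
    return (u + np.abs(u)) / 2.0

def three_layer_step(xi, beta, alpha, tau1=1.0, tau3=1.0, dt=0.05):
    # tau1 * dxi/dt   = -xi   + f(beta)           (output layer)
    # tau3 * dbeta/dt = -beta + g(alpha) + g(xi)  (middle layer)
    dxi = (-xi + f(beta)) / tau1
    dbeta = (-beta + g(alpha) + g(xi)) / tau3
    return xi + dt * dxi, beta + dt * dbeta

# single cell with a strong similarity input held constant
xi, beta, alpha = 0.0, 0.0, 0.9
for _ in range(1000):
    xi, beta = three_layer_step(xi, beta, alpha)
# the output-layer feedback g(xi) reinforces beta, so the output
# activity saturates near 1 for a strong input
```

Because f is bounded by 1, the positive feedback through g(ξ) cannot run away; the cell settles at ξ ≈ f(g(α) + ξ).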
A speech recognition device, comprising: a training unit for storing training data representing acoustic parameters of each of phonemes at each time frame, the training data being represented by a distribution function; a signal input unit for receiving an input signal representing a sound to be recognized and for converting the input signal to input data; a similarity calculation unit for comparing the input data at each frame with the distribution function representing the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; and a processing unit for processing the similarity measures obtained by the similarity calculation unit using neural net model differential equations governing development of time-dependent activities of plural cells to conduct speech recognition of the input signal, each cell being associated with one respective phoneme and one frame, the processing unit numerically and recurrently solving coupled differential pattern recognition equations to produce a cell having the highest activity at each of the frames, the coupled differential pattern recognition equations being such that a time-dependent development of the activity of each cell at each frame in the neural net model is dependent upon the similarity measure and is suppressed by the activities of other cells on the same frame corresponding to different phonemes, the time-dependent development of the activity of each cell at each frame is enhanced by the activities of other cells corresponding to the same phoneme at different frames, and the phoneme of a cell that has developed the highest activity is determined as a winner at the corresponding frame to produce a list of winners at respective frames, the processing unit outputting a phoneme as a recognition result for the input signal in accordance with the list of the winners at the respective frames that have been determined.

12.
The device according to claim 11, wherein the distribution function representing the training data is a Gaussian probability density function.

13. The device according to claim 11, wherein the processing unit is adapted to numerically solve two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers to produce the cell having the highest activity at each of the frames.

14. The device according to claim 13, wherein the processing unit is adapted to numerically solve two-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective two layers by numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(α_u^a)

where λ_u^a represents the similarity measure between the input data at a certain frame u and a particular phoneme /a/, ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, A, B, D, τ1, and τ2 are constants, and f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

where w and h are constants.

15. The device according to claim 11, wherein the processing unit is adapted to numerically solve three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers to produce the cell having the highest activity at each of the frames.

16.
The device according to claim 15, wherein the processing unit is adapted to numerically solve three-layered coupled neural net differential equations governing development of the time-dependent activities of the cells of respective three layers by numerically and recurrently solving coupled differential pattern recognition equations given by the following equations:

τ1 dξ_u^a(t)/dt = −ξ_u^a(t) + f(β_u^a)
τ3 dβ_u^a/dt = −β_u^a + g(α_u^a) + g(ξ_u^a)

where ξ_u^a(t) is the time-dependent activity of a cell in an output layer, the highest activities thereof being to be determined, α_u^a(t) represents the time-dependent activity level of a cell in a first input layer to which the similarity measure λ_u^a is inputted, β_u^a represents a cell in a middle layer, f(x) and g(u) are given by:

f(x) = (tanh(w(x − h)) + 1)/2
g(u) = u+ = (u + |u|)/2

and A, B, D, w, h, τ1, τ2, and τ3 are constants.

17. The device according to claim 11, wherein the processing unit further identifies at least one of a word and a sentence based upon the recognition result.

18. The device according to claim 11, further comprising a display monitor for displaying the recognition result.

19. The device according to claim 11, wherein the processing unit processes only phonemes that have the highest to fifth highest similarity measures.

20. The device according to claim 11, wherein the signal input unit receives a continuous speech through a plurality of successive time windows each having a fixed time interval.