Methods and apparatus related to training speech recognition devices are presented. A computing device receives training samples for training a neural network to learn an acoustic speech model. A curriculum function for speech modeling can be determined. For each training sample of the training samples, a corresponding curriculum function value for the training sample can be determined using the curriculum function. The training samples can be ordered based on the corresponding curriculum function values. In some embodiments, the neural network can be trained utilizing the ordered training samples. The trained neural network can receive an input of a second plurality of samples corresponding to human speech, where the second plurality of samples differs from the training samples. In response to receiving the second plurality of samples, the trained neural network can generate a plurality of phones corresponding to the captured human speech.
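The ordering step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The `TrainingSample` container, its field names, the use of a signal-to-noise ratio as a stand-in for sound quality, the linear form of `difficulty`, and the weights are all assumptions made for illustration; the source only states that a curriculum function value is computed per sample and that the samples are ordered by that value.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TrainingSample:
    """Hypothetical container; the field names are illustrative only."""
    sample_id: str
    duration_s: float  # duration of the sample, in seconds
    snr_db: float      # assumed stand-in for a sound quality value


def difficulty(sample: TrainingSample,
               w_duration: float = 1.0,
               w_quality: float = 0.1) -> float:
    """Assumed curriculum function: longer and noisier samples score higher.

    The linear combination and both weights are hypothetical.
    """
    return w_duration * sample.duration_s - w_quality * sample.snr_db


def order_by_curriculum(
    samples: Sequence[TrainingSample],
    curriculum_fn: Callable[[TrainingSample], float],
) -> List[TrainingSample]:
    """Return the samples sorted from lowest to highest curriculum value."""
    return sorted(samples, key=curriculum_fn)


samples = [
    TrainingSample("a", duration_s=0.30, snr_db=10.0),
    TrainingSample("b", duration_s=0.10, snr_db=30.0),
    TrainingSample("c", duration_s=0.20, snr_db=20.0),
]
ordered = order_by_curriculum(samples, difficulty)
ordered_ids = [s.sample_id for s in ordered]
```

In this sketch the ordered list would then be fed to whatever training loop is in use, lowest value first. Whether longer or lower-quality samples should count as harder is a design choice the sketch assumes rather than one taken from the text.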
Representative claims
1. A method, comprising: receiving, at a computing device, training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the training samples represents at least one phone of captured speech; determining a curriculum function for acoustic speech modeling, wherein the curriculum function assigns a difficulty value for a designated training sample of the training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample; for each training sample of the training samples, determining a corresponding difficulty value for the training sample using the curriculum function; ordering the training samples based on the corresponding difficulty values for the training samples; presenting the ordered training samples to the neural network using the computing device to train the neural network on at least a portion of the acoustic speech model; and recognizing a received speech sample using the trained neural network.
2. The method of claim 1, wherein at least one training sample of the training samples represents a triphone of captured speech.
3. The method of claim 1, wherein a training sample corresponding to a lower difficulty value is less difficult for the neural network to process than a training sample corresponding to a higher difficulty value.
4. The method of claim 1, wherein ordering the training samples based on the corresponding difficulty values for the training samples comprises ordering the training samples so that training samples corresponding to lower difficulty values are presented to the neural network before samples corresponding to higher difficulty values.
5. The method of claim 1, wherein the combination further comprises an estimate from a posteriori estimator function.
6. The method of claim 1, wherein the combination further comprises a previously trained neural network probability value for the particular sample.
7. The method of claim 1, further comprising: receiving, as an input to the trained neural network, a second plurality of samples corresponding to captured human speech, wherein at least some samples in the second plurality of samples differ from the training samples; and in response to receiving the second plurality of samples, generating a plurality of phones corresponding to the captured human speech using the trained neural network.
8. A method, comprising: receiving, at a computing device, a plurality of training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the plurality of training samples represents at least one phone of captured speech; training the neural network to learn the acoustic speech model based on an ordering of a plurality of tasks using the computing device, wherein the ordering is based on a curriculum function that assigns a difficulty value to a designated training sample of the plurality of training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample; for each task in the ordered plurality of tasks, using the computing device for: selecting one or more training samples from the plurality of training samples for the task, and teaching the task to the neural network by presenting the selected one or more training samples to the neural network; and recognizing a received speech sample using the trained neural network.
9. The method of claim 8, wherein determining the ordering of the plurality of tasks comprises: determining a number of outputs for each task in the plurality of tasks; and ordering the plurality of tasks based on the plurality of numbers of outputs.
10. The method of claim 8, wherein selecting the one or more training samples comprises selecting all of the plurality of training samples as the one or more training samples.
11. The method of claim 8, wherein the tasks are ordered based on a task hierarchy that has a plurality of levels, and wherein a task T1 at a specified level L1 of the task hierarchy is less difficult than each task at a level L2, and wherein the level L2 is below the specified level L1 in the task hierarchy.
12. The method of claim 8, wherein the plurality of tasks comprises a task associated with phones, a task associated with context-independent sounds, and a task associated with context-dependent sounds.
13. A computing device, comprising: a processor; and a computer-readable storage medium having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform operations comprising: receiving training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the training samples represents at least one phone of captured speech, determining a curriculum function for acoustic speech modeling, wherein the curriculum function assigns a difficulty value for a designated training sample of the training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample, for each training sample of the training samples, determining a corresponding difficulty value for the training sample using the curriculum function, ordering the training samples based on the corresponding difficulty values for the training samples, presenting the ordered training samples to train the neural network on at least a portion of the acoustic speech model, and recognizing a received speech sample using the trained neural network.
14. The computing device of claim 13, wherein at least one training sample of the training samples represents a triphone of captured speech.
15. The computing device of claim 13, wherein a sample corresponding to a lower difficulty value is less difficult for the neural network to process than a sample corresponding to a higher difficulty value.
16. The computing device of claim 13, wherein ordering the training samples based on the corresponding difficulty values for the training samples comprises ordering the training samples so that training samples corresponding to lower difficulty values are presented to the neural network before samples corresponding to higher difficulty values.
17. The computing device of claim 13, wherein the combination further comprises an estimate from a posteriori estimator function.
18. An article of manufacture including a non-transitory computer-readable storage medium having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising: receiving training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the training samples represents at least one phone of captured speech; determining a curriculum function for acoustic speech modeling, wherein the curriculum function assigns a difficulty value for a designated training sample of the training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample; for each training sample of the training samples, determining a corresponding difficulty value for the training sample using the curriculum function; ordering the training samples based on the corresponding difficulty values for the training samples; presenting the ordered training samples to train the neural network on at least a portion of the acoustic speech model; and recognizing a received speech sample using the trained neural network.
19. The article of manufacture of claim 18, wherein at least one training sample of the training samples represents a triphone of captured speech.
20. The article of manufacture of claim 18, wherein a sample corresponding to a lower difficulty value is less difficult for the neural network to process than a sample corresponding to a higher difficulty value.
21. The article of manufacture of claim 18, wherein ordering the training samples based on the corresponding difficulty values for the training samples comprises ordering the training samples so that training samples corresponding to lower difficulty values are presented to the neural network before samples corresponding to higher difficulty values.
22. The article of manufacture of claim 18, wherein the combination further comprises an estimate from a posteriori estimator function.
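The task-based variant in claims 8 to 12 can be sketched in the same hedged way. The Python below is only an illustration under stated assumptions: the `Task` type, the output counts, the ascending direction of the ordering, and the `select` and `teach` callbacks are hypothetical. The claims say only that tasks are ordered based on their numbers of outputs, that samples are selected per task, and that each task is taught by presenting those samples.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; the output counts used below are made up."""
    name: str
    num_outputs: int


def order_tasks(tasks: Sequence[Task]) -> List[Task]:
    """Order tasks by ascending number of outputs (assumed to mean easy first)."""
    return sorted(tasks, key=lambda t: t.num_outputs)


def train_on_tasks(
    tasks: Sequence[Task],
    samples: Sequence[str],
    select: Callable[[Task, Sequence[str]], List[str]],
    teach: Callable[[Task, List[str]], None],
) -> List[Tuple[str, int]]:
    """For each task in order, select samples and present them to the learner."""
    schedule = []
    for task in order_tasks(tasks):
        chosen = select(task, samples)
        teach(task, chosen)  # stand-in for one training phase
        schedule.append((task.name, len(chosen)))
    return schedule


tasks = [
    Task("context-dependent sounds", num_outputs=8000),
    Task("phones", num_outputs=40),
    Task("context-independent sounds", num_outputs=120),
]
samples = ["utt1", "utt2", "utt3"]
taught: List[str] = []


def select_all(task: Task, pool: Sequence[str]) -> List[str]:
    """Use every sample for every task, as claim 10 permits."""
    return list(pool)


def record_teach(task: Task, chosen: List[str]) -> None:
    """Placeholder trainer that only records which task was taught."""
    taught.append(task.name)


schedule = train_on_tasks(tasks, samples, select_all, record_teach)
```

A real trainer would replace `record_teach` with a routine that updates the network's weights. The sketch only shows how the schedule of tasks could be derived from the output counts.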