[Patent] Curriculum learning for speech recognition


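IPC classification information
Country / Type	United States (US) Patent, granted
International Patent Classification (IPC, 7th edition)	G10L-015/00 G06F-015/18 G10L-015/16 G06N-003/02
Application number	US-0859692 (2013-04-09)
Registration number	US-9202464 (2015-12-01)
Inventor / Address	Senior, Andrew William; Ranzato, Marc'Aurelio
Applicant / Address	Google Inc.
Agent / Address	McDonnell Boehnen Hulbert & Berghoff LLP
Citation information	Times cited: 2; Patents cited: 6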

Abstract

Methods and apparatus related to training speech recognition devices are presented. A computing device receives training samples for training a neural network to learn an acoustic speech model. A curriculum function for speech modeling can be determined. For each training sample of the training samples, a corresponding curriculum function value for the training sample can be determined using the curriculum function. The training samples can be ordered based on the corresponding curriculum function values. In some embodiments, the neural network can be trained utilizing the ordered training samples. The trained neural network can receive an input of a second plurality of samples corresponding to human speech, where the second plurality of samples differs from the training samples. In response to receiving the second plurality of samples, the trained neural network can generate a plurality of phones corresponding to the captured human speech.
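The ordering step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `order_by_curriculum`, the use of frame count as the curriculum value, and the choice of ascending order are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def order_by_curriculum(
    samples: Sequence[T], curriculum_fn: Callable[[T], float]
) -> List[T]:
    """Return the samples sorted by ascending curriculum function value.

    Ascending order is an assumption of this sketch; ties keep their
    original relative order because the original index is the tiebreaker.
    """
    scored = [(curriculum_fn(s), i, s) for i, s in enumerate(samples)]
    scored.sort(key=lambda t: (t[0], t[1]))
    return [s for _, _, s in scored]


# Hypothetical data: each sample is a list of acoustic frames, and the
# curriculum value is simply the number of frames (an illustrative stand-in).
samples = [[0.1, 0.2, 0.3], [0.5], [0.4, 0.6]]
ordered = order_by_curriculum(samples, curriculum_fn=len)
print(ordered)  # [[0.5], [0.4, 0.6], [0.1, 0.2, 0.3]]
```

The ordered list would then be fed to the training loop in sequence; the training step itself is outside the scope of this sketch.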

Representative claims

1. A method, comprising: receiving, at a computing device, training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the training samples represents at least one phone of captured speech; determining a curriculum function for acoustic speech modeling, wherein the curriculum function assigns a difficulty value for a designated training sample of the training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample; for each training sample of the training samples, determining a corresponding difficulty value for the training sample using the curriculum function; ordering the training samples based on the corresponding difficulty values for the training samples; presenting the ordered training samples to the neural network using the computing device to train the neural network on at least a portion of the acoustic speech model; and recognizing a received speech sample using the trained neural network.
2. The method of claim 1, wherein at least one training sample of the training samples represents a triphone of captured speech.
3. The method of claim 1, wherein a training sample corresponding to a lower difficulty value is less difficult for the neural network to process than a training sample corresponding to a higher difficulty value.
4. The method of claim 1, wherein ordering the training samples based on the corresponding difficulty values for the training samples comprises ordering the training samples so that training samples corresponding to lower difficulty values are presented to the neural network before samples corresponding to higher difficulty values.
5. The method of claim 1, wherein the combination further comprises an estimate from a posteriori estimator function.
6. The method of claim 1, wherein the combination further comprises a previously trained neural network probability value for the particular sample.
7. The method of claim 1, further comprising: receiving, as an input to the trained neural network, a second plurality of samples corresponding to captured human speech, wherein at least some samples in the second plurality of samples differ from the training samples; and in response to receiving the second plurality of samples, generating a plurality of phones corresponding to the captured human speech using the trained neural network.
8. A method, comprising: receiving, at a computing device, a plurality of training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the plurality of training samples represents at least one phone of captured speech; training the neural network to learn the acoustic speech model based on an ordering of a plurality of tasks using the computing device, wherein the ordering is based on a curriculum function that assigns a difficulty value to a designated training sample of the plurality of training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample; for each task in the ordered plurality of tasks, using the computing device for: selecting one or more training samples from the plurality of training samples for the task, and teaching the task to the neural network by presenting the selected one or more training samples to the neural network; and recognizing a received speech sample using the trained neural network.
9. The method of claim 8, wherein determining the ordering of the plurality of tasks comprises: determining a number of outputs for each task in the plurality of tasks; and ordering the plurality of tasks based on the plurality of numbers of outputs.
10. The method of claim 8, wherein selecting the one or more training samples comprises selecting all of the plurality of training samples as the one or more training samples.
11. The method of claim 8, wherein the tasks are ordered based on a task hierarchy that has a plurality of levels, and wherein a task T1 at a specified level L1 of the task hierarchy is less difficult than each task at a level L2, and wherein the level L2 is below the specified level L1 in the task hierarchy.
12. The method of claim 8, wherein the plurality of tasks comprises a task associated with phones, a task associated with context-independent sounds, and a task associated with context-dependent sounds.
13. A computing device, comprising: a processor; and a computer-readable storage medium having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform operations comprising: receiving training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the training samples represents at least one phone of captured speech, determining a curriculum function for acoustic speech modeling, wherein the curriculum function assigns a difficulty value for a designated training sample of the training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample, for each training sample of the training samples, determining a corresponding difficulty value for the training sample using the curriculum function, ordering the training samples based on the corresponding difficulty values for the training samples, presenting the ordered training samples to train the neural network on at least a portion of the acoustic speech model, and recognizing a received speech sample using the trained neural network.
14. The computing device of claim 13, wherein at least one training sample of the training samples represents a triphone of captured speech.
15. The computing device of claim 13, wherein a sample corresponding to a lower difficulty value is less difficult for the neural network to process than a sample corresponding to a higher difficulty value.
16. The computing device of claim 13, wherein ordering the training samples based on the corresponding difficulty values for the training samples comprises ordering the training samples so that training samples corresponding to lower difficulty values are presented to the neural network before samples corresponding to higher difficulty values.
17. The computing device of claim 13, wherein the combination further comprises an estimate from a posteriori estimator function.
18. An article of manufacture including a non-transitory computer-readable storage medium having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising: receiving training samples for training a neural network to learn an acoustic speech model, wherein at least one training sample of the training samples represents at least one phone of captured speech; determining a curriculum function for acoustic speech modeling, wherein the curriculum function assigns a difficulty value for a designated training sample of the training samples based on a combination comprising a duration value for the designated training sample and a sound quality value for the designated training sample; for each training sample of the training samples, determining a corresponding difficulty value for the training sample using the curriculum function; ordering the training samples based on the corresponding difficulty values for the training samples; presenting the ordered training samples to train the neural network on at least a portion of the acoustic speech model; and recognizing a received speech sample using the trained neural network.
19. The article of manufacture of claim 18, wherein at least one training sample of the training samples represents a triphone of captured speech.
20. The article of manufacture of claim 18, wherein a sample corresponding to a lower difficulty value is less difficult for the neural network to process than a sample corresponding to a higher difficulty value.
21. The article of manufacture of claim 18, wherein ordering the training samples based on the corresponding difficulty values for the training samples comprises ordering the training samples so that training samples corresponding to lower difficulty values are presented to the neural network before samples corresponding to higher difficulty values.
22. The article of manufacture of claim 18, wherein the combination further comprises an estimate from a posteriori estimator function.
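As a rough illustration of claims 1, 4, and 9, the sketch below assigns each sample a difficulty value from a duration value and a sound quality value, orders samples from lower to higher difficulty, and orders tasks by their number of outputs. The claims do not specify how the two values are combined or which direction counts as harder, so the weighted sum, the assumption that longer and noisier samples are harder, the ascending task order, and every name and number in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingSample:
    label: str
    duration: float  # duration value (hypothetical unit: seconds)
    quality: float   # sound quality in [0, 1], higher is cleaner (assumed scale)


def difficulty(sample: TrainingSample,
               w_duration: float = 1.0,
               w_quality: float = 1.0) -> float:
    """Assumed combination: a weighted sum in which longer duration and
    lower sound quality both raise the difficulty value."""
    return w_duration * sample.duration + w_quality * (1.0 - sample.quality)


def order_samples(samples: List[TrainingSample]) -> List[TrainingSample]:
    """Lower difficulty values first, as in claim 4."""
    return sorted(samples, key=difficulty)


@dataclass(frozen=True)
class Task:
    name: str
    num_outputs: int  # number of network outputs the task requires


def order_tasks(tasks: List[Task]) -> List[Task]:
    """Order tasks by number of outputs (claim 9); ascending is an assumption."""
    return sorted(tasks, key=lambda t: t.num_outputs)


# Hypothetical samples and tasks, for illustration only.
samples = [
    TrainingSample("b", duration=0.30, quality=0.5),
    TrainingSample("a", duration=0.10, quality=0.9),
    TrainingSample("c", duration=0.20, quality=0.8),
]
tasks = [
    Task("context-dependent sounds", num_outputs=8000),
    Task("phones", num_outputs=40),
    Task("context-independent sounds", num_outputs=120),
]

print([s.label for s in order_samples(samples)])  # ['a', 'c', 'b']
print([t.name for t in order_tasks(tasks)])
```

A different combination (for example, one that adds a posterior estimate as in claim 5) would only change the body of `difficulty`; the ordering logic stays the same.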

Patents cited by this patent (6)

Schuster, Mike (JPX); Fukada, Toshiaki (JPX), Apparatus for calculating a posterior probability of phoneme symbol, and speech recognition apparatus.
Tomabechi, Hideto (JPX), Neural network, a method of learning of a neural network and phoneme recognition apparatus utilizing a neural network.
Nussbaum, Paul A., Operator interactions for developing phoneme recognition by neural networks.
Inazumi, Mitsuhiro (JPX), Recognition apparatus using neural network, and learning method therefor.
Walton, Donna L., System and method of developing a curriculum for stimulating cognitive processing.
Strubbe, Hugo J.; Eshelman, Larry J.; Gutta, Srinivas; Milanski, John, User interface/entertainment device that simulates personal interaction and responds to user's mental state and/or personality.

Patents citing this patent (2)

Zhu, Huifeng; Deng, Yan; Ding, Pei; Yong, Kun; Hao, Jie, Apparatus and method for training a neural network acoustic model, and speech recognition apparatus and method.
Huang, Yan; Liu, Chaojun; Kumar, Kshitiz; Kalgaonkar, Kaustubh Prakash; Gong, Yifan, Modular deep learning model.
