IPC Classification Information
Country/Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th ed.) |
Application No. | US-0285801 (2014-05-23)
Registration No. | US-9484022 (2016-11-01)
Inventors / Address |
Applicant / Address |
Agent / Address |
Citation Info | Cited by: 1 / Cited patents: 14
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes generating a plurality of feature vectors that each model a different portion of an audio waveform, generating a first posterior probability vector for a first feature vector using a first neural network, determining whether one of the scores in the first posterior probability vector satisfies a first threshold value, generating a second posterior probability vector for each subsequent feature vector using a second neural network, wherein the second neural network is trained to identify the same key words and key phrases and includes more inner layer nodes than the first neural network, and determining whether one of the scores in the second posterior probability vector satisfies a second threshold value.
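The cascade the abstract describes can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the two "networks" below are stand-in scoring functions, and the threshold values are assumptions chosen only so that the second threshold is more restrictive than the first, as the claims require.

```python
import math

FIRST_THRESHOLD = 0.4   # permissive first-stage threshold (assumed value)
SECOND_THRESHOLD = 0.8  # more restrictive second-stage threshold (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def small_network(feature_vector):
    """Stand-in for the first (small, fast) neural network. Returns a
    posterior probability vector: one score per key word or key phrase
    the network is trained to identify."""
    return [sigmoid(sum(feature_vector) / len(feature_vector))]

def large_network(feature_vector):
    """Stand-in for the second network, which the claims describe as
    larger (more inner-layer nodes) and more accurate."""
    return [sigmoid(max(feature_vector))]

def detect_keyword(feature_vectors):
    """Run the cascade: the large network is consulted only after the
    small network's posterior satisfies the first threshold."""
    first, *subsequent = feature_vectors
    if max(small_network(first)) < FIRST_THRESHOLD:
        return False  # cheap early reject; large network never runs
    # Per claim 2, the first feature vector is buffered and replayed to
    # the second network along with the subsequent vectors.
    for fv in [first] + subsequent:
        if max(large_network(fv)) >= SECOND_THRESHOLD:
            return True  # keyword confirmed by the stricter check
    return False
```

The design point is power and compute: the small network runs on every frame (claims 6-7 suggest a DSP at a fixed clock frequency), and the larger network only wakes up after a cheap provisional hit.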
Representative Claims
1. A method comprising: receiving a digital representation of speech; generating a plurality of feature vectors that each model a different portion of an audio waveform from the digital representation of speech during a different period of time, the plurality of feature vectors including a first feature vector and subsequent feature vectors; generating a first posterior probability vector for the first feature vector using a first neural network, the first posterior probability vector comprising one score for each key word or key phrase which the first neural network is trained to identify, wherein the first neural network is trained to identify one or more key words or one or more key phrases; determining whether one of the scores in the first posterior probability vector satisfies a first threshold value using a first posterior handling module; and in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value and for each of the feature vectors: generating a second posterior probability vector for the respective feature vector using a second neural network, wherein the second neural network is trained to identify the same one or more key words or one or more key phrases as the first neural network, and comprises more inner layer nodes than the first neural network, and the second posterior probability vector comprises one score for each key word or key phrase which the second neural network is trained to identify; determining whether one of the scores in the second posterior probability vector satisfies a second threshold value using a second posterior handling module, the second threshold value being more restrictive than the first threshold value; and in response to determining that one of the scores in the second posterior probability vector satisfies the second threshold value, determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify.
2. The method of claim 1, comprising: storing the first feature vector in a memory; and providing the first feature vector from the memory to the second neural network in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value.
3. The method of claim 1, comprising: generating a third posterior probability vector for each of the subsequent feature vectors using the first neural network; and determining whether one of the scores in each of the third posterior probability vectors satisfies the first threshold value using the first posterior handling module until the first posterior handling module determines that none of the scores in a particular third posterior probability vector satisfies the first threshold value.
4. The method of claim 3, wherein the second neural network generates the second posterior probability vector and the second posterior handling module determines whether one of the scores in the second posterior probability vector satisfies the second threshold value for each of the subsequent feature vectors until the first posterior handling module determines that none of the scores in the particular third posterior probability vector satisfies the first threshold value.
5. The method of claim 1, wherein the second neural network receives each of the subsequent feature vectors from a front-end feature extraction module.
6. The method of claim 1, comprising: identifying a predetermined clock frequency for a processor to perform the generation of the first posterior probability vector for the first feature vector using the first neural network.
7. The method of claim 6, wherein the processor is a digital signal processor.
8. The method of claim 1, wherein the first neural network comprises a higher false positive rate than the second neural network.
9. The method of claim 1, wherein the first posterior handling module and the second posterior handling module comprise the same posterior handling module.
10. The method of claim 1, wherein the first threshold value and the second threshold value are decimal values between zero and one, inclusive.
11. The method of claim 1, wherein the second neural network is more accurate than the first neural network.
12. The method of claim 1, comprising: analyzing additional digital representations of speech that are generated after the digital representation of speech for commands in response to determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify; and providing data representing a command included in at least one of the additional digital representations of speech.
13. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: receiving a digital representation of speech; generating a plurality of feature vectors that each model a different portion of an audio waveform from the digital representation of speech during a different period of time, the plurality of feature vectors including a first feature vector and subsequent feature vectors; generating a first posterior probability vector for the first feature vector using a first neural network, the first posterior probability vector comprising one score for each key word or key phrase which the first neural network is trained to identify, wherein the first neural network is trained to identify one or more key words or one or more key phrases; determining whether one of the scores in the first posterior probability vector satisfies a first threshold value using a first posterior handling module; and in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value and for each of the feature vectors: generating a second posterior probability vector for the respective feature vector using a second neural network, wherein the second neural network is trained to identify the same one or more key words or one or more key phrases as the first neural network, and comprises more inner layer nodes than the first neural network, and the second posterior probability vector comprises one score for each key word or key phrase which the second neural network is trained to identify; determining whether one of the scores in the second posterior probability vector satisfies a second threshold value using a second posterior handling module, the second threshold value being more restrictive than the first threshold value; and in response to determining that one of the scores in the second posterior probability vector satisfies the second threshold value, determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify.
14. The computer-readable medium of claim 13, the operations comprising: storing the first feature vector in a memory; and providing the first feature vector from the memory to the second neural network in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value.
15. The computer-readable medium of claim 13, the operations comprising: generating a third posterior probability vector for each of the subsequent feature vectors using the first neural network; and determining whether one of the scores in each of the third posterior probability vectors satisfies the first threshold value using the first posterior handling module until the first posterior handling module determines that none of the scores in a particular third posterior probability vector satisfies the first threshold value.
16. The computer-readable medium of claim 15, wherein the second neural network generates the second posterior probability vector and the second posterior handling module determines whether one of the scores in the second posterior probability vector satisfies the second threshold value for each of the subsequent feature vectors until the first posterior handling module determines that none of the scores in the particular third posterior probability vector satisfies the first threshold value.
17. The computer-readable medium of claim 13, wherein the second neural network receives each of the subsequent feature vectors from a front-end feature extraction module.
18. The computer-readable medium of claim 13, the operations comprising: identifying a predetermined clock frequency for a processor to perform the generation of the first posterior probability vector for the first feature vector using the first neural network.
19. The computer-readable medium of claim 18, wherein the processor is a digital signal processor.
20. The computer-readable medium of claim 13, wherein the first neural network comprises a higher false positive rate than the second neural network.
21. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a digital representation of speech; generating a plurality of feature vectors that each model a different portion of an audio waveform from the digital representation of speech during a different period of time, the plurality of feature vectors including a first feature vector and subsequent feature vectors; generating a first posterior probability vector for the first feature vector using a first neural network, the first posterior probability vector comprising one score for each key word or key phrase which the first neural network is trained to identify, wherein the first neural network is trained to identify one or more key words or one or more key phrases; determining whether one of the scores in the first posterior probability vector satisfies a first threshold value using a first posterior handling module; and in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value and for each of the feature vectors: generating a second posterior probability vector for the respective feature vector using a second neural network, wherein the second neural network is trained to identify the same one or more key words or one or more key phrases as the first neural network, and comprises more inner layer nodes than the first neural network, and the second posterior probability vector comprises one score for each key word or key phrase which the second neural network is trained to identify; determining whether one of the scores in the second posterior probability vector satisfies a second threshold value using a second posterior handling module, the second threshold value being more restrictive than the first threshold value; and in response to determining that one of the scores in the second posterior probability vector satisfies the second threshold value, determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify.
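Claims 3 and 4 describe a streaming gate: the first network keeps scoring subsequent feature vectors, and the second network stays engaged only while the first network's scores continue to satisfy the permissive first threshold. A minimal sketch of that control flow, with illustrative names and the networks injected as plain callables (none of this is the patented implementation):

```python
def stream_detect(frames, small_net, large_net,
                  first_threshold, second_threshold):
    """Gate the expensive second network with the cheap first one, per
    claims 3-4: large_net only runs while small_net's score satisfies
    the less restrictive first_threshold."""
    engaged = False
    for frame in frames:
        gate_open = max(small_net(frame)) >= first_threshold
        if not gate_open:
            engaged = False  # claim 3: first network stopped firing
            continue
        engaged = True       # second network engaged for this frame
        if max(large_net(frame)) >= second_threshold:
            return True      # stricter second check confirms the keyword
    return False
```

A quick way to exercise it is with threshold-style stand-ins, e.g. `small_net = lambda f: [0.9 if f > 0 else 0.1]` and `large_net = lambda f: [0.95 if f > 1 else 0.2]`: a run of loud frames detects, quiet frames never open the gate, and a borderline frame opens the gate but fails the stricter second check.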