IPC Classification Information
Country/Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th ed.) |
Application No. | US-0285801 (2014-05-23)
Registration No. | US-9484022 (2016-11-01)
Inventors / Address |
Applicant / Address |
Agent / Address |
Citation Info | Cited by: 1 / Cited patents: 14
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes generating a plurality of feature vectors that each model a different portion of an audio waveform, generating a first posterior probability vector for a first feature vector using a first neural network, determining whether one of the scores in the first posterior probability vector satisfies a first threshold value, generating a second posterior probability vector for each subsequent feature vector using a second neural network, wherein the second neural network is trained to identify the same key words and key phrases and includes more inner layer nodes than the first neural network, and determining whether one of the scores in the second posterior probability vector satisfies a second threshold value.
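The cascade the abstract describes can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the two "networks" below are stand-in scoring functions, and the threshold values are assumptions chosen only so that the second threshold is more restrictive than the first, as the claims require.

```python
import math

FIRST_THRESHOLD = 0.4   # permissive first-stage threshold (assumed value)
SECOND_THRESHOLD = 0.8  # more restrictive second-stage threshold (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def small_network(feature_vector):
    """Stand-in for the first (small, fast) neural network. Returns a
    posterior probability vector: one score per key word or key phrase
    the network is trained to identify."""
    return [sigmoid(sum(feature_vector) / len(feature_vector))]

def large_network(feature_vector):
    """Stand-in for the second network, which the claims describe as
    larger (more inner-layer nodes) and more accurate."""
    return [sigmoid(max(feature_vector))]

def detect_keyword(feature_vectors):
    """Run the cascade: the large network is consulted only after the
    small network's posterior satisfies the first threshold."""
    first, *subsequent = feature_vectors
    if max(small_network(first)) < FIRST_THRESHOLD:
        return False  # cheap early reject; large network never runs
    # Per claim 2, the first feature vector is buffered and replayed to
    # the second network along with the subsequent vectors.
    for fv in [first] + subsequent:
        if max(large_network(fv)) >= SECOND_THRESHOLD:
            return True  # keyword confirmed by the stricter check
    return False
```

The design point is power and compute: the small network runs on every frame (claims 6-7 suggest a DSP at a fixed clock frequency), and the larger network only wakes up after a cheap provisional hit.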
Representative Claims
1. A method comprising: receiving a digital representation of speech; generating a plurality of feature vectors that each model a different portion of an audio waveform from the digital representation of speech during a different period of time, the plurality of feature vectors including a first feature vector and subsequent feature vectors; generating a first posterior probability vector for the first feature vector using a first neural network, the first posterior probability vector comprising one score for each key word or key phrase which the first neural network is trained to identify, wherein the first neural network is trained to identify one or more key words or one or more key phrases; determining whether one of the scores in the first posterior probability vector satisfies a first threshold value using a first posterior handling module; and in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value and for each of the feature vectors: generating a second posterior probability vector for the respective feature vector using a second neural network, wherein the second neural network is trained to identify the same one or more key words or one or more key phrases as the first neural network, and comprises more inner layer nodes than the first neural network, and the second posterior probability vector comprises one score for each key word or key phrase which the second neural network is trained to identify; determining whether one of the scores in the second posterior probability vector satisfies a second threshold value using a second posterior handling module, the second threshold value being more restrictive than the first threshold value; and in response to determining that one of the scores in the second posterior probability vector satisfies the second threshold value, determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify.
2. The method of claim 1, comprising: storing the first feature vector in a memory; and providing the first feature vector from the memory to the second neural network in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value.
3. The method of claim 1, comprising: generating a third posterior probability vector for each of the subsequent feature vectors using the first neural network; and determining whether one of the scores in each of the third posterior probability vectors satisfies the first threshold value using the first posterior handling module until the first posterior handling module determines that none of the scores in a particular third posterior probability vector satisfies the first threshold value.
4. The method of claim 3, wherein the second neural network generates the second posterior probability vector and the second posterior handling module determines whether one of the scores in the second posterior probability vector satisfies the second threshold value for each of the subsequent feature vectors until the first posterior handling module determines that none of the scores in the particular third posterior probability vector satisfies the first threshold value.
5. The method of claim 1, wherein the second neural network receives each of the subsequent feature vectors from a front-end feature extraction module.
6. The method of claim 1, comprising: identifying a predetermined clock frequency for a processor to perform the generation of the first posterior probability vector for the first feature vector using the first neural network.
7. The method of claim 6, wherein the processor is a digital signal processor.
8. The method of claim 1, wherein the first neural network comprises a higher false positive rate than the second neural network.
9. The method of claim 1, wherein the first posterior handling module and the second posterior handling module comprise the same posterior handling module.
10. The method of claim 1, wherein the first threshold value and the second threshold value are decimal values between zero and one, inclusive.
11. The method of claim 1, wherein the second neural network is more accurate than the first neural network.
12. The method of claim 1, comprising: analyzing additional digital representations of speech that are generated after the digital representation of speech for commands in response to determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify; and providing data representing a command included in at least one of the additional digital representations of speech.
13. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: receiving a digital representation of speech; generating a plurality of feature vectors that each model a different portion of an audio waveform from the digital representation of speech during a different period of time, the plurality of feature vectors including a first feature vector and subsequent feature vectors; generating a first posterior probability vector for the first feature vector using a first neural network, the first posterior probability vector comprising one score for each key word or key phrase which the first neural network is trained to identify, wherein the first neural network is trained to identify one or more key words or one or more key phrases; determining whether one of the scores in the first posterior probability vector satisfies a first threshold value using a first posterior handling module; and in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value and for each of the feature vectors: generating a second posterior probability vector for the respective feature vector using a second neural network, wherein the second neural network is trained to identify the same one or more key words or one or more key phrases as the first neural network, and comprises more inner layer nodes than the first neural network, and the second posterior probability vector comprises one score for each key word or key phrase which the second neural network is trained to identify; determining whether one of the scores in the second posterior probability vector satisfies a second threshold value using a second posterior handling module, the second threshold value being more restrictive than the first threshold value; and in response to determining that one of the scores in the second posterior probability vector satisfies the second threshold value, determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify.
14. The computer-readable medium of claim 13, the operations comprising: storing the first feature vector in a memory; and providing the first feature vector from the memory to the second neural network in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value.
15. The computer-readable medium of claim 13, the operations comprising: generating a third posterior probability vector for each of the subsequent feature vectors using the first neural network; and determining whether one of the scores in each of the third posterior probability vectors satisfies the first threshold value using the first posterior handling module until the first posterior handling module determines that none of the scores in a particular third posterior probability vector satisfies the first threshold value.
16. The computer-readable medium of claim 15, wherein the second neural network generates the second posterior probability vector and the second posterior handling module determines whether one of the scores in the second posterior probability vector satisfies the second threshold value for each of the subsequent feature vectors until the first posterior handling module determines that none of the scores in the particular third posterior probability vector satisfies the first threshold value.
17. The computer-readable medium of claim 13, wherein the second neural network receives each of the subsequent feature vectors from a front-end feature extraction module.
18. The computer-readable medium of claim 13, the operations comprising: identifying a predetermined clock frequency for a processor to perform the generation of the first posterior probability vector for the first feature vector using the first neural network.
19. The computer-readable medium of claim 18, wherein the processor is a digital signal processor.
20. The computer-readable medium of claim 13, wherein the first neural network comprises a higher false positive rate than the second neural network.
21. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a digital representation of speech; generating a plurality of feature vectors that each model a different portion of an audio waveform from the digital representation of speech during a different period of time, the plurality of feature vectors including a first feature vector and subsequent feature vectors; generating a first posterior probability vector for the first feature vector using a first neural network, the first posterior probability vector comprising one score for each key word or key phrase which the first neural network is trained to identify, wherein the first neural network is trained to identify one or more key words or one or more key phrases; determining whether one of the scores in the first posterior probability vector satisfies a first threshold value using a first posterior handling module; and in response to determining that one of the scores in the first posterior probability vector satisfies the first threshold value and for each of the feature vectors: generating a second posterior probability vector for the respective feature vector using a second neural network, wherein the second neural network is trained to identify the same one or more key words or one or more key phrases as the first neural network, and comprises more inner layer nodes than the first neural network, and the second posterior probability vector comprises one score for each key word or key phrase which the second neural network is trained to identify; determining whether one of the scores in the second posterior probability vector satisfies a second threshold value using a second posterior handling module, the second threshold value being more restrictive than the first threshold value; and in response to determining that one of the scores in the second posterior probability vector satisfies the second threshold value, determining that the digital representation of speech contains a representation of one of the one or more key words or one or more key phrases which the first neural network and the second neural network are trained to identify.
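Claims 3 and 4 describe a streaming gate: the first network keeps scoring subsequent feature vectors, and the second network stays engaged only while the first network's scores continue to satisfy the permissive first threshold. A minimal sketch of that control flow, with illustrative names and the networks injected as plain callables (none of this is the patented implementation):

```python
def stream_detect(frames, small_net, large_net,
                  first_threshold, second_threshold):
    """Gate the expensive second network with the cheap first one, per
    claims 3-4: large_net only runs while small_net's score satisfies
    the less restrictive first_threshold."""
    engaged = False
    for frame in frames:
        gate_open = max(small_net(frame)) >= first_threshold
        if not gate_open:
            engaged = False  # claim 3: first network stopped firing
            continue
        engaged = True       # second network engaged for this frame
        if max(large_net(frame)) >= second_threshold:
            return True      # stricter second check confirms the keyword
    return False
```

A quick way to exercise it is with threshold-style stand-ins, e.g. `small_net = lambda f: [0.9 if f > 0 else 0.1]` and `large_net = lambda f: [0.95 if f > 1 else 0.2]`: a run of loud frames detects, quiet frames never open the gate, and a borderline frame opens the gate but fails the stricter second check.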