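Country / Type | United States (US) Patent, Registered |
---|---|
International Patent Classification (IPC 7th edition) | |
Application number | US-0626825 (2012-09-25) |
Registration number | US-8935167 (2015-01-13) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation information | Times cited: 3; Patents cited: 517 |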
Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
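The per-segment flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: training items are represented as small vectors that are presumed to be already mapped into the global latent space, similarity is measured with cosine similarity, the "focused model" is a toy set of per-label mean vectors, and the function names (`select_focused_data`, `build_focused_model`, `recognize`, `recognize_all`) and the threshold value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_focused_data(segment, corpus, threshold):
    """Keep the training items whose similarity to the segment meets the threshold."""
    return [(vec, label) for vec, label in corpus if cosine(segment, vec) >= threshold]


def build_focused_model(focused):
    """Toy stand-in for a focused model: one mean vector per label."""
    sums, counts = {}, Counter()
    for vec, label in focused:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def recognize(segment, model):
    """Return the label whose mean vector is most similar to the segment, or None."""
    if not model:
        return None
    return max(model, key=lambda label: cosine(segment, model[label]))


def recognize_all(segments, corpus, threshold=0.8):
    """For each input segment: select focused data, build a focused model, recognize."""
    results = []
    for seg in segments:
        focused = select_focused_data(seg, corpus, threshold)
        model = build_focused_model(focused)
        results.append(recognize(seg, model))
    return results


# Hypothetical training items, assumed to be already mapped into the latent space.
corpus = [
    ([1.0, 0.1, 0.0], "a"),
    ([0.9, 0.2, 0.0], "a"),
    ([0.0, 1.0, 0.1], "b"),
    ([0.1, 0.9, 0.0], "b"),
    ([0.0, 0.1, 1.0], "c"),
]
segments = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(recognize_all(segments, corpus))  # ['a', 'b']
```

In this sketch a new model is built for every input segment from only the nearby training items, which mirrors the "focused, observation-specific" idea; a real system would derive a focused latent space rather than average raw vectors.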
1. A method for recognizing speech in an output domain, the method comprising: at a device comprising one or more processors and memory: establishing a global speech recognition model based on an initial set of training data; receiving a plurality of input speech segments to be recognized in the output domain; and for each of the plurality of input speech segments: identifying in the global speech recognition model a respective set of focused training data relevant to the input speech segment; generating a respective focused speech recognition model based on the respective set of focused training data; and providing the respective focused speech recognition model to a recognition device for recognizing the input speech segment in the output domain; wherein establishing the global speech recognition model based on the initial set of training data further comprises: generating the initial set of training data from a plurality of training speech samples, the initial set of training data including an initial set of speech segments and an initial set of speech templates; and deriving a global latent space from the initial set of speech segments and the initial set of speech templates.
2. The method of claim 1, wherein the recognition device is a user device, and the plurality of input speech segments have been derived from a speech input received from a user by the user device.
3. The method of claim 1, wherein, for at least one of the plurality of input speech segments, the global speech recognition model is a respective focused speech recognition model generated in a previous iteration of the identifying and generating performed for the at least one input speech segment.
4. The method of claim 1, wherein identifying in the global speech model the respective set of focused training data relevant to the input speech segment further comprises: mapping the input speech segment and a set of candidate training data into the global latent space, the set of candidate training data including candidate speech segments and candidate speech templates; and identifying, from the candidate speech segments and candidate speech templates, a plurality of exemplar segments and a plurality of exemplar templates for inclusion in the respective set of focused training data, wherein the exemplar segments and exemplar templates satisfy a threshold degree of similarity to the input speech segment as measured in the global latent space.
5. The method of claim 4, further comprising: generating additional training data from the plurality of training speech samples, the additional training data includes additional speech segments and additional speech templates outside of the initial set of speech segments and the initial set of speech templates.
6. The method of claim 4, wherein generating the respective focused speech recognition model based on the respective set of focused training data comprises: deriving a focused latent space from the plurality of exemplar segments and the plurality of exemplar templates.
7. The method of claim 4, wherein deriving the focused latent space from the plurality of exemplar segments and the plurality of exemplar templates comprises: modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates; and deriving the focused latent space from the pluralities of exemplar segments and exemplar templates after the modification.
8. The method of claim 4, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: merging two or more of the plurality of exemplar templates into a new exemplar template in the plurality of exemplar template.
9. The method of claim 4, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: generating at least one new exemplar template from the plurality of exemplar segments; and including the at least one new exemplar template in the plurality of exemplar templates.
10. The method of claim 4, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: removing at least one exemplar template from the plurality of exemplar templates.
11. A method for recognizing speech in an output domain, the method comprising: at a client device comprising one or more processors and memory: receiving a speech input from a user; for each of a plurality of input speech segments in the speech input: receiving a respective focused speech recognition model, wherein the respective focused speech recognition model is generated based on a respective set of focused training data relevant to the input speech segment, wherein the respective set of focused training data is selected for the input speech segment in a global speech recognition model, and wherein the global speech recognition model is generated based on a set of global training data; and recognizing the input speech segment using the respective focused speech recognition model; wherein establishing the global speech recognition model based on the initial set of training data further comprises: generating the initial set of training data from a plurality of training speech samples, the initial set of training data including an initial set of speech segments and an initial set of speech templates; and deriving a global latent space from the initial set of speech segments and the initial set of speech templates.
12. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising: establishing a global speech recognition model based on an initial set of training data; receiving a plurality of input speech segments to be recognized in an output domain; and for each of the plurality of input speech segments: identifying in the global speech recognition model a respective set of focused training data relevant to the input speech segment; generating a respective focused speech recognition model based on the respective set of focused training data; and providing the respective focused speech recognition model to a recognition device for recognizing the input speech segment in the output domain; wherein establishing the global speech recognition model based on the initial set of training data further comprises: generating the initial set of training data from a plurality of training speech samples, the initial set of training data including an initial set of speech segments and an initial set of speech templates; and deriving a global latent space from the initial set of speech segments and the initial set of speech templates.
13. The computer-readable medium of claim 12, wherein identifying in the global speech model the respective set of focused training data relevant to the input speech segment further comprises: mapping the input speech segment and a set of candidate training data into the global latent space, the set of candidate training data including candidate speech segments and candidate speech templates; and identifying, from the candidate speech segments and candidate speech templates, a plurality of exemplar segments and a plurality of exemplar templates for inclusion in the respective set of focused training data, wherein the exemplar segments and exemplar templates satisfy a threshold degree of similarity to the input speech segment as measured in the global latent space.
14. The computer-readable medium of claim 13, wherein the operations further comprise: generating additional training data from the plurality of training speech samples, the additional training data includes additional speech segments and additional speech templates outside of the initial set of speech segments and the initial set of speech templates.
15. The computer-readable medium of claim 13, wherein deriving the focused latent space from the plurality of exemplar segments and the plurality of exemplar templates comprises: modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates; and deriving the focused latent space from the pluralities of exemplar segments and exemplar templates after the modification.
16. The computer-readable medium of claim 13, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: merging two or more of the plurality of exemplar templates into a new exemplar template in the plurality of exemplar template.
17. The computer-readable medium of claim 13, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: generating at least one new exemplar template from the plurality of exemplar segments; and including the at least one new exemplar template in the plurality of exemplar templates.
18. The computer-readable medium of claim 13, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: removing at least one exemplar template from the plurality of exemplar templates.
19. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising: at a client device: receiving a speech input from a user; for each of a plurality of input speech segments in the speech input: receiving a respective focused speech recognition model, wherein the respective focused speech recognition model is generated based on a respective set of focused training data relevant to the input speech segment, wherein the respective set of focused training data is selected for the input speech segment in a global speech recognition model, and wherein the global speech recognition model is generated based on a set of global training data; and recognizing the input speech segment using the respective focused speech recognition model; wherein establishing the global speech recognition model based on the initial set of training data further comprises: generating the initial set of training data from a plurality of training speech samples, the initial set of training data including an initial set of speech segments and an initial set of speech templates; and deriving a global latent space from the initial set of speech segments and the initial set of speech templates.
20. A system, comprising: one or more processors; and memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising: establishing a global speech recognition model based on an initial set of training data; receiving a plurality of input speech segments to be recognized in an output domain; and for each of the plurality of input speech segments: identifying in the global speech recognition model a respective set of focused training data relevant to the input speech segment; generating a respective focused speech recognition model based on the respective set of focused training data; and providing the respective focused speech recognition model to a recognition device for recognizing the input speech segment in the output domain; wherein establishing the global speech recognition model based on the initial set of training data further comprises: generating the initial set of training data from a plurality of training speech samples, the initial set of training data including an initial set of speech segments and an initial set of speech templates; and deriving a global latent space from the initial set of speech segments and the initial set of speech templates.
21. The system of claim 20, wherein identifying in the global speech model the respective set of focused training data relevant to the input speech segment further comprises: mapping the input speech segment and a set of candidate training data into the global latent space, the set of candidate training data including candidate speech segments and candidate speech templates; and identifying, from the candidate speech segments and candidate speech templates, a plurality of exemplar segments and a plurality of exemplar templates for inclusion in the respective set of focused training data, wherein the exemplar segments and exemplar templates satisfy a threshold degree of similarity to the input speech segment as measured in the global latent space.
22. The system of claim 21, wherein the operations further comprise: generating additional training data from the plurality of training speech samples, the additional training data includes additional speech segments and additional speech templates outside of the initial set of speech segments and the initial set of speech templates.
23. The system of claim 21, wherein deriving the focused latent space from the plurality of exemplar segments and the plurality of exemplar templates comprises: modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates; and deriving the focused latent space from the pluralities of exemplar segments and exemplar templates after the modification.
24. The system of claim 21, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: merging two or more of the plurality of exemplar templates into a new exemplar template in the plurality of exemplar template.
25. The system of claim 21, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: generating at least one new exemplar template from the plurality of exemplar segments; and including the at least one new exemplar template in the plurality of exemplar templates.
26. The system of claim 21, wherein modifying at least one of the pluralities of exemplar templates and exemplar segments based on the pluralities of exemplar segments and exemplar templates comprises: removing at least one exemplar template from the plurality of exemplar templates.
27. A system, comprising: one or more processors; and memory having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising: at a client device: receiving a speech input from a user; for each of a plurality of input speech segments in the speech input: receiving a respective focused speech recognition model from a server, wherein the respective focused speech recognition model is generated based on a respective set of focused training data relevant to the input speech segment, wherein the respective set of focused training data is selected for the input speech segment in a global speech recognition model, and wherein the global speech recognition model is generated based on a set of global training data; and recognizing the input speech segment using the respective focused speech recognition model; wherein establishing the global speech recognition model based on the initial set of training data further comprises: generating the initial set of training data from a plurality of training speech samples, the initial set of training data including an initial set of speech segments and an initial set of speech templates; and deriving a global latent space from the initial set of speech segments and the initial set of speech templates.
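The three kinds of exemplar-template modification named in the claims (merging templates, generating a new template from exemplar segments, and removing a template) can be pictured with a small Python sketch. The claims do not say how any of these operations is computed, so the choices here are assumptions for illustration only: templates and segments are plain vectors keyed by hypothetical names, and merging or generating a template is done by taking an element-wise mean.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_templates(templates, names, new_name):
    """Replace the named templates with a single template holding their mean."""
    merged = mean_vector([templates[name] for name in names])
    kept = {k: v for k, v in templates.items() if k not in names}
    kept[new_name] = merged
    return kept


def add_template_from_segments(templates, segments, new_name):
    """Create a new template from exemplar segments and include it in the set."""
    out = dict(templates)
    out[new_name] = mean_vector(segments)
    return out


def remove_template(templates, name):
    """Return the template set without the named template."""
    return {k: v for k, v in templates.items() if k != name}


# Hypothetical exemplar templates and segments in a two-dimensional space.
templates = {"t1": [1.0, 0.0], "t2": [0.5, 0.5], "t3": [0.0, 1.0]}
exemplar_segments = [[0.0, 0.5], [0.0, 1.0]]

merged = merge_templates(templates, ["t1", "t2"], "t12")
extended = add_template_from_segments(templates, exemplar_segments, "t4")
reduced = remove_template(templates, "t3")

print(sorted(merged))    # ['t12', 't3']
print(extended["t4"])    # [0.0, 0.75]
print(sorted(reduced))   # ['t1', 't2']
```

Each helper returns a new dictionary and leaves its input unchanged, so the modified template set can be passed on to whatever step derives the focused latent space without altering the original exemplars.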
Copyright KISTI. All Rights Reserved.