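Country / Type | United States (US) Patent, Granted |
---|---|
International Patent Classification (IPC, 7th edition) | |
Application number | US-0266932 (2016-09-15) |
Registration number | US-10192552 (2019-01-29) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation information | Times cited: 0; Patents cited: 2098 |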
Systems and processes for detecting and/or providing a whispered speech response are provided. In one example process, a speech input is received from a user and, based on the speech input, it is determined that a whispered speech response is to be provided. Upon determining that a whispered speech response is to be provided, the whispered speech response is generated and provided to the user.
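As a rough illustration of how the whisper determination could use spectral features, the sketch below computes the two features named in claim 9: MFCC0 (tracking overall energy) and MFCC1 (tracking spectral slope). It is only one possible reading, not the patented implementation. The function names, the Hamming window, the 20-band mel filterbank, and the unnormalized DCT are all assumptions made for illustration. The claims do not state how the whisper score of claim 10 is formed from the two features, so the sketch stops at the features themselves.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a Hamming-windowed frame via a direct DFT (assumed setup)."""
    n = len(frame)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_energies(spectrum, sample_rate, n_filters=20):
    """Energies in triangular mel-spaced bands (n_filters is an assumed value)."""
    n_bins = len(spectrum)
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # Band edges expressed as fractional DFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k in range(n_bins):
            if lo < k <= mid:
                total += spectrum[k] * (k - lo) / (mid - lo)
            elif mid < k < hi:
                total += spectrum[k] * (hi - k) / (hi - mid)
        energies.append(total)
    return energies


def spectral_features(frame, sample_rate):
    """Return (MFCC0, MFCC1) using an unnormalized DCT-II of log band energies."""
    energies = mel_band_energies(power_spectrum(frame), sample_rate)
    logs = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    count = len(logs)
    c0 = sum(logs)
    c1 = sum(v * math.cos(math.pi * (m + 0.5) / count) for m, v in enumerate(logs))
    return c0, c1
```

In this sketch a quiet, noise-like frame yields a lower MFCC0 and a lower MFCC1 than a louder frame whose energy is concentrated in low-frequency harmonics, which is consistent with the amplitude, energy, and slope characteristics listed in claim 6. Any threshold applied to these values would have to be chosen separately.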
1. An electronic device, comprising: one or more processors; memory; and one or more programs stored in memory, the one or more programs including instructions for: receiving a speech input from a user; determining, based on the speech input, that a whispered speech response is to be provided; upon determining that a whispered speech response is to be provided, generating the whispered speech response, wherein generating the whispered speech response comprises: generating text based on the speech input; performing natural language processing of the text; generating an intermediate speech based on a result of the natural language processing; obtaining a residual signal based on a linear prediction analysis of the intermediate speech; modifying the residual signal; and obtaining the whispered speech response based on a linear prediction synthesis of the modified residual signal; and providing the whispered speech response to the user.

2. The electronic device of claim 1, wherein the speech input comprises at least one of an informational request or a request to perform a task.

3. The electronic device of claim 2, wherein the whispered speech response comprises at least one of a response to the informational request or a response associated with performing the task.

4. The electronic device of claim 1, wherein determining that the whispered speech response is to be provided comprises at least one of: determining whether the speech input includes a whispered speech input; and determining whether context data indicates that the whispered speech response is expected.

5. The electronic device of claim 4, wherein the whispered speech input is associated with a first spectrum having one or more first spectrum characteristics associated with a whispered speech.

6. The electronic device of claim 5, wherein the one or more first spectrum characteristics comprise at least one of: a first amplitude, wherein the first amplitude is less than a second amplitude below a threshold frequency, the second amplitude being associated with the non-whispered speech; a first energy, wherein the first energy is less than a second energy below the threshold frequency, the second energy being associated with the non-whispered speech; a first volume, wherein the first volume is less than a second volume by a threshold volume percentage, the second volume being associated with the non-whispered speech; and a first slope of the first spectrum, wherein the first slope of the first spectrum is shifted by a threshold slope percentage with respect to a second slope of the second spectrum, the second slope of the second spectrum being associated with the non-whispered speech.

7. The electronic device of claim 5, wherein determining whether the speech input includes a whispered speech input comprises: determining whether the speech input includes a whispered speech input using one or more features of the speech input, wherein the one or more features represent one or more spectrum characteristics associated with a spectrum of the speech input.

8. The electronic device of claim 7, wherein determining whether the speech input includes a whispered speech input using the one or more features comprises: obtaining the spectrum of the speech input; determining the one or more spectrum characteristics associated with the spectrum of the speech input; and determining a first feature and a second feature based on the one or more spectrum characteristics associated with the spectrum of the speech input.

9. The electronic device of claim 8, wherein the first feature is a first mel-frequency cepstrum coefficient (MFCC0) representing an energy or an amplitude associated with the spectrum of the speech input; and wherein the second feature is a second mel-frequency cepstrum coefficient (MFCC1) representing a slope associated with the spectrum of the speech input.

10. The electronic device of claim 8, wherein the one or more programs include further instructions for: obtaining a whisper score based on the first feature to the second feature; and determining whether the whisper score satisfies a score threshold.

11. The electronic device of claim 4, wherein determining whether the context data indicates that the whispered speech response is expected comprises: obtaining the context data provided by at least one of the electronic device or one or more additional devices communicatively connected to the electronic device; and determining whether the context data satisfy one or more conditions for providing the whispered speech response.

12. The electronic device of claim 1, wherein the intermediate speech has substantially the same content as the whispered speech response.

13. The electronic device of claim 1, wherein obtaining the residual signal based on a linear prediction analysis of the intermediate speech comprises: obtaining a plurality of speech frames using the intermediate speech; and performing the linear prediction analysis of the plurality of speech frames.

14. The electronic device of claim 13, wherein performing the linear prediction analysis of the plurality of speech frames comprises: pre-emphasizing the plurality of speech frames; estimating a plurality of linear prediction coefficients; and inverse filtering the pre-emphasized speech frames to obtain the residual signal.

15. The electronic device of claim 14, wherein estimating the plurality of linear prediction coefficients comprises: performing a windowing on the pre-emphasized plurality of speech frames.

16. The electronic device of claim 1, wherein modifying the residual signal comprises: receiving a white noise sequence; estimating energy of the white noise sequence and the residual signal; correlating the energy of the white noise sequence and the energy of the residual signal; and compensating the correlated white noise sequence.

17. The electronic device of claim 16, wherein compensating the correlated white noise sequence comprises performing at least one of differentiating, high-pass filtering, or band-pass filtering with respect to the correlated white noise sequence.

18. The electronic device of claim 1, wherein obtaining the whispered speech response based on the linear prediction synthesis of the modified residual signal comprises: obtaining a plurality of linear prediction coefficients; modifying the linear prediction coefficients; and performing a linear prediction synthesis of the modified residual signal using the modified linear prediction coefficients.

19. The electronic device of claim 18, wherein modifying the linear prediction coefficients comprises: converting the plurality of linear prediction coefficients to line spectral frequencies; modifying the line spectral frequencies; and generating modified linear prediction coefficients based on the modified line spectral frequencies.

20. The electronic device of claim 18, wherein performing a linear prediction synthesis of the modified residual signal using the modified linear prediction coefficients comprises: generating a plurality of whispered frames using a synthesis filter and the modified linear prediction coefficients; and generating the whispered speech response using the plurality of whispered frames.

21. The electronic device of claim 1, wherein the one or more programs include further instructions for, prior to determining that a whispered speech response is to be provided: determining whether providing a whispered speech response is disabled; and in accordance with a determination that providing the whispered speech response is disabled, generating a non-whispered speech response, and providing the non-whispered speech response to the user in lieu of the whispered speech response.

22. The electronic device of claim 1, wherein generating the intermediate speech based on the result of the natural language processing comprises: identifying a user intent based on the result of the natural language processing; and generating the intermediate speech according to the user intent.

23. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: receive a speech input from a user; determine, based on the speech input, that a whispered speech response is to be provided; upon determining that a whispered speech response is to be provided, generate the whispered speech response, wherein generating the whispered speech response comprises: generating text based on the speech input; performing natural language processing of the text; generating an intermediate speech based on a result of the natural language processing; obtaining a residual signal based on a linear prediction analysis of the intermediate speech; modifying the residual signal; and obtaining the whispered speech response based on a linear prediction synthesis of the modified residual signal; and provide the whispered speech response to the user.

24. The non-transitory computer-readable storage medium of claim 23, wherein determining that the whispered speech response is to be provided comprises at least one of: determining whether the speech input includes a whispered speech input; and determining whether context data indicates that the whispered speech response is expected.

25. The non-transitory computer-readable storage medium of claim 24, wherein the whispered speech input is associated with a first spectrum having one or more first spectrum characteristics associated with a whispered speech.

26. The non-transitory computer-readable storage medium of claim 25, wherein the one or more first spectrum characteristics comprise at least one of: a first amplitude, wherein the first amplitude is less than a second amplitude below a threshold frequency, the second amplitude being associated with the non-whispered speech; a first energy, wherein the first energy is less than a second energy below the threshold frequency, the second energy being associated with the non-whispered speech; a first volume, wherein the first volume is less than a second volume by a threshold volume percentage, the second volume being associated with the non-whispered speech; and a first slope of the first spectrum, wherein the first slope of the first spectrum is shifted by a threshold slope percentage with respect to a second slope of the second spectrum, the second slope of the second spectrum being associated with the non-whispered speech.

27. The non-transitory computer-readable storage medium of claim 25, wherein determining whether the speech input includes a whispered speech input comprises: determining whether the speech input includes a whispered speech input using one or more features of the speech input, wherein the one or more features represent one or more spectrum characteristics associated with a spectrum of the speech input.

28. The non-transitory computer-readable storage medium of claim 27, wherein determining whether the speech input includes a whispered speech input using the one or more features comprises: obtaining the spectrum of the speech input; determining the one or more spectrum characteristics associated with the spectrum of the speech input; and determining a first feature and a second feature based on the one or more spectrum characteristics associated with the spectrum of the speech input.

29. The non-transitory computer-readable storage medium of claim 28, wherein the one or more programs comprise further instructions, which when executed by one or more processors of the electronic device, cause the electronic device to: obtain a whisper score based on the first feature to the second feature; and determine whether the whisper score satisfies a score threshold.

30. The non-transitory computer-readable storage medium of claim 24, wherein determining whether the context data indicates that the whispered speech response is expected comprises: obtaining the context data provided by at least one of the electronic device or one or more additional devices communicatively connected to the electronic device; and determining whether the context data satisfy one or more conditions for providing the whispered speech response.

31. The non-transitory computer-readable storage medium of claim 23, wherein obtaining the residual signal based on a linear prediction analysis of the intermediate speech comprises: obtaining a plurality of speech frames using the intermediate speech; and performing the linear prediction analysis of the plurality of speech frames.

32. The non-transitory computer-readable storage medium of claim 31, wherein performing the linear prediction analysis of the plurality of speech frames comprises: pre-emphasizing the plurality of speech frames; estimating a plurality of linear prediction coefficients; and inverse filtering the pre-emphasized speech frames to obtain the residual signal.

33. The non-transitory computer-readable storage medium of claim 23, wherein modifying the residual signal comprises: receiving a white noise sequence; estimating energy of the white noise sequence and the residual signal; correlating the energy of the white noise sequence and the energy of the residual signal; and compensating the correlated white noise sequence.

34. The non-transitory computer-readable storage medium of claim 23, wherein obtaining the whispered speech response based on the linear prediction synthesis of the modified residual signal comprises: obtaining a plurality of linear prediction coefficients; modifying the linear prediction coefficients; and performing a linear prediction synthesis of the modified residual signal using the modified linear prediction coefficients.

35. The non-transitory computer-readable storage medium of claim 34, wherein modifying the linear prediction coefficients comprises: converting the plurality of linear prediction coefficients to line spectral frequencies; modifying the line spectral frequencies; and generating modified linear prediction coefficients based on the modified line spectral frequencies.

36. The non-transitory computer-readable storage medium of claim 34, wherein performing a linear prediction synthesis of the modified residual signal using the modified linear prediction coefficients comprises: generating a plurality of whispered frames using a synthesis filter and the modified linear prediction coefficients; and generating the whispered speech response using the plurality of whispered frames.

37. The non-transitory computer-readable storage medium of claim 23, wherein the one or more programs comprise further instructions, which when executed by one or more processors of the electronic device, cause the electronic device to, prior to determining that a whispered speech response is to be provided: determine whether providing a whispered speech response is disabled; and in accordance with a determination that providing the whispered speech response is disabled, generate a non-whispered speech response, and provide the non-whispered speech response to the user in lieu of the whispered speech response.

38. The non-transitory computer-readable storage medium of claim 23, wherein generating the intermediate speech based on the result of the natural language processing comprises: identifying a user intent based on the result of the natural language processing; and generating the intermediate speech according to the user intent.

39. A method for operating a digital assistant, comprising: at a user device with one or more processors and memory: receiving a speech input from a user; determining, based on the speech input, that a whispered speech response is to be provided; upon determining that a whispered speech response is to be provided, generating the whispered speech response, wherein generating the whispered speech response comprises: generating text based on the speech input; performing natural language processing of the text; generating an intermediate speech based on a result of the natural language processing; obtaining a residual signal based on a linear prediction analysis of the intermediate speech; modifying the residual signal; and obtaining the whispered speech response based on a linear prediction synthesis of the modified residual signal; and providing the whispered speech response to the user.

40. The method of claim 39, wherein determining that the whispered speech response is to be provided comprises at least one of: determining whether the speech input includes a whispered speech input; and determining whether context data indicates that the whispered speech response is expected.

41. The method of claim 40, wherein the whispered speech input is associated with a first spectrum having one or more first spectrum characteristics associated with a whispered speech.

42. The method of claim 41, wherein the one or more first spectrum characteristics comprise at least one of: a first amplitude, wherein the first amplitude is less than a second amplitude below a threshold frequency, the second amplitude being associated with the non-whispered speech; a first energy, wherein the first energy is less than a second energy below the threshold frequency, the second energy being associated with the non-whispered speech; a first volume, wherein the first volume is less than a second volume by a threshold volume percentage, the second volume being associated with the non-whispered speech; and a first slope of the first spectrum, wherein the first slope of the first spectrum is shifted by a threshold slope percentage with respect to a second slope of the second spectrum, the second slope of the second spectrum being associated with the non-whispered speech.

43. The method of claim 41, wherein determining whether the speech input includes a whispered speech input comprises: determining whether the speech input includes a whispered speech input using one or more features of the speech input, wherein the one or more features represent one or more spectrum characteristics associated with a spectrum of the speech input.

44. The method of claim 43, wherein determining whether the speech input includes a whispered speech input using the one or more features comprises: obtaining the spectrum of the speech input; determining the one or more spectrum characteristics associated with the spectrum of the speech input; and determining a first feature and a second feature based on the one or more spectrum characteristics associated with the spectrum of the speech input.

45. The method of claim 44, further comprising: obtaining a whisper score based on the first feature to the second feature; and determining whether the whisper score satisfies a score threshold.

46. The method of claim 40, wherein determining whether the context data indicates that the whispered speech response is expected comprises: obtaining the context data provided by at least one of the electronic device or one or more additional devices communicatively connected to the electronic device; and determining whether the context data satisfy one or more conditions for providing the whispered speech response.

47. The method of claim 39, wherein obtaining the residual signal based on a linear prediction analysis of the intermediate speech comprises: obtaining a plurality of speech frames using the intermediate speech; and performing the linear prediction analysis of the plurality of speech frames.

48. The method of claim 47, wherein performing the linear prediction analysis of the plurality of speech frames comprises: pre-emphasizing the plurality of speech frames; estimating a plurality of linear prediction coefficients; and inverse filtering the pre-emphasized speech frames to obtain the residual signal.

49. The method of claim 39, wherein modifying the residual signal comprises: receiving a white noise sequence; estimating energy of the white noise sequence and the residual signal; correlating the energy of the white noise sequence and the energy of the residual signal; and compensating the correlated white noise sequence.

50. The method of claim 39, wherein obtaining the whispered speech response based on the linear prediction synthesis of the modified residual signal comprises: obtaining a plurality of linear prediction coefficients; modifying the linear prediction coefficients; and performing a linear prediction synthesis of the modified residual signal using the modified linear prediction coefficients.

51. The method of claim 50, wherein modifying the linear prediction coefficients comprises: converting the plurality of linear prediction coefficients to line spectral frequencies; modifying the line spectral frequencies; and generating modified linear prediction coefficients based on the modified line spectral frequencies.

52. The method of claim 50, wherein performing a linear prediction synthesis of the modified residual signal using the modified linear prediction coefficients comprises: generating a plurality of whispered frames using a synthesis filter and the modified linear prediction coefficients; and generating the whispered speech response using the plurality of whispered frames.

53. The method of claim 39, further comprising, prior to determining that a whispered speech response is to be provided: determining whether providing a whispered speech response is disabled; and in accordance with a determination that providing the whispered speech response is disabled, generating a non-whispered speech response, and providing the non-whispered speech response to the user in lieu of the whispered speech response.

54. The method of claim 39, wherein generating the intermediate speech based on the result of the natural language processing comprises: identifying a user intent based on the result of the natural language processing; and generating the intermediate speech according to the user intent.
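The per-frame signal path recited in the claims (pre-emphasis, linear prediction analysis, inverse filtering to a residual, replacement of the residual by energy-matched noise, and linear prediction synthesis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. All function names are hypothetical. The pre-emphasis factor 0.97, the Hamming window, the autocorrelation method with the Levinson-Durbin recursion, and the prediction order 10 are assumed choices. "Correlating" the energies is read here as scaling the noise to the residual's energy, and "compensating" is shown as a first difference, one of the options in claim 17. The line spectral frequency modification of claims 19, 35, and 51 is omitted.

```python
import math
import random


def pre_emphasize(frame, alpha=0.97):
    """First-order pre-emphasis; alpha is an assumed, commonly used value."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1]
                         for n in range(1, len(frame))]


def lpc(frame, order):
    """LP coefficients a[0..order] with a[0] = 1, estimated from a
    Hamming-windowed frame by autocorrelation and Levinson-Durbin."""
    size = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)))
                for n, s in enumerate(frame)]
    r = [sum(windowed[n] * windowed[n - k] for n in range(k, size))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(frame, a):
    """Residual e[n] = sum_j a[j] * x[n - j], with zero initial state."""
    p = len(a) - 1
    return [sum(a[j] * frame[n - j] for j in range(min(p, n) + 1))
            for n in range(len(frame))]


def synthesize(excitation, a):
    """All-pole synthesis y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    p = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        out.append(e - sum(a[j] * out[n - j] for j in range(1, min(p, n) + 1)))
    return out


def energy(signal):
    return sum(v * v for v in signal)


def match_energy(noise, residual):
    """Scale the noise so that its energy equals the residual's energy."""
    noise_energy = energy(noise)
    gain = math.sqrt(energy(residual) / noise_energy) if noise_energy > 0 else 0.0
    return [gain * v for v in noise]


def differentiate(signal):
    """First difference, used here as the assumed compensation step."""
    return [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]


def whisper_frame(frame, order=10, rng=None):
    """Turn one frame of intermediate speech into a whisper-like frame."""
    rng = rng or random.Random(0)
    emphasized = pre_emphasize(frame)
    a = lpc(emphasized, order)
    residual = inverse_filter(emphasized, a)
    noise = [rng.gauss(0.0, 1.0) for _ in residual]
    excitation = differentiate(match_energy(noise, residual))
    return synthesize(excitation, a)
```

Because the synthesis filter is the exact inverse of the analysis filter, passing the unmodified residual through `synthesize` reproduces the pre-emphasized frame. The whisper-like quality in this sketch comes only from swapping the residual for noise while keeping the spectral envelope described by the coefficients. A full system would also overlap and join successive frames, which is not shown.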