Answering questions using environmental context
IPC classification information
Country / Type: United States (US) Patent, granted
International Patent Classification (IPC, 7th edition):
G10L-015/00
G10L-017/00
G10L-021/00
G10L-025/00
G10L-015/22
G10L-015/08
G10L-015/24
G10L-015/30
G06F-017/30
Application number: US-0410180 (2017-01-19)
Registration number: US-9786279 (2017-10-10)
Inventors: Sharifi, Matthew; Postelnicu, Gheorghe
Applicant: Google Inc.
Agent: Fish & Richardson P.C.
Citation information: times cited: 0; cited patents: 36
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding an utterance and environmental data, obtaining a transcription of the utterance, identifying an entity using the environmental data, submitting a query to a natural language query processing engine, wherein the query includes at least a portion of the transcription and data that identifies the entity, and obtaining one or more results of the query.
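The flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `Query` type, the function `answer_with_context`, and the three stub engines are hypothetical names introduced here only to show how a transcription and an entity identified from environmental data might be combined into one query.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Query:
    """Hypothetical query shape: transcription text plus entity identifier."""
    transcription: str  # at least a portion of the transcribed utterance
    entity: str         # data identifying the entity found in the environment


def answer_with_context(
    utterance_audio: bytes,
    environmental_data: bytes,
    transcribe: Callable[[bytes], str],
    identify_entity: Callable[[bytes], str],
    run_query: Callable[[Query], List[str]],
) -> List[str]:
    """Follow the abstract's steps: transcribe, identify entity, query."""
    transcription = transcribe(utterance_audio)
    entity = identify_entity(environmental_data)
    query = Query(transcription=transcription, entity=entity)
    return run_query(query)


# Stub engines (assumptions for illustration; real engines would do
# speech recognition, content identification, and query processing).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_identify_entity(env: bytes) -> str:
    catalog = {b"fingerprint-123": "Example Movie"}
    return catalog.get(env, "unknown")


def fake_run_query(query: Query) -> List[str]:
    return [f"{query.transcription} [{query.entity}]"]


results = answer_with_context(
    b"who directed this",
    b"fingerprint-123",
    fake_transcribe,
    fake_identify_entity,
    fake_run_query,
)
print(results)
```

Passing the engines in as callables keeps the sketch independent of any particular recognizer or content-identification service.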
Representative claims
1. A computer-implemented method comprising: generating, by a mobile device, an audio recording of (i) a question about an unidentified item of media content that a different device is playing in a vicinity of the mobile device, and (ii) environmental audio; in response to forwarding the audio recording to a front end server of a natural language processing system, receiving an answer to the question that is based on processing different portions of the audio recording by a speech recognition engine server associated with the natural language processing system and a content identification engine server associated with the natural language processing system; and in response to the question, providing, by the mobile device, the answer to the question about the unidentified item of media content.

2. The computer-implemented method of claim 1, comprising: identifying one or more keywords corresponding to the question, associating the one or more keywords with one or more types of media content, and providing the answer based on the question and the one or more types of media content.

3. The computer-implemented method of claim 2, wherein the one or more types of media content includes at least one of movie, music, television show, audio podcast, image, artwork, book, magazine, trailer, video, podcast, Internet video and video game.

4. The computer-implemented method of claim 2, wherein providing the answer based on the one or more types of media content further comprises: identifying two or more candidate answers of the question, generating ranked scores for each of the two or more candidate answers, the ranked scores based on the one or more types of media content, and providing the answer based on the question and the ranked scores.

5. The computer-implemented method of claim 1, further comprising streaming the environmental audio.

6. The computer-implemented method of claim 1, wherein the speech recognition engine server associated with the natural language processing system and the content identification server associated with the natural language processing system are both the same server.

7. The computer-implemented method of claim 1, further comprising: detecting environmental image data associated with the item of media content, and providing the answer based on the question and the environmental image data.

8. The computer-implemented method of claim 7, further comprising: identifying one or more types of media content based on the environmental image data, and providing the answer based on the question, the environmental image data and the one or more types of media content.

9. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: generating, by a mobile device, an audio recording of (i) a question about an unidentified item of media content that a different device is playing in a vicinity of the mobile device, and (ii) environmental audio; in response to forwarding the audio recording to a front end server of a natural language processing system, receiving an answer to the question that is based on processing different portions of the audio recording by a speech recognition engine server associated with the natural language processing system and a content identification engine server associated with the natural language processing system; and in response to the question, providing, by the mobile device, the answer to the question about the unidentified item of media content.

10. The system of claim 9, wherein the operations comprise: identifying one or more keywords corresponding to the question, associating the one or more keywords with one or more types of media content, and providing the answer based on the question and the one or more types of media content.

11. The system of claim 10, wherein the one or more types of media content includes at least one of movie, music, television show, audio podcast, image, artwork, book, magazine, trailer, video, podcast, Internet video and video game.

12. The system of claim 10, wherein providing the answer based on the one or more types of media content further comprises: identifying two or more candidate answers of the question, generating ranked scores for each of the two or more candidate answers, the ranked scores based on the one or more types of media content, and providing the answer based on the question and the ranked scores.

13. The system of claim 9, wherein the operations comprise streaming the environmental audio.

14. The system of claim 9, wherein the speech recognition engine server associated with the natural language processing system and the content identification server associated with the natural language processing system are both the same server.

15. The system of claim 9, wherein the operations comprise: detecting environmental image data associated with the item of media content, and providing the answer based on the question and the environmental image data.

16. The system of claim 15, wherein the operations comprise: identifying one or more types of media content based on the environmental image data, and providing the answer based on the question, the environmental image data and the one or more types of media content.

17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: generating an audio recording of (i) a question about an unidentified item of media content that a different device is playing in a vicinity of a mobile device, and (ii) environmental audio; in response to forwarding the audio to a front end server of a natural language processing system, receiving an answer to the question that is based on processing different portions of the audio recording by a speech recognition engine server associated with the natural language processing system and a content identification engine server associated with the natural language processing system; and in response to the question, providing the answer to the question about the unidentified item of media content.

18. The non-transitory computer-readable medium of claim 17, wherein the operations comprise: identifying one or more keywords corresponding to the question, associating the one or more keywords with one or more types of media content, and providing the answer based on the question and the one or more types of media content.

19. The non-transitory computer-readable medium of claim 17, wherein the operations comprise streaming the environmental audio.

20. The non-transitory computer-readable medium of claim 17, wherein the operations comprise: detecting environmental image data associated with the item of media content, and providing the answer based on the question and the environmental image data.
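The keyword and ranking steps recited in claims 2 and 4 can be sketched as follows. This is only an illustration under stated assumptions: the keyword table, the score values (1.0 and 0.1), and the function names `media_types_for` and `rank_candidates` are hypothetical, and the claims do not specify how scores are computed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical keyword-to-media-type table (not taken from the patent).
KEYWORD_TO_TYPES: Dict[str, Set[str]] = {
    "directed": {"movie", "television show"},
    "sings": {"music"},
    "author": {"book"},
}


def media_types_for(question: str) -> Set[str]:
    """Associate keywords found in the question with media content types."""
    types: Set[str] = set()
    for word in question.lower().split():
        types |= KEYWORD_TO_TYPES.get(word.strip("?.,!"), set())
    return types


def rank_candidates(
    candidates: List[Tuple[str, str]], preferred: Set[str]
) -> List[Tuple[str, float]]:
    """Score each (answer, media_type) pair and sort by descending score.

    The scoring rule is an assumption: candidates whose media type
    matches the question's types get 1.0, all others get 0.1.
    """
    scored = [
        (answer, 1.0 if media_type in preferred else 0.1)
        for answer, media_type in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


question = "Who directed this?"
candidates = [("Soundtrack artist", "music"), ("Film director", "movie")]
ranked = rank_candidates(candidates, media_types_for(question))
best_answer = ranked[0][0]
print(best_answer)
```

Here the keyword "directed" maps to movie and television show, so the candidate tagged as a movie is ranked first.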
Patents cited by this patent (36)
Toyama, Soichi, Apparatus and method for speech recognition.
VanLund, Peter Spalding; Piersol, Kurt Wesley; Meyers, James David; Simpson, Jacob Michael; Gundeti, Vikram Kumar; Thomas, David Robert; Miles, Andrew Christopher, Application focus in speech-based systems.
Abe, Mototsugu; Nishiguchi, Masayuki, Method and apparatus for classifying signals, method and apparatus for generating descriptors and method and apparatus for retrieving signals.
Wang, Avery Li-Chun; Barton, Christopher Jacques Penrose; Mukherjee, Dheeraj Shankar; Inghelbrecht, Philip, Method and system for purchasing pre-recorded music.
Goldberg, Randy G.; Rosen, Kenneth H.; Sachs, Richard M.; Winthrop, III, Joel A., Selective noise/channel/coding models and recognizers for automatic speech recognition.
Petkovic, Dragutin; Ponceleon, Dulce Beatriz; Srinivasan, Savitha, System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval.