A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.
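The abstract's idea of selecting among alternative image-processing tasks "by reference to resource costs, resource constraints … task substitutability," and of applying more or fewer resources depending on task progress and apparent user interest, can be sketched as a simple budgeted scheduler. The task names, cost units, and scoring rule below are illustrative assumptions for exposition, not the patent's actual implementation.

```python
# Hypothetical sketch: rank candidate recognition tasks by a score combining
# apparent user interest and how well the task is proceeding, per unit of
# resource cost, then greedily fund tasks within a resource budget.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cost: int             # abstract resource units (CPU, battery, bandwidth)
    progress: float       # 0.0-1.0: how successfully the task is proceeding
    user_interest: float  # 0.0-1.0: the user's apparent interest in the task


def allocate(tasks, budget):
    """Greedily fund the tasks with the best interest/progress per unit cost."""
    ranked = sorted(
        tasks,
        key=lambda t: (t.user_interest * (0.5 + t.progress)) / t.cost,
        reverse=True,
    )
    funded = []
    for t in ranked:
        if t.cost <= budget:
            funded.append(t.name)
            budget -= t.cost
    return funded


tasks = [
    Task("barcode_reading", cost=2, progress=0.8, user_interest=0.9),
    Task("face_recognition", cost=5, progress=0.2, user_interest=0.4),
    Task("ocr", cost=3, progress=0.6, user_interest=0.7),
]
print(allocate(tasks, budget=6))  # ['barcode_reading', 'ocr']
```

A real system would update `progress` and `user_interest` continuously and could also refer stalled tasks to the cloud, as the abstract suggests.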
Representative Claims
1. A method of declarative reconfiguration of a smart phone system, said system having a processor configured to perform one or more acts of the method, said system also including at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data, the method comprising the acts:
(a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
(b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
(c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
(d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
(e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
(f) providing results based on said tuned content recognition operation to the user;
wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.

2. The method of claim 1 in which the recognized verb data comprises data corresponding to a verb from the list consisting of: look, watch, view, see, and read.

3. The method of claim 1 in which the recognized verb data comprises data corresponding to a verb from the list consisting of: listen, and hear.

4. The method of claim 1 in which the recognized noun data comprises data corresponding to a noun from the list consisting of: newspaper, book, magazine, poster, text, printing, ticket, box, package, carton, wrapper, product, barcode, watermark, photograph, person, man, boy, woman, girl, people, display, screen, monitor, video, movie, television, radio, iPhone, iPad, and Kindle.

5. The method of claim 1 that includes determining, by reference to the recognized verb data, that visual content, rather than audio content, is of interest to the user, and the method includes determining a type of image processing to be applied to the image output data.

6. The method of claim 5 wherein the type of image processing comprises digital watermark decoding.

7. The method of claim 5 wherein the type of image processing comprises image fingerprinting.

8. The method of claim 5 wherein the type of image processing comprises optical character recognition.

9. The method of claim 5 wherein the type of image processing comprises barcode reading.

10. The method of claim 1 that includes: determining, by reference to the recognized verb data, that visual content, rather than audio content, is of interest to the user; and determining, by reference to the recognized noun data, a filtering function to be applied to the image output data.

11. The method of claim 1 that includes: determining, by reference to the recognized verb data, that visual content, rather than audio content, is of interest to the user; and determining, by reference to the recognized noun data, an optical focusing function to be applied to the image output data.

12. The method of claim 1 in which the user speech data includes a negation from the list: not, no, and ignore.

13. The method of claim 1 in which said recognized verb data directs the system that the user is interested in audio content rather than visual content, and said recognized noun data establishes an audio filtering function that is to be applied to said audio output data.

14. The method of claim 13 in which a passband of said audio filtering function depends on said recognized noun data.

15. The method of claim 13 that includes establishing a male voice-tailored audio filtering passband function in response to first recognized noun data, and establishing a female voice-tailored audio filtering passband function in response to second recognized noun data.

16. The method of claim 13 that includes, as a consequence of first user speech, processing audio output data with an audio filtering function having a first passband, and as a consequence of second user speech, processing audio output data with an audio filtering function having a second passband different than the first passband.

17. The method of claim 1 that includes: as a consequence of first user speech, including a first verb and a first noun, directing the system to process audio output data with a first signal processing operation; and as a consequence of second user speech, including a second verb and a second noun, directing the system to process image output data with a second signal processing operation; wherein the first verb is different than the second verb, and the first noun is different than the second noun.

18. The method of claim 1 that further includes, before act (c), detecting a keyword in the user speech, said keyword detection serving as a cue to the system to perform acts (c) through (e).

19. The method of claim 1 in which the first sensor comprises the microphone and the second sensor comprises the image sensor, and the determined user interest comprises an indication of an interest in the first type of media content, in which the first type of media content comprises audio content, and in which act (e) performs said tuned content recognition operation on the audio output data.

20. The method of claim 1 in which the first sensor comprises the microphone and the second sensor comprises the image sensor, and the determined user interest comprises an indication of an interest in the second type of media content, in which the second type of media content comprises visual content, and in which act (e) performs said tuned content recognition operation on the image output data.

21. A non-transitory computer readable medium containing programming instructions for configuring a smart phone system that includes a processor and at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data, said instructions configuring the system programmed thereby to perform acts including:
(a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
(b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
(c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
(d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
(e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
(f) providing results based on said tuned content recognition operation to the user;
wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.

22. A smart phone system including: a processor; a memory; at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data; and instructions in said memory that configure the system to perform:
(a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
(b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
(c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
(d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
(e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
(f) providing results based on said tuned content recognition operation to the user;
wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.
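The flow of independent claim 1 — a recognized verb selecting the media type of interest (acts (b)–(c)), and a recognized noun indexing a data structure that identifies the tuned signal-processing operations (act (d)) — can be sketched as follows. The verb and noun vocabularies are taken from claims 2–4; the operation names and dictionary layout are assumptions for illustration, not the patent's disclosed implementation.

```python
# Illustrative sketch of claim 1's verb/noun-driven tuning.
# Verb vocabularies from claims 2 and 3:
VISUAL_VERBS = {"look", "watch", "view", "see", "read"}
AUDIO_VERBS = {"listen", "hear"}

# The "data structure" of act (d), mapping recognized noun data to a set of
# signal-processing operations (operation names are hypothetical labels for
# the processing types named in claims 6-9 and 15):
NOUN_TO_OPS = {
    "newspaper": ["optical_character_recognition"],
    "barcode": ["barcode_reading"],
    "photograph": ["image_fingerprinting", "digital_watermark_decoding"],
    "man": ["male_voice_passband_filter"],
    "woman": ["female_voice_passband_filter"],
}


def tune(verb, noun):
    """Return (media type of interest, selected operations) per acts (c)-(d)."""
    if verb in VISUAL_VERBS:
        media = "visual"          # act (c): verb indicates visual content
    elif verb in AUDIO_VERBS:
        media = "audio"           # act (c): verb indicates audio content
    else:
        raise ValueError(f"unrecognized verb: {verb}")
    # Act (d): access the data structure with the recognized noun data.
    ops = NOUN_TO_OPS.get(noun, [])
    return media, ops


print(tune("read", "barcode"))   # ('visual', ['barcode_reading'])
print(tune("listen", "man"))     # ('audio', ['male_voice_passband_filter'])
```

Acts (e)–(f) would then run the selected operations on the corresponding sensor's output data and present the results to the user.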
Patents Cited by This Patent (11)
McCune, Timothy S., Autonomous integrated headset and sound processing system for tactical applications.
Narayanaswami, Chandrasekhar; Kirkpatrick, Edward Scott, Image capturing system and method for automatically watermarking recorded parameters for providing digital image verification.
Nelson, Paul E.; Anderson, Christopher H.; Whitman, Ronald M.; Gardner, Paul C.; David, Mark R., Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types.
Kawasaki, Toshinobu; Komoda, Yoshiyuki; Tokunaga, Yoshihiko; Okada, Yukio; Shinomiya, Hirotatsu; Hayami, Takehito, Voice control system for operating home electrical appliances.
Heck, Larry Paul; Chinthakunta, Madhusudan; Mitby, David; Stifelman, Lisa, Augmented conversational understanding agent to identify conversation context between two humans and taking an agent action thereof.
Ligman, Joseph W.; Pistoia, Marco; Ponzo, John J.; Thomas, Gegi, Automatic extraction, modeling, and code mapping of application user interface display screens and components.