Methods, systems, computer program products, and methods of doing business by adapting audio renderings of non-audio messages (for example, e-mail messages that are processed by a text-to-speech translator) to reflect various nuances of the non-audio information. Audio cues are provided for this purpose, which are sounds that are "mixed" in with the audio rendering as a separate (background) audio stream. Audio cues may reflect information such as the topical structure of a text file, or changes in paragraphs. Or, audio cues may be used to signal nuances such as changes in the color or font of the source text. Audio cues may also be advantageously used to reflect information about the translation process with which the audio rendering of a text file was created, such as using varying background tones to convey the degree of certainty in the accuracy of translating text to audio using a text-to-speech translation system, or of translating audio to text using a voice recognition system, or of translating between languages, and so forth. Stylesheets, such as those encoded in the Extensible Stylesheet Language ("XSL"), may optionally be used to customize the audio cues. For example, a user-specific stylesheet customization may be performed to override system-wide default audio cues for a particular user, enabling her to hear a different background sound for messages on a particular topic than other users will hear.
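The abstract describes a three-step pipeline: detect a nuance in the non-audio source, locate a corresponding audio cue (with optional user-specific stylesheet overrides of system-wide defaults), and mix that cue into the rendering as a background stream. A minimal sketch of that pipeline follows, assuming hypothetical names throughout (`NUANCE_CUES`, `detect_nuances`, `locate_cue`, `mix`) and representing audio simply as lists of float samples; none of these identifiers come from the patent itself.

```python
# Illustrative sketch of the detect / locate / mix pipeline from the abstract.
# All names and cue choices here are hypothetical, not taken from the patent.

import re

# System-wide default cue table: nuance -> identifier of a background sound.
NUANCE_CUES = {
    "new_paragraph": "soft_chime",
    "emoticon": "short_bell",
    "color_change": "low_hum",
}

def detect_nuances(text):
    """Return sorted (position, nuance) pairs found in a text source."""
    nuances = []
    for m in re.finditer(r"\n\n", text):          # paragraph break
        nuances.append((m.start(), "new_paragraph"))
    for m in re.finditer(r"[:;]-?[)(]", text):    # simple emoticon pattern
        nuances.append((m.start(), "emoticon"))
    return sorted(nuances)

def locate_cue(nuance, user_overrides=None):
    """Resolve a cue, letting a user-specific 'stylesheet' override defaults."""
    if user_overrides and nuance in user_overrides:
        return user_overrides[nuance]
    return NUANCE_CUES.get(nuance)

def mix(speech, cue, gain=0.3):
    """Mix a background cue into a speech buffer (float samples in [-1, 1])."""
    out = list(speech)
    for i in range(min(len(out), len(cue))):
        out[i] = max(-1.0, min(1.0, out[i] + gain * cue[i]))
    return out
```

In a real system the cue would be mixed as a separate audio stream behind the text-to-speech output; the sample-wise addition with clipping above stands in for that mixing step.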
Representative Claims
We claim:

1. A method of enhancing audio renderings of non-audio data sources, comprising: detecting a nuance of a non-audio data source; locating an audio cue corresponding to the detected nuance; and associating the located audio cue with the detected nuance for playback to a listener, wherein detecting a nuance of a non-audio data source detects a plurality of nuances of the non-audio data source, locating an audio cue locates audio cues for each of the detected nuances, and associating the located audio cue with the detected nuance for playback to a listener associates each of the located audio cues with the respective detected nuance, and further comprising: creating an audio rendering of the non-audio data source; and mixing the associated audio cues in with the audio rendering to generate integrated sounds therefrom to the listener.

2. The method according to claim 1, wherein mixing the associated audio cues occurs while playing the audio rendering to the listener.

3. The method according to claim 1, wherein the non-audio data source is a text file and wherein creating an audio rendering of the non-audio data source further comprises processing the text file with a text-to-speech translator.

4. The method according to claim 1, wherein at least one of the detected nuances is presence of a formatting tag.

5. The method according to claim 4, wherein the formatting tag is a new paragraph tag.

6. The method according to claim 1, wherein the non-audio data source is a text file and at least one of the detected nuances is a change in color of text in the text file.

7. The method according to claim 1, wherein the non-audio data source is a text file and the detected nuance is a change in font of text in the text file.

8. The method according to claim 1, wherein the non-audio data source is a text file and the detected nuance is presence of a keyword for the text file.

9. The method according to claim 8, wherein the keyword is supplied by a creator of the text file.
10. The method according to claim 8, wherein the keyword is programmatically detected by evaluating text in the text file.

11. The method according to claim 1, wherein the non-audio data source is a text file and at least one of the detected nuances is presence of an emoticon in the text file.

12. The method according to claim 1, wherein the detected nuance is a change of topic in the non-audio data source.

13. The method according to claim 1, wherein at least one of the detected nuances is a degree of certainty in translation of the non-audio data source from another format.

14. The method according to claim 13, wherein detecting a nuance of a non-audio data source detects at least two different degrees of certainty, and wherein the located audio cues comprise changes in a pitch of a voice used in the audio rendering for each of the different degrees of certainty.

15. The method according to claim 13, wherein detecting a nuance of a non-audio data source detects at least two different degrees of certainty, and further comprising changing a pitch of the associated audio cue used by mixing the associated audio cues in with the audio rendering for each of the different degrees of certainty.

16. The method according to claim 13, wherein detecting a nuance of a non-audio data source detects at least two different degrees of certainty, and wherein mixing the associated audio cues in with the audio rendering further comprises alternating between two of the located audio cues to audibly indicate the different degrees of certainty.

17. The method according to claim 13, wherein the other format is an input audio data source and the non-audio data source is a text file, and the translation is an audio-to-text translation from the input audio data source to the text file, and wherein the degree of certainty reflects accuracy of the audio-to-text translation.

18.
The method according to claim 13, wherein the other format is an input audio data source and the non-audio data source is a text file, and the translation is an audio-to-text translation from the input audio data source to the text file, and wherein the degree of certainty reflects identification of a speaker who created the input audio data source.

19. The method according to claim 13, wherein the other format is a source text file and the non-audio data source is an output text file, and the translation is a text-to-text translation from the source text file to the output text file, and wherein the degree of certainty reflects accuracy of the text-to-text translation.

20. The method according to claim 19, wherein the source text file contains text in a first language and the output text file contains text in a second language.

21. A system for enhancing audio renderings of non-audio data sources, comprising: means for detecting one or more nuances of a non-audio data source; means for locating an audio cue corresponding to each of the detected nuances; means for associating the located audio cues with their respective detected nuances for playback to a listener; means for creating an audio rendering of a non-audio segment of the non-audio data source, wherein the non-audio segment is associated with the nuance; and means for mixing the associated audio cues in with the audio rendering to generate integrated sounds therefrom to the listener.

22. The system according to claim 21, wherein the non-audio data source is a text file and wherein the means for creating further comprises means for processing the text file with a text-to-speech translator.

23. The system according to claim 21, wherein at least one of the detected nuances is presence of a formatting tag.

24. The system according to claim 23, wherein the formatting tag is a new paragraph tag.

25.
The system according to claim 21, wherein the non-audio data source is a text file and the detected nuance is a change in font of text in the text file.

26. The system according to claim 21, wherein the non-audio data source is a text file and at least one of the detected nuances is presence of an emoticon in the text file.

27. The system according to claim 21, wherein the detected nuance is a change of topic in the non-audio data source.

28. The system according to claim 21, wherein at least one of the detected nuances is a degree of certainty in translation of the non-audio data source from another format.

29. The system according to claim 28, wherein the means for detecting detects at least two different degrees of certainty, and wherein the located audio cues comprise changes in a pitch of a voice used in the audio rendering for each of the different degrees of certainty.

30. The system according to claim 28, wherein the means for detecting detects at least two different degrees of certainty, and further comprising means for changing a pitch of the associated audio cue used by the means for mixing for each of the different degrees of certainty.

31. The system according to claim 28, wherein the other format is an input audio data source and the non-audio data source is a text file, and the translation is an audio-to-text translation from the input audio data source to the text file, and wherein the degree of certainty reflects accuracy of the audio-to-text translation.

32. The system according to claim 28, wherein the other format is an input audio data source and the non-audio data source is a text file, and the translation is an audio-to-text translation from the input audio data source to the text file, and wherein the degree of certainty reflects identification of a speaker who created the input audio data source.

33.
The system according to claim 28, wherein the other format is a source text file and the non-audio data source is an output text file, and the translation is a text-to-text translation from the source text file to the output text file, and wherein the degree of certainty reflects accuracy of the text-to-text translation.

34. The system according to claim 21, wherein the non-audio data source is an e-mail message and at least one of the detected nuances is an e-mail convention found in the e-mail message.

35. The system according to claim 21, wherein the non-audio data source is text provided by a user.

36. The system according to claim 21, wherein the detected nuance is embedded within the non-audio file.

37. The system according to claim 21, wherein the detected nuance comprises metadata associated with the non-audio file.

38. A computer program product for enhancing audio renderings of non-audio data sources, the computer program product embodied on one or more computer-readable media and comprising: computer-readable program code that is configured to detect one or more nuances of a non-audio data source; computer-readable program code that is configured to locate an audio cue corresponding to each of the detected nuances; computer-readable program code that is configured to associate the located audio cues with their respective detected nuances for playback to a listener; computer-readable program code that is configured to create an audio rendering of a non-audio segment of the non-audio data source, wherein the non-audio segment is associated with the nuance; and computer-readable program code that is configured to mix the associated audio cue with the audio rendering of the segment to generate integrated sounds therefrom to the listener.

39.
The computer program product according to claim 38, wherein the non-audio data source is a text file and wherein the computer-readable program code that is configured to create further comprises computer-readable program code that is configured to process the text file with a text-to-speech translator.

40. The computer program product according to claim 38, wherein the non-audio data source is a text file and at least one of the detected nuances is a change in color of text in the text file.

41. The computer program product according to claim 38, wherein the non-audio data source is a text file and the detected nuance is presence of a keyword for the text file.

42. The computer program product according to claim 41, wherein the keyword is supplied by a creator of the text file.

43. The computer program product according to claim 41, wherein the keyword is programmatically detected by evaluating text in the text file.

44. The computer program product according to claim 38, wherein at least one of the detected nuances is a degree of certainty in translation of the non-audio data source from another format.

45. The computer program product according to claim 44, wherein the computer-readable program code that is configured to detect detects at least two different degrees of certainty, and wherein the located audio cues comprise changes in a pitch of a voice used in the audio rendering for each of the different degrees of certainty.

46. The computer program product according to claim 44, wherein the computer-readable program code that is configured to detect detects at least two different degrees of certainty, and further comprising changing a pitch of the associated audio cue used by the computer-readable program code that is configured to mix for each of the different degrees of certainty.

47.
The computer program product according to claim 44, wherein the other format is an input audio data source and the non-audio data source is a text file, and the translation is an audio-to-text translation from the input audio data source to the text file, and wherein the degree of certainty reflects accuracy of the audio-to-text translation.

48. The computer program product according to claim 44, wherein the other format is an input audio data source and the non-audio data source is a text file, and the translation is an audio-to-text translation from the input audio data source to the text file, and wherein the degree of certainty reflects identification of a speaker who created the input audio data source.

49. The computer program product according to claim 44, wherein the other format is a source text file and the non-audio data source is an output text file, and the translation is a text-to-text translation from the source text file to the output text file, and wherein the degree of certainty reflects accuracy of the text-to-text translation.

50. The computer program product according to claim 49, wherein the source text file contains text in a first language and the output text file contains text in a second language.

51. The computer program product according to claim 38, wherein at least one of the detected nuances is an identification of a creator of the non-audio data source.

52. The computer program product according to claim 51, wherein the identification is used to locate stored preferences of the creator.

53. The computer program product according to claim 38, wherein the non-audio data source is an e-mail message.

54. The computer program product according to claim 38, wherein the detected nuance is embedded within the non-audio file.

55. The computer program product according to claim 38, wherein the detected nuance comprises metadata associated with the non-audio file.
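Claims 13–16 (and their system and program-product counterparts) describe varying a cue to convey the translation engine's degree of certainty, for example by changing the cue's pitch. A hedged sketch of that idea follows, assuming a hypothetical per-word confidence score in [0, 1] from a speech-recognition or translation engine and representing a cue tone simply by its frequency; the thresholds and frequencies are arbitrary illustrations, not values from the patent.

```python
# Illustrative confidence-to-pitch mapping for the certainty cues of claims 13-16.
# BASE_CUE_HZ, the thresholds, and the pitch ratios are all example choices.

BASE_CUE_HZ = 220.0  # pitch of the background cue at full confidence

def cue_pitch(confidence):
    """Lower the cue's pitch as confidence in the translation drops."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.9:            # high certainty: base pitch
        return BASE_CUE_HZ
    if confidence >= 0.5:            # moderate certainty: drop a whole tone
        return BASE_CUE_HZ * 8 / 9
    return BASE_CUE_HZ * 3 / 4       # low certainty: drop a perfect fourth

def annotate(words_with_confidence):
    """Pair each translated word with the cue pitch to mix behind it."""
    return [(word, cue_pitch(c)) for word, c in words_with_confidence]
```

Claim 16's alternative (alternating between two distinct cues rather than re-pitching one) could be handled the same way, by switching cue identifiers instead of frequencies at the confidence thresholds.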
Patents Cited in This Patent (11)
Cosatto Eric ; Graf Hans Peter ; Schroeter Juergen, Coarticulation method for audio-visual text-to-speech synthesis.
Johnson William J. (Flower Mound TX) Smith Michael D. (Euless TX) Williams Marvin L. (Lewisville TX), Method and system for providing multimedia substitution in messaging systems.
Thomas J. Ball ; Michael Abraham Benedikt ; Peter Andrew Mataga ; Carlos Miguel Puchol ; Kenneth G. Rehor ; Curtis Duane Tuckey, Structured voicemail messages.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Adding information or functionality to a rendered document via association with an electronic counterpart.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Aggregate analysis of text captures performed by multiple users from rendered documents.
Bodin, William K.; Jaramillo, David; Redman, Jerry W.; Thorson, Derral C., Associating user selected content management directives with user selected ratings.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Automatically capturing information, such as capturing information using a document-aware device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J.; Daley-Watson, Christopher J., Automatically providing content associated with captured information, such as information captured in real-time.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplement information.
Moore, Richard G.; Mumford, Gregory L.; Gunasekar, Duraisamy, Communication converter for converting audio information/textual information to corresponding textual information/audio information.
Moore, Richard G.; Mumford, Gregory L.; Gunasekar, Duraisamy, Communication converter for converting audio information/textual information to corresponding textual information/audio information.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Determining actions involving captured information and electronic content associated with rendered documents.
Fleizach, Christopher B.; Minifie, Darren C.; Hughes, Gregory F.; Dour, Ryan N.; Fisch, Ian M.; Lopes Da Silva, Joel M.; Pedersen, II, Michael M.; Seymour, Eric T.; Naik, Devang K.; Dixon, Ryan S., Device, method, and graphical user interface for integrating recognition of handwriting gestures with a screen reader.
Fleizach, Christopher B.; Minifie, Darren C.; Hughes, Gregory F.; Dour, Ryan N.; Fisch, Ian M.; Lopes Da Silva, Joel M.; Pedersen, II, Michael M.; Seymour, Eric T.; Naik, Devang K.; Dixon, Ryan S., Device, method, and graphical user interface for integrating recognition of handwriting gestures with a screen reader.
Fleizach, Christopher B.; Minifie, Darren C.; Hughes, Gregory F.; Dour, Ryan N.; Fisch, Ian M.; Lopes Da Silva, Joel M.; Pedersen, II, Michael M.; Seymour, Eric T.; Naik, Devang K.; Dixon, Ryan S., Device, method, and graphical user interface for integrating recognition of handwriting gestures with a screen reader.
Fleizach, Christopher Brian; Seymour, Eric Taylor; Hudson, Reginald Dean, Devices, methods, and graphical user interfaces for accessibility using a touch-sensitive surface.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Identifying a document by performing spectral analysis on the contents of the document.
Moore, Richard G.; Mumford, Gregory L.; Gunasekar, Duraisamy, Messaging response system providing translation and conversion written language into different spoken language.
Baird, Henry; Bentley, Jon; Lopresti, Daniel; Wang, Sui-Yu, Methods and apparatus for defending against telephone-based robotic attacks using contextual-based degradation.
Baird, Henry; Bentley, Jon; Lopresti, Daniel; Wang, Sui-Yu, Methods and apparatus for defending against telephone-based robotic attacks using permutation of an IVR menu.
Baird, Henry; Bentley, Jon; Lopresti, Daniel; Wang, Sui-Yu, Methods and apparatus for defending against telephone-based robotic attacks using random personal codes.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Methods and systems for initiating application processes by data capture from rendered documents.
Verna, Anthony; Ortiz, Luis M., Mobile communication device including text-to-speech module, a touch sensitive screen, and customizable tiles displayed thereon.
Sakai, Keiichi; Kosaka, Tetsuo, Multimodal document reception apparatus and multimodal document transmission apparatus, multimodal document transmission/reception system, their control method, and program.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
Bodin, William K.; Jaramillo, David; Redman, Jerry W.; Thorson, Derral C., RSS content administration for rendering RSS content on a digital audio player.
Hofstader, Christian D.; Gordon, Glen; Damery, Eric; Ocampo, Ralph; Baker, David; Stephen, Joseph K., Screen reader having concurrent communication of non-textual information.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
Bodin, William K.; Jaramillo, David; Redman, Jerry W.; Thorson, Derral C., Synthesizing aggregate data of disparate data types into data of a uniform data type.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
Galanes, Francisco M.; Hon, Hsiao-Wuen; Jacoby, James D.; Lecoeuche, Renaud J.; Potter, Stephen F.; Warren, Susan M., Web server controls for web enabled recognition and/or audible prompting.
Galanes, Francisco M.; Hon, Hsiao-Wuen; Jacoby, James D.; Lecoeuche, Renaud J.; Potter, Stephen F.; Warren, Susan M., Web server controls for web enabled recognition and/or audible prompting.