IPC Classification Information
Country/Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.) |
Application Number | US-0763149 (1999-08-19)
Priority | DE-198 37 661 (1998-08-19)
International Application Number | PCT/EP99/006081 (1999-08-19)
§371/§102 Date | 2001-04-30 (2001-04-30)
International Publication Number | WO00/011647 (2000-03-02)
Inventor / Address |
Applicant / Address |
Agent / Address |
Citation Information | Cited by: 15 / Patents cited: 3
Abstract
The invention provides a method, apparatus, and a computer program stored on a data carrier that generates synthesized acoustical data by concatenating audio segments of sounds to reproduce a sequence of concatenated sounds/phones. The invention has an inventory of sounds, and each sound has three bands (FIG. 1b): an initial co-articulation band, a solo articulation band, and a final co-articulation band. The invention selects audio segments that end or begin with a co-articulation band and a solo articulation band of one sound. The instance of concatenation is defined by the co-articulation band and the solo articulation band of the one sound.
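The three-band structure described in the abstract can be pictured with a minimal sketch. This is an illustrative model, not the patent's implementation: the `Phone` class, the band names as fields, and the choice of cutting so the earlier segment ends with the solo band are assumptions for demonstration only.

```python
# Hypothetical sketch of the three-band sound model from the abstract:
# each sound/phone has an initial co-articulation band, a solo
# articulation band, and a final co-articulation band, and audio
# segments are cut so that the instance of concatenation falls between
# a solo band and an adjacent co-articulation band.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phone:
    name: str
    initial_co: List[float]   # initial co-articulation band (samples)
    solo: List[float]         # solo articulation band
    final_co: List[float]     # final co-articulation band


def split_at_solo_end(phone: Phone) -> Tuple[List[float], List[float]]:
    """Cut a phone so the earlier segment ends with the solo band and
    the later segment begins with the final co-articulation band."""
    earlier = phone.initial_co + phone.solo
    later = phone.final_co
    return earlier, later


phone = Phone("a", [0.1, 0.2], [0.5, 0.5, 0.5], [0.2, 0.1])
earlier, later = split_at_solo_end(phone)
# earlier carries initial_co + solo; later carries final_co, so the
# join point sits between the solo band and the final co-articulation band.
```

Joining `earlier` and `later` back together recovers the full phone, which is why the claims can require the concatenated result to contain at least three bands.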
Representative Claims
The invention claimed is:
1. A method for generating synthesized acoustical data by concatenating audio segments of sounds to reproduce a sequence of concatenated sounds/phones, wherein each sound/phone comprises three bands including an initial co-articulation band, a solo articulation band and a final co-articulation band, and each segment comprises one or more bands of a sound/phone, said method comprising: generating an inventory of audio segments comprising a plurality of audio segments comprising bands of one or more sounds/phones; establishing an earlier audio segment with at least a portion of one band of a sound/phone selected for including an instance of concatenation; establishing a later audio segment with the rest of the portions of the bands of the selected sound/phone, wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, and wherein the solo articulation band of the selected sound/phone is at the trailing end of the earlier segment or at the leading end of the later segment and at least part of one of the co-articulation bands of the selected sound/phone is adjacent to the solo articulation band; and concatenating the two audio segments, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones.
2. The method of claim 1 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band.
3. The method of claim 1 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment. 4.
The method of claim 1 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
5. The method of claim 1 wherein the leading band of the later audio segment reproduces a static sound and the two audio segments are concatenated by overlapping the opposite, adjacent solo and co-articulation bands of the selected sound/phone with each other, where the transfer function and the length of overlap are determined by acoustical data in the two segments.
6. The method according to claim 5 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
7. The method of claim 1 wherein the band in the leading edge of the later audio segment reproduces a dynamic sound and the two audio segments are concatenated with each other in a non-overlapping manner, with the transfer function determined by acoustical data in the two segments.
8. The method according to claim 7 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds.
9. The method according to claim 1 wherein the initial co-articulation band of the selected sound/phone is disposed in the earlier audio segment and reproduces the properties of the start of the selected sound/phone sequence.
10. The method according to claim 1 wherein the final co-articulation band of the selected sound/phone is disposed in the later audio segment and reproduces the properties of the end of the selected sound/phone sequence.
11. The method according to claim 1 wherein voice data to be synthesized is combined in groups and each group comprises one or more individual audio segments. 12.
The method according to claim 1 wherein the audio segment established as the later audio segment comprises the highest number of successive portions of the sounds/phones of the sound/phone sequence, in order to use the smallest number of audio segment bands in the generation of the synthesized acoustical data.
13. The method according to claim 1 wherein the bands of the individual audio segments are processed in accordance with properties of the concatenated sound/phone sequence and wherein said properties include one or more of the group consisting of a modification of frequency, duration, amplitude, and spectrum.
14. The method according to claim 1 wherein the bands of individual audio segments are processed in accordance with properties of the selected band wherein the instance of concatenation lies, with these properties including one or more of the group of properties consisting of frequency, duration, amplitude, and spectrum.
15. The method according to claim 1 wherein the instance of concatenation is set in the bands of the selected sound/phone where at least two bands are in agreement with respect to one or more properties of the group of properties consisting of zero point, amplitude, gradients, derivatives of any degree, spectra, tone levels, amplitude values within a frequency band, volume, style of speech, and emotion of speech.
16. The method according to claim 1 wherein the acoustical data to be synthesized comprises voice data, and the sounds are phones.
17. The method according to claim 1 wherein the synthesized acoustical data is converted to acoustical signals and/or voice signals.
18. The method of claim 1 wherein the instance of concatenation is disposed within or at an end of one of the co-articulation bands. 19.
A device for generating synthesized acoustical data by concatenating audio segments of sounds to reproduce a sequence of concatenated sounds/phones from sounds/phones that include an initial co-articulation band, a solo articulation band and a final co-articulation band, comprising: segment providing means (107/108) for providing audio segments, said segments comprising bands of one or more sounds/phones; establishing means (105) for establishing at least two audio segments from the segment providing means, said establishing means selecting an earlier audio segment having at least a portion of one band of the selected sound/phone and a later audio segment with the rest of the portions of the bands of the selected sound/phone, wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, said earlier audio segment having the solo articulation band of the selected sound/phone at the trailing end of the earlier segment or at the leading end of the later segment, at least part of one of the co-articulation bands of the selected sound/phone being adjacent to the solo articulation band, and said selected sound/phone having an instance of concatenation; means for determining the duration and position of bands in the audio segments depending on the earlier and later audio segments; and means for concatenating (111) the two audio segments at an instance of concatenation within the selected sound/phone and as a function of properties of the bands at the trailing end of the earlier segment and at the leading end of the later segment, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones. 20.
The device of claim 19 wherein the means for providing audio segments comprises a database (107) in which audio segments are stored, each of which reproduces a portion of a phone or portions of a sequence of (concatenated) phones, or a synthesis means (108) for supplying audio segments, or any combination of said database and said synthesis means.
21. The device of claim 19 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band.
22. The device of claim 19 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment.
23. The device of claim 19 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
24. The device of claim 19 wherein said concatenating means overlaps the leading band of the later audio segment having a static sound with the trailing band of the earlier audio segment and the transfer function and the length of overlap are determined by acoustical data in the two segments.
25. The device of claim 24 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
26. The device of claim 19 wherein said concatenating means concatenates the audio segments in a non-overlapped manner when the band in the leading edge of the later audio segment reproduces a dynamic sound, with the transfer function determined by acoustical data in the two segments.
27. The device according to claim 26 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds. 28.
The device according to claim 19 wherein the selection means (105) selects audio segments which reproduce the greatest number of successive portions of concatenated phones of the concatenated phone sequence.
29. The device according to claim 19 wherein the concatenation means (111) comprises means for processing the bands of individual audio segments depending on properties of the concatenated phone sequence and with one or more functions selected from the group consisting of modification of frequency, duration, amplitude, and spectrum.
30. The device according to claim 19 wherein the concatenation means (111) comprises means for processing the bands of individual audio segments with one or more functions in a band selected from the group consisting of the instance of concatenation, modification of frequency, duration, amplitude, and spectrum.
31. The device according to claim 19 wherein the concatenation means (111) sets the instance of concatenation where at least two bands are in agreement with respect to one or more properties of the group of properties consisting of zero point, amplitude, gradients, derivatives of any degree, spectra, tone levels, amplitude values within a frequency band, volume, style of speech, and emotion of speech.
32. The device according to claim 19 wherein the segment providing means includes audio segments with bands, each of which reproduces at least a portion of a sound or phone, respectively, a sound or phone, respectively, portions of phone sequences or polyphones, respectively, or sound sequences or polyphones, respectively.
33. The device according to claim 19 wherein the concatenation means (111) generates synthesized voice data by means of the concatenation of audio segments.
34. The device according to claim 19 further comprising means (117) for converting synthesized acoustical data to acoustical signals and/or voice signals. 35.
A data carrier which includes a computer program for the co-articulation-specific concatenation of audio segments in order to generate synthesized acoustical data which reproduces a sequence of concatenated phones, wherein each sound/phone comprises three bands including an initial co-articulation band, a solo articulation band and a final co-articulation band, and each segment comprises one or more bands of a sound/phone, said computer program performing the following steps: establishing an earlier audio segment with at least a portion of one band of a sound/phone selected for including an instance of concatenation; establishing a later audio segment with the rest of the portions of the bands of the selected sound/phone, wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, and wherein the solo articulation band of the selected sound/phone is at the trailing end of the earlier segment or at the leading end of the later segment and at least part of one of the co-articulation bands of the selected sound/phone is adjacent to the solo articulation band; and concatenating the two audio segments, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones.
36. The data carrier of claim 35 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band.
37. The data carrier of claim 35 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment. 38.
The data carrier of claim 35 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
39. The data carrier of claim 35 wherein the leading band of the later audio segment reproduces a static sound and the two audio segments are concatenated by overlapping the opposite, adjacent solo and co-articulation bands of the selected sound/phone with each other, where the transfer function and the length of overlap are determined by acoustical data in the two segments.
40. The data carrier according to claim 39 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
41. The data carrier of claim 35 wherein the band in the leading edge of the later audio segment reproduces a dynamic sound and the two audio segments are concatenated with each other in a non-overlapping manner, with the transfer function determined by acoustical data in the two segments.
42. The data carrier according to claim 41 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds.
43. The data carrier according to claim 35 wherein the initial co-articulation band of the selected sound/phone is disposed in the earlier audio segment and reproduces the properties of the start of the selected sound/phone sequence.
44. The data carrier according to claim 35 wherein the final co-articulation band of the selected sound/phone is disposed in the later audio segment and reproduces the properties of the end of the selected sound/phone sequence.
45. The data carrier according to claim 35 wherein voice data to be synthesized is combined in groups and each group comprises one or more individual audio segments. 46.
The data carrier according to claim 35 wherein the audio segment established as the later audio segment comprises the highest number of successive portions of the sounds/phones of the sound/phone sequence, in order to use the smallest number of audio segment bands in the generation of the synthesized acoustical data.
47. The data carrier according to claim 35 wherein the bands of the individual audio segments are processed in accordance with properties of the concatenated sound/phone sequence and wherein said properties include one or more of the group consisting of a modification of frequency, duration, amplitude, and spectrum.
48. The data carrier according to claim 35 wherein the bands of individual audio segments are processed in accordance with properties of the selected band wherein the instance of concatenation lies, with these properties including one or more of the group of properties consisting of frequency, duration, amplitude, and spectrum.
49. The data carrier according to claim 35 wherein the instance of concatenation is set in the bands of the selected sound/phone where at least two bands are in agreement with respect to one or more properties of the group of properties consisting of zero point, amplitude, gradients, derivatives of any degree, spectra, tone levels, amplitude values within a frequency band, volume, style of speech, and emotion of speech.
50. The data carrier according to claim 35 wherein data to be synthesized comprises voice data, and the sounds are phones.
51. The data carrier according to claim 35 wherein the synthesized data is converted to acoustical signals and/or voice signals.
52. The data carrier of claim 35 wherein the instance of concatenation is disposed within or at an end of one of the co-articulation bands.
53. The data carrier of claim 35 wherein data is stored as acoustical data, optical data, magnetic data or electrical data. 54.
The data carrier of claim 53 wherein a group of the audio segments reproduces sounds or phones, respectively, or portions of sounds or phones, respectively.
55. The data carrier of claim 53 wherein a group of the audio segments reproduces phone sequences or portions of phone sequences or polyphones, respectively, or portions of polyphones.
56. A synthesized voice signal comprising a sequence of sounds or phones with the voice signals comprising segments of sounds to reproduce a sequence of concatenated sounds/phones wherein each sound/phone comprises three bands including an initial co-articulation band, a solo articulation band and a final co-articulation band, and each segment comprises one or more bands of a sound, said synthesized voice signals comprising: at least two audio segments concatenated for providing the synthesized voice signal, said two audio segments including an earlier audio segment with at least a part of one band of a sound/phone selected for including an instance of concatenation and a later audio segment with the rest of the portions and bands of the selected sound/phone wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, and wherein the solo articulation band of the selected sound/phone is at the trailing end of the earlier segment or at the leading end of the later segment and at least part of one of the co-articulation bands of the selected sound/phone is adjacent to the solo articulation band and the two audio segments are concatenated to provide the synthesized voice signal, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones.
57. The synthesized voice signal of claim 56 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band. 58.
The synthesized voice signal of claim 56 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment.
59. The synthesized voice signal of claim 56 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
60. The synthesized voice signal of claim 56 wherein the leading band of the later audio segment reproduces a static sound and the two audio segments are concatenated by overlapping the opposite, adjacent solo and co-articulation bands of the selected sound/phone with each other, where the transfer function and the length of overlap are determined by acoustical data in the two segments.
61. The synthesized voice signal of claim 60 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
62. The synthesized voice signal of claim 56 wherein the band in the leading edge of the later audio segment reproduces a dynamic sound and the two audio segments are concatenated with each other in a non-overlapping manner, with the transfer function determined by acoustical data in the two segments.
63. The synthesized voice signal of claim 62 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds.
64. The synthesized voice signal of claim 56 wherein the initial co-articulation band of the selected sound/phone is disposed in the earlier audio segment and reproduces the properties of the start of the selected sound/phone sequence.
65. The synthesized voice signal of claim 56 wherein the final co-articulation band of the selected sound/phone is disposed in the later audio segment and reproduces the properties of the end of the selected sound/phone sequence. 66.
The synthesized voice signal of claim 56 wherein voice data to be synthesized is combined in groups and each group comprises one or more individual audio segments.
67. The synthesized voice signal of claim 56 wherein the audio segment established as the later audio segment comprises the highest number of successive portions of the sounds/phones of the sound/phone sequence, in order to use the smallest number of audio segment bands in the generation of the synthesized acoustical data.
68. The synthesized voice signal of claim 62 wherein the acoustical data to be synthesized comprises voice data, and the sounds are phones.
69. The synthesized voice signal of claim 62 wherein the synthesized acoustical data is converted to acoustical signals and/or voice signals.
70. The synthesized voice signal of claim 62 wherein the instance of concatenation is disposed within or at an end of one of the co-articulation bands.
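The claims repeatedly distinguish two concatenation modes: overlapping for static sounds (vowels, diphthongs, liquids, vibrants, fricatives, nasals) and non-overlapping for dynamic sounds (plosives, affricates, glottal stops, clicks). The sketch below illustrates that distinction under stated assumptions: the claims do not specify the transfer function, so a linear cross-fade stands in for it, and the function names and sample lists are illustrative only.

```python
# Illustrative sketch (not the patented implementation) of the two
# concatenation modes the claims distinguish: an overlapped cross-fade
# for static sounds, and a simple abutting join for dynamic sounds.
from typing import List


def concat_static(earlier: List[float], later: List[float],
                  overlap: int) -> List[float]:
    """Overlap the trailing samples of `earlier` with the leading
    samples of `later`, mixing them with a linear cross-fade (a stand-in
    for the claims' unspecified transfer function)."""
    head, tail = earlier[:-overlap], earlier[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for `later`
        mixed.append((1 - w) * tail[i] + w * later[i])
    return head + mixed + later[overlap:]


def concat_dynamic(earlier: List[float], later: List[float]) -> List[float]:
    """Dynamic sounds (e.g. plosives) are joined without overlap, so the
    abrupt onset of the later segment is preserved."""
    return earlier + later


print(concat_dynamic([1.0, 1.0], [0.0, 0.0]))  # [1.0, 1.0, 0.0, 0.0]
```

Note the length difference: the overlapped join is shorter than the sum of its inputs by the overlap length, while the non-overlapped join preserves both segments in full, which is why the claims have the overlap length determined by the acoustical data in the two segments.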