IPC Classification Information
Country/Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.) |
Application Number | US-0763149 (1999-08-19)
Priority | DE-198 37 661 (1998-08-19)
International Application Number | PCT/EP99/006081 (1999-08-19)
§371/§102 Date | 2001-04-30 (2001-04-30)
International Publication Number | WO00/011647 (2000-03-02)
Inventor / Address |
Applicant / Address |
Agent / Address |
Citation Information | Cited by: 15 / Patents cited: 3
Abstract
The invention provides a method, apparatus, and a computer program stored on a data carrier that generates synthesized acoustical data by concatenating audio segments of sounds to reproduce a sequence of concatenated sounds/phones. The invention has an inventory of sounds, and each sound has three bands (FIG. 1b): an initial co-articulation band, a solo articulation band, and a final co-articulation band. The invention selects audio segments that end or begin with a co-articulation band and a solo articulation band of one sound. The instance of concatenation is defined by the co-articulation band and the solo articulation band of the one sound.
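The three-band structure described in the abstract can be pictured with a minimal sketch. This is an illustrative model, not the patent's implementation: the `Phone` class, the band names as fields, and the choice of cutting so the earlier segment ends with the solo band are assumptions for demonstration only.

```python
# Hypothetical sketch of the three-band sound model from the abstract:
# each sound/phone has an initial co-articulation band, a solo
# articulation band, and a final co-articulation band, and audio
# segments are cut so that the instance of concatenation falls between
# a solo band and an adjacent co-articulation band.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phone:
    name: str
    initial_co: List[float]   # initial co-articulation band (samples)
    solo: List[float]         # solo articulation band
    final_co: List[float]     # final co-articulation band


def split_at_solo_end(phone: Phone) -> Tuple[List[float], List[float]]:
    """Cut a phone so the earlier segment ends with the solo band and
    the later segment begins with the final co-articulation band."""
    earlier = phone.initial_co + phone.solo
    later = phone.final_co
    return earlier, later


phone = Phone("a", [0.1, 0.2], [0.5, 0.5, 0.5], [0.2, 0.1])
earlier, later = split_at_solo_end(phone)
# earlier carries initial_co + solo; later carries final_co, so the
# join point sits between the solo band and the final co-articulation band.
```

Joining `earlier` and `later` back together recovers the full phone, which is why the claims can require the concatenated result to contain at least three bands.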
Representative Claims
The invention claimed is:
1. A method for generating synthesized acoustical data by concatenating audio segments of sounds to reproduce a sequence of concatenated sounds/phones, wherein each sound/phone comprises three bands including an initial co-articulation band, a solo articulation band and a final co-articulation band, and each segment comprises one or more bands of a sound/phone, said method comprising: generating an inventory of audio segments comprising a plurality of audio segments comprising bands of one or more sounds/phones; establishing an earlier audio segment with at least a portion of one band of a sound/phone selected for including an instance of concatenation; establishing a later audio segment with the rest of the portions of the bands of the selected sound/phone, wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, and wherein the solo articulation band of the selected sound/phone is at the trailing end of the earlier segment or at the leading end of the later segment and at least part of one of the co-articulation bands of the selected sound/phone is adjacent to the solo articulation band; and concatenating the two audio segments, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones.
2. The method of claim 1 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band.
3. The method of claim 1 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment. 4.
The method of claim 1 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
5. The method of claim 1 wherein the leading band of the later audio segment reproduces a static sound and the two audio segments are concatenated by overlapping the opposite, adjacent solo and co-articulation bands of the selected sound/phone with each other, where the transfer function and the length of overlap are determined by acoustical data in the two segments.
6. The method according to claim 5 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
7. The method of claim 1 wherein the band in the leading edge of the later audio segment reproduces a dynamic sound and the two audio segments are concatenated with each other in a non-overlapping manner, with the transfer function determined by acoustical data in the two segments.
8. The method according to claim 7 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds.
9. The method according to claim 1 wherein the initial co-articulation band of the selected sound/phone is disposed in the earlier audio segment and reproduces the properties of the start of the selected sound/phone sequence.
10. The method according to claim 1 wherein the final co-articulation band of the selected sound/phone is disposed in the later audio segment and reproduces the properties of the end of the selected sound/phone sequence.
11. The method according to claim 1 wherein voice data to be synthesized is combined in groups and each group comprises one or more individual audio segments. 12.
The method according to claim 1 wherein the audio segment established as the later audio segment comprises the highest number of successive portions of the sounds/phones of the sound/phone sequence, in order to use the smallest number of audio segment bands in the generation of the synthesized acoustical data.
13. The method according to claim 1 wherein the bands of the individual audio segments are processed in accordance with properties of the concatenated sound/phone sequence and wherein said properties include one or more of the group consisting of a modification of frequency, duration, amplitude, and spectrum.
14. The method according to claim 1 wherein the bands of individual audio segments are processed in accordance with properties of the selected band wherein the instance of concatenation lies, with these properties including one or more of the group of properties consisting of frequency, duration, amplitude, and spectrum.
15. The method according to claim 1 wherein the instance of concatenation is set in the bands of the selected sound/phone where at least two bands are in agreement with respect to one or more properties of the group of properties consisting of zero point, amplitude, gradients, derivatives of any degree, spectra, tone levels, amplitude values within a frequency band, volume, style of speech, and emotion of speech.
16. The method according to claim 1 wherein the acoustical data to be synthesized comprises voice data, and the sounds are phones.
17. The method according to claim 1 wherein the synthesized acoustical data is converted to acoustical signals and/or voice signals.
18. The method of claim 1 wherein the instance of concatenation is disposed within or at an end of one of the co-articulation bands. 19.
A device for generating synthesized acoustical data by concatenating audio segments of sounds to reproduce a sequence of concatenated sounds/phones from sounds/phones that include an initial co-articulation band, a solo articulation band and a final co-articulation band, comprising: segment providing means (107/108) for providing audio segments, said segments comprising bands of one or more sounds/phones; establishing means (105) for establishing at least two audio segments from the segment providing means, said establishing means selecting an earlier audio segment having at least a portion of one band of the selected sound/phone and a later audio segment with the rest of the portions of the bands of the selected sound/phone, wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, said earlier audio segment having the solo articulation band of the selected sound/phone at the trailing end of the earlier segment or at the leading end of the later segment, at least part of one of the co-articulation bands of the selected sound/phone being adjacent to the solo articulation band, and said selected sound/phone having an instance of concatenation; means for determining the duration and position of bands in the audio segments depending on the earlier and later audio segments; and means for concatenating (111) the two audio segments at an instance of concatenation within the selected sound/phone and as a function of properties of the bands at the trailing end of the earlier segment and at the leading end of the later segment, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones. 20.
The device of claim 19 wherein the means for providing audio segments comprises a database (107) in which audio segments are stored, each of which reproduces a portion of a phone or portions of a sequence of (concatenated) phones, or a synthesis means (108) for supplying audio segments, or any combination of said database and said synthesis means.
21. The device of claim 19 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band.
22. The device of claim 19 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment.
23. The device of claim 19 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
24. The device of claim 19 wherein said concatenating means overlaps the leading band of the later audio segment having a static sound with the trailing band of the earlier audio segment and the transfer function and the length of overlap are determined by acoustical data in the two segments.
25. The device of claim 24 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
26. The device of claim 19 wherein said concatenating means concatenates the audio segments in a non-overlapped manner when the band in the leading edge of the later audio segment reproduces a dynamic sound, with the transfer function determined by acoustical data in the two segments.
27. The device according to claim 26 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds. 28.
The device according to claim 19 wherein the selection means (105) selects audio segments which reproduce the greatest number of successive portions of concatenated phones of the concatenated phone sequence.
29. The device according to claim 19 wherein the concatenation means (111) comprises means for processing the bands of individual audio segments depending on properties of the concatenated phone sequence and with one or more functions selected from the group consisting of modification of frequency, duration, amplitude, and spectrum.
30. The device according to claim 19 wherein the concatenation means (111) comprises means for processing the bands of individual audio segments with one or more functions in a band selected from the group consisting of the instance of concatenation, modification of frequency, duration, amplitude, and spectrum.
31. The device according to claim 19 wherein the concatenation means (111) sets the instance of concatenation where at least two bands are in agreement with respect to one or more properties of the group of properties consisting of zero point, amplitude, gradients, derivatives of any degree, spectra, tone levels, amplitude values within a frequency band, volume, style of speech, and emotion of speech.
32. The device according to claim 19 wherein the segment providing means includes audio segments with bands, each of which reproduces at least a portion of a sound or phone, respectively, a sound or phone, respectively, portions of phone sequences or polyphones, respectively, or sound sequences or polyphones, respectively.
33. The device according to claim 19 wherein the concatenation means (111) generates synthesized voice data by means of the concatenation of audio segments.
34. The device according to claim 19 further comprising means (117) for converting synthesized acoustical data to acoustical signals and/or voice signals. 35.
A data carrier which includes a computer program for the co-articulation-specific concatenation of audio segments in order to generate synthesized acoustical data which reproduces a sequence of concatenated phones, wherein each sound/phone comprises three bands including an initial co-articulation band, a solo articulation band and a final co-articulation band, and each segment comprises one or more bands of a sound/phone, said computer program performing the following steps: establishing an earlier audio segment with at least a portion of one band of a sound/phone selected for including an instance of concatenation; establishing a later audio segment with the rest of the portions of the bands of the selected sound/phone, wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, and wherein the solo articulation band of the selected sound/phone is at the trailing end of the earlier segment or at the leading end of the later segment and at least part of one of the co-articulation bands of the selected sound/phone is adjacent to the solo articulation band; and concatenating the two audio segments, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones.
36. The data carrier of claim 35 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band.
37. The data carrier of claim 35 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment. 38.
The data carrier of claim 35 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
39. The data carrier of claim 35 wherein the leading band of the later audio segment reproduces a static sound and the two audio segments are concatenated by overlapping the opposite, adjacent solo and co-articulation bands of the selected sound/phone with each other, where the transfer function and the length of overlap are determined by acoustical data in the two segments.
40. The data carrier according to claim 39 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
41. The data carrier of claim 35 wherein the band in the leading edge of the later audio segment reproduces a dynamic sound and the two audio segments are concatenated with each other in a non-overlapping manner, with the transfer function determined by acoustical data in the two segments.
42. The data carrier according to claim 41 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds.
43. The data carrier according to claim 35 wherein the initial co-articulation band of the selected sound/phone is disposed in the earlier audio segment and reproduces the properties of the start of the selected sound/phone sequence.
44. The data carrier according to claim 35 wherein the final co-articulation band of the selected sound/phone is disposed in the later audio segment and reproduces the properties of the end of the selected sound/phone sequence.
45. The data carrier according to claim 35 wherein voice data to be synthesized is combined in groups and each group comprises one or more individual audio segments. 46.
The data carrier according to claim 35 wherein the audio segment established as the later audio segment comprises the highest number of successive portions of the sounds/phones of the sound/phone sequence, in order to use the smallest number of audio segment bands in the generation of the synthesized acoustical data.
47. The data carrier according to claim 35 wherein the bands of the individual audio segments are processed in accordance with properties of the concatenated sound/phone sequence and wherein said properties include one or more of the group consisting of a modification of frequency, duration, amplitude, and spectrum.
48. The data carrier according to claim 35 wherein the bands of individual audio segments are processed in accordance with properties of the selected band wherein the instance of concatenation lies, with these properties including one or more of the group of properties consisting of frequency, duration, amplitude, and spectrum.
49. The data carrier according to claim 35 wherein the instance of concatenation is set in the bands of the selected sound/phone where at least two bands are in agreement with respect to one or more properties of the group of properties consisting of zero point, amplitude, gradients, derivatives of any degree, spectra, tone levels, amplitude values within a frequency band, volume, style of speech, and emotion of speech.
50. The data carrier according to claim 35 wherein data to be synthesized comprises voice data, and the sounds are phones.
51. The data carrier according to claim 35 wherein the synthesized data is converted to acoustical signals and/or voice signals.
52. The data carrier of claim 35 wherein the instance of concatenation is disposed within or at an end of one of the co-articulation bands.
53. The data carrier of claim 35 wherein data is stored as acoustical data, optical data, magnetic data or electrical data. 54.
The data carrier of claim 53 wherein a group of the audio segments reproduces sounds or phones, respectively, or portions of sounds or phones, respectively.
55. The data carrier of claim 53 wherein a group of the audio segments reproduces phone sequences or portions of phone sequences or polyphones, respectively, or portions of polyphones.
56. A synthesized voice signal comprising a sequence of sounds or phones with the voice signals comprising segments of sounds to reproduce a sequence of concatenated sounds/phones wherein each sound/phone comprises three bands including an initial co-articulation band, a solo articulation band and a final co-articulation band, and each segment comprises one or more bands of a sound, said synthesized voice signals comprising: at least two audio segments concatenated for providing the synthesized voice signal, said two audio segments including an earlier audio segment with at least a part of one band of a sound/phone selected for including an instance of concatenation and a later audio segment with the rest of the portions and bands of the selected sound/phone wherein at least one of the earlier or later audio segments comprises bands of at least two adjacent sounds/phones, and wherein the solo articulation band of the selected sound/phone is at the trailing end of the earlier segment or at the leading end of the later segment and at least part of one of the co-articulation bands of the selected sound/phone is adjacent to the solo articulation band and the two audio segments are concatenated to provide the synthesized voice signal, whereby the concatenated audio segments comprise at least three bands of two adjacent sounds/phones.
57. The synthesized voice signal of claim 56 wherein the solo articulation band of the selected sound/phone is at the leading edge of the later audio segment and the final co-articulation band of the selected sound/phone is adjacent to the solo articulation band. 58.
The synthesized voice signal of claim 56 wherein the solo articulation band of the selected sound/phone is at the trailing edge of the earlier audio segment and the final co-articulation band of the selected sound/phone is at the leading edge of the later audio segment.
59. The synthesized voice signal of claim 56 wherein at least a portion of one of the co-articulation bands of the selected sound/phone is disposed at an end of one of the segments and is opposite the solo articulation band at the end of the other segment.
60. The synthesized voice signal of claim 56 wherein the leading band of the later audio segment reproduces a static sound and the two audio segments are concatenated by overlapping the opposite, adjacent solo and co-articulation bands of the selected sound/phone with each other, where the transfer function and the length of overlap are determined by acoustical data in the two segments.
61. The synthesized voice signal of claim 60 wherein the static phones include vowels, diphthongs, liquids, vibrants, fricatives and nasals.
62. The synthesized voice signal of claim 56 wherein the band in the leading edge of the later audio segment reproduces a dynamic sound and the two audio segments are concatenated with each other in a non-overlapping manner, with the transfer function determined by acoustical data in the two segments.
63. The synthesized voice signal of claim 62 wherein the dynamic phones include plosives, affricates, glottal stops, and click sounds.
64. The synthesized voice signal of claim 56 wherein the initial co-articulation band of the selected sound/phone is disposed in the earlier audio segment and reproduces the properties of the start of the selected sound/phone sequence.
65. The synthesized voice signal of claim 56 wherein the final co-articulation band of the selected sound/phone is disposed in the later audio segment and reproduces the properties of the end of the selected sound/phone sequence. 66.
The synthesized voice signal of claim 56 wherein voice data to be synthesized is combined in groups and each group comprises one or more individual audio segments.
67. The synthesized voice signal of claim 56 wherein the audio segment established as the later audio segment comprises the highest number of successive portions of the sounds/phones of the sound/phone sequence, in order to use the smallest number of audio segment bands in the generation of the synthesized acoustical data.
68. The synthesized voice signal of claim 62 wherein the acoustical data to be synthesized comprises voice data, and the sounds are phones.
69. The synthesized voice signal of claim 62 wherein the synthesized acoustical data is converted to acoustical signals and/or voice signals.
70. The synthesized voice signal of claim 62 wherein the instance of concatenation is disposed within or at an end of one of the co-articulation bands.
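The claims repeatedly distinguish two concatenation modes: overlapping for static sounds (vowels, diphthongs, liquids, vibrants, fricatives, nasals) and non-overlapping for dynamic sounds (plosives, affricates, glottal stops, clicks). The sketch below illustrates that distinction under stated assumptions: the claims do not specify the transfer function, so a linear cross-fade stands in for it, and the function names and sample lists are illustrative only.

```python
# Illustrative sketch (not the patented implementation) of the two
# concatenation modes the claims distinguish: an overlapped cross-fade
# for static sounds, and a simple abutting join for dynamic sounds.
from typing import List


def concat_static(earlier: List[float], later: List[float],
                  overlap: int) -> List[float]:
    """Overlap the trailing samples of `earlier` with the leading
    samples of `later`, mixing them with a linear cross-fade (a stand-in
    for the claims' unspecified transfer function)."""
    head, tail = earlier[:-overlap], earlier[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for `later`
        mixed.append((1 - w) * tail[i] + w * later[i])
    return head + mixed + later[overlap:]


def concat_dynamic(earlier: List[float], later: List[float]) -> List[float]:
    """Dynamic sounds (e.g. plosives) are joined without overlap, so the
    abrupt onset of the later segment is preserved."""
    return earlier + later


print(concat_dynamic([1.0, 1.0], [0.0, 0.0]))  # [1.0, 1.0, 0.0, 0.0]
```

Note the length difference: the overlapped join is shorter than the sum of its inputs by the overlap length, while the non-overlapped join preserves both segments in full, which is why the claims have the overlap length determined by the acoustical data in the two segments.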