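IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th edition) |
Application number | US-0262621 (2002-09-30)
Registration number | US-7359979 (2008-04-15)
Inventor / Address | Gentle, Christopher R.; Michaelis, Paul Roller
Applicant / Address |
Agent / Address |
Citation information | Times cited: 35; Cited patents: 58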
Abstract
The present invention is directed to voice communication devices in which an audio stream is divided into a sequence of individual packets, each of which is routed via pathways that can vary depending on the availability of network resources. All embodiments of the invention rely on an acoustic prioritization agent that assigns a priority value to the packets. The priority value is based on factors such as whether the packet contains voice activity and the degree of acoustic similarity between this packet and adjacent packets in the sequence. A confidence level, associated with the priority value, may also be assigned. In one embodiment, network congestion is reduced by deliberately failing to transmit packets that are judged to be acoustically similar to adjacent packets; the expectation is that, under these circumstances, traditional packet loss concealment algorithms in the receiving device will construct an acceptably accurate replica of the missing packet. In another embodiment, the receiving device can reduce the number of packets stored in its jitter buffer, and therefore the latency of the speech signal, by selectively deleting one or more packets within sustained silences or non-varying speech events. In both embodiments, the ability of the system to drop appropriate packets may be enhanced by taking into account the confidence levels associated with the priority assessments.
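The sender-side decision described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the energy-based voice activity detector, the normalized-correlation similarity measure, the names `Segment`, `voice_activity`, `similarity`, and `should_transmit`, and all threshold values are assumptions introduced here for illustration. The abstract leaves the direction of the confidence comparison open; this sketch assumes that a segment judged to be silence is withheld only when the confidence is at or above the threshold.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Segment:
    """One temporally distinct segment (frame) of the voice stream."""
    samples: Sequence[float]


def voice_activity(seg: Segment, energy_floor: float = 1e-3):
    """Toy detector (assumption): returns (is_voice, confidence in [0, 1])."""
    n = max(len(seg.samples), 1)
    energy = sum(s * s for s in seg.samples) / n
    is_voice = energy > energy_floor
    # Confidence grows with the log-distance of the energy from the floor.
    confidence = min(1.0, abs(math.log10((energy + 1e-12) / energy_floor)))
    return is_voice, confidence


def similarity(a: Segment, b: Segment) -> float:
    """Normalized correlation of two segments, clipped to [0, 1] (assumption)."""
    n = min(len(a.samples), len(b.samples))
    x, y = a.samples[:n], b.samples[:n]
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return max(0.0, min(1.0, dot / (nx * ny)))


def should_transmit(seg: Segment, neighbor: Optional[Segment],
                    conf_threshold: float = 0.5,
                    sim_threshold: float = 0.9) -> bool:
    """Decide whether to send a segment, following the abstract's two tests."""
    is_voice, confidence = voice_activity(seg)
    if not is_voice:
        # Withhold silence only when the detector is confident (assumed direction).
        return confidence < conf_threshold
    if neighbor is not None and similarity(seg, neighbor) >= sim_threshold:
        # Acoustically similar to an adjacent segment: rely on the receiver's
        # packet loss concealment to reconstruct it.
        return False
    return True
```

In this sketch a confidently detected silent segment and a voiced segment that closely matches its neighbor are both withheld, while a dissimilar voiced segment or a low-confidence silence decision is sent.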
Representative claim
What is claimed is: 1. A method for processing voice communications over a data network, comprising: (a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and (b) processing at least first, second and third segments of the voice stream according to the following substeps: (i) selecting the first segment, wherein the contents of the selected first segment are not product of voice activity; (ii) determining that the contents of the selected first segment are not the product of voice activity; (iii) determining a level of confidence that the voice activity determination for the selected first segment is accurate; (iv) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected first segment to a selected endpoint; (v) selecting the second segment, wherein the contents of the selected second segment are the product of voice activity and wherein the second and third segments are temporally adjacent to one another; (vi) determining that the contents of the selected segment are the product of voice activity; (vii) comparing the selected second segment with the third segment to determine a degree of acoustic similarity between the second and third segments; and (viii) when the selected second segment is similar to the third segment, at least one of not transmitting the selected second segment to the selected endpoint and dropping the second segment during transmission. 2. The method of claim 1, further comprising: (c) selecting a fourth segment of the voice stream; (d) determining that the contents of the fourth segment are not the product of voice activity; (e) determining a level of confidence that the voice activity determination for the selected fourth segment is accurate; (f) determining that the level of confidence is the other of less than and greater than the predetermined threshold; (g) assigning an importance to the fourth segment. 3. 
The method of claim 2, wherein the importance is a value marker and further comprising: incorporating the value marker into a packet comprising the fourth segment. 4. The method of claim 3, further comprising: when the value of the value marker is one of less than and greater than a predetermined value threshold, removing the packet from a receive buffer. 5. The method of claim 2, wherein the importance is a service class assigned to a packet comprising the fourth segment. 6. The method of claim 2, wherein the importance is a transmission priority assigned to a packet comprising the fourth segment. 7. The method of claim 2, further comprising: (h) when packet traffic congestion is determined to exist, dropping packets having value markers less than a predetermined level. 8. The method of claim 7, further comprising: varying the predetermined threshold based on at least one of jitter, latency, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer. 9. The method of claim 1, further comprising: (ix) when the selected second segment is not similar to the third segment, transmitting the selected second segment to the selected endpoint and not dropping the second segment during transmission. 10. The method of claim 1, further comprising the substep: (ix) assigning an importance to the second segment, wherein the level of importance is at least one of a transmission priority of a packet comprising the second segment and a value marker to be included in the packet. 11. 
The method of claim 10, wherein the third segment temporally precedes the second segment and a fourth segment temporally follows the second segment and wherein substep (iv) comprises: comparing the second segment with the third segment of the voice stream to determine a first degree of acoustic similarity between the second and third segments; and comparing the second segment with the fourth segment of the voice stream to determine a second degree of acoustic similarity between the second and fourth segments. 12. The method of claim 11, wherein the processing step is based on at least one of the first and second degrees of acoustic similarity one of exceeding or being less than a selected similarity threshold. 13. The method of claim 10, wherein a first packet associated with the first segment is not transmitted and further comprising: later reconstructing the first segment with a packet loss concealment algorithm. 14. The method of claim 1, wherein the first and second segments correspond to a payload of a first packet. 15. The method of claim 1, wherein the first segment corresponds to a frame of a first packet and the second segment to a frame of a second packet. 16. The method of claim 1, wherein different classes of services are used for different segments of the voice stream. 17. The method of claim 1, wherein different transmission priorities are used for different segments of the voice stream. 18. The method of claim 1, wherein the first and third segments are temporally adjacent to the second segment. 19. The method of claim 18, further comprising: determining a type of voice activity associated with the contents of the second segment, wherein the type of voice activity is a plosive. 20. 
A computer readable circuit containing processor executable instructions to perform steps comprising: (a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and (b) processing at least first, second and third segments of the voice stream according to the following substeps: (i) selecting the first segment, wherein the contents of the selected first segment are not product of voice activity; (ii) determining that the contents of the selected first segment are not the product of voice activity; (iii) determining a level of confidence that the voice activity determination for the selected first segment is accurate; (iv) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected first segment to a selected endpoint; (v) selecting the second segment, wherein the contents of the selected second segment are the product of voice activity and wherein the second and third segments are temporally adjacent to one another; (vi) determining that the contents of the selected segment are the product of voice activity; (vii) comparing the selected second segment with the third segment to determine a degree of acoustic similarity between the second and third segments; and (viii) when the selected second segment is similar to the third segment, at least one of not transmitting the selected second segment to the selected endpoint and dropping the second segment during transmission. 21. 
A logic circuit configured to perform steps comprising: (a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and (b) processing at least first, second and third segments of the voice stream according to the following substeps: (i) selecting the first segment, wherein the contents of the selected first segment are not product of voice activity; (ii) determining that the contents of the selected first segment are not the product of voice activity; (iii) determining a level of confidence that the voice activity determination for the selected first segment is accurate; (iv) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected first segment to a selected endpoint; (v) selecting the second segment, wherein the contents of the selected second segment are the product of voice activity and wherein the second and third segments are temporally adjacent to one another; (vi) determining that the contents of the selected segment are the product of voice activity; (vii) comparing the selected second segment with the third segment to determine a degree of acoustic similarity between the second and third segments; and (viii) when the selected second segment is similar to the third segment, at least one of not transmitting the selected second segment to the selected endpoint and dropping the second segment during transmission. 22. 
A method for processing voice communications over a data network, comprising: (a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and (b) processing the segments of the voice stream according to the following rules: (i) determining whether or not the content of a selected segment is a product of voice activity; (ii) when the content of the selected segment is determined not to be the product of voice activity, determining a level of confidence that the voice activity determination for the selected segment is accurate; (iii) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected segment to a selected endpoint; (iv) when the content of the selected segment is determined to be the product of voice activity, comparing the selected segment with at least one temporally adjacent segment to determine a degree of acoustic similarity between the selected and at least one temporally adjacent segments; and (v) when the selected segment is similar to the at least one temporally adjacent segment, at least one of not transmitting the selected segment to the selected endpoint and transmitting a packet comprising the selected segment with a level of importance lower than a packet comprising a dissimilar segment. 23. The method of claim 22, wherein, when the level of confidence is the other of less than and greater than the predetermined threshold, determining a level of importance of the selected segment. 24. The method of claim 23, wherein the level of importance is a transmission priority of a packet comprising the selected segment. 25. The method of claim 24, wherein segments determined not to be a product of voice activity have a lower level of importance than dissimilar segments determined to be a product of voice activity. 26. 
The method of claim 23, wherein the level of importance is a value marker placed in a header and/or payload of a packet comprising the selected segment. 27. The method of claim 26, wherein, when a communication link with the selected endpoint is determined to be congested, packets having value markers having values less than a predetermined level are dropped. 28. The method of claim 22, wherein, when the selected segment is dissimilar to the at least one temporally adjacent segment, transmitting the selected segment to the selected endpoint. 29. The method of claim 28, wherein the at least one temporally adjacent segment comprises a segment temporally preceding the selected segment and a segment temporally following the selected segment. 30. The method of claim 29, wherein packets comprising similar content are sent with a lower priority than packets comprising dissimilar content. 31. The method of claim 29, wherein packets comprising similar content comprise value markers having a value lower than packets comprising dissimilar content. 32. 
A computer readable medium comprising processor-executable instructions operable to perform steps comprising: (a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and (b) processing the segments of the voice stream according to the following rules: (i) determining whether or not the content of a selected segment is a product of voice activity; (ii) when the content of the selected segment is determined not to be the product of voice activity, determining a level of confidence that the voice activity determination for the selected segment is accurate; (iii) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected segment to a selected endpoint; (iv) when the content of the selected segment is determined to be the product of voice activity, comparing the selected segment with at least one temporally adjacent segment to determine a degree of acoustic similarity between the selected and at least one temporally adjacent segments; and (v) when the selected segment is similar to the at least one temporally adjacent segment, at least one of not transmitting the selected segment to the selected endpoint and transmitting a packet comprising the selected segment with a level of importance lower than a packet comprising a dissimilar segment. 33. The medium of claim 32, wherein, when the level of confidence is the other of less than and greater than the predetermined threshold, determining a level of importance of the selected segment. 34. The medium of claim 33, wherein the level of importance is a transmission priority of a packet comprising the selected segment. 35. The medium of claim 33, wherein segments determined not to be a product of voice activity have a lower level of importance than dissimilar segments determined to be a product of voice activity. 36. 
The medium of claim 32, wherein the level of importance is a value marker placed in a header and/or payload of a packet comprising the selected segment. 37. The medium of claim 36, wherein, when a communication link with the selected endpoint is determined to be congested, packets having value markers having values less than a predetermined level are dropped. 38. The medium of claim 32, wherein, when the selected segment is dissimilar to the at least one temporally adjacent segment, transmitting the selected segment to the selected endpoint. 39. The medium of claim 38, wherein the at least one temporally adjacent segment comprises a segment temporally preceding the selected segment and a segment temporally following the selected segment. 40. The medium of claim 39, wherein packets comprising similar content are sent with a lower priority than packets comprising dissimilar content. 41. The medium of claim 39, wherein packets comprising similar content comprise value markers having a value lower than packets comprising dissimilar content. 42. 
A logic circuit operable to perform steps comprising: (a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and (b) processing the segments of the voice stream according to the following rules: (i) determining whether or not the content of a selected segment is a product of voice activity; (ii) when the content of the selected segment is determined not to be the product of voice activity, determining a level of confidence that the voice activity determination for the selected segment is accurate; (iii) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected segment to a selected endpoint; (iv) when the content of the selected segment is determined to be the product of voice activity, comparing the selected segment with at least one temporally adjacent segment to determine a degree of acoustic similarity between the selected and at least one temporally adjacent segments; and (v) when the selected segment is similar to the at least one temporally adjacent segment, at least one of not transmitting the selected segment to the selected endpoint and transmitting a packet comprising the selected segment with a level of importance lower than a packet comprising a dissimilar segment. 43. The circuit of claim 42, wherein, when the level of confidence is the other of less than and greater than the predetermined threshold, determining a level of importance of the selected segment. 44. The circuit of claim 43, wherein the level of importance is a transmission priority of a packet comprising the selected segment. 45. The circuit of claim 43, wherein segments determined not to be a product of voice activity have a lower level of importance than dissimilar segments determined to be a product of voice activity. 46. 
The circuit of claim 42, wherein the level of importance is a value marker placed in a header and/or payload of a packet comprising the selected segment. 47. The circuit of claim 46, wherein, when a communication link with the selected endpoint is determined to be congested, packets having value markers having values less than a predetermined level are dropped. 48. The circuit of claim 42, wherein, when the selected segment is dissimilar to the at least one temporally adjacent segment, transmitting the selected segment to the selected endpoint. 49. The circuit of claim 48, wherein the at least one temporally adjacent segment comprises a segment temporally preceding the selected segment and a segment temporally following the selected segment. 50. The circuit of claim 49, wherein packets comprising similar content are sent with a lower priority than packets comprising dissimilar content. 51. The circuit of claim 49, wherein packets comprising similar content comprise value markers having a value lower than packets comprising dissimilar content.
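Several dependent claims describe dropping packets by value marker, either when a link is congested or to shrink a receive buffer. The following Python sketch shows one way such rules might look. The `Packet` type, the integer marker scale (higher means more important), the function names, and the default thresholds are all hypothetical; the claims do not specify a marker encoding or a buffer policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int            # position in the voice stream
    value_marker: int   # hypothetical scale: higher = more important
    payload: bytes = b""


def drop_on_congestion(queue: List[Packet], congested: bool,
                       min_value: int = 2) -> List[Packet]:
    """When congestion is detected, keep only packets at or above min_value."""
    if not congested:
        return list(queue)
    return [p for p in queue if p.value_marker >= min_value]


def trim_receive_buffer(buffer: List[Packet], target_depth: int,
                        min_value: int = 2) -> List[Packet]:
    """Remove low-value packets until the buffer reaches target_depth.

    Only packets whose marker is below min_value are eligible, so the
    buffer may stay deeper than target_depth if too few are eligible.
    """
    excess = len(buffer) - target_depth
    if excess <= 0:
        return list(buffer)
    # Drop the least important packets first, earliest sequence number first.
    candidates = sorted(
        (p for p in buffer if p.value_marker < min_value),
        key=lambda p: (p.value_marker, p.seq),
    )
    to_drop = {p.seq for p in candidates[:excess]}
    return [p for p in buffer if p.seq not in to_drop]
```

Keeping eligibility separate from the target depth means high-value packets are never removed just to meet a latency goal, which is one possible reading of the trade-off the claims describe.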