Audio-to-video synchronization system and method for packet-based network video conferencing
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
H04L-012/16
H04Q-011/00
H04L-012/66
H04J-003/16
H04J-003/22
H04J-003/24
H04J-003/06
G06F-015/16
Application number
UP-0890581
(2004-07-13)
Registration number
US-7664057
(2010-04-04)
Inventor / Address
Wu, Fang
Chen, Wen-hsiung
Friedrich, Walter R.
Sarkar, Shantanu
Applicant / Address
Cisco Technology, Inc.
Agent / Address
Rosenfeld, Dov
Citation information
Times cited: 32
Cited patents: 3
Abstract
Synchronizing audio and video streams in packet-based networks requires synchronization of packet timestamps. The present invention provides such synchronization without resort to a network time standard. In one embodiment of the present invention, pairs of timestamp synchronized signals, such as audio and video signals, not having a common timestamp clock are mixed. One of the signals, for example, the audio signals, is mixed first while preserving the original audio timestamps. The preserved timestamp information is then used to synchronize the timestamps of the unmixed signals, in this example the video signals, to provide synchronization of all signals. In another embodiment, the present invention uses packets containing calibration of timestamps to reduce jitter. The present invention also includes specifications for a packet for transmitting timestamp information.
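As an illustration of the idea in the abstract, the following Python sketch shows one way a receiver might move video timestamps onto the mixed-audio timeline using audio timestamps preserved at mixing time. The names `TimingInfo`, `video_to_mix_time`, and the source identifiers are hypothetical and are not taken from the patent. The sketch also assumes that the audio and video timestamps within a pair, and the mixing timestamp, all tick at the same rate; a real system would need to verify or convert rates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TimingInfo:
    """Hypothetical record a mixer could form at one mixing time."""
    mix_timestamp: int                 # timestamp stamped on the mixed audio
    source_timestamps: Dict[str, int]  # preserved audio timestamp per source


def video_to_mix_time(video_ts: int, source_id: str, info: TimingInfo) -> int:
    """Map a video timestamp from its source's timeline to the mix timeline.

    Assumption: within a pair, audio and video share one timestamp clock,
    so the offset observed for the audio also applies to the video.
    """
    offset = info.mix_timestamp - info.source_timestamps[source_id]
    return video_ts + offset


# Two sources with unrelated timestamp origins, mixed at mix time 50_000.
info = TimingInfo(
    mix_timestamp=50_000,
    source_timestamps={"alice": 1_200, "bob": 987_000},
)

# A video frame captured 160 ticks after alice's mixed audio sample.
alice_frame = video_to_mix_time(1_360, "alice", info)
# A video frame captured 160 ticks before bob's mixed audio sample.
bob_frame = video_to_mix_time(986_840, "bob", info)
```

No shared network clock appears anywhere in the sketch: each source's offset comes only from the pair of values recorded when its audio was mixed.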
Representative claim
We claim: 1. A method of synchronizing streams of packets of audio data and video data over a network, comprising: accepting a plurality of streams of packets of audio data from pairs of streams of packets of audio and video data, each stream of packets of audio data in each pair synchronized with the video data of the pair, such that each stream of packets of audio data includes a respective timestamp that is synchronized with a respective timestamp of a respective matching stream of packets of video data, the streams of audio and video data packets in one pair using timestamps that may be independent of the timestamps used in the streams of audio and video data packets of any other pair; mixing the accepted streams of packets of audio data at a mixing time to form a combined stream of packets of synchronized mixed audio data having a mixing timestamp; forming timing information that relates the values at or prior to the mixing time of the respective timestamps of the packets of the audio data of the accepted streams being mixed at the mixing time, such that the relationship at a later time of the mixing timestamp with the timestamps of the individual streams being mixed is derivable; and transmitting the combined stream, at least one of the plurality of streams of packets of video data whose audio data is mixed into the transmitted combined stream, and the formed timing information over the network; such that a receiver, coupled to the network, receiving the timing information, the combined stream, and the at least one of the plurality of streams of packets of video data can use the formed timing information to synchronize the respective video data of the at least one stream of the plurality of streams of packets of video data with the mixed audio data of said combined stream, and such that no common network clock is needed for the mixing and no common network clock is needed to synchronize the respective video data of the at least one stream of the plurality of 
streams of packets of video data with the mixed audio data of said combined stream. 2. The method of claim 1, wherein said timing information includes the timestamps of at least one packet of audio data of each of the accepted streams of packets of audio data at the time of mixing. 3. The method of claim 1, wherein said forming timing information includes incorporating said timing information in a special header of the packets of said combined stream. 4. The method of claim 1, wherein said forming timing information includes forming a timestamp information packet separate from the packets containing the mixed audio data of said combined stream. 5. The method of claim 4, wherein the packets of said combined stream include an RTP packet. 6. The method of claim 4, wherein the packets of said combined stream include an RTCP packet. 7. The method of claim 1, wherein said timing information includes the ratio of the differences of sequential timestamps of packets of audio data of two of the accepted streams. 8. The method of claim 1, wherein said network is an IP network. 9. The method of claim 1, wherein said synchronizing by said receiver of said at least one stream of the plurality of streams of packets of video data with said mixed audio data of said combined stream includes updating at least one of the timestamps of the packets of said at least one stream of the plurality of streams of packets of video data with said mixing timestamp. 10. 
A method of synchronizing streams over a packet network, said method comprising: forming timing information that relates the values at or prior to a mixing time of respective timestamps of the packets of a plurality of streams of packets of audio data used to form, at the mixing time, mixed synchronized audio data of a combined stream of packets by mixing the audio data of the plurality of streams of packets of audio data from a plurality of pairs, the combined stream of packets having a combined timestamp, each pair including a stream of packets of audio data and a matching stream of packets of video data, wherein the audio data in each stream of packets of audio data of a pair is synchronized with the video data of the matching stream of packets of video data of the pair, the audio and video data packets in any pair having respective timestamps that may be independent of the respective timestamps in the audio and video data packets in any other pair, such that the relationship at a later time of the combined timestamp with the timestamps of the individual streams being mixed is derivable using the timing information; and transmitting over a network the combined stream, one or more matching streams of packets of video data of the pairs whose streams of packets of audio data are used in the mixing, and the formed timing information; such that a receiver coupled to the network and receiving the combined stream and the timing information and further receiving at least one of the streams of packets of video data that is transmitted, can use the formed timing information to synchronize the video data of the received at least one matching stream of packets of video data with the mixed synchronized audio data of the combined stream, and such that no common network clock is needed for the mixing and no common network clock is needed to synchronize the video data of the received at least one matching stream of packets of video data with the mixed synchronized audio data of 
the combined stream. 11. The method of claim 10, wherein said synchronizing by said receiver includes calculating synchronized timestamps for packets of said at least one stream of packets of video data of the pairs using said timing information, such that said synchronized timestamps synchronize the video data of said at least one stream with said mixed synchronized audio data of the combined stream. 12. The method of claim 10, wherein said timing information includes the timestamps of at least one packet of audio data of each of the streams of the audio data used to form the mixed synchronized audio data of the combined stream at the time of said mixing. 13. The method of claim 10, wherein said forming includes incorporating said timing information in a special header of the packets of said combined stream. 14. The method of claim 10, wherein said forming includes forming a timestamp information packet separate from the packets of said combined stream. 15. The method of claim 14, wherein the packets of said combined stream include an RTP packet. 16. The method of claim 14, wherein the packets of said combined stream include an RTCP packet. 17. The method of claim 10, wherein said timing information includes the ratio of the differences of sequential timestamps of two of the plurality of streams of packets of the audio data whose audio data is included in the synchronized mix of audio data of the combined stream. 18. The method of claim 10, wherein said network is an IP network. 19. The method of claim 10, wherein said synchronizing by said receiver of said at least one stream of the plurality of streams of packets of video data with said mixed audio data of said combined stream includes updating at least one of the timestamps of the packets of said at least one stream of the plurality of streams of packets of video data with said synchronized timestamps. 20. 
A method of synchronizing streams of packets of audio and video data over a network, said method comprising: receiving, over the network, a combined stream of packets of a synchronized mix of audio data formed by mixing at a mixing time the audio data from a plurality of streams of packets of audio data, each stream of packets of audio data used in the mixing being from a pair of the stream of packets of audio data and a matching stream of video data, each pair of streams of packets of audio data and packets of video data synchronized and generated according to a respective clock that may be independent of the clock of any other pair, such that each stream of the plurality of packets of audio data used in the mixing includes a respective timestamp that is synchronized with a respective timestamp of a respective matching stream of packets of video data; receiving, over the network, timing information that relates the respective values at or prior to the mixing time of the timestamps of the packets of the plurality of streams of packets of the audio data of the synchronized mix of audio data in the combined stream, the combined stream having a combined timestamp, such that the relationship at a later time of the combined timestamp with the timestamps of the individual streams being mixed is derivable using the timing information; receiving, over the network, at least one of the plurality of matching streams of packets of video data whose respective pair has a respective stream of packets of audio data is mixed to form the mix of audio data of the combined stream; and using the received timing information to synchronize respective video data of at least one of the received matching streams of packets of video data with said mix of audio data of said combined stream, such that no common network clock is needed for the mixing and no common network clock is needed to synchronize the respective video data of at least one of the received matching streams of packets of 
video data with said mix of audio data of said combined stream. 21. The method of synchronizing of claim 20, further comprising: calculating synchronized timestamps for packets of said at least one of the received matching streams of packets of video data using said timing information such that said synchronized timestamps synchronize the video data of said at least one of the received matching streams with said mix of audio data of the combined stream. 22. The method of synchronizing of claim 21, further including: synchronizing said at least one stream of the received matching streams of packets of video data with said mix of synchronized audio data of said combined stream, including updating at least one of the timestamps of the packets of said at least one stream of the received matching streams of packets of video data with said synchronized timestamps. 23. The method of synchronizing of claim 20, wherein said timing information includes the timestamps of at least one packet of audio data of each of the streams of the audio data used to form said mix synchronized audio data of the combined stream at the time of mixing. 24. The method of synchronizing of claim 20, wherein said timing information includes information in a special header of the packets of said combined stream. 25. The method of synchronizing of claim 20, wherein said timing information includes a timestamp information packet separate from the packets of said combined stream. 26. The method of claim 25, wherein the packets of said combined stream include an RTP packet. 27. The method of claim 25, wherein the packets of said combined stream include an RTCP packet. 28. The method of synchronizing of claim 20, wherein said timing information includes the ratio of the differences of sequential timestamps of two of the plurality of streams of packets of the audio data whose audio data is included in said mix of synchronized audio data of the combined stream. 29. 
The method of synchronizing of claim 20, wherein said network is an IP network. 30. An apparatus configured to synchronize streams of packets of audio and video data, said apparatus comprising: means for accepting configured to accept a plurality of streams of packets of audio data from pairs of streams of audio and video data, each accepted stream of packets of audio data in each pair synchronized with a matching stream of packets of video data of the pair, the audio and video data in any pair timed according to a respective clock or clocks that may be independent of the respective clock or clocks used in timing the audio and video data in any other pair, such that each stream of packets of audio data includes a respective timestamp that is synchronized with a respective timestamp of a respective matching stream of packets of video data; means for mixing, configured to mix, at a mixing time, the audio data of the accepted streams to form a combined stream of packets of synchronized mixed audio data having a mixing timestamp; means for forming timing information configured to form timing information that relates the values at or prior to the mixing time of the timestamps of the packets of the accepted streams of audio data being mixed at the mixing time, such that the relationship at a later time of the mixing timestamp with the timestamps of the individual streams being mixed is derivable; and means for transmitting configured to transmit the combined stream, at least one of the plurality of streams of packets of video data whose audio data is mixed into the transmitted combined stream, and the formed timing information over the network; such that a receiver, coupled to the network, receiving the timing information, the combined stream, and at least one of the plurality of matching streams of packets of video data whose pair includes an accepted stream of packets of audio data can use the formed information to synchronize the timestamps of the packets of at least one 
of the plurality of matching streams of video data with the timestamps of said combined stream, and such that no common network clock is needed for the mixing and no common network clock is needed to synchronize the timestamps of the packets of at least one of the plurality of matching streams of video data with the timestamps of said combined stream. 31. The apparatus of claim 30, wherein said timing information includes the timestamps of at least one packet of audio data of each of the accepted streams of packets of audio data at the time of mixing by the means for mixing. 32. The apparatus of claim 30, wherein said means for forming timing information is configured to incorporate said timing information in a special header of the packets of said combined stream. 33. The apparatus of claim 30, wherein said means for forming timing information is configured to form a timestamp information packet separate from the packets of said combined stream. 34. The apparatus of claim 30, wherein said timing information includes the ratio of the differences of sequential timestamps of packets of audio data of two of the accepted streams. 35. The apparatus of claim 30, wherein said network is an IP network. 36. The apparatus of claim 30, wherein the synchronizing by the receiver of said at least one matching stream of video data with said combined stream includes updating at least one of the timestamps of the packets of said at least one matching stream of video data with said synchronized timestamps. 37. 
An apparatus configured to synchronize streams over a packet network, said apparatus comprising: means for forming timing information that relates the values at or prior to a mixing time of respective timestamps of the packets of a plurality of streams of packets of audio data used to form mixed synchronized audio data of a combined stream of packets by mixing, at the mixing time, the audio data of the plurality of streams of packets of audio data from a plurality of pairs, the combined stream of packets having a combined timestamp, each pair including a stream of packets of audio data and a matching stream of packets of video data, wherein the audio data in each stream of packets of audio data of a pair is synchronized with the video data of the matching stream of packets of video data of the pair, the audio and video data packets in any pair having respective timestamps that may be independent of the respective timestamps in the audio and video data packets in any other pair, such that the relationship at a later time of the combined timestamp with the timestamps of the individual streams being mixed is derivable using the timing information; and means for transmitting, over a network, the means for transmitting configured to transmit the combined stream, one or more matching streams of packets of video data of the pairs whose streams of packets of audio data are used in the mixing, and the timing information; such that a receiver, coupled to the network, and receiving the combined stream and the timing information, and further receiving at least one matching stream of the matching streams of packets of video data of the pairs whose streams of packets of audio data are used in the mixing can use the timing information to synchronize the one or more matching streams of packets of video data with the packets of the combined stream, and such that no common network clock is needed for the mixing and no common network clock is needed to synchronize the one or more 
matching streams of packets of video data with the packets of the combined stream. 38. The apparatus of claim 37, wherein said synchronizing by said receiver includes calculating synchronized timestamps for packets of said at least one stream of packets of video data of the pairs using said timing information, such that said synchronized timestamps synchronize the video data of said at least one stream with said mixed synchronized audio data of the combined stream. 39. The apparatus of claim 38, wherein said synchronizing by said receiver of said at least one stream of the plurality of streams of packets of video data with said mixed audio data of said combined stream includes updating at least one of the timestamps of the packets of said at least one stream of the plurality of streams of packets of video data with said synchronized timestamps. 40. The apparatus of claim 37, wherein said timing information includes the timestamps of at least one packet of audio data of each of the streams of the audio data used to form the mixed synchronized audio data of the combined stream at the time of said mixing. 41. The apparatus of claim 37, wherein said means for forming includes means for incorporating said timing information in a special header of the packets of said combined stream. 42. The apparatus of claim 37, wherein said means for forming is configured to form a timestamp information packet separate from the packets of said combined stream. 43. The apparatus of claim 37, wherein said timing information includes the ratio of the differences of sequential timestamps of two of the plurality of streams of packets of the audio data whose audio data is included in the synchronized mix of audio data of the combined stream. 44. The apparatus of claim 37, wherein said network is an IP network. 45. 
A method of synchronizing audio and video streams over a network, said method comprising: receiving over a network timing information that relates the values at or prior to a mixing time of respective timestamps of a plurality of streams of packets of audio data used to form, at the mixing time, a combined stream of packets of a synchronized mix of audio data, the combined mix having a combined timestamp, each stream of packets of audio data used to form the combined stream being from a pair of the stream of packets of audio data and a matching stream of video data, each pair of streams of packets of audio data and packets of video data synchronized and generated according to a respective clock that may be independent of the clock of any other pair, such that each stream of the plurality of packets of audio data used in the mix includes a respective timestamp that is synchronized with a respective timestamp of a respective matching stream of packets of video data, the timing information being such that the relationship at a later time of the combined timestamp with the timestamps of the individual streams being mixed is derivable using the timing information; and calculating synchronized timestamps for the video data of at least one stream of packets of video data whose pair includes a stream of audio data used to form a combined stream, the calculating using said timing information, the calculating not requiring use of a common network clock, such that a receiver coupled to the network and receiving the combined stream of packets of the synchronized mix of audio data, the timing information, and at least one matching stream of packets of video data whose pair includes a stream of audio data used to form the combined stream, can use the calculated synchronized timestamps to synchronize the video data of the at least one received matching stream with the mix of synchronized audio data of the combined stream, and such that no common network clock is needed for the 
mixing and no common network clock is needed to synchronize the video data of the at least one received matching stream with the mix of synchronized audio data of the combined stream. 46. The method of synchronizing of claim 45, wherein said timing information includes the timestamps of at least one packet of audio data of each of the streams of the audio data used to form said synchronized mix of audio data of the combined stream at the time of forming the synchronized mix. 47. The method of synchronizing of claim 45, wherein said timing information includes information in a special header of the packets of said combined stream. 48. The method of synchronizing of claim 45, wherein said timing information includes a timestamp information packet separate from the packets of said combined stream. 49. The method of synchronizing of claim 45, wherein said timing information includes the ratio of the differences of sequential timestamps of two of the plurality of streams of packets of the audio data whose audio data is included in said synchronized mix of audio data of the combined stream. 50. The method of synchronizing of claim 45, wherein said network is an IP network. 51. The method of synchronizing of claim 45, wherein said synchronizing by said receiver of the video data of said at least one stream of packets of video data whose pair includes a stream of audio data used to form the combined stream with said synchronized mix of audio data of said combined stream includes updating at least one of the timestamps of the packets of said at least one stream of packets of video data with said synchronized timestamps. 52. 
A computer readable storage medium having coded thereon instructions which, when executed by one or more processors of a processing system, cause performing the steps of a method of synchronizing streams over a network, said method comprising: forming timing information that relates the values at or prior to a mixing time of respective timestamps of the packets of a plurality of streams of packets of audio data used to form, at the mixing time, mixed synchronized audio data of a combined stream of packets by mixing the audio data of the plurality of streams of packets of audio data from a plurality of pairs, the combined stream of packets having a combined timestamp, each pair including a stream of packets of audio data and a matching stream of packets of video data, wherein the audio data in each stream of packets of audio data of a pair is synchronized with the video data of the matching stream of packets of video data of the pair, the audio and video data packets in any pair having respective timestamps that may be independent of the respective timestamps in the audio and video data packets in any other pair, such that the relationship at a later time of the combined timestamp with the timestamps of the individual streams being mixed is derivable using the timing information; and transmitting over a network the combined stream, one or more matching streams of packets of video data of the pairs whose streams of packets of audio data are used in the mixing, and the formed timing information, such that a receiver coupled to the network and receiving the combined stream and the timing information and further receiving at least one of the streams of packets of video data that is transmitted, can use the formed timing information to synchronize the video data of the received at least one matching stream of packets of video data with the mixed synchronized audio data of the combined stream, and such that no common network clock is needed for the mixing and no common 
network clock is needed to synchronize the video data of the received at least one matching stream of packets of video data with the mixed synchronized audio data of the combined stream. 53. A computer readable storage medium of claim 52, wherein said synchronizing by said receiver includes calculating synchronized timestamps for packets of said at least one stream of packets of video data of the pairs using said timing information, such that said synchronized timestamps synchronize the video data of said at least one stream with said mixed synchronized audio data of the combined stream. 54. A computer readable storage medium of claim 53, said method further including: wherein said synchronizing by said receiver of said at least one stream of the plurality of streams of packets of video data with said mixed audio data of said combined stream includes updating at least one of the timestamps of the packets of said at least one stream of the plurality of streams of packets of video data with said synchronized timestamps. 55. A computer readable storage medium of claim 52, wherein said timing information includes the timestamps of at least one packet of audio data of each of the streams of the audio data used to form the mixed synchronized audio data of the combined stream at the time of said mixing. 56. A computer readable storage medium of claim 52, wherein said forming includes incorporating said timing information in a special header of the packets of said combined stream. 57. A computer readable storage medium of claim 52, wherein said forming includes forming a packet separate from the packets of said combined stream. 58. A computer readable storage medium of claim 52, wherein said timing information includes the ratio of the differences of sequential timestamps of two of the plurality of streams of packets of the audio data whose audio data is included in the synchronized mix of audio data of the combined stream. 59. 
A computer readable storage medium of claim 52, wherein said network is an IP network. 60. An apparatus for synchronizing streams of packets of audio and video data over a network, said apparatus comprising: a first device configured to accept a plurality of streams of packets of audio data from pairs of streams of packets of audio and video data, each stream of packets of audio data in each pair synchronized with the video data of the pair, such that each stream of packets of audio data includes a respective timestamp that is synchronized with a respective timestamp of a respective matching stream of packets of video data, the streams of audio and video data packets in one pair using timestamps that may be independent of the timestamps used in the streams of audio and video data packets of any other pair, the first device further configured to mix at a mixing time the accepted streams of packets of audio data to form a combined stream of packets of synchronized mixed audio data having a mixing timestamp; a second device configured to form timing information that relates values at or prior to the mixing time of the timestamps of the packets of the accepted plurality of streams of packets of audio data used to form the combined stream, such that the relationship at a later time of the mixing timestamp with the timestamps of the individual streams being mixed is derivable; and a transmitter configured to transmit the combined stream, at least one of the plurality of streams of packets of video data whose audio data is mixed into the transmitted combined stream, and the formed timing information over the network; such that a receiver, coupled to the network, receiving the timing information, the combined stream, and the at least one of the plurality of streams of packets of video data can use the formed timing information to synchronize the respective video data of the at least one stream of the plurality of streams of packets of video data with the mixed audio data 
of said combined stream, and such that no common network clock is needed for the mixing and no common network clock is needed to synchronize the respective video data of the at least one stream of the plurality of streams of packets of video data with the mixed audio data of said combined stream. 61. The apparatus of claim 60, wherein said first device is configured to accept the timing information from said second device and encapsulates said timing information in said combined stream. 62. The apparatus of claim 61, wherein said timing information is included in packet headers of said packets of said combined stream. 63. The apparatus of claim 61, wherein said timing information is included in packet payloads of said packets of said combined stream. 64. The apparatus of claim 60, wherein said second device forms said timing information in a timestamp information stream for transmitting over the network. 65. The apparatus of claim 60, wherein said network is an IP network. 66. The apparatus of claim 60, wherein said synchronizing by said receiver of the video data of said at least one stream of packets of video data includes updating at least one of the timestamps of the packets of said at least one stream of packets of video data with said mixing timestamp. 67. The apparatus of claim 60, wherein said packets of said combined stream include an RTP packet. 68. The apparatus of claim 60, wherein said packets of said combined stream include an RTCP packet.
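Several dependent claims (7, 17, 28, 34, 43, 49 and 58) describe timing information that includes the ratio of the differences of sequential timestamps of two audio streams. The Python sketch below shows how such a ratio could be computed and, as one possible use, how it could relate the tick rates of two independent timestamp clocks. The function names and sample values are hypothetical, and the mapping step is an assumption about how a receiver might apply the ratio rather than something the claims spell out. Timestamp wrap-around (RTP timestamps are 32-bit) is not handled.

```python
from fractions import Fraction


def timestamp_ratio(a_prev: int, a_curr: int, b_prev: int, b_curr: int) -> Fraction:
    """Ratio of the differences of sequential timestamps of streams A and B.

    Assumption: (a_prev, b_prev) and (a_curr, b_curr) were each observed
    at the same mixing time, so both differences span the same interval.
    """
    delta_b = b_curr - b_prev
    if delta_b == 0:
        raise ValueError("stream B timestamps did not advance")
    return Fraction(a_curr - a_prev, delta_b)


def map_b_to_a(b_ts: int, a_ref: int, b_ref: int, ratio: Fraction) -> Fraction:
    """Express a stream-B timestamp on stream A's timeline.

    (a_ref, b_ref) is a pair of timestamps observed at one mixing time.
    """
    return a_ref + (b_ts - b_ref) * ratio


# Stream A advanced 160 ticks while stream B advanced 480 ticks.
ratio = timestamp_ratio(0, 160, 1000, 1480)

# 480 further B ticks correspond to 160 further A ticks.
mapped = map_b_to_a(1960, a_ref=160, b_ref=1480, ratio=ratio)
```

Using `Fraction` keeps the ratio exact, so repeated mappings do not accumulate floating-point rounding error.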
Tian, Yongjian; Yuan, Zheng; Bhandarkar, Tejas, Efficient and on demand convergence of audio and non-audio portions of a communication session for phones.
Ben-David, Shay; Hazanovich, Evgeny; Mandel, Zak, Synchronization of data streams with associated metadata streams using smallest sum of absolute differences between time indices of data events and metadata events.
Kroepfl, Michael; Neuhold, Gerhard; Bernögger, Stefan; Ponticelli, Martin Josef; Pehserl, Joachim; Kimchi, Gur; Curlander, John Charles, Synchronization of multiple data source to a common time base.
Kroepfl, Michael; Neuhold, Gerhard; Bernögger, Stefan; Ponticelli, Martin Josef; Pehserl, Joachim; Kimchi, Gur; Curlander, John Charles, Synchronization of multiple data sources to a common time base.
Weed, Michael E.; Berecz, Endre, System and methods for wireless health monitoring of a locator beacon which aids the detection and location of a vehicle and/or people.