Analytics-modulated coding of surveillance video
IPC Classification Information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
H04N-019/167
H04N-007/18
H04N-019/176
H04N-019/172
H04N-019/149
H04N-019/115
H04N-019/61
H04N-019/107
H04N-019/114
H04N-019/124
H04N-019/132
H04N-019/14
H04N-019/137
H04N-019/142
H04N-019/152
H04N-019/177
H04N-019/17
Application Number
US-0620232
(2009-11-17)
Registration Number
US-9215467
(2015-12-15)
Inventors / Address
Cheok, Lai-Tee
Gagvani, Nikhil
Applicant / Address
CheckVideo LLC
Agent / Address
Cooley LLP
Citation Information
Cited by: 1
Cited patents: 78
Abstract
A method and apparatus for encoding surveillance video where one or more regions of interest are identified and the encoding parameter values associated with those regions are specified in accordance with intermediate outputs of a video analytics process. Such an analytics-modulated video compression approach allows the coding process to adapt dynamically based on the content of the surveillance images. In this manner, the fidelity of the region of interest is increased relative to that of a background region such that the coding efficiency is improved, including instances when no target objects appear in the scene. Better compression results can be achieved by assigning different coding priority levels to different types of detected objects.
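As a concrete illustration of the abstract's core idea, the sketch below modulates a per-block quantization parameter (QP) so that blocks inside a detected region of interest (ROI) are coded with higher fidelity than background blocks. The function name and offsets are illustrative assumptions; the 0-51 clamp follows H.264 convention rather than anything stated in the patent.

```python
# A minimal sketch of analytics-modulated QP selection: blocks inside a
# region of interest get a lower QP (finer quantization), background blocks
# a higher one. Offsets and the 0-51 range are assumptions, not patent text.

def modulate_qp(base_qp: int, in_roi: bool, roi_offset: int = 6) -> int:
    qp = base_qp - roi_offset if in_roi else base_qp + roi_offset
    return max(0, min(51, qp))  # clamp to the H.264-style QP range

# Example: a 4x4 grid of blocks whose centre 2x2 was flagged as an ROI.
roi = {(1, 1), (1, 2), (2, 1), (2, 2)}
for r in range(4):
    print([modulate_qp(26, (r, c) in roi) for c in range(4)])
```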
Representative Claims
1. A method, comprising: receiving, at an analytics module implemented in at least one of a memory or a processing device, a video frame having a plurality of pixels; assigning, at the analytics module and without user intervention, based on a type of a foreground object from the video frame having the plurality of pixels, a class from a plurality of predetermined classes to the foreground object; adjusting a quantization parameter value associated with the foreground object based on a weight associated with the class assigned to the foreground object, a size of the foreground object, and a target bit rate associated with the video frame, the weight being based on a coding priority associated with the class assigned to the foreground object; producing a plurality of DCT coefficients for pixels from the plurality of pixels of the video frame associated with the foreground object; quantizing, at a quantization module, the DCT coefficients associated with the foreground object based on the adjusted quantization parameter value; coding the quantized DCT coefficients associated with the foreground object to produce coded quantized DCT coefficients; and sending, to a storage module, a representation of the coded quantized DCT coefficients. 2. The method of claim 1, further comprising: coding the video frame via a first-pass of a low-complexity coding operation; adjusting a human visual system factor associated with the video frame based on the coded video frame; and adjusting the quantization parameter value associated with the foreground object based on the adjusted human visual system factor. 3. The method of claim 1, wherein the foreground object is a first foreground object, the class assigned to the foreground object being a first class, the weight associated with the first class being a first weight, the quantization parameter value associated with the first foreground object being a first quantization parameter value, the method further comprising: assigning a second class from the plurality of predetermined classes to a second foreground object from the video frame, the second class being different from the first class; adjusting a second quantization parameter value associated with the second foreground object based on at least one of a target bit rate and a second weight associated with the second class assigned to the second foreground object, the second quantization parameter value being different from the first quantization parameter value, the second weight being different from the first weight; producing a plurality of DCT coefficients for pixels from the plurality of pixels of the video frame associated with the second foreground object; quantizing the DCT coefficients associated with the second foreground object based on the adjusted second quantization parameter value; and coding the quantized DCT coefficients associated with the second foreground object. 4. The method of claim 1, wherein: the adjusting includes scaling the quantization parameter value associated with the foreground object based on at least one of the target bit rate or the weight associated with the class assigned to the foreground object.
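Claim 1 adjusts an object's QP from three named inputs: the class weight (its coding priority), the object's size, and the frame's target bit rate. The claim names the inputs but not a formula, so the weight table, reference rate, and blending formula below are invented for illustration.

```python
# A hedged sketch of claim 1's class-weighted QP adjustment. All constants
# and the specific blend are assumptions; only the three inputs come from
# the claim language.

CLASS_WEIGHTS = {"person": 1.0, "vehicle": 0.8, "animal": 0.5}  # hypothetical

def adjust_object_qp(base_qp, obj_class, obj_area, frame_area,
                     target_bps, ref_bps=2_000_000):
    weight = CLASS_WEIGHTS.get(obj_class, 0.3)      # coding-priority weight
    size_factor = obj_area / frame_area             # fraction of the frame
    rate_factor = ref_bps / max(target_bps, 1)      # tight budgets raise QP
    qp = base_qp * rate_factor * (1.0 - 0.5 * weight * (1.0 - size_factor))
    return int(max(0, min(51, round(qp))))

# A person and an animal of equal size under the same 1 Mbps budget: the
# higher-priority person gets the finer (lower) QP.
print(adjust_object_qp(30, "person", 4_000, 307_200, 1_000_000))  # -> 30
print(adjust_object_qp(30, "animal", 4_000, 307_200, 1_000_000))  # -> 45
```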
5. The method of claim 1, further comprising: generating gradient information associated with the video frame via a single pass through the video frame; deriving a human visual system factor associated with the video frame using the gradient information; and adjusting the quantization parameter value associated with the foreground object based on at least one of the target bit rate, the weight associated with the class assigned to the foreground object, or the human visual system factor. 6. The method of claim 1, wherein the type of a foreground object is at least one of a person, an animal, a vehicle, a building, a pole, or a sign. 7. The method of claim 1, wherein the video frame is from a plurality of video frames, the plurality of pixels of the video frame are organized into a plurality of blocks of pixels, the method further comprising: adjusting a bit rate associated with the video frame based on at least one of a target bit rate for the plurality of video frames, a remaining number of video frames from the plurality of video frames, a remaining number of available bits assigned to the video frame, or a scene complexity associated with the video frame; adjusting a bit rate associated with each block of pixels from the plurality of blocks of pixels based at least in part on a target bit rate associated with the video frame or a complexity of each block of pixels from the plurality of blocks of pixels; and adjusting the quantization parameter value associated with the foreground object based on at least one of the weight associated with the class assigned to the foreground object, a target bit rate associated with the foreground object, the target bit rate associated with the video frame, or the bit rate associated with each block of pixels from the plurality of blocks of pixels. 8. The method of claim 7, wherein the scene complexity is based on at least one of a number of objects in the video frame, one or more sizes of objects in the video frame, or one or more classes of objects in the video frame. 9. The method of claim 7, wherein the complexity of each block of pixels from the plurality of blocks of pixels is based at least in part on one or more human visual system factors. 10. The method of claim 1, wherein the foreground object is from a plurality of foreground objects, the adjusting the quantization parameter value associated with the foreground object from the video frame being based on a quantity of foreground objects from the plurality of foreground objects.
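Claims 2 and 5 derive a human visual system (HVS) factor from gradient information gathered in a single pass over the frame. One plausible reading, sketched below with invented constants: high-gradient (busy) blocks mask quantization noise, so their QP can be raised without visible loss.

```python
# A hedged sketch of a gradient-derived HVS factor. The 1/64 scaling and the
# multiplicative use of the factor are assumptions; the claims only say the
# factor is derived from gradient information and used to adjust the QP.

def hvs_factor(block):
    """Mean absolute horizontal + vertical gradient of a 2-D pixel block."""
    h, w = len(block), len(block[0])
    total = count = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(block[y][x + 1] - block[y][x]); count += 1
            if y + 1 < h:
                total += abs(block[y + 1][x] - block[y][x]); count += 1
    return 1.0 + (total / max(count, 1)) / 64.0  # > 1.0 means more masking

def hvs_adjusted_qp(qp, block):
    return max(0, min(51, round(qp * hvs_factor(block))))

flat = [[128] * 8 for _ in range(8)]
busy = [[(37 * x + 91 * y) % 256 for x in range(8)] for y in range(8)]
print(hvs_adjusted_qp(26, flat), hvs_adjusted_qp(26, busy))  # busy gets more
```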
11. A method, comprising: receiving, at an analytics module implemented in at least one of a memory or a processing device, a first video frame having a plurality of blocks of pixels and a second video frame having a plurality of blocks of pixels; assigning, at the analytics module and without user intervention, a class, based on a type of a foreground object from the first video frame, from a plurality of predetermined classes to the foreground object, the foreground object including a block of pixels from the plurality of blocks of pixels of the first video frame, each class from the plurality of predetermined classes having associated therewith a coding priority; identifying in the second video frame a prediction block of pixels associated with the block of pixels in the foreground object, the identifying being based on a prediction search window having a search area associated with the class assigned to the foreground object; coding, at a coding module, the first video frame based on the identified prediction block of pixels, a size of the foreground object, and a target bit rate associated with the first video frame to produce a coded video frame; and sending, to a storage module, a representation of the coded video frame. 12. The method of claim 11, further comprising: updating the search area of the prediction search window according to tracked motion information associated with the foreground object over a plurality of video frames including the first video frame. 13. The method of claim 11, wherein: the class assigned to the foreground object is a first class, the plurality of predetermined classes including a second class different from the first class, the first class having an associated prediction search window, the second class having an associated prediction search window, a search area of the prediction search window associated with the first class being smaller than a search area of the prediction search window associated with the second class when the coding priority associated with the first class is lower than the coding priority associated with the second class. 14. The method of claim 11, further comprising: adjusting the search area of the prediction search window based on moving portions of the foreground object. 15. The method of claim 11, wherein the class from the plurality of predetermined classes is one of a vehicle or a person. 16. The method of claim 11, wherein the foreground object is from a plurality of foreground objects, the coding including coding the first video frame based on a quantity of foreground objects from the plurality of foreground objects. 17. The method of claim 11, wherein the coding includes coding the first video frame based on gradient information associated with at least one of the first video frame or the second video frame. 18. The method of claim 11, wherein the coding includes coding the first video frame based on (1) gradient information associated with the foreground object, and (2) temporal activity associated with the foreground object.
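Claims 11-13 size the motion-prediction search window by the object's class: a class with lower coding priority gets a smaller search area. The sketch below pairs that idea with an exhaustive sum-of-absolute-differences (SAD) search; the priority table, radii, and block size are illustrative assumptions layered on the claim language.

```python
# A hedged sketch of a class-sized prediction search window plus a brute-
# force SAD block search. Priorities and radii are hypothetical.
import random

CODING_PRIORITY = {"vehicle": 1, "person": 2}  # hypothetical priorities

def search_radius(obj_class, base=4, step=4):
    return base + step * CODING_PRIORITY.get(obj_class, 0)

def sad(a, b):  # sum of absolute differences between equal-size blocks
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def find_prediction_block(cur, ref, x0, y0, radius, n=4):
    """Best-matching n-by-n block in ref within +/- radius of (x0, y0)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x <= w - n and 0 <= y <= h - n:
                cost = sad(cur, [row[x:x + n] for row in ref[y:y + n]])
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best  # (SAD cost, motion vector dx, dy)

random.seed(0)
ref = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [row[10:14] for row in ref[9:13]]  # a block that moved by (2, 1)
print(find_prediction_block(cur, ref, 8, 8, search_radius("person")))
```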
19. A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to: assign, without user intervention, based on a type of a foreground object from a video frame having a plurality of pixels, a class from a plurality of predetermined classes to the foreground object, each class from the plurality of predetermined classes having associated therewith a coding priority; track motion information associated with the foreground object in a first video frame having a plurality of blocks of pixels, the foreground object including a block of pixels from the plurality of blocks of pixels of the first video frame; identify in a second video frame having a plurality of blocks of pixels a prediction block of pixels associated with the block of pixels in the foreground object, the identification of the prediction block of pixels based on a prediction search window having a search area associated with the tracked motion information associated with the foreground object, the search area of the prediction search window being updated according to the class assigned to the foreground object; code the first video frame based on the identified prediction block of pixels and a target bit rate associated with the first video frame to produce a coded video frame; and send, to a storage module, a representation of the coded video frame. 20. The non-transitory processor-readable medium of claim 19, wherein the class from the plurality of predetermined classes is one of a vehicle or a person. 21. The non-transitory processor-readable medium of claim 19, wherein the search area of the prediction search window is updated according to the tracked motion information associated with the foreground object. 22. The non-transitory processor-readable medium of claim 19, the code further comprising code to cause the processor to: generate gradient information associated with the first video frame; and define, based on the gradient information, a human visual system factor associated with the first video frame, the code to cause the processor to code the first video frame includes code to cause the processor to code the first video frame based on the human visual system factor.
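Claims 19 and 21 update the search area from motion information tracked for the object across frames. A minimal sketch, assuming the radius simply grows with the object's mean recent speed (the base radius and growth rule are invented):

```python
# A hedged sketch of claim 21's motion-driven search-area update: fast
# objects get a wider prediction search window. Constants are assumptions.

def updated_radius(base_radius, recent_vectors):
    """Widen the search radius for fast-moving objects."""
    if not recent_vectors:
        return base_radius
    speed = sum((dx * dx + dy * dy) ** 0.5
                for dx, dy in recent_vectors) / len(recent_vectors)
    return base_radius + round(speed)

print(updated_radius(8, [(2, 1), (3, 0), (4, 2)]))  # fast object -> 11
print(updated_radius(8, []))                        # no history  -> 8
```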
23. A method, comprising: receiving, at an analytics module implemented in at least one of a memory or a processing device, a plurality of pictures associated with a scene; assigning, at the analytics module and without user intervention, based on a type of a foreground object from a picture in a first group of pictures (GOP) from the plurality of pictures, a class from a plurality of predetermined classes to the foreground object, each class from the plurality of predetermined classes having associated therewith a coding priority, the first GOP (1) having a first number of frames between two intra-frames, and (2) associated with the scene at a first time; tracking, at a tracking module, motion information associated with the foreground object over a plurality of pictures; inserting an intra-frame picture in the first GOP based on the tracked motion information associated with the foreground object and the coding priority associated with the class assigned to the foreground object; defining a second GOP from the plurality of pictures and associated with the scene at a second time after the first time to have a second number of frames between two intra-frames based on the foreground object leaving the scene after the first time and before the second time, the second number of frames being different than the first number of frames; and sending, to a storage module, a representation of at least one of the first GOP or the second GOP. 24. The method of claim 23, further comprising: modifying a structure associated with at least one of the first GOP or the second GOP based on segmentation results associated with the foreground object and with the coding priority associated with the class assigned to the foreground object. 25. The method of claim 23, further comprising: modifying a number of pictures associated with at least one of the first GOP or the second GOP based on segmentation results associated with the foreground object and with the coding priority associated with the class assigned to the foreground object. 26. A method, comprising: receiving, at an analytics module implemented in at least one of a memory or a processing device, a plurality of pictures associated with a scene; assigning, at the analytics module and without user intervention, based on a type of a foreground object from a picture in a first group of pictures (GOP) from the plurality of pictures, a class from a plurality of predetermined classes to the foreground object, each class from the plurality of predetermined classes having associated therewith a coding priority, the first group of pictures (1) having a first number of frames between two intra-frames, and (2) associated with the scene at a first time; tracking, at a tracking module, motion information associated with the foreground object over a plurality of pictures; replacing a block of pixels in the foreground object with an intra-frame block of pixels based on the tracked motion information associated with the foreground object and the coding priority associated with the class assigned to the foreground object; defining a second GOP from the plurality of pictures and associated with the scene at a second time after the first time to have a second number of frames between two intra-frames based on the foreground object leaving the scene after the first time and before the second time, the first number of frames being different than the second number of frames; and sending, to a storage module, a representation of the second GOP.
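Claims 23-26 adapt the GOP structure to scene activity: an intra-frame can be inserted while a tracked, high-priority object is in the scene, and a longer intra-frame interval is used once the object has left. The two interval lengths and the decision rule below are illustrative assumptions, not values from the patent.

```python
# A hedged sketch of activity-adaptive GOP length. Interval constants and
# the I/P decision rule are invented for illustration.

ACTIVE_GOP_LEN = 15   # frames between intra-frames while objects are present
IDLE_GOP_LEN = 120    # frames between intra-frames for an empty scene

def frame_type(since_last_intra, objects_present, object_entered):
    gop_len = ACTIVE_GOP_LEN if objects_present else IDLE_GOP_LEN
    return "I" if object_entered or since_last_intra >= gop_len else "P"

types = []
since = IDLE_GOP_LEN  # force an intra-frame at the start of the sequence
was_present = False
for i in range(40):
    present = 10 <= i < 20                    # an object visits the scene
    t = frame_type(since, present, present and not was_present)
    since = 0 if t == "I" else since + 1
    was_present = present
    types.append(t)
print("".join(types))  # an extra I appears when the object enters at frame 10
```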
27. A method, comprising: receiving, at an analytics module implemented in at least one of a memory or a processing device, a group of pictures (GOP); segmenting a foreground object from a background of a picture in the GOP, the foreground object of the picture having a plurality of pixels organized into a plurality of blocks of pixels, the background of the picture having a plurality of pixels organized into a plurality of blocks of pixels; tracking motion information associated with a block of pixels from the plurality of blocks of pixels of the foreground object, a first block of pixels from the plurality of blocks of pixels of the background, and a second block of pixels from the plurality of blocks of pixels of the background; encoding, at an encode module, the block of pixels from the plurality of blocks of pixels of the foreground object as an intra-coded block of pixels to produce an encoded intra-coded block of pixels based on (1) the motion information associated with the block of pixels from the plurality of blocks of pixels of the foreground object, (2) a size of the foreground object, and (3) a target bit rate associated with the GOP; encoding, at the encode module, the first block of pixels from the plurality of blocks of pixels of the background as a predictive-coded block of pixels to produce an encoded predictive-coded block of pixels based on the motion information associated with the first block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the GOP; encoding, at the encode module, the second block of pixels from the plurality of blocks of pixels of the background as a skipped block of pixels to produce an encoded skipped block of pixels based on the motion information associated with the second block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the GOP; and sending, to a storage module, a representation of at least one of the encoded intra-coded block of pixels, the encoded predictive-coded block of pixels, or the encoded skipped block of pixels. 28. The method of claim 27, wherein the tracking of motion information includes detecting motion in the first block of pixels from the plurality of blocks of pixels of the background and detecting an absence of motion in the second block of pixels from the plurality of blocks of pixels of the background.
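Claims 27-28 map tracked motion onto three block-coding modes: foreground blocks are intra-coded, background blocks with detected motion are predictive-coded, and motionless background blocks are skipped. The motion threshold below is an invented constant; the claims only distinguish presence from absence of motion.

```python
# A hedged sketch of the three-way mode decision in claims 27-28.

MOTION_THRESHOLD = 2.0  # assumed magnitude below which a block is "static"

def block_mode(is_foreground: bool, motion_magnitude: float) -> str:
    if is_foreground:
        return "intra"
    if motion_magnitude >= MOTION_THRESHOLD:
        return "predictive"
    return "skipped"

print(block_mode(True, 5.0))   # moving foreground object -> intra
print(block_mode(False, 4.0))  # background with motion    -> predictive
print(block_mode(False, 0.0))  # static background         -> skipped
```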
29. An apparatus, comprising: a receive module implemented in at least one of a memory or a processing device, the receive module configured to receive a video frame having a plurality of pixels; a segment module configured to segment, from the video frame, a foreground object from a background, the foreground object being from a plurality of foreground objects, the foreground object of the video frame having a plurality of pixels organized into a plurality of blocks of pixels, the background of the video frame having a plurality of pixels organized into a plurality of blocks of pixels; a track module configured to track motion information associated with the block of pixels from the plurality of blocks of pixels of the foreground object, a first block of pixels from the plurality of blocks of pixels of the background, and a second block of pixels from the plurality of blocks of pixels of the background; an encode module configured to encode the block of pixels from the plurality of blocks of pixels of the foreground object as an intra-coded macroblock based on (1) the motion information associated with the block of pixels from the plurality of blocks of pixels of the foreground object, (2) a quantity of foreground objects from the plurality of foreground objects, and (3) a target bit rate associated with the video frame to produce an encoded intra-coded macroblock; an encode module configured to encode the first block of pixels from the plurality of blocks of pixels of the background as a predictive-coded macroblock based on the motion information associated with the first block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the video frame to produce an encoded predictive-coded macroblock; an encode module configured to encode the second block of pixels from the plurality of blocks of pixels of the background as at least one of a bidirectionally-predictive coded macroblock or a skipped macroblock based on the motion information associated with the second block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the video frame to produce at least one of an encoded bidirectionally-predictive coded macroblock or an encoded skipped macroblock; and a send module configured to send a representation of at least one of the encoded intra-coded macroblock, the encoded predictive-coded macroblock, or the encoded skipped macroblock. 30. The apparatus of claim 29, wherein the type of foreground object is at least one of a person, an animal, a vehicle, a building, a pole, or a sign. 31. The apparatus of claim 29, wherein the track module is configured to detect motion in the first block of pixels from the plurality of blocks of pixels of the background, the track module configured to detect an absence of motion in the second block of pixels from the plurality of blocks of pixels of the background. 32. The apparatus of claim 29, wherein at least one of the block of pixels from the plurality of blocks of pixels of the foreground object defines a contour associated with the foreground object. 33. The apparatus of claim 29, wherein the encode module is configured to encode the block of pixels from the plurality of blocks of pixels of the foreground object based on a size of the foreground object.
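Claim 29 decomposes the apparatus into a chain of modules: receive, segment, track, encode, send. A structural sketch of that pipeline follows; every stage is a stub with assumed data shapes, shown only to make the module boundaries concrete, not to implement the encoder itself.

```python
# A structural sketch of claim 29's module chain. Names and data shapes are
# assumptions; each stage is a placeholder for the processing the claim
# assigns to that module.

def receive(raw):                    # receive module: accept a video frame
    return {"frame": raw}

def segment(state):                  # segment module: foreground/background
    state["foreground_blocks"], state["background_blocks"] = [], []
    return state

def track(state):                    # track module: per-block motion info
    state["motion"] = {}
    return state

def encode(state, target_bps):       # encode module: intra/predictive/skip
    return f"<bitstream for {state['frame']} at {target_bps} bps>"

def send(bitstream, storage):        # send module: hand off to storage
    storage.append(bitstream)

storage = []
send(encode(track(segment(receive("frame-0"))), 1_000_000), storage)
print(storage)
```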
Patents cited by this patent (78)
Sun Huifang ; Vetro Anthony, Adaptive video coding method.
Barber Ronald J. (San Jose CA) Beitel Bradley J. (Woodside CA) Equitz William R. (Palo Alto CA) Niblack Carlton W. (San Jose CA) Petkovic Dragutin (Saratoga CA) Work Thomas R. (San Francisco CA) Yank, Image query system and method.
MacCormack David Ross ; Wilson Charles Park ; Winter Gerhard Josef ; Nunally Patrick O., Intelligent video information management system performing multiple functions in parallel.
Jang Euee-seon,KRX, Method of coding an arbitrary shape of an object when all pixels of an entire region of a display are used as texture for the object.
Eleftheriadis Alexandros ; Anastassiou Dimitris ; Chang Shih-Fu ; Nayar Shree, Methods and apparatus for performing digital image and video segmentation and compression using 3-D depth information.
Oh Seong-Jun,KRX ; Chun Sung-Moon,KRX ; Moon Joo-Hee,KRX ; Kim Jae-Kyoon,KRX, Object-by information coding apparatus and method thereof for MPEG-4 picture instrument.
Yokoyama, Yutaka; Ooi, Yasushi, Video coding by adaptively controlling the interval between successive predictive-coded frames according to magnitude of motion.
Okada Tomoyuki,JPX ; Tsuga Kazuhiro,JPX ; Hamasaka Hiroshi,JPX ; Saeki Shinichi,JPX, Video data editing apparatus, optical disc for use as a recording medium of a video data editing apparatus, and computer readable recording medium storing an editing program.
Begeja, Lee; Liu, Zhu; Mu, Yadong; Renger, Bernard S.; Gibbon, David Crawford; Shahraray, Behzad; Gopalan, Raghuraman; Zavesky, Eric, Method and system for aggregating video content.