Spatial pyramid pooling networks for image processing
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06K-009/62
G06K-009/46
G06K-009/66
G06N-003/04
Application number
US-0617936
(2015-02-10)
Registration number
US-9542621
(2017-01-10)
Priority information
CN-PCT/CN2014/088166 (2014-10-09)
Inventor / Address
He, Kaiming
Sun, Jian
Zhang, Xiangyu
Ren, Shaoqing
Applicant / Address
MICROSOFT TECHNOLOGY LICENSING, LLC
Agent / Address
Swain, Cassandra T
Citation information
Times cited: 0
Cited patents: 10
Abstract
Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP layer based system may pool features extracted at variable scales due to the flexibility of input scales making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.
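The fixed-length property described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the patented implementation: the function name `spp_max_pool`, the pyramid levels `(4, 2, 1)`, the proportional bin boundaries, and the choice of max pooling are all picked here for the example.

```python
import numpy as np

def spp_max_pool(feature_maps, levels=(4, 2, 1)):
    """Pool a (k, h, w) stack of feature maps into one fixed-length vector.

    Hypothetical helper: each level n splits every map into n x n spatial
    bins (finer levels first) and keeps the maximum response per bin.
    The output length is k * sum(n * n for n in levels), whatever h and w are.
    """
    k, h, w = feature_maps.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin i covers rows [floor(i*h/n), ceil((i+1)*h/n)).
            r0 = (i * h) // n
            r1 = -((-(i + 1) * h) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = -((-(j + 1) * w) // n)
                # One value per filter for this bin.
                pooled.append(feature_maps[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small = rng.random((8, 10, 17))   # 8 filters, 10 x 17 map
large = rng.random((8, 13, 13))   # 8 filters, 13 x 13 map
print(spp_max_pool(small).shape, spp_max_pool(large).shape)  # (168,) (168,)
```

Both inputs yield a vector of length 8 × (16 + 4 + 1) = 168, which is what would let a fully-connected layer of fixed input size follow the pooling stage.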
Representative claims
1. A method to perform image processing, the method comprising: receiving an input image; generating feature maps by one or more filters on one or more convolutional layers of a neural network; spatially pooling responses of each filter of a top convolutional layer at a spatial pyramid pooling (SPP) network following the top convolutional layer, wherein the SPP network comprises one or more layers; and providing outputs of a top SPP network layer to a fully-connected layer as fixed dimensional vectors.
2. The method of claim 1, further comprising: employing an output of the fully-connected layer for one or more of: training a classifier, scene reconstruction, event detection, video tracking, object recognition, image indexing, and motion estimation.
3. The method of claim 1, wherein spatially pooling responses of each filter of the top convolutional layer at the SPP network comprises: pooling responses of each filter in a plurality of spatial bins of the SPP network.
4. The method of claim 3, wherein providing outputs of the top SPP network layer to the fully-connected layer comprises: providing the outputs of the top SPP network layer as kM-dimensional vectors, where M denotes a number of the spatial bins in the SPP network and k denotes a number of filters at the top convolutional layer.
5. The method of claim 1, further comprising: resizing the input image to fit a window size of the SPP network.
6. The method of claim 1, further comprising: training the neural network using back-propagation.
7. The method of claim 1, further comprising: pre-computing a number of spatial bins of the SPP network based on a size of the input image.
8. The method of claim 7, further comprising: for an image size of a×a and an SPP network layer that includes n×n bins, implementing the SPP network layer as a sliding window pooling layer, where a window size is defined by win=⌈a/n⌉ and a stride is defined by str=⌊a/n⌋ with ⌈·⌉ and ⌊·⌋ denoting ceiling and floor operations.
9. The method of claim 1, further comprising: concatenating outputs of the SPP network layers at the fully-connected layer.
10. The method of claim 1, wherein spatially pooling responses of each filter of the top convolutional layer at the SPP network comprises: employing maximum pooling on responses of the filters of the top convolutional layer.
11. A computing device to perform image processing, the computing device comprising: an input module configured to receive an input image through one or more of a wired or wireless communication; a memory configured to store instructions; and a processor coupled to the memory and the input module, the processor executing an image processing application, wherein the image processing application is configured to: receive an input image; generate feature maps by one or more filters on one or more convolutional layers of a neural network; spatially pool responses of each filter of a top convolutional layer in a plurality of spatial bins at a spatial pyramid pooling (SPP) network following the top convolutional layer, wherein the SPP network comprises one or more layers; and provide outputs of a top SPP network layer to a fully-connected layer as fixed dimensional vectors.
12. The computing device of claim 11, wherein the feature maps are generated once from the entire input image at one or more scales.
13. The computing device of claim 11, wherein the image processing application is further configured to: employ two or more fixed-size neural networks with respective SPP networks to process images of two or more sizes.
14. The computing device of claim 13, wherein the outputs of top SPP network layers of the two or more fixed-size neural networks are configured to have a same fixed length.
15. The computing device of claim 13, wherein the image processing application is further configured to: train a first full epoch on a first one of the two or more fixed-size neural networks; and train a second full epoch on a second one of the two or more fixed-size neural networks.
16. The computing device of claim 15, wherein the image processing application is further configured to: copy weights of the first one of the two or more fixed-size neural networks to the second one of the two or more fixed-size neural networks prior to training the second epoch on the second one of the two or more fixed-size neural networks.
17. The computing device of claim 15, wherein the image processing application is further configured to: perform the training on different neural network in an iterative manner.
18. A computer-readable memory device with instructions stored thereon to perform image processing, the instructions comprising: receiving an input image; generating feature maps by one or more filters on one or more convolutional layers of a neural network; spatially pooling responses of each filter of a top convolutional layer in a plurality of spatial bins of a spatial pyramid pooling (SPP) network following the top convolutional layer, wherein the SPP network comprises one or more layers; providing outputs of a top SPP network layer to a fully-connected layer as fixed dimensional vectors; and training a classifier to tag the input image based on the fixed dimensional vectors received at the fully-connected layer.
19. The computer-readable memory device of claim 18, wherein the instructions further comprise: resizing the input image such that min(w; h) = s, where w is a width of the image, h is a height of the image, and s represents a predefined scale for the image.
20. The computer-readable memory device of claim 18, wherein the instructions further comprise: training different full epochs on different fixed-size neural networks by copying weights of a first fixed-size neural network to subsequent fixed-size neural networks in an iterative manner.
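The window and stride rule in claim 8 and the kM-dimensional output in claim 4 can be checked with a short worked example. The sketch below is only illustrative: the helper names, the 13×13 map size, the bin counts 3, 2 and 1, and the filter count k = 256 are assumptions chosen for the example and are not stated in the claims.

```python
import numpy as np

def spp_window_and_stride(a, n):
    """Return (win, str) for an a x a map and n x n bins, per claim 8."""
    win = -(-a // n)   # ceiling of a / n using integer arithmetic
    stride = a // n    # floor of a / n
    return win, stride

def sliding_max_pool(fmap, win, stride):
    """Plain sliding-window max pooling over a square 2-D map (no padding)."""
    a = fmap.shape[0]
    out = (a - win) // stride + 1
    result = np.empty((out, out), dtype=fmap.dtype)
    for i in range(out):
        for j in range(out):
            rows = slice(i * stride, i * stride + win)
            cols = slice(j * stride, j * stride + win)
            result[i, j] = fmap[rows, cols].max()
    return result

a = 13                                   # assumed feature-map side length
fmap = np.arange(a * a).reshape(a, a)    # toy single-filter response map
total_bins = 0
for n in (3, 2, 1):                      # assumed pyramid levels
    win, stride = spp_window_and_stride(a, n)
    pooled = sliding_max_pool(fmap, win, stride)
    total_bins += pooled.size
    print(n, win, stride, pooled.shape)

k = 256                                  # assumed number of top-layer filters
print("M =", total_bins, "kM =", k * total_bins)
```

For a = 13 the rule gives (win, str) = (5, 4), (7, 6) and (13, 13), and each level yields exactly n×n outputs, so M = 9 + 4 + 1 = 14. That is not guaranteed for every pair: a = 7 with n = 5 gives win = 2 and str = 1, which produces a 6×6 output, so the number of bins is worth verifying for each image size, in the spirit of the pre-computation in claim 7.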
Patents cited by this patent (10)
Hoffberg-Borghesani, Linda; Hoffberg, Steven M., Adaptive pattern recognition based controller apparatus and method and human-factored interface therefore.
Jung, Edward K. Y.; Leuthardt, Eric C.; Levien, Royce A.; Lord, Robert W.; Malamud, Mark A.; Rinaldo, Jr., John D.; Wood, Jr., Lowell L., Methods and systems for comparing media content.
Levin, David N., Self-referential method and apparatus for creating stimulus representations that are invariant under systematic transformations of sensor states.