[US Patent]
MIXED-PRECISION NPU TILE WITH DEPTH-WISE CONVOLUTION
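IPC classification information
Country/Type: United States (US) Patent, published application
International Patent Classification (IPC, 7th edition): G06N-003/063, G06F-013/16, G06F-009/30
Application number: 16840172 (filed 2020-04-03)
Publication number: 20200349420 (published 2020-11-05)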
Inventors:
Ovsiannikov, Ilia
Shafiee Ardestani, Ali
Abdelaziz, Hamzah Ahmed Ali
Hassoun, Joseph H.
Applicant:
Ovsiannikov, Ilia
Citation information
Times cited: 0
Cited patents: 0
Abstract
A processor to perform inference on deep learning neural network models. In some embodiments, the processor includes: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile including: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor of activations representing an image, the tensor including a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.
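The convolution described above operates on one two-dimensional array (one color component) at a time, which appears to correspond to the depth-wise convolution named in the title. The Python sketch below illustrates only that arithmetic in software, not the tile's dataflow. The function names, the "valid" convention (no padding, stride 1), the use of cross-correlation as in common deep-learning frameworks, and the sample data are all assumptions for illustration.

```python
from typing import List

Plane = List[List[int]]  # one 2-D array (H x W), e.g. one color component


def conv2d_single_plane(plane: Plane, kernel: Plane) -> Plane:
    """'Valid' 2-D convolution (cross-correlation, as in DL frameworks)
    of one kernel with one two-dimensional array."""
    h, w = len(plane), len(plane[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += plane[y + i][x + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def depthwise_conv2d(tensor: List[Plane], kernels: List[Plane]) -> List[Plane]:
    """Depth-wise convolution: each channel (color component) is convolved
    with its own kernel, and channels are never summed together."""
    if len(tensor) != len(kernels):
        raise ValueError("need exactly one kernel per channel")
    return [conv2d_single_plane(p, k) for p, k in zip(tensor, kernels)]


if __name__ == "__main__":
    # Hypothetical 3x3 RGB image stored as three 2-D arrays (R, G, B).
    r = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    g = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    b = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
    box = [[1, 1], [1, 1]]  # 2x2 box filter, reused for each channel here
    print(depthwise_conv2d([r, g, b], [box, box, box]))
    # [[[12, 16], [24, 28]], [[2, 2], [2, 2]], [[8, 8], [8, 8]]]
```

Each output plane depends only on its own input plane, so no cross-channel summation is needed; that is the property that distinguishes depth-wise convolution from ordinary multi-channel convolution.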
Representative claims
1. A processor, comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.
2. The processor of claim 1, wherein the shuffler is connected to an output of the activations cache.
3. The processor of claim 2, wherein the first tile comprises a plurality of multipliers including the first multiplier and the second multiplier, arranged in a plurality of columns and a plurality of lanes, the lanes being arranged in groups of four, each group of lanes including an adder tree for summing outputs of the multipliers.
4. The processor of claim 3, wherein the first tile further comprises an accumulator for each group of lanes, for accumulating outputs of the adder tree.
5. The processor of claim 3, wherein the first tile further comprises, for a set of four groups of lanes: a plurality of bit shifters, for shifting products involving at least one most significant nibble to be offset from products involving two least significant nibbles, and a plurality of accumulators, for accumulating the outputs of the bit shifters.
6. The processor of claim 2, wherein the shuffler has a granularity of four lanes.
7. The processor of claim 2, wherein the shuffler has a granularity of one lane.
8. The processor of claim 1, wherein the shuffler is connected to an input of the activations cache.
9. The processor of claim 8, wherein the first tile comprises a plurality of multipliers including the first multiplier and the second multiplier, arranged in a plurality of columns and a plurality of lanes, the lanes being arranged in groups of four, each group of lanes including an adder tree for summing outputs of the multipliers.
10. The processor of claim 9, wherein the first tile further comprises an accumulator for each group of lanes, for accumulating outputs of the adder tree.
11. The processor of claim 9, wherein the first tile further comprises, for a set of four groups of lanes: a plurality of bit shifters, for shifting products involving at least one most significant nibble to be offset from products involving two least significant nibbles, and a plurality of accumulators, for accumulating the outputs of the bit shifters.
12. The processor of claim 9, wherein the shuffler has a granularity of four lanes.
13. The processor of claim 9, wherein the shuffler has a granularity of one lane.
14. A method for calculating with a processing circuit, the processing circuit comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.
15. The method of claim 14, wherein the shuffler is connected to an output of the activations cache.
16. The method of claim 15, wherein the first tile comprises a plurality of multipliers including the first multiplier and the second multiplier, arranged in a plurality of columns and a plurality of lanes, the lanes being arranged in groups of four, each group of lanes including an adder tree for summing outputs of the multipliers.
17. The method of claim 16, wherein the first tile further comprises an accumulator for each group of lanes, for accumulating outputs of the adder tree.
18. The method of claim 16, wherein the first tile further comprises, for a set of four groups of lanes: a plurality of bit shifters, for shifting products involving at least one most significant nibble to be offset from products involving two least significant nibbles, and a plurality of accumulators, for accumulating the outputs of the bit shifters.
19. The method of claim 15, wherein the shuffler has a granularity of four lanes.
20. A method for calculating with a means for processing, the means for processing comprising: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile comprising: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the method comprising: receiving a tensor of activations representing an image comprising a plurality of pixels each having a plurality of color components, the tensor comprising a plurality of two-dimensional arrays, each representing one color component of the image; and performing a convolution of a kernel with one of the two-dimensional arrays.
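Claims 3 to 5 describe lanes arranged in groups of four, each with an adder tree, plus bit shifters that offset products involving a most significant nibble from products of two least significant nibbles. The Python sketch below shows one possible reading of that arithmetic: an 8-bit by 8-bit dot product over four lanes rebuilt entirely from 4-bit by 4-bit products, with each of four lane groups handling one nibble pairing. The unsigned 8-bit operands, the one-pairing-per-group assignment, the single shared accumulator, and every name (`nibbles`, `adder_tree`, `PAIRING_SHIFTS`, `mixed_precision_dot`) are hypothetical; the claims specify neither signedness nor how pairings map to lane groups.

```python
from typing import Dict, List, Tuple

LANES_PER_GROUP = 4  # claims 3/9/16: lanes are arranged in groups of four


def nibbles(x: int) -> Tuple[int, int]:
    """Split an unsigned 8-bit value into (most, least) significant nibbles."""
    if not 0 <= x <= 0xFF:
        raise ValueError("expected an unsigned 8-bit value")
    return x >> 4, x & 0xF


def adder_tree(products: List[int]) -> int:
    """Pairwise (tree) reduction of one group's four multiplier outputs."""
    assert len(products) == LANES_PER_GROUP
    level = list(products)
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical mapping: (activation nibble, weight nibble) -> left shift that
# the bit shifter applies to that lane group's adder-tree output.
PAIRING_SHIFTS: Dict[Tuple[str, str], int] = {
    ("lo", "lo"): 0,  # two least significant nibbles: no offset
    ("lo", "hi"): 4,  # one most significant nibble: offset by 4 bits
    ("hi", "lo"): 4,
    ("hi", "hi"): 8,  # two most significant nibbles: offset by 8 bits
}


def mixed_precision_dot(acts: List[int], weights: List[int]) -> int:
    """Four-lane 8x8-bit dot product built only from 4x4-bit products."""
    if not (len(acts) == len(weights) == LANES_PER_GROUP):
        raise ValueError("expected one activation and one weight per lane")
    accumulator = 0
    for (a_sel, w_sel), shift in PAIRING_SHIFTS.items():
        products = []
        for a, w in zip(acts, weights):
            a_hi, a_lo = nibbles(a)
            w_hi, w_lo = nibbles(w)
            a_n = a_hi if a_sel == "hi" else a_lo
            w_n = w_hi if w_sel == "hi" else w_lo
            products.append(a_n * w_n)  # output of a 4-bit x 4-bit multiplier
        # Bit shifter offsets this group's partial sum, then it is accumulated.
        accumulator += adder_tree(products) << shift
    return accumulator


if __name__ == "__main__":
    acts = [200, 15, 128, 7]
    weights = [3, 255, 16, 100]
    print(mixed_precision_dot(acts, weights))  # 7173, same as sum(a * w)
```

The shift amounts follow from a × w = 256·a_hi·w_hi + 16·(a_hi·w_lo + a_lo·w_hi) + a_lo·w_lo. If an operand fits in 4 bits, its high nibble is zero and the groups that use that nibble contribute nothing, which hints at how a nibble-based datapath could serve mixed precision; the claims themselves do not state this.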