[Patent] Architecture and execution for efficient mixed precision computations in single instruction multiple data/thread (SIMD/T) devices

IPC Classification Information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th ed.)	G06F-015/76 G06F-009/38 G06F-009/30 G06F-009/46 G06F-008/41
Application No.	US-0672694 (2015-03-30)
Registration No.	US-10061592 (2018-08-28)
Inventor / Address	Lukyanov, Maxim; Grosul, Alexander; Alsup, Mitchell; Beylin, Boris
Applicant / Address	Samsung Electronics Co., Ltd.
Agent / Address	Innovation Counsel LLP
Citation Information	Times cited: 0; Patents cited: 15

Abstract

A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.
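The packing step described in the abstract can be illustrated with a small sketch. It is not the patented implementation: the 32-bit register width, the rule of dividing that width by the widest requested precision, the use of unsigned integer lanes in place of reduced-precision values, and all function names (`braiding_factor`, `pack`, `unpack`, `packed_add`) are assumptions made for illustration only.

```python
BASIS_BITS = 32  # assumed register width (basis precision); not taken from the patent


def braiding_factor(required_bits):
    """Hypothetical rule: number of units of work that fit in one register
    when every unit is stored at the widest precision requested."""
    widest = max(required_bits)
    if widest <= 0 or widest > BASIS_BITS:
        raise ValueError("precision requirement must be in 1..BASIS_BITS")
    return BASIS_BITS // widest


def pack(values, factor):
    """Pack `factor` lane values into one register-sized integer, lane 0 lowest."""
    width = BASIS_BITS // factor
    lane_mask = (1 << width) - 1
    reg = 0
    for lane, value in enumerate(values[:factor]):
        reg |= (value & lane_mask) << (lane * width)
    return reg


def unpack(reg, factor):
    """Split a packed register back into its lane values."""
    width = BASIS_BITS // factor
    lane_mask = (1 << width) - 1
    return [(reg >> (lane * width)) & lane_mask for lane in range(factor)]


def packed_add(a, b, factor):
    """Lane-wise add of two packed registers; each lane wraps independently,
    so a carry never spills into the neighboring unit of work."""
    sums = [x + y for x, y in zip(unpack(a, factor), unpack(b, factor))]
    return pack(sums, factor)
```

Under these assumptions, two units of work that each need 16 bits give a braiding factor of 2 and share one register, while a mix that includes a 32-bit requirement gives a factor of 1 and is not packed.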

Representative Claim

1. A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment, the method comprising: determining a braiding factor as a number of units of work encoded into a physical thread prior to processing, a unit of work being a set of input data or instructions for processing; determining a value of the braiding factor based on a mix of precision requirements presented for individual units of work; classifying units of work as instructions for applied code transformation based on associated precision requirements for the processing environment, which comprises a single instruction multiple thread (SIMT) or a single instruction multiple data (SIMD) processing architecture; after classifying the units of work as instructions for applied code transformation, replicating instructions for the units of work to generate replicated instructions that are identical instructions for processing by neighboring threads, wherein the replicating is done with precision requirements greater than precision requirements corresponding to the determined value of the braiding factor as multiple threads for SIMT or SIMD processing; packing instruction inputs from specified registers together into a destination register according to the determined value of the braiding factor; and executing the packed instructions presented in vector form with an instruction set architecture configured for executing packed instructions of different precisions.

2. The method of claim 1, wherein multiple units of work are packed for parallel processing of multiple data elements into the physical thread for execution based on associated precision requirements for the processing environment.

3. The method of claim 2, wherein the number of units of work packed into the physical thread is determined based on compiler analysis or explicit qualification in a source language.

4. The method of claim 3, wherein the mix of precision requirements are presented for the individual units of work in the source language to reduce fragmentation or scattering of contents in the register file.

5. The method of claim 2, further comprising: selectively replicating and packing instructions into the physical thread for units of work with precision requirements less than a basis precision of an instruction set architecture of the processing environment.

6. The method of claim 5, further comprising: narrowing or widening operands of instructions to be applied as necessary to ensure consistent precision types of instruction inputs according to precision requirements of an instruction output; and packing instruction inputs from specified registers together into a destination register according to the determined value of the braiding factor.

7. The method of claim 2, further comprising: handling flow control divergence per unit of work within packed instructions with predication masks.

8. The method of claim 7, wherein handling flow control further comprises: designating a predication mask for each unit of work within packed instructions to manage independent determination of branch outcomes for each unit of work; determining whether each unit of work within packed instructions is currently active or should be reactivated with the predication mask; and providing tracking information for each unit of work within packed instructions for inspection and modification by an executing program.

9. The method of claim 1, wherein the processing environment is included in a graphics processing unit (GPU) of a mobile electronic device.

10. A non-transitory computer-readable storage medium embodied thereon instructions being executable by at least one processor to perform a method for improving power, performance, area (PPA) for mixed precision computations in a processing environment, the method comprising: determining a braiding factor as a number of units of work encoded into a physical thread prior to processing, a unit of work being a set of input data or instructions for processing; determining a value of the braiding factor based on a mix of precision requirements presented for individual units of work; classifying units of work as instructions for applied code transformation based on associated precision requirements for the processing environment, which comprises a single instruction multiple thread (SIMT) or single instruction multiple data (SIMD) processing architecture; after classifying the units of work as instructions for applied code transformation, replicating instructions for the units of work to generate replicated instructions that are identical instructions for processing by neighboring threads, wherein the replicating is done with precision requirements greater than precision requirements corresponding to the determined value of the braiding factor as multiple threads for SIMT or SIMD processing; packing instruction inputs from specified registers together into a destination register according to the determined value of the braiding factor; and executing the packed instructions presented in vector form with an instruction set architecture configured for executing packed instructions of different precisions.

11. The non-transitory computer-readable storage medium of claim 10, wherein multiple units of work are packed for parallel processing of multiple data elements into the physical thread for execution based on associated precision requirements for the processing environment.

12. The non-transitory computer-readable storage medium of claim 11, wherein: the number of units of work packed into the physical thread are determined based on compiler analysis or explicit qualification in a source language; and the mix of precision requirements is presented for the individual units of work in the source language to reduce fragmentation or scattering of contents in the register file.

13. The non-transitory computer-readable storage medium of claim 11, further comprising: selectively replicating and packing instructions into the physical thread for units of work with precision requirements less than a basis precision of an instruction set architecture of the processing environment.

14. The non-transitory computer-readable storage medium of claim 13, further comprising: narrowing or widening operands of instructions to be applied as necessary to ensure consistent precision types of instruction inputs according to precision requirements of an instruction output; and packing instruction inputs from specified registers together into a destination register according to the determined value of the braiding factor.

15. The non-transitory computer-readable storage medium of claim 11, further comprising: handling flow control divergence per unit of work within packed instructions with predication masks.

16. The non-transitory computer-readable storage medium of claim 15, wherein handling flow control further comprises: designating a predication mask for each unit of work within packed instructions to manage independent determination of branch outcomes for each unit of work; determining whether each unit of work within packed instructions is currently active or should be reactivated with the predication mask; and providing tracking information for each unit of work within packed instructions for inspection and modification by an executing program.

17. The non-transitory computer-readable storage medium of claim 10, wherein the processing environment is included in a graphics processing unit (GPU) of a mobile electronic device.

18. A graphics processor for an electronic device comprising: one or more processing elements coupled to a memory device, wherein the one or more processing elements are configured to: determine a braiding factor as a number of units of work encoded into a physical thread; determine a value of the braiding factor based on a mix of precision requirements presented for individual units of work prior to processing, a unit of work being a set of input data or instructions for processing; classify units of work as instructions for applied code transformation based on associated precision requirements for the processing environment, which comprises a single instruction multiple thread (SIMT) or single instruction multiple data (SIMD) processing architecture; pack instruction inputs from specified registers together into a destination register according to the determined value of the braiding factor; and execute the packed instructions presented in vector form with an instruction set architecture configured for executing packed instructions of different precisions, wherein the one or more processing elements are further configured to: replicate instructions for the units of work with precision requirements greater than precision requirements corresponding to the determined value of the braiding factor as multiple threads for SIMT or SIMD processing, the replicated instructions being identical instructions for processing by neighboring threads.

19. The graphics processor of claim 18, wherein multiple units of work are packed for parallel processing of multiple data elements into the physical thread for execution based on associated precision requirements for the processing environment.

20. The graphics processor of claim 19, wherein the number of units of work packed into the physical thread is determined based on compiler analysis or explicit qualification in a source language, and the mix of precision requirements are presented for the individual units of work in the source language to reduce fragmentation or scattering of contents in the register file.

21. The graphics processor of claim 20, wherein the one or more processing elements are further configured to: selectively replicate and pack instructions into the physical thread for units of work with precision requirements less than a basis precision of an instruction set architecture of the processing environment.

22. The graphics processor of claim 19, wherein the one or more processing elements are further configured to: narrow or widen operands of instructions to be applied as necessary to ensure consistent precision types of instruction inputs according to precision requirements of an instruction output; pack instruction inputs from specified registers together into a destination register according to the determined value of the braiding factor; and handle flow control divergence per unit of work within packed instructions with predication masks.

23. The graphics processor of claim 22, wherein the one or more processing elements are further configured to: designate a predication mask for each unit of work within packed instructions to manage independent determination of branch outcomes for each unit of work; determine whether each unit of work within packed instructions is currently active or should be reactivated with the predication mask; and provide tracking information for each unit of work within packed instructions for inspection and modification by an executing program.

24. The graphics processor of claim 18, wherein the electronic device comprises a mobile electronic device.
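The per-unit predication described in claims 7, 8, 15, 16, 22 and 23 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the claimed hardware mechanism: each unit of work in a packed instruction is modeled as one list element (a lane), the predication mask is an integer with one bit per lane, and the names `lane_mask_from`, `predicated_write` and `braided_if_else` are hypothetical.

```python
def lane_mask_from(conditions):
    """Build a mask with bit `lane` set when that lane's condition is true."""
    mask = 0
    for lane, cond in enumerate(conditions):
        if cond:
            mask |= 1 << lane
    return mask


def predicated_write(old, new, mask):
    """Commit `new` only in lanes whose mask bit is set; keep `old` elsewhere."""
    return [n if (mask >> lane) & 1 else o
            for lane, (o, n) in enumerate(zip(old, new))]


def braided_if_else(values, cond, then_fn, else_fn, active):
    """Run both sides of a branch over all lanes, committing each side only
    in the lanes that are active and took that side.
    Returns the results plus both masks so a caller can inspect them."""
    all_lanes = (1 << len(values)) - 1
    taken = lane_mask_from(cond(v) for v in values)
    then_mask = active & taken
    else_mask = active & ~taken & all_lanes
    out = predicated_write(values, [then_fn(v) for v in values], then_mask)
    out = predicated_write(out, [else_fn(v) for v in values], else_mask)
    return out, then_mask, else_mask
```

For example, with lanes `[3, 8]`, the condition `v > 5`, and both lanes active, lane 1 takes the "then" path and lane 0 takes the "else" path. A lane whose `active` bit is clear keeps its original value on both paths, which is one way to model a unit of work that is currently inactive and may be reactivated later.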

Patents Cited by This Patent (15)

Crow, Franklin C.; Montrym, John S.; Craighead, Matthew J., Apparatus, system, and method for gamma correction of smoothed primitives.
May, Michael David, Compact instruction set encoding.
Fahs, Brian; Nickolls, John R.; Moreton, Henry Packard; Coon, Brett W., Efficient implementation of arrays of structures on SIMT and SIMD architectures.
Jones, Stephen; Gerfin, Geoffrey, Emitting coherent output from multiple threads for printf.
Gschwind, Michael K., Generating and executing programs for a floating point single instruction multiple data instruction set architecture.
Reid, Alastair David; Grimley-Evans, Edmund; Ford, Simon Andrew, Mapping a computer program to an asymmetric multiprocessing apparatus.
Edwards, Stephen A., Method and apparatus for converting a concurrent control flow graph into a sequential control flow graph.
Lu, Yan-Hong; Chang, Jia-Yang; Kuo, Pao-Hung; Chang, Chia-Chi; Tsung, Pei-Kuei, Methods and systems for managing an instruction sequence with a divergent control flow in a SIMT architecture.
Hoyle, David, Microprocessor with instruction for saturating and packing data.
Oberman, Stuart F.; Siu, Ming Y., Multipurpose arithmetic functional unit.
Abdallah, Mohammad A., Parallel processing of a sequential program using hardware generated threads and their instruction groups executing on plural execution units and accessing register file segments using dependency inheritance vectors across multiple engines.
Wilkinson, Paul Amba; Dieffenderfer, James Warren; Kogge, Peter Michael; Schoonover, Nicholas Jerome, SIMD/MIMD array processor with vector processing.
Underwood, Matthew John; Ridley, Nicholas Damon; Lapstun, Paul; Henderson, Peter Charles Boyd; Yourlo, Zhenya Alexander; Moini, Alireza; Rusman, Jan; Silverbrook, Kia, Sensing device for subsampling imaged coded data.
Schwinn, Stephen Joseph; Tubbs, Matthew Ray; Wait, Charles David, Structural power reduction in multithreaded processor.
Nyland, Lars; Nickolls, John R.; Hirota, Gentaro; Mandal, Tanmoy, Systems and methods for coalescing memory accesses of parallel threads.
