[Patent] Vector processing in an active memory device

IPC Classification Information
Country/Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.)	G06F-015/00 G06F-001/04 G06F-009/30 G06F-009/38 G06F-015/80
Application No.	US-0569359 (2012-08-08)
Patent No.	US-9535694 (2017-01-03)
Inventors	Fleischer, Bruce M.; Fox, Thomas W.; Jacobson, Hans M.; Nair, Ravi; Prener, Daniel A.
Applicant	INTERNATIONAL BUSINESS MACHINES CORPORATION
Attorney	Cantor Colburn LLP
Citations	Cited by: 0; Patents cited: 42

Abstract

Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
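
The abstract's core loop (decode one instruction that bundles several sub-instructions, then repeat that bundle for an iteration count while touching several memory locations per iteration) can be illustrated with a minimal functional model. This is a hypothetical sketch, not IBM's implementation: the `VectorInstruction` fields (`base`, `stride`, `addend`), the four-lane width, and the choice of one load plus one add as the bundled sub-instructions are all assumptions, and pipelining between the load and the add is ignored.

```python
from dataclasses import dataclass


@dataclass
class VectorInstruction:
    """Hypothetical encoding of one instruction carrying parallel sub-instructions."""
    base: int             # start address for the memory access sub-instruction (assumed field)
    stride: int           # address increment applied on each iteration (assumed field)
    addend: int           # constant used by the arithmetic sub-instruction (assumed field)
    iteration_count: int  # how many times the bundled sub-instructions repeat


def execute(instr: VectorInstruction, memory: list[int], lanes: int = 4) -> list[int]:
    """Repeat the load + add bundle `iteration_count` times.

    Each iteration reads `lanes` consecutive memory locations at once, standing in
    for the parallel memory accesses described in the abstract.
    """
    results: list[int] = []
    for i in range(instr.iteration_count):
        addr = instr.base + i * instr.stride
        loaded = memory[addr:addr + lanes]                      # memory access sub-instruction
        results.extend(v + instr.addend for v in loaded)        # arithmetic sub-instruction
    return results


if __name__ == "__main__":
    mem = list(range(16))
    print(execute(VectorInstruction(base=0, stride=8, addend=100, iteration_count=2), mem))
```

The point of the model is that decode happens once while the bundled work repeats, which is what lets a single instruction drive many memory accesses.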

Representative Claims

1. A system for vector processing in an active memory device, the system comprising: a memory in the active memory device; and a processing element in the active memory device, the processing element configured to perform a method comprising: decoding, in the processing element, an instruction comprising a plurality of sub-instructions to execute in parallel; determining an iteration count to repeat execution of the sub-instructions in parallel based on decoding an iteration count source field of the instruction that defines whether to set the iteration count based on an iteration count field of the instruction or based on an iteration count register; repeating execution of the sub-instructions in parallel for multiple iterations, by the processing element, based on the iteration count; accessing multiple locations in the memory in parallel based on the execution of the sub-instructions; identifying a lane control sub-instruction in the instruction based on the decoding of the instruction, the lane control sub-instruction controlling a sequence of instruction execution and positioned in parallel with the sub-instructions to execute in parallel; and executing the lane control sub-instruction, by the processing element, only once after execution of the sub-instructions is performed in parallel for multiple iterations.

2. The system of claim 1, wherein the sub-instructions comprise at least a pair of a memory access sub-instruction in parallel with an arithmetic-logical sub-instruction, and the processing element is further configured to perform: flowing the memory access sub-instruction to a load-store unit in the processing element; and flowing the arithmetic-logical sub-instruction to an arithmetic logic unit in the processing element to execute the memory access sub-instruction in parallel with the arithmetic-logical sub-instruction.

3. The system of claim 2, wherein the processing element is further configured to perform: accessing one or more of: a vector computation register file and a scalar computation register file in the processing element for operands to execute the memory access sub-instruction in the load-store unit; and accessing one or more of: the vector computation register file and the scalar computation register file in the processing element for operands to execute the arithmetic-logical sub-instruction in the arithmetic logic unit.

4. The system of claim 3, wherein the processing element is further configured to perform: partitioning at least one of the operands as a plurality of sub-elements based on a data type of the arithmetic-logical sub-instruction; performing, by the arithmetic logic unit, an operation of the arithmetic-logical sub-instruction in parallel execution slots on each of the sub-elements; and computing, by the load-store unit, an address per sub-element.

5. The system of claim 3, wherein the processing element is further configured to perform: flowing an output of the load-store unit to one or more of: the load-store unit, an effective-to-real address translation unit, a load-store queue, the vector computation register file, and the scalar computation register file; and flowing an output of the arithmetic logic unit to one or more of: the arithmetic logic unit, the load-store unit, the vector computation register file, and the scalar computation register file.

6. The system of claim 3, wherein the processing element is partitioned into multiple processing slices operable in parallel, each processing slice comprising a pair of the load-store unit and the arithmetic logic unit, and an associated pair of the vector computation register file and the scalar computation register file, and the processing element is further configured to perform: flowing an output of the arithmetic logic unit of one processing slice to an input of one or more of: the load-store unit and the arithmetic logic unit.

7. The system of claim 3, wherein the processing element is further configured to perform: performing an error check on the operands prior to executing the memory access sub-instruction and the arithmetic-logical sub-instruction.

8. The system of claim 1, wherein the lane control sub-instruction is a branch sub-instruction executed by the processing element during execution of a last iteration of the instruction based on conditions evaluated during execution of a first element of the instruction.

9. A system for vector processing in an active memory device, the system comprising: a memory in the active memory device; and a processing element in the active memory device, the processing element configured to perform a method comprising: receiving, in the processing element, a command from a requestor; fetching, in the processing element, an instruction based on the command, the instruction being fetched from an instruction buffer in the processing element; decoding, in the processing element, the instruction comprising a plurality of sub-instructions to execute in parallel; determining an iteration count to repeat execution of the sub-instructions in parallel based on decoding an iteration count source field of the instruction that defines whether to set the iteration count based on an iteration count field of the instruction or based on an iteration count register; repeating execution of the sub-instructions in parallel for multiple iterations, by the processing element, based on the iteration count; accessing multiple locations in the memory in parallel based on the execution of the sub-instructions; identifying a lane control sub-instruction in the instruction based on the decoding of the instruction, the lane control sub-instruction controlling a sequence of instruction execution and positioned in parallel with the sub-instructions to execute in parallel; and executing the lane control sub-instruction, by the processing element, only once after execution of the sub-instructions is performed in parallel for multiple iterations.

10. The system of claim 9, wherein the processing element is further configured to perform: fetching a special instruction from the instruction buffer to load a new instruction from the memory; and replacing an entry in the instruction buffer with the new instruction based on executing the special instruction.

11. The system of claim 9, wherein the active memory device is a three-dimensional memory cube, the memory is divided into three-dimensional blocked regions as memory vaults, and accessing multiple locations in the memory is performed through one or more memory controllers in the active memory device.

12. The system of claim 9, wherein the sub-instructions comprise at least a pair of a memory access sub-instruction in parallel with an arithmetic-logical sub-instruction, and the processing element is further configured to perform: flowing the memory access sub-instruction to a load-store unit in the processing element; and flowing the arithmetic-logical sub-instruction to an arithmetic logic unit in the processing element to execute the memory access sub-instruction in parallel with the arithmetic-logical sub-instruction.

13. The system of claim 12, wherein the processing element is further configured to perform: accessing one or more of: a vector computation register file and a scalar computation register file in the processing element for operands to execute the memory access sub-instruction in the load-store unit; and accessing one or more of: the vector computation register file and the scalar computation register file in the processing element for operands to execute the arithmetic-logical sub-instruction in the arithmetic logic unit.

14. The system of claim 13, wherein the processing element is further configured to perform: partitioning at least one of the operands as a plurality of sub-elements based on a data type of the arithmetic-logical sub-instruction; performing, by the arithmetic logic unit, an operation of the arithmetic-logical sub-instruction in parallel execution slots on each of the sub-elements; and computing, by the load-store unit, an address per sub-element.

15. The system of claim 13, wherein the processing element is further configured to perform: performing an error check on the operands prior to executing the memory access sub-instruction and the arithmetic-logical sub-instruction; based on detecting a correctable error, freezing instruction processing, fixing the correctable error, and resuming instruction processing; and based on detecting an uncorrectable error, freezing instruction processing and notifying a main processor.

16. The system of claim 13, wherein the processing element is further configured to perform: generating an address for the memory access sub-instruction; translating the generated address to a real address of the memory; checking for an address translation fault based on translating the generated address; and based on identifying the address translation fault, freezing instruction processing, notifying the main processor, waiting for a response from the main processor, fixing a problem causing the address translation fault, and resuming instruction processing.

17. The system of claim 13, wherein the processing element is further configured to perform: detecting an exception based on executing the arithmetic-logical sub-instruction; and based on detecting the exception, freezing instruction processing and notifying the main processor.

18. The system of claim 13, wherein the processing element is further configured to perform: decrementing the iteration count based on executing the memory access sub-instruction and the arithmetic-logical sub-instruction; based on decrementing the iteration count to zero, decoding the lane control sub-instruction from the instruction; based on determining that the lane control sub-instruction is one of: a return sub-instruction and a pause sub-instruction, freezing instruction processing and notifying a main processor; and based on determining that the lane control sub-instruction is one of: a branch sub-instruction and a no-operation sub-instruction, adjusting a current instruction address to identify a next instruction in the instruction buffer.
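
The control flow recited in claims 1 and 18 can be summarized as a small state machine. The iteration count source field chooses between the instruction's own count field and an iteration count register. The count is decremented once per execution of the parallel sub-instructions. The lane control sub-instruction then runs exactly once: return or pause freezes processing and notifies the main processor, while branch or no-op adjusts the current instruction address. The Python below is a hypothetical sketch of that sequencing only. The class and field names, the absolute `branch_target`, and the message string are assumptions. The branch is treated as unconditional, so claim 8's condition evaluation is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum


class CountSource(Enum):
    FIELD = 0     # iteration count taken from the instruction's count field
    REGISTER = 1  # iteration count taken from the iteration count register


class LaneControl(Enum):
    BRANCH = "branch"
    NOP = "nop"
    RETURN = "return"
    PAUSE = "pause"


@dataclass
class Instruction:
    count_source: CountSource
    count_field: int
    lane_control: LaneControl
    branch_target: int = 0  # hypothetical: instruction-buffer index used by BRANCH


@dataclass
class ProcessingElement:
    iteration_count_register: int = 0
    current_address: int = 0      # index of the current instruction in the buffer
    frozen: bool = False
    messages: list[str] = field(default_factory=list)
    bodies_executed: int = 0      # times the parallel sub-instructions have run

    def execute(self, instr: Instruction) -> None:
        # Claim 1: the source field selects where the iteration count comes from.
        if instr.count_source is CountSource.FIELD:
            count = instr.count_field
        else:
            count = self.iteration_count_register

        # Claim 18: run the load-store + ALU sub-instructions, decrementing each time.
        while count > 0:
            self.bodies_executed += 1  # placeholder for the real parallel work
            count -= 1

        # The lane control sub-instruction executes only once, after the loop ends.
        if instr.lane_control in (LaneControl.RETURN, LaneControl.PAUSE):
            self.frozen = True
            self.messages.append(f"{instr.lane_control.value}: notify main processor")
        elif instr.lane_control is LaneControl.BRANCH:
            self.current_address = instr.branch_target
        else:  # NOP: advance to the next instruction in the buffer
            self.current_address += 1
```

Keeping the lane control outside the repeat loop is what distinguishes this scheme from an ordinary loop body: sequencing cost is paid once per instruction rather than once per iteration.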

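Claims 4 and 14 describe splitting an operand into sub-elements according to the data type, operating on each sub-element in its own execution slot, and computing one address per sub-element in the load-store unit. The Python below is a hypothetical illustration of that idea. It assumes a 64-bit operand, unsigned element types named `u8` through `u64`, and an address rule of base plus index times element size. None of these specifics come from the patent.

```python
# Hypothetical parameters: operand width and type names are assumptions.
OPERAND_BITS = 64
SUB_ELEMENT_BITS = {"u8": 8, "u16": 16, "u32": 32, "u64": 64}


def partition(operand: int, dtype: str) -> list[int]:
    """Split an operand into equal-width sub-elements, lowest-order slot first."""
    width = SUB_ELEMENT_BITS[dtype]
    mask = (1 << width) - 1
    return [(operand >> (i * width)) & mask for i in range(OPERAND_BITS // width)]


def pack(parts: list[int], dtype: str) -> int:
    """Inverse of partition: place each sub-element back into its slot."""
    width = SUB_ELEMENT_BITS[dtype]
    mask = (1 << width) - 1
    value = 0
    for i, part in enumerate(parts):
        value |= (part & mask) << (i * width)
    return value


def slotwise_add(a: int, b: int, dtype: str) -> int:
    """ALU: add sub-elements slot by slot; carries never cross slot boundaries."""
    mask = (1 << SUB_ELEMENT_BITS[dtype]) - 1
    sums = [(x + y) & mask for x, y in zip(partition(a, dtype), partition(b, dtype))]
    return pack(sums, dtype)


def per_element_addresses(base: int, index_operand: int, dtype: str) -> list[int]:
    """LSU: one address per sub-element (assumed rule: base + index * element bytes)."""
    elem_bytes = SUB_ELEMENT_BITS[dtype] // 8
    return [base + idx * elem_bytes for idx in partition(index_operand, dtype)]
```

The same operand bits yield different results depending on the data type. For example, adding 0x01 to 0xFF wraps to zero in a `u8` slot but produces 0x100 in a `u16` slot, because the slot width decides where carries stop.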
Patents cited by this patent (42)

Cutler David N. (Bellevue WA) Orbits David A. (Redmond WA) Bhandarkar Dileep (Shrewsbury MA) Cardoza Wayne (Merrimack NH) Witek Richard T. (Littleton MA), Apparatus and method for recovering from missing page faults in vector data processing operations.
Gostin Gary B.; Barr Matthew F.; McGuffey Ruth A.; Roan Russell L., Apparatus, systems and method for improving memory bandwidth utilization in vector processing systems.
Sandorfi, Miklos, Central processing unit.
Clark, Lawrence T.; Patterson, Dan W., Circuits and methods for processors with multiple redundancy techniques for mitigating radiation errors.
Arya Siamak, Conditional vector processing.
Fleck Rod G.; Mattela Venkat; Chesters Eric; Afsar Muhammad, Data processing device with loop pipeline.
Morton Steven G., Digital signal processor containing scalar processor and a plurality of vector processors operating from a single instruction.
Mimar, Tibet, Efficient handling of vector high-level language conditional constructs in a SIMD processor.
Papworth David B.; Hinton Glenn J.; Fetterman Michael A.; Colwell Robert P.; Glew Andrew F., Exception handling in a processor that performs speculative out-of-order instruction execution.
Raikin, Shlomo; Valentine, Robert, Gather cache architecture.
Ferren, Bran; Hillis, W. Daniel; Mangione-Smith, William Henry; Myhrvold, Nathan P.; Tegreene, Clarence T; Wood, Jr., Lowell L., Hardware-error tolerant computing.
Ferren, Bran; Hillis, W. Daniel; Mangione Smith, William Henry; Myhrvold, Nathan P.; Tegreene, Clarence T.; Wood, Jr., Lowell L., Hardware-error tolerant computing.
Mukherjee, Shubhendu S.; Reinhardt, Steven K.; Emer, Joel S., Incremental checkpointing in a multi-threaded architecture.
Fujii Hiroaki (Kokubunji CA JPX) Hamanaka Naoki (Palo Alto CA) Tanaka Teruo (Hachoiji JPX) Inagami Yasuhiro (Kodaira JPX) Tamaki Yoshiko (Kodaira JPX), Information processing apparatus having a register file used interchangeably both as scalar registers of register window.
Haigh Stephen G. (Redwood City CA) Baji Toru (Burlingame CA), Instruction preprocessor for conditionally combining short memory instructions into virtual long instructions.
Scheuerlein, Roy E., Integrated circuit incorporating dual organization memory array.
Thayer John S.; Favor John G.; Weber Frederick D., Load and store instructions which perform unpacking and packing of data bits in separate vector and integer cache storage.
Luick, David Arnold; Mejdrich, Eric Oliver; Muff, Adam James, Load misaligned vector with permute and mask insert.
Liao, Yu-Chung C.; Sandon, Peter A.; Cheng, Howard; Van Hook, Timothy J., Method and apparatus for obtaining a scalar value directly from a vector register.
O'Connor, James Michael; Tremblay, Marc, Method frame storage using multiple memory circuits.
Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Anderson, Timothy D.; Hoyle, David; Steiss, Donald E.; Krueger, Steven D., Microprocessor with non-aligned memory access.
Gschwind, Michael K.; Olsson, Brett, Multi-addressable register file.
Cho Seongrai; Park Heonchul; Song Seungyoon Peter, Multifunction data aligner in wide data width processor.
Clery ; III William B., Multiple thread multiple data predictive coded parallel processing system and method.
Dorojevets, Mikhail; Ogura, Eiji, Parallel vector processing.
Reinhardt, Steven K.; Mukherjee, Shubhendu S.; Emer, Joel S., Periodic checkpointing in a redundantly multi-threaded architecture.
Gschwind, Michael Karl; Hofstee, Harm Peter; Hopkins, Martin Edward, SIMD datapath coupled to scalar/vector/address/conditional data register file with selective subpath scalar processing mode.
Brodnax Timothy B. (Austin TX) Bialas ; Jr. John S. (Bealeton VA) King Steven A. (Herndon VA) LeBlanc Johnny J. (Austin TX) Rickard Dale A. (Manassas VA) Spencer Clark J. (Praha CSX) Stanley Daniel L, Shadow register file for instruction rollback.
Gower, Kevin C.; Kellogg, Mark W.; Maule, Warren E.; Smith, III, Thomas B.; Tremaine, Robert B., System, method and storage medium for providing data caching and data compression in a memory subsystem.
Tremaine, Robert B., Systems and methods for providing data modification operations in memory subsystems.
Gower, Kevin C.; Maule, Warren E.; Tremaine, Robert B., Systems and methods for providing distributed technology independent memory controllers.
Zumkehr, John F.; Abouelnaga, Amir A., Systems and methods for use in reduced instruction set computer processors for retrying execution of instructions resulting in errors.
Sandon, Peter A.; West, R. Michael P., Two dimensional addressing of a matrix-vector register array.
Green Thomas S., Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions.
Hui, Ronald Chi-Chun, Vector processing with high execution throughput.
Beard Douglas R. (Eleva WI) Phelps Andrew E. (Eau Claire WI) Woodmansee Michael A. (Eau Claire WI) Blewett Richard G. (Altoona WI) Lohman Jeffrey A. (Eau Claire WI) Silbey Alexander A. (Eau Claire WI, Vector processor having registers for control by vector resisters.
Kashiyama Masamori (Hadano JPX) Ishii Koichi (Hadano JPX) Kawabe Shun (Machida JPX) Usami Masami (Ome JPX), Vector processor performing data operations in one half of a total time period of write operation and the read operation.
Elwood Matthew Paul; Hinds Christopher Neal, Vector register addressing.
Glossner, III, Clair John; Hokenek, Erdem; Meltzer, David; Moudgill, Mayan, Vector register file with arbitrary vector addressing.
Fossum Tryggve (Northboro MA) Manley Dwight P. (Holliston MA) McKeen Francis X. (Westboro MA) Tehranian Michael M. (Boxboro MA), Vector register system for executing plural read/write commands concurrently and independently routing data to plural re.
Oberlin Steven M.; Fromm Eric C.; Passint Randal S., Virtual to logical to physical address translation for distributed memory massively parallel processing systems.