Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
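The effect of applying mask bits to a vector operation can be illustrated with a minimal sketch. It is not the patented hardware: the function name `predicated_op` is hypothetical, and the choice that a masked-off lane keeps its previous destination value is an assumption, since the abstract only states that the mask bits predicate the operation of a unit.

```python
from typing import Callable, List


def predicated_op(
    mask: List[bool],
    op: Callable[[int, int], int],
    a: List[int],
    b: List[int],
    dest: List[int],
) -> List[int]:
    """Apply op lane by lane under control of the mask bits.

    Assumption: a lane whose mask bit is 0 is blocked and keeps its
    old destination value; the source text does not specify this.
    """
    if not (len(mask) == len(a) == len(b) == len(dest)):
        raise ValueError("all vectors must have the same number of lanes")
    return [op(x, y) if m else d for m, x, y, d in zip(mask, a, b, dest)]


# Lanes 0 and 2 are enabled; lanes 1 and 3 are predicated off.
mask = [True, False, True, False]
result = predicated_op(
    mask, lambda x, y: x + y, [1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 0, 0]
)
print(result)  # [11, 0, 33, 0]
```

In hardware the same mask bit would more plausibly gate the unit itself instead of selecting between two computed values; the list comprehension only models the visible result.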
Representative Claims
1. A system for vector processor predication in an active memory device, the system comprising:
  memory in the active memory device; and
  a processing element in the active memory device, the processing element comprising a vector mask register, an arithmetic logic unit, and a load store unit, the processing element configured to perform a method comprising:
    setting one or more mask bits in the vector mask register in the processing element;
    applying the one or more mask bits by the processing element to predicate operation of the arithmetic logic unit or the load-store unit in the processing element associated with at least one of a plurality of sub-instructions;
    performing a compare of operands in the processing element using predication of a compare instruction to perform less than a maximum supported number of comparisons in parallel based on the one or more mask bits;
    storing compare results of the compare instruction as mask bit values of the vector mask register;
    analyzing a compare instruction syntax bit of the compare instruction to select between performing an OR-reduction and an AND-reduction on the mask bit values stored in response to performing less than the maximum supported number of comparisons in parallel by the predication of the compare instruction;
    reducing the mask bit values to a summary condition by performing a logical OR combination of the compare results based on determining that the OR-reduction is selected by the compare instruction syntax bit;
    reducing the mask bit values to the summary condition by performing a logical AND combination of the compare results based on determining that the AND-reduction is selected by the compare instruction syntax bit;
    writing the summary condition to a condition register; and
    using the summary condition of the condition register to determine a branch direction of a conditional branch instruction in the processing element.

2. The system of claim 1, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: execution of at least one element of the sub-instructions and execution of at least one execution slot operating on a sub-element of at least one of the sub-instructions.

3. The system of claim 1, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: a memory access sub-instruction and part of an arithmetic operation.

4. The system of claim 1, wherein the processing element is further configured to perform: performing one or more of clock gating and data gating to one or more of: the arithmetic logic unit, the load-store unit, a vector computation register file, and a scalar computation register file based on the one or more mask bits.

5. The system of claim 1, wherein the processing element is further configured to perform:
  populating mask bit values of the vector mask register from one or more of: the memory and the arithmetic logic unit; and
  performing logical operations by the processing element on the mask bit values to modify the mask bit values of the vector mask register.

6. The system of claim 1, wherein performing the logical OR combination of the compare results further comprises including a current value of the condition register in the logical OR combination of the compare results, and performing the logical AND combination of the compare results further comprises including the current value of the condition register in the logical AND combination of the compare results.

7. A system for vector processor predication in an active memory device, the system comprising:
  memory in the active memory device, wherein the active memory device is a three-dimensional memory cube and the memory is divided into three-dimensional blocked regions as memory vaults; and
  a processing element in the active memory device, the processing element comprising a vector mask register, an arithmetic logic unit, and a load store unit, the processing element configured to perform a method comprising:
    fetching, in the processing element, an instruction from an instruction buffer in the processing element;
    decoding, in the processing element, the instruction comprising a plurality of sub-instructions to execute in parallel;
    setting one or more mask bits in the vector mask register in the processing element;
    applying the one or more mask bits by the processing element to predicate operation of the arithmetic logic unit or the load-store unit in the processing element associated with at least one of the sub-instructions;
    performing a compare of operands in the processing element using predication of a compare instruction to perform less than a maximum supported number of comparisons in parallel based on the one or more mask bits;
    storing compare results of the compare instruction as mask bit values of the vector mask register;
    analyzing a compare instruction syntax bit of the compare instruction to select between performing an OR-reduction and an AND-reduction on the mask bit values stored in response to performing less than the maximum supported number of comparisons in parallel by the predication of the compare instruction;
    reducing the mask bit values to a summary condition by performing a logical OR combination of the compare results based on determining that the OR-reduction is selected by the compare instruction syntax bit;
    reducing the mask bit values to the summary condition by performing a logical AND combination of the compare results based on determining that the AND-reduction is selected by the compare instruction syntax bit;
    writing the summary condition to a condition register;
    using the summary condition of the condition register to determine a branch direction of a conditional branch instruction in the processing element; and
    accessing the memory through one or more memory controllers in the active memory device for data operated upon by the instruction.

8. The system of claim 7, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: execution of at least one element of the sub-instructions and execution of at least one execution slot operating on a sub-element of at least one of the sub-instructions.

9. The system of claim 7, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: a memory access sub-instruction to prevent an access of the memory, and part of an arithmetic operation.

10. The system of claim 7, wherein the vector mask register is comprised of a plurality of vector mask entries, each comprising a plurality of elements of the mask bits, forming two-dimensional vector masks in the vector mask register, and further comprising:
  generating multiple mask bits per cycle per element based on single instruction, multiple data-in-space compare operations to form the two-dimensional vector masks in the vector mask register; and
  using the two-dimensional vector masks with two-dimensional vector data, the two-dimensional vector masks corresponding to data sub-elements in the two-dimensional vector data to predicate.

11. The system of claim 7, wherein the processing element is further configured to perform: performing one or more of clock gating and data gating to one or more of: the arithmetic logic unit, the load-store unit, a vector computation register file, and a scalar computation register file based on the one or more mask bits.

12. The system of claim 7, wherein the processing element is further configured to perform:
  populating mask bit values of the vector mask register from one or more of: the memory and the arithmetic logic unit; and
  performing logical operations by the processing element on the mask bit values to modify the mask bit values of the vector mask register.

13. The system of claim 7, wherein performing the logical OR combination of the compare results further comprises including a current value of the condition register in the logical OR combination of the compare results, and performing the logical AND combination of the compare results further comprises including the current value of the condition register in the logical AND combination of the compare results.
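The compare, reduce, and branch sequence recited in the claims can be sketched in software as follows. This is a hedged illustration, not the claimed implementation: the names `predicated_compare_lt`, `reduce_to_condition`, and `next_pc` are hypothetical, a less-than compare is chosen arbitrarily, and two behaviors are assumptions the claims do not settle, namely that only lanes enabled by the predicate contribute to the reduction and that disabled lanes are written back to the mask as 0.

```python
from typing import List, Optional


def predicated_compare_lt(
    a: List[int], b: List[int], predicate: List[bool]
) -> List[Optional[bool]]:
    """Compare a[i] < b[i] only on lanes enabled by the predicate.

    Disabled lanes yield None, so fewer than the maximum number of
    comparisons are performed.
    """
    return [(x < y) if p else None for x, y, p in zip(a, b, predicate)]


def reduce_to_condition(
    results: List[Optional[bool]],
    use_and: bool,
    current_cr: Optional[bool] = None,
) -> bool:
    """Reduce compare results to one summary condition.

    use_and stands in for the compare instruction syntax bit:
    True selects the AND-reduction, False the OR-reduction.
    Assumption: lanes that were not compared are left out.
    If current_cr is given, the current condition register value
    joins the reduction (as in claims 6 and 13).
    With no contributing bits, AND yields True and OR yields False.
    """
    bits = [r for r in results if r is not None]
    if current_cr is not None:
        bits.append(current_cr)
    return all(bits) if use_and else any(bits)


def next_pc(pc: int, target: int, condition: bool) -> int:
    """Pick the branch direction from the summary condition."""
    return target if condition else pc + 1


a = [1, 5, 3, 9]
b = [2, 4, 8, 0]
predicate = [True, True, True, False]  # lane 3 is predicated off

results = predicated_compare_lt(a, b, predicate)
new_mask = [bool(r) for r in results]  # assumption: disabled lanes become 0

cr_or = reduce_to_condition(results, use_and=False)
cr_and = reduce_to_condition(results, use_and=True)

print(new_mask)                 # [True, False, True, False]
print(cr_or, cr_and)            # True False
print(next_pc(100, 200, cr_or)) # 200
```

Passing `current_cr` lets a sequence of predicated compares accumulate into one condition before a single conditional branch, which is one plausible reading of why the claims allow the current condition register value to join the reduction.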