[Patent] Gather/scatter of multiple data elements with packed loading/storing into/from a register file entry


IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th edition)	G06F-009/30
Application number	US-0566141 (2012-08-03)
Registration number	US-9632777 (2017-04-25)
Inventors / Address	Fleischer, Bruce M.; Fox, Thomas W.; Jacobson, Hans M.; Moreno, Jaime H.; Nair, Ravi; Prener, Daniel A.
Applicant / Address	INTERNATIONAL BUSINESS MACHINES CORPORATION
Agent / Address	Cantor Colburn LLP
Citation information	Times cited: 0; Patents cited: 43

Abstract

Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. The processing element gathers a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
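
The gather-and-pack step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 64-bit register file element width, the 16-bit data element width, the dictionary standing in for memory, the lane order (first gathered element in the lowest bits), and the names `gather_packed`, `memory`, and `addresses`.

```python
ELEMENT_BITS = 64   # assumed width of one register file element
DATA_BITS = 16      # assumed width of one gathered data element
PER_ELEMENT = ELEMENT_BITS // DATA_BITS  # data elements that fit in one register file element


def gather_packed(memory, addresses):
    """Gather one narrow value per address and pack them into wide elements.

    `memory` is a hypothetical model: a dict mapping an address to the
    narrow value stored there. Lane 0 goes into the lowest bits (assumed).
    """
    mask = (1 << DATA_BITS) - 1
    packed = []
    for start in range(0, len(addresses), PER_ELEMENT):
        word = 0
        for lane, addr in enumerate(addresses[start:start + PER_ELEMENT]):
            word |= (memory[addr] & mask) << (lane * DATA_BITS)
        packed.append(word)
    return packed


# Five non-contiguous locations, each holding a 16-bit value.
memory = {0x100: 0x1111, 0x2A0: 0x2222, 0x018: 0x3333, 0x7F0: 0x4444, 0x300: 0x5555}
addresses = [0x100, 0x2A0, 0x018, 0x7F0, 0x300]  # one address per register file element
entry = gather_packed(memory, addresses)
print([hex(w) for w in entry])
```

In this sketch the five addresses would occupy five register file elements, while the gathered data fits in two: the first element is filled with four values and the second holds the one remaining value.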

Representative claims

1. A method for packed loading and storing of data distributed in a system that includes memory and a processing element, the processing element comprising a vector computation register file comprising a plurality of register file entries, each of the register file entries comprising a plurality of register file elements, and the processing element further comprising a scalar computation register file comprising a same number of register file entries as the vector computation register file, each register file entry of the scalar computation register file with one register file element, the method comprising: fetching and decoding an instruction for execution by the processing element, wherein the instruction comprises at least one sub-instruction accessing the vector computation register file and at least one sub-instruction accessing the scalar computation register file in parallel; distributing in each of at least two separate register file elements of a first register file entry of the register file entries of the vector computation register file, an address of one of a first plurality of non-contiguous locations in the memory of a plurality of data elements to gather; gathering, by the processing element based on the instruction and addresses, the data elements from the first plurality of non-contiguous locations in the memory, wherein each of the data elements is individually addressable and is narrower than a width of the register file elements of the vector computation register file in the processing element; and packing and loading the data elements into one or more of the register file elements of a second register file entry of the register file entries of the vector computation register file by the processing element based on the instruction, such that at least two of the data elements gathered from the first plurality of non-contiguous locations in the memory are packed and loaded into a single register file element of the second register file entry as a plurality of packed data elements in parallel with performing the at least one sub-instruction accessing the register file entries of the scalar computation register file.

2. The method of claim 1, wherein each of the addresses comprises a greater width than a width of each of the data elements.

3. The method of claim 2, wherein packing and loading the data elements into the one or more of the register file elements further comprises distributing the packed data elements into fewer register file elements than are consumed by the addresses, and the second register file entry is subsequent to the first register file entry in the vector computation register file.

4. The method of claim 1, further comprising: packing at least a first number of the data elements to fill the width of the register file elements of the vector computation register file; and packing a second number of the data elements into less than the width of the register file elements of the vector computation register file.

5. The method of claim 1, further comprising: fetching and decoding a second instruction for execution by the processing element; unpacking the data elements from the second register file entry by the processing element based on the second instruction; and scattering and storing, by the processing element, the data elements to a second plurality of non-contiguous locations in the memory based on the second instruction.

6. The method of claim 5, wherein a plurality of addresses of the second plurality of non-contiguous locations in the memory of the data elements to scatter are distributed in separate register file elements of the vector computation register file.

7. The method of claim 6, wherein the addresses of the second plurality of non-contiguous locations in the memory of the data elements to scatter are different addresses than the addresses of the first plurality of non-contiguous locations in the memory from which the data elements are gathered.

8. The method of claim 6, wherein unpacking the data elements from the second register file entry by the processing element based on the second instruction further comprises reading the packed data elements as narrower data types and reading the plurality of addresses of the second plurality of non-contiguous locations in the memory as wider data types as compared to the narrower data types.

9. The method of claim 6, further comprising: unpacking at least a first number of the data elements filling the width of the register file elements; and unpacking a second number of the data elements occupying less than the width of the register file elements.

10. A method for packed loading and storing of data distributed in an active memory device that includes memory and a processing element, the processing element comprising a vector computation register file comprising a plurality of register file entries, each of the register file entries comprising a plurality of register file elements, and the processing element further comprising a scalar computation register file comprising a same number of register file entries as the vector computation register file, each register file entry of the scalar computation register file with one register file element, the method comprising: fetching and decoding an instruction from an instruction buffer in the processing element for execution by the processing element, wherein the instruction comprises at least one sub-instruction accessing the vector computation register file and at least one sub-instruction accessing the scalar computation register file in parallel; unpacking a plurality of data elements loaded in one or more of the register file elements of a second register file entry of the register file entries of the vector computation register file based on the instruction, wherein at least two of the plurality of data elements are unpacked from a single register file element of the second register file entry; and scattering and storing, by the processing element, the plurality of data elements to a plurality of non-contiguous locations in the memory based on the instruction and a plurality of addresses in parallel with performing the at least one sub-instruction accessing the register file entries of the scalar computation register file, wherein each of the plurality of data elements is individually addressable and is narrower than a width of the plurality of register file elements of the vector computation register file, wherein an address of one of the non-contiguous locations in the memory of each of the data elements to scatter is distributed in each of at least two separate register file elements of a first register file entry of the register file entries of the vector computation register file.

11. The method of claim 10, wherein the active memory device is a three-dimensional memory cube, the memory is divided into three-dimensional blocked regions as memory vaults, and the non-contiguous locations in the memory are accessed through one or more memory controllers in the active memory device.

12. The method of claim 10, wherein the unpacking of the plurality of data elements is performed by a load-store unit in parallel with instruction processing by an arithmetic-logic unit.

13. The method of claim 12, wherein the vector computation register file is accessible by the load-store unit and the arithmetic-logic unit, wherein each of the plurality of addresses comprises a greater width than a width of each of the plurality of data elements.

14. The method of claim 13, wherein unpacking the plurality of data elements from the one or more of the register file elements by the processing element further comprises reading packed data elements as narrower data types and reading the plurality of addresses as wider data types as compared to the narrower data types.

15. The method of claim 13, further comprising: fetching and decoding a second instruction from the instruction buffer of the processing element for execution by the processing element; gathering, by the processing element, a set of data elements from a set of non-contiguous locations in the memory based on the second instruction; and packing and loading the set of data elements into a set of register file elements of a third register file entry of the vector computation register file by the processing element based on the second instruction.

16. The method of claim 15, wherein packing and loading the set of data elements into the set of register file elements by the processing element based on the second instruction further comprises distributing the set of packed data elements into fewer register file elements than are consumed by a plurality of addresses of the set of the non-contiguous locations in the memory to gather.

17. The method of claim 16, wherein the plurality of addresses of the non-contiguous locations in the memory of the data elements to scatter are different addresses than the plurality of addresses of the set of non-contiguous locations in the memory from which the set of data elements are gathered.

18. The method of claim 15, further comprising: packing at least a first number of the set of data elements to fill a width of the set of register file elements; and packing a second number of the set of data elements into less than the width of the set of register file elements.
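
The unpack-and-scatter direction recited in the claims (narrow data elements read out of packed register file elements and stored to non-contiguous addresses held one per element) can be sketched in the same spirit. This is only an illustration under stated assumptions, not the claimed hardware: the 64-bit element width, the 16-bit data width, the dictionary used as memory, the lane order, and the names `scatter_unpacked`, `packed_entry`, and `scatter_addresses` are all hypothetical.

```python
ELEMENT_BITS = 64   # assumed width of one register file element
DATA_BITS = 16      # assumed width of one packed data element
PER_ELEMENT = ELEMENT_BITS // DATA_BITS  # packed values per register file element


def scatter_unpacked(memory, packed, addresses):
    """Unpack narrow values from wide elements and store one per address.

    The data is read as narrow (16-bit) lanes, while each address is read
    as a full-width value from its own element. Lane 0 is assumed to sit
    in the lowest bits of each packed element.
    """
    mask = (1 << DATA_BITS) - 1
    for i, addr in enumerate(addresses):
        word = packed[i // PER_ELEMENT]   # which packed element holds value i
        lane = i % PER_ELEMENT            # position of value i inside that element
        memory[addr] = (word >> (lane * DATA_BITS)) & mask
    return memory


# Two packed elements: the first is full, the second holds a single value.
packed_entry = [0x4444333322221111, 0x0000000000005555]
scatter_addresses = [0x900, 0x048, 0xB10, 0x620, 0x3F8]
memory = scatter_unpacked({}, packed_entry, scatter_addresses)
print({hex(a): hex(v) for a, v in memory.items()})
```

The second packed element is only partly occupied, which mirrors the case in which a first number of data elements fills the element width and a second number occupies less than that width.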

Patents cited by this patent (43)

Cutler David N. (Bellevue WA) Orbits David A. (Redmond WA) Bhandarkar Dileep (Shrewsbury MA) Cardoza Wayne (Merrimack NH) Witek Richard T. (Littleton MA), Apparatus and method for recovering from missing page faults in vector data processing operations.
Gostin Gary B. ; Barr Matthew F. ; McGuffey Ruth A. ; Roan Russell L., Apparatus, systems and method for improving memory bandwidth utilization in vector processing systems.
Sandorfi,Miklos, Central processing unit.
Clark, Lawrence T.; Patterson, Dan W., Circuits and methods for processors with multiple redundancy techniques for mitigating radiation errors.
Arya Siamak, Conditional vector processing.
Fleck Rod G. ; Mattela Venkat ; Chesters Eric ; Afsar Muhammad, Data processing device with loop pipeline.
Morton Steven G., Digital signal processor containing scalar processor and a plurality of vector processors operating from a single instruction.
Mimar, Tibet, Efficient handling of vector high-level language conditional constructs in a SIMD processor.
Papworth David B. ; Hinton Glenn J. ; Fetterman Michael A. ; Colwell Robert P. ; Glew Andrew F., Exception handling in a processor that performs speculative out-of-order instruction execution.
Raikin, Shlomo; Valentine, Robert, Gather cache architecture.
Ferren, Bran; Hillis, W. Daniel; Mangione-Smith, William Henry; Myhrvold, Nathan P.; Tegreene, Clarence T; Wood, Jr., Lowell L., Hardware-error tolerant computing.
Ferren,Bran; Hillis,W. Daniel; Mangione Smith,William Henry; Myhrvold,Nathan P.; Tegreene,Clarence T.; Wood, Jr.,Lowell L., Hardware-error tolerant computing.
Mukherjee,Shubhendu S.; Reinhardt,Steven K.; Emer,Joel S., Incremental checkpointing in a multi-threaded architecture.
Fujii Hiroaki (Kokubunji CA JPX) Hamanaka Naoki (Palo Alto CA) Tanaka Teruo (Hachoiji JPX) Inagami Yasuhiro (Kodaira JPX) Tamaki Yoshiko (Kodaira JPX), Information processing apparatus having a register file used interchangeably both as scalar registers of register window.
Ichimura Katsuhiko,JPX ; Nakata Takeshi,JPX ; Fukutome Goro,JPX, Information processing device and method for sequence control and data processing.
Haigh Stephen G. (Redwood City CA) Baji Toru (Burlingame CA), Instruction preprocessor for conditionally combining short memory instructions into virtual long instructions.
Scheuerlein, Roy E., Integrated circuit incorporating dual organization memory array.
Thayer John S. ; Favor John G. ; Weber Frederick D., Load and store instructions which perform unpacking and packing of data bits in separate vector and integer cache storage.
Luick, David Arnold; Mejdrich, Eric Oliver; Muff, Adam James, Load misaligned vector with permute and mask insert.
Liao, Yu-Chung C.; Sandon, Peter A.; Cheng, Howard; Van Hook, Timothy J., Method and apparatus for obtaining a scalar value directly from a vector register.
O'Connor, James Michael; Tremblay, Marc, Method frame storage using multiple memory circuits.
Thomas L. Drabenstott ; Gerald G. Pechanek ; Edwin F. Barry ; Charles W. Kurak, Jr., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Anderson, Timothy D.; Hoyle, David; Steiss, Donald E.; Krueger, Steven D., Microprocessor with non-aligned memory access.
Gschwind, Michael K.; Olsson, Brett, Multi-addressable register file.
Cho Seongrai ; Park Heonchul ; Song Seungyoon Peter, Multifunction data aligner in wide data width processor.
Clery ; III William B., Multiple thread multiple data predictive coded parallel processing system and method.
Dorojevets,Mikhail; Ogura,Eiji, Parallel vector processing.
Reinhardt,Steven K.; Mukherjee,Shubhendu S.; Emer,Joel S., Periodic checkpointing in a redundantly multi-threaded architecture.
Gschwind, Michael Karl; Hofstee, Harm Peter; Hopkins, Martin Edward, SIMD datapath coupled to scalar/vector/address/conditional data register file with selective subpath scalar processing mode.
Brodnax Timothy B. (Austin TX) Bialas ; Jr. John S. (Bealeton VA) King Steven A. (Herndon VA) LeBlanc Johnny J. (Austin TX) Rickard Dale A. (Manassas VA) Spencer Clark J. (Praha CSX) Stanley Daniel L, Shadow register file for instruction rollback.
Gower,Kevin C.; Kellogg,Mark W.; Maule,Warren E.; Smith, III,Thomas B.; Tremaine,Robert B., System, method and storage medium for providing data caching and data compression in a memory subsystem.
Tremaine, Robert B., Systems and methods for providing data modification operations in memory subsystems.
Gower, Kevin C.; Maule, Warren E.; Tremaine, Robert B., Systems and methods for providing distributed technology independent memory controllers.
Zumkehr, John F.; Abouelnaga, Amir A., Systems and methods for use in reduced instruction set computer processors for retrying execution of instructions resulting in errors.
Sandon, Peter A.; West, R. Michael P., Two dimensional addressing of a matrix-vector register array.
Green Thomas S., Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions.
Hui, Ronald Chi-Chun, Vector processing with high execution throughput.
Beard Douglas R. (Eleva WI) Phelps Andrew E. (Eau Claire WI) Woodmansee Michael A. (Eau Claire WI) Blewett Richard G. (Altoona WI) Lohman Jeffrey A. (Eau Claire WI) Silbey Alexander A. (Eau Claire WI, Vector processor having registers for control by vector resisters.
Kashiyama Masamori (Hadano JPX) Ishii Koichi (Hadano JPX) Kawabe Shun (Machida JPX) Usami Masami (Ome JPX), Vector processor performing data operations in one half of a total time period of write operation and the read operation.
Elwood Matthew Paul ; Hinds Christopher Neal, Vector register addressing.
Glossner, III,Clair John; Hokenek,Erdem; Meltzer,David; Moudgill,Mayan, Vector register file with arbitrary vector addressing.
Fossum Tryggve (Northboro MA) Manley Dwight P. (Holliston MA) McKeen Francis X. (Westboro MA) Tehranian Michael M. (Boxboro MA), Vector register system for executing plural read/write commands concurrently and independently routing data to plural re.
Oberlin Steven M. ; Fromm Eric C. ; Passint Randal S., Virtual to logical to physical address translation for distributed memory massively parallel processing systems.
