Efficient data loading in a data-parallel processor
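IPC classification information
Country/Type | United States (US) Patent, Registered
International Patent Classification (IPC, 7th edition) |
Application number | US-0973895 (2007-10-09)
Registration number | US-8438365 (2013-05-07)
Inventor / Address |
Applicant / Address | Calos Fund Limited Liability Company
Citation information | Times cited: 0; Cited patents: 7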
Abstract
A method of loading data into register files that correspond to respective execution units within a data-parallel processor. After receiving a first set of parameters that specify a subset of data within a first memory, the first set of parameters are compared to a plurality of sets of conditions that correspond to respective patterns of data. The first set of parameters is then converted to a second set of parameters in accordance with one of the sets of conditions satisfied by the first set of parameters. A sequence of memory addresses are generated based on the second set of parameters. Data is retrieved from locations within the first memory specified by the sequence of memory addresses and loaded into register files that correspond to respective execution units within a processor.
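The flow described in the abstract (compare parameters against condition sets, convert them, then generate addresses) can be sketched in Python as follows. This is only an illustration under stated assumptions: the text here does not list the actual condition sets, so the two patterns (`contiguous`, `strided`), the `Params` fields, and the function names `convert`, `addresses`, and `fetch` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Params:
    """Hypothetical parameter set describing a subset of memory."""
    start: int   # starting address of the first group
    count: int   # number of groups
    group: int   # values read sequentially per group
    stride: int  # distance between the starts of consecutive groups


# Hypothetical table: (pattern name, condition, conversion to a second parameter set).
PATTERNS: List[Tuple[str, Callable[[Params], bool], Callable[[Params], Params]]] = [
    # Groups that touch each other form one long contiguous run.
    ("contiguous",
     lambda p: p.stride == p.group,
     lambda p: Params(p.start, 1, p.count * p.group, p.count * p.group)),
    # Groups separated by gaps are kept as they are.
    ("strided",
     lambda p: p.stride > p.group,
     lambda p: p),
]


def convert(p: Params) -> Tuple[str, Params]:
    """Return the first pattern whose condition holds and the converted parameters."""
    for name, condition, conversion in PATTERNS:
        if condition(p):
            return name, conversion(p)
    raise ValueError("no pattern matches these parameters")


def addresses(p: Params) -> List[int]:
    """Generate the sequence of memory addresses described by a parameter set."""
    return [p.start + g * p.stride + i
            for g in range(p.count)
            for i in range(p.group)]


def fetch(memory: List[int], p: Params) -> List[int]:
    """Convert the parameters, generate addresses, and read the values."""
    _, second = convert(p)
    return [memory[a] for a in addresses(second)]
```

In this sketch the conversion never changes which addresses are read; it only changes how they are described, for example by collapsing three touching groups of two values into a single run of six.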
Representative claims
1. A method of loading data into register files that correspond to respective execution units within a processor, the method comprising:
  comparing a first set of parameters that specify a subset of data to be accessed from among a larger set of data within a first memory to a plurality of sets of conditions that correspond to respective patterns of data to identify one of the sets of conditions that satisfies the first set of parameters;
  converting the first set of parameters to a second set of parameters in accordance with the pattern of data corresponding to the one of the sets of conditions that satisfies the first set of parameters;
  generating, based on the second set of parameters, a sequence of memory addresses that specify the subset of data that is to be loaded from the first memory into the register files that correspond to the respective execution units within the processor; and
  loading the subset of data from locations within the first memory specified by the sequence of memory addresses into the register files that correspond to the respective execution units within the processor,
  wherein the first set of parameters that specify the subset of data to be accessed within the first memory further comprises:
    a count value N that indicates a number of groups of data values to be loaded from the first memory relative to a predetermined starting address;
    a group parameter that indicates a number of data values to be read from sequentially-addressed locations within the first memory as a group of data values;
    a stride parameter that indicates a value to be added to the predetermined starting address of a first group of the data values to obtain a starting address of a second group of the data values; and
  wherein the method further comprises:
    retrieving 0 to N groups of data values from the first memory, wherein N corresponds to the count value N, wherein each of the 0 to N groups are discontiguous in the first memory relative to each other, and wherein a final location of a final group of the 0 to N groups in the first memory corresponds to ((the count value N minus 1) multiplied by the stride parameter) plus (the group parameter minus 1); and
    contiguously storing the 0 to N groups of the data values as transformed data in a temporary buffer.

2. The method of claim 1, wherein loading the subset of data further comprises loading the transformed data from the temporary buffer into the register files that correspond to the respective execution units within the processor.

3. The method of claim 1, wherein:
  the first set of parameters that specify the subset of data to be accessed within the first memory include an allocation parameter T that indicates a distribution of the 0 to N groups of the data values among the register files that correspond to the respective execution units within the processor; and
  loading the subset of data comprises loading T of the groups of the data values from the temporary buffer into each of the register files.

4. The method of claim 1, wherein the first set of parameters that specify the subset of data to be accessed within the first memory include an allocation parameter that indicates a distribution of the subset of data among the register files that correspond to the respective execution units within the processor.

5. The method of claim 4, wherein the step of loading the subset of data into the register files that correspond to the respective execution units within the processor comprises loading a first portion of the subset of data into one of the register files and loading a second portion of the subset of data into another of the register files, wherein each of the first and second portions of the subset of data includes a number of data values in accordance with the allocation parameter.

6. The method of claim 1, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further includes an offset value that indicates an offset between the starting address of the first group of the data values and a predetermined address.

7. The method of claim 6, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further includes a pointer to a sequence of one or more address values, and the sequence of one or more address values includes the predetermined address.

8. The method of claim 1, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further includes an allocation parameter that indicates a number of execution units among which the first group of the data values is to be allocated.

9. The method of claim 1, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further includes a count value that indicates a number of group values to be loaded from the first memory relative to a predetermined starting address.

10. A non-transitory computer-readable medium having instructions stored thereon, the instructions comprising:
  instructions for comparing a first set of parameters that specify a subset of data to be accessed from among a larger set of data within a first memory to a plurality of sets of conditions that correspond to respective patterns of data to identify one of the sets of conditions that satisfies the first set of parameters;
  instructions for converting the first set of parameters to a second set of parameters in accordance with the pattern of data corresponding to the one of the sets of conditions that satisfies the first set of parameters;
  instructions for generating, based on the second set of parameters, a sequence of memory addresses that specify the subset of data that is to be loaded from the first memory into register files that correspond to respective execution units within a processor; and
  instructions for loading the subset of data from locations within the first memory specified by the sequence of memory addresses into the register files that correspond to the respective execution units within the processor,
  wherein the first set of parameters that specify the subset of data to be accessed within the first memory further comprises:
    a count value N that indicates a number of groups of data values to be loaded from the first memory relative to a predetermined starting address;
    a group parameter that indicates a number of data values to be read from sequentially-addressed locations within the first memory as a group of data values;
    a stride parameter that indicates a value to be added to the predetermined starting address of a first group of the data values to obtain a starting address of a second group of the data values; and
  wherein the instructions further comprise:
    instructions for retrieving 0 to N groups of data values from the first memory, wherein N corresponds to the count value N, wherein each of the 0 to N groups are discontiguous in the first memory relative to each other, and wherein a final location of a final group of the 0 to N groups in the first memory corresponds to ((the count value N minus 1) multiplied by the stride parameter) plus (the group parameter minus 1); and
    instructions for contiguously storing the 0 to N groups of the data values as transformed data in a temporary buffer.

11. The non-transitory computer-readable medium of claim 10, wherein the instructions for loading the subset of data further comprises instructions for loading the transformed data from the temporary buffer into the register files that correspond to the respective execution units within the processor.

12. The non-transitory computer-readable medium of claim 10, wherein:
  the first set of parameters that specify the subset of data to be accessed within the first memory include an allocation parameter T that indicates a distribution of the 0 to N groups of the data values among the register files that correspond to the respective execution units within the processor; and
  the instructions for loading the subset of data comprises instructions for loading T of the groups of the data values from the temporary buffer into each of the register files.

13. The non-transitory computer-readable medium of claim 10, wherein the first set of parameters that specify the subset of data to be accessed within the first memory include an allocation parameter that indicates a distribution of the subset of data among the register files that correspond to the respective execution units within the processor.

14. The non-transitory computer-readable medium of claim 13, wherein the instructions for loading the subset of data into the register files that correspond to the respective execution units within the processor comprise instructions for loading a first portion of the subset of data into one of the register files and loading a second portion of the subset of data into another of the register files, wherein each of the first and second portions of the subset of data includes a number of data values in accordance with the allocation parameter.

15. The non-transitory computer-readable medium of claim 10, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further include an offset value that indicates an offset between the starting address of the first group of the data values and a predetermined address.

16. The non-transitory computer-readable medium of claim 15, wherein the first set of parameters that specify the subset of data to be accessed within the first memory include a pointer to a sequence of one or more address values, and the sequence of one or more address values includes the predetermined address.

17. The non-transitory computer-readable medium of claim 10, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further include an allocation parameter that indicates a number of execution units among which the first group of the data values is to be allocated.

18. The non-transitory computer-readable medium of claim 10, wherein the first set of parameters that specify the subset of data to be accessed within the first memory further include a count value that indicates a number of group values to be loaded from the first memory relative to a predetermined starting address.
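The strided gather in claims 1 to 3 can be illustrated with a short Python sketch. It rests on several assumptions: memory is modeled as a flat list, the "final location" expression is read as an offset from the starting address of the first group, and the function names (`gather_groups`, `final_location`, `distribute`) and the `num_units` argument are hypothetical rather than taken from the patent.

```python
from typing import List


def gather_groups(memory: List[int], start: int, count: int,
                  group: int, stride: int) -> List[int]:
    """Read `count` groups of `group` values, `stride` apart, into one contiguous buffer."""
    buffer: List[int] = []
    for g in range(count):
        base = start + g * stride               # starting address of group g
        buffer.extend(memory[base:base + group])
    return buffer


def final_location(count: int, group: int, stride: int) -> int:
    """Offset of the last value of the last group: (N - 1) * stride + (group - 1)."""
    return (count - 1) * stride + (group - 1)


def distribute(buffer: List[int], group: int, t: int,
               num_units: int) -> List[List[int]]:
    """Give each of `num_units` register files T consecutive groups from the buffer."""
    per_unit = t * group                        # values held by one register file
    return [buffer[u * per_unit:(u + 1) * per_unit] for u in range(num_units)]


# Example: 4 groups of 2 values, group starts 8 addresses apart.
memory = list(range(100, 132))
buffer = gather_groups(memory, start=0, count=4, group=2, stride=8)
register_files = distribute(buffer, group=2, t=2, num_units=2)
```

With these values the last element read sits at offset (4 - 1) * 8 + (2 - 1) = 25, and the buffer holds the four groups back to back before they are split two groups per register file.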
Patents cited by this patent (7)
- Moberg, Kenneth; Stine, Arthur B.; Kon, Ronnie Bernard, Apparatus and method for improving performance of critical code execution.
- Scales; III Hunter Ledbetter; Diefendorff Keith Everett; Olsson Brett; Dubey Pradeep Kumar; Hochsprung Ronald Ray; Beavers Bradford Byron; Burgess Bradley G.; Snyder Michael Dean; May Cathy, Data processing system for processing vector data and method therefor.
- Ryan, Charles P.; Yoder, Ron; Shelly, William A., Data processing system processor dynamic selection of internal signal tracing.
- Zurawski John H. (Stow MA) Beach Walter A. (Bedford MA), High speed transfer of instructions from a master to a slave processor.
- Alexander, III, William Preston; Berry, Robert Francis; Levine, Frank Eliot; Urquhart, Robert John, Method and system for merging event-based data and sampled data into postprocessed trace output.
- Davis, Gordon Taylor; Heddes, Marco C.; Leavens, Ross Boyd; Rinaldi, Mark Anthony, Multiple logical interfaces to a shared coprocessor resource.
- Kennedy A. Richard; Croxton Cody B., Pipelined processor operating in different power mode based on branch prediction state of branch history bit encoded as.