IPC Classification Information
Country / Type | United States (US) Patent, Registered
International Patent Classification (IPC 7th ed.) |
Application No. | US-0028007 (2008-02-07)
Registration No. | US-8181168 (2012-05-15)
Inventors / Address |
- Lee, Walter
- Gottlieb, Robert A.
- Soni, Vineet
- Agarwal, Anant
- Schooler, Richard
Applicant / Address |
Agent / Address |
Citation Information | Times cited: 28; Patents cited: 14
Abstract
A system comprises a plurality of computation units interconnected by an interconnection network. A method for configuring the system comprises forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; and assigning memory access instructions a given equivalence class to one of the computation units for execution on the assigned computation unit.
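The partitioning step described in the abstract can be illustrated with a union-find structure. This is only a minimal sketch, not the patented method: it assumes that some earlier analysis has already produced, for each memory access instruction, the set of memory objects it may touch. The names `UnionFind`, `partition_region`, and `assign_units`, the input format, and the round-robin placement are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over arbitrary hashable keys."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def partition_region(accesses):
    """Partition one memory analysis region into equivalence classes.

    `accesses` (assumed input) maps an instruction name to the set of
    memory objects that instruction may access.
    """
    uf = UnionFind()
    for instr, objects in accesses.items():
        for obj in objects:
            # An instruction and every object it may touch share a class.
            uf.union(("instr", instr), ("obj", obj))

    classes = defaultdict(lambda: {"instructions": set(), "objects": set()})
    for instr, objects in accesses.items():
        classes[uf.find(("instr", instr))]["instructions"].add(instr)
        for obj in objects:
            classes[uf.find(("obj", obj))]["objects"].add(obj)

    # Sort by smallest instruction name so the result is deterministic.
    return sorted(classes.values(), key=lambda c: min(c["instructions"]))


def assign_units(classes, num_units):
    """Hypothetical placement: class i runs on unit i modulo num_units."""
    return [i % num_units for i in range(len(classes))]


accesses = {
    "load_a": {"A"},
    "store_ab": {"A", "B"},
    "store_b": {"B"},
    "load_c": {"C"},
}
classes = partition_region(accesses)
units = assign_units(classes, num_units=4)
```

Here `store_ab` touches both `A` and `B`, so all three instructions that access either object fall into one class, while `load_c` forms a class of its own. With at least as many units as classes, the round-robin placement puts different classes on different units; with fewer units, classes would have to share a unit, a case this sketch does not try to resolve.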
Representative Claim
1. A computer-implemented method for configuring a system that does not include hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the method comprising:
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
inserting, by the one or more computers, initialization instructions before each memory analysis region identified as a critical region and inserting, by the one or more computers, finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and
assigning, by the one or more computers, sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.

2. The method of claim 1, wherein identifying a memory analysis region as a critical region comprises identifying information in the memory analysis region that indicates the memory analysis region is performance critical.

3. The method of claim 1, wherein identifying a memory analysis region as a critical region comprises determining that the memory analysis region includes at least one inner loop.

4. The method of claim 1, further comprising flushing caches of the computation units at boundaries between the memory analysis regions.

5. The method of claim 1, further comprising forming a specification of the program to be executed by the plurality of computation units based on the assigned sets of instructions.

6. A computer program, stored on a computer-readable device, for configuring a system that does not include hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the computer program comprising instructions for causing a computer system to:
form subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
form one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyze each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
insert initialization instructions before each memory analysis region identified as a critical region and inserting finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and
assign sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.

7. A system, comprising:
a plurality of interconnected processor devices that do not include hardware support for providing cache coherence among respective caches of the processor devices by maintaining consistency of data stored in the respective caches according to a coherence protocol; and
information for configuring the processor devices by
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
inserting initialization instructions before each memory analysis region identified as a critical region and inserting finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and
assigning sets of instructions corresponding to the memory analysis regions to respective processor devices for execution on the assigned processor device, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the processor devices.

8. The system of claim 7, further comprising a memory for storing the information for configuring the processor devices.

9. The system of claim 7, wherein each processor device comprises a processor, and a switch including switching circuitry to forward data received over data paths from other processor devices to the processor and to switches of other processor devices, and to forward data received from the processor to switches of other processor devices.

10. A computer-implemented method for configuring a system that includes hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the method comprising:
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
for arrays that have been split into sub-arrays, remapping, by the one or more computers, the sub-arrays to different cache lines of multiple cache lines, each cache line corresponding to a set of memory addresses that are copied into the cache or evicted from the cache together; and
assigning, by the one or more computers, sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.

11. The method of claim 10, further comprising forming a specification of the program to be executed by the plurality of computation units based on the assigned sets of instructions.

12. A computer program, stored on a computer-readable device, for configuring a system that includes hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the computer program comprising instructions for causing a computer system to:
form subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
form one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyze each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
for arrays that have been split into sub-arrays, remap the sub-arrays to different cache lines of multiple cache lines, each cache line corresponding to a set of memory addresses that are copied into the cache or evicted from the cache together; and
assign sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.

13. A system, comprising:
a plurality of interconnected processor devices that include hardware support for providing cache coherence among respective caches of the processor devices by maintaining consistency of data stored in the respective caches according to a coherence protocol; and
information for configuring the processor devices by
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
for arrays that have been split into sub-arrays, remapping the sub-arrays to different cache lines of multiple cache lines, each cache line corresponding to a set of memory addresses that are copied into the cache or evicted from the cache together; and
assigning sets of instructions corresponding to the memory analysis regions to respective processor devices for execution on the assigned processor device, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the processor devices.

14. The system of claim 13, further comprising a memory for storing the information for configuring the processor devices.

15. The system of claim 13, wherein each processor device comprises a processor, and a switch including switching circuitry to forward data received over data paths from other processor devices to the processor and to switches of other processor devices, and to forward data received from the processor to switches of other processor devices.

16. A computer-implemented method for configuring a system comprising a plurality of computation units interconnected by an interconnection network, the method comprising:
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
identifying, by the one or more computers, one or more memory analysis regions as critical regions based on identifying information in the memory analysis regions that indicates the memory analysis region is performance critical;
receiving, by the one or more computers, cache coherence information that indicates whether or not the system includes circuitry to provide cache coherence among respective caches of the computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol;
inserting, by the one or more computers, initialization instructions including instructions for copying memory objects into private memory accessible to a single computation unit before each memory analysis region identified as a critical region and inserting, by the one or more computers, finalization instructions including instructions for copying memory objects out of the private memory after each memory analysis region identified as a critical region, if the cache coherence information indicates that the system does not include circuitry to provide cache coherence, but not if the cache coherence information indicates that the system does include circuitry to provide cache coherence; and
assigning sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belong to different equivalence classes assigned to different ones of the computation units.

17. The method of claim 16, further comprising determining whether the system provides cache coherence among respective caches of the computation units to determine the cache coherence information.

18. The method of claim 16, further comprising forming a specification of the program to be executed by the plurality of computation units based on the assigned sets of instructions.
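The conditional insertion recited in claim 16 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Region` record, the `is_critical` test (an explicit marker, or the inner-loop criterion of claim 3), and the `copy_in` / `copy_out` pseudo-instructions are all hypothetical stand-ins. The copy instructions are wrapped around a critical region only when the cache coherence information says the target has no coherence circuitry.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical record for one memory analysis region."""
    name: str
    instructions: list
    memory_objects: list = field(default_factory=list)
    has_inner_loop: bool = False
    marked_critical: bool = False


def is_critical(region):
    # Assumed criteria: an explicit marker in the region, or an inner loop.
    return region.marked_critical or region.has_inner_loop


def instrument(regions, hardware_coherent):
    """Emit the instruction stream, wrapping critical regions when needed.

    `hardware_coherent` stands in for the received cache coherence
    information: True means the system has coherence circuitry.
    """
    program = []
    for region in regions:
        wrap = is_critical(region) and not hardware_coherent
        if wrap:
            # Initialization: copy objects into private memory.
            program += [f"copy_in {obj}" for obj in region.memory_objects]
        program += region.instructions
        if wrap:
            # Finalization: copy objects back out of private memory.
            program += [f"copy_out {obj}" for obj in region.memory_objects]
    return program


regions = [
    Region("setup", ["r0 = 1"]),
    Region("kernel", ["load A", "store A"], ["A"], has_inner_loop=True),
]
without_coherence = instrument(regions, hardware_coherent=False)
with_coherence = instrument(regions, hardware_coherent=True)
```

With these inputs, only the `kernel` region is wrapped, and only in the non-coherent case; on a coherent target the instruction stream is passed through unchanged.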