Mechanism for increasing parallelization in computer programs with read-after-write dependencies associated with prefix operations
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-009/46
G06F-009/45
출원번호
US-0493538
(2009-06-29)
등록번호
US-8949852
(2015-02-03)
발명자
/ 주소
Cypher, Robert E.
출원인 / 주소
Oracle America, Inc.
대리인 / 주소
Park, Vaughan, Fleming & Dowler LLP
인용정보
피인용 횟수 :
0인용 특허 :
6
초록▼
Some embodiments provide a system that increases parallelization in a computer program. During operation, the system obtains a binary associative operator and a ordered set of elements associated with a prefix operation in the computer program. Next, the system divides the elements into multiple set
Some embodiments provide a system that increases parallelization in a computer program. During operation, the system obtains a binary associative operator and a ordered set of elements associated with a prefix operation in the computer program. Next, the system divides the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program. The system then performs, in parallel on the processors, a set of local reductions on the contiguous iterations using the binary associative operator. Afterwards, the system calculates a set of boundary prefixes between the contiguous iterations using the local reductions. Finally, the system applies, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator to obtain a set of prefixes for the prefix operation.
대표청구항▼
1. A computer-implemented method for increasing parallelization in a computer program, comprising: obtaining a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program;dividing the elements into multiple sets of contiguous iterations based
1. A computer-implemented method for increasing parallelization in a computer program, comprising: obtaining a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program;dividing the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program;calculating, in parallel on the processors, a set of local reductions from the contiguous iterations, wherein each local reduction in the set is calculated by applying the binary associative operator between all elements in a corresponding contiguous iteration from the set of the contiguous iterations;for each given local reduction in a subset of the local reductions, calculating a first boundary prefix for the given local reduction by using the given local reduction and a second boundary prefix for a second local reduction in a second subset that precedes the given local reduction, wherein the second boundary prefix is calculated from the second local reduction and a third boundary prefix for a third local reduction in a third subset that precedes the second subset; andobtaining a set of prefixes for the ordered set of elements by applying, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator. 2. The computer-implemented method of claim 1, wherein the prefix operation is performed within a loop in the computer program. 3. The computer-implemented method of claim 1, wherein dividing the elements into multiple sets of contiguous iterations involves at least one of: dividing the elements substantially equally between the processors; anddividing the elements between the processors using a load-balancing technique. 4. The computer-implemented method of claim 1, wherein the boundary prefixes are calculated in parallel or sequentially. 5. The computer-implemented method of claim 1, wherein each of the elements corresponds to a tuple. 6. The computer-implemented method of claim 1, wherein the parallelization is provided by at least one of a compiler and a virtual machine. 7. The computer-implemented method of claim 1, wherein the binary associative operator corresponds to addition, multiplication, maximum, minimum, a binary logical operator, a carry generate, a carry propagate, matrix multiplication, and finite state machine evaluation. 8. The computer-implemented method of claim 1, wherein the contiguous iterations are arranged in a sequence so that, for each contiguous iteration, an order of the contiguous iteration in the sequence is based on an order, in the ordered set of elements, of the elements that the contiguous iteration includes, wherein the local reductions are arranged in a sequence based on the sequence for the contiguous iterations so that, for each of the local reductions, an order for the local reduction in the sequence of the local reductions corresponds to an order of the contiguous iteration from which the local reduction is calculated,wherein the subset of the local reductions excludes a first local reduction in the sequence of the local reductions, andwherein using the local reduction in the subset that precedes the given local reduction comprises using a local reduction in the subset that immediately precedes the given local reduction in the sequence of the local reductions. 9. The computer-implemented method of claim 1, wherein using the given local reduction and the boundary prefix for the local reduction that precedes the given local reduction comprises applying the binary associative operator between the given local reduction and the boundary prefix for the local reduction that precedes the given local reduction, and wherein the boundary prefix for the given local reduction corresponds to a prefix in the set of prefixes that is between the contiguous iteration for the given local reduction and the contiguous iteration for the local reduction that precedes the given local reduction. 10. The computer-implemented method of claim 1, wherein the set of contiguous iterations comprise a first, a second, and a third contiguous iteration, wherein the set of local reductions comprises a first, a second, and a third local reduction, wherein the first local reduction is calculated by applying the binary associative operator between all of the elements that are in the first contiguous iteration, wherein the second local reduction is calculated by applying the binary associative operator between all of the elements that are in the second contiguous iteration, and wherein the third local reduction is calculated by applying the binary associative operator between all of the elements that are in the third contiguous iteration,wherein calculating the boundary prefixes comprises calculating a first and a second boundary prefix, wherein the first boundary prefix is copied from the first local reduction, and wherein the second boundary prefix is calculated by applying the binary associative operator between the first boundary prefix and the second local reduction,wherein the set of prefixes comprises a first, a second, and a third subset of prefixes, wherein the first subset of prefixes is obtained from the local reduction, wherein the second subset of prefixes is obtained by applying the first boundary prefix to the second local reduction, and wherein the third subset of prefixes is obtained by applying the second boundary prefix to the third local reduction. 11. A system for increasing parallelization in a computer program, comprising: a set of processors configured to execute the computer program; anda parallelization apparatus configured to: obtain a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program;divide the elements into multiple sets of contiguous iterations associated with the processors;calculate, in parallel on the processors, a set of local reductions from the contiguous operations, wherein the parallelization apparatus is configured to calculate each local reduction in the set by applying the binary associative operator between all elements in a corresponding contiguous iteration from the set of the contiguous iterations;for each given local reduction in a subset of the local reductions, calculate a first boundary prefix for the given local reduction by using the given local reduction and a second boundary prefix for a second local reduction in a second subset that precedes the given local reduction, wherein the second boundary prefix is calculated from the second local reduction and a third boundary prefix for a third local reduction in a third subset that precedes the second subset; andobtain a set of prefixes for the ordered set of elements by applying, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator. 12. The system of claim 11, wherein the prefix operation is performed within a loop in the computer program. 13. The system of claim 11, wherein dividing the elements into multiple sets of contiguous iterations involves at least one of: dividing the elements substantially equally between the processors; anddividing the elements between the processors using a load-balancing technique. 14. The system of claim 11, wherein the boundary prefixes are calculated in parallel or sequentially. 15. The system of claim 11, wherein each of the elements corresponds to a tuple. 16. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for increasing parallelization in a computer program, the method comprising: obtaining a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program;dividing the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program;calculating, in parallel on the processors, a set of local reductions from the contiguous iterations, wherein each local reduction in the set is calculated by applying the binary associative operator between all elements in a corresponding contiguous iteration from the set of the contiguous iterations;for each given local reduction in a subset of the local reductions, calculating a first boundary prefix for the given local reduction by using the given local reduction and a second boundary prefix for a second local reduction in a second subset that precedes the given local reduction, wherein the second boundary prefix is calculated from the second local reduction and a third boundary prefix for a third local reduction in a third subset that precedes the second subset; andobtaining a set of prefixes for the ordered set of elements by applying, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator. 17. The computer-readable storage medium of claim 16, wherein the prefix operation is performed within a loop in the computer program. 18. The computer-readable storage medium of claim 16, wherein dividing the elements into multiple sets of contiguous iterations involves at least one of: dividing the elements substantially equally between the processors; anddividing the elements between the processors using a load-balancing technique. 19. The computer-readable storage medium of claim 16, wherein the boundary prefixes are calculated in parallel or sequentially. 20. The computer-readable storage medium of claim 16, wherein the parallelization is provided by at least one of a compiler and a virtual machine. 21. The computer-readable storage medium of claim 16, wherein the binary associative operator corresponds to addition, multiplication, maximum, minimum, a binary logical operator, a carry generate, a carry propagate, matrix multiplication, and finite state machine evaluation.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (6)
Hardwick Jonathan C.,GBX, Dynamic load balancing among processors in a parallel computer.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.