[Patent] Mechanism for increasing parallelization in computer programs with read-after-write dependencies associated with prefix operations

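IPC Classification Information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition)	G06F-009/46 G06F-009/45
Application number	US-0493538 (2009-06-29)
Registration number	US-8949852 (2015-02-03)
Inventor / Address	Cypher, Robert E.
Applicant / Address	Oracle America, Inc.
Agent / Address	Park, Vaughan, Fleming & Dowler LLP
Citation information	Times cited: 0; Patents cited: 6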

Abstract

Some embodiments provide a system that increases parallelization in a computer program. During operation, the system obtains a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program. Next, the system divides the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program. The system then performs, in parallel on the processors, a set of local reductions on the contiguous iterations using the binary associative operator. Afterwards, the system calculates a set of boundary prefixes between the contiguous iterations using the local reductions. Finally, the system applies, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator to obtain a set of prefixes for the prefix operation.
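The three phases described in the abstract (local reductions, boundary prefixes, application of the boundary prefixes) can be sketched as follows. This is a minimal illustration and not the patented implementation: the names `split_contiguous`, `parallel_prefix`, and `workers` are hypothetical, a thread pool stands in for the processors, the split is a simple near-equal division, and the boundary prefixes are computed sequentially. In CPython, threads are unlikely to give a real speedup for pure-Python operators, so the sketch shows the structure of the computation only.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import accumulate
import operator


def split_contiguous(items, parts):
    """Divide items into at most `parts` contiguous, non-empty, near-equal blocks."""
    n = len(items)
    parts = max(1, min(parts, n))
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


def parallel_prefix(items, op, workers=4):
    """Inclusive prefix scan of `items` under an associative binary `op`."""
    items = list(items)
    if not items:
        return []
    blocks = split_contiguous(items, workers)

    def scan_block(args):
        idx, block = args
        if idx == 0:
            # The first block has no preceding boundary prefix.
            return list(accumulate(block, op))
        # Seed the scan with the boundary prefix of the preceding block,
        # then drop the seed itself from the output.
        return list(accumulate(block, op, initial=boundaries[idx - 1]))[1:]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: one local reduction per block, computed in parallel.
        local = list(pool.map(lambda b: reduce(op, b), blocks))
        # Phase 2: boundary prefixes across blocks (sequential here).
        boundaries = list(accumulate(local, op))
        # Phase 3: apply the boundary prefixes to the blocks in parallel.
        scanned = list(pool.map(scan_block, enumerate(blocks)))
    return [x for block in scanned for x in block]
```

Under these assumptions, `parallel_prefix([1, 2, 3, 4, 5], operator.add, workers=2)` returns `[1, 3, 6, 10, 15]`. The operator is always applied with the earlier operand on the left, so the sketch relies only on associativity, not commutativity.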

Representative Claims

1. A computer-implemented method for increasing parallelization in a computer program, comprising: obtaining a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program; dividing the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program; calculating, in parallel on the processors, a set of local reductions from the contiguous iterations, wherein each local reduction in the set is calculated by applying the binary associative operator between all elements in a corresponding contiguous iteration from the set of the contiguous iterations; for each given local reduction in a subset of the local reductions, calculating a first boundary prefix for the given local reduction by using the given local reduction and a second boundary prefix for a second local reduction in a second subset that precedes the given local reduction, wherein the second boundary prefix is calculated from the second local reduction and a third boundary prefix for a third local reduction in a third subset that precedes the second subset; and obtaining a set of prefixes for the ordered set of elements by applying, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator.

2. The computer-implemented method of claim 1, wherein the prefix operation is performed within a loop in the computer program.

3. The computer-implemented method of claim 1, wherein dividing the elements into multiple sets of contiguous iterations involves at least one of: dividing the elements substantially equally between the processors; and dividing the elements between the processors using a load-balancing technique.

4. The computer-implemented method of claim 1, wherein the boundary prefixes are calculated in parallel or sequentially.

5. The computer-implemented method of claim 1, wherein each of the elements corresponds to a tuple.

6. The computer-implemented method of claim 1, wherein the parallelization is provided by at least one of a compiler and a virtual machine.

7. The computer-implemented method of claim 1, wherein the binary associative operator corresponds to addition, multiplication, maximum, minimum, a binary logical operator, a carry generate, a carry propagate, matrix multiplication, and finite state machine evaluation.

8. The computer-implemented method of claim 1, wherein the contiguous iterations are arranged in a sequence so that, for each contiguous iteration, an order of the contiguous iteration in the sequence is based on an order, in the ordered set of elements, of the elements that the contiguous iteration includes, wherein the local reductions are arranged in a sequence based on the sequence for the contiguous iterations so that, for each of the local reductions, an order for the local reduction in the sequence of the local reductions corresponds to an order of the contiguous iteration from which the local reduction is calculated, wherein the subset of the local reductions excludes a first local reduction in the sequence of the local reductions, and wherein using the local reduction in the subset that precedes the given local reduction comprises using a local reduction in the subset that immediately precedes the given local reduction in the sequence of the local reductions.

9. The computer-implemented method of claim 1, wherein using the given local reduction and the boundary prefix for the local reduction that precedes the given local reduction comprises applying the binary associative operator between the given local reduction and the boundary prefix for the local reduction that precedes the given local reduction, and wherein the boundary prefix for the given local reduction corresponds to a prefix in the set of prefixes that is between the contiguous iteration for the given local reduction and the contiguous iteration for the local reduction that precedes the given local reduction.

10. The computer-implemented method of claim 1, wherein the set of contiguous iterations comprise a first, a second, and a third contiguous iteration, wherein the set of local reductions comprises a first, a second, and a third local reduction, wherein the first local reduction is calculated by applying the binary associative operator between all of the elements that are in the first contiguous iteration, wherein the second local reduction is calculated by applying the binary associative operator between all of the elements that are in the second contiguous iteration, and wherein the third local reduction is calculated by applying the binary associative operator between all of the elements that are in the third contiguous iteration, wherein calculating the boundary prefixes comprises calculating a first and a second boundary prefix, wherein the first boundary prefix is copied from the first local reduction, and wherein the second boundary prefix is calculated by applying the binary associative operator between the first boundary prefix and the second local reduction, wherein the set of prefixes comprises a first, a second, and a third subset of prefixes, wherein the first subset of prefixes is obtained from the local reduction, wherein the second subset of prefixes is obtained by applying the first boundary prefix to the second local reduction, and wherein the third subset of prefixes is obtained by applying the second boundary prefix to the third local reduction.

11. A system for increasing parallelization in a computer program, comprising: a set of processors configured to execute the computer program; and a parallelization apparatus configured to: obtain a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program; divide the elements into multiple sets of contiguous iterations associated with the processors; calculate, in parallel on the processors, a set of local reductions from the contiguous operations, wherein the parallelization apparatus is configured to calculate each local reduction in the set by applying the binary associative operator between all elements in a corresponding contiguous iteration from the set of the contiguous iterations; for each given local reduction in a subset of the local reductions, calculate a first boundary prefix for the given local reduction by using the given local reduction and a second boundary prefix for a second local reduction in a second subset that precedes the given local reduction, wherein the second boundary prefix is calculated from the second local reduction and a third boundary prefix for a third local reduction in a third subset that precedes the second subset; and obtain a set of prefixes for the ordered set of elements by applying, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator.

12. The system of claim 11, wherein the prefix operation is performed within a loop in the computer program.

13. The system of claim 11, wherein dividing the elements into multiple sets of contiguous iterations involves at least one of: dividing the elements substantially equally between the processors; and dividing the elements between the processors using a load-balancing technique.

14. The system of claim 11, wherein the boundary prefixes are calculated in parallel or sequentially.

15. The system of claim 11, wherein each of the elements corresponds to a tuple.

16. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for increasing parallelization in a computer program, the method comprising: obtaining a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program; dividing the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program; calculating, in parallel on the processors, a set of local reductions from the contiguous iterations, wherein each local reduction in the set is calculated by applying the binary associative operator between all elements in a corresponding contiguous iteration from the set of the contiguous iterations; for each given local reduction in a subset of the local reductions, calculating a first boundary prefix for the given local reduction by using the given local reduction and a second boundary prefix for a second local reduction in a second subset that precedes the given local reduction, wherein the second boundary prefix is calculated from the second local reduction and a third boundary prefix for a third local reduction in a third subset that precedes the second subset; and obtaining a set of prefixes for the ordered set of elements by applying, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator.

17. The computer-readable storage medium of claim 16, wherein the prefix operation is performed within a loop in the computer program.

18. The computer-readable storage medium of claim 16, wherein dividing the elements into multiple sets of contiguous iterations involves at least one of: dividing the elements substantially equally between the processors; and dividing the elements between the processors using a load-balancing technique.

19. The computer-readable storage medium of claim 16, wherein the boundary prefixes are calculated in parallel or sequentially.

20. The computer-readable storage medium of claim 16, wherein the parallelization is provided by at least one of a compiler and a virtual machine.

21. The computer-readable storage medium of claim 16, wherein the binary associative operator corresponds to addition, multiplication, maximum, minimum, a binary logical operator, a carry generate, a carry propagate, matrix multiplication, and finite state machine evaluation.
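The claims allow the elements to be tuples and list carry generate and carry propagate among the possible operators. As a hedged illustration of what such an operator might look like, the sketch below adds two integers by scanning per-bit (generate, propagate) pairs, the standard carry-lookahead formulation. The names `carry_op` and `add_via_prefix` and the `width` parameter are hypothetical and do not come from the patent. The scan is done sequentially with `itertools.accumulate` as a stand-in; because `carry_op` is associative, the same prefixes could in principle be produced by the block-wise scheme described in the claims.

```python
from itertools import accumulate


def carry_op(lo, hi):
    """Combine two (generate, propagate) pairs; `lo` covers the lower-order bits."""
    g_lo, p_lo = lo
    g_hi, p_hi = hi
    # The combined span generates a carry if the high part does, or if the
    # high part propagates a carry generated by the low part.
    return (g_hi | (p_hi & g_lo), p_lo & p_hi)


def add_via_prefix(a, b, width=8):
    """Add two non-negative integers below 2**width via a scan over (g, p) tuples."""
    bits_a = [(a >> i) & 1 for i in range(width)]
    bits_b = [(b >> i) & 1 for i in range(width)]
    # Per-bit tuples: generate = a AND b, propagate = a XOR b.
    gp = [(x & y, x ^ y) for x, y in zip(bits_a, bits_b)]
    # Prefix i holds the (generate, propagate) pair for bits 0..i, so its
    # generate component is the carry out of bit i.
    prefixes = list(accumulate(gp, carry_op))
    carries_in = [0] + [g for g, _ in prefixes[:-1]]
    total = sum((p ^ c) << i for i, ((_, p), c) in enumerate(zip(gp, carries_in)))
    # The carry out of the top bit becomes bit `width` of the result.
    return total + (prefixes[-1][0] << width)
```

With these definitions, `add_via_prefix(5, 3, width=4)` returns `8`. Since each pair takes only four possible values, the associativity of `carry_op` can be checked exhaustively.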

Patents Cited by This Patent (6)

Hardwick, Jonathan C. (GBX), Dynamic load balancing among processors in a parallel computer.
Archambault, Roch G.; Gao, Yaoqing; Ren, Zhixing; Silvera, Raul E., Framework for parallelizing general reduction.
DeHon, Andre M.; Kapre, Nachiket, Method and a circuit using an associative calculator for calculating a sequence of non-associative operations.
Sundaresan, Neelakantan, Method of, system for, and article of manufacture for providing a generic reduction object for data parallelism.
Wilkinson, Paul Amba; Dieffenderfer, James Warren; Kogge, Peter Michael; Schoonover, Nicholas Jerome, SIMD/MIMD array processor with vector processing.
Le Grand, Scott M., Work-efficient parallel prefix sum algorithm for graphics processing units.
