[Patent] System and method for the distribution of a program among cooperating processing elements

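IPC classification information
Country / Type	United States (US) Patent, Registered
International Patent Classification (IPC 7th edition)	G06F-009/45
Application number	US-0814882 (2010-06-14)
Registration number	US-8387034 (2013-02-26)
Inventor / Address	Gordy, Robert Stephen; Spitzer, Terry
Applicant / Address	Management Services Group, Inc.
Citation information	Times cited: 5; Patents cited: 9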

Abstract

A Veil program analyzes the source code and/or data of an existing sequential target program without user interaction and determines how best to distribute the target program and data among the processing elements of a multi-processing element computing system. The Veil program analyzes source code loops, data sizes and types to prepare a set of distribution attempts or strategies, whereby each strategy is run under a run-time evaluation system and evaluated to determine the optimal decomposition and distribution across the available processing elements.
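The trial-and-compare loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the Veil program itself: the names `run_sequential`, `run_threaded`, and `evaluate_strategies` are hypothetical, a thread pool stands in for the processing elements, and wall-clock time is assumed to be the only recorded metric.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Work = Callable[[int], int]
Strategy = Callable[[Work, Sequence[int]], List[int]]


def run_sequential(work: Work, data: Sequence[int]) -> List[int]:
    """Baseline strategy: run every item on a single processing element."""
    return [work(x) for x in data]


def run_threaded(work: Work, data: Sequence[int], workers: int = 4) -> List[int]:
    """Candidate strategy: spread items over a pool of workers (assumed stand-in)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, data))


def evaluate_strategies(
    strategies: Dict[str, Strategy], work: Work, data: Sequence[int]
) -> Tuple[str, Dict[str, float]]:
    """Run each strategy once, record its elapsed time, and pick the fastest."""
    timings: Dict[str, float] = {}
    for name, strategy in strategies.items():
        start = time.perf_counter()
        strategy(work, data)
        timings[name] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    candidates = {"sequential": run_sequential, "threaded": run_threaded}
    best, timings = evaluate_strategies(candidates, lambda x: x * x, range(1000))
    print(best, timings)
```

Which strategy wins depends on the workload and the machine, so the sketch reports the measured timings alongside the selected name rather than assuming a result.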

Representative claims

1. A system that automatically decomposes an existing sequential program into one or more distributed programs that execute on multiple-processing elements of a computing system, the decomposition being performed without the knowledge of a user, the system comprising:
a computer system comprising at least one processor and a plurality of processing elements;
a target program in object code form, the target program written to run on a single processing element computer system;
a plurality of strategies, the strategies including one or more of parallelization strategies and distribution strategies;
a program having coded therein a means for:
(a) encapsulating the target program with a run-time analyzer;
(b) selecting one of the strategies from the plurality of strategies;
(c) parallelization and distributing the target program according to the selected one strategy;
(d) executing the target program by one or more of the processing elements;
(e) recording execution results of the target program; and
(f) if there are further strategies in the plurality of strategies, changing the selected one strategy to a next strategy from the plurality of strategies and repeat steps a-f;
(g) comparing the execution results and selecting a best parallelization strategy having a best performance-oriented result;
(h) parallelizing the target application using the best parallelization strategy with the best performance-oriented result;
(i) analyzing a target data on which the target program is designed to operate;
(j) if the target data is a contiguous dataset then:
  determining a number of available processing elements;
  copying at least part of the target program onto the available processing elements;
  dividing the target data into a number of data segments, the data segments being of a fixed-size or a variable size;
  executing the at least part of the target program on the available processing elements, each operating upon one of the data segments;
  combining the results from the available processing elements;
(k) if the target data is burst data then:
  measuring an interval between bursts;
  measuring a computation time of the target program;
  if the interval between bursts is less than the computation time then determining the number of the available processing elements and distributing at least part of the target program over the available processing elements;
  if the interval between bursts is not less than the computation time then executing the program on one of the available processing elements;
(l) if the target data is a continuous data stream then:
  determining the number of the available processing elements;
  duplicating at least part of the target program onto the available processing elements; and
  demultiplex the target data into the number of sub-streams, where the number of sub-streams matches the number of the available processing elements.

2. The system of claim 1, wherein the plurality of strategies are selected from the group consisting of detecting program iteration, detecting recursion, detecting a locality of memory reference, detecting regularity of memory references, and processor heterogeneity.

3. The system of claim 1, wherein the run-time analyzer measures/reads and records statistics related to the group consisting of virtual memory paging load, swapped memory load, network load, input/output device load, run queue length for each processing element, runtime profiles of other processes and types of input/output operations requested by the target program.

4. The system of claim 1, wherein the run-time analyzer reads and records operating system data selected from the group consisting of amount of free storage space, storage read/write speed, file protections, file creation limits, and files accessed by the target program.

5. The system of claim 1, wherein the run-time analyzer is records input/output read/write patterns requested by the target program.

6. A computer implemented method for decomposing and distributing a target program onto a plurality of processing elements without user intervention, the target program being sequential, the method comprising:
(a) encapsulating the target program with a run-time analyzer;
(b) selecting one strategy from a plurality of strategies, the strategies being performance oriented parallelization/distribution strategies;
(c) decomposing and distributing the target program according to the one strategy;
(d) executing the target program with the run-time analyzer on at least two of the plurality of processing elements;
(e) recording execution results of the target program;
(f) if there are further strategies in the plurality of strategies, changing the one strategy to a next strategy from the plurality of strategies and repeating steps a-f;
(i) analyzing a target data on which the target program is designed to operate;
(j) if the target data is a contiguous dataset then:
  determining a number of available processing elements;
  copying at least part of the target program onto the available processing elements;
  dividing the target data into a number of data segments;
  running the at least part of the target program on the available processing elements, each of the available processing elements operating upon one of the data segments;
  combining the results from the available processing elements;
(k) if the target data is burst data then:
  measuring an interval between bursts;
  measuring a computational time of the target program;
  if the interval between bursts is less than the computational time then determining the number of the available processing elements and distributing the at least part of the target program over the number of available processing elements;
  if the interval between bursts is not less than the computational time then executing the program on one available processing elements;
(l) if the target data is a continuous data stream then:
  determining the number of available processing elements;
  duplicating the at least part of the target program onto the available processing elements; and
  demultiplexing the target data into a number of sub-streams, where the number of sub-streams matches the number of available processing elements.

7. The computer implemented method of claim 6, wherein the plurality of strategies are selected from the group consisting of detecting program iteration, detecting recursion, detecting a locality of memory reference, detecting regularity of memory references, and processor heterogeneity.

8. The computer implemented method of claim 7, wherein the run-time analyzer implements the steps of reading and recording operating system data selected from the group consisting of amount of free storage space, storage read/write speed, file protections, file creation limits, and files accessed by the target program.

9. The computer implemented method of claim 6, wherein the run-time analyzer includes the steps of measuring/reading and recording statistics related to the group consisting of virtual memory paging load, swapped memory load, network load, input/output device load, run queue length for each processing element, runtime profiles of other processes and types of input/output operations requested by the target program.

10. The system of claim 6, further comprising: if there still remains unassigned processing elements, then allocating the unassigned processing elements to assist the assigned processing elements as a sub-group.

11. The system of claim 6, wherein the plurality of strategies includes results-oriented strategies that achieve improved performance using fewer than all of the processing elements.

12. A system for optimizing the operation of a program in a multiple-processing element computing system, the system comprising:
(a) a computer system comprising at least one processor and a plurality of processing elements;
(b) a sequential target program written to run on the computer system;
(c) a means for encapsulating the target program with a means for analyzing run-time performance of the target program;
(d) a means for selecting one strategy from a plurality of strategies;
(e) a means for decomposing and parallelizing the target program according to the one strategy;
(f) a means for executing the target program with the means for analyzing on one or more of the processing elements;
(g) a means for recording execution results of the target program; and
(h) if there are further strategies in the plurality of strategies, a means for changing the one strategy to a next strategy from the plurality of strategies and repeat steps c-h;
(i) a means for comparing the execution results and for selecting a best strategy having a best execution time;
(j) a means for decomposing and parallelizing the target application using the best strategy;
(k) a means for analyzing a target data on which the target program is designed to operate;
(l) if the target data is a contiguous dataset then:
  a means for determining a number of available processing elements;
  a means for copying at least part of the target program onto the available processing elements;
  a means for dividing the target data into a number of data segments;
  a means for running the target program on the available processing elements, each processing elements operating upon one of the data segments;
  a means for combining the results from the available processing elements;
(m) if the target data is burst data then:
  a means for measuring an interval between bursts;
  a means for measuring a computational time of the target program;
  if the interval between bursts is less than the computational time then a means for determining the number of the available processing elements and a means for distributing the target program over the available processing elements;
  if the interval between bursts is not less than the computational time then a means for executing the program on one of the processing elements;
(n) if the target data is a continuous data stream then:
  a means for determining the number of available processing elements;
  a means for duplicating the at least part of the target program onto the available processing elements; and
  a means for demultiplexing the target data into a number of sub-streams, where the number of sub-streams matches the number of available processing elements.

13. The system of claim 12, wherein the plurality of strategies are selected from the group consisting of detecting program iteration, detecting recursion, detecting a locality of memory reference, detecting regularity of memory references, and processor heterogeneity.

14. The system of claim 13, wherein the run-time analyzer includes the steps of measuring/reading and recording statistics related to the group consisting of virtual memory paging load, swapped memory load, network load, input/output device load, run queue length for each processing element, runtime profiles of other processes and types of input/output operations requested by the target program.

15. The system of claim 12, wherein the run-time analyzer implements the steps of reading and recording operating system data selected from the group consisting of amount of free storage space, storage read/write speed, file protections, file creation limits, and files accessed by the target program.

16. The system of claim 12, further comprising: if there still remains unassigned processing element, then allocating the unassigned processing elements to assist the assigned processing elements as a sub-group.

17. The system of claim 12, wherein the plurality of strategies includes results-oriented strategies that achieve improved performance using fewer than all of the processing elements.
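The three data-dependent branches recited in the independent claims (contiguous dataset, burst data, continuous data stream) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the contiguous split uses near-equal segments (the claims allow fixed or variable sizes), and the demultiplexer uses round-robin assignment, which the claims do not specify.

```python
from typing import Iterable, List, Sequence


def split_contiguous(data: Sequence[int], n_elements: int) -> List[List[int]]:
    """Contiguous dataset: divide the data into one segment per processing element."""
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    size, extra = divmod(len(data), n_elements)
    segments, start = [], 0
    for i in range(n_elements):
        # The first `extra` segments take one additional item each.
        end = start + size + (1 if i < extra else 0)
        segments.append(list(data[start:end]))
        start = end
    return segments


def elements_for_burst(burst_interval: float, compute_time: float, n_elements: int) -> int:
    """Burst data: distribute only if bursts arrive faster than one element can process them."""
    return n_elements if burst_interval < compute_time else 1


def demultiplex(stream: Iterable[int], n_elements: int) -> List[List[int]]:
    """Continuous stream: split into as many sub-streams as there are processing elements.

    Round-robin assignment is an assumption of this sketch.
    """
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    substreams: List[List[int]] = [[] for _ in range(n_elements)]
    for i, item in enumerate(stream):
        substreams[i % n_elements].append(item)
    return substreams


if __name__ == "__main__":
    print(split_contiguous(list(range(10)), 3))
    print(elements_for_burst(0.5, 2.0, 8))
    print(demultiplex(range(7), 3))
```

In the burst case the comparison is strict, matching the claim wording: when the interval between bursts is not less than the computation time, a single processing element is used.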

Patents cited by this patent (9)

Suraj C. Kothari ; Mitra Simanta ; Youngtae Kim, Apparatus and method for parallelizing legacy computer code.
Anthony Passera ; John R. Thorp ; Michael J. Beckerle ; Edward S. Zyszkowski, Computer system and computerized method for partitioning data for parallel processing.
Campbell Michael J. (Los Angeles CA) Finn Dennis J. (Los Angeles CA) Tucker George K. (Los Angeles CA) Vahey Michael D. (Manhattan Beach CA) Vedder Rex W. (Playa del Rey CA), Data-flow multiprocessor architecture with three dimensional multistage interconnection network for efficient signal and.
Hardwick Jonathan C.,GBX, Dynamic load balancing among processors in a parallel computer.
Breslau Franklin Charles ; Greenstein Paul Gregory ; Rodell John Ted, Method and system for compiling sections of a computer program for multiple execution environments.
Iwasawa Kyoko,JPX ; Kurosawa Takashi,JPX ; Kikuchi Sumio,JPX, Method for supporting parallelization of source program.
Seki Mitsuho (Ohaza JPX) Ikeda Mitsuji (Katsuta JPX) Kiyoshige Yoshikazu (Hitachi JPX), Method of processing a program by parallel processing, and a processing unit thereof.
Blelloch Guy E. ; Gibbons Phillip B. ; Matias Yossi, Methods and means for scheduling parallel processors.
Rechtschaffen Rudolph N. (Scarsdale NY) Ekanadham Kattamuri (Yorktown Heights NY), Self-parallelizing computer system and method.

Patents citing this patent (5)

Lu, Jiwei Oliver; Yamada, Koichi; Beany, James D.; Shanmugavelayutham, Palaniverlrajan; Zhang, Bo, Method and apparatus for page-level monitoring.
Bokka, Ramakrishna V., System and method for receiving services provided by distributed systems.
Sager, David J.; Sasanka, Ruchira; Gabor, Ron; Raikin, Shlomo; Nuzman, Joseph; Peled, Leeor; Domer, Jason A.; Kim, Ho-Seop; Wu, Youfeng; Yamada, Koichi; Ngai, Tin-Fook; Chen, Howard H.; Bobba, Jayaram; Cook, Jeffery J.; Shaikh, Omar M.; Srinivas, Suresh, Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads.
Sasanka, Ruchira; Das, Abhinav; Cook, Jeffrey J.; Bobba, Jayaram; Krishnaswamy, Arvind; Sager, David J.; Srinivas, Suresh, Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads.
Bobba, Jayaram; Sasanka, Ruchira; Cook, Jeffrey J.; Das, Abhinav; Krishnaswamy, Arvind; Sager, David J.; Agron, Jason M., Using control flow data structures to direct and track instruction execution.
