IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
US-0814882
(2010-06-14)
|
등록번호 |
US-8387034
(2013-02-26)
|
발명자
/ 주소 |
- Gordy, Robert Stephen
- Spitzer, Terry
|
출원인 / 주소 |
- Management Services Group, Inc.
|
인용정보 |
피인용 횟수 :
5 인용 특허 :
9 |
초록
▼
A Veil program analyzes the source code and/or data of an existing sequential target program without user interaction and determines how best to distribute the target program and data among the processing elements of a multi-processing element computing system. The Veil program analyzes source code
A Veil program analyzes the source code and/or data of an existing sequential target program without user interaction and determines how best to distribute the target program and data among the processing elements of a multi-processing element computing system. The Veil program analyzes source code loops, data sizes and types to prepare a set of distribution attempts or strategies, whereby each strategy is run under a run-time evaluation system and evaluated to determine the optimal decomposition and distribution across the available processing elements.
대표청구항
▼
1. A system that automatically decomposes an existing sequential program into one or more distributed programs that execute on multiple-processing elements of a computing system, the decomposition being performed without the knowledge of a user, the system comprising: a computer system comprising at
1. A system that automatically decomposes an existing sequential program into one or more distributed programs that execute on multiple-processing elements of a computing system, the decomposition being performed without the knowledge of a user, the system comprising: a computer system comprising at least one processor and a plurality of processing elements;a target program in object code form, the target program written to run on a single processing element computer system;a plurality of strategies, the strategies including one or more of parallelization strategies and distribution strategies;a program having coded therein a means for: (a) encapsulating the target program with a run-time analyzer;(b) selecting one of the strategies from the plurality of strategies;(c) parallelization and distributing the target program according to the selected one strategy;(d) executing the target program by one or more of the processing elements;(e) recording execution results of the target program; and(f) if there are further strategies in the plurality of strategies, changing the selected one strategy to a next strategy from the plurality of strategies and repeat steps a-f;(g) comparing the execution results and selecting a best parallelization strategy having a best performance-oriented result;(h) parallelizing the target application using the best parallelization strategy with the best performance-oriented result;(i) analyzing a target data on which the target program is designed to operate;(j) if the target data is a contiguous dataset then: determining a number of available processing elements;copying at least part of the target program onto the available processing elements;dividing the target data into a number of data segments, the data segments being of a fixed-size or a variable size;executing the at least part of the target program on the available processing elements, each operating upon one of the data segments;combining the results from the available processing elements;(k) if the target data is burst data then: measuring an interval between bursts;measuring a computation time of the target program;if the interval between bursts is less than the computation time then determining the number of the available processing elements and distributing at least part of the target program over the available processing elements;if the interval between bursts is not less than the computation time then executing the program on one of the available processing elements;(l) if the target data is a continuous data stream then: determining the number of the available processing elements;duplicating at least part of the target program onto the available processing elements; anddemultiplex the target data into the number of sub-streams, where the number of sub-streams matches the number of the available processing elements. 2. The system of claim 1, wherein the plurality of strategies are selected from the group consisting of detecting program iteration, detecting recursion, detecting a locality of memory reference, detecting regularity of memory references, and processor heterogeneity. 3. The system of claim 1, wherein the run-time analyzer measures/reads and records statistics related to the group consisting of virtual memory paging load, swapped memory load, network load, input/output device load, run queue length for each processing element, runtime profiles of other processes and types of input/output operations requested by the target program. 4. The system of claim 1, wherein the run-time analyzer reads and records operating system data selected from the group consisting of amount of free storage space, storage read/write speed, file protections, file creation limits, and files accessed by the target program. 5. The system of claim 1, wherein the run-time analyzer is records input/output read/write patterns requested by the target program. 6. A computer implemented method for decomposing and distributing a target program onto a plurality of processing elements without user intervention, the target program being sequential, the method comprising: (a) encapsulating the target program with a run-time analyzer;(b) selecting one strategy from a plurality of strategies, the strategies being performance oriented parallelization/distribution strategies;(c) decomposing and distributing the target program according to the one strategy;(d) executing the target program with the run-time analyzer on at least two of the plurality of processing elements;(e) recording execution results of the target program;(f) if there are further strategies in the plurality of strategies, changing the one strategy to a next strategy from the plurality of strategies and repeating steps a-f;(i) analyzing a target data on which the target program is designed to operate;(j) if the target data is a contiguous dataset then: determining a number of available processing elements;copying at least part of the target program onto the available processing elements;dividing the target data into a number of data segments;running the at least part of the target program on the available processing elements, each of the available processing elements operating upon one of the data segments;combining the results from the available processing elements;(k) if the target data is burst data then: measuring an interval between bursts;measuring a computational time of the target program;if the interval between bursts is less than the computational time then determining the number of the available processing elements and distributing the at least part of the target program over the number of available processing elements;if the interval between bursts is not less than the computational time then executing the program on one available processing elements;(l) if the target data is a continuous data stream then: determining the number of available processing elements;duplicating the at least part of the target program onto the available processing elements; anddemultiplexing the target data into a number of sub-streams, where the number of sub-streams matches the number of available processing elements. 7. The computer implemented method of claim 6, wherein the plurality of strategies are selected from the group consisting of detecting program iteration, detecting recursion, detecting a locality of memory reference, detecting regularity of memory references, and processor heterogeneity. 8. The computer implemented method of claim 7, wherein the run-time analyzer implements the steps of reading and recording operating system data selected from the group consisting of amount of free storage space, storage read/write speed, file protections, file creation limits, and files accessed by the target program. 9. The computer implemented method of claim 6, wherein the run-time analyzer includes the steps of measuring/reading and recording statistics related to the group consisting of virtual memory paging load, swapped memory load, network load, input/output device load, run queue length for each processing element, runtime profiles of other processes and types of input/output operations requested by the target program. 10. The system of claim 6, further comprising: if there still remains unassigned processing elements, then allocating the unassigned processing elements to assist the assigned processing elements as a sub-group. 11. The system of claim 6, wherein the plurality of strategies includes results-oriented strategies that achieve improved performance using fewer than all of the processing elements. 12. A system for optimizing the operation of a program in a multiple-processing element computing system, the system comprising: (a) a computer system comprising at least one processor and a plurality of processing elements;(b) a sequential target program written to run on the computer system;(c) a means for encapsulating the target program with a means for analyzing run-time performance of the target program;(d) a means for selecting one strategy from a plurality of strategies;(e) a means for decomposing and parallelizing the target program according to the one strategy;(f) a means for executing the target program with the means for analyzing on one or more of the processing elements;(g) a means for recording execution results of the target program; and(h) if there are further strategies in the plurality of strategies, a means for changing the one strategy to a next strategy from the plurality of strategies and repeat steps c-h;(i) a means for comparing the execution results and for selecting a best strategy having a best execution time;(j) a means for decomposing and parallelizing the target application using the best strategy;(k) a means for analyzing a target data on which the target program is designed to operate;(l) if the target data is a contiguous dataset then: a means for determining a number of available processing elements;a means for copying at least part of the target program onto the available processing elements;a means for dividing the target data into a number of data segments;a means for running the target program on the available processing elements, each processing elements operating upon one of the data segments;a means for combining the results from the available processing elements;(m) if the target data is burst data then: a means for measuring an interval between bursts;a means for measuring a computational time of the target program;if the interval between bursts is less than the computational time then a means for determining the number of the available processing elements and a means for distributing the target program over the available processing elements;if the interval between bursts is not less than the computational time then a means for executing the program on one of the processing elements;(n) if the target data is a continuous data stream then: a means for determining the number of available processing elements;a means for duplicating the at least part of the target program onto the available processing elements; anda means for demultiplexing the target data into a number of sub-streams, where the number of sub-streams matches the number of available processing elements. 13. The system of claim 12, wherein the plurality of strategies are selected from the group consisting of detecting program iteration, detecting recursion, detecting a locality of memory reference, detecting regularity of memory references, and processor heterogeneity. 14. The system of claim 13, wherein the run-time analyzer includes the steps of measuring/reading and recording statistics related to the group consisting of virtual memory paging load, swapped memory load, network load, input/output device load, run queue length for each processing element, runtime profiles of other processes and types of input/output operations requested by the target program. 15. The system of claim 12, wherein the run-time analyzer implements the steps of reading and recording operating system data selected from the group consisting of amount of free storage space, storage read/write speed, file protections, file creation limits, and files accessed by the target program. 16. The system of claim 12, further comprising: if there still remains unassigned processing element, then allocating the unassigned processing elements to assist the assigned processing elements as a sub-group. 17. The system of claim 12, wherein the plurality of strategies includes results-oriented strategies that achieve improved performance using fewer than all of the processing elements.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.