Resource optimization for parallel data integration
Country / Type
United States(US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-015/173
G06F-009/46
G06F-009/50
Application number
US-0554867
(2009-09-04)
Patent number
US-8935702
(2015-01-13)
Inventors / Address
Harris, Simon David
Pu, Xiaoyan
Applicant / Address
International Business Machines Corporation
Agent / Address
England, Anthony V. S.
Citation information
Times cited: 5
Patents cited: 23
Abstract
For optimizing resources for a parallel data integration job, a job request is received, which specifies a parallel data integration job to deploy in a grid. Grid resource utilizations are predicted for hypothetical runs of the specified job on respective hypothetical grid resource configurations. This includes automatically predicting grid resource utilizations by a resource optimizer module responsive to a model based on a plurality of actual runs of previous jobs. A grid resource configuration is selected for running the parallel data integration job, which includes the optimizer module automatically selecting a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion.
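The predict-then-select flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the single-feature linear model, the `GridConfig` class, all function names, and the "smallest feasible configuration" criterion are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class GridConfig:
    """Hypothetical grid resource configuration (names are illustrative)."""
    nodes: int
    partitions_per_node: int

    @property
    def partitions(self) -> int:
        return self.nodes * self.partitions_per_node


def fit_cpu_seconds_per_record(history: Iterable[Tuple[float, float]]) -> float:
    """Fit cpu_seconds ~ k * records by least squares through the origin.

    `history` holds (records_processed, cpu_seconds) pairs from actual
    runs of previous jobs. A real model would use many more features.
    """
    pairs = list(history)
    num = sum(records * cpu for records, cpu in pairs)
    den = sum(records * records for records, _ in pairs)
    if den == 0:
        raise ValueError("history must contain at least one non-empty run")
    return num / den


def predict_cpu_per_partition(k: float, records: float, config: GridConfig) -> float:
    """Predicted CPU seconds per partition for a hypothetical run."""
    return k * records / config.partitions


def select_config(
    k: float,
    records: float,
    candidates: List[GridConfig],
    max_cpu_per_partition: float,
) -> Optional[GridConfig]:
    """Assumed criterion: the smallest configuration whose predicted
    per-partition load stays within the given limit."""
    feasible = [
        c for c in candidates
        if predict_cpu_per_partition(k, records, c) <= max_cpu_per_partition
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.partitions, c.nodes))


# Example: three past runs, then a new job with 100,000 records.
past_runs = [(1000, 2.0), (2000, 4.0), (4000, 8.0)]
k = fit_cpu_seconds_per_record(past_runs)
candidates = [GridConfig(1, 2), GridConfig(2, 2), GridConfig(4, 2)]
chosen = select_config(k, 100_000, candidates, max_cpu_per_partition=60.0)
```

Here the model is fitted once from historical runs and then evaluated against each hypothetical configuration without running the job, which is the essential point of the abstract; the selection rule itself is interchangeable.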
Representative claim
1. A program product for optimizing a parallel data integration job, the program product comprising: a nontransitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable code configured to receive a job request specifying a parallel data integration job to deploy in a grid, wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run; computer readable code configured to predict grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on performance data from a plurality of actual runs of previously deployed, parallel data jobs; and computer readable code configured to select a grid resource configuration for running the parallel data integration job, including resource optimizer module computer readable code configured to automatically select a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and computer readable code configured to generate the at least one resource utilization index for the job, comprising: computer readable code configured to generate resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions; computer readable code configured to generate a respective operator index maximum for each respective operator; computer readable code configured to generate, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group; computer readable code configured to select a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups; computer readable code configured to select a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and computer readable code configured to generate the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
2. The program product of claim 1, comprising: computer readable code configured to generate resource utilization categories, including resource optimizer module computer readable code configured to automatically generate resource utilization categories responsive to the predicted grid resource utilizations, wherein automatically selecting the grid resource configuration responsive to the optimization criterion includes selecting the grid resource configuration responsive to the categories.
3. The program product of claim 1, comprising: computer readable code configured to generate a resource utilization index for the job responsive to a sum of the predicted grid resource utilizations for all the operators.
4. The program product of claim 1, comprising: computer readable code configured to generate correlation coefficients for the model responsive to performance data for a plurality of previous jobs actually run on respective configurations of the grid resources.
5. The program product of claim 1, wherein the computer readable code configured to select the grid resource configuration for running the parallel data integration job comprises: computer readable code configured to adjust estimated resource utilizations for operators responsive to a ratio of user-specified job execution time to estimated job execution time; and computer readable code configured to select a number of partitions for each operator responsive to a combined total of adjusted estimated resource utilizations on all the operator's partitions and a minimum of the adjusted estimated resource utilizations among all the operator's partitions.
6. The program product of claim 1, wherein the job request includes a data graph of linked operators specifying a sequence of parallel data integration operations performed when the parallel data integration job is run, such that each operator has one or more respective link mates, and wherein selecting the grid resource configuration for running the parallel data integration job comprises: computer readable code configured to traverse the data graph and increase numbers of partitions for operators having throughputs less than their respective link mates.
7. A computer system comprising: at least one storage system for storing a parallel data integration job resource optimization program; and at least one processor for processing the parallel data integration job resource optimization program, the system being configured with the program and the processor to: receive a job request specifying a parallel data integration job to deploy in a grid, wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run; predict grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on a performance data from plurality of actual runs of previously deployed, parallel data jobs; select a grid resource configuration for running the parallel data integration job, including an optimizer module automatically selecting a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and generate the at least one resource utilization index for the job, comprising: generate resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions; generate a respective operator index maximum for each respective operator; generate, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group; select a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups; select a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and generate the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
8. The computer system of claim 7, the system being configured with the program and the processor to generate resource utilization categories, including the optimizer module automatically generating resource utilization categories responsive to the predicted grid resource utilizations, wherein the optimizer module automatically selecting the grid resource configuration responsive to the optimization criterion includes selecting the grid resource configuration responsive to the categories.
9. The computer system of claim 7, the system being configured with the program and the processor to generate a resource utilization index for the job responsive to a sum of the predicted grid resource utilizations for all the operators.
10. The computer system of claim 7, wherein the system being configured with the program and the processor to select the grid resource configuration for running the parallel data integration job comprise the system being configured to adjust estimated resource utilizations for operators responsive to a ratio of user-specified job execution time to estimated job execution time, and select a number of partitions for each operator responsive to a combined total of adjusted estimated resource utilizations on all the operator's partitions and a minimum of the adjusted estimated resource utilizations among all the operator's partitions.
11. The computer system of claim 7, wherein the job request includes a data graph of linked operators specifying a sequence of parallel data integration operations performed when the parallel data integration job is run, such that each operator has one or more respective link mates, and wherein the system being configured with the program and the processor to select the grid resource configuration for running the parallel data integration job comprise the system being configured to traverse the data graph and increase numbers of partitions for operators having throughputs less than their respective link mates.
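The job-level resource utilization index recited in claims 1 and 7 (per-operator maxima, then per-group maxima, then a ratio of two maxima) can be illustrated with a short Python sketch. Several points are assumptions, since the claims leave them open: groups are keyed here by a kind label (`source`, `sink`, `processing`, `scratch`), a "resource portion" is taken to be a partition, the ratio is taken as first maximum over second maximum, and all names are hypothetical.

```python
from typing import Dict, List

# Assumed grouping labels; the claims speak only of "data source and sink"
# and "processing and scratch" operator groups.
IO_KINDS = {"source", "sink"}
COMPUTE_KINDS = {"processing", "scratch"}

# kind -> {operator name -> [utilization index per resource portion]}
Groups = Dict[str, Dict[str, List[float]]]


def operator_index_max(indices_per_portion: List[float]) -> float:
    """Maximum index of one operator across its resource portions."""
    return max(indices_per_portion)


def group_max(operators: Dict[str, List[float]]) -> float:
    """Maximum of the operator index maxima within one operator group."""
    return max(operator_index_max(v) for v in operators.values())


def job_index(groups: Groups) -> float:
    """Ratio of the I/O-side maximum to the compute-side maximum.

    The direction of the ratio (first / second) is an assumption.
    """
    first = max(group_max(ops) for kind, ops in groups.items() if kind in IO_KINDS)
    second = max(group_max(ops) for kind, ops in groups.items() if kind in COMPUTE_KINDS)
    if second == 0:
        raise ValueError("compute-side maximum is zero; ratio undefined")
    return first / second


# Example with made-up predicted indices for a two-partition job.
example: Groups = {
    "source": {"read_db": [0.2, 0.4]},
    "sink": {"write_file": [0.3, 0.1]},
    "processing": {"join": [0.8, 0.5], "sort": [0.6, 0.7]},
    "scratch": {"sort_spill": [0.2, 0.25]},
}
index = job_index(example)
```

In this example the I/O-side maximum is 0.4 and the compute-side maximum is 0.8. Under the assumed ratio direction, a value well below 1 suggests a job dominated by processing rather than by reading and writing data, which is the kind of signal an optimization criterion could use.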
Patents cited by this patent (23)
Pu, Xiaoyan, Apparatus, system, and method for generating a resource utilization description for a parallel data processing system.
Miller, Ian D.; Janneck, Jorn W.; Parlour, David B.; Schumacher, Paul R., Automated method of architecture mapping selection from constrained high level language description via element characterization.
Al-Hilali, Hilal; Clarke, Perry; Guimbellot, David Edward; Howell, David Andrew, Method and computer program product for estimating total resource usage requirements of a server application in a hypothetical user configuration.
Murphy, Richard C.; Carter, Scott M.; Ornelas, Mario G.; Deshpande, Shrikant, System and method for dynamic resource configuration using a dependency graph.
Al-omari, Awny K.; Leslie, Harry A.; Fridrich, Marek J., System and method for reducing compile time in a top down rule based system using rule heuristics based upon the predicted resulting data flow.
Bradley Lewis, User interface for developing and executing data flow programs and methods, apparatus, and articles of manufacture for optimizing the execution of data flow programs.
Christensen, Aaron; Vranyes, Steve A.; Gresham, Jed; Pierson, Samuel, Systems and methods for providing access to data sets owned by different entities.