System and method for limiting the impact of stragglers in large-scale parallel data processing
IPC Classification
Country / Category
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-009/30
G06F-015/00
G06F-009/50
G06F-007/38
G06F-009/00
G06F-009/44
G06F-009/54
G06F-017/30
Application Number
US-0727753
(2015-06-01)
Registration Number
US-9396036
(2016-07-19)
Inventors / Address
Malewicz, Grzegorz
Dvorsky, Marian
Colohan, Christopher B.
Thomson, Derek P.
Levenberg, Joshua Louis
Applicant / Address
GOOGLE INC.
Agent / Address
Morgan, Lewis & Bockius LLP
Citation Information
Times cited: 0
Patents cited: 27
Abstract
A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes, an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process, applies an application-specific map function to each input data block in the sequence to produce the intermediate data, and stores the intermediate data in high-speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high-speed memory of the interconnected processors while the map processes continue to process input data blocks, and an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.
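The abstract describes the standard map/shuffle/reduce control flow: input blocks go to map tasks, map output is hashed into intermediate partitions, and each partition is handed to a reduce task. The single-process sketch below illustrates only that flow; all names (`run_job`, `wc_map`, `wc_reduce`) and the partition count are invented for illustration and do not appear in the patent.

```python
from collections import defaultdict

def run_job(input_blocks, map_fn, reduce_fn, num_partitions=4):
    # Map phase: each input block is processed by the application-specific
    # map function; emitted (key, value) pairs land in an intermediate
    # partition chosen by hashing the key.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for block in input_blocks:
        for key, value in map_fn(block):
            partitions[hash(key) % num_partitions][key].append(value)

    # Reduce phase: each intermediate partition is handed to the
    # application-specific reduce function, one key at a time.
    output = {}
    for part in partitions:
        for key, values in part.items():
            output[key] = reduce_fn(key, values)
    return output

# Word count, the canonical application-specific map/reduce pair.
def wc_map(block):
    for word in block.split():
        yield word, 1

def wc_reduce(key, values):
    return sum(values)
```

In the patented system the two phases run on separate worker processes coordinated by the master and overlap in time; here they run sequentially in one process so the data flow is easy to follow.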
Representative Claims
1. A method of performing a large-scale data processing job, comprising:
executing a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, a plurality of map processes, and a plurality of reduce processes;
in the master process, assigning input data blocks of a set of input data to respective map processes of the plurality of map processes, assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes, and determining when the data processing job has reached a predefined level of completion;
in each of the plurality of map processes: executing an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; and storing the intermediate data in memory of the interconnected processors;
in each of the plurality of reduce processes: receiving a respective partition of the intermediate data from the memory of the interconnected processors; and applying an application-specific reduce function to the respective partition of the intermediate data to produce output values;
in accordance with a determination that processing of an identified partition of the intermediate data, identified using predefined criteria, is likely to delay the data processing job, taking a remedial action with respect to the identified partition; and
in accordance with the master process determining that the data processing job has reached the predefined level of completion, scheduling a backup execution of a respective remaining task, and determining that the respective remaining task is completed when either the respective remaining task or the backup execution of the respective remaining task completes.

2. The method of claim 1, wherein the predefined criteria used to determine that processing of the identified partition of the intermediate data is likely to delay the data processing job includes determining the size of the partition of the intermediate data relative to the size of other partitions of the intermediate data in the data processing job.

3. The method of claim 1, wherein the predefined criteria used to determine that processing of the identified partition of the intermediate data is likely to delay the data processing job includes identifying a partition of the intermediate data that is substantially larger than an average partition size of the intermediate data.

4. The method of claim 1, including, in a respective reduce process: receiving multiple distinct partitions of the intermediate data; and processing the multiple partitions one at a time in succession.

5. The method of claim 1, including, in a respective reduce process, receiving a respective partition of the intermediate data from the memory of the interconnected processors while at least one map process that produces the intermediate data received by the respective reduce process continues executing the application-independent map program to retrieve the sequence of input data blocks assigned thereto by the master process and to apply the application-specific map function to input data blocks in the sequence to produce the intermediate data received by the respective reduce process.

6. A system for large-scale processing of data, comprising: memory; a plurality of interconnected processors; and one or more modules stored in the memory and executed by the plurality of interconnected processors, the one or more modules including instructions to:
execute a plurality of processes on the plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, a plurality of map processes, and a plurality of reduce processes;
in the master process, assign input data blocks of a set of input data to respective map processes of the plurality of map processes, assign partitions of intermediate data to respective reduce processes of the plurality of reduce processes, and determine when the data processing job has reached a predefined level of completion;
in each of the plurality of map processes: execute an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; and store the intermediate data in memory of the interconnected processors;
in each of the plurality of reduce processes: receive a respective partition of the intermediate data from the memory of the interconnected processors; and apply an application-specific reduce function to the respective partition of the intermediate data to produce output values;
in accordance with a determination that processing of an identified partition of the intermediate data, identified using predefined criteria, is likely to delay the data processing job, take a remedial action with respect to the identified partition; and
in accordance with the master process determining that the data processing job has reached the predefined level of completion, schedule a backup execution of a respective remaining task, and determine that the respective remaining task is completed when either the respective remaining task or the backup execution of the respective remaining task completes.

7. The system of claim 6, wherein the predefined criteria used to determine that processing of the identified partition of the intermediate data is likely to delay the data processing job includes determining the size of the partition of the intermediate data relative to the size of other partitions of the intermediate data in the data processing job.

8. The system of claim 6, wherein the predefined criteria used to determine that processing of the identified partition of the intermediate data is likely to delay the data processing job includes identifying a partition of the intermediate data that is substantially larger than an average partition size of the intermediate data.

9. The system of claim 6, wherein the one or more modules include instructions to, in a respective reduce process: receive multiple distinct partitions of the intermediate data; and process the multiple partitions one at a time in succession.

10. The system of claim 6, wherein the one or more modules include instructions to, in a respective reduce process, receive a respective partition of the intermediate data from the memory of the interconnected processors while at least one map process that produces the intermediate data received by the respective reduce process continues executing the application-independent map program to retrieve the sequence of input data blocks assigned thereto by the master process and to apply the application-specific map function to input data blocks in the sequence to produce the intermediate data received by the respective reduce process.

11. A non-transitory computer readable storage medium storing one or more programs for execution by a plurality of interconnected processors in a computer system, the one or more programs comprising instructions to:
execute a plurality of processes on the plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, a plurality of map processes, and a plurality of reduce processes;
in the master process, assign input data blocks of a set of input data to respective map processes of the plurality of map processes, assign partitions of intermediate data to respective reduce processes of the plurality of reduce processes, and determine when the data processing job has reached a predefined level of completion;
in each of the plurality of map processes: execute an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; and store the intermediate data in memory of the interconnected processors;
in each of the plurality of reduce processes: receive a respective partition of the intermediate data from the memory of the interconnected processors; and apply an application-specific reduce function to the respective partition of the intermediate data to produce output values;
in accordance with a determination that processing of an identified partition of the intermediate data, identified using predefined criteria, is likely to delay the data processing job, take a remedial action with respect to the identified partition; and
in accordance with the master process determining that the data processing job has reached the predefined level of completion, schedule a backup execution of a respective remaining task, and determine that the respective remaining task is completed when either the respective remaining task or the backup execution of the respective remaining task completes.

12. The non-transitory computer readable storage medium of claim 11, wherein the predefined criteria used to determine that processing of the identified partition of the intermediate data is likely to delay the data processing job includes determining the size of the partition of the intermediate data relative to the size of other partitions of the intermediate data in the data processing job.

13. The non-transitory computer readable storage medium of claim 11, wherein the predefined criteria used to determine that processing of the identified partition of the intermediate data is likely to delay the data processing job includes identifying a partition of the intermediate data that is substantially larger than an average partition size of the intermediate data.

14. The non-transitory computer readable storage medium of claim 11, wherein the one or more programs include instructions to, in a respective reduce process: receive multiple distinct partitions of the intermediate data; and process the multiple partitions one at a time in succession.

15. The non-transitory computer readable storage medium of claim 11, wherein the one or more programs include instructions to, in a respective reduce process, receive a respective partition of the intermediate data from the memory of the interconnected processors while at least one map process that produces the intermediate data received by the respective reduce process continues executing the application-independent map program to retrieve the sequence of input data blocks assigned thereto by the master process and to apply the application-specific map function to input data blocks in the sequence to produce the intermediate data received by the respective reduce process.

16. The method of claim 1, wherein scheduling backup execution comprises scheduling the backup execution of the respective remaining task for processing on a high capacity process.

17. The method of claim 1, further comprising: in accordance with the master process determining that a process fails: determining what task was running in the failed process, dividing the failed task over a plurality of subtasks, and assigning the plurality of subtasks to at least one new process.

18. The system of claim 6, wherein the instructions to schedule the backup execution further comprise instructions to schedule the backup execution of the respective remaining task for processing on a high capacity process.

19. The system of claim 6, wherein the one or more modules include instructions to, in accordance with the master process determining that a process fails: determine what task was running in the failed process, divide the failed task over a plurality of subtasks, and assign the plurality of subtasks to at least one new process.

20. The non-transitory computer readable storage medium of claim 11, wherein the instructions to schedule the backup execution further comprise instructions to schedule the backup execution of the respective remaining task for processing on a high capacity process.

21. The non-transitory computer readable storage medium of claim 11, wherein the one or more programs include instructions to, in accordance with the master process determining that a process fails: determine what task was running in the failed process, divide the failed task over a plurality of subtasks, and assign the plurality of subtasks to at least one new process.
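The claims describe three straggler-mitigation mechanisms: flagging an intermediate partition that is substantially larger than the average (claims 2-3), scheduling a backup execution of each remaining task once the job passes a predefined completion level and accepting whichever copy finishes first (claim 1), and splitting a failed task into subtasks for new processes (claims 17, 19, 21). The sketch below models each mechanism as an isolated function; the 2x skew threshold, the 0.9 completion level, and every name are illustrative assumptions, not values from the patent.

```python
# Toy model of the three remedial mechanisms in the claims. All thresholds
# and names are illustrative assumptions, not taken from the patent.

COMPLETION_LEVEL = 0.9  # assumed "predefined level of completion"

def skewed_partitions(sizes, factor=2.0):
    """Claims 2-3: flag partitions substantially larger than the average size."""
    avg = sum(sizes) / len(sizes)
    return [i for i, s in enumerate(sizes) if s > factor * avg]

def task_finish_time(primary_secs, backup_secs, job_fraction_done):
    """Claim 1: once the completion level is reached, a backup execution is
    scheduled, and the task completes when either execution completes."""
    if job_fraction_done >= COMPLETION_LEVEL:
        return min(primary_secs, backup_secs)
    return primary_secs  # no backup scheduled yet

def split_failed_task(task_range, num_subtasks):
    """Claims 17/19/21: divide a failed task's input range into subtasks
    to be reassigned to at least one new process."""
    start, end = task_range
    step = -(-(end - start) // num_subtasks)  # ceiling division
    return [(s, min(s + step, end)) for s in range(start, end, step)]
```

In the patented system these decisions are made by the master process from worker progress reports; writing them as pure functions here keeps the decision logic separate from any scheduling machinery.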
Patents cited in this patent (27)
McMillen, Robert J.; Watson, M. Cameron; Chura, David J., Computer system using a master processor to automatically reconfigure faulty switch node that is detected and reported.
Ekanadham, Kattamuri; Moreira, Jose Eduardo; Naik, Vijay Krishnarao, Method for resource control in parallel environments using program organization and run-time support.
Bookman, Lawrence A.; Blair, David Albert; Rosenthal, Steven M.; Krawitz, Robert Louis; Beckerle, Michael J.; Callen, Jerry Lee; Razdow, Allen M.; Mudambi, Shyam R., Segmentation and processing of continuous data streams using transactional semantics.
Malewicz, Grzegorz; Dvorsky, Marian; Colohan, Christopher B.; Thomson, Derek P.; Levenberg, Joshua Louis, System and method for limiting the impact of stragglers in large-scale parallel data processing.