System and method for limiting the impact of stragglers in large-scale parallel data processing
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-009/54
G06F-007/38
G06F-015/00
G06F-009/00
G06F-009/44
출원번호
US-0965108
(2013-08-12)
등록번호
US-9047141
(2015-06-02)
발명자
/ 주소
Malewicz, Grzegorz
Dvorsky, Marian
Colohan, Christopher B.
Thomson, Derek P.
Levenberg, Joshua Louis
출원인 / 주소
GOOGLE INC.
대리인 / 주소
Morgan, Lewis & Bockius LLP
인용정보
피인용 횟수 :
1인용 특허 :
24
초록▼
A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes an application
A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process and applies an application-specific map function to each input data block in the sequence to produce the intermediate data and stores the intermediate data in high speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high speed memory of the interconnected processors while the map processes continue to process input data blocks an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.
대표청구항▼
1. A method of performing a large-scale data processing job, comprising: executing a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of m
1. A method of performing a large-scale data processing job, comprising: executing a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of map processes and a plurality of reduce processes;in the master process, assigning input data blocks of a set of input data to respective map processes of the plurality of map processes and assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes;in each of the plurality of map processes: executing an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; andstoring the intermediate data in memory of the interconnected processors; andin each of the plurality of reduce processes: receiving a respective partition of the intermediate data from the memory of the interconnected processors; andapplying an application-specific reduce function to the respective partition of the intermediate data to produce output values; andin a respective reduce process: receiving multiple distinct partitions of the intermediate data; andprocessing the multiple partitions one at a time in succession; andidentifying the respective reduce process as a reduce process that is delaying the data processing job while continuing to process intermediate data and, in response, dividing the intermediate data in a partition that is assigned to the respective reduce process into a plurality of subpartitions and assigning at least one subpartition of the plurality of subpartitions to a reduce process that is not the respective reduce process. 2. The method of claim 1, further comprising sorting the intermediate data into the plurality of partitions of the intermediate data. 3. The method of claim 2, wherein the data processing job is initiated by a user, and the intermediate data is sorted into the plurality of partitions based on an application-specific partition function selected by the user. 4. The method of claim 3, wherein the application-specific partition function is defined by the user. 5. The method of claim 1, wherein the data processing job is initiated by a user, and the application-specific map function and the application-specific reduce function are selected by the user. 6. The method of claim 5, wherein the application-specific map function and the application-specific reduce function are defined by the user. 7. The method of claim 1, wherein: producing the intermediate data includes producing a plurality of blocks of intermediate data, wherein each block of intermediate data includes all of the intermediate data produced by applying the application-specific map function to a respective block of input data; andreceiving a respective partition of the intermediate data includes receiving a subset of the intermediate data in a first block of intermediate data that is associated with the respective partition while a second block of intermediate data is being produced, the second block of intermediate data including at least some intermediate data that is associated with the respective partition. 8. The method of claim 1, further comprising identifying a partition that is likely to delay the data processing job using predefined criteria and taking a remedial action. 9. The method of claim 8, wherein identifying a partition that is likely to delay the data processing job includes determining the size of the partition relative to the size of other partitions in the data processing job. 10. The method of claim 8, wherein remedial action comprises scheduling the partition for processing on a high capacity reduce process. 11. The method of claim 1, wherein dividing the intermediate data in the partition that is assigned to the respective reduce process includes copying the intermediate data in the partition from memory associated with the respective reduce process to memory associated with a reduce process that is not the respective reduce process. 12. The method of claim 1, wherein applying an application-specific reduce function to the respective partition of the intermediate data to produce output values includes: while continuing to receive a respective partition of the intermediate data: storing at least a subset of the intermediate data of the respective partition in memory associated with the reduce process;while the intermediate data is stored in the memory associated with the reduce process, applying an application-specific combiner function to produce combined intermediate data values; andapplying the application-specific reduce function to the combined intermediate data values to produce output values. 13. The method of claim 12, wherein the combiner function is the same function as the application-specific reduce function. 14. The method of claim 1, wherein receiving the respective partition of the intermediate data from the memory of the interconnected processors occurs while the map processes that produced the received intermediate data continue to process input data blocks. 15. A system for large-scale processing of data, comprising: memory;one or more processors; andone or more modules stored in the memory and executed by the one or more processors, the one or more modules including instructions to:execute a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of map processes and a plurality of reduce processes;in the master process, assign input data blocks of a set of input data to respective map processes of the plurality of map processes and assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes;in each of the plurality of map processes: execute an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; andstore the intermediate data in memory of the interconnected processors; andin each of the plurality of reduce processes: receive a respective partition of the intermediate data from the memory of the interconnected processors; andapply an application-specific reduce function to the respective partition of the intermediate data to produce output values; andin a respective reduce process: receive multiple distinct partitions of the intermediate data; andprocess the multiple partitions one at a time in succession; andidentify the respective reduce process as a reduce process that is delaying the data processing job while continuing to process intermediate data and, in response, divide the intermediate data in a partition that is assigned to the respective reduce process into a plurality of subpartitions and assign at least one subpartition of the plurality of subpartitions to a reduce process that is not the respective reduce process. 16. The system of claim 15, further including instructions to identify a partition that is likely to delay the data processing job using predefined criteria and taking a remedial action. 17. The system of claim 15, wherein: the instructions to produce the intermediate data include instructions to produce a plurality of blocks of intermediate data, wherein each block of intermediate data includes all of the intermediate data produced by applying the application-specific map function to a respective block of input data; andthe instructions to receive a respective partition of the intermediate data include instructions to receive a subset of the intermediate data in a first block of intermediate data that is associated with the respective partition while a second block of intermediate data is being produced, the second block of intermediate data including at least some intermediate data that is associated with the respective partition. 18. The system of claim 15, wherein the instructions to apply an application-specific reduce function to the respective partition of the intermediate data to produce output values include instructions to: while continuing to receive a respective partition of the intermediate data: store at least a subset of the intermediate data of the respective partition in memory associated with the reduce process;while the intermediate data is stored in the memory associated with the reduce process, apply an application-specific combiner function to produce combined intermediate data values; andapply the application-specific reduce function to the combined intermediate data values to produce output values. 19. The system of claim 15, wherein receiving the respective partition of the intermediate data from the memory of the interconnected processors occurs while the map processes that produced the received intermediate data continue to process input data blocks. 20. A non-transitory computer readable storage medium storing one or more programs for execution by one or more processors of a client device, the one or more programs comprising instructions to: execute a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of map processes and a plurality of reduce processes;in the master process, assign input data blocks of a set of input data to respective map processes of the plurality of map processes and assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes;in each of the plurality of map processes: execute an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; andstore the intermediate data in memory of the interconnected processors; andin each of the plurality of reduce processes: receive a respective partition of the intermediate data from the memory of the interconnected processors; andapply an application-specific reduce function to the respective partition of the intermediate data to produce output values; andin a respective reduce process: receive multiple distinct partitions of the intermediate data; andprocess the multiple partitions one at a time in succession; andidentify the respective reduce process as a reduce process that is delaying the data processing job while continuing to process intermediate data and, in response, divide the intermediate data in a partition that is assigned to the respective reduce process into a plurality of subpartitions and assign at least one subpartition of the plurality of subpartitions to a reduce process that is not the respective reduce process. 21. The non-transitory computer readable storage medium of claim 20, further including instructions to identify a partition that is likely to delay the data processing job using predefined criteria and taking a remedial action. 22. The non-transitory computer readable storage medium of claim 20, wherein: the instructions to produce the intermediate data include instructions to produce a plurality of blocks of intermediate data, wherein each block of intermediate data includes all of the intermediate data produced by applying the application-specific map function to a respective block of input data; andthe instructions to receive a respective partition of the intermediate data include instructions to receive a subset of the intermediate data in a first block of intermediate data that is associated with the respective partition while a second block of intermediate data is being produced, the second block of intermediate data including at least some intermediate data that is associated with the respective partition. 23. The non-transitory computer readable storage medium of claim 20, wherein the instructions to apply an application-specific reduce function to the respective partition of the intermediate data to produce output values include instructions to: while continuing to receive a respective partition of the intermediate data: store at least a subset of the intermediate data of the respective partition in memory associated with the reduce process;while the intermediate data is stored in the memory associated with the reduce process, apply an application-specific combiner function to produce combined intermediate data values; andapply the application-specific reduce function to the combined intermediate data values to produce output values. 24. The non-transitory computer readable storage medium of claim 20, wherein receiving the respective partition of the intermediate data from the memory of the interconnected processors occurs while the map processes that produced the received intermediate data continue to process input data blocks.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (24)
McMillen Robert J. ; Watson M. Cameron ; Chura David J., Computer system using a master processor to automatically reconfigure faulty switch node that is detected and reported.
Ekanadham Kattamuri ; Moreira Jose Eduardo ; Naik Vijay Krishnarao, Method for resource control in parallel environments using program organization and run-time support.
Bookman,Lawrence A.; Blair,David Albert; Rosenthal,Steven M.; Krawitz,Robert Louis; Beckerle,Michael J.; Callen,Jerry Lee; Razdow,Allen M.; Mudambi,Shyam R., Segmentation and processing of continuous data streams using transactional semantics.
Malewicz, Grzegorz; Dvorsky, Marian; Colohan, Christopher B.; Thomson, Derek P.; Levenberg, Joshua Louis, System and method for limiting the impact of stragglers in large-scale parallel data processing.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.