[Patent] System and method for limiting the impact of stragglers in large-scale parallel data processing

IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition)	G06F-007/38 G06F-009/00 G06F-009/44 G06F-015/00
Application number	US-0759637 (2010-04-13)
Registration number	US-8510538 (2013-08-13)
Inventors / Address	Malewicz, Grzegorz; Dvorsky, Marian; Colohan, Christopher B.; Thomson, Derek P.; Levenberg, Joshua Louis
Applicant / Address	Google Inc.
Agent / Address	Morgan, Lewis & Bockius LLP
Citation information	Times cited: 27; Patents cited: 20

Abstract

A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes, an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process, applies an application-specific map function to each input data block in the sequence to produce the intermediate data, and stores the intermediate data in high speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high speed memory of the interconnected processors while the map processes continue to process input data blocks, and an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.
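The data flow described in the abstract can be illustrated with a small single-process sketch. This is not the patented implementation: `map_fn`, `reduce_fn`, `partition_fn`, and `run_job` are hypothetical names, the word-count workload is an assumed example, and the "master" is reduced to a plain loop that routes intermediate data into partitions held in memory.

```python
from collections import defaultdict

def map_fn(block):
    """Hypothetical application-specific map function: emit (word, 1) pairs."""
    for word in block.split():
        yield word, 1

def reduce_fn(key, values):
    """Hypothetical application-specific reduce function: sum the counts."""
    return key, sum(values)

def partition_fn(key, num_partitions):
    """Assumed partition function: a deterministic byte sum modulo the partition count."""
    return sum(key.encode("utf-8")) % num_partitions

def run_job(blocks, num_partitions):
    # Intermediate data is kept in memory, grouped by partition and then by key.
    partitions = [defaultdict(list) for _ in range(num_partitions)]

    # "Map phase": apply the map function to each assigned input block.
    for block in blocks:
        for key, value in map_fn(block):
            partitions[partition_fn(key, num_partitions)][key].append(value)

    # "Reduce phase": each partition is reduced independently of the others.
    output = {}
    for part in partitions:
        for key, values in part.items():
            out_key, out_value = reduce_fn(key, values)
            output[out_key] = out_value
    return output
```

In a real deployment the map and reduce loops would run as separate processes on different machines; the sketch only shows how blocks, intermediate key/value pairs, and partitions relate to one another.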

Representative claims

1. A method of performing a large-scale data processing job, comprising: executing a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of map processes and a plurality of reduce processes; in the master process, assigning input data blocks of a set of input data to respective map processes of the plurality of map processes and assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes; in each of the plurality of map processes: executing an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; and storing the intermediate data in memory of the interconnected processors; and in each of the plurality of reduce processes: receiving a respective partition of the intermediate data from the memory of the interconnected processors; and applying an application-specific reduce function to the respective partition of the intermediate data to produce output values; and in a respective reduce process: receiving multiple distinct partitions of the intermediate data and processing the multiple partitions one at a time in succession; and identifying the respective reduce process as a reduce process that is delaying the data processing job while continuing to process intermediate data and, in response, reassigning at least one of the multiple partitions, which has not yet been processed, to a second reduce process, including copying the intermediate data in the reassigned partition to the other reduce process.

2. The method of claim 1, further comprising sorting the intermediate data into the plurality of partitions of the intermediate data.

3. The method of claim 2, wherein the data processing job is initiated by a user, and the intermediate data is sorted into the plurality of partitions based on an application-specific partition function selected by the user.

4. The method of claim 3, wherein the application-specific partition function is defined by the user.

5. The method of claim 1, wherein the data processing job is initiated by a user, and the application-specific map function and the application-specific reduce function are selected by the user.

6. The method of claim 5, wherein the application-specific map function and the application-specific reduce function are defined by the user.

7. The method of claim 1, wherein: producing the intermediate data includes producing a plurality of blocks of intermediate data, wherein each block of intermediate data includes all of the intermediate data produced by applying the application-specific map function to a respective block of input data; and receiving a respective partition of the intermediate data includes receiving a subset of the intermediate data in a first block of intermediate data that is associated with the respective partition while a second block of intermediate data is being produced, the second block of intermediate data including at least some intermediate data that is associated with the respective partition.

8. The method of claim 1, further comprising identifying a partition that is likely to delay the data processing job using predefined criteria and taking a remedial action.

9. The method of claim 8, wherein identifying a partition that is likely to delay the data processing job includes determining the size of the partition relative to the size of other partitions in the data processing job.

10. The method of claim 8, wherein remedial action comprises scheduling the partition for processing on a high capacity reduce process.

11. The method of claim 1, wherein the intermediate data in the reassigned partition is copied from memory associated with the respective reduce process.

12. The method of claim 1, further comprising, after identifying the respective reduce process as a reduce process that is delaying the data processing job, dividing the intermediate data in a partition that is assigned to the respective reduce process into a plurality of subpartitions and assigning each subpartition to a reduce process that is not the respective reduce process.

13. The method of claim 12, wherein dividing the intermediate data in the partition that is assigned to the respective reduce process includes copying the intermediate data in the partition from memory associated with the respective reduce process to memory associated with a reduce process that is not the respective reduce process.

14. The method of claim 1, wherein applying an application-specific reduce function to the respective partition of the intermediate data to produce output values includes: while continuing to receive a respective partition of the intermediate data: storing at least a subset of the intermediate data of the respective partition in memory associated with the reduce process; while the intermediate data is stored in the memory associated with the reduce process, applying an application-specific combiner function to produce combined intermediate data values; and applying the application-specific reduce function to the combined intermediate data values to produce output values.

15. The method of claim 14, wherein the combiner function is the same function as the application-specific reduce function.

16. A system for large-scale processing of data, comprising: memory; one or more processors; and one or more modules stored in the memory and executed by the one or more processors, the one or more modules including instructions to: execute a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of map processes and a plurality of reduce processes; in the master process, assign input data blocks of a set of input data to respective map processes of the plurality of map processes and assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes; in each of the plurality of map processes: execute an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; and store the intermediate data in memory of the interconnected processors; and in each of the plurality of reduce processes: receive a respective partition of the intermediate data from the memory of the interconnected processors; and apply an application-specific reduce function to the respective partition of the intermediate data to produce output values; and in a respective reduce process: receive multiple distinct partitions of the intermediate data and process the multiple partitions one at a time in succession; and identify the respective reduce process as a reduce process that is delaying the data processing job while continuing to process intermediate data and, in response, reassign at least one of the multiple partitions, which has not yet been processed, to a second reduce process, including copying the intermediate data in the reassigned partition to the other reduce process.

17. The system of claim 16, wherein: the instructions to produce the intermediate data include instructions to produce a plurality of blocks of intermediate data, wherein each block of intermediate data includes all of the intermediate data produced by applying the application-specific map function to a respective block of input data; and the instructions to receive a respective partition of the intermediate data include instructions to receive a subset of the intermediate data in a first block of intermediate data that is associated with the respective partition while a second block of intermediate data is being produced, the second block of intermediate data including at least some intermediate data that is associated with the respective partition.

18. The system of claim 16, further comprising instructions, responsive to identifying the respective reduce process as a reduce process that is delaying the data processing job, to divide the intermediate data in a partition that is assigned to the respective reduce process into a plurality of subpartitions and assign each subpartition to a reduce process that is not the respective reduce process.

19. The system of claim 16, wherein the instructions to apply an application-specific reduce function to the respective partition of the intermediate data to produce output values include instructions to: while continuing to receive a respective partition of the intermediate data: store at least a subset of the intermediate data of the respective partition in memory associated with the reduce process; while the intermediate data is stored in the memory associated with the reduce process, apply an application-specific combiner function to produce combined intermediate data values; and apply the application-specific reduce function to the combined intermediate data values to produce output values.

20. A non-transitory computer readable storage medium storing one or more programs for execution by one or more processors of a client device, the one or more programs comprising instructions to: execute a plurality of processes on a plurality of interconnected processors, the plurality of processes including a master process for coordinating a data processing job for processing a set of input data, and plurality of map processes and a plurality of reduce processes; in the master process, assign input data blocks of a set of input data to respective map processes of the plurality of map processes and assigning partitions of intermediate data to respective reduce processes of the plurality of reduce processes; in each of the plurality of map processes: execute an application-independent map program to retrieve a sequence of input data blocks assigned thereto by the master process and to apply an application-specific map function to each input data block in the sequence to produce the intermediate data; and store the intermediate data in memory of the interconnected processors; and in each of the plurality of reduce processes: receive a respective partition of the intermediate data from the memory of the interconnected processors; and apply an application-specific reduce function to the respective partition of the intermediate data to produce output values; and in a respective reduce process: receive multiple distinct partitions of the intermediate data and process the multiple partitions one at a time in succession; and identify the respective reduce process as a reduce process that is delaying the data processing job while continuing to process intermediate data and, in response, reassign at least one of the multiple partitions, which has not yet been processed, to a second reduce process, including copying the intermediate data in the reassigned partition to the other reduce process.

21. The non-transitory computer readable storage medium of claim 20, wherein: the instructions to produce the intermediate data include instructions to produce a plurality of blocks of intermediate data, wherein each block of intermediate data includes all of the intermediate data produced by applying the application-specific map function to a respective block of input data; and the instructions to receive a respective partition of the intermediate data include instructions to receive a subset of the intermediate data in a first block of intermediate data that is associated with the respective partition while a second block of intermediate data is being produced, the second block of intermediate data including at least some intermediate data that is associated with the respective partition.

22. The non-transitory computer readable storage medium of claim 20, wherein the one or more programs further comprise instructions, responsive to identifying the respective reduce process as a reduce process that is delaying the data processing job, to divide the intermediate data in a partition that is assigned to the respective reduce process into a plurality of subpartitions and assign each subpartition to a reduce process that is not the respective reduce process.

23. The non-transitory computer readable storage medium of claim 20, wherein the instructions to apply an application-specific reduce function to the respective partition of the intermediate data to produce output values include instructions to: while continuing to receive a respective partition of the intermediate data: store at least a subset of the intermediate data of the respective partition in memory associated with the reduce process; while the intermediate data is stored in the memory associated with the reduce process, apply an application-specific combiner function to produce combined intermediate data values; and apply the application-specific reduce function to the combined intermediate data values to produce output values.

24. The method of claim 1, wherein receiving the respective partition of the intermediate data from the memory of the interconnected processors occurs while the map processes that produced the received intermediate data continue to process input data blocks.

25. The system of claim 16, wherein receiving the respective partition of the intermediate data from the memory of the interconnected processors occurs while the map processes that produced the received intermediate data continue to process input data blocks.

26. The non-transitory computer readable storage medium of claim 20, wherein receiving the respective partition of the intermediate data from the memory of the interconnected processors occurs while the map processes that produced the received intermediate data continue to process input data blocks.
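The reassignment step recited in claims 1, 16, and 20 can be sketched as follows. The claims do not state how a delaying reduce process is identified, so the backlog comparison used here is purely an assumption for illustration, as are the names `ReduceWorker`, `rebalance`, and `slack`. The sketch only moves partitions that have not yet been started and copies their intermediate data to the receiving worker.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ReduceWorker:
    """Hypothetical model of one reduce process."""
    name: str
    # Partition ids in processing order; the head is the one being processed now.
    pending: deque = field(default_factory=deque)
    # Partition id -> intermediate (key, value) pairs held in this worker's memory.
    data: dict = field(default_factory=dict)

def rebalance(workers, slack=1):
    """Move unprocessed partitions from the most backlogged worker to the least.

    Assumed straggler criterion: a worker is delaying the job when its backlog
    exceeds the smallest backlog by more than `slack` partitions.
    Returns a list of (partition_id, from_worker, to_worker) moves.
    """
    slack = max(slack, 1)  # a slack below 1 could shuffle one partition back and forth forever
    moves = []
    while True:
        slow = max(workers, key=lambda w: len(w.pending))
        fast = min(workers, key=lambda w: len(w.pending))
        if len(slow.pending) - len(fast.pending) <= slack:
            return moves
        # Take from the tail so the partition currently being processed stays put.
        pid = slow.pending.pop()
        # Copy the intermediate data of the reassigned partition to the other worker.
        fast.data[pid] = list(slow.data.pop(pid, []))
        fast.pending.append(pid)
        moves.append((pid, slow.name, fast.name))
```

A short usage example: a worker holding four partitions next to an idle worker ends up with two partitions each, and the straggler keeps the partition it is currently working on.

```python
a = ReduceWorker("A", deque([0, 1, 2, 3]), {i: [("k", i)] for i in range(4)})
b = ReduceWorker("B")
print(rebalance([a, b]))  # [(3, 'A', 'B'), (2, 'A', 'B')]
```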

Patents cited by this patent (20)

McMillen Robert J. ; Watson M. Cameron ; Chura David J., Computer system using a master processor to automatically reconfigure faulty switch node that is detected and reported.
Hardwick Jonathan C.,GBX, Dynamic load balancing among processors in a parallel computer.
Shimon Muller ; Denton E. Gentry, Jr. ; John E. Watkins ; Linda T. Cheng, High performance network interface.
Liu, Huan, Infrastructure for parallel programming of clusters of machines.
Dean, Jeffrey; Ghemawat, Sanjay, Large-scale data processing in a distributed and parallel processing enviornment.
Matsushita Masayuki,JPX ; Ugajin Atsushi,JPX, Management system and method for parallel computer system.
Dageville,Benoit; Amor,Patrick A., Managing parallel execution of work granules according to their affinity.
Eichstaedt Matthias ; Lu Qi ; Teng Shang-Hua, Method and apparatus for parallel profile matching in a large scale webcasting system.
Tsuchida Masashi,JPX ; Masai Kazuo,JPX ; Torii Shunichi,JPX, Method and system of database divisional management for parallel database system.
Matsuzawa Hirofumi,JPX ; Fukuda Takeshi,JPX, Method for executing aggregate queries, and computer system.
Waddington William H. ; Tan Leng Leng ; Grewell Patricia, Method for managing shared resources in a multiprocessing computer system.
Waddington William H. ; Tan Leng Leng ; Grewell Patricia, Method for managing termination of a lock-holding process using a waiting lock.
van Driel,Marinus A., Method for the automatic generation of an interactive electronic equipment documentation package.
Allen,Terry Dennis; Desai,Paramesh S.; Shibamiya,Akira; Tie,Hong Sang; Tsang,Annie S., Method, system, and program for optimizing database query execution.
Chan Lee ; Richard A. Weier ; Robert F. Krick, Multi-tag system and method for cache read/write.
Sudzilouski, Uladzislau; Zaika, Igor, Multi-threaded processes for opening and saving documents.
Douglas P. Brown ; Allen N. Diaz ; Donald R. Pederson, Multi-threading, multi-tasking architecture for a relational database management system.
Hardwick Jonathan C.,GBX, Nested parallel 2D Delaunay triangulation method.
Gulko,Abraham; Mellor,David, Parallel computing system, method and architecture.
Dean, Jeffrey; Ghemawat, Sanjay, System and method for efficient large-scale data processing.

Patents citing this patent (27)

Borate, Milind; Bardale, Trimbak; Gottipati, Srikiran, Active repartitioning in a distributed database.
MacLeod, Peter S., Adaptive parallel data processing.
Rus, Silvius V.; Jiang, Wei, Automated load-balancing of partitions in arbitrarily imbalanced distributed mapreduce computations.
Ohrimenko, Olga; Costa, Manuel; Fournet, Cedric; Gkantsidis, Christos; Kohlweiss, Markulf; Sharma, Divya, Data center privacy.
Zhang, Jiaxing; Zhou, Hucheng; Guo, Zhenyu; Lin, Haoxiang; Zhou, Lidong, Data-parallel computation management.
Jung, Myung-June; Lee, Ju-Pyung, Distributed processing apparatus and method for processing large data through hardware acceleration.
Balikov, Alexander Gourkov; Dvorsky, Marian; Zhao, Yonggang, Dynamic shuffle reconfiguration.
Balikov, Alexander Gourkov; Dvorsky, Marian; Zhao, Yonggang, Dynamic shuffle reconfiguration.
Cherkasova, Ludmila; Verma, Abhishek, Estimating a performance parameter of a job having map and reduce tasks after a failure.
Cai, Bin; Xiang, Zhe; Xue, Wei; Yang, Bo; Yu, Qi, Generating map task output with version information during map task execution and executing reduce tasks using the output including version information.
Matsubara, Katsushige; Matsumi, Takayuki; Mochizuki, Seiji; Iwata, Kenichi; Kaya, Toshiyuki, Image processing apparatus and control method for the same including estimation and scheduling.
Palanisamy, Balaji; Singh, Aameek, Locality-aware resource allocation for cloud computing.
Cramer, Michael J.; Christian, Brian P., Management of intermediate data spills during the shuffle phase of a map-reduce job.
Cramer, Michael J.; Christian, Brian P., Management of intermediate data spills during the shuffle phase of a map-reduce job.
Rangaraju, Ramasimha; Gupta, Virad; Narayanan, Deepankar; Edalur, Raghu; Sahoo, Mohini; Verma, Vivek, Managing parallel processes for application-level partitions.
Cai, Bin; Xiang, Zhe; Xue, Wei; Yang, Bo; Yu, Qi, Method and system for operating a data center by reducing an amount of data to be processed.
Balikov, Alexander Gourkov; Dvorsky, Marian; Zhao, Yonggang, Persistent shuffle system.
Long, Brian Gregory; Pfifer, Justin Thomas; Chong, Sunjae, Roll back of scaled-out data.
Murray, Edward Paul, Rolling subpartition management.
Scheer, Michael; Clark, Morgan; Rashid, Ahsan; Vempati, Srinivasa R.; DeSouter, Marc; Sethi, Pranit; Kachmar, Maher, Segregating data and metadata in a file system.
Jayaraman, Vinod; Dinkar, Abhijit; Taylor, Mark; Rao, Goutham; Root, Michael E.; Bashyam, Murali, Storage optimization manager.
Kinoshita, Atsuhiro; Hoshino, Junichi; Kurita, Takahiro, Storage system.
Pike, Robert C.; Quinlan, Sean; Dorward, Sean M.; Dean, Jeffrey; Ghemawat, Sanjay, System and method for analyzing data records.
Dean, Jeffrey; Ghemawat, Sanjay, System and method for large-scale data processing using an application-independent framework.
Malewicz, Grzegorz; Dvorsky, Marian; Colohan, Christopher B.; Thomson, Derek P.; Levenberg, Joshua Louis, System and method for limiting the impact of stragglers in large-scale parallel data processing.
Malewicz, Grzegorz; Dvorsky, Marian; Colohan, Christopher B.; Thomson, Derek P.; Levenberg, Joshua Louis, System and method for limiting the impact of stragglers in large-scale parallel data processing.
Malewicz, Grzegorz; Dvorsky, Marian; Colohan, Christopher B.; Thomson, Derek P.; Levenberg, Joshua Louis, System and method for limiting the impact of stragglers in large-scale parallel data processing.
