A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.
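The abstract's core idea is that parallel operations are recorded as a dataflow graph of deferred objects rather than executed immediately, and execution happens only when results are materialized. The following is a minimal sketch of that deferred-evaluation pattern, not the patented implementation; the names `DeferredCollection`, `parallel_do`, and `materialize` are hypothetical.

```python
# A minimal sketch of deferred parallel data objects: each operation builds
# a node in a dataflow graph instead of running, and a later materialize()
# call executes the recorded chain. Names are hypothetical, not from the patent.

class DeferredCollection:
    def __init__(self, op=None, data=None):
        self.op = op      # pointer to the deferred operation that produces this object
        self.data = data  # filled in only once the object is materialized

    def parallel_do(self, fn):
        # Record a deferred operation with pointers to its input and output
        # objects; nothing executes yet.
        op = {"fn": fn, "input": self, "output": None}
        out = DeferredCollection(op=op)
        op["output"] = out
        return out

    def materialize(self):
        # Walk the graph backwards from this object, executing each
        # deferred operation once its input is available.
        if self.data is None and self.op is not None:
            src = self.op["input"].materialize()
            self.data = [self.op["fn"](x) for x in src]
        return self.data

pc = DeferredCollection(data=[1, 2, 3])
result = pc.parallel_do(lambda x: x + 1).parallel_do(lambda x: x * 10)
# No work has happened yet; materialize() triggers execution of the chain.
print(result.materialize())  # [20, 30, 40]
```

Because execution is deferred until `materialize()`, an optimizer can rewrite the recorded graph (for example, fusing adjacent operations) before any work runs, which is the opening the claims below exploit.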
Representative Claims
1. A method comprising: executing an application that includes a data parallel pipeline, wherein the data parallel pipeline specifies multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects; generating, based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline; applying one or more graph transformations to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations; and executing the deferred, combined parallel operations to produce materialized parallel data objects corresponding to the deferred parallel data objects, wherein the deferred, combined parallel data operations include at least one generalized mapreduce operation, the generalized mapreduce operation including multiple, parallel map operations and multiple, parallel reduce operations and being translatable to a single mapreduce operation that includes a single map function to implement the multiple, parallel map operations and a single reduce function to implement the multiple, parallel reduce operations, and wherein each deferred parallel operation includes a pointer to a parallel data object that is an input to the deferred parallel operation and a pointer to a deferred parallel object that is an output of the deferred parallel operation.

2. The method of claim 1 wherein executing the generalized mapreduce operation comprises translating the combined mapreduce operation to the single mapreduce operation and executing the single mapreduce operation.

3. The method of claim 2 wherein executing the single mapreduce operation comprises determining whether to execute the single mapreduce operation as a local, sequential operation or a remote, parallel operation.

4. The method of claim 2 wherein translating the generalized mapreduce operation to the single mapreduce operation comprises generating a map function that includes the multiple map operations and a reducer function that includes the multiple reducer operations.

5. The method of claim 1 wherein each deferred parallel data object includes a pointer to a parallel data operation that produces the parallel data object.

6. The method of claim 1 wherein each materialized object includes data contained in the object.

7. The method of claim 1 wherein the multiple parallel data objects are first class objects of a host programming language.

8. The method of claim 1 wherein the pipeline further includes a single data object that contains a single element and the dataflow graph includes a corresponding deferred single data object.

9. The method of claim 8 wherein at least one of the multiple parallel operations in the pipeline operates on the single data object and one of the multiple parallel data objects and the dataflow graph includes a corresponding deferred parallel operation that operates on a deferred single data object and a deferred parallel data object.

10. The method of claim 1 further comprising caching one or more results of the execution of the deferred, combined parallel operations for use in a future execution of the data parallel pipeline.

11. A system comprising: one or more processing devices; one or more storage devices, the storage devices storing instructions that, when executed by the one or more processing devices, implement the following: an application that includes a data parallel pipeline, wherein the data parallel pipeline specifies multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects; an evaluator configured, based on the data parallel pipeline, to generate a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline; an optimizer configured to apply one or more graph transformations to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations; and an executor configured to execute the deferred, combined parallel operations to produce materialized parallel data objects corresponding to the deferred parallel data objects, wherein the deferred, combined parallel data operations include at least one generalized mapreduce operation, the generalized mapreduce operation including multiple, parallel map operations and multiple, parallel reduce operations and being translatable to a single mapreduce operation that includes a single map function to implement the multiple, parallel map operations and a single reduce function to implement the multiple, parallel reduce operations, and wherein each deferred parallel operation includes a pointer to a parallel data object that is an input to the deferred parallel operation and a pointer to a deferred parallel object that is an output of the deferred parallel operation.

12. The system of claim 11 wherein, to execute the generalized mapreduce operation, the executor is configured to translate the combined mapreduce operation to the single mapreduce operation and execute the single mapreduce operation.

13. The system of claim 12 wherein, to execute the single mapreduce operation, the executor is configured to determine whether to execute the single mapreduce operation as a local, sequential operation or a remote, parallel operation.

14. The system of claim 12 wherein, to translate the generalized mapreduce operation to the single mapreduce operation, the executor is configured to generate a map function that includes the multiple map operations and a reducer function that includes the multiple reducer operations.

15. The system of claim 11 wherein each deferred parallel data object includes a pointer to a parallel data operation that produces the parallel data object.

16. The system of claim 11 wherein each materialized object includes data contained in the object.

17. The system of claim 11 wherein the multiple parallel data objects are first class objects of a host programming language.

18. The system of claim 11 wherein the pipeline further includes a single data object that contains a single element and the dataflow graph includes a corresponding deferred single data object.

19. The system of claim 18 wherein at least one of the multiple parallel operations in the pipeline operates on the single data object and one of the multiple parallel data objects and the dataflow graph includes a corresponding deferred parallel operation that operates on a deferred single data object and a deferred parallel data object.

20. The system of claim 11 wherein the executor is configured to cache one or more results of the execution of the deferred, combined parallel operations for use in a future execution of the data parallel pipeline.
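Claims 1, 4, and 14 describe translating multiple, parallel map operations into a single map function. One way to read this is that one pass over the input runs every sibling map and tags each output with the operation that produced it, so the downstream reduces can be routed accordingly. The sketch below illustrates only that fusion idea under that reading; `fuse_sibling_maps` is a hypothetical helper, not an API from the patent.

```python
# A minimal sketch of fusing several parallel (sibling) map operations into
# a single map function: one pass over each input element applies every map
# and tags the result with the index of the map that produced it, so the
# fused outputs can later be separated. Hypothetical helper, not patent API.

def fuse_sibling_maps(map_fns):
    def fused(x):
        # Apply each sibling map to the same element in a single pass.
        return [(i, fn(x)) for i, fn in enumerate(map_fns)]
    return fused

single_map = fuse_sibling_maps([lambda x: x + 1, lambda x: x * 2])
print(single_map(3))  # [(0, 4), (1, 6)]
```

The payoff of this transformation is that a pipeline branching into several map operations can be executed as one MapReduce job over the data instead of one job per branch.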
Patents cited by this patent (7)
Linderman, Michael D.; Collins, Jamison D.; Wang, Perry; Wang, Hong, Compiler and runtime for heterogeneous multiprocessor systems.
Fontoura, Marcus Felipe; Josifovski, Vanja; Ravikumar, Shanmugasundaram; Olston, Christopher; Reed, Benjamin Clay; Tomkins, Andrew, Formal language and translator for parallel processing of data.
Abadi, Daniel; Bajda-Pawlikowski, Kamil; Abouzied, Azza; Silberschatz, Avi, Processing of data using a database system in communication with a data processing framework.
Sitsky, David; Hill, Matthew Westwood; Power, Robin; Sheehy, Eddie; Stewart, Stephen, Systems and methods for scalable delocalized information governance.