A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script of information processing commands sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When a predefined threshold percentage of the data records has been completed, the method assigns each group that is not yet completed to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
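The flow described in the abstract can be sketched as a small sequential simulation. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names (`partition`, `process_group`, `groups_needing_backup`, `aggregate_once`, `run_job`), the word-count "script", and the `stragglers` parameter are all hypothetical, and the parallel processes are modeled as ordinary loop iterations so the behavior stays deterministic.

```python
from collections import Counter

def partition(records, group_size):
    """Split the records into consecutive groups of at most group_size."""
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]

def process_group(group):
    """Stand-in for the multi-step script: extract words, normalize, count."""
    table = Counter()                     # intermediate data structure
    for record in group:
        for word in record.split():       # step 1: extract
            table[word.lower()] += 1      # step 2: normalize, step 3: store
    return table

def groups_needing_backup(groups, completed, threshold_percent):
    """Return ids of unfinished groups once the record threshold is reached."""
    total = sum(len(g) for g in groups)
    done = sum(len(groups[gid]) for gid in completed)
    if done * 100 >= threshold_percent * total:
        return [gid for gid in range(len(groups)) if gid not in completed]
    return []

def aggregate_once(results):
    """Merge intermediate tables, counting each group id only once."""
    seen, output = set(), Counter()
    for gid, table in results:
        if gid not in seen:               # skip the duplicate of a primary/backup pair
            seen.add(gid)
            output.update(table)
    return output

def run_job(records, group_size, threshold_percent, stragglers):
    """Simulate the job; `stragglers` holds ids of groups whose primary is slow."""
    groups = partition(records, group_size)
    results, completed = [], set()
    # Primary pass: every non-straggling group finishes and reports completion.
    for gid, group in enumerate(groups):
        if gid not in stragglers:
            results.append((gid, process_group(group)))
            completed.add(gid)
    # Threshold reached: hand each unfinished group to a backup process.
    backups = groups_needing_backup(groups, completed, threshold_percent)
    for gid in backups:
        results.append((gid, process_group(groups[gid])))
        completed.add(gid)
    # Slow primaries that eventually finish contribute duplicate results.
    for gid in stragglers:
        results.append((gid, process_group(groups[gid])))
        completed.add(gid)
    return aggregate_once(results), backups
```

Calling `run_job(["a b", "b c", "a", "c c", "b"], 1, 80, {4})` models five single-record groups in which the primary process for group 4 is slow; once 80 percent of the records are complete, group 4 receives a backup, and its table is still merged only once during aggregation.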
Representative Claim
1. A computer-implemented method of processing a plurality of data records, performed on a system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the computer-implemented method, comprising: partitioning the plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes; executing the first plurality of processes in parallel, wherein for each group the assigned process: extracts information from the data records in the group; applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values; stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and updates a status of the group to indicate completion; determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records; when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes; when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
2. The computer-implemented method of claim 1, wherein the predetermined value is 90 percent of the first plurality of data records.
3. The computer-implemented method of claim 1, wherein the predetermined value is 95 percent of the first plurality of data records.
4. The computer-implemented method of claim 1, wherein the predetermined value is 99 percent of the first plurality of data records.
5. The computer-implemented method of claim 1, wherein an intermediate data structure in the plurality of intermediate data structures is a table having a plurality of indices.
6. The computer-implemented method of claim 1, wherein a data record in the plurality of data records comprises a log file, a transaction record or a document.
7. The computer-implemented method of claim 1, wherein a respective intermediate data structure in the plurality of intermediate data structures is a table having a plurality of indices, wherein at least a subset of the plurality of indices is dynamically generated when one or more values are stored in the respective intermediate data structure.
8. The computer-implemented method of claim 1, wherein an intermediate data structure in the plurality of intermediate data structures is a table, the computer-implemented method further comprising initializing the table, wherein the table comprises a plurality of indices that are statically generated when the table is initialized.
9. The computer-implemented method of claim 1, wherein a first process in the first plurality of processes generates a first intermediate data structure in the plurality of intermediate data structures, the first intermediate data structure having a first key, a second process in the first plurality of processes generates a second intermediate data structure in the plurality of intermediate data structures, the second intermediate data structure having a second key, wherein the first key and the second key are the same, and a process in the second plurality of processes aggregates (i) values from the first intermediate data structure indexed to the first key and (ii) values from the second intermediate data structure indexed to the second key to produce all or a portion of the output data.
10. A system for processing a plurality of data records, comprising: one or more processors; and memory storing one or more programs to be executed by the one or more processors; the one or more programs comprising instructions for: partitioning the plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes; executing the first plurality of processes in parallel, wherein for each group the assigned process: extracts information from the data records in the group; applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values; stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and updates a status of the group to indicate completion; determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records; when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes; when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
11. The system of claim 10, wherein an intermediate data structure in the plurality of intermediate data structures is a table having a plurality of indices.
12. The system of claim 10, wherein a data record in the plurality of data records comprises a log file, a transaction record or a document.
13. The system of claim 10, wherein a respective intermediate data structure in the plurality of intermediate data structures is a table having a plurality of indices, wherein at least a subset of the plurality of indices is dynamically generated when one or more values are stored in the respective intermediate data structure.
14. The system of claim 10, wherein an intermediate data structure in the plurality of intermediate data structures is a table, the computer-implemented method further comprising initializing the table, wherein the table comprises a plurality of indices that are statically generated when the table is initialized.
15. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for: partitioning a plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes; executing the first plurality of processes in parallel, wherein for each group the assigned process: extracts information from the data records in the group; applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values; stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and updates a status of the group to indicate completion; determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records; when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes; when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
16. The non-transitory computer readable storage medium of claim 15, wherein an intermediate data structure in the plurality of intermediate data structures is a table having a plurality of indices.
17. The non-transitory computer readable storage medium of claim 15, wherein a data record in the plurality of data records comprises a log file, a transaction record or a document.
18. The non-transitory computer readable storage medium of claim 15, wherein a respective intermediate data structure in the plurality of intermediate data structures is a table having a plurality of indices, wherein at least a subset of the plurality of indices is dynamically generated when one or more values are stored in the respective intermediate data structure.
19. The non-transitory computer readable storage medium of claim 15, wherein an intermediate data structure in the plurality of intermediate data structures is a table, the computer-implemented method further comprising initializing the table, wherein the table comprises a plurality of indices that are statically generated when the table is initialized.
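The dependent claims on intermediate tables distinguish indices that are generated dynamically as values are stored from indices that are generated statically at initialization, and describe a second-phase process that combines values stored under the same key in two tables. The sketch below illustrates that distinction only; the helper names (`make_dynamic_table`, `make_static_table`, `emit`, `aggregate_key`) are hypothetical, and summation is an assumed aggregation, since the claims do not fix a particular operator.

```python
from collections import defaultdict

def make_dynamic_table():
    """Indices (keys) come into existence when a value is first stored."""
    return defaultdict(int)

def make_static_table(indices):
    """All indices are fixed when the table is initialized."""
    return {index: 0 for index in indices}

def emit(table, index, value):
    """Add a value to a table entry; static tables reject unknown indices."""
    if not isinstance(table, defaultdict) and index not in table:
        raise KeyError(index)
    table[index] += value

def aggregate_key(tables, key):
    """Second-phase step: combine the values stored under one key in every table."""
    # .get() is used so that reading a missing key does not create a new index.
    return sum(table.get(key, 0) for table in tables)
```

For example, if one process stores `emit(t1, "error", 2)` in a dynamic table and another stores `emit(t2, "error", 3)` in a static table initialized with the indices `["error", "info"]`, then `aggregate_key([t1, t2], "error")` combines both entries into a single output value.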