Managing lineage information includes: receiving lineage information representing one or more lineage relationships among two or more data processing programs and two or more logical datasets; receiving one or more runtime artifacts, each runtime artifact including information related to a previous execution of a data processing program of the two or more data processing programs; and analyzing the one or more runtime artifacts and the lineage information to determine one or more candidate modifications to the lineage information.
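One form of the analysis summarized above, which the claims spell out, is finding distinct logical datasets that resolved to the same physical dataset in earlier runs. The following is a minimal sketch of that idea, not the patented implementation. The `LogEntry` record, its field names, the `find_merge_candidates` function, and the shape of the returned candidates are all hypothetical and assume that each runtime log already records which physical dataset a logical dataset resolved to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical runtime artifact: one dataset access from a past execution."""
    program: str           # data processing program that ran
    logical_dataset: str   # name used in the lineage information
    physical_dataset: str  # what the logical name resolved to at run time
    mode: str              # "read" or "write"


def find_merge_candidates(log_entries):
    """Propose a candidate modification for each physical dataset that
    two or more distinct logical datasets resolved to."""
    logical_by_physical = defaultdict(set)
    for entry in log_entries:
        logical_by_physical[entry.physical_dataset].add(entry.logical_dataset)

    candidates = []
    for physical, logicals in sorted(logical_by_physical.items()):
        if len(logicals) > 1:
            candidates.append({
                "action": "merge_or_link",
                "physical_dataset": physical,
                "logical_datasets": sorted(logicals),
            })
    return candidates


logs = [
    LogEntry("prog_a", "out_a", "/data/x.dat", "write"),
    LogEntry("prog_b", "in_b", "/data/x.dat", "read"),
    LogEntry("prog_b", "out_b", "/data/y.dat", "write"),
]
candidates = find_merge_candidates(logs)
```

In this sketch the result is only a list of proposals. Whether a proposal is applied automatically or shown to a user for confirmation is left open, matching the two options the claims describe.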
Representative Claims
1. A method for managing lineage information, the method including: receiving lineage information representing one or more lineage relationships among two or more data processing programs and two or more logical datasets, wherein at least one of the logical datasets resolves to a physical dataset at run time of at least one of the data processing programs; receiving one or more runtime artifacts, each runtime artifact including information related to a previous execution of a data processing program of the two or more data processing programs; and analyzing the one or more runtime artifacts and the lineage information and determining one or more candidate modifications to the lineage information based on results of the analyzing, wherein at least one candidate modification includes a modification to a representation of at least one of the two or more logical datasets based at least in part on a result of the analyzing, wherein the result is associated with one or more physical datasets.
2. The method of claim 1 wherein the one or more candidate modifications include a candidate modification that adds a new indirect lineage relationship between a data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
3. The method of claim 1 wherein the one or more candidate modifications include a first candidate modification that adds a new direct lineage relationship between a data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
4. The method of claim 3 wherein analyzing the runtime artifacts and the lineage information includes analyzing logs of previous executions of the two or more data processing programs to determine physical datasets read from or written to by the two or more data processing programs.
5. The method of claim 4 wherein analyzing the runtime artifacts and the lineage information further includes identifying two distinct logical datasets of the two or more logical datasets that are represented in the lineage information and are associated with the same physical dataset.
6. The method of claim 5 wherein the first candidate modification includes creation of the new lineage relationship between the two distinct logical datasets.
7. The method of claim 5 wherein the first candidate modification includes creation of the new lineage relationship including merging the two distinct logical datasets into a new combined logical dataset.
8. The method of claim 1 wherein each data processing program of the two or more data processing programs is an instance of a generic data processing program instantiated according to a set of one or more parameter values.
9. The method of claim 8 wherein analyzing the one or more runtime artifacts and the lineage information includes: analyzing one or more logs of previous executions of a first data processing program of the two or more data processing programs to determine a first parameter set used in a first instantiation of the first data processing program according to a first set of one or more parameter values, selecting at least some parameters from the first parameter set, and determining that the first instantiation of the first data processing program is not represented in the lineage information based on a generic version of the first data processing program and the at least some parameters.
10. The method of claim 9 wherein selecting at least some parameters from the first parameter set includes selecting parameters based on information received from a user.
11. The method of claim 9 wherein selecting at least some parameters from the first parameter set includes selecting parameters based on one or more predefined rules.
12. The method of claim 11 wherein a first rule of the one or more predefined rules specifies that parameters with parameter values in the form of a date are excluded from the selected parameters.
13. The method of claim 11 wherein a first rule of the one or more predefined rules specifies that a parameter with a parameter value that is transformed in the logic of a generic data processing program is included in the selected parameters.
14. The method of claim 9 wherein the one or more candidate modifications to the lineage information includes a first candidate modification that adds a new lineage relationship between the first data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
15. The method of claim 1, wherein the results of the analyzing include identification of at least one physical dataset to which at least one logical dataset resolved at run time of at least one data processing program.
16. The method of claim 1, further including applying a selected one of the one or more candidate modifications to the lineage information.
17. The method of claim 16 wherein the selected candidate modification is selected and applied to the lineage information automatically by a computing system performing the analyzing.
18. The method of claim 16 wherein the selected candidate modification is selected based at least in part on user input received after presenting one or more of the candidate modifications.
19. The method of claim 1 wherein the lineage relationships include a first lineage relationship representing a first data processing program of the two or more data processing programs receiving first data from a first logical dataset of the two or more logical datasets, a second lineage relationship representing a transfer of second data between two data processing programs of the two or more data processing programs, and a third lineage relationship representing a second logical dataset of the two or more logical datasets storing third data received from a second data processing program of the two or more data processing programs.
20. A non-transitory computer-readable medium storing software for managing lineage information, the software including instructions for causing a computing system to: receive lineage information representing one or more lineage relationships among two or more data processing programs and two or more logical datasets, wherein at least one of the logical datasets resolves to a physical dataset at run time of at least one of the data processing programs; receive one or more runtime artifacts, each runtime artifact including information related to a previous execution of a data processing program of the two or more data processing programs; and analyze the one or more runtime artifacts and the lineage information and determining one or more candidate modifications to the lineage information based on results of the analyzing, wherein at least one candidate modification includes a modification to a representation of at least one of the two or more logical datasets based at least in part on a result of the analyzing, wherein the result is associated with one or more physical datasets.
21. The non-transitory computer-readable medium of claim 20 wherein the one or more candidate modifications include a first candidate modification that adds a new direct lineage relationship between a data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
22. The non-transitory computer-readable medium of claim 21 wherein analyzing the runtime artifacts and the lineage information includes analyzing logs of previous executions of the two or more data processing programs to determine physical datasets read from or written to by the two or more data processing programs.
23. The non-transitory computer-readable medium of claim 22 wherein analyzing the runtime artifacts and the lineage information further includes identifying two distinct logical datasets of the two or more logical datasets that are represented in the lineage information and are associated with the same physical dataset.
24. The non-transitory computer-readable medium of claim 23 wherein the first candidate modification includes creation of the new lineage relationship between the two distinct logical datasets.
25. The non-transitory computer-readable medium of claim 23 wherein the first candidate modification includes creation of the new lineage relationship including merging the two distinct logical datasets into a new combined logical dataset.
26. The non-transitory computer-readable medium of claim 20 wherein each data processing program of the two or more data processing programs is an instance of a generic data processing program instantiated according to a set of one or more parameter values.
27. The non-transitory computer-readable medium of claim 26 wherein analyzing the one or more runtime artifacts and the lineage information includes: analyzing one or more logs of previous executions of a first data processing program of the two or more data processing programs to determine a first parameter set used in a first instantiation of the first data processing program according to a first set of one or more parameter values, selecting at least some parameters from the first parameter set, and determining that the first instantiation of the first data processing program is not represented in the lineage information based on a generic version of the first data processing program and the at least some parameters.
28. The non-transitory computer-readable medium of claim 27 wherein the one or more candidate modifications to the lineage information includes a first candidate modification that adds a new lineage relationship between the first data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
29. The non-transitory computer-readable medium of claim 20, wherein the results of the analyzing include identification of at least one physical dataset to which at least one logical dataset resolved at run time of at least one data processing program.
30. The non-transitory computer-readable medium of claim 20 wherein the lineage relationships include a first lineage relationship representing a first data processing program of the two or more data processing programs receiving first data from a first logical dataset of the two or more logical datasets, a second lineage relationship representing a transfer of second data between two data processing programs of the two or more data processing programs, and a third lineage relationship representing a second logical dataset of the two or more logical datasets storing third data received from a second data processing program of the two or more data processing programs.
31. A computing system for managing lineage information, the computing system including: an input device or port configured to receive lineage information representing one or more lineage relationships among two or more data processing programs and two or more logical datasets, and one or more runtime artifacts, each runtime artifact including information related to a previous execution of a data processing program of the two or more data processing programs, wherein at least one of the logical datasets resolves to a physical dataset at run time of at least one of the data processing programs; and at least one processor configured to analyze the one or more runtime artifacts and the lineage information and determining one or more candidate modifications to the lineage information based on results of the analyzing, wherein at least one candidate modification includes a modification to a representation of at least one of the two or more logical datasets based at least in part on a result of the analyzing, wherein the result is associated with one or more physical datasets.
32. The computing system of claim 31 wherein the one or more candidate modifications include a first candidate modification that adds a new direct lineage relationship between a data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
33. The computing system of claim 32 wherein analyzing the runtime artifacts and the lineage information includes analyzing logs of previous executions of the two or more data processing programs to determine physical datasets read from or written to by the two or more data processing programs.
34. The computing system of claim 33 wherein analyzing the runtime artifacts and the lineage information further includes identifying two distinct logical datasets of the two or more logical datasets that are represented in the lineage information and are associated with the same physical dataset.
35. The computing system of claim 34 wherein the first candidate modification includes creation of the new lineage relationship between the two distinct logical datasets.
36. The computing system of claim 34 wherein the first candidate modification includes creation of the new lineage relationship including merging the two distinct logical datasets into a new combined logical dataset.
37. The computing system of claim 31 wherein each data processing program of the two or more data processing programs is an instance of a generic data processing program instantiated according to a set of one or more parameter values.
38. The computing system of claim 37 wherein analyzing the one or more runtime artifacts and the lineage information includes: analyzing one or more logs of previous executions of a first data processing program of the two or more data processing programs to determine a first parameter set used in a first instantiation of the first data processing program according to a first set of one or more parameter values, selecting at least some parameters from the first parameter set, and determining that the first instantiation of the first data processing program is not represented in the lineage information based on a generic version of the first data processing program and the at least some parameters.
39. The computing system of claim 38 wherein the one or more candidate modifications to the lineage information includes a first candidate modification that adds a new lineage relationship between the first data processing program of the two or more data processing programs and a logical dataset of the two or more logical datasets.
40. The computing system of claim 31, wherein the results of the analyzing include identification of at least one physical dataset to which at least one logical dataset resolved at run time of at least one data processing program.
41. The computing system of claim 31 wherein the lineage relationships include a first lineage relationship representing a first data processing program of the two or more data processing programs receiving first data from a first logical dataset of the two or more logical datasets, a second lineage relationship representing a transfer of second data between two data processing programs of the two or more data processing programs, and a third lineage relationship representing a second logical dataset of the two or more logical datasets storing third data received from a second data processing program of the two or more data processing programs.
42. A computing system for managing lineage information, the computing system including: means for receiving lineage information representing one or more lineage relationships among two or more data processing programs and two or more logical datasets, and one or more runtime artifacts, each runtime artifact including information related to a previous execution of a data processing program of the two or more data processing programs, wherein at least one of the logical datasets resolves to a physical dataset at run time of at least one of the data processing programs; and means for analyzing the one or more runtime artifacts and the lineage information and determining one or more candidate modifications to the lineage information based on results of the analyzing, wherein at least one candidate modification includes a modification to a representation of at least one of the two or more logical datasets based at least in part on a result of the analyzing, wherein the result is associated with one or more physical datasets.
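The parameter-based analysis in the claims covering generic programs (selecting parameters from a logged parameter set, excluding date-valued parameters by rule, and checking whether the resulting instance is already represented) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the ISO `YYYY-MM-DD` date pattern, and the representation of an instance as a (generic program, selected parameters) pair are all hypothetical.

```python
import re

# Assumption: date-valued parameters look like ISO dates (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def select_parameters(params):
    """Apply one predefined rule: drop parameters whose value is a date."""
    return frozenset(
        (name, value)
        for name, value in params.items()
        if not DATE_PATTERN.fullmatch(value)
    )


def instance_key(generic_program, params):
    """Identify an instance by its generic program plus selected parameters."""
    return (generic_program, select_parameters(params))


def find_unrepresented(known_instances, logged_runs):
    """Return instance keys seen in logs but absent from the lineage information."""
    seen = set(known_instances)
    candidates = []
    for generic_program, params in logged_runs:
        key = instance_key(generic_program, params)
        if key not in seen:
            seen.add(key)  # report each new instance once
            candidates.append(key)
    return candidates


known = {instance_key("load", {"region": "us", "run_date": "2020-01-01"})}
runs = [
    ("load", {"region": "us", "run_date": "2020-01-02"}),
    ("load", {"region": "eu", "run_date": "2020-01-02"}),
    ("load", {"region": "eu", "run_date": "2020-01-03"}),
]
new_instances = find_unrepresented(known, runs)
```

Because the run date is excluded, the daily runs for region `us` collapse onto the instance already in the lineage information, while the `eu` runs yield a single new instance that could become a candidate modification.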