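IPC Classification Information
Country / Type: United States (US) Patent, Registered
International Patent Classification (IPC, 7th edition):
Application Number: US-0494449 (2012-06-12)
Registration Number: US-9659042 (2017-05-23)
Inventors / Address:
- Puri, Colin A.
- Kim, Doo Soon
- Yeh, Peter Z.
- Verma, Kunal
Applicant / Address:
- ACCENTURE GLOBAL SERVICES LIMITED
Agent / Address:
Citation Information: Times cited: 5; Patents cited: 4
Abstract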
A data lineage tracking system may include a memory storing a module comprising machine readable instructions to obtain trace log entries representing an interaction with, a manipulation of, and/or a creation of a data value. The data lineage tracking system may further include machine readable instructions to select the trace log entries that are associated with commands performed by an application, cluster similar trace log entries from the selected trace log entries, and analyze mappings between the clustered trace log entries to determine data lineage flow associated with the data value.
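The selection and clustering steps named in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TraceEntry` record, its field names, and the choice to group entries by an exact match on command type, table name, and attribute name are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class TraceEntry(NamedTuple):
    """Hypothetical trace log record; all field names are assumptions."""
    app: str          # application that issued the command
    command: str      # command type, e.g. "INSERT" or "UPDATE"
    table: str        # table name the command touched
    attribute: str    # attribute (column) name the command touched
    timestamp: float  # time of the command, in seconds


def select_entries(entries, app):
    """Keep only the entries whose command was performed by `app`."""
    return [e for e in entries if e.app == app]


def cluster_entries(entries):
    """Group entries that share command type, table name, and attribute name.

    Exact-key grouping is an assumed similarity rule chosen for simplicity.
    """
    clusters = defaultdict(list)
    for e in entries:
        clusters[(e.command, e.table, e.attribute)].append(e)
    return dict(clusters)
```

Each resulting cluster holds the timestamps of one recurring kind of command, which is the input the later entropy analysis needs.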
Representative Claims
1. A data lineage tracking system comprising:
  a processor; and
  a memory storing machine readable instructions that when executed by the processor cause the processor to:
    obtain trace log entries representing at least one of an interaction with a data value, a manipulation of the data value, and a creation of the data value;
    select, from the obtained trace log entries, trace log entries that are associated with commands performed by an application;
    cluster similar trace log entries from the selected trace log entries;
    measure variability of temporal differences between the trace log entries in cluster pairs by calculating entropy of the temporal differences to determine a consistency of the temporal differences, wherein
      the entropy represents a measure of uncertainty associated with the temporal differences,
      a relatively high entropy score represents a high variation in the temporal differences, and
      a relatively low entropy score represents a low variation in the temporal differences;
    map a command-timestamp pair, (s1, t1), for a cluster c1 to another command-timestamp pair, (s2, t2), for a cluster c2, when there does not exist a s1′ in cluster c1 such that |t1′−t2|<|t1−t2|, and there does not exist a s2′ in cluster c2 such that |t1′−t2|<|t1−t2|, wherein the s1 is a trace log entry command from the cluster c1 and the t1 is a timestamp for the trace log entry command s1, the s1′ is a trace log entry command from the cluster c1 and the t1′ is a timestamp for the trace log entry command s1′, the s2 is a trace log entry command from the cluster c2 and the t2 is a timestamp for the trace log entry command s2, and the s2′ is a trace log entry command from the cluster c2;
    analyze the mappings between the clustered trace log entries to determine data lineage flow associated with the data value by
      identifying each cluster of a plurality of clusters for which an entropy falls below a predetermined entropy threshold, wherein entropies below the predetermined entropy threshold represent a low entropy, and
      constructing a cluster chain including clusters with the low entropies to generate the data lineage flow;
    determine data value lineage by
      determining a first command associated with at least one of an interaction with, a manipulation of, and a creation of the data value,
      determining a second command associated with at least one of an interaction with and a manipulation of the data value, and
      linking the second command to the first command;
    determine, based on the data value lineage associated with the data value, whether the data value is authentic; and
    in response to a determination that the data value is authentic, generate, based on the data value, a report with respect to different systems associated with the data value and the application.

2. The data lineage tracking system of claim 1, wherein the similar trace log entries are clustered based on at least one of a command type, a table name, and an attribute name.

3. The data lineage tracking system of claim 1, wherein the machine readable instructions to determine the data value lineage further comprise machine readable instructions that when executed by the processor further cause the processor to: link the second command to the first command by setting a reference value for the second command to a unique identification (ID) for the first command.

4. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: determine a reason for a command of the commands based on an analysis of an asset, a resource and the application registered with the data lineage tracking system, wherein the reason for the command is based on a historical analysis of interactions with the asset, the resource and the application.

5. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: identify an anomaly in the data value lineage based on a determination of whether a change in the data value exceeds a predetermined percentage.

6. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: generate a graph illustrating the data lineage flow identifying at least one of an asset, a resource and the application that have interacted with the data value.

7. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: receive calls from data sources, wherein the calls include structured query language (SQL) queries and NoSQL inserts and updates.

8. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: poll data sources for structured query language (SQL) queries and NoSQL inserts and updates.

9. A data lineage tracking system comprising:
  a processor; and
  a memory storing machine readable instructions that when executed by the processor cause the processor to:
    obtain trace log entries representing at least one of an interaction with a data value, a manipulation of the data value, and a creation of the data value;
    select, from the obtained trace log entries, trace log entries that are associated with commands performed by an application;
    cluster similar trace log entries from the selected trace log entries;
    measure variability of temporal differences between the trace log entries in cluster pairs by calculating entropy of the temporal differences to determine a consistency of the temporal differences, wherein
      the entropy represents a measure of uncertainty associated with the temporal differences,
      a relatively high entropy score represents a high variation in the temporal differences, and
      a relatively low entropy score represents a low variation in the temporal differences;
    map a command-timestamp pair, (s1, t1), for a cluster c1 to another command-timestamp pair, (s2, t2), for a cluster c2, when there does not exist a s1′ in cluster c1 such that |t1′−t2|<|t1−t2|, and there does not exist a s2′ in cluster c2 such that |t1′−t2|<|t1−t2|, wherein the s1 is a trace log entry command from the cluster c1 and the t1 is a timestamp for the trace log entry command s1, the s1′ is a trace log entry command from the cluster c1 and the t1′ is a timestamp for the trace log entry command s1′, the s2 is a trace log entry command from the cluster c2 and the t2 is a timestamp for the trace log entry command s2, and the s2′ is a trace log entry command from the cluster c2;
    analyze the mappings between the clustered trace log entries to determine data lineage flow associated with the data value by
      identifying each cluster of a plurality of clusters for which an entropy falls below a predetermined entropy threshold, wherein entropies below the predetermined entropy threshold represent a low entropy, and
      constructing a cluster chain including clusters with the low entropies to generate the data lineage flow;
    determine data value lineage by
      determining a first command associated with at least one of an interaction with, a manipulation of, and a creation of the data value,
      determining a second command associated with at least one of an interaction with and a manipulation of the data value, and
      linking the second command to the first command by setting a reference value for the second command to a unique identification (ID) for the first command;
    determine, based on the data value lineage associated with the data value, whether the data value is authentic; and
    in response to a determination that the data value is authentic, generate, based on the data value, a report with respect to different systems associated with the data value and the application.

10. The data lineage tracking system of claim 9, further comprising machine readable instructions that when executed by the processor further cause the processor to: identify an anomaly in the data value lineage based on a determination of whether a change in the data value exceeds a predetermined percentage.

11. The data lineage tracking system of claim 9, wherein the similar trace log entries are clustered based on at least one of a command type, a table name, and an attribute name.

12. The data lineage tracking system of claim 9, further comprising machine readable instructions that when executed by the processor further cause the processor to: generate a graph illustrating the data lineage flow identifying at least one of an asset, a resource and the application that have interacted with the data value.

13. The data lineage tracking system of claim 9, further comprising machine readable instructions that when executed by the processor further cause the processor to: receive calls from data sources, wherein the calls include structured query language (SQL) queries and NoSQL inserts and updates.

14. The data lineage tracking system of claim 9, further comprising machine readable instructions that when executed by the processor further cause the processor to: poll data sources for structured query language (SQL) queries and NoSQL inserts and updates.

15. A method for data lineage tracking, the method comprising:
  obtaining trace log entries representing at least one of an interaction with, a manipulation of, and a creation of a data value;
  selecting, from the obtained trace log entries, trace log entries that are associated with commands performed by an application;
  clustering similar trace log entries from the selected trace log entries;
  measuring variability of temporal differences between the trace log entries in cluster pairs by calculating entropy of the temporal differences to determine a consistency of the temporal differences, wherein
    the entropy represents a measure of uncertainty associated with the temporal differences,
    a relatively high entropy score represents a high variation in the temporal differences, and
    a relatively low entropy score represents a low variation in the temporal differences;
  mapping a command-timestamp pair, (s1, t1), for a cluster c1 to another command-timestamp pair, (s2, t2), for a cluster c2, when there does not exist a s1′ in cluster c1 such that |t1′−t2|<|t1−t2|, and there does not exist a s2′ in cluster c2 such that |t1′−t2|<|t1−t2|, wherein the s1 is a trace log entry command from the cluster c1 and the t1 is a timestamp for the trace log entry command s1, the s1′ is a trace log entry command from the cluster c1 and the t1′ is a timestamp for the trace log entry command s1′, the s2 is a trace log entry command from the cluster c2 and the t2 is a timestamp for the trace log entry command s2, and the s2′ is a trace log entry command from the cluster c2;
  analyzing, by a processor, the mappings between the clustered trace log entries to determine data lineage flow associated with the data value by
    identifying each cluster of a plurality of clusters for which an entropy falls below a predetermined entropy threshold, wherein entropies below the predetermined entropy threshold represent a low entropy, and
    constructing a cluster chain including clusters with the low entropies to generate the data lineage flow;
  determining a reason for a command of the commands based on an analysis of at least one of an asset, a resource and the application that performs the commands;
  determining data value lineage by
    determining a first command associated with at least one of an interaction with, a manipulation of, and a creation of the data value,
    determining a second command associated with at least one of an interaction with and a manipulation of the data value, and
    linking the second command to the first command;
  determining, based on the data value lineage associated with the data value, whether the data value is authentic; and
  in response to a determination that the data value is authentic, generating, based on the data value, a report with respect to different systems associated with the data value and the application.

16. The method of claim 15, wherein linking the second command to the first command further comprises: linking the second command to the first command by setting a reference value for the second command to a unique identification (ID) for the first command.

17. The method of claim 16, further comprising: identifying an anomaly in the data value lineage based on a determination of whether a change in the data value exceeds a predetermined percentage.

18. A non-transitory computer readable medium having stored thereon machine readable instructions for data lineage tracking, the machine readable instructions when executed cause a computer system to:
  obtain trace log entries representing at least one of an interaction with, a manipulation of, and a creation of a data value;
  select, from the obtained trace log entries, trace log entries that are associated with commands performed by an application;
  cluster similar trace log entries from the selected trace log entries;
  measure variability of temporal differences between the trace log entries in cluster pairs by calculating entropy of the temporal differences to determine a consistency of the temporal differences, wherein
    the entropy represents a measure of uncertainty associated with the temporal differences,
    a relatively high entropy score represents a high variation in the temporal differences, and
    a relatively low entropy score represents a low variation in the temporal differences;
  map a command-timestamp pair, (s1, t1), for a cluster c1 to another command-timestamp pair, (s2, t2), for a cluster c2, when there does not exist a s1′ in cluster c1 such that |t1′−t2|<|t1−t2|, and there does not exist a s2′ in cluster c2 such that |t1′−t2|<|t1−t2|, wherein the s1 is a trace log entry command from the cluster c1 and the t1 is a timestamp for the trace log entry command s1, the s1′ is a trace log entry command from the cluster c1 and the t1′ is a timestamp for the trace log entry command s1′, the s2 is a trace log entry command from the cluster c2 and the t2 is a timestamp for the trace log entry command s2, and the s2′ is a trace log entry command from the cluster c2;
  analyze, by a processor, the mappings between the clustered trace log entries to determine data lineage flow associated with the data value by
    identifying each cluster of a plurality of clusters for which an entropy falls below a predetermined entropy threshold, wherein entropies below the predetermined entropy threshold represent a low entropy, and
    constructing a cluster chain including clusters with the low entropies to generate the data lineage flow;
  determine a reason for a command of the commands based on an analysis of at least one of an asset, a resource and the application that performs the commands;
  determine data value lineage by
    determining a first command associated with at least one of an interaction with, a manipulation of, and a creation of the data value,
    determining a second command associated with at least one of an interaction with and a manipulation of the data value, and
    linking the second command to the first command;
  determine, based on the data value lineage associated with the data value, whether the data value is authentic; and
  in response to a determination that the data value is authentic, generate, based on the data value, a report with respect to different systems associated with the data value and the application.

19. The non-transitory computer readable medium of claim 18, wherein the similar trace log entries are clustered based on at least one of a command type, a table name, and an attribute name.

20. The non-transitory computer readable medium of claim 18, further comprising machine readable instructions that when executed cause the computer system to: identify an anomaly in the data value lineage based on a determination of whether a change in the data value exceeds a predetermined percentage.
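The mapping and entropy steps recited in the independent claims can be illustrated with a short sketch. It rests on several assumptions that are not in the claims: clusters are reduced to plain lists of timestamps, the entropy is taken as Shannon entropy over time differences bucketed into bins of a hypothetical `bin_width`, and the second mapping condition is read as |t1−t2′|<|t1−t2| (a mutual-nearest-timestamp rule), although the claim text as printed repeats |t1′−t2|. The function names are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def map_pairs(c1_times, c2_times):
    """Pair t1 with t2 when neither cluster holds a timestamp closer to the other.

    Assumed reading of the claim: no t1' in c1 is closer to t2 than t1 is,
    and no t2' in c2 is closer to t1 than t2 is. Brute force, for clarity.
    """
    pairs = []
    for t1 in c1_times:
        for t2 in c2_times:
            d = abs(t1 - t2)
            no_closer_in_c1 = all(abs(u - t2) >= d for u in c1_times)
            no_closer_in_c2 = all(abs(t1 - v) >= d for v in c2_times)
            if no_closer_in_c1 and no_closer_in_c2:
                pairs.append((t1, t2))
    return pairs


def entropy_of_differences(pairs, bin_width=1.0):
    """Shannon entropy (bits) of the binned differences t2 - t1.

    Binning is an assumption; an empty mapping is treated as maximally
    uncertain so that it never counts as a low-entropy link.
    """
    if not pairs:
        return math.inf
    bins = Counter(math.floor((t2 - t1) / bin_width) for t1, t2 in pairs)
    n = len(pairs)
    return -sum((k / n) * math.log2(k / n) for k in bins.values())


def low_entropy_links(clusters, threshold, bin_width=1.0):
    """Return (name_a, name_b, entropy) for cluster pairs below the threshold.

    `clusters` maps a cluster name to its list of timestamps. The links are
    the raw material for a cluster chain; chain ordering is not shown here.
    """
    links = []
    for a, b in combinations(clusters, 2):
        h = entropy_of_differences(map_pairs(clusters[a], clusters[b]), bin_width)
        if h < threshold:
            links.append((a, b, h))
    return links
```

Under these assumptions, two clusters whose commands always occur a near-constant interval apart yield differences that fall into one bin and an entropy of zero, so they are linked, while clusters with erratic relative timing spread across many bins and are dropped.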