Disclosed are a method and an apparatus for de-identification of personal information. The method for de-identification of personal information comprises the steps of: obtaining, from a database, a raw table including records in which raw data indicating the personal information is recorded; generating generalized data by generalizing the raw data recorded in each of the records included in the raw table; setting a generalized hierarchical model consisting of the raw data and the generalized data; generating a raw lattice including a plurality of candidate nodes on the basis of the generalized hierarchical model; and setting, from among the plurality of candidate nodes included in the raw lattice, a final lattice including at least one candidate node satisfying a predetermined criterion. Thus, it is possible for the personal information to be efficiently de-identified.
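As an illustration of the lattice step described above, the following is a minimal sketch, not the patented implementation. It assumes each type of personal information has a fixed number of generalization levels (level 0 being the raw value) and treats every combination of levels as one candidate node. The function name `build_lattice` and the attribute names `age` and `zip` are hypothetical and do not come from the source.

```python
from itertools import product


def build_lattice(max_levels):
    """Enumerate candidate nodes and edges of a generalization lattice.

    max_levels maps a (hypothetical) attribute name to its highest
    generalization level; level 0 means the raw, ungeneralized value.
    """
    attrs = sorted(max_levels)
    # Each candidate node is a tuple holding one level per attribute.
    nodes = list(product(*(range(max_levels[a] + 1) for a in attrs)))
    edges = []
    for node in nodes:
        for i, attr in enumerate(attrs):
            if node[i] < max_levels[attr]:
                # Raising exactly one attribute by one level gives a direct successor.
                successor = node[:i] + (node[i] + 1,) + node[i + 1:]
                edges.append((node, successor))
    return attrs, nodes, edges


attrs, nodes, edges = build_lattice({"age": 2, "zip": 1})
```

Here the node `(1, 0)` would mean "age generalized to level 1, zip left at level 0". The final lattice is then whatever subset of these nodes meets the predetermined criterion.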
Representative Claims
1. A personal information de-identification method performed by a personal information de-identification apparatus, the method comprising: acquiring an original table including records in which original data indicating personal information is recorded from a database; classifying respective records included in the original table based on attributes of the respective records, wherein the respective records are classified as one of classes of identifier (ID), quasi-identifier (QI), sensitive attribute (SA), and insensitive attribute (IA); generalizing the original data recorded in the respective records included in the original table based on generalization levels; setting up a generalization hierarchy model composed of the original data and the generalized data; generating an original lattice including a plurality of candidate nodes indicating tables, which indicate generalization levels for types of personal information, based on a hierarchical structure indicated by the generalization hierarchy model; and setting up a final lattice including one or more candidate nodes which satisfy a preset requirement among the plurality of candidate nodes included in the original lattice.
2. The personal information de-identification method of claim 1, wherein the classifying respective records includes searching for personal information in the original table with regular expressions and setting up one of the classes for the respective records.
3. The personal information de-identification method of claim 1, wherein a de-identified table generated in the generalizing of the original data is generated based on K-anonymity, generated based on K-anonymity and L-diversity, or generated based on K-anonymity and T-closeness.
4. The personal information de-identification method of claim 3, wherein the preset requirement includes a preset suppression requirement, which indicates a ratio of equivalence classes which do not satisfy a preset K-anonymity to equivalence classes constituting the de-identified table.
5. The personal information de-identification method of claim 1, further comprising calculating a re-identification risk and a utility of a de-identified table corresponding to at least one final node included in the final lattice.
6. The personal information de-identification method of claim 1, further comprising masking some or all of original data in records indicated by the ID among the records included in the original table, or deleting original data in records indicated by the ID.
7. The personal information de-identification method of claim 1, wherein the setting up of the final lattice comprises: selecting one or more candidate nodes from among the plurality of candidate nodes included in the original lattice; generating de-identified tables by de-identifying the original table based on generalization levels indicated by the one or more candidate nodes; setting a candidate node corresponding to a de-identified table satisfying a preset suppression requirement to a final node; and setting up the final lattice including the final node corresponding to the candidate node satisfying the preset requirement.
8. The personal information de-identification method of claim 1, wherein the ID indicates an equivalence class including a record in which original data indicating personal information whereby a specific individual is explicitly identified is recorded, the QI indicates an equivalence class including a record in which original data indicating personal information whereby a specific individual is inexplicitly identified is recorded, the SA indicates an equivalence class including a record in which original data indicating personal information having a sensitivity of a preset reference value or higher is recorded, and the IA indicates an equivalence class including a record in which original data indicating personal information having a lower sensitivity than SA is recorded.
9. A personal information de-identification method performed by a personal information de-identification apparatus, the method comprising: acquiring an original table including records in which original data indicating personal information is recorded from a database; searching for personal information in the original table with regular expressions; setting up respective records included in the original table as one of classes of identifier (ID), quasi-identifier (QI), sensitive attribute (SA), and insensitive attribute (IA) based on attributes of the respective records according to results of the searching; and generalizing the original data recorded in the respective records included in the original table based on generalization levels.
10. The personal information de-identification method of claim 9, wherein a de-identified table generated in the generalizing the original data is generated based on K-anonymity, generated based on K-anonymity and L-diversity, or generated based on K-anonymity and T-closeness.
11. The personal information de-identification method of claim 10, wherein the preset requirement includes a preset suppression requirement, which indicates a ratio of equivalence classes which do not satisfy a preset K-anonymity to equivalence classes constituting the de-identified table.
12. The personal information de-identification method of claim 9, further comprising setting up a generalization hierarchy model composed of the original data and the generalized data.
13. The personal information de-identification method of claim 12, further comprising generating an original lattice including a plurality of candidate nodes indicating tables, which indicate generalization levels for types of personal information, based on a hierarchical structure indicated by the generalization hierarchy model.
14. The personal information de-identification method of claim 13, further comprising setting up a final lattice including one or more candidate nodes which satisfy a preset requirement among the plurality of candidate nodes included in the original lattice.
15. The personal information de-identification method of claim 9, further comprising calculating a re-identification risk and a utility of a de-identified table corresponding to at least one final node included in the final lattice.
16. The personal information de-identification method of claim 9, further comprising masking some or all of original data in records indicated by the ID among the records included in the original table, or deleting original data in records indicated by the ID.
17. The personal information de-identification method of claim 9, wherein the setting up of the final lattice comprises: selecting one or more candidate nodes from among the plurality of candidate nodes included in the original lattice; generating de-identified tables by de-identifying the original table based on generalization levels indicated by the one or more candidate nodes; setting a candidate node corresponding to a de-identified table satisfying a preset suppression requirement to a final node; and setting up the final lattice including the final node corresponding to the candidate node satisfying the preset requirement.
18. A personal information de-identification apparatus comprising: a processor; and a memory configured to store at least one command executed by the processor, wherein the at least one command is executable to: acquire an original table including records in which original data indicating personal information is recorded from a database; search for personal information in the original table on the basis of regular expressions; set up respective records included in the original table as one of classes of identifier (ID), quasi-identifier (QI), sensitive attribute (SA), and insensitive attribute (IA) based on attributes of the respective records according to results of the search; and generalize the original data recorded in the respective records included in the original table based on generalization levels.
19. The personal information de-identification apparatus of claim 18, wherein at least one command is further executable to: set up a generalization hierarchy model composed of the original data and the generalized data; generate an original lattice including a plurality of candidate nodes indicating tables, which indicate generalization levels for types of personal information, based on a hierarchical structure indicated by the generalization hierarchy model; and set up a final lattice including one or more candidate nodes which satisfy a preset requirement among the plurality of candidate nodes included in the original lattice.
20. The personal information de-identification apparatus of claim 18, wherein a de-identified table generated by generalizing the original data is generated based on K-anonymity, generated based on K-anonymity and L-diversity, or generated based on K-anonymity and T-closeness.
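The suppression requirement and final-lattice selection recited in the claims can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation. The suppression ratio is taken as the share of equivalence classes holding fewer than K records, and a candidate node becomes a final node when that ratio does not exceed a threshold. The hierarchies in `HIERARCHIES`, the sample rows, and the names `generalize`, `suppression_ratio`, and `final_lattice` are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical generalization hierarchies: one function per level, level 0 = raw value.
HIERARCHIES = {
    "age": [
        lambda v: str(v),                                # level 0: exact age
        lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",  # level 1: ten-year band
        lambda v: "*",                                   # level 2: fully generalized
    ],
    "zip": [
        lambda v: v,             # level 0: full code
        lambda v: v[:3] + "**",  # level 1: first three digits only
        lambda v: "*",           # level 2: fully generalized
    ],
}


def generalize(rows, node):
    """Apply the level chosen by `node` to every quasi-identifier of every row."""
    return [
        tuple(HIERARCHIES[attr][level](row[attr]) for attr, level in node.items())
        for row in rows
    ]


def suppression_ratio(rows, node, k):
    """Share of equivalence classes that contain fewer than k records."""
    classes = Counter(generalize(rows, node))
    too_small = sum(1 for size in classes.values() if size < k)
    return too_small / len(classes)


def final_lattice(rows, k, max_ratio):
    """Keep the candidate nodes whose suppression ratio is at most max_ratio."""
    attrs = list(HIERARCHIES)
    level_ranges = [range(len(HIERARCHIES[a])) for a in attrs]
    finals = []
    for levels in product(*level_ranges):
        node = dict(zip(attrs, levels))
        if suppression_ratio(rows, node, k) <= max_ratio:
            finals.append(levels)
    return finals


ROWS = [
    {"age": 23, "zip": "12345"},
    {"age": 27, "zip": "12399"},
    {"age": 31, "zip": "12345"},
    {"age": 38, "zip": "12377"},
]

FINAL_NODES = final_lattice(ROWS, k=2, max_ratio=0.0)
```

With these four sample rows, K = 2, and a threshold of 0, only nodes that generalize age to at least level 1 and zip to at least level 1 survive. A re-identification risk and a utility score could then be computed for each surviving node to choose among them.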