[Paper] Research Trends in Privacy-Preserving Distributed Machine Learning

이민섭; 신영아; 천지영

doi:10.3745/tkips.2024.13.2.76

Systematic Research on Privacy-Preserving Distributed Machine Learning

The Transactions of the Korea Information Processing Society, v.13 no.2, 2024, pp.76 - 90

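이민섭 (Graduate School of Information Security, Korea University), 신영아 (Graduate School of Information Security, Korea University), 천지영 (Department of Big Data and Information Security, Seoul Cyber University)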

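Abstract (translated from the Korean)

Artificial intelligence is regarded as highly promising in fields such as smart cities, autonomous driving, and healthcare, but the use of models is limited by the risk of exposing data subjects' personal and sensitive information. In response, the concept of distributed machine learning emerged: rather than collecting data on a central server for training, each participant first trains on its own dataset, and a global model is then trained as the final step. However, distributed machine learning still faces data privacy threats while participants train collaboratively. This study analyzes research on privacy protection in distributed machine learning in an integrated way, using the classification criteria proposed to date, such as the presence or absence of a server, the distribution of the training datasets, and differences in participants' performance, to identify the latest research trends. In particular, it focuses on the representative distributed machine learning techniques of horizontal federated learning, vertical federated learning, and swarm learning, examines the privacy protection techniques they use, and then explores directions for future research.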

Abstract

Although artificial intelligence (AI) can be utilized in various domains such as smart cities and healthcare, its use is limited by concerns about the exposure of personal and sensitive information. In response, the concept of distributed machine learning has emerged, in which learning occurs locally before a global model is trained, avoiding the concentration of data on a central server. However, the overall learning phase, carried out collaboratively among multiple participants, still poses threats to data privacy. In this paper, we systematically analyze recent trends in privacy protection within distributed machine learning, considering factors such as the presence of a central server, the distribution environment of the training datasets, and performance variations among participants. In particular, we focus on key distributed machine learning techniques, including horizontal federated learning, vertical federated learning, and swarm learning. We examine the privacy protection mechanisms used within these techniques and explore potential directions for future research.
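
The "learn locally, then train a global model" idea described in the abstract can be illustrated with a small sketch. The code below is a minimal illustration in the spirit of federated averaging, not the algorithm as presented in the paper: the function names (`local_update`, `federated_average`, `run_rounds`), the toy linear model, the learning rate, and the sample data are all assumptions made for this example.

```python
from typing import List, Tuple

Weights = List[float]            # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one participant


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Hypothetical client step: gradient descent on mean squared error,
    using only this participant's local data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Hypothetical server step: average the client models,
    weighting each one by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(clients: List[Dataset], rounds: int = 20) -> Weights:
    """Repeat: broadcast the global model, train locally, aggregate."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two participants whose data both follow y = 2x + 1 (made-up values).
    client_a = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
    client_b = [(-2.0, -3.0), (2.0, 5.0)]
    print(run_rounds([client_a, client_b]))
```

In this sketch only model weights leave each participant, never the raw `(x, y)` pairs. As the abstract notes, that alone does not remove privacy threats during collaborative training, which is why the surveyed work layers additional protection mechanisms on top of schemes like this one.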

Tables/Figures (9)

Fig. 1. Functional Encryption
Fig. 2. Privacy Preserving Record Linkage
Fig. 3. Private Set Intersection
Fig. 4. Type of Network Topology
Fig. 5. Classification of Federated Learning
Fig. 6. Classification of Distribution Types in Training Datasets
Table 1. Horizontal Federated Learning
Table 2. FederatedAveraging Algorithm
Table 3. Vertical Federated Learning

