Wang, Jinpeng (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China)
Wang, Yang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China)
Wang, Hekang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China)
Ye, Kejiang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China)
Xu, Chengzhong (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China)
He, Shuibing (Zhejiang University, China)
Zeng, Lingfang (Huazhong University of Science and Technology, China)
In this paper, we design an efficient deduplication algorithm based on the distributed storage architecture of Ceph. The algorithm uses online block-level data deduplication to slice the data, which neither affects the data storage process in Ceph nor alters other interfaces and functions in Ceph. Without relying on any central node, the algorithm preserves the characteristics of Ceph by designing a special hash object to store the data fingerprint, and uses the CRUSH algorithm to detect duplicate data by computation rather than by global search. The algorithm replaces duplicate data with deduplicated objects, which store their fingerprints in less storage space. We compare the effects of different block sizes on performance and deduplication rate through experimental studies, and select the most appropriate block size for our prototype implementation. The experimental results show that the algorithm not only saves storage space effectively but also improves bandwidth utilization when reading and writing duplicate data.
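The core idea described above, locating a possible duplicate by computing its placement from the fingerprint instead of searching a global index, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use Ceph or the real CRUSH algorithm: the class `DedupStore`, the modulo-based `_place` function (a simple stand-in for CRUSH), the fixed-size slicing, the choice of SHA-256, and the tiny block size are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, deliberately tiny block size for the demo


class DedupStore:
    """Toy block-level deduplicating store (illustrative only, not Ceph)."""

    def __init__(self, num_nodes=4, block_size=BLOCK_SIZE):
        self.block_size = block_size
        # One dict per simulated storage node:
        # fingerprint -> [block bytes, reference count]
        self.nodes = [dict() for _ in range(num_nodes)]
        # Object name -> ordered list of block fingerprints
        self.manifests = {}

    def _place(self, fingerprint):
        # Assumption: a modulo on the fingerprint stands in for CRUSH.
        # The point is only that placement is computed, not looked up.
        return int(fingerprint, 16) % len(self.nodes)

    def write(self, name, data):
        # Note: overwriting an existing name is not handled in this sketch.
        fingerprints = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            node = self.nodes[self._place(fp)]
            if fp in node:
                node[fp][1] += 1       # duplicate: only bump the refcount
            else:
                node[fp] = [block, 1]  # new block: store it once
            fingerprints.append(fp)
        self.manifests[name] = fingerprints

    def read(self, name):
        # Reassemble the object by recomputing each block's placement.
        return b"".join(
            self.nodes[self._place(fp)][fp][0] for fp in self.manifests[name]
        )

    def stored_bytes(self):
        return sum(len(e[0]) for node in self.nodes for e in node.values())


store = DedupStore()
store.write("a", b"abcdabcdabcdwxyz")
store.write("b", b"abcd")
print(store.read("a"), store.stored_bytes())
```

In this sketch a duplicate block is detected by checking a single node chosen from the fingerprint, so no central index is consulted; the 20 bytes written across the two objects occupy only two unique 4-byte blocks.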