Pushing Collaborative Data Deduplication to the Network Edge: An Optimization Framework and System Design

Edge computing has become a new computing paradigm with explosive growth in recent years. We consider the problem of pushing data deduplication to the network edge and propose a new framework for distributed edge-facilitated deduplication (EF-dedup). Deduplication at the network edge allows us to exploit the high degree of geographic and temporal correlation in edge data to achieve space efficiency. By leveraging the distributed computing power available at the edge in a collaborative fashion, edge nodes can effectively suppress duplicate edge data, consuming considerably less storage space and WAN bandwidth. To this end, we partition the edge nodes into disjoint collaborative clusters, maintain a deduplication index structure across them using a distributed key-value store, and perform deduplication within those clusters. However, this partitioning problem is very challenging and requires optimizing a novel tradeoff: edge nodes with highly correlated data may not always reside in the same edge cloud, so the network cost between them is non-trivial. We formulate a joint storage and network optimization problem with different design objectives, such as arbitrary partitioning and balanced partitioning of edge nodes. The problem is shown to be NP-hard in general. We then develop an optimization framework with efficient algorithms and prove that it achieves a closed-form competitive ratio. Our experiments, performed on edge nodes in a corporate lab and a central cloud at AWS, demonstrate that EF-dedup achieves 67.4-133.7% higher deduplication throughput than purely cloud-based techniques and 20.0-62.6% lower aggregate cost, in terms of the network-storage tradeoff, than approaches that favor only one side of that tradeoff.
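To make the storage-versus-network tradeoff concrete, here is a minimal Python sketch under loudly stated assumptions; it is not the EF-dedup formulation or its algorithm. It assumes each node's data is already split into chunks, uses a plain dict as a stand-in for the cluster's distributed key-value index (SHA-256 fingerprint to chunk size), and scores a partition with a hypothetical weighted sum of deduplicated bytes stored plus pairwise link costs inside each cluster. The helper names (`dedup_cluster`, `set_partitions`, `total_cost`) and the toy data are illustrative only. It finds the best partition by brute force, whose search space grows with the Bell number of the node count; that blow-up is the reason an NP-hard problem like this one calls for efficient approximation algorithms instead.

```python
import hashlib
from itertools import combinations


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used as the dedup index key."""
    return hashlib.sha256(chunk).hexdigest()


def dedup_cluster(node_data: dict[str, list[bytes]]) -> int:
    """Store each unique chunk once per cluster; return total bytes stored.

    The dict `index` is a hypothetical stand-in for a distributed
    key-value store shared by the cluster's nodes (fingerprint -> size).
    """
    index: dict[str, int] = {}
    for chunks in node_data.values():
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp not in index:  # duplicate chunks are suppressed
                index[fp] = len(chunk)
    return sum(index.values())


def set_partitions(items: list[str]):
    """Yield every partition of `items` into disjoint, non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` forms its own cluster
        for i in range(len(part)):  # or joins an existing cluster
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def total_cost(partition, data, link_cost, storage_weight=1.0, network_weight=1.0):
    """Hypothetical objective: weighted storage plus intra-cluster link costs."""
    storage = sum(dedup_cluster({n: data[n] for n in block}) for block in partition)
    network = sum(
        link_cost[frozenset((a, b))]
        for block in partition
        for a, b in combinations(block, 2)
    )
    return storage_weight * storage + network_weight * network


# Toy example: A and B share an edge cloud (cheap link); C is remote.
NODES = ["A", "B", "C"]
DATA = {
    "A": [b"x" * 100, b"y" * 100],
    "B": [b"x" * 100, b"z" * 100],
    "C": [b"y" * 100, b"w" * 100],
}
LINKS = {
    frozenset(("A", "B")): 10,
    frozenset(("A", "C")): 500,
    frozenset(("B", "C")): 500,
}

best = min(set_partitions(NODES), key=lambda p: total_cost(p, DATA, LINKS))
print(best, total_cost(best, DATA, LINKS))  # [['A', 'B'], ['C']] 510.0
```

In this toy instance A and C also share a chunk, but the expensive A-C link outweighs the 100 bytes saved by clustering them, so the cheapest partition groups only A with B. Changing `storage_weight` and `network_weight` shifts that balance.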

Keywords: network edge; edge; deduplication; deduplication network; optimization

Journal Title: IEEE Transactions on Network Science and Engineering
Year Published: 2022
