The filter-placement problem and its application to content de-duplication

Bestavros, Azer; Erdos, Dora; Ishakian, Vatche; Lapets, Andrei; Terzi, Evimaria

Files

2011-005-filter-placement.pdf (295.24 KB)

Date

2011-02-21

URI

https://hdl.handle.net/2144/11362

Citation

Bestavros, Azer; Erdos, Dora; Ishakian, Vatche; Lapets, Andrei; Terzi, Evimaria. "The Filter-Placement Problem and its Application to Content De-Duplication", Technical Report BUCS-TR-2011-005, Computer Science Department, Boston University, February 21, 2011. [Available from: http://hdl.handle.net/2144/11362]

Abstract

In many information networks, data items such as updates in social networks, news flowing through interconnected RSS feeds and blogs, measurements in sensor networks, and route updates in ad-hoc networks propagate in an uncoordinated manner: nodes often relay the information they receive to their neighbors, regardless of whether those neighbors have already received it from other sources. This uncoordinated data dissemination may result in significant, yet unnecessary, communication and processing overheads, ultimately reducing the utility of information networks. To alleviate the negative impact of this information multiplicity phenomenon, we propose that a subset of nodes (selected at key positions in the network) carry out additional information de-duplication functionality, namely the removal (or significant reduction) of the duplicative data items relayed through them. We refer to such nodes as filters. We formally define the Filter Placement problem as a combinatorial optimization problem and study its computational complexity for different types of graphs. We also present polynomial-time approximation algorithms for the problem. Our experimental results, obtained through extensive simulations on synthetic and real-world information flow networks, suggest that in many settings a relatively small number of filters is effective in removing a large fraction of duplicative information.
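The idea of placing filters can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the report: the network is a directed acyclic graph, a single source emits one copy of an item, every ordinary node forwards each copy it receives to all of its out-neighbors, and a filter forwards at most one copy. The function names `total_received` and `greedy_filters` are hypothetical, and the greedy selection is a simple heuristic for illustration, not necessarily one of the report's approximation algorithms.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def total_received(edges, source, filters=frozenset()):
    """Sum the item copies received over all nodes of a DAG.

    Assumed model: the source emits one copy; a filter node forwards at
    most one copy; every other node forwards every copy it receives.
    """
    succ, preds, nodes = {}, {}, {source}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        preds.setdefault(v, set()).add(u)
        nodes.update((u, v))
    # Visit each node only after all of its predecessors.
    order = TopologicalSorter({n: preds.get(n, set()) for n in nodes}).static_order()
    received = {n: 0 for n in nodes}
    for n in order:
        if n == source:
            emitted = 1
        elif n in filters:
            emitted = min(received[n], 1)  # de-duplicate before relaying
        else:
            emitted = received[n]
        for m in succ.get(n, []):
            received[m] += emitted
    return sum(received.values())


def greedy_filters(edges, source, k):
    """Pick up to k filters, each time taking the node that lowers the total most."""
    candidates = {n for e in edges for n in e} - {source}
    chosen = set()
    for _ in range(k):
        base = total_received(edges, source, chosen)
        best = min(
            candidates - chosen,
            key=lambda n: (total_received(edges, source, chosen | {n}), str(n)),
            default=None,
        )
        if best is None or total_received(edges, source, chosen | {best}) >= base:
            break  # no remaining node removes any duplicates
        chosen.add(best)
    return chosen


# Diamond-shaped example: two paths from s merge at c, which relays to d and e.
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
print(total_received(edges, "s"))         # 8 copies received without filters
print(total_received(edges, "s", {"c"}))  # 6 copies with a filter at c
print(greedy_filters(edges, "s", 2))      # {'c'}
```

In this toy graph the merge node `c` is the only useful filter location: a filter at a node that receives a single copy removes nothing, so the greedy loop stops after one choice even though two filters were allowed.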

Collections

CAS: Computer Science: Technical Reports
