The filter-placement problem and its application to content de-duplication
Citation: Bestavros, Azer; Erdos, Dora; Ishakian, Vatche; Lapets, Andrei; Terzi, Evimaria. "The Filter-Placement Problem and its Application to Content De-Duplication", Technical Report BUCS-TR-2011-005, Computer Science Department, Boston University, February 21, 2011. [Available from: http://hdl.handle.net/2144/11362]
In many information networks, data items (updates in social networks, news flowing through interconnected RSS feeds and blogs, measurements in sensor networks, route updates in ad-hoc networks, and the like) propagate in an uncoordinated manner: nodes often relay the information they receive to their neighbors, regardless of whether those neighbors have already received it from other sources. This uncoordinated dissemination may result in significant yet unnecessary communication and processing overheads, ultimately reducing the utility of information networks. To alleviate the negative impact of this information multiplicity phenomenon, we propose that a subset of nodes, selected at key positions in the network, carry out additional information de-duplication functionality, namely the removal (or significant reduction) of the duplicate data items relayed through them. We refer to such nodes as filters. We formally define the Filter Placement problem as a combinatorial optimization problem and study its computational complexity for different types of graphs. We also present polynomial-time approximation algorithms for the problem. Our experimental results, obtained through extensive simulations on synthetic and real-world information flow networks, suggest that in many settings a relatively small number of filters is effective in removing a large fraction of duplicate information.
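To make the multiplicity phenomenon concrete, the following is a minimal sketch and not the report's formulation or algorithm. It assumes a directed acyclic graph with a single source that emits one item, assumes every ordinary node relays each copy it receives to all of its out-neighbors, and assumes a filter forwards at most one copy. The function names `copies_received` and `greedy_filters`, the objective (total copies received across all nodes), and the greedy selection rule are all illustrative assumptions.

```python
from graphlib import TopologicalSorter


def copies_received(graph, source, filters=frozenset()):
    """Total copies of one item received over all nodes (assumed model).

    graph maps each node to a list of its out-neighbors and must be a DAG.
    """
    # Build a predecessor map, which is the input TopologicalSorter expects.
    preds = {n: set() for n in graph}
    for u, outs in graph.items():
        for v in outs:
            preds.setdefault(v, set()).add(u)

    received = {n: 0 for n in preds}
    for node in TopologicalSorter(preds).static_order():
        if node == source:
            emitted = 1                        # the source emits one item
        elif node in filters:
            emitted = min(received[node], 1)   # a filter drops duplicates
        else:
            emitted = received[node]           # relay every copy received
        for nxt in graph.get(node, ()):
            received[nxt] += emitted
    return sum(received.values())


def greedy_filters(graph, source, k):
    """Pick up to k filters, each time taking the node that removes the most
    copies. A simple heuristic chosen for illustration only."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    chosen = set()
    for _ in range(k):
        best, best_total = None, copies_received(graph, source, chosen)
        for n in sorted(nodes - chosen - {source}):
            total = copies_received(graph, source, chosen | {n})
            if total < best_total:
                best, best_total = n, total
        if best is None:   # no remaining node reduces the copy count
            break
        chosen.add(best)
    return chosen


# Hypothetical example network: two paths merge at c, then split and merge again.
example = {
    "s": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": ["f"],
}
```

In this hypothetical network, `copies_received(example, "s")` returns 12 and placing a single filter at `c` reduces it to 8. After that, no other node receives more than one copy that it would relay onward, so `greedy_filters(example, "s", 3)` stops at `{"c"}`. This is in the spirit of the abstract's observation that a small number of well-placed filters can remove much of the duplication.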