Benefiting from disorder: source coding for unordered data
Date
2018
Authors
Varshney, Lav R.
Goyal, Vivek K.
Version
First author draft
Citation
Lav R Varshney, Vivek K Goyal. "Benefiting from Disorder: Source Coding for Unordered Data."
Abstract
The order of letters is not always relevant in a communication task. This paper discusses the implications of order irrelevance for source coding, presenting results in several major branches of source coding theory: lossless coding, universal lossless coding, rate-distortion, high-rate quantization, and universal lossy coding. The main conclusions demonstrate that the rate savings are significant when order is irrelevant. In particular, lossless coding of n letters from a finite alphabet requires Theta(log n) bits, and universal lossless coding requires n + o(n) bits for many countable-alphabet sources. However, no universal scheme can drive a strong redundancy measure to zero. Results for lossy coding include distribution-free expressions for the rate savings from order irrelevance in various high-rate quantization schemes. Rate-distortion bounds are given, and it is shown that the analogue of the Shannon lower bound is loose at all finite rates.
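To give a feel for the Theta(log n) claim in the finite-alphabet lossless case, here is a minimal counting sketch. It is not the paper's construction: it uses the standard fact that the number of distinct multisets of n letters from an alphabet of size k is C(n + k - 1, k - 1), so a fixed-length index of the multiset needs about (k - 1) log2 n bits, compared with n log2 k bits for an ordered sequence. The function names are hypothetical, and the fixed-length index is only a simple upper bound on the rate, not an entropy calculation.

```python
from math import comb, log2


def multiset_count(n: int, k: int) -> int:
    """Number of distinct multisets of n letters over an alphabet of size k."""
    return comb(n + k - 1, k - 1)


def bits_ordered(n: int, k: int) -> float:
    """Bits to index an arbitrary ordered sequence of n letters (fixed-length)."""
    return n * log2(k)


def bits_unordered(n: int, k: int) -> float:
    """Bits to index the multiset of letters only (fixed-length upper bound)."""
    return log2(multiset_count(n, k))


if __name__ == "__main__":
    k = 4  # illustrative alphabet size (assumption, not from the paper)
    for n in (10, 100, 1000, 10000):
        print(f"n={n:6d}  ordered={bits_ordered(n, k):10.1f} bits  "
              f"unordered={bits_unordered(n, k):6.1f} bits")
```

With k held fixed, the ordered cost grows linearly in n while the unordered cost grows only logarithmically, which matches the order of growth stated in the abstract.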