ECCC-Report TR18-141https://eccc.weizmann.ac.il/report/2018/141Comments and Revisions published for TR18-141en-usWed, 05 Dec 2018 08:18:23 +0200
Revision 1
| Local Decodability of the Burrows-Wheeler Transform |
Omri Weinstein,
Sandip Sinha
https://eccc.weizmann.ac.il/report/2018/141#revision1The Burrows-Wheeler Transform (BWT) is among the most influential discoveries in text compression and DNA storage. It is a reversible preprocessing step that rearranges an $n$-letter string into runs of identical characters (by exploiting context regularities), resulting in highly compressible strings, and is the basis of the \texttt{bzip} compression program. Alas, the decoding process of BWT is inherently sequential and requires $\Omega(n)$ time even to retrieve a \emph{single} character.
We study the succinct data structure problem of locally decoding short substrings of a given text under its \emph{compressed} BWT, i.e., with small additive redundancy $r$ over the \emph{Move-To-Front} (\texttt{bzip}) compression. The celebrated BWT-based FM-index (FOCS '00), as well as other related literature, yield a trade-off of $r=\tilde{O}(n/\sqrt{t})$ bits, when a single character is to be decoded in $O(t)$ time. We give a near-quadratic improvement $r=\tilde{O}(n\lg(t)/t)$. As a by-product, we obtain an \emph{exponential} (in $t$) improvement on the redundancy of the FM-index for counting pattern-matches on compressed text. In the interesting regime where the text compresses to $n^{1-o(1)}$ bits, these results provide an $\exp(t)$ \emph{overall} space reduction. For the local decoding problem of BWT, we also prove an $\Omega(n/t^2)$ cell-probe lower bound for ``symmetric" data structures.
We achieve our main result by designing a compressed partial-sums (Rank) data structure over BWT. The key component is a \emph{locally-decodable} Move-to-Front (MTF) code: with only $O(1)$ extra bits per block of length $n^{\Omega(1)}$, the decoding time of a single character can be decreased from $\Omega(n)$ to $O(\lg n)$. This result is of independent interest in algorithmic information theory.Wed, 05 Dec 2018 08:18:23 +0200https://eccc.weizmann.ac.il/report/2018/141#revision1
Paper TR18-141
| Local Decodability of the Burrows-Wheeler Transform |
Omri Weinstein,
Sandip Sinha
https://eccc.weizmann.ac.il/report/2018/141The Burrows-Wheeler Transform (BWT) is among the most influential discoveries in text compression and DNA storage. It is a \emph{reversible} preprocessing step that rearranges an $n$-letter string into runs of identical characters (by exploiting context regularities), resulting in highly compressible strings, and is the basis for the ubiquitous \texttt{bzip} program. Alas, the decoding process of BWT is inherently sequential and requires $\Omega(n)$ time even to retrieve a \emph{single} character.
We study the succinct data structure problem of locally decoding short substrings of a given text under its \emph{compressed} BWT, i.e., with small redundancy $r$ over the \emph{Move-To-Front} based (\texttt{bzip}) compression. The celebrated BWT-based FM-index (FOCS '00), and other related literature, gravitate toward a tradeoff of $r=\tilde{O}(n/\sqrt{t})$ bits, when a single character is to be decoded in $O(t)$ time. We give a near-quadratic improvement $r=\tilde{O}(n\cdot \lg t/t)$. As a by-product, we obtain an \emph{exponential} (in $t$) improvement on the redundancy of the FM-index for counting pattern-matches on compressed text. In the interesting regime where the text compresses to $n^{1-o(1)}$ bits, these results provide an $\exp(t)$ \emph{overall} space reduction. For the local decoding problem, we also prove an $\Omega(n/t^2)$ cell-probe lower bound for ``symmetric" data structures.
We achieve our main result by designing a compressed Rank (partial-sums) data structure over BWT. The key component is a locally-decodable Move-to-Front (MTF) code: with only $O(1)$ extra bits per block of length $n^{\Omega(1)}$, the decoding time of a single character can be decreased from $\Omega(n)$ to $O(\lg n)$. This result is of independent interest in algorithmic information theory. Mon, 13 Aug 2018 01:54:47 +0300https://eccc.weizmann.ac.il/report/2018/141