TR95-022 Authors: Marek Karpinski, Wojciech Rytter, Ayumi Shinohara

Publication: 2nd May 1995 17:33

Downloads: 1895

Keywords:

We consider strings which are succinctly described. The description

is in terms of straight-line programs in which the constants are

symbols and the only operation is the concatenation. Such

descriptions correspond to the systems of recurrences or to

context-free grammars generating single words. The descriptive

size of a string is the length $n$ of a straight-line program (or

size of a grammar) which defines this string. Usually the strings

of descriptive size $n$ are of exponential length.

{\em Fibonacci \/} and {\em Thue-Morse words \/} are examples of

such strings. We show that for a pattern $P$ and text $T$ of

descriptive sizes $m, n$, an occurrence of $P$ in $T$ can be found

(if there is any) in time polynomial with respect to $n$. This is

nontrivial, since the actual lengths of $P$ and $T$ could be

exponential, and none of the known string-matching algorithms is

directly applicable. Our first tool is the periodicity lemma, which

allows to represent some sets of exponentially many positions in

terms of feasibly many arithmetic progressions. The second tool is

arithmetics: a simple application of Euclid algorithm. Hence a

textual problem for exponentially long strings is reduced here to

simple arithmetics on integers with (only) linearly many bits. We

present also an NP-complete version of the pattern-matching for

shortly described strings.