Revision #1 Authors: Eldar Fischer, Frederic Magniez, Michel de Rougemont

Accepted on: 11th April 2005 00:00

Downloads: 3371

Keywords:

We investigate property testing and related questions, where instead

of the usual Hamming and edit distances between input strings, we consider

the more relaxed edit distance with moves.

Using a statistical embedding of words which has similarities with the Parikh mapping,

we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size,

and we derive an approximation algorithm for the normalized edit distance with moves.

We then consider the question of testing if a string is a member

of a given language.

We develop a method to compute, in polynomial time in the representation,

a geometric approximate description of a regular language by a finite union of polytopes. As an application, we have a new tester for regular languages

given by their nondeterministic finite automaton (or regular expressions),

whose complexity does not depend on the automaton, except for a polynomial time preprocessing step.

Furthermore, this method allows us to compare languages and validates the new

notion of equivalent testing that we introduce. Using the

geometrical embedding we can

distinguish between a pair of automata that

compute the same language, and a pair of automata whose

languages are not $\eps$-equivalent in an appropriate sense.

Our equivalence tester is deterministic and has polynomial time complexity,

whereas the non-approximated version is PSPACE-complete.

Last, we extend the geometric embedding, and hence the tester algorithms,

to infinite regular languages and to context-free grammars as well.

For context-free grammars the equivalence test

has now exponential time complexity, but in comparison,

the non-approximated version

is not even recursively decidable.

TR04-096 Authors: Eldar Fischer, Frederic Magniez, Michel de Rougemont

Publication: 11th November 2004 19:59

Downloads: 4385

Keywords:

Using a new statistical embedding of words which has similarities with the Parikh mapping, we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size, where the distance between inputs is measured by the normalized edit distance with moves. As a consequence we get an approximation algorithm for the normalized distance, the first such algorithm whose complexity does not depend on the string size.

Then we extend our embedding to languages, and get a geometrical approximate description of regular languages by finite unions of polytopes. As an application, we have a new tester for regular languages whose complexity does not depend on the automaton. The automaton is only required in a preprocessing step, whose time is polynomial in the automaton size for a fixed threshold distance. The remaining complexity is a constant depending on the threshhold distance but not on the automaton.

Last, we introduce the notion of equivalence testing. Using the above geometrical description, we exhibit an equivalence tester for regular languages. The tester is deterministic and of polynomial time, for a fixed threshold distance. In contrast, the problem of deciding the exact equivalence of finite automata requires exponential space.