We investigate property testing and related questions, where instead
of the usual Hamming and edit distances between input strings, we consider
the more relaxed edit distance with moves.
Using a statistical embedding of words which has similarities with the Parikh mapping,
we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size,
and we derive an approximation algorithm for the normalized edit distance with moves.
We then consider the question of testing if a string is a member
of a given language.
We develop a method to compute, in polynomial time in the representation,
a geometric approximate description of a regular language by a finite union of polytopes. As an application, we have a new tester for regular languages
given by their nondeterministic finite automaton (or regular expressions),
whose complexity does not depend on the automaton, except for a polynomial time preprocessing step.
Furthermore, this method allows us to compare languages and validates the new
notion of equivalent testing that we introduce. Using the
geometrical embedding we can
distinguish between a pair of automata that
compute the same language, and a pair of automata whose
languages are not $\eps$-equivalent in an appropriate sense.
Our equivalence tester is deterministic and has polynomial time complexity,
whereas the non-approximated version is PSPACE-complete.
Last, we extend the geometric embedding, and hence the tester algorithms,
to infinite regular languages and to context-free grammars as well.
For context-free grammars the equivalence test
has now exponential time complexity, but in comparison,
the non-approximated version
is not even recursively decidable.
Using a new statistical embedding of words which has similarities with the Parikh mapping, we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size, where the distance between inputs is measured by the normalized edit distance with moves. As a consequence we get an approximation algorithm for the normalized distance, the first such algorithm whose complexity does not depend on the string size.
Then we extend our embedding to languages, and get a geometrical approximate description of regular languages by finite unions of polytopes. As an application, we have a new tester for regular languages whose complexity does not depend on the automaton. The automaton is only required in a preprocessing step, whose time is polynomial in the automaton size for a fixed threshold distance. The remaining complexity is a constant depending on the threshhold distance but not on the automaton.
Last, we introduce the notion of equivalence testing. Using the above geometrical description, we exhibit an equivalence tester for regular languages. The tester is deterministic and of polynomial time, for a fixed threshold distance. In contrast, the problem of deciding the exact equivalence of finite automata requires exponential space.