TR20-079 Authors: Hermann Gruber , Markus Holzer, Simon Wolfsteiner

Publication: 21st May 2020 17:33

Downloads: 735

Keywords:

Finite languages lie at the heart of literally every regular expression. Therefore, we investigate the approximation complexity of minimizing regular expressions without Kleene star, or, equivalently, regular expressions describing finite languages. On the side of approximation hardness, given such an expression of size~$s$, we prove that it is impossible to approximate the minimum size required by an equivalent regular expression within a factor of $O\left(\frac{s}{(\log s)^{2+\delta}}\right)$ if the running time is bounded by a quasipolynomial function depending on~$\delta$, for every $\delta>0$, unless the exponential time hypothesis (ETH) fails. For approximation ratio~$O(s^{1-\delta})$, we prove an exponential time lower bound depending on~$\delta$, assuming ETH. The lower bounds apply for alphabets of constant size. On the algorithmic side, we show that the problem can be approximated in polynomial time within~$O(\frac{s\log\log s}{\log s})$, with~$s$ being the size of the given regular expression. For constant alphabet size, the bound improves to~$O(\frac{s}{\log s})$. Finally, we devise a familiy of superpolynomial approximation algorithms that attain the performance ratios of the lower bounds, while their running times are only slightly above those excluded by the ETH.