Revision #3 Authors: Oded Goldreich, Dana Ron

Accepted on: 24th May 2023 10:15

Downloads: 0

Keywords:

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings.

Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings).

Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings)as well as for the distance between objects (resp., strings).

Specifically, the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings.

We study the query complexity of testing in this new model, focusing on three directions.

First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model.

Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings).

Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions

and pairs of identical distributions, where we obtain the following results.

\begin{itemize}

\item

Testing whether a distribution over $n$-bit long strings is uniform on some set of size $m$ can be tested with query complexity ${\widetilde O}(m/\epsilon^3)$, where $\epsilon>(\log_2m)/n$ is the proximity parameter.

\item

Testing whether two distribution over $n$-bit long strings that have support size at most $m$ are identical can be tested with query complexity ${\widetilde O}(m^{2/3}/\epsilon^3)$.

\end{itemize}

Both upper bounds are pretty tight; that is, for $\epsilon=\Omega(1)$, the first task requires $\Omega(m^c)$ queries for any $c<1$ and $n=\omega(\log m)$, whereas the second task requires $\Omega(m^{2/3})$ queries.

Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.

Improved/extended introduction

Revision #2 Authors: Oded Goldreich, Dana Ron

Accepted on: 23rd January 2022 18:12

Downloads: 89

Keywords:

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings.

Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings).

Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings)as well as for the distance between objects (resp., strings).

Specifically, the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings.

We study the query complexity of testing in this new model, focusing on three directions.

First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model.

Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings).

Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions

and pairs of identical distributions, where we obtain the following results.

\begin{itemize}

\item

Testing whether a distribution over $n$-bit long strings is uniform on some set of size $m$ can be tested with query complexity ${\widetilde O}(m/\epsilon^3)$, where $\epsilon>(\log_2m)/n$ is the proximity parameter.

\item

Testing whether two distribution over $n$-bit long strings that have support size at most $m$ are identical can be tested with query complexity ${\widetilde O}(m^{2/3}/\epsilon^3)$.

\end{itemize}

Both upper bounds are pretty tight; that is, for $\epsilon=\Omega(1)$, the first task requires $\Omega(m^c)$ queries for any $c<1$ and $n=\omega(\log m)$, whereas the second task requires $\Omega(m^{2/3})$ queries.

Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.

Correcting inaccuracies in the stmt of Thm 1.6 (and 2.2), and correcting corresponding inaccuracies in the proof of Thm 2.2 and in Sec 2.3.

Revision #1 Authors: Oded Goldreich, Dana Ron

Accepted on: 16th September 2021 18:15

Downloads: 215

Keywords:

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings.

Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings).

Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings)as well as for the distance between objects (resp., strings).

Specifically, the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings.

We study the query complexity of testing in this new model, focusing on three directions.

First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model.

Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings).

Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions

and pairs of identical distributions, where we obtain the following results.

\begin{itemize}

\item

Testing whether a distribution over $n$-bit long strings is uniform on some set of size $m$ can be tested with query complexity ${\widetilde O}(m/\epsilon^3)$, where $\epsilon>(\log_2m)/n$ is the proximity parameter.

\item

Testing whether two distribution over $n$-bit long strings that have support size at most $m$ are identical can be tested with query complexity ${\widetilde O}(m^{2/3}/\epsilon^3)$.

\end{itemize}

Both upper bounds are pretty tight; that is, for $\epsilon=\Omega(1)$, the first task requires $\Omega(m^c)$ queries for any $c<1$ and $n=\omega(\log m)$, whereas the second task requires $\Omega(m^{2/3})$ queries.

Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.

Adding a new result (which appears as Thm 1.6)

and detailing the proof of Thm 5.2.

TR21-133 Authors: Oded Goldreich, Dana Ron

Publication: 12th September 2021 14:24

Downloads: 306

Keywords:

Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings).

Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings)as well as for the distance between objects (resp., strings).

Specifically, the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings.

First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model.

Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings).

Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions

and pairs of identical distributions, where we obtain the following results.

\item

Testing whether a distribution over $n$-bit long strings is uniform on some set of size $m$ can be tested with query complexity ${\widetilde O}(m/\epsilon^3)$, where $\epsilon>(\log_2m)/n$ is the proximity parameter.

\item

Testing whether two distribution over $n$-bit long strings that have support size at most $m$ are identical can be tested with query complexity ${\widetilde O}(m^{2/3}/\epsilon^3)$.

\end{itemize}

Both upper bounds are pretty tight; that is, for $\epsilon=\Omega(1)$, the first task requires $\Omega(m^c)$ queries for any $c<1$ and $n=\omega(\log m)$, whereas the second task requires $\Omega(m^{2/3})$ queries.

Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.