TR16-078 Authors: Gregory Valiant, Paul Valiant

Publication: 13th May 2016 13:43

We introduce the notion of a database system that is information-theoretically "secure in between accesses": a database system with the properties that 1) users can efficiently access their data, and 2) while a user is not accessing their data, the user's information is information-theoretically secure against malicious agents, provided that certain requirements on the maintenance of the database are met. We stress that the security guarantee is information-theoretic and everlasting: it relies neither on unproved hardness assumptions nor on the assumption that the adversary is computationally or storage bounded.

We propose a realization of such a database system and prove that a user's stored information, in between times when it is being legitimately accessed, is information-theoretically secure both against adversaries who interact with the database in the prescribed manner and against adversaries who have installed a virus that has access to the entire database and communicates with the adversary.

The central idea behind our design of an information-theoretically secure database system is the construction of a "re-randomizing database" that periodically changes the internal representation of the information being stored. To ensure security, these remappings of the data's representation must occur sufficiently often relative to both the amount of information communicated from the database between remappings and the amount of local memory in the database that a virus may preserve across remappings. While this changing representation provably foils an adversary's ability to glean information, it can be accomplished in a manner transparent to legitimate users, without changing how database users access their data.
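
To make the idea of a changing internal representation concrete, here is a toy sketch in Python. It is not the construction from the paper. It assumes a hypothetical `ToyRerandomizingStore` in which one stored bit is encoded as the XOR of the entries of an $n$-bit array at a secret index set $S$, and, purely for illustration, assumes the remapping step is performed by a maintainer that knows $S$. Each remapping replaces the whole array with a fresh uniformly random one having the same parity on $S$, so a reader who knows $S$ always recovers the same bit while the raw contents keep changing.

```python
import random


class ToyRerandomizingStore:
    """Hypothetical toy, not the paper's scheme: one secret bit stored as the
    XOR of a length-n bit array at the positions in a secret set S."""

    def __init__(self, n, secret_set, bit, rng=None):
        if not secret_set:
            raise ValueError("secret_set must be non-empty")
        self.n = n
        self.S = sorted(secret_set)
        self.rng = rng if rng is not None else random.Random()
        self.data = self._fresh_encoding(bit)

    def _fresh_encoding(self, bit):
        # Draw a uniformly random array, then flip one position in S if the
        # parity is wrong; the result is uniform among all arrays whose XOR
        # over S equals `bit`.
        data = [self.rng.randrange(2) for _ in range(self.n)]
        parity = 0
        for i in self.S:
            parity ^= data[i]
        if parity != bit:
            data[self.S[0]] ^= 1
        return data

    def read(self):
        # Legitimate access: XOR the entries indexed by S.
        value = 0
        for i in self.S:
            value ^= self.data[i]
        return value

    def remap(self):
        # Re-randomize the whole representation, preserving the stored bit.
        self.data = self._fresh_encoding(self.read())
```

For example, `ToyRerandomizingStore(16, {2, 5, 11}, 1)` returns `1` from `read()` after any number of calls to `remap()`, even though `data` is redrawn each time. The toy only illustrates transparency to the legitimate reader; it says nothing about the security argument, which in the paper depends on how often remappings occur relative to the information leaked between them.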

The core of the proof of the security guarantee is the following communication/data tradeoff for the problem of learning sparse parities from uniformly random $n$-bit examples. Fix a set $S \subset \{1,\ldots,n\}$ of size $k$. Given access to examples $x_1,\ldots,x_t$, where each $x_i \in \{0,1\}^n$ is chosen uniformly at random conditioned on the XOR of the components of $x_i$ indexed by $S$ being 0, any algorithm that learns the set $S$ with probability at least $p$ and extracts at most $r$ bits of information from each example must see at least $p\cdot \left(\frac{n}{r}\right)^{k/2} c_k$ examples, where $c_k \ge \frac{1}{4}\cdot\sqrt{\frac{(2e)^{k}}{k^{k+3}}}$. The $r$ bits of information extracted from each example can be an arbitrary (adaptively chosen) function of the entire example, and need not be simply a subset of the bits of the example.
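
The example distribution and the stated bound can be made concrete with a short Python sketch. The function names `sample_example`, `c_k_lower`, and `example_lower_bound` are hypothetical, and indices in `S` are 0-based rather than the $1,\ldots,n$ used above. Because $c_k$ is only bounded from below, `example_lower_bound` substitutes that lower bound for $c_k$, which yields a valid (possibly weaker) lower bound on the number of examples.

```python
import math
import random


def sample_example(n, S, rng):
    """Draw x uniformly from {0,1}^n conditioned on XOR_{i in S} x_i = 0.
    S is a non-empty collection of 0-based indices."""
    x = [rng.randrange(2) for _ in range(n)]
    parity = 0
    for i in S:
        parity ^= x[i]
    if parity:
        # Flipping one fixed coordinate of S maps odd-parity draws onto
        # even-parity ones bijectively, so the result stays uniform.
        x[min(S)] ^= 1
    return x


def c_k_lower(k):
    """Stated lower bound on c_k: (1/4) * sqrt((2e)^k / k^(k+3))."""
    return 0.25 * math.sqrt((2 * math.e) ** k / k ** (k + 3))


def example_lower_bound(n, r, k, p):
    """p * (n/r)^(k/2) * c_k, with c_k replaced by its stated lower bound."""
    return p * (n / r) ** (k / 2) * c_k_lower(k)
```

For instance, with $n = 10^6$, $r = 1000$, $k = 4$, and $p = 0.5$, `example_lower_bound` gives about $2.9 \times 10^4$ examples, since $(n/r)^{k/2} = 10^6$ and $c_4 \ge \frac{1}{4}\sqrt{16e^4/4^7} \approx 0.0577$.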