We prove that for every distribution $D$ on $n$ bits with Shannon
entropy $\ge n-a$, at most $O(2^{d}a\log^{d+1}g)/\gamma^{5}$ of
the bits $D_{i}$ can be predicted with advantage $\gamma$ by an
AC$^{0}$ circuit of size $g$ and depth $d$ that is a function of
all the bits of $D$ except $D_{i}$. This answers a question by Meir
and Wigderson (2017), who proved a corresponding result for decision
trees.
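For illustration, under a parameter setting chosen here for concreteness (and not fixed in the statement above), namely constant depth $d=O(1)$, polynomial size $g=\poly(n)$, and constant advantage $\gamma$, the bound specializes to
\[
O\!\left(2^{d}a\log^{d+1}g\right)/\gamma^{5}=a\cdot\mathrm{polylog}(n),
\]
so only $a\cdot\mathrm{polylog}(n)$ of the $n$ bits can be predicted by such circuits.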
We also show that there are distributions $D$ with entropy $\ge n-O(1)$
such that any subset of $O(n/\log n)$ bits of $D$ can be distinguished
from uniform by a circuit of depth $2$ and size $\poly(n)$. This
separates the notions of predictability and distinguishability in
this context.