Definition
Bias
Let
be a probability distribution over
.
The bias of
with respect to a set of indices
is defined as[1]
![{\displaystyle {\text{bias))_{I}(X)=\left|\Pr _{x\sim X}\left(\sum _{i\in I}x_{i}=0\right)-\Pr _{x\sim X}\left(\sum _{i\in I}x_{i}=1\right)\right|=\left|2\cdot \Pr _{x\sim X}\left(\sum _{i\in I}x_{i}=0\right)-1\right|\,,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/3e7b935e1cd158079e846bfdede3a721f5e3d970)
where the sum is taken over
, the finite field with two elements. In other words, the sum
equals
if the number of ones in the sample
at the positions defined by
is even, and otherwise, the sum equals
.
For
, the empty sum is defined to be zero, and hence
.
ϵ-biased sample space
A probability distribution
over
is called an
-biased sample space if
holds for all non-empty subsets
.
ϵ-biased set
An
-biased sample space
that is generated by picking a uniform element from a multiset
is called
-biased set.
The size
of an
-biased set
is the size of the multiset that generates the sample space.
ϵ-biased generator
An
-biased generator
is a function that maps strings of length
to strings of length
such that the multiset
is an
-biased set. The seed length of the generator is the number
and is related to the size of the
-biased set
via the equation
.
Constructions of small epsilon-biased sets
Usually the goal is to find
-biased sets that have a small size
relative to the parameters
and
.
This is because a smaller size
means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.
Theoretical bounds
The probabilistic method gives a non-explicit construction that achieves size
.[2]
The construction is non-explicit in the sense that finding the
-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness.
However, this non-explicit construction is useful because it shows that these efficient codes exist.
On the other hand, the best known lower bound for the size of
-biased sets is
, that is, in order for a set to be
-biased, it must be at least that big.[2]
Explicit constructions
There are many explicit, i.e., deterministic constructions of
-biased sets with various parameter settings:
- Naor & Naor (1990) achieve
. The construction makes use of Justesen codes (which is a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
- Alon et al. (1992) achieve
. One of their constructions is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an
-balanced code, which gives rise to an
-biased sample space via the connection mentioned above.
- Concatenating Algebraic geometric codes with the Hadamard code gives an
-balanced code with
.[2]
- Ben-Aroya & Ta-Shma (2009) achieves
.
- Ta-Shma (2017) achieves
which is almost optimal because of the lower bound.
These bounds are mutually incomparable. In particular, none of these constructions yields the smallest
-biased sets for all settings of
and
.
Application: almost k-wise independence
An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.
k-wise independent spaces
A random variable
over
is a k-wise independent space if, for all index sets
of size
, the marginal distribution
is exactly equal to the uniform distribution over
.
That is, for all such
and all strings
, the distribution
satisfies
.
Constructions and bounds
k-wise independent spaces are fairly well understood.
- A simple construction by Joffe (1974) achieves size
.
- Alon, Babai & Itai (1986) construct a k-wise independent space whose size is
.
- Chor et al. (1985) prove that no k-wise independent space can be significantly smaller than
.
Joffe's construction
Joffe (1974) constructs a
-wise independent space
over the finite field with some prime number
of elements, i.e.,
is a distribution over
. The initial
marginals of the distribution are drawn independently and uniformly at random:
.
For each
with
, the marginal distribution of
is then defined as
![{\displaystyle Y_{i}=Y_{0}+Y_{1}\cdot i+Y_{2}\cdot i^{2}+\dots +Y_{k-1}\cdot i^{k-1}\,,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/8b4bd3887667bef2dbe919a2370fd59f068b29c2)
where the calculation is done in
.
Joffe (1974) proves that the distribution
constructed in this way is
-wise independent as a distribution over
.
The distribution
is uniform on its support, and hence, the support of
forms a
-wise independent set.
It contains all
strings in
that have been extended to strings of length
using the deterministic rule above.
Almost k-wise independent spaces
A random variable
over
is a
-almost k-wise independent space if, for all index sets
of size
, the restricted distribution
and the uniform distribution
on
are
-close in 1-norm, i.e.,
.
Constructions
Naor & Naor (1990) give a general framework for combining small k-wise independent spaces with small
-biased spaces to obtain
-almost k-wise independent spaces of even smaller size.
In particular, let
be a linear mapping that generates a k-wise independent space and let
be a generator of an
-biased set over
.
That is, when given a uniformly random input, the output of
is a k-wise independent space, and the output of
is
-biased.
Then
with
is a generator of an
-almost
-wise independent space, where
.[3]
As mentioned above, Alon, Babai & Itai (1986) construct a generator
with
, and Naor & Naor (1990) construct a generator
with
.
Hence, the concatenation
of
and
has seed length
.
In order for
to yield a
-almost k-wise independent space, we need to set
, which leads to a seed length of
and a sample space of total size
.