Stratified sampling
From CGAFaq
Contents |
What is stratified sampling?
In Monte Carlo ray tracing, stratified sampling is a technique used to reduce the amount of noise in an image; it is one of many variance reduction techniques. It is most effective in a low-dimensional domain (e.g., for jittered sampling or direct lighting). The basic idea is to break up the domain into several smaller pieces, making sure at least one sample is taken within each piece. This approach typically gives a better estimate than if we took every sample from a distribution over the entire domain.
Example
Imagine we are trying to cover a black image of pixels by drawing
white dots in random locations. The first dot we draw will turn one of the black pixels white, leaving us with
dots to cover the remaining
black pixels. If we forget about where we put the first dot, then there is some small chance we will choose the same location for the second dot. In that case, we are left with only
dots to cover
black pixels. Since there are more black pixels remaining than dots to cover them with, at least one pixel in the final image will remain black. In fact, the more dots we put down, the less likely it is that a new dot will cover up a black pixel, and in the end there may be large parts of the image we neglect to cover; we will have done a poor job of sampling the domain.
Discussion
If we pick sample locations in this way when computing a Monte Carlo estimate of an integral, we may miss important features in the integrand. Ideally we would like to spread out our samples so that we get a similar amount of information from every part of the domain.
Stratified sampling partitions the domain into multiple pieces, and takes a number of samples in each piece proportional to its size. For example, we could cut up the image from the example into tiny rectangles, placing at least one sample within each rectangle. If we do this, it will be impossible to end up with a large region of black pixels, since at the very least we've put one dot in each rectangle. (There are other methods which take the placement of previous samples into account such as Poisson disk sample generation.)
The domain can be partitioned however we like, but we should make it relatively simple to pick a random point within each region. We also want to keep the number of elements in the partition small enough that we can take at least one sample in each region. If we partition the domain into pieces and take
samples in the
th piece, then our stratified Monte Carlo estimate of the integral is
where is the
th sample taken from a random variable uniformly distributed over the domain of the
th piece. In other words, we estimate the integral piece by piece, and take the sum of these estimates to get the final answer. (Compare this with the usual estimator used in Monte Carlo integration.)
Some pseudocode for integrating a one-dimensional function defined over the unit interval by taking samples in each of
equally-sized strata might look like
integrateMCStratified( f, M, N )
{
overallEstimate = 0
for i = 1 .. M
{
curEstimate = 0
for j = 1 .. N
{
curEstimate += f( uniformRandRange( (i-1)/M, i/M ) );
}
overallEstimate += curEstimate/N;
}
return overallEstimate;
}
where uniformRandRange(a, b) returns random values uniformly distributed over the interval .
Why does stratified sampling work?
Although stratified sampling seems to produce an even sampling of the integrand, we would like to know whether it actually reduces the variance of our estimator. The picture below gives a rough idea of why the variance is lower. When we break up the function into several smaller pieces
, each piece takes on a smaller range of values than the original function; each one “varies” less than the function as a whole. In this particular example, since an estimator which takes
samples over a single
will have lower variance than an estimator which takes
samples over all of
, and since
(i.e., the variance of a sum of independent random variables is equal to the sum of the variance of each one), the variance of a stratified estimator is smaller than the variance of an unstratified estimator. In general, stratification will never increase variance, though in some situations it will not reduce variance either. See Veach [1], pp 50–51 for a proof and Mitchell [2] for a discussion of convergence.
Why doesn't stratified sampling work well in high dimensions?
For unstratified sampling, the variance of an estimator is , where
is the number of samples and
is the integrand. This means we need to take twice as many samples to reduce the variance by half. Mitchell gives the following general rule for the variance of a stratified estimator:
where is the dimension of the domain of the function being integrated. As
becomes larger, the variance (fairly quickly) approaches the variance of the unstratified estimator. To reduce the variance by half we will have to take approximately 1.26, 1.41, 1.52, 1.59, 1.64, and 1.68 times as many samples in dimensions one through six, respectively. At some point, the cost of generating stratified samples becomes greater than the cost of taking additional unstratified samples.
There are other practical problems with stratification in high-dimensional spaces. For instance, over complex domains it may be difficult to come up with a partition where the variance within each region is smaller than the variance of the overall function. Even if we do find a good partition for a high-dimensional space, it may consist of a huge number of regions (requiring a huge number of samples). Consider partitioning the unit hypercube into
regular cells; this curse of dimensionality is what the Monte Carlo method aims to avoid in the first place.
Can I use stratified sampling and importance sampling simultaneously?
At first, stratified sampling and importance sampling might appear to be contradictory techniques: stratified sampling tries to spread out samples evenly over the domain, whereas importance sampling concentrates samples into specific regions. However, there is an alternative interpretation of importance sampling which makes it clear that the two techniques are compatible. If we think of an importance density as something which stretches the domain of the integrand where the density is large and shrinks the domain where the density is small until the density is equal everywhere, then we can think of stratification as being applied within this transformed domain.
We typically have a function which transforms the domain of the integrand into a domain where the importance density is uniform, namely the the inverse of the function which transforms values from our (uniform) random number generator into the importance density. Therefore, all we need to do to stratify samples in our target domain is to stratify samples coming out of the random number generator. If our importance density corresponds well to the function being integrated then we won't have missed important features in the integrand.

