**[EDIT] This post was migrated from my blog from 2011…**

The Halton sequence, which is one of my favourite algorithms ever, can be used for efficient stratified multi-dimensional sampling. Some references:

It is possible to do stratified sampling of hyper-points in the *s-dimensional* unit hyper-cube by picking one consecutive dimension of the Halton series for each component. A convenient way to do so is to use the first *s* prime numbers as the basis for each Halton sequence.

It is well-known, however, that while this approach works great for low dimensions, high dimensions often exhibit a great degree of undesired correlation. The following image displays a grid where each cell combines two components of the *32-dimensional* Halton hyper-cube.

*Raw 32-dimensional Halton hyper-cube*

One can easily spot how some pairs exhibit an obvious pattern, while others fill their corresponding 2D area very densely. This happens more aggressively for higher dimensions (*i.e.,* to the right/bottom in the image) and for pairs formed with close components (*i.e.,* near the diagonal in the image). Click to enlarge!

A successful approach to dissolve this problem without losing the good properties of stratification is to do *“random digit scrambling”* (*a.k.a.,* rds). During the construction of a Halton number, digits in the range `[0..base[`

are combined. Given a *Pseudo-Random Permutation* of `length=base`

, all that one must do is use `PRP(digit)`

instead of digit directly. This somewhat shuffles Halton pairs in rows and columns in a strong way so the correlation disappears. However, since the PRP is a bijection, the good properties of stratification are generally preserved.

How to build a strong and efficient randomised PRP of an arbitrary length is an interesting subject which details I won’t get into here.

Here’s the scrambling strategy in action:

*Scrambled 32-dimensional Halton hyper-cube*

Now all the blocks in the grid look equally dense. Magic!

As long as one picks good PRPs, it is possible to generate any number of different samplings, all with the same good properties.