Playing with Blue Noise (III)


Playing with Blue Noise (III)

xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx no-discontinuity samplers

Playing with Blue Noise (II)

If you are implementing a Monte Carlo integration solution such as a path tracing engine, you are definitely familiar with low-discrepancy series and hypercubes PRNs/QRNs. Actually, a typical situation for a mature renderer is to end up with a generator of tuples of Quasi-Random Numbers which meets as good as possible all of these requirements:

1- can deterministically generate a new tuple for each pixel for each render pass. 2- within each pixel, its tuples present a low discrepancy over the course of the render passes. 3- the tuples of each pixel present a low discrepancy with the tuples of neighboring pixels. 4- is fast and requires little or no significant amount of memory for precomputations.

A path-tracing path generating a sample is characterized (solely) by the tuple of QRNs that generate it (e.g., 2 random numbers for the subpixel position, 2 random numbers for camera DOF, 2 random numbers to evaluate the BRDF at each bounce, etc…). In a full-fledged GI solution such as Maverick Render, these tuples can get arbitrarily long and are generally hundreds of numbers long each. Each “position” in a tuple is generally called dimension.

Halton and Sobol series are popular solutions that meet those requirements when used right. On top of these, there is a bunch of powerful techniques such as random-digit scrambling, Owen-scrambling, padding, shuffling, etc… that allow to create seedable series variants which are different to each other but still capture the same low-discrepancy characteristics of the original.

So over the course of the render passes (pass=spp=samples-per-pixel) each pixel generates one tu

Playing with Blue Noise (I)

Recently I’ve been doing some research on [](Blue Noise) in the context of sampling for rendering, and would like to talk about the subject a little bit.

In this post, let’s forget about rendering, sampling, etc… and just focus on how to generate blue noise. Or, rather, how to “blue noisify” a pre-existing set of 2D points.

As usual when I write these blog entries I am not sure whether I should start from the very basics or whether eventual readers will be knowledgeable in the subject. But here we go…

Blue Noise is a type of noise where frequencies concentrate at the high end of the spectrum. i.e., where high frequencies (treble) are predominant and low frequencies (bass) are dim or absent.

Another way to see blue noise is as the result of applying a high-pass band on purely random (white) noise. This definition is very close to what this post is about.

The three videos below come from our Unit Testing system at RandomControl. They start with a 2D image with uniformly distributed points (white noise). The distribution is transformed into blue noise by progressively sucking the low frequencies out.

I will go over the algorithm, which is super-simple, in a little bit. But before:

  • The noise field in these videos is 128x128, although it is tiled 2x2 in order to illustrate the fact that the resulting blue noise is tileable.
  • The top-right corner features a decreasing graph as more and more low frequencies are removed.
  • The bottom-right corner is the (magnitude of the) Direct Fourier Transform of the noise field.

The DFT beautifully transitions from a flat image (because all sorts of frequencies are present in white noise), to an image where a black circle in the middle (where low frequencies sit) starts to manifest. The central pixel of the DFT is proportional to the average of the field so this one is not meant to go away.

The optimization process I am using here is just a simplified flavor of [](Simulated annealing).

At each step:

  • A pair of pixels (p0 and p1) in the 2D image is randomly selected.
  • A metric err_now is computed in the neighborhood of p0 and in the neighborhood of p1. Both amounts are added.
  • err_swap is computed the same, but swapping the B/W values at p0 and p1.
  • The swap is randomly accepted if (and only if) err_now<err_swap, and rejected otherwise.

The metric I am using is nothing but a gaussian convolution of a small radius around a given pixel. Which is, basically, a measurement of how empty or crowded the neighborhood of the pixel is.

The random acceptance chance can be something as blunt as “50% of the times”.

Since we’re only (randomly) accepting swaps where the metric increases, we are making white points repel other white points as if they were tiny electrical charges. This effectively dissolves clumping and creates an eye-pleasing even spacing around white points.

As a fun exercise, flipping the acceptance criterion < by a > clumps white points in islands.

Simulated annealing is not guaranteed to find the best arrangement possible, but with some proper randomization will effortlessly climb to a maximum that is “good enough”.

[NOTE] This post is unfinished. I will try to complete it tomorrow…


Barycentric coords in triangle sweeps

A “triangle sweep” can be defined as 2 triangles (A,B,C) and (D,E,F) and the 3 bilinear patches at the sides of both triangles. It’s kind of a “twisted” triangular prism.

We can define some sort of (u,v,w) barycentric coordinates here, such that each point P interior to the sweep could be written as P=(A+AD*w)*u+(B+BE*w)*v+(C+CF*w)*(1-u-v).

As soon as the sweep is “twisted” the behavior of those barycentric coords is no longer linear. i.e., linear motion in the (u,v,w) space does not translate to linear motion in 3D space and viceversa. However these coordinates can be useful for certain applications.

One way to find (u,v,w) for a given P is to solve for w first (the red plane below) and then find u and v as regular barycentrics in the triangular section. After massaging the numbers a bit, solving for w turns out to be a cubic equation, which is the wiggly orange curve below.

We are only interested in solutions where w IN [0..1].

Octahedron unit-vector encoding

There are many situations in Computer Graphics where you may need to store massive amounts of unit-length vectors (e.g., directions, surface normals, …). The normals AOV in a G-Buffer, incoming directions in a Photon map, or uploading a 3D model to the GPU for rendering are situations that greatly benefit from a bit-efficient encoding for normals.

Spherical coordinates encoding

The straightforward way to encode a normal is to store its (x,y,z) coordinates, which takes 3x32=96 bits in single-precision floating-point. This naive approach doesn’t take advantage of the fact that in normals or directions the vector length is irrelevant. So assuming unit length, you may use spherical coordinates (theta, phi) and encode each normal with 2 numbers instead of 3.

Since (theta, phi) have a [0..2pi] and [0..pi] range, it sounds reasonable to quantize its values using fixed-point math. This way both angles become a couple of integers of, say, 16-bit each, with their ranges turned into [0..65535]. Note that this gives twice as much resolution to phi but that’s fine. We have achieved a reasonably fine-grained 16+16=32-bit encoding. Depending on how tolerable precision losses are in your application, you may go even lower (e.g., 12+12=24-bit, or even 8+8=16-bit).

The problem with this simple approach is that spherical coordinates are not an equal-area transform, as they concentrate points near the poles:

Spherical coordinates encoding: phi gets more precision than theta, and density between the equator/poles is highly uneven.

Octahedron encoding

A smarter solution which produces a much more even distribution, while being strikingly simple and more efficient computationally (no trigonometry whatsoever) is octahedron encoding.

Octahedron encoding: The distribution is pretty even.

The idea is to project positions in the sphere (unit vectors) onto a sphere-centered octahedron. And then piece together the 8 faces of the octahedron in the unit cube. This diagram should be self-explanatory.

Reference: Octahedron environment maps

Fibonacci encoding

Recently I have been using the Fibonacci lattice here and there and had the idea to use its equal-area spherical flavor for normal encoding. After googling a bit I came across this paper, so somebody had the idea first:

Unorganized Unit Vectors Sets Quantization

Actually, pulling from this thread I also came across these Shadertoy samples by Íñigo Quilez. They visualize what happens with either method as more bits are budgeted for normals encoding:



_Fibonacci encoding: Looks great, but comes at a price._

This looks elegant (pretty even!). Unfortunately the implementation to encode/decode unit-vectors with n bits is rather convoluted and involves N=2^n and 1/N. As soon as n>24 or so numerical degradation makes this solution unusable in single-precision. Also, this method is significantly slower than octahedron encoding, and while the distribution looks prettier, the difference in practice is negligible.

QRNs in the unit square/disk/sphere

Basic concepts in this post, but worth refreshing.

Very often in Computer Graphics (in Monte Carlo integration in particular) you need to sample N points uniformly over the surface of a square, a disk, or a sphere.

Depending on many (many!) factors such as how critical the quality of the random distribution needs to be, performance or code simplicity considerations, availability of a high-quality randomization seed, etc… one may choose one method or another. There are plenty of options, and each comes with their own strengths and weaknesses.

In this post I will talk about:

  • The Equal-Area transform to turn a pair of uniformly distributed random numbers in the unit square into uniformly distributed points over the surface of the the unit disk/sphere.
  • A super-simple Fibonacci-based method to generate N points that uniformly cover the unit square (and hence the unit disk/sphere).

Equal-Area transform

Given u a uniform distribution of 2D points in the surface of the unit square, the transforms to produce uniform distributions of 2D points in the unit disk, or 3D points in the unit sphere are:

There are very clean-cut explanations for these on the always fantastic website.

Disk Point Picking

Sphere Point Picking

These are based on the area (disk) and solid angle (sphere) differential elements.

Fibonacci lattice

The algorithm to generate the first N points in the Fibonacci lattice goes like this:

This produces a stratified distribution of points in the unit square.

The beauty of this method is that it’s a strikingly simple couple of divs and a mod. Unlike most series generators, however, it requires that you know N beforehand. Other usual methods like the ones depicted later in this post are incremental, so each new sample is spawned next to the previous ones. The Fibonacci lattice, on the other hand, “squeezes” previous samples as new ones come in, as illustrated by the videos below.

Fibonacci lattice (square/disk/sphere)

This videos simply plug the Fibonacci lattice formula to the Equal-Area transforms above.

As a bonus, the next videos are generated like the ones above, but using a uniformly distributed QRN generator (the popular and very strong Mersenne Twister) and the van Der Corput stratified series (in particular, the Halton sequence using primes 2 and 3).

Mersenne Twister (Random)

Halton (Stratified)

Recklessly encoding int as float

Quiz: Does this loop end, and if it does, what is the value of k after the loop? (assuming IEEE-754 and usual 32-bit semantics for int/float)

Running this code in VS you get k=16777217 which equals 2^24+1.

In other words, in line with the true intent of the quiz, float can encode -exactly- (without any precission loss) all natural numbers up to 2^24 (included). Because float encodes sign in a dedicated bit, this property holds true for negative values as well. So you could hypothetically encode any [-2^24..+2^24] int in a float with a static/implicit cast.

I said hypothetically because generally speaking I would never do such thing. Carelessly casting between int and float (without an explicit floor, ceil, trunc, round…) is often a sign of lousy coding, unless performance is at stake and you know well what you are doing.

However, I came across the situation in Unity recently, where the Vertex ID node in their Shader Graph hands over SV_VertexID as a float (disappointed face :-P). I would’ve expected an uint/int output or a float that you could reinterpret-cast to retrieve the raw SV_VertexID bits with asuint(). But nope. You are handed over a float which seemingly carries (float)SV_VertexID.

One may recklessly use this float to index a StructuredBuffer ignoring the way static-casting works. This is one case where ignorance is bliss, because the float received exactly matches the original int as long as the original value is <=2^24. That is, as long as you are dealing with fewer than (roughly) 16.7 million vertices, which is usually the case.

I believe that Unity’s ShaderGraph doesn’t support int/uint as In/Out values between nodes, so I guess that the Vertex ID and Instance ID nodes just static-cast the corresponding SV_... value to a float. But it would be better (in the pedantic sense) to reinterpret the bit pattern of the raw values with asfloat and then let the user retrieve them with asint()/asuint().

reinterpret-cast between int/float in HSLS




A (loose) explanation of the 2^24 limit

This is the IEEE-754 standard for floating-point numbers, as described in Wikipedia:

The value of a float-encoded number is reconstructed as (-1)^S * M * 2^E.

  • S is the 1-bit sign.
  • M is the 23-bit mantissa, interpreted as 1.xxxxx (in binary).
  • E is the 8-bit exponent, used as E-127 where 127 is often called bias.

e.g., a power-of-2 integer number like 4 would be encoded as:

  • M=1.00 which is the binary representation of 4, with the point at the left-most 1.
  • E=2 (+bias).

The restored number is 1.00 shl 2=4.

We should be able to do the same for all power-of-2 integers until we max out E.

For non-power-of-2 integers the process is similar. e.g., number 5:

  • M=1.01...00.
  • E=2 (+bias).

The restored number is now 1.01 shl 2=5.

This works the same for all integer numbers as long as M can hold the raw binary representation of the number. The tallest binary number that M can hold is 23 consecutive 1s. That is: 1.11…11 (24x1s in total). With E=23 (+bias) this equals 2^24-1.

The next integer 2^24 would be encoded as 1.00...00 (24x0s, clamped to 23, but the trailing 0s are meaningless here). With E=24 (+bias) this equals 2^24 (the answer provided above!!).

But the next integer 1.{23x0s.1} can’t be encoded in a 23-bit M. So from 2^24 onwards, there is necessarily a loss. Some integers beyond 2^24 (such as powers-of-2) will be luckily encoded exactly by float. But not all consecutive numbers will. Actually, 2^24+1 is the first integer that won’t.

Whoa! As always, apologies for any mistakes in any of my posts.

[EDIT] The same reasoning can be applied to double, where the mantisa M is 52-bit long.

Hence double can encode exactly all integers in the range [-2^53..+2^53].

Viewport baked PBR materials

The technique described here is very basic, but useful nonetheless.

Let’s make the following (cringy :-D) assumptions:

  1. No interactions between surfaces (i.e., no GI, no occlusion, etc…).
  2. The only light source is an IBL (environment) map placed at infinity which orientation is anchored to the camera.
  3. Each surface point P is made of a perfectly rough white material which simply “collects” the light that arrives to P from above the hemiplane defined by (P,N).

Under such assumptions a ball floating in the void looks like this image rendered in Maverick Studio with a procedural sphere primitive and a standard material:

The illumination collected by each point P and radiated back to the camera (+ tonemapping) happens to be, exactly, one particular pixel in the rendered ball as depicted in the hand-drawn diagram above. Note that due to the constraints set, this color depends exclusively on the normal N at P but not on P itself. In other words, the rendered ball effectively precomputes what any surface point P sees through its normal N.

The goal in this post is to benefit from this knowledge in a rasterization viewport. So let’s jump now to the world of pixel shaders. In our vertex shader, let’s transform the object’s normals N to camera space Nc and let’s drop the resulting Z coordinate. (Nc.x,Nc.y) are the geometry normals projected to screen space. Let’s normalize these: Nc'=normalize(Nc.x,Nc.y)

Let’s upload the ball render as a regular 2D texture and let’s define the following UV space:

Nc' is normalized, so the UVs generated will cover the interior of the ball only. Normals at grazing angles will be near the perimeter of the texture’s ball.

In our fragment shader, let’s simply output the color found in the ball image at the UVs computed as described. Here’s the result:

Of course, physically-based materials are more complicated than this. In general, the amount of light received by the surface from the (P,N) hemiplane through each possible incoming direction and then bounced back through each possible outgoing direction depends on both directions, and also on the properties of the material. This is basically the definition of a BRDF, BTW.

But because we set the constraint that the lights are not moving (i.e., the IBL stays fixed with respect to the camera) the incoming and outgoing directions and hence the behavior of the BRDF ends up being a function of N and nothing else. So this method still works no matter what material we render the ball with (well… excluding refractions :-D).

Here goes another example with the same IBL but adding some specularity to the material:

Of course this could be pushed further with some SSAO or any other technique that approximates self-shadowing. But this is a very compact and cheap (memory/code/effort) way to display models in an easy-to-author appealing way.

Some final notes:

  • If you want to be accurate, the ball should be rendered with an orthographic camera (i.e., not with a perspective camera).
  • For blurry/glossy or diffuse reflections, you can get away with a low-res ball render. However, make sure that the UVs that you generate don’t step on the antialias pixels at the boundary of the rendered ball. Otherwise the model will look b0rked at grazing angles.
  • This technique can be extended by using 2 ball images: one with the diffuse component of the material, and another with the specular component. This way the fragment shader could modulate the diffuse part and the specular part by separate colors that could be passed as inputs to the shader. Keep in mind that this would require that the ball images and the colors are passed in linear (instead of sRGB) space and that the resulting color is sent back to sRGB.

Trimmed NURBS

I’m currently swimming in an ocean of NURBS with their trimming loops. This is experimental work for NURBS rendering in Maverick Render.

[EDIT] Throwing some GLSL vert/frag shaders in to the mix for normal/tangency inspection. 2 days in and this is still WIP.

Lattice-Boltzmann Method (LBM)

A couple of weekends ago I was tinkering with fluid dynamics and implemented a few compute shaders in Unity. Among the ones I implemented, my favorite is the Lattice-Boltzmann Method featured here.

This early implementation allows to draw SDF entities in the simulation space behaving as blockers, inflows or outflows. I intend to expand on the features, but most likely at a slow pace, because this is a no-rush side-project. But I definitely wish to post now and then about my progress.

I also hope to find some spare time soon to write a quick article on how to implement an LBM cellular automata from a light-on-maths perspective.

Colorful pet project

Some pet project I am making some progress on. More info soon. Definitely. Maybe.

[EDIT] I had to install a plug-in so Twitter cards display the main post image when cross-posting to Twitter. Apologies for the multiple test posts.

[EDIT] Here’s a hint. It’s a cutsie CSG-based Solid modeling experiment in Unity Editor.

Topology fun

Recently I have been revisiting in my spare time one of my favorite subjects: topology. I will probably be posting some stuff about topological operators, subdivision surfaces, DCEL structures, irregular tilings and such things in the near future.

Below: A little demo I’ve written that supports an extended flavor of Conway’s notation for polyhedra: adotaocD.

[EDIT] I paused this pet project for the time being… because the 2022.2 release of Maverick has virtually blown all my productivity hours away. But I plan to write some entries on topological operators at some point further into this year.

Old MLT experiments

These are some very old MLT tests that I found on my hard drive today.

Metropolis Light Transport (MLT) is a global illumination variant of the Metropolis-Hastings algorithm for Markov-Chain Monte Carlo (MCMC).

In a nutshell, MCMC algorithms explore the neighborhood of a Monte Carlo sample by jittering (mutating) said sample. A metric is then used to define how good a sample is, so the next mutation is accepted or rejected based on how good it is compared to the current sample. This leads to a random-walk in the space of samples which tends to crawl up in the increasing direction of the metric used.

This method biases sampling towards successful samples. So a bit of statistics juggling is required to counter that bias. The link below is a seminal paper that beautifully describes the nuts and bolts of this process. The idea presented there is to jitter samples, not in sample space directly, but in quasi-random number tuples space instead:

Simple and Robust Mutation Strategy for Metropolis Light Transport Algorithm

In the particular case of MLT, samples are random light paths, and the metric used is the amount of light energy transported by a path. This can be implemented on top of regular Path Tracing. Arion was the first (?) commercial render engine that did this on the GPU back in 2012 or so.

MLT has fallen out of favor over the years (path guiding is all the rage these days). But MLT proved to be a competent universal solution to global illumination, being capable of boosting path tracing so it could efficiently solve even the hardest cases, such as refractive caustics.

The beauty of MLT is that it requires 0 configuration from the user and does not need any additional memory for data structures. The main drawbacks are the ugly splotchy and non-uniform look of noise, as compared to regular path tracing, and the inability to guarantee noise stability across animation frames.

Experiments in 2D

Before I implemented MLT in Arion, I did some MCMC simulations with 2D sampling. In the videos below I used the (x,y) coords of pixels as samples, and the grayscale luminance Y of the image as the metric. The goal of the mutations here is to reconstruct the image by sampling harder in the directions of higher luminance.

MCMC is prone to getting stuck for long amounts of time in local maxima. The practical solution proposed in the above paper is to introduce a plarge probability that kicks the sampling scheme and sends the next sample to a purely random position in the sample space. The videos below visualize this very well I think.

Mutating in QRN tuple space without plarge - Watch on Youtube

Mutating in QRN tuple space with plarge - Watch on Youtube

MLT reconstructing a grayscale image - Watch on Youtube

Implementing MLT successfully in our render engine did require some specialization in our sampling routines. I won’t get into the details, but basically, since mutations happen in the random tuples that generate the paths, you expect continuous mutations in the tuples to produce continuous changes in the generated paths. So all samplers involved must avoid discontinuities in the [QRN<->path] bijection.

Back in time

Some MLT-heavy images rendered in Arion back then:

Uniform vs. stratified (2015-03-10)

[EDIT] This post was migrated from my blog from 2011…

This is a classic subject in numerical (Monte Carlo) integration.

Uniform 2D distribution vs. Halton series for the first 2 dimensions

To the left: 32768 points in a 512×512 image using a uniform random number generator (Mersenne Twister). To the right, the first 32768 pairs in the Halton series, using dimensions #0 and #1. Click to enlarge!

Filling of missing image pixels (2015-05-28)

[EDIT] This post was migrated from my blog from 2011…

Here’s what we could call a mean pyramid of the 512×512 Lena image. i.e., a sequence of progressive 1:2 downscalings, where each pixel in the i-th downscaled level is the average of the corresponding 4 pixels in the previous level. For people familiar with OpenGL and such, this is what happens when the mipmaps for a texture map are computed:

The 9+1 mipmap levels in the 512×512 Lena image

There are many smart, efficient, and relatively simple algorithms based on multi-resolution image pyramids. Modern GPUs can deal with upscaling and downscaling of RGB/A images at the speed of light, so usually such algorithms can be implemented at interactive or even real-time framerates.

Here is the result of upscaling all the mip levels of the Lena image by doing progressive 2:1 bi-linear upscalings until the original resolution is reached:

Upscaled mipmap levels

Note the characteristic “blocky” (bi-linear) appearance, specially evident in the images of the second row.

Lately, I have been doing some experimentation with pixel extrapolation algorithms that “restore” the missing pixels in an incomplete image for a 2D DOF filter based on the Depth AOV.

My pixel extrapolation algorithm works in 2 stages:

  1. The first stage (analysis) prepares the mean pyramid of the source image by doing progressive 1:2 downscalings. Only the meaningful (not-a-hole) pixels in each 2x2 packet are averaged down. If a 2x2 packet does not have any meaningful pixels, a hole is passed to the next (lower) level.
  2. The second stage (synthesis) starts at the smallest level and goes up, leaving meaningful pixels intact, while replacing holes by upscaled data from the previous (lower) level.

Mean pyramid (with holes)

Note that the analysis stage can stop as soon as a mip level doesn’t have any holes.

Here is the full algorithm, successfully filling the missing pixels in the image.

Mean pyramid (filled)


This algorithm can be implemented in an extremely efficient fashion on the GPU, and allows for fantastic parallelization on the CPU as well. The locality of color/intensity is preserved reasonably well, although holes become smears of color “too easily”.

A small (classic) inconvenience is that the source image must be pre-padded to a power-of-2 size for the progressive 1:2 downscalings to be well defined. I picked an image that is a power-of-2 in size already in the above examples.

TL;DR: Given the ease of implementation and the extreme potential in terms of efficiency, the results are quite decent for many applications.

Playing with the Fourier Transform (2016-07-28)

[EDIT] This post was migrated from my blog from 2011…

The beauty and power of the Fourier Transform never cease to amaze me. And since several effects in ArionFX are based on it, I have had to play with it a lot in recent times.

As explained in a previous post, diffraction patterns (e.g., the glare simulated in ArionFX for Photoshop) come from the Fourier Transform of the lens aperture. I will use the FT of an aperture mask for visualization in this post.

I will use pow-2 square sizes (my FT implementation is an FFT). Let’s start by this Aperture x Obstacle map output directly from ArionFX for Photoshop v3.5.0.

Aperture x Obstacle mask, rasterized @ 512×512

The Fourier Transform of that image, rasterized at that size in said 512×512 buffer, is the following.

FT of the mask above

The faux-color is done on the magnitude of each complex number in the FT. All the FT images in this post are normalized equally, and offset to look “centered” around the mid-pixel.

Such diffraction patterns and some heavy convolution magic are what ArionFX uses to compute glare on HDR images:

Resulting glare in ArionFX

Now, let’s focus on what happens to the FT (frequency space) when one does certain operations on the source data (image space). Or, in this exemplification: what happens to the diffraction pattern, when one plays with the rasterized aperture mask.

Note that we’re speaking of the Discrete Fourier Transform, so sampling (rasterization, pixelization) issues are mostly ignored.

Rotation about the center

A rotation of the source buffer about its center doesn’t change the frequencies present in the data; only their orientation. So a rotation in the source data rotates the FT rotates in the exact same way.

As we will see next, this property holds true regardless of the center of rotation, because the FT is invariant with respect to translations.

Rotation about the center

Translation (with warp-around)

Frequencies arise from the relative position of values in the data field, and not from their absolute position in said field. For this reason, shifting (warp-around included) the source data does not affect the corresponding Fourier Transform in any way.

Invariance to translation

Let’s recall that the idea behind the FT is that “any periodic function can be rewritten as a weighted sum of sines and cosines of different frequencies”. Periodic being the keyword there.

Repetition (tiling)

Tiling the data buffer NxM times (e.g., 2×2 in the example below) produces the same FT, but with frequencies “exploded” every NxM cells, canceling out everywhere else.

This is because no new frequencies are introduced, since we are transforming the same source data. However, the source data is NxM times smaller proportional to the data buffer size (i.e., the frequencies become NxM times higher).

Exploded frequencies on tiling

Data scaling

Normalization and sampling issues aside, scaling the data within the source buffer scales the FT inversely.

This is because encoding smaller data requires higher frequencies, while encoding a larger version of the same data requires lower frequencies.

Inverse effect on scaling

In the particular case of glare (e.g., ArionFX) this means that the diffraction pattern becomes blurry if the iris is sampled small. Or, in other words, for a given iris, the sharpest diffraction pattern possible is achieved when the iris is sampled as large as the data buffer itself.

Note, however, that “large” here means “with respect to the data buffer”, being the size of the data buffer irrelevant as we will see next.

Hosek & Wilkie sky model (2015-03-03)

[EDIT] This post was migrated from my blog from 2011…

This is an old comparison between the Preetham and Hosek & Wilkie sky/sun models.

Preetham et al. sky model

Hosek & Wilkie sky model

Scrambled Halton (2015-03-10)

[EDIT] This post was migrated from my blog from 2011…

The Halton sequence, which is one of my favourite algorithms ever, can be used for efficient stratified multi-dimensional sampling. Some references:

It is possible to do stratified sampling of hyper-points in the s-dimensional unit hyper-cube by picking one consecutive dimension of the Halton series for each component. A convenient way to do so is to use the first s prime numbers as the basis for each Halton sequence.

It is well-known, however, that while this approach works great for low dimensions, high dimensions often exhibit a great degree of undesired correlation. The following image displays a grid where each cell combines two components of the 32-dimensional Halton hyper-cube.

Raw 32-dimensional Halton hyper-cube

One can easily spot how some pairs exhibit an obvious pattern, while others fill their corresponding 2D area very densely. This happens more aggressively for higher dimensions (i.e., to the right/bottom in the image) and for pairs formed with close components (i.e., near the diagonal in the image). Click to enlarge!

A successful approach to dissolve this problem without losing the good properties of stratification is to do “random digit scrambling” (a.k.a., rds). During the construction of a Halton number, digits in the range [0..base[ are combined. Given a Pseudo-Random Permutation of length=base, all that one must do is use PRP(digit) instead of digit directly. This somewhat shuffles Halton pairs in rows and columns in a strong way so the correlation disappears. However, since the PRP is a bijection, the good properties of stratification are generally preserved.

How to build a strong and efficient randomised PRP of an arbitrary length is an interesting subject which details I won’t get into here.

Here’s the scrambling strategy in action:

Scrambled 32-dimensional Halton hyper-cube

Now all the blocks in the grid look equally dense. Magic!

As long as one picks good PRPs, it is possible to generate any number of different samplings, all with the same good properties.

Sobel operator (2014-09-07)

[EDIT] This post was migrated from my blog from 2011…

The Sobel operator is a simple way to approximate the gradient of the intensity in an image. This, in visual terms, can be used for edge detection. The purpose of edge detection is to significantly reduce the amount of data in the image, while preserving the structural properties to be used for further image processing.

In practice, the Sobel operator is simply a pair of 3×3 (separable) convolution kernels. One highlights the horizontal gradient/edges, and the other one highlights the vertical gradient/edges.

In non-formal terms, and under certain theoretical assumptions, this is conceptually equivalent to computing the partial derivatives of the image with respect to x an y.

For the examples below, I am using as input the same image featured by Wikipedia in the Sobel operator page:

Sobel operator

This grid presents:

  1. The input image (luminance).
  2. The absolute magnitude of the result of the Sobel filter.
  3. The result of the Sobel (x) filter.
  4. The result of the Sobel (y) filter.
  5. Same as (2), but in faux color.
  6. The gradient vectors, normalized and displayed as an RGB-encoded normal map.

The 3-tap Sobel convolution kernels have a 1px radius, so they have a very limited edge detection range. This makes the filter quite shaky as soon as the image presents fine details or noise. For this reason, one may want to pre-pass the input image with a subtle Gaussian blur.

This has the effect of diminishing edges in general (as expected), but the resulting gradient images are equivalent, yet much cleaner.

Sobel operator (Gaussian with sigma=2)

The Sobel operator is one of the most fundamental building blocks in feature detection in the field of Computer Vision.

Note that the Sobel operator does not characterize edges or detects features in any way. It simply produces a filtered image where pixels that most likely belong to an area of high gradient (such as an edge) are highlighted.

Bonus remark: Sobel filtering is very similar to what a render engine such as Maverick does to transform a height map (bump map) into a normal map.

The Error function (erf) (2014-09-06)

[EDIT] This post was migrated from my blog from 2011…

Here is the 1D Gaussian function:

Put in short, the Error function is the integral of the Gaussian function from 0 to a certain point x:

At least, that is the way the formula is presented by Wikipedia and Wolfram|Alpha. But as soon as you try to work with it you find out that in order to really match the above Gaussian function, normalization and axis scaling must be taken care of:

The plots below display G(x,sigma) (bell-shaped) in blue, and erf(x,sigma) (S-shaped) in yellow.

A very typical use for G(x,sigma) that I’ve been talking about on this blog lately, is to build convolution kernels for Gaussian blur. An image convolution kernel, for example, is a pixel-centered discretization of a certain underlying function (a Gaussian, in this case). Said discretization splits the x axis in uniform unit-length bins (a.k.a., taps, or intervals) centered at x=0.

For each bin, this is pretty much the definition of the integral of G(x,sigma) along the bin. That is, the increment of erf(x,sigma) between both end-points of the bin.

Discretized 1D Gaussian (sigma=0.5)

Discretized 1D Gaussian (sigma=1.0)

Discretized 1D Gaussian (sigma=1.5)

Doing it wrong:

Most implementations of Gaussian convolution kernels simply evaluate G(x,sigma) at the mid-point of each bin. This is a decent approximation that is also trivial to implement:

This value is represented in blue in the above plots.

Implementing erf:

While G(x,sigma) has a trivial explicit formulation, erf is an archetypical non-elementary function.

Some development environments provide an erf implementation. But most (the ones I use) don’t. There are some smart example implementations out there if you Google a bit. Usually they are either numerical approximations, or piece-wise interpolations.

Assuming that one has an erf implementation at his disposal, the correct value for the bin [a..b] is:

This value is represented in red in the above plots.

A poor-man’s approximation:

If you are feeling lazy, an alternative approximation is to super-sample G(x,sigma) along the bin. The amount of samples can be chosen to be proportional to how small sigma is.

This method is easy to implement. However, it is far less elegant and also less efficient as more evaluation calls are required.


When discretizing a Gaussian function, as sigma becomes small (close to or smaller than (b-a)), the approximation bin(a,b) (in blue above) deviates significantly from the correct bin'(a,b) (in red above).

So if you are going to Gaussian-blur at radius 1px or smaller, and precision is important, then it is necessary to use erf(x,sigma) or at least super-sample G(x,sigma).

Downsampling and Gaussian blur (2014-09-01)

[EDIT] This post was migrated from my blog from 2011…

I talked about several strategies to optimize convolutions in some of my previous posts. I still got to talk about how to approximate a Gaussian blur using a multi-step Box blur in a future post. However, there is yet another good technique to optimize a Gaussian blur that may come handy in some cases.

This post is inspired by a need that I had some days ago: Say that you need to do a 3D Gaussian blur on a potentially humongous 3D data buffer. Working with downsampled data sounds ideal in terms of storage and performance. So that’s what I am going to talk about here:

What happens when downsampled data is used as input for a Gaussian blur?

The idea:

Here’s the 0-centered and un-normalized 1D Gaussian function:

The sigma parameter in the Gaussian function stretches the bell shape along the x axis. So it is quite straightforward to understand that if one downsamples the input dataset by a scale factor k, then applies a (smaller) Gaussian where sigma’=s/k, and finally upscales the result by the same scale factor k, the result will approximate a true Gaussian on the original dataset where sigma=s.

In cleaner terms: if one has an input dataset (e.g., an image) I and wants to have it blurred by a Gaussian where sigma=s:

  1. I’<=I downsampled by a certain scale factor k.
  2. I”<=I' blurred by a small Gaussian where s’=s/k.
  3. I”’<=I'' upscaled by a scale factor k.

How good is this approximation?

The Sampling Theorem states that sampling a signal at (at least) twice its smallest wavelength is enough. Which means that downsampling cuts frequencies above the Nyquist limit (half the sampling rate). In other words: Downsampling means less data to process, but at the expense of introducing an error.

Fortunately, a Gaussian blur is a form of low-pass frequency filter. This means that blurring is quite tolerant to alterations in the high part of the frequency spectrum.

Visual evaluation:

In the examples below I am downsampling with a simple pixel average, and I am upscaling with a simple bilinear filter. The 2x2 grids below compare:

  1. Top-left – The original image I.
  2. Top-right – I downsampled and upscaled by k (note the blocky bilinear filter look).
  3. Bottom-left – The resulting image I”’.
  4. Bottom-right – I blurred by a true Gaussian where sigma=s.

In these examples, k=sigma was chosen for simplicity. This means that the small Gaussian uses sigma’=1.

Gaussian blur where sigma=4

Gaussian blur where sigma=16

Gaussian blur where sigma=64


As shown, the approximation (bottom-left vs. bottom-right) is pretty good.

The gain in speed depends on multiple implementation factors. However, as I explained above, this post was inspired by a need to cope with a cubic memory storage problem when doing Gaussian blurs on a 3D buffer. Working with a heavily downsampled buffer clearly helps in that sense. And it is needless to say that decreasing the amount of data to process by k^3 also brings a dramatic speed boost, making it possible to use tiny separable convolutions along the 3 (downsampled) axes.

Note that one might pick any downsampling scale factor k>=1. The higher the value of k, the higher the approximation error and the smaller and faster the convolution.

The choice k=sigma offers a good trade-off between approximation error and speed gain as shown above.

Diaphragm and f-stop (2014-08-17)

[EDIT] This post was migrated from my blog from 2011…

This is just another image taken from the Maverick Unit Testing system. The chart displays different polygonal diaphragms at different f-stop values. Doubling the f-stop number, halves the surface light can pass through.

Polygonal diaphragms and f-stop

Glare patterns (2014-08-14)

[EDIT] This post was migrated from my blog from 2011…

Glare in photography is due to Fraunhofer diffraction as light from distant objects passes through the camera diaphragm.

There is a magical connection between Fraunhofer diffraction (physics) and the Fourier Transform (math). As a matter of fact, the intensity of the Fraunhofer diffraction pattern of a certain aperture is given by the squared modulus of the Fourier Transform of said aperture.

Assuming a clean and unobstacled camera, the aperture is the diaphragm shape. Here you have the diffraction patterns that correspond to some basic straight-blade (polygonal) diaphragms.

Glare patterns

Interestingly, the Fourier Transform produces one infinite decaying streak perpendicular to each polygon edge. When the number of edges is even, the streaks overlap in pairs. That is why an hexagonal diaphragm produces 6 streaks, and an heptagonal diaphragm produces 14.

The leftmost pattern happens to be the Airy disk. The Airy disk is a limit case where the number of polygon edges/streaks is infinite.

The examples above were generated at 256x256. The visual definition of the pattern naturally depends on the resolution of the buffers involved in the computation of the Fourier Transform. However, note that the FT has an infinite range. This means that for ideal polygonal shapes, the streaks are infinitely long.

In the practical case, buffers are far from infinite, and you hit one property of the Fourier Transform that is often nothing but an annoyance: the FT is cyclic. The image below depicts what happens when one pumps up the intensity of one of the glare patterns obtained above: the (infinite) streaks, which warp-around the (finite) FT buffer, become evident.

Cyclic glare pattern

Bonus: Here’s some real-life glare I screengrabbed this evening at the European Athletics Championships.

Real-life glare

My latest running shoes (2014-08-07)

[EDIT] This post was migrated from my blog from 2011…

I have run my latest 7000 km or so on Nike Lunaracer+ (v1 and v3) shoes. This week I started my last pair. I will have to re-stock soon.

My latest 7000 km