Brash & Plucky

by: Chema Guerra

Micro-Patch Displacement Mapping

One major endeavour that I undertook this past summer was the creation of a novel displacement mapping system for Maverick Render.

The main motivations behind this have been:

  • Our customers were running out of GPU memory too often when using displacement.
  • Our main goal is that Maverick is an exquisite-quality tool for model visualization. Hence displacement is very important to us.
  • Open the door to some other geometry-generation features beyond displacement itself.

So I undusted the old virtualized geometry solution I implemented for fryrender back in 2008 and re-engineered it for the Maverick core in the GPU.

Say hi to: Micro-Patch Displacement Mapping

Non-affine MPDM in arbitrarily-curved surfaces

Affine MPDM in flat surfaces

Some good properties of Maverick’s displacement:

  • No requirements whatsoever on the base mesh or on the heightmap texture.
  • It costs near 0 in warm-up time.
  • It costs near 0 in memory (only a small payload is appended to the height map).
  • 0 information is lost; every single texel in the heightmap is turned into a displaced micro-patch.
  • Implemented as a custom primitive in Maverick’s OptiX ray-tracing backend.
  • Used every trick in the book to optimize it as much as I could.

MPDM vs. brute-force

MPDM is meant to address some classic problems with the subdivide-and-displace-vertices (brute-force) approach. Using brute-force:

  • There is some (often significant) wait time for warm-up.
  • Memory usage goes through the roof, more than often blowing up all available memory.
  • Height sampling happens at the vertices and not at the heightmap texels.

Points 1-2 are very obvious. It becomes impossible to abuse displacement, or to recklessly use it for large surfaces. Some techniques such as back-face culling or camera culling can be used to work around these difficulties, but such techniques pose problems on their own. Expecting the user to be careful not to exceed his memory budget is always an ill-fated plan.

Point 3 often means that the resulting displacement is smudgy, and not as crisp as it should be. i.e., information is lost, and/or geometry is produced in excess. Some techniques (that Maverick itself has featured in the past) such as autobump can be used to work around this problem by kind-of-restoring the information that is lost in the process. But then again, optimal configuration would become the responsibility of the user.

None of these issues happen with MPDM. :-)

Are there any benefits to the old brute-force approach?

As usual, nothing really good comes without a price. This is often very true in the realm of algorithm optimization.

  • MPDM unfolds virtualized geometry during ray-traversal. The displaced geometry never exists for real in memory (that is why it is said to be virtualized) and this is the reason why MPDM has a near 0-cost in memory usage. However, this virtualization comes at an algorithmic cost. Despite massively optimized, MPDM renders a bit more slowly than the same displaced geometry would if the vertices were pre-displaced during warm-up and entered the ray-tracing BVH as regular geometry.
  • For MPDM, part of the optimization process relies on the fact that the input map is discretized in texels, (as in a regular bitmap). For brute-force displacement the only requirement is that the height map can be evaluated at each geometry vertex, which is true to bitmaps, procedurals, and all map types. So MPDM is a priori limited to filetex maps, but we have added to Maverick a new map type called bakemap so the user can bake any procedural map tree into a filetex to feed MPDM if necessary.

How does MPDM work?

Here’s a very broad explanation on how MPDM in Maverick works:

  • A (rather classic) hierarchical partitioning scheme based in min-max mip-map levels is performed on the input heightmap. As depicted in the images below.
  • Those voxels are efficiently ray-traceable… unless there is curvature in the base mesh. Hell breaks loose as soon as curvature is at play. Maverick does non-affine math here (visualized in a previous post on non-affine triangle sweeps). I’ve tried different approaches for this (non-affine, discretized, and hybrid), and the code allows to flag-switch each mode on and off.
  • Because of the non-affine treatment, at the lowest hierarchy level you no longer have quads or triangles, but bilinear patches (hence the name: Micro-PATCH Displacement Mapping).

Non-affine MPDM in arbitrarily-curved surfaces

The input heightmap used

Affine MPDM in flat surfaces (special optimization branch)

The input heightmap used

But the devil lies in the details. Efficient implementation of this all was a multi-week hell working with data structures, low-level optimizations, coordinating MPDM with OptiX and the rest of the Maverick core, and dealing with plenty of non-obvious fine-grain issues such as:

  • Avoiding cracks across edges.
  • Dealing with UV mapping projections.
  • Transfer of shading normals.
  • Real-time notifications from the UI.

Youtube 4K videos:

Diffraction vs. Multi-resolution

I’ve been working lately on glare/bloom/fringe and other post-processing effects in Maverick Render. Some of these inherit from our lovely ArionFX Adobe Photoshop and AfterEffects plug-in.

One complaint in ArionFX and also in Maverick is (was, because this post is about a successful fix) that Glare/Bloom diverge in shape and power when the input image is rendered at a different resolution, even if the Glare/Bloom parameters stay the same.

There are some relatively unobvious reasons for this. Basically, the challenges are:

  • Hard challenge: Diffraction is a frequency analysis effect. For a render, this happens in the discrete realm (pixels). The size (amount of pixels) of the images involved changes what frequencies and how they show up in the Fourier Transform.
  • Hard challenge: Anti-Aliasing of neighboring pixels (more prevalent at low resolution) averages their power and dims the overall Glare/Bloom overlay. This can pose a real problem for thin geometries such as lightbulb filaments.
  • Easy challenge: As illustrated in some of my previous posts, the FT itself has some properties that relate its scale and power to the scale and power of the aperture/obstacle of the lens iris. These of course must be compensated for.
  • Medium challenge: Changes in aspect ratio, or in padding in the image buffers (such as the padding between the IPR size in the UI vs. the canvas size) must be taken into account as well.

The upcoming release of Maverick will address these issues.

Here’s a small video with a sequence of Maverick post-processing effects, all rendered alternating landscape and portrait aspect ratios between 512 and 2048. The video is cropped as landscape to be easier on the eyes. As can be seen, at lower resolutions there’s always some power divergence, and a little bit of blur. But those are unavoidable to some extent.

Youtube 4K version:…

Flyby heatmaps (i)

These are some technical visualizations from the all-new 0-memory geometry generation subsystem in Maverick Render which we intend to release soon.

I will be posting more related images and videos in the upcoming days.

Youtube versions of the above videos

[EDIT] I believe that has some difficulty with (i.e., won’t play back) videos encoded at a non-standard aspect ratio. So here are the same videos in Youtube form:………

Results in production

The first beneficiary of this subsystem is Displacement Mapping (now Micro-Patch Displacement Mapping a.k.a., MPDM). This new MPDM system uses near zero memory, near zero warm-up time, and captures literally and exactly every single texel in the input texture.

Youtube 4K version:…

Glare/Bloom versatility in Maverick

These below are a bunch of glare patterns produced with the (much improved) bloom/glare toolchain in Maverick Render.

These tests have been conducted by @Vloba with the soon-to-be-released Maverick Studio 2022.5 and they make use of some of our procedural maps for aperture.

These images are expressive of the versatility of the new system.

Recent developments in Maverick Render

I am super-stoked about the new tech we’ve been creating for Maverick Render recently.

Test render courtesy of Studio Podrini

Our Product Manager wouldn’t like me to spoil any surprises by revealing what new features are involved in this image, so I will leave that to the viewer’s imagination. But here’s a hint: new/updated BSDF stuff + new heavy raytracing-related stuff + even more new heavy raytracing-related stuff.

All these will see the light in Maverick Render 2022.5 (as soon as it is ready).

Barycentric coords in triangle sweeps

A “triangle sweep” can be defined as 2 triangles (A,B,C) and (D,E,F) and the 3 bilinear patches at the sides of both triangles. It’s kind of a “twisted” triangular prism.

We can define some sort of (u,v,w) barycentric coordinates here, such that each point P interior to the sweep could be written as P=(A+AD*w)*u+(B+BE*w)*v+(C+CF*w)*(1-u-v).

As soon as the sweep is “twisted” the behavior of those barycentric coords is no longer linear. i.e., linear motion in the (u,v,w) space does not translate to linear motion in 3D space and viceversa. However these coordinates can be useful for certain applications.

One way to find (u,v,w) for a given P is to solve for w first (the red plane below) and then find u and v as regular barycentrics in the triangular section. After massaging the numbers a bit, solving for w turns out to be a cubic equation, which is the wiggly orange curve below.

We are only interested in solutions where w IN [0..1].

Octahedron unit-vector encoding

There are many situations in Computer Graphics where you may need to store massive amounts of unit-length vectors (e.g., directions, surface normals, …). The normals AOV in a G-Buffer, incoming directions in a Photon map, or uploading a 3D model to the GPU for rendering are situations that greatly benefit from a bit-efficient encoding for normals.

Spherical coordinates encoding

The straightforward way to encode a normal is to store its (x,y,z) coordinates, which takes 3x32=96 bits in single-precision floating-point. This naive approach doesn’t take advantage of the fact that in normals or directions the vector length is irrelevant. So assuming unit length, you may use spherical coordinates (theta, phi) and encode each normal with 2 numbers instead of 3.

Since (theta, phi) have a [0..2pi] and [0..pi] range, it sounds reasonable to quantize its values using fixed-point math. This way both angles become a couple of integers of, say, 16-bit each, with their ranges turned into [0..65535]. Note that this gives twice as much resolution to phi but that’s fine. We have achieved a reasonably fine-grained 16+16=32-bit encoding. Depending on how tolerable precision losses are in your application, you may go even lower (e.g., 12+12=24-bit, or even 8+8=16-bit).

The problem with this simple approach is that spherical coordinates are not an equal-area transform, as they concentrate points near the poles.

Spherical coordinates encoding: phi gets more precision than theta, and density between the equator/poles is highly uneven.

Octahedron encoding

A smarter solution which produces a much more even distribution, while being strikingly simple and more efficient computationally (no trigonometry whatsoever) is octahedron encoding.

Octahedron encoding: The distribution is pretty even.

The idea is to project positions in the sphere (unit vectors) onto a sphere-centered octahedron. And then piece together the 8 faces of the octahedron in the unit cube. This diagram should be self-explanatory.

Reference: Octahedron environment maps

Fibonacci encoding

Recently I have been using the Fibonacci lattice here and there and had the idea to use its equal-area spherical flavor for normal encoding. After googling a bit I came across this paper, so somebody had the idea first:

Unorganized Unit Vectors Sets Quantization

Actually, pulling from this thread I also came across these Shadertoy samples by Íñigo Quilez. They visualize what happens with either method as more bits are budgeted for normals encoding:



Fibonacci encoding: Looks great, but comes at a price.

This looks elegant (pretty even!). Unfortunately the implementation to encode/decode unit-vectors with n bits is rather convoluted and involves N=2^n and 1/N. As soon as n>24 or so numerical degradation makes this solution unusable in single-precision. Also, this method is significantly slower than octahedron encoding, and while the distribution looks prettier, the difference in practice is negligible.

QRNs in the unit square/disk/sphere

Basic concepts in this post, but worth refreshing.

Very often in Computer Graphics (in Monte Carlo integration in particular) you need to sample N points uniformly over the surface of a square, a disk, or a sphere.

Depending on many (many!) factors such as how critical the quality of the random distribution needs to be, performance or code simplicity considerations, availability of a high-quality randomization seed, etc… one may choose one method or another. There are plenty of options, and each comes with their own strengths and weaknesses.

In this post I will talk about:

  • The Equal-Area transform to turn a pair of uniformly distributed random numbers in the unit square into uniformly distributed points over the surface of the the unit disk/sphere.
  • A super-simple Fibonacci-based method to generate N points that uniformly cover the unit square (and hence the unit disk/sphere).

Equal-Area transform

Given u a uniform distribution of 2D points in the surface of the unit square, the transforms to produce uniform distributions of 2D points in the unit disk, or 3D points in the unit sphere are:

There are very clean-cut explanations for these on the always fantastic website.

Disk Point Picking

Sphere Point Picking

These are based on the area (disk) and solid angle (sphere) differential elements.

Fibonacci lattice

The algorithm to generate the first N points in the Fibonacci lattice goes like this:

This produces a stratified distribution of points in the unit square.

The beauty of this method is that it’s a strikingly simple couple of divs and a mod. Unlike most series generators, however, it requires that you know N beforehand. Other usual methods like the ones depicted later in this post are incremental, so each new sample is spawned next to the previous ones. The Fibonacci lattice, on the other hand, “squeezes” previous samples as new ones come in, as illustrated by the videos below.

Fibonacci lattice (square/disk/sphere)

This videos simply plug the Fibonacci lattice formula to the Equal-Area transforms above.

As a bonus, the next videos are generated like the ones above, but using a uniformly distributed QRN generator (the popular and very strong Mersenne Twister) and the van Der Corput stratified series (in particular, the Halton sequence using primes 2 and 3).

Mersenne Twister (Random)

Halton (Stratified)

Viewport baked PBR materials

The technique described here is very basic, but useful nonetheless.

Let’s make the following broad assumptions:

  1. No interactions between surfaces (i.e., no GI, no occlusion, etc…).
  2. The only light source is an IBL (environment) map placed at infinity which orientation is anchored to the camera.
  3. Each surface point P is made of a perfectly rough white material which simply “collects” the light that arrives to P from above the hemiplane defined by (P,N).

Under such assumptions a ball floating in the void looks like this image rendered in Maverick Studio with a procedural sphere primitive and a standard material:

The illumination collected by each point P and radiated back to the camera (+ tonemapping) happens to be, exactly, one particular pixel in the rendered ball as depicted in the hand-drawn diagram above. Note that due to the constraints set, this color depends exclusively on the normal N at P but not on P itself. In other words, the rendered ball effectively precomputes what any surface point P sees through its normal N.

The goal in this post is to benefit from this knowledge in a rasterization viewport. So let’s jump now to the world of pixel shaders. In our vertex shader, let’s transform the object’s normals N to camera space Nc and let’s drop the resulting Z coordinate. (Nc.x,Nc.y) are the geometry normals projected to screen space. Let’s normalize these: Nc'=normalize(Nc.x,Nc.y)

Let’s upload the ball render as a regular 2D texture and let’s define the following UV space:

Nc' is normalized, so the UVs generated will cover the interior of the ball only. Normals at grazing angles will be near the perimeter of the texture’s ball.

In our fragment shader, let’s simply output the color found in the ball image at the UVs computed as described. Here’s the result:

Of course, physically-based materials are more complicated than this. In general, the amount of light received by the surface from the (P,N) hemiplane through each possible incoming direction and then bounced back through each possible outgoing direction depends on both directions, and also on the properties of the material. This is basically the definition of a BRDF, BTW.

But because we set the constraint that the lights are not moving (i.e., the IBL stays fixed with respect to the camera) the incoming and outgoing directions and hence the behavior of the BRDF ends up being a function of N and nothing else. So this method still works no matter what material we render the ball with (well… excluding refractions).

Here goes another example with the same IBL but adding some specularity to the material:

Of course this could be pushed further with some SSAO or any other technique that approximates self-shadowing. But this is a very compact and cheap (memory/code/effort) way to display models in an easy-to-author appealing way.

Some final notes:

  • If you want to be accurate, the ball should be rendered with an orthographic camera (i.e., not with a perspective camera).
  • For blurry/glossy or diffuse reflections, you can get away with a low-res ball render. However, make sure that the UVs that you generate don’t step on the antialias pixels at the boundary of the rendered ball. Otherwise shading will look b0rked at grazing angles.
  • This technique can be extended by using 2 ball images: one with the diffuse component of the material, and another with the specular component. This way the fragment shader could modulate the diffuse part and the specular part by separate colors that could be passed as inputs to the shader. Keep in mind that this would require that the ball images and the colors are passed in linear (instead of sRGB) space and that the resulting color is sent back to sRGB.

Trimmed NURBS

I’m currently swimming in an ocean of NURBS with their trimming loops. This is experimental work for NURBS rendering in Maverick Render.

[EDIT] Throwing some GLSL vert/frag shaders in to the mix for normal/tangency inspection. 2 days in and this is still WIP.

Colorful pet project

Some pet project I am making some progress on. More info soon. Definitely. Maybe.

[EDIT] I had to install a plug-in so Twitter cards display the main post image when cross-posting to Twitter. Apologies for the multiple test posts.

[EDIT] Here’s a hint. It’s a cutsie CSG-based Solid modeling experiment in Unity Editor.

Topology fun

Recently I have been revisiting in my spare time one of my favorite subjects: topology. I will probably be posting some stuff about topological operators, subdivision surfaces, DCEL structures, irregular tilings and such things in the near future.

Below: A little demo I’ve written that supports an extended flavor of Conway’s notation for polyhedra: adotaocD.

[EDIT] I paused this pet project for the time being… because the 2022.2 release of Maverick has virtually blown all my productivity hours away. But I plan to write some entries on topological operators at some point further into this year… Hopefully…

Old MLT experiments

These are some very old MLT tests that I found on my hard drive today.

Metropolis Light Transport (MLT) is a global illumination variant of the Metropolis-Hastings algorithm for Markov-Chain Monte Carlo (MCMC).

In a nutshell, MCMC algorithms explore the neighborhood of a Monte Carlo sample by jittering (mutating) said sample. A metric is then used to define how good a sample is, so the next mutation is accepted or rejected based on how good it is compared to the current sample. This leads to a random-walk in the space of samples which tends to crawl up in the increasing direction of the metric used.

This method biases sampling towards successful samples. So a bit of statistics juggling is required to counter that bias. The link below is a seminal paper that beautifully describes the nuts and bolts of this process. The idea presented there is to jitter samples, not in sample space directly, but in quasi-random number tuples space instead:

Simple and Robust Mutation Strategy for Metropolis Light Transport Algorithm

In the particular case of MLT, samples are random light paths, and the metric used is the amount of light energy transported by a path. This can be implemented on top of regular Path Tracing. Arion was the first (?) commercial render engine that did this on the GPU back in 2012 or so.

MLT has fallen out of favor over the years (path guiding is all the rage these days). But MLT proved to be a competent universal solution to global illumination, being capable of boosting path tracing so it could efficiently solve even the hardest cases, such as refractive caustics.

The beauty of MLT is that it requires 0 configuration from the user and does not need any additional memory for data structures. The main drawbacks are the ugly splotchy and non-uniform look of noise, as compared to regular path tracing, and the inability to guarantee noise stability across animation frames.

Experiments in 2D

Before I implemented MLT in Arion, I did some MCMC simulations with 2D sampling. In the videos below I used the (x,y) coords of pixels as samples, and the grayscale luminance Y of the image as the metric. The goal of the mutations here is to reconstruct the image by sampling harder in the directions of higher luminance.

MCMC is prone to getting stuck for long amounts of time in local maxima. The practical solution proposed in the above paper is to introduce a plarge probability that kicks the sampling scheme and sends the next sample to a purely random position in the sample space. The videos below visualize this very well I think.

Mutating in QRN tuple space without plarge - Watch on Youtube

Mutating in QRN tuple space with plarge - Watch on Youtube

MLT reconstructing a grayscale image - Watch on Youtube

Implementing MLT successfully in our render engine did require some specialization in our sampling routines. I won’t get into the details, but basically, since mutations happen in the random tuples that generate the paths, you expect continuous mutations in the tuples to produce continuous changes in the generated paths. So all samplers involved must avoid discontinuities in the [QRN<->path] bijection.

Back in time

Some MLT-heavy images rendered in Arion back then:

Scrambled Halton - 2015-03-10

[EDIT] This post was migrated from my blog from 2011…

The Halton sequence, which is one of my favourite algorithms ever, can be used for efficient stratified multi-dimensional sampling. Some references:

It is possible to do stratified sampling of hyper-points in the s-dimensional unit hyper-cube by picking one consecutive dimension of the Halton series for each component. A convenient way to do so is to use the first s prime numbers as the basis for each Halton sequence.

It is well-known, however, that while this approach works great for low dimensions, high dimensions often exhibit a great degree of undesired correlation. The following image displays a grid where each cell combines two components of the 32-dimensional Halton hyper-cube.

Raw 32-dimensional Halton hyper-cube

One can easily spot how some pairs exhibit an obvious pattern, while others fill their corresponding 2D area very densely. This happens more aggressively for higher dimensions (i.e., to the right/bottom in the image) and for pairs formed with close components (i.e., near the diagonal in the image). Click to enlarge!

A successful approach to dissolve this problem without losing the good properties of stratification is to do “random digit scrambling” (a.k.a., rds). During the construction of a Halton number, digits in the range [0..base[ are combined. Given a Pseudo-Random Permutation of length=base, all that one must do is use PRP(digit) instead of digit directly. This somewhat shuffles Halton pairs in rows and columns in a strong way so the correlation disappears. However, since the PRP is a bijection, the good properties of stratification are generally preserved.

[EDIT] While this is true, it has been proven that Owen-Scrambling is a superior method.

How to build a strong and efficient randomised PRP of an arbitrary length is a very interesting subject which details I won’t get into here. You can build your own carefully combining a number of reversible hashing operations.

Here’s the scrambling strategy in action:

Scrambled 32-dimensional Halton hyper-cube

Now all the blocks in the grid look equally dense. Magic!

As long as one picks a good seedable PRP, it is possible to generate any number of different samplings, all with the same good properties.

[EDIT] These are the guidelines of the QRN system that Arion used for many years.

Hosek & Wilkie sky model - 2015-03-03

[EDIT] This post was migrated from my blog from 2011…

This is an old comparison between the Preetham and Hosek & Wilkie sky/sun models.

Preetham et al. sky model

Hosek & Wilkie sky model