# Brash & Plucky

## Diaphragm and f-stop - 2014-08-17

[EDIT] This post was migrated from my blog from 2011…

This is just another image taken from the Maverick Unit Testing system. The chart displays different polygonal diaphragms at different f-stop values. Doubling the f-stop number, halves the surface light can pass through.

Polygonal diaphragms and f-stop ## Glare patterns - 2014-08-14

[EDIT] This post was migrated from my blog from 2011…

Glare in photography is due to Fraunhofer diffraction as light from distant objects passes through the camera diaphragm.

There is a magical connection between Fraunhofer diffraction (physics) and the Fourier Transform (math). As a matter of fact, the intensity of the Fraunhofer diffraction pattern of a certain aperture is given by the squared modulus of the Fourier Transform of said aperture.

Assuming a clean and unobstacled camera, the aperture is the diaphragm shape. Here you have the diffraction patterns that correspond to some basic straight-blade (polygonal) diaphragms.

Glare patterns Interestingly, the Fourier Transform produces one infinite decaying streak perpendicular to each polygon edge. When the number of edges is even, the streaks overlap in pairs. That is why an hexagonal diaphragm produces 6 streaks, and an heptagonal diaphragm produces 14.

The leftmost pattern happens to be the Airy disk. The Airy disk is a limit case where the number of polygon edges/streaks is infinite.

The examples above were generated at 256x256. The visual definition of the pattern naturally depends on the resolution of the buffers involved in the computation of the Fourier Transform. However, note that the FT has an infinite range. This means that for ideal polygonal shapes, the streaks are infinitely long.

In the practical case, buffers are far from infinite, and you hit one property of the Fourier Transform that is often nothing but an annoyance: the FT is cyclic. The image below depicts what happens when one pumps up the intensity of one of the glare patterns obtained above: the (infinite) streaks, which warp-around the (finite) FT buffer, become evident.

Cyclic glare pattern Bonus: Here’s some real-life glare I screengrabbed this evening at the European Athletics Championships.

Real-life glare ## My latest running shoes - 2014-08-07

[EDIT] This post was migrated from my blog from 2011…

I have run my latest 7000 km or so on Nike Lunaracer+ (v1 and v3) shoes. This week I started my last pair. I will have to re-stock soon.

My latest 7000 km ## Fast convolutions (III) - 2014-07-19

[EDIT] This post was migrated from my blog from 2011…

Some more remarks about the performance of the convolution methods described so far. I will be leaving the brute-force algorithm out for obvious reasons.

The plot below represents input size (in pixels) in the `x` axis, and convolution time (in seconds) in the `y` axis. So, if you go to `x=4096` that means “a convolution of a 4096×4096 image by a 4096×4096 kernel”.

Even competition between FFT-based vs. separable convolution methods Two conclusions can be made from the above plot, which confirm what was explained in my previous post:

1. For large kernels (as large as the image itself) the separable convolution method is `O(n^3)` and times get to absurd levels very quickly. If you are dealing with large generic images/kernels, the FFT-based method is the way to go.
2. The FFT-based method uses the Fast Fourier Transform, which is `O(n^2·log(n))` thanks to some decomposition technique that requires the size of the input data to be (padded to) a power-of-2. For this reason, it takes the same amount of time to do a convolution on a `257×257` image/kernel than on a `512×512` image/kernel, because both cases operate on `512×512` buffers after all. This is why the graph for the FFT method is stepped. When `x` crosses a power-of-2, the running time goes up all of a sudden and stays stable until the next power-of-2.

The plot was generated with my current implementation of both convolution methods in MK_api. My FFT uses the Cooley-Tukey algorithm, and everything (i.e., FFT, IFFT, point-wise products, and 1D separable convolutions) makes use of multi-threading. There’s always room for improvement, but the running times seem pretty decent, as we’re speaking of `<2s` for images up to `4096×4096` in a 16-thread CPU. An implementation in CUDA would be (orders of magnitude) faster, though. 😛

[EDIT] It’s been 8 years since this post (2014..2022). Now we use the cuFFT implementation in Maverick, which is blazingly fast.

Separable convolution with a fixed 11×11 kernel (in orange) A couple more remarks:

1. The graph of the FFT-based method wouldn’t change if smaller kernels were used, as the algorithm requires the kernel to be padded to the size of the image. However, the graph of the separable method becomes much less steep when very small kernels are used:
2. The running time of the FFT/IFFT is not exactly a trivial subject. Speed in FFT/IFFT algorithms depends not only on the size of the data, but also on the data itself. While generating the above plots, I came across some anomalies in the time measurements. Some kernels or some images produced faster or slower results. Some combinations would even produce non-flat steps in the staircase-looking graph. That’s normal, but worth mentioning.

## Fast convolutions (II) - 2014-07-18

[EDIT] This post was migrated from my blog from 2011…

I will analyze the algorithmic complexity of the convolution algorithms described in my previous posts.

To make things simpler, let’s assume that the dimensions of the image are `>=` the dimensions of the convolution kernel, and that both are square, with dimensions `S·S` and `s·s`, respectively.

### Naive algorithm - O(n^4)

“wxh operations for each of the W·H image pixels”.

i.e., `S·S·s·s` operations. This is quadratic with a heavy constant for tiny kernels, but quickly becomes quartic for medium-to-large kernels.

The auxiliary buffer can be identical in size and bit-depth to the original image. So the memory usage factor is 2x.

### Separable convolution - O(n^3)

“One 1D convolution for each row + One 1D convolution for each column”.

i.e., `2·S·S·s` operations. This is quadratic with a bad constant for small kernels, but becomes cubic for large kernels.

Remarkably, the total running time depends on the dimensions of the image -and- the dimensions of the kernel.

Again, the auxiliary buffer can be identical in size and bit-depth to the original image. So the memory usage factor is 2x.

### Convolution theorem - O(n^2·log(n))

“Two FFTs + One point-wise product + One IFFT”.

Let’s call `S’` to the closest power-of-2 such that `S’>=S`. Then a proper implementation of the FFT/IFFT does (approx.) `2·S’·S’·log(S)` operations, while the point-wise product does `S’·S’` operations. This makes the algorithm `O(S’·S’·log(S’))` with some heavy (quadratic) overhead due to the memory copying, padding, and the point-wise product.

Remarkably, the total running time is independent of the size of the kernel.

This algorithm is quite memory hungry, though, because two `S’·S’` complex-number buffers are required. This means two floating-point numbers per entry (i.e., the real/imaginary coefficients). The algorithm starts by copying the image/kernel to the real part of the corresponding complex-number buffer, leaving the imaginary coefficient and the surface excess filled with zeroes. Then the FFTs/product/IFFT happen in-place.

So the auxiliary memory required is `4·S’·S’` floating-point numbers.

In the worst case where `S` is a power-of-2-plus-1, `S’` gets nearly twice as large as `S`. If the original image is 8-bit and we are using single-precision floating-point math for the FFT/IFFT, this means a memory usage factor of 64x. In the case of an HDR (single-precision floating-point) grayscale image, we are speaking of a worst case scenario of 16x. In average, however, the memory usage factor is around 8x. If `S` is a power-of-2, then the memory usage factor goes down to 4x.

Heavy glare using the FFT-based method in an HDR image by Paul Debevec This image with heavy glare has been output with some ArionFX-related experimental tonemapping code I am working on these days.

### Conclusions:

Assuming that we are only interested in sheer performance:

1. The FFT-based method is (by far) the fastest for large images/kernels. Interestingly, the algorithm is not affected by the size of the kernel, which can be as large as the (padded) image itself without a penalty.
2. The FFT-based method becomes even faster if the same kernel is applied multiple times. The kernel FFT can be calculated just once, and then be re-used.
3. Due to the heavy setup overhead in the FFT-based method, the separable method can be faster for small (separable) kernels where `s` is in the range of `log(S’)`.

Last, but certainly not least, there is a much faster and more light-weight algorithm for the special case of Box/Gaussian Blur. I will talk about this in a separate blog entry.

## Fast convolutions (I) - 2014-07-18

[EDIT] This post was migrated from my blog from 2011…

In my previous post it was stated that the convolution of a `WxH` image with a `wxh` kernel is a new `WxH` image where each pixel is the sum of `wxh` products obtained as the central pixel of the kernel slides across each of the pixels in the original image. This double-double loop leads to an impractical `O(n^4)` algorithm complexity.

Fortunately, we can do better, but the key here is not in optimizing the code, but in making use of some mathematical weaponry. Let’s analyze the options that we have:

1. Naive implementation.
2. Separable convolution kernels.
3. The convolution theorem.
4. Can we do EVEN better?

### Naive implementation

Pros:

• Trivial implementation in just a few lines of code.
• Works for any input size, and for any type of kernel.
• Trivial clamping at the boundaries.
• Allows for multi-threading.

Cons:

• Embarrassingly inefficient: `O(n^4)`.
• Requires an auxiliary `WxH` buffer.
• By default, the output of the operation is returned in the auxiliary buffer (not in-place).
• Not very cache friendly due to the constant jumps across rows. – Applying the same kernel multiple times has the same cost, every time.

When should I use this method?

• Never, unless the input data is tiny and clean code is more important than sheer performance.

### Separable convolution kernels

A separable convolution kernel is one that can be broken into two 1D (vertical and horizontal) projections. For these projections the (matrix) product of the `1xh` vertical kernel by the `wx1` horizontal kernel must restore the original `wxh` 2D kernel.

1D vertical and horizontal Gaussian convolution Conveniently enough, the most usual convolution kernels (e.g., Gaussian blur, box blur, …) happen to be separable.

The convolution of an image by a separable convolution kernel becomes the following:

1. Convolute the rows of the original image with the horizontal kernel projection.
2. Convolute the columns of the resulting image with the vertical kernel projection.

Note: These two steps are commutative.

2-step separable vs. brute-force 2D Gaussian convolution Pros:

• More efficient than the naive implementation: `O(n^3)`.
• Trivial implementation in just a few lines of code.
• Trivial clamping at the boundaries.
• Works for any input size.
• Since this is a two-step process, the convolution can be returned in-place.
• Cache-friendly.
• Allows for multi-threading.

Cons:

• Only works with separable kernels.
• Needs an auxiliary `WxH` buffer.
• Applying the same kernel multiple times has the same cost, every time.

When should I use this method?

• We do not use this method anywhere in Maverick’s API. You will understand why soon.

### The convolution theorem

“The convolution in the spatial domain is equivalent to a point-wise product in the frequency domain, and vice-versa.”

This method relies on the (Fast) Fourier Transform, which is one of the most beautiful mathematical constructs, ever. Seriously!

The convolution of an image by a generic kernel becomes the following:

1. Compute the Fourier Transform of the image.
2. Compute the Fourier Transform of the kernel.
3. Multiply both Fourier Transforms, point-wise.
4. Compute the Inverse Fourier Transform of the result.

Brute-force 2D Gaussian vs. the convolution theorem Magic Pros:

• Even more efficient: `O(n^2·log(n))`.
• Works with any convolution kernel, separable or not.
• Should the kernel be applied multiple times, the FFT of the kernel can be computed just once, and then be re-used.
• The FFT/IFFT and the convolution product are cache-friendly.
• The FFT/IFFT and the convolution product allow for multi-threading.

Cons:

• Definitely not easy to implement, unless you already own an FFT module that suits your needs.
• The FFT operates on buffers with power-of-2 dimensions. This means that the input image (and the kernel) must be padded with zeroes to a larger size (i.e., extra setup time + memory).
• Both the image and the kernel are transformed, which requires two auxiliary buffers instead of one.
• Each FFT produces complex-number values, which doubles the memory usage of each auxiliary buffer.
• The dynamic range of the FFT values is generally wilder than that of the original image. This requires a careful implementation, and the use of floating-point math regardless of the bit-depth and range of the input image.
• This method doesn’t do any clamping at the boundaries of the image, producing a very annoying warp-around effect that may need special handling (e.g., more padding).

When should I use this method?

• Every time that a large generic convolution kernel (i.e., not a simple blur) is involved.
• The most obvious examples I can think of in Maverick’s are Glare & Bloom.

### Can we do EVEN better?

Oh yes. We can do much better, at least in the very particular case of blur. I will talk about this in detail in a dedicated blog entry, at some point.

Some implementation remarks:

All the algorithms presented above allow for multi-threading. Naturally, MT does not change the algorithm complexity, since the maximum number of threads is fixed, but you can get very decent speeds in practical cases if you combine sleek code with proper multi-threading. In the separable case (2), MT must be used twice. First for the rows, and then for the cols. In the FFT case (3), the FFT itself can be multi-threaded (in a rows/cols fashion as well). The huge point-wise product in frequency space can be MT’ed too.

Since convolutions are usually applied on very large images, writing cache-friendly code can make a big difference. Assuming that the memory layout of your image/kernel is per rows, make sure to arrange your loops so memory accesses are as consecutive as possible. This is immediate for the loops that do a 1D convolution on each row. However, for the loops that do a 1D convolution on each column, it may help to use a local cache to transpose a column to a row back and forth.

## Point-Spread Functions & Convolutions - 2014-07-14

[EDIT] This post was migrated from my blog from 2011…

One might explain what a convolution is in many ways. However, in the field of image processing, there is an informal and very intuitive way to understand convolutions through the concept of Point-Spread Functions and their inverses.

A PSF is an arbitrary description of the way in which a point spreads its energy around itself in 2D space.

Classic PSFs: 1D box, 2D box, 1D Gaussian, 2D Gaussian Although this is not a mandatory requirement, the integral of a PSF usually equals 1, so no energy is gained or lost in the process. The above image does not match this requirement for the sake of visualization; the PSFs on the right column have been un-normalized for better visibility. On the other hand, the range of a PSF (how far away from the source point energy is allowed to reach) can be infinite. However, in most practical uses the range is finite, and usually as short as possible.

So, a `PSF(x,y)` is a function `f:R^2->R` or, in the case of images, a finite/discrete real-value 2D matrix. For example, `PSF(x,y)=0.2` means that the point `P=(a,b)` sends 20% of its energy to point `Q=(a+x,b+y)`.

If we apply the above PSFs to all the pixels in an image, this is what we get:

Classic PSFs applied to an image WARNING: Do not confuse this with a convolution; we’re not there yet.

The inverse of a PSF (let’s use the term IPSF for now) is a description of what amount of energy a point receives from the points around itself in 2D space.

So, an `IPSF(x,y)` is also a function `f:R^2->R` or, in the case of images, a finite/discrete real-value 2D matrix. For example, `IPSF(x,y)=0.2` means that the point `Q=(a,b)` receives 20% of the energy from point `P=(a+x,b+y)`.

From here follows that a PSF and the corresponding IPSF are radially symmetric:

`IPSF(-x,-y) = PSF(x,y)`

If `P=(a,b)` spreads energy to `Q=(a+x,b+y)`, then `Q=(a’,b’)` gathers energy from `P=(a’-x,b’-y)`.

Finally: a convolution is the result of applying the same IPSF to all the pixels of an image. Note that IPSF matrices are more commonly known as convolution kernels, or filter kernels.

Conveniently enough, the PSFs displayed above are all radially symmetric with respect to themselves. As a matter of fact, it is true to most popular convolution kernels (e.g., 1D/2D box blur, 1D/2D Gaussian blur, …) that the PSF and the IPSF are identical. This makes the process of spreading/gathering energy equivalent in the cases presented above, but this is not true to other (more exotic) kernels.

In the case of image convolutions, kernels are usually square matrices of dimensions `DxD`, where `D=(R+1+R)` and `R` is generally known as the radius of the kernel. This way, kernels have a central pixel. For instance, a kernel of `R=3` (where each pixel is affected by neighbors never farther than 3px away) would be a `7x7` matrix.

The convolution is a fundamental operation in Digital Image Processing, and most image filters (e.g., Gaussian Blur in Photoshop) are based on convolutions in one way or another.

Naive algorithm: A convolution is an operation that takes two discrete real-value matrices (i.e., a luminance image and a convolution kernel) and makes the center of the kernel slide along each pixel in the image. At each pixel, the kernel is multiplied point-wise with all the pixels it covers, and the sum of these products is used to replace the original pixel value. Since this operation modifies pixels on the go, an auxiliary buffer is necessary.

Let’s assume that the resolution of the image is `WxH` pixels, and the convolution kernel is a matrix of dimensions `wxh`. The convolution needs to run through `WxH` pixels and at each pixel, it must perform and add `wxh` products. This is as slow as `O(n^4)` = Terrible.

As a matter of fact, the convolution of even a small kernel with a large image can take an eternity (literally) to compute using this naive solution. Fortunately, there is some mathematical trickery that we can take advantage of. More on fast convolutions in a future post.

Bonus remark: A particularly nice example of PSF is glare, which comes from the Fraunhoffer diffraction of the camera/eye aperture. Below you can see what happens when a glare PSF is applied to an HDR image. The actual implementation convolutes the IPSF (the radial-symmetric of the glare PSF) with the source HDR image.

Glare applied to an Arion render Typical glare PSF for a 6-blade iris ## Unbiased spanning-tree generation

Recently I’ve done some micro-tests with spanning-trees in my spare time inspired by some talks I had with my good friend ditiem.

The video below is my implementation of Wilson’s algorithm.

Spanning-trees can be visualized as classic mazes, which are always pretty to look at (bonus: rainbow floodfill). But let’s forget about the maze itself. To me the majestuous beauty of this algorithm lies in the fact that it is fully unbiased:

• If the random numbers involved are uniformly distributed, it generates an also uniformly distributed random sample in the space of all possible 2D-filling spanning-trees.
• i.e., if the algorithm is called an infinite amount of times, all possible such mazes will be generated with equal likelihood.

Unbiasedness for the win.

Some interesting remarks could be made about how to optimize the generation time stochastically. Let’s save that discussion for another time…

## Circular & radial blur - 2014-07-14

[EDIT] This post was migrated from my blog from 2011…

Circular and radial blur can be implemented in different ways. The method I am going to describe here is reasonably efficient, provided that there is a hyper-fast 1D box-based blur routine at one’s disposal (more on that in a future post). The algorithm is quite straightforward to implement, and also has the beautiful property of being able to do both circular and radial blur at the same time.

I will work on grayscale images here, although as usual the process can by extended to color images by performing the same operations on the three R, G, and B channels.

The key to circular/radial blur is to not work on the image space directly, but on a dual space, where cartesian co-ordinates are transformed to polar co-ordinates around the central pixel. Like this:

Cartesian-to-polar transform Each column in the transformed image is one of the ‘spokes’ that go from the center to one of the pixels at the perimeter in the original image. The length of the largest spoke is half a diagonal, while the perimeter of a WxH image has `2·(W+H-2)` pixels. So the transformed image is a buffer of dimensions `ceil(sqrt(W^2+H^2)/2)` and `2·(W+H-2)`.

We also need an inverse transform that restores the original image from its polar form.

Note that, for better results, the transform and also its inverse must do proper filtering. Since the spokes are diagonals that do not follow the arrangement of the pixels in the original image, the process of transforming and un-transforming is not exactly reciprocal. i.e., un-transforming the transformed image does not restore the original image identically. In simpler words: this process adds some little blur due to filtering. However, this is ok, because we’re aiming at circular/radial blur after all.

Below are the schematics of the type of filtering I am using in Maverick’s API. When I sample a pixel centered at `(xm,ym)` along a spoke, I integrate the original image, constrained to a `1x1` square area. This integration simply takes the (up to) 4 overlapped pixels, and weighs each of them by the corresponding surface overlapping factor:

Sample 1x1 px area Note also that the inverse transform must undo the filtering, by splatting contributions to the final image using the same surface overlapping factors.

…and here comes the interesting part.

• If we 1D-blur the rows in the polar-space image, and then apply the inverse transform, we get a circular blur of the original image.
• If we 1D-blur the columns in the polar-space image, and then apply the inverse transform, we get a radial blur of the original image.

A fast box-based 1D blur implementation can run in `O(n)`, regardless of the radius of the blur. Let’s assume a square image of side `S`. The size of the transformed image is `2·(S+S-2)·sqrt(2)·S/2`, which means a quadratic complexity, or linear with respect to the number of pixels. The algorithm is made of fairly simple arithmetic operations, and allows for multi-threading.

Here you can take a look at a couple of filtering examples taken from Maverick’s Unit Testing System:

Circular blur  Some bonus remarks:

• If the amount of blur becomes progressively small as you approach the center, radial blur becomes lens blur.
• If the amount of blur used in radial blur is different for each color component, you get chromatic aberration.
• If you work with spectral colors, instead of RGB, chromatic aberration looks great even when the blur stretches colors along long streaks.

Lens blur & heavy chromatic aberration Some more final remarks:

• In general, blur operations are clamped to the boundaries of the buffer they operate on. However, in the case of circular blur, one must warp-around from the first to the last spoke.
• It is not really necessary to create the transformed polar image, which is (much) larger than the original. One can feed the 1D blur with transformed pixels directly, and save some memory. Doing so doesn’t cause a performance penalty, because the algorithm runs through each spoke only once.

## Random-walk SSS

Arion, predecessor of Maverick Render, was already doing “random-walk” volume and SSS rendering long before the term random-walk even became trendy.

This is a graph plotted by some simulation code I prototyped way back during the R&D phase (circa 2014). Each wavelength (simplified to RGB in the plot) traverses the media with different stochastic statistics. ## Legendre polynomials

Legendre polynomials. Because… Why not? ## Anatomy of a PE - 2012-10-30

[EDIT] This post was migrated from my blog from 2011…

### Preamble

Executables (EXE/DLL) in Microsoft Windows make use of a format called PE (Portable Executable).

A PE image typically lies in the hard drive as a .exe or .dll file, and is loaded by the Windows Loader into the RAM of the system when a process is created from that file (e.g., by double-clicking the .exe). So, basically, we have two different states:

• Physical PE image file.
• Loaded PE image (running process).

### Layout

The layout of a PE image contains the following elements:

• A header, glued at the first byte of the file.
• A list of consecutive sections.
• An optional chunk of trailing (unused) bytes.

These elements are concatenated, and there may or may not be padding space between them. This padding space, if present, is usually filled with `0x00` bytes.

It is possible to add trailing bytes to a PE image. Actually, those files will simply lie there doing nothing. However, some types of processes such as package installers can use the trailing area to add some payload that the process will use (e.g., decrypt, uncompress, …) at some point.

### The PE Header

The PE header contains a wealth of information, structured in a fixed way.

Among the many details about the PE that you can gather from the header, there’s the list of sections. The list of sections contains the name, position, and size of each section in the PE, both in physical file format, and when loaded in memory as a process.

Many other things are described by the header, such as the address of the Entry Point (the instruction where code execution must start), and the address of some data directories. It is in these data directories where you can find the list of DLLs the PE depends on, the list of functions exported by the PE, etc…

The PE Header is usually `0x0400` (1024) bytes long. The data structures contained in it are defined in `winnt.h`, and they are always the same (so you can assume offsets and such). One must pay attention, though, to the fact that depending on whether the process is 32-bit or 64-bit, some pieces of the header will be PE32 or PE64. One can tell whether the process is 32-bit or 64-bit by checking the header flags (which are found at a location which is common to PE32 and PE64 headers).

### RVAs and VAs

In general, the addresses found in the PE header are given as RVAs (Relative Virtual Addresses). An RVA is an offset relative to the first byte of the PE image. Assuming that you know the location in memory of the first byte of the PE image (the image base pointer), then the relationship between a VA (Virtual Address) and its corresponding RVA is given by:

`VA = ( base + RVA )`

`RVA = ( VA - base )`

If the PE is a physical file, then the base pointer is simply the start of the file. However, if the PE has been loaded as a process, then the base address can be found in several ways that should (in theory) match:

• The value returned by `GetModuleHandle(0)` is, in fact, the base pointer.
• The `HINSTANCE` received by `WinMain` is, in fact, the base pointer.
• The Windows Loader stores the base pointer at the PE header on load, in the `BaseOfImage` field.

The PE header is glued at this location in memory. So, in run-time, one can use this knowledge to do PE-related operations such as leap-frogging through the PE sections. A typical anti-cracking use is to run a CRC32 on the code section in order to display a badboy message if the executable code has been patched, infected, or tampered with.

### Physical vs. loaded state

The anatomy of the PE image is different, yet quite similar, in both (physical vs. loaded) PE states.

When a PE is loaded as a process, the PE image file gets chopped in sections, and these sections get relocated (copied) in memory:

• In both states, the PE image forms a block of contiguous bytes.
• In both states, the header and the sections are found in the same order.
• The amount of padding between sections usually differs.

In their physical file form, PE sections do not need to have any particular padding, so EXE/DLL files can be as small as possible. On the other hand, when loaded, each PE section occupies a certain number of whole memory pages with certain permissions (execution, read-only, etc…). The amount of padding for the start address and the size of each section is given by the PE header.

It is important to note that this relocation procedure simply adds some padding between sections. The section chunks themselves remain identical, and are a straight copy of the original bytes found in the physical file. The only exceptions are some areas which get initialised by the Windows Loader on load (e.g., some fields in the PE header, the IAT, …). I will talk about these in a future post.

[EDIT] I have made significant progress in this area in the 10+ years since I wrote this. But Part II of this post never saw the light of day. Spare time has always been scarce.

## Code fortification - 2012-10-17

[EDIT] This post was migrated from my blog from 2011…

Lately, I’ve been reading a lot about code fortification, and low-level debugging. Here are some of the most interesting links I’ve found. They’re all worth reading if you fancy hardcore OS internals and such:

An In-Depth Look into the Win32 Portable Executable File Format

Peering Inside the PE: A Tour of the Win32 Portable Executable File Format

The Portable Executable Format on Windows

Writing a basic Windows debugger (Part I)

Writing a basic Windows debugger (Part II)

An Anti-Reverse Engineering Guide

Introduction Into Windows Anti-Debugging

Understanding the Import Address Table

Anti-unpacker tricks

This all brings me back to my teenager years coding in x86 assembly language.

## Index-Of-Refraction - 2011-10-15

[EDIT] This post was migrated from my blog from 2011…

This all below is stuff that kids study in school. :)

Below is a visualization of the behavior of a ray of light as it hits a dielectric interface.

Some key phenomena which show up in the video are:

• The Fresnel term (reflection vs. refraction).
• The Index of Refraction.
• The critical angle.
• Total Internal Reflection (TIR).
• As nd increases the Index of Refraction becomes higher, and so does the Fresnel term, which defines the proportion between reflected and refracted light. The critical angle becomes higher too, so there is more Total Internal Reflection.

When a ray of light hits an interface (assuming an ideal surface), all incident light must be either reflected or refracted. The Fresnel term (controlled by nd) tells how much light is reflected at a given incident angle. All the light that is not reflected is refracted, so both amounts (reflection and refraction) always add up to the total amount of incident light.

The Fresnel term approaches 1 at grazing angles (all light is reflected and nothing is refracted, regardless of nd) and is low (the lower the smaller the nd) at perpendicular angles (more light is refracted).

As a rule of thumb:

• The lower the nd, the lower the IOR, and the more transparent the surface (more glass/liquid-like).
• The higher the nd, the higher the IOR, and the more reflective the surface (more metallic/mirror-like).

For example:

• Void: nd=1.
• Air: nd=1.1.
• Water: nd=1.33.
• Glass: nd=1.51.
• Diamond: nd=2.5.
• Metals: nd=20+. (Simplification that ignores Complex IOR).
• Ideal mirror: nd=infinity.

When a ray of light enters a medium with an nd lower than the nd of the previous medium, there is an angle at which the Fresnel term becomes 1 and beyond which light won’t refract anymore. This angle is called critical angle, and beyond it, the surface behaves like a perfect mirror, reflecting back all incident light. This effect is called Total Internal Reflection (TIR).

## The cosine lobe - 2011-10-12

[EDIT] This post was migrated from my blog from 2011…