GDC 2015 presentations

The Game Developers Conference took place last week in San Francisco. As I am starting to see more speakers publish their slides, I am creating this post to keep track of some of them (this list is not meant to be exhaustive).

For a more extensive list, Cédric Guillemet has been gathering links to GDC 2015 papers on his blog.

Gamma-correct and HDR rendering in a 32-bit buffer

Recently I have been looking at the available options for doing gamma-correct and/or HDR rendering in a 32-bit buffer. Gamma correct means you need higher precision for low values (this article by Benjamin Supnik demonstrates why). HDR means you may have values greater than 1, and since your range is getting wider, you want higher precision everywhere. The usual recommendation is to use 16-bit floats, like RGBA16F, or even higher. But suppose you don’t want your buffer to get above 32 bits per pixel: what tools are available?

Note: the article has been reworked as I gathered more information. I thought reorganizing it was better than merely adding an update notice at the end.


My first thought was to use standard RGBA8, store the maximum of the RGB channels in the alpha channel, and store the RGB vector divided by that maximum. One back-of-the-envelope test later, I dropped the idea, convinced it wouldn’t go very far: since stored values are limited to the [0, 1] range, you have to decide which maximum value an alpha of 1 stands for. More importantly, interpolation would give incorrect results.

Or so I thought. It turns out this is known as RGBM (M for shared multiplier), and while interpolation does indeed give incorrect results, this article argues they are barely noticeable and that the other advantages outweigh them (see the RGBD part below for another article worth reading).
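To make the idea concrete, here is a minimal sketch of RGBM packing, written in Python rather than shader code for readability. The names `MAX_RANGE`, `quantize8`, `encode_rgbm` and `decode_rgbm` are made up for this illustration, and the range of 6.0 is an arbitrary assumption, not a value taken from the linked articles.

```python
import math

MAX_RANGE = 6.0  # assumption: the HDR value an RGB of 1 and a multiplier of 1 stand for


def quantize8(x):
    """Clamp to [0, 1] and round to the nearest 8-bit level, as an RGBA8 buffer would."""
    return round(min(max(x, 0.0), 1.0) * 255.0) / 255.0


def encode_rgbm(rgb):
    """Pack an HDR color into four [0, 1] values: RGB plus a shared multiplier."""
    r, g, b = (c / MAX_RANGE for c in rgb)
    m = min(max(r, g, b, 1e-6), 1.0)
    # Round the multiplier up to an 8-bit level so that c / m never exceeds 1.
    m = math.ceil(m * 255.0) / 255.0
    return tuple(quantize8(c / m) for c in (r, g, b)) + (m,)


def decode_rgbm(rgbm):
    """Recover the HDR color: stored RGB times the multiplier times the range."""
    r, g, b, m = rgbm
    return tuple(c * m * MAX_RANGE for c in (r, g, b))


original = (3.0, 1.5, 0.25)
restored = decode_rgbm(encode_rgbm(original))
```

Values above `MAX_RANGE` are simply clamped in this sketch, which is exactly the drawback mentioned above: the maximum has to be chosen up front.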

There are also variations of this approach, as shown on this online Unity demo. Here is the code.


Searching the web, the first solution I found consists of storing the inverse of the scale in the alpha channel. Known as RGBD (D for shared divider), it doesn’t suffer from having to define a maximum value, and plotting the function seems to show an acceptable precision across the range. Unfortunately it doesn’t interpolate correctly either.

This article gives a good comparison of RGBM and RGBD, and addresses the question of interpolation. Interestingly, it notes that while neither interpolates correctly, whether the result is acceptable depends on the distribution of the colors.
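As a rough sketch of what RGBD packing could look like (again in Python for readability, with hypothetical helper names `quantize8`, `encode_rgbd` and `decode_rgbd`; this is not the code from the linked article):

```python
import math


def quantize8(x):
    """Clamp to [0, 1] and round to the nearest 8-bit level, as an RGBA8 buffer would."""
    return round(min(max(x, 0.0), 1.0) * 255.0) / 255.0


def encode_rgbd(rgb):
    """Pack an HDR color into four [0, 1] values: RGB plus a shared divider."""
    m = max(max(rgb), 1.0)  # colors that already fit in [0, 1] use a divider of 1
    # Quantize 1 / m to an 8-bit level, rounding down so that c * d never exceeds 1.
    d = max(math.floor(255.0 / m), 1.0) / 255.0
    return tuple(quantize8(c * d) for c in rgb) + (d,)


def decode_rgbd(rgbd):
    """Recover the HDR color: stored RGB divided by the shared divider."""
    r, g, b, d = rgbd
    return tuple(c / d for c in (r, g, b))


original = (3.0, 1.5, 0.25)
restored = decode_rgbd(encode_rgbd(original))
```

With an 8-bit divider, the smallest non-zero value is 1/255, so this sketch tops out at 255 without any range constant to pick. The price is that the step between representable values grows quickly as colors get brighter.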
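The interpolation problem is easy to reproduce on paper. The sketch below (hypothetical helper names, no 8-bit quantization, no clamping, and an assumed RGBM range of 6.0) blends a bright red and a dark grey halfway, once on the real colors and once on the encoded values, the way a texture filter would.

```python
MAX_RANGE = 6.0  # assumed RGBM range, arbitrary for this example


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def encode_rgbm(rgb):
    m = max(max(rgb) / MAX_RANGE, 1e-6)
    return tuple(c / (m * MAX_RANGE) for c in rgb) + (m,)


def decode_rgbm(e):
    return tuple(c * e[3] * MAX_RANGE for c in e[:3])


def encode_rgbd(rgb):
    d = 1.0 / max(max(rgb), 1.0)
    return tuple(c * d for c in rgb) + (d,)


def decode_rgbd(e):
    return tuple(c / e[3] for c in e[:3])


bright_red = (4.0, 0.0, 0.0)
dark_grey = (0.1, 0.1, 0.1)

# What a filter should return halfway between the two texels: about (2.05, 0.05, 0.05).
expected = lerp(bright_red, dark_grey, 0.5)

# What it returns when it blends the encoded values instead.
via_rgbm = decode_rgbm(lerp(encode_rgbm(bright_red), encode_rgbm(dark_grey), 0.5))
via_rgbd = decode_rgbd(lerp(encode_rgbd(bright_red), encode_rgbd(dark_grey), 0.5))
# via_rgbm is about (2.05, 1.025, 1.025): red is right, green and blue are far too bright.
# via_rgbd is about (0.88, 0.08, 0.08): the hue is closer, but the result is too dark.
```

This pair of colors is a deliberately harsh case. Between two texels of similar hue and intensity the error is much smaller, which is consistent with the remark that acceptability depends on the distribution of the colors.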


Then there is RGBE (E for shared exponent): RGB and an exponent. Here is a shader implementation using an RGBA8 buffer. But then again, because the exponent is stored in the alpha channel, interpolation is going to be an issue.
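For reference, a shared-exponent encoding can be sketched as follows. This follows the classic approach of the Radiance RGBE format as I understand it, not the linked shader; the function names are hypothetical, the four returned integers stand for the 8-bit channel values, and negative inputs are not handled.

```python
import math


def encode_rgbe(rgb):
    """Pack an HDR color into four 8-bit integers: three mantissas and a biased exponent."""
    m = max(rgb)
    if m < 1e-32:
        return (0, 0, 0, 0)
    # m == mantissa * 2**exponent, with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(m)
    scale = mantissa * 256.0 / m  # equals 256 / 2**exponent
    return tuple(int(c * scale) for c in rgb) + (exponent + 128,)


def decode_rgbe(rgbe):
    """Recover the HDR color from the three mantissas and the shared exponent."""
    r, g, b, e = rgbe
    if e == 0:
        return (0.0, 0.0, 0.0)
    factor = math.ldexp(1.0, e - (128 + 8))  # 2**(exponent - 8)
    return tuple(c * factor for c in (r, g, b))


packed = encode_rgbe((3.0, 1.5, 0.25))
restored = decode_rgbe(packed)
```

Blending two such texels blends their exponents linearly as if they were ordinary channel values, which is why filtering breaks down.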


Searching further, I stumbled upon the OpenGL EXT_texture_shared_exponent extension, which defines a GL_RGB9_E5 texture format with three 9-bit components for the color channels, and an additional 5-bit exponent shared by the channels. This sounded nice: 9 bits of precision already means twice as many shades as 8 bits, and the exponent gives precision everywhere, as long as the channel values have the same order of magnitude. Because it is a standard format, I assume interpolation is going to be a non-issue. Unfortunately, as can be read on the OpenGL wiki, while this is a required texture format, it is not required for renderbuffers. In other words: chances are it’s not going to be implemented.
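To get a feel for what the format stores, here is a sketch of the conversion, based on my reading of the encoding described in the EXT_texture_shared_exponent specification. The function names are made up, and the result is returned as a tuple of three 9-bit mantissas and a 5-bit exponent rather than packed into a 32-bit word.

```python
import math

N = 9        # mantissa bits per channel
B = 15       # exponent bias
E_MAX = 31   # largest biased exponent
MAX_VALUE = (2**N - 1) / 2**N * 2.0 ** (E_MAX - B)  # largest representable value, 65408.0


def encode_rgb9e5(rgb):
    """Convert a color to three 9-bit mantissas and one shared 5-bit exponent."""
    clamped = [min(max(c, 0.0), MAX_VALUE) for c in rgb]
    max_c = max(clamped)
    # floor(log2(max_c)), computed with frexp to avoid rounding issues in log2.
    floor_log2 = math.frexp(max_c)[1] - 1 if max_c > 0.0 else -B - 1
    exp_shared = max(-B - 1, floor_log2) + 1 + B
    scale = 2.0 ** (exp_shared - B - N)
    # If rounding pushes the largest mantissa to 2**N, bump the exponent instead.
    if math.floor(max_c / scale + 0.5) == 2**N:
        exp_shared += 1
        scale *= 2.0
    return tuple(int(math.floor(c / scale + 0.5)) for c in clamped) + (exp_shared,)


def decode_rgb9e5(packed):
    """Recover the color: each mantissa times 2**(exponent - bias - mantissa bits)."""
    r, g, b, e = packed
    scale = 2.0 ** (e - B - N)
    return (r * scale, g * scale, b * scale)


packed = encode_rgb9e5((3.0, 1.5, 0.25))
restored = decode_rgb9e5(packed)
```

Since the exponent is driven by the largest channel, a channel much smaller than the others loses most of its 9 bits, hence the caveat about channels needing the same order of magnitude.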


Since we really want a wide range of light intensity, a different approach is to use a different color space. Several people mentioned LogLUV, which I hear gives good results, at the expense of a high instruction cost for both packing and unpacking. Here is a detailed explanation.


There is still the R11F_G11F_B10F format (DXGI_FORMAT_R11G11B10_FLOAT in DirectX) where the R and G channels have a 6-bit mantissa and a 5-bit exponent, and B has a 5-bit mantissa and a 5-bit exponent. Since floats have higher precision with low values, this seems very well suited to gamma-correct rendering. And since this is a standard format, interpolation should be a non-issue.
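A quick way to see where that precision goes is to compute the spacing between neighbouring representable values. The sketch below assumes the usual layout for these small floats (no sign bit, exponent bias of 15); `float_step` is a hypothetical helper, and infinities and NaN are ignored.

```python
import math


def float_step(value, mantissa_bits, exponent_bias=15):
    """Spacing between representable values around a positive `value`
    for a small unsigned float with the given number of mantissa bits."""
    exponent = math.frexp(value)[1] - 1          # floor(log2(value))
    exponent = max(exponent, 1 - exponent_bias)  # denormals share the smallest exponent
    return 2.0 ** (exponent - mantissa_bits)


unorm8_step = 1.0 / 255.0             # constant spacing of an 8-bit channel, about 0.0039

dark_step = float_step(0.01, 6)       # R or G channel near 0.01: 2**-13, about 0.00012
bright_step = float_step(0.75, 6)     # R or G channel near 0.75: 2**-7, about 0.0078
blue_step = float_step(0.75, 5)       # B channel near 0.75: 2**-6, about 0.0156
```

So dark values get far finer steps than an 8-bit channel, while values close to 1 are actually coarser, especially in the blue channel.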


I haven’t tested this in practice yet, but from these readings it seems to me the sensible solution would be to use an R11G11B10 float format when available. Otherwise (for example on mobile platforms) choose between RGBM and RGBD depending on the kind of image being rendered. Unless the format is standard, it seems interpolation is always going to be an issue, and the best you can do is mitigate it by choosing the solution that suits your use case.

Did I miss something?

Readings on vector class optimization

Now that Revision has passed, we feel tempted to grab the ax and happily chop into parts of our code base we wanted to change but couldn’t really, since we had other priorities. One tempting part is the linear algebra one: vector, quaternion and matrix data structures. Let’s say vectors for a start. Not that it’s really necessary, but the transformations are the most time-consuming part after the rendering itself, and the problem itself is somewhat interesting.

After a little googling, I basically found three approaches to this problem:

Here and there, people seem to think of SSE instructions as a silver bullet and propose various code examples, snippets or full implementations. The idea is to use dedicated processor instructions to operate on four components at a time instead of one after another.

Quite the opposite, Fabian Giesen argued some years ago that it was not such a good idea. A quick look at the recently publicly released Farbrausch codebase shows they indeed used purely conventional C++ code for it.

Lastly, this quite dated article (with regard to hardware evolution) by Tomas Arce takes a completely orthogonal approach, which consists of using C++ templates to evaluate a full expression component after component, thus avoiding wasting time moving and copying things around.

I am curious to implement and compare them on today’s hardware.

Update: this is 2016 and the topic was brought back recently when someone wrote the article How to write a math library in 2016.

The point of the article is that the old advice to not bother with SSE and stick with plain floats doesn’t apply anymore, and it goes on to show results and sample code. This sparked a few discussions on Twitter, with strong opinions voiced, to put it mildly.

It seemed the consensus was still against the use of SSE for the following reasons:

  • Implementation is tedious.
  • For 3-dimensional vectors, which are the most common case, one of the four lanes goes unused: a 25% waste.
  • For 4-dimensional vectors, like homogeneous coordinates and RGBA, it doesn’t work so well either, since the fourth component is treated differently from the others.
  • Even if the implementation detail is hidden behind a nice interface, the alignment requirements will leak and become constraints to the rest of the code.
  • Compilers like clang are smart enough to generate SSE code from usual float operations.