Readings on vector class optimization

Now that Revision has passed, we feel tempted to grab the ax and happily chop into parts of our code base we wanted to change but couldn’t really since we had other priorities. One tempting part is the linear algebra one: vector, quaternion and matrix data structures. Lets say vector for a start. Not that it’s really necessary, but the transformations are the most time consuming parts after the rendering itself, and the problem itself is somewhat interesting.

After a little googling, I basically found three approaches to this problem:

Every here and there, people seem to think of SSE instructions as a silver bullet and propose various examples of code, snippets or full implementations. The idea being to use dedicated processor instructions to apply operations on four components at a time instead of one after another.

Quite on the opposite, Fabian Giesen argued some years ago that it was not such a good idea. A quick look at the recently publicly released Farbrausch codebase shows they indeed used purely conventional C++ code for it.

At last this quite dated article (with regards to hardware evolution) by Tomas Arce takes a completely orthogonal approach, consisting of using C++ templates to evaluate a full expression component after component, thus avoiding wasting time moving and copying things around.

I am curious to implement and compare them on nowadays hardware.


Update: this is 2016 and the topic was brought back recently when someone wrote the article How to write a math library in 2016.

The point of the article is that the old advice to not bother with SSE and stick with floats doesn’t apply anymore, and it goes on to show results and sample code. This sparked a few discussions on Twitter, with opinions voiced to put it mildly.

It seemed the consensus was still against the use of SSE for the following reasons:

  • Implementation is tedious.
  • For 3 dimensional vector, which is the most common case, there is a 25% waste.
  • For 4 dimensional vectors, like homogeneous coordinates and RGBA, it doesn’t work so well either since the fourth component is treated differently than the other ones.
  • Even if the implementation detail is hidden behind a nice interface, the alignment requirements will leak and become constraints to the rest of the code.
  • Compilers like clang are smart enough to generate SSE code from usual float operations.

Mesh generation in Farbrausch demos

Back in 2007, the German demogroup Farbrausch was releasing its masterpiece, fr-041 Debris. Aside from the artistic qualities of this demo, it is impressive by its size: just 177kB worth of binary, for a seven minutes long full featured animation. The equivalent of a mere ten seconds of mp3 of poor quality. Size coding at work.

Reaching such a result requires dedicated tools, compression of course, but more importantly, content generation. Fabian Giesen offered in a recent blog post to give some details of the techniques used for this demo. After a quick poll, he decided to start with mesh generation, hence the first post of a very promising series: Half-edge based mesh representations: theory.

Representing a mesh with half-edges (picture stolen from the aforementioned article; diagrams that took this much time to create are worth showing ;) )