{"id":221,"date":"2012-04-24T07:44:22","date_gmt":"2012-04-24T07:44:22","guid":{"rendered":"http:\/\/lousodrome.net\/blog\/light\/?p=221"},"modified":"2016-04-26T11:43:04","modified_gmt":"2016-04-26T11:43:04","slug":"readings-on-vector-class-optimization","status":"publish","type":"post","link":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/","title":{"rendered":"Readings on vector class optimization"},"content":{"rendered":"<p>Now that Revision has passed, we feel tempted to grab the ax and happily chop into parts of our code base we wanted to change but couldn&#8217;t really since we had other priorities. One tempting part is the linear algebra one: vector, quaternion and matrix data structures. Let&#8217;s start with vectors. Not that it&#8217;s really necessary, but the transformations are the most time-consuming parts after the rendering itself, and the problem itself is somewhat interesting.<\/p>\n<p>After a little googling, I basically found three approaches to this problem:<\/p>\n<p>Here and there, people seem to think of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Streaming_SIMD_Extensions\">SSE instructions<\/a> as a silver bullet and propose <a href=\"http:\/\/fhtr.blogspot.jp\/2010\/02\/4x4-float-matrix-multiplication-using.html\">various<\/a> examples of <a href=\"http:\/\/fastcpp.blogspot.jp\/2011\/12\/simple-vector3-class-with-sse-support.html\">code<\/a>, <a href=\"http:\/\/fastcpp.blogspot.jp\/2011\/04\/vector-cross-product-using-sse-code.html\">snippets<\/a> or <a href=\"http:\/\/sourceforge.net\/projects\/v3d\/\">full implementations<\/a>. 
The idea is to use dedicated processor instructions to apply operations on four components at a time instead of one after another.<\/p>\n<p>On the contrary, <a href=\"http:\/\/twitter.com\/rygorous\">Fabian Giesen<\/a> argued some years ago that <a href=\"http:\/\/www.farbrausch.de\/~fg\/articles\/ubiquitous_sse_vector.html\">it was not such a good idea<\/a>. A quick look at the <a href=\"https:\/\/github.com\/farbrausch\/fr_public\">recently publicly released Farbrausch codebase<\/a> shows <a href=\"https:\/\/github.com\/farbrausch\/fr_public\/blob\/master\/werkkzeug3\/_types.cpp#L2288\">they indeed used purely conventional C++ code for it<\/a>.<\/p>\n<p>Lastly, <a href=\"http:\/\/www.flipcode.com\/archives\/Faster_Vector_Math_Using_Templates.shtml\">this quite dated article<\/a> (with regard to hardware evolution) by Tomas Arce takes a completely orthogonal approach, consisting of using C++ templates to evaluate a full expression component after component, thus avoiding wasting time moving and copying things around.<\/p>\n<p>I am curious to implement and compare them on today&#8217;s hardware.<\/p>\n<hr \/>\n<p><strong>Update<\/strong>: this is 2016 and the topic was brought back recently when someone wrote the article <a href=\"http:\/\/www.codersnotes.com\/notes\/maths-lib-2016\/\">How to write a math library in 2016<\/a>.<\/p>\n<p>The point of the article is that the old advice to not bother with SSE and stick with floats doesn&#8217;t apply anymore, and it goes on to show results and sample code. This sparked a few discussions on Twitter, with opinions voiced strongly, to put it mildly.<\/p>\n<blockquote class=\"twitter-tweet\" data-lang=\"en\">\n<p dir=\"ltr\" lang=\"en\"><a href=\"https:\/\/twitter.com\/nothings\">@nothings<\/a> No, that&#8217;s bullshit. Let the compiler do it, and if it can&#8217;t, don&#8217;t worry. At all. 
Cost &gt;&gt; benefit.<\/p>\n<p>\u2014 Tom Forsyth (@tom_forsyth) <a href=\"https:\/\/twitter.com\/tom_forsyth\/status\/709963042826665984\">March 16, 2016<\/a><\/p><\/blockquote>\n<p><script src=\"\/\/platform.twitter.com\/widgets.js\" async=\"\" charset=\"utf-8\"><\/script><\/p>\n<p>It seemed the consensus was still against the use of SSE for the following reasons:<\/p>\n<ul>\n<li>Implementation is tedious.<\/li>\n<li>For 3-dimensional vectors, which are the most common case, one of the four lanes goes unused: a 25% waste.<\/li>\n<li>For 4-dimensional vectors, like homogeneous coordinates and RGBA, it doesn&#8217;t work so well either, since the fourth component is treated differently from the other ones.<\/li>\n<li>Even if the implementation detail is hidden behind a nice interface, the alignment requirements will leak and become constraints on the rest of the code.<\/li>\n<li>Compilers like clang are smart enough to generate SSE code from ordinary float operations.<\/li>\n<\/ul>\n<blockquote class=\"twitter-tweet\" data-conversation=\"none\" data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\"><a href=\"https:\/\/twitter.com\/kenpex\">@kenpex<\/a> <a href=\"https:\/\/twitter.com\/Zavie\">@Zavie<\/a> &quot;against&quot; list: It&#39;s SSE only. NEON, and other SIMDs might have deeper pipeline, it makes no sense with wider SIMD like AVX.<\/p>\n<p>&mdash; Branimir Karad\u017ei\u0107 (@bkaradzic) <a href=\"https:\/\/twitter.com\/bkaradzic\/status\/718854695251357697\">April 9, 2016<\/a><\/p><\/blockquote>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now that Revision has passed, we feel tempted to grab the ax and happily chop into parts of our code base we wanted to change but couldn&#8217;t really since we had other priorities. 
One tempting part is the linear algebra &hellip; <a href=\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[648,647,107,818,819,817,135,371,646,134,227,385,131,132,133],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.13 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Readings on vector class optimization &ndash; Light is beautiful<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Readings on vector class optimization &ndash; Light is beautiful\" \/>\n<meta property=\"og:description\" content=\"Now that Revision has passed, we feel tempted to grab the ax and happily chop into parts of our code base we wanted to change but couldn&#8217;t really since we had other priorities. 
One tempting part is the linear algebra &hellip; Continue reading &rarr;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\" \/>\n<meta property=\"og:site_name\" content=\"Light is beautiful\" \/>\n<meta property=\"article:published_time\" content=\"2012-04-24T07:44:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-04-26T11:43:04+00:00\" \/>\n<meta name=\"author\" content=\"Julien Guertault\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Zavie\" \/>\n<meta name=\"twitter:site\" content=\"@Zavie\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Julien Guertault\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\"},\"author\":{\"name\":\"Julien Guertault\",\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f\"},\"headline\":\"Readings on vector class optimization\",\"datePublished\":\"2012-04-24T07:44:22+00:00\",\"dateModified\":\"2016-04-26T11:43:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\"},\"wordCount\":454,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f\"},\"keywords\":[\"cross product\",\"dot product\",\"Farbrausch\",\"float\",\"homogeneous 
coordinates\",\"implementation\",\"linear algebra\",\"math\",\"mutliplication\",\"optimization\",\"reading list\",\"rgba\",\"SIMD\",\"sse\",\"vector\"],\"articleSection\":[\"Random\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\",\"url\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\",\"name\":\"Readings on vector class optimization &ndash; Light is beautiful\",\"isPartOf\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#website\"},\"datePublished\":\"2012-04-24T07:44:22+00:00\",\"dateModified\":\"2016-04-26T11:43:04+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lousodrome.net\/blog\/light\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Readings on vector class optimization\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#website\",\"url\":\"https:\/\/lousodrome.net\/blog\/light\/\",\"name\":\"Light is beautiful\",\"description\":\"Thoughts of a graphics programmer, demoscener and spare time 
photographer\",\"publisher\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lousodrome.net\/blog\/light\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f\",\"name\":\"Julien Guertault\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/2e5fc7a18e1701e1bb61a5da0ef35cf7?s=96&d=identicon&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/2e5fc7a18e1701e1bb61a5da0ef35cf7?s=96&d=identicon&r=g\",\"caption\":\"Julien Guertault\"},\"logo\":{\"@id\":\"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/image\/\"},\"url\":\"https:\/\/lousodrome.net\/blog\/light\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Readings on vector class optimization &ndash; Light is beautiful","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/","og_locale":"en_US","og_type":"article","og_title":"Readings on vector class optimization &ndash; Light is beautiful","og_description":"Now that Revision has passed, we feel tempted to grab the ax and happily chop into parts of our code base we wanted to change but couldn&#8217;t really since we had other priorities. 
One tempting part is the linear algebra &hellip; Continue reading &rarr;","og_url":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/","og_site_name":"Light is beautiful","article_published_time":"2012-04-24T07:44:22+00:00","article_modified_time":"2016-04-26T11:43:04+00:00","author":"Julien Guertault","twitter_card":"summary_large_image","twitter_creator":"@Zavie","twitter_site":"@Zavie","twitter_misc":{"Written by":"Julien Guertault","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#article","isPartOf":{"@id":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/"},"author":{"name":"Julien Guertault","@id":"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f"},"headline":"Readings on vector class optimization","datePublished":"2012-04-24T07:44:22+00:00","dateModified":"2016-04-26T11:43:04+00:00","mainEntityOfPage":{"@id":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/"},"wordCount":454,"commentCount":4,"publisher":{"@id":"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f"},"keywords":["cross product","dot product","Farbrausch","float","homogeneous coordinates","implementation","linear algebra","math","mutliplication","optimization","reading 
list","rgba","SIMD","sse","vector"],"articleSection":["Random"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/","url":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/","name":"Readings on vector class optimization &ndash; Light is beautiful","isPartOf":{"@id":"https:\/\/lousodrome.net\/blog\/light\/#website"},"datePublished":"2012-04-24T07:44:22+00:00","dateModified":"2016-04-26T11:43:04+00:00","breadcrumb":{"@id":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/lousodrome.net\/blog\/light\/2012\/04\/24\/readings-on-vector-class-optimization\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lousodrome.net\/blog\/light\/"},{"@type":"ListItem","position":2,"name":"Readings on vector class optimization"}]},{"@type":"WebSite","@id":"https:\/\/lousodrome.net\/blog\/light\/#website","url":"https:\/\/lousodrome.net\/blog\/light\/","name":"Light is beautiful","description":"Thoughts of a graphics programmer, demoscener and spare time photographer","publisher":{"@id":"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lousodrome.net\/blog\/light\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/a16a2a69d73eca763ffdf125c49eaa2f","name":"Julien Guertault","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/2e5fc7a18e1701e1bb61a5da0ef35cf7?s=96&d=identicon&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2e5fc7a18e1701e1bb61a5da0ef35cf7?s=96&d=identicon&r=g","caption":"Julien Guertault"},"logo":{"@id":"https:\/\/lousodrome.net\/blog\/light\/#\/schema\/person\/image\/"},"url":"https:\/\/lousodrome.net\/blog\/light\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/posts\/221"}],"collection":[{"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/comments?post=221"}],"version-history":[{"count":0,"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/posts\/221\/revisions"}],"wp:attachment":[{"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/media?parent=221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/categories?post=221"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lousodrome.net\/blog\/light\/wp-json\/wp\/v2\/tags?post=221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}