{"id":2052,"date":"2013-01-20T19:54:07","date_gmt":"2013-01-21T02:54:07","guid":{"rendered":"http:\/\/the-witness.net\/news\/?p=2052"},"modified":"2013-01-20T22:12:03","modified_gmt":"2013-01-21T05:12:03","slug":"how-do-you-say-plane-in-the-witness","status":"publish","type":"post","link":"http:\/\/the-witness.net\/news\/2013\/01\/how-do-you-say-plane-in-the-witness\/","title":{"rendered":"How do you say &#8220;plane&#8221; in The Witness?"},"content":{"rendered":"<p>Just before Christmas break, I <a href=\"http:\/\/the-witness.net\/news\/2012\/12\/a-cliffside-cliffhanger\/\">fixed a bug in the transform manipulator<\/a> and checked it in. Having a transform manipulator is a nice addition to any editor, and we were all happy to have it working. But shortly thereafter, I had to leave for a cross-country family Christmas trip. This meant no more real <i>Witness<\/i> work for about ten days, because my laptop only has Linux on it, and right now the Linux version of <i>The Witness<\/i> is still under construction.<\/p>\n<p>This hiatus was the primary motivation behind writing my previous entry as a cliffhanger: although I had fixed the bug, I didn\u2019t have time to adequately document the bug via debugger inspection, so it would have been hard to write anything meaningful showing how the bug manifested itself in practice. I wanted to wait until I had a chance to go through every last little bit of it, so I could be as specific as possible. In that sense, it was meant to be a bit of a cliffhanger for me as well.<\/p>\n<p>But as it turned out, it was much more of a cliffhanger than I\u2019d originally anticipated.<\/p>\n<p><!--more--><\/p>\n<h2>Good News, Bad News<\/h2>\n<p>The good news: congratulations to those of you who posted \u201cfloating point precision\u201d as a suggestion for the cause of the bug. You aren\u2019t any dumber than I am.<\/p>\n<p>The bad news: we\u2019re all dumber than we should be.<\/p>\n<p>For those of you who don\u2019t already know all about floating point precision problems, I\u2019ll explain the problem we all thought we were solving. The original code, verbatim, looked like this:<\/p>\n<pre>bool ray_vs_plane(Vector3 *xpoint, Vector3 const &amp;p0, Vector3 const &amp;dir,\r\n                  Plane3 const &amp;plane) {\r\n    Vector3 n = plane.get_normal();\r\n\r\n    float denom = dot_product(n, dir);\r\n    float epsilon = 0.0001f;\r\n    if (fabs(denom) &lt; epsilon) return false;\r\n\r\n    float t = - ((plane.d + dot_product(n, p0)) \/ denom);\r\n    if (t &lt; 0) return false;\r\n\r\n    *xpoint = p0 + dir * t;\r\n    return true;\r\n}<\/pre>\n<p>To fix the bug, I wrote a modified version to use for the transform manipulator:<\/p>\n<pre>bool ray_vs_plane(Vector3 *xpoint, Vector3 const &amp;p0, Vector3 const &amp;dir,\r\n                  Vector3 const &amp;plane_point, Vector3 const &amp;plane_n)\r\n{\r\n    Vector3 local_p0 = p0 - plane_point;\r\n\r\n    float denom = dot_product(plane_n, dir);\r\n    float epsilon = 0.0001f;\r\n    if (fabs(denom) &lt; epsilon) return false;\r\n\r\n    float t = - (dot_product(plane_n, local_p0) \/ denom);\r\n\r\n    *xpoint = p0 + dir * t;\r\n    return true;\r\n}<\/pre>\n<p>I\u2019ve done exactly the same operation, but instead of using the d value of the plane equation to solve for t, I just move the ray itself to be centered around a point on the plane, so that the d value is implicitly zero. My reasoning was very simple: if I subtract two things that I know are close together (the camera location and the picked point on the manipulator), I will have less cancellation error than if I subtract two things that may be wildly different in scale (the d value and the dot product of the camera location and the plane normal).<\/p>\n<p>This is pretty weak sauce as far as numerical precision improvements go. I am far from an expert on the subject (<a href=\"http:\/\/www.amazon.com\/Forman-S.-Acton\/e\/B001IYTXGY\/ref=sr_ntt_srch_lnk_2?qid=1358735012&amp;sr=8-2\" onclick=\"_gaq.push(['_trackEvent', 'outbound-article', 'http:\/\/www.amazon.com\/Forman-S.-Acton\/e\/B001IYTXGY\/ref=sr_ntt_srch_lnk_2?qid=1358735012&amp;sr=8-2', 'read some Forman Acton']);\" >read some Forman Acton<\/a>, for example, if you want to see how the pros roll). But, when I did this, it fixed the bug perfectly, so I assumed, at least for the moment, that it must be a reasonable fix.<\/p>\n<p>But, honestly, I didn\u2019t really have a good feeling about it.<\/p>\n<h2>Christmas Stew<\/h2>\n<p>Sometimes, when there\u2019s a lot to do, you\u2019ll live with a bug fix because it works, rather than because it makes sense. But invariably, at least for me, I end up going back to look more closely at something if I don\u2019t feel like I fully understood it. I am sure this is mostly just an obsessive personality trait, but regardless, I rarely find the time to be wasted, because I almost always learn something from the process.<\/p>\n<p>In this specific case, I didn\u2019t have time to look into the problem in depth before I left. I planned to do so for the eventual writeup, but of course, I couldn\u2019t do much without being able to run the code, and while I was on Christmas break, that wasn\u2019t an option. Unfortunately, this meant that I had the entire Christmas break to stew about the bug that I wasn\u2019t sure I\u2019d really fixed properly.<\/p>\n<p>From the beginning, the thing that didn\u2019t sit right with me about the fix was that when Jon originally reported the problem to me, he didn\u2019t mention anything about it happening more on the outskirts of the island than on the interior. Jon\u2019s a very experienced programmer, so if something like that were happening, he would probably notice and include that in the report, because he\u2019d know it would help me find the bug faster.<\/p>\n<p>Instead, it seemed pretty clear that the bug was related more to the camera angle than the position in the world. While obviously the camera angle affects the input values, and could have a number of effects on the numerical precision of the program, it\u2019s not intuitively the kind of thing that I would expect. If these sorts of algorithms were that sensitive in this way, I would have expected to encounter similar problems with camera-relative plane intersections many times before\u00a0\u2014 but I never had.<\/p>\n<p>The more I thought about that, the more it gnawed at me that I couldn\u2019t go see exactly what was happening and prove to myself why the fix worked. I did have the game on my laptop, because I\u2019d done some initial Linux porting work, so I realized that maybe, if I was lucky, I could reconstruct the problem without actually being able to load and run the editor.<\/p>\n<p>That is when I became very thankful that I had been writing these blog entries, because it meant I had stopped to take this screenshot before I left, specifically so I could do the writeup over break:<\/p>\n<p><a href=\"http:\/\/the-witness.net\/news\/2012\/12\/a-cliffside-cliffhanger\/witness3_yworks\/\" rel=\"attachment wp-att-1995\"><img class=\"aligncenter size-large wp-image-1995\" alt=\"witness3_yworks\" src=\"http:\/\/the-witness.net\/news\/wp-content\/uploads\/2012\/12\/witness3_yworks-512x288.jpg\" width=\"512\" height=\"288\" \/><\/a><\/p>\n<p>Do you see what I see? That lovely entity number up in the corner is pure gold for step one of reconstructing the problem. Because I could see the entity number, I was able to go into the repository and get the data file that says where that entity is placed in the world. That let me know the numbers of the center point of the manipulator: -304.114, 193.4923, 4.7582.<\/p>\n<p>If you\u2019re an experienced 3D programmer, then you probably are thinking the same thing that I was thinking when I saw those: how the hell could I be having numerical precision problems with such low numbers? <i>The Witness\u2019s<\/i> scale is one unit per meter, which means that, at the scale of the dragging we\u2019re talking about, there should be absolutely no problems with solving a single plane equation, regardless of how sloppily you did it.<\/p>\n<p>But if that was the case, then how did I fix the bug? This lead me to step two of reproducing the problem: writing a test program. I wrote a little C program that would take numbers around that point and, for a large set of programmatically generated planes and rays, perform the intersection using both the old ray intersection routine and my \u201cimproved\u201d one.<\/p>\n<p>Perhaps unsurprisingly, these results reinforced my suspicions that there wasn\u2019t enough of a difference between the two routines to account for the original bug. In all the tests, my \u201cimproved\u201d routine was no more than one decimal digit more precise; so if the original routine produced intersection points that were actually, say, 0.0001 units away from the plane, the new routine might produce ones that were 0.00001 away.<\/p>\n<p>Clearly, the fix just plain shouldn\u2019t have worked. But it did.<\/p>\n<p>Why?<\/p>\n<h2>The Wrong Part of the Post<\/h2>\n<p>I would love to be able to say that I guessed the real reason before stepping through the code, but I absolutely didn\u2019t.<\/p>\n<p>When I returned from Christmas break, I eagerly booted up the <i>Witness<\/i> and stepped through the code very carefully, keeping accurate track of all the numbers as I went, waiting to find that super-elusive numerical thing that I had overlooked. To my complete surprise, it turned out to have <i>absolutely nothing to do with precision<\/i>.<\/p>\n<p>In fact, somehow, although I didn\u2019t know it at the time, I actually did write about the root cause of the problem in my original post. It just wasn\u2019t in the part that I thought. Indeed, rather than <i>Always Look Behind You<\/i> foretelling the eventual solution, it was <i>Foreign Languages<\/i> that contained the hint: due to my naivete with <i>The Witness<\/i> codebase, I misunderstood what it was doing with planes. It turns out that this one line from the original routine was the entire culprit:<\/p>\n<pre>    Vector3 n = plane.get_normal();<\/pre>\n<p>You will notice that it only occurs in the buggy version of the ray intersection code. In my version, I couldn\u2019t use the plane structure, because I didn\u2019t want an \u201ca, b, c, d\u201d representation of the plane. I wanted \u201ca, b, c, p\u201d, if you will, where p is an actual 3D point on the plane. So I never called get_normal().<\/p>\n<p>Why does this matter? Well, upon stepping through all the code, I realized that get_normal(), being true to its name, takes the unusual step of <i>normalizing<\/i> the a, b, c values of the plane before returning them to you as a vector. This means that the a, b, c values of the plane that you pass in when you create the plane aren\u2019t the ones you get out.<\/p>\n<p>That wouldn\u2019t be a problem if it weren\u2019t for the fact that, when you create one of these planes, it <i>does not<\/i> normalize the values you pass in:<\/p>\n<pre>    Vector3 plane_n = cross_product(camera_back, translation_axis);\r\n    plane_n = cross_product(plane_n, translation_axis);\r\n    grabbing_plane = plane_from_point_and_normal(hit_pos, plane_n);<\/pre>\n<p>This is my fault on its face, because hey, it <i>does say normal<\/i>, and normals are traditionally supposed to be normalized. But in keeping with my reference to foreign languages, this is something I really just didn\u2019t think about, because I am used to the way I\u2019ve always done things with planes: nothing is normalized automatically, everything is just left unnormalized, because back in the day, normalization was very expensive and you never did it in a math routine unless it was absolutely necessary.<\/p>\n<p>So plane_from_point_and_normal was actually doing what I thought it would: it just stored plane_n as the a, b, c values, and computed d with the inner product. But get_normal() was not doing what I thought it would. Rather than just returning me the vector {a,b,c}, it was normalizing them before giving them to me, which meant <i>all<\/i> my calculations with them would no longer agree with the d value computed in plane_from_point_and_normal.<\/p>\n<p>And therein, clearly, lies the entirety of the bug.<\/p>\n<p>Just to underscore how much of a native language issue this really is, when I showed Jon the preview of this article, he said, \u201cIf you are thinking of the plane as a geometric object, of course the normal is unit length, what else would it be?? But if you are thinking of it as some specific equation all the time, then it is natural to care about the a, b, c, you put in.\u201d So I can only assume, were he using my math library, he would\u2019ve been tripped up by the fact that I <i>don\u2019t<\/i> normalize normals automatically, exactly the inverse of the problem I encountered with his.<\/p>\n<p>I honestly don\u2019t believe that this sort of thing is a case of us being presumptuous or sloppy; it\u2019s just the price you have to pay for being able to program complex things at a reasonable speed. If we constantly have to think about what it means to work with a vector, or a plane, or a quaternion, we\u2019re going to be a lot slower. It\u2019s much more efficient to have a consistent perspective, and always rely on it.<\/p>\n<p>And of course, the cost is bugs like this\u00a0\u2014 but that\u2019s a very small cost relative to the benefits.<\/p>\n<h2>Partial Credit<\/h2>\n<p>So, where does that leave you?<\/p>\n<p>Clearly, automatic partial credit goes to everyone who wrote in suggesting that numerical precision was the problem. You came to the same conclusion I did with the information I knew at the time, which was all you had to go on. You certainly can\u2019t be blamed for not finding the real bug since you would have needed the source code to the <i>Witness<\/i> math library to even see it. And it is true that the numerical precision of the routine could have been improved, as I believe I was able to verify separately with my test program.<\/p>\n<p>But by the same token, nobody gets automatic full credit. There were a lot of reasons why, even with only the information in the original article, you could have said, \u201cI don\u2019t think it\u2019s numerical precision, Muratori.\u201d Maybe I am ascribing overly heroic attributes to the man, but I feel like <a href=\"http:\/\/en.wikipedia.org\/wiki\/Forman_S._Acton\" onclick=\"_gaq.push(['_trackEvent', 'outbound-article', 'http:\/\/en.wikipedia.org\/wiki\/Forman_S._Acton', 'Forman Acton']);\" >Forman Acton<\/a> would have known, from looking at the plane intersection snippet, that there wasn\u2019t any way to lose enough precision in there to account for the bug as I described it, at least not in any way that would have been substantially improved by the suggested fixes people wrote in (or the fix I actually did, for that matter).<\/p>\n<p>So in order to actually get full credit, you\u2019re going to have to earn it, I think, but on the honor system. Ask yourself, if you had encountered this bug, and you had \u201cfixed\u201d it as I had, would you have gone back and figured out what was really happening before moving on to something else?<\/p>\n<p>If you can honestly answer yes, then you get full credit, for this is about as good as you can hope to do as a working programmer, I think. You\u2019re no Forman Acton, of course.\u00a0.\u00a0. but then again, who among us is?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Just before Christmas break, I fixed a bug in the transform manipulator and checked it in. Having a transform manipulator is a nice addition to any editor, and we were all happy to have it working. But shortly thereafter, I had to leave for a cross-country family Christmas trip. This \u2026<\/p>\n<p class=\"continue-reading-button\"> <a class=\"continue-reading-link\" href=\"http:\/\/the-witness.net\/news\/2013\/01\/how-do-you-say-plane-in-the-witness\/\">Continue reading<i class=\"crycon-right-dir\"><\/i><\/a><\/p>\n","protected":false},"author":180,"featured_media":1995,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"_links":{"self":[{"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/posts\/2052"}],"collection":[{"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/users\/180"}],"replies":[{"embeddable":true,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/comments?post=2052"}],"version-history":[{"count":7,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/posts\/2052\/revisions"}],"predecessor-version":[{"id":2059,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/posts\/2052\/revisions\/2059"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/media\/1995"}],"wp:attachment":[{"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/media?parent=2052"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/categories?post=2052"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/the-witness.net\/news\/wp-json\/wp\/v2\/tags?post=2052"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}