Packaging Update

A while ago, I wrote a post about package sorting in order to minimize diffs when patching on Steam.

I ended up running a straightforward test where we would change the game minimally and build consecutive patches. The patches turned out to be quite large, but upon investigation, it was clear there usually wasn't a good reason for this. Our compiled shaders would contain small differences, as would many meshes in the game.

These looked like uninitialized memory errors and, upon further investigation, that's indeed what they were. They are fixed now, so patches are a lot smaller.

However, patches may never be as small as we would like, and I find there is a pretty deep philosophical issue here.

Right now we have 17MB of files that change in response to minimal changes in the game (say, moving a few entities in the world by small distances). These are mostly cube maps for Light Probe entities, which are used to do lighting on nearby objects. Here's what a Light Probe looks like in the game:

(Well, you don't see that sphere in the game, only in the editor!) To generate one of these cube maps, we render the scene in all directions, from the point at the center of that sphere, then apply a blur to the result.

Imagine that we move a couple of entities in the world, even slightly. All Light Probes that can see those entities are going to change, by at least a pixel or two. Because we are applying a big fat blur kernel, a 1-2 pixel change becomes small changes spread across the entire image.
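
Here's a toy illustration of that effect (illustrative C++, not our engine code): blur a 1D signal with a wide box kernel, change one input sample, and count how many outputs move.

#include <cstdio>
#include <vector>

// Toy 1D box blur. A change to one input sample alters every output within
// 'radius' of it; the wide kernels we apply to light probe cube maps behave
// the same way in 2D.
static std::vector<float> box_blur(const std::vector<float> &in, int radius) {
    std::vector<float> out(in.size());
    for (int i = 0; i < (int)in.size(); i++) {
        float sum = 0; int n = 0;
        for (int j = i - radius; j <= i + radius; j++) {
            if (j >= 0 && j < (int)in.size()) { sum += in[j]; n++; }
        }
        out[i] = sum / n;
    }
    return out;
}

int main() {
    std::vector<float> a(64, 0.5f), b = a;
    b[32] += 0.01f;  // a one-sample tweak, like nudging an entity

    std::vector<float> ba = box_blur(a, 16), bb = box_blur(b, 16);
    int changed = 0;
    for (size_t i = 0; i < a.size(); i++) if (ba[i] != bb[i]) changed++;
    printf("%d of %d output samples changed\n", changed, (int)a.size());  // prints 33 of 64
}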

We have an automated script that preps the game for release, clearing out data that we only use for development, and ensuring that automatically-computed things get recomputed (like LODs for faraway objects). Recently we added a step to this process that re-generates all the Light Probes, because it just makes sense to update those automatically. If we have to keep track of them manually somehow, we are just going to make mistakes.

What this means is that you can pretty much guarantee that a lot of light probes are going to change if you move some entities and patch the game. If we change anything basic about any of the shaders, which seems very likely, then all of the light probes are going to change (because the output of the shaders feeds back into the light probes).

That said, maybe it's the case now that more light probes are changing than I would expect. I am wondering if other things in the game are causing them to change. For example, when you are playing the game, foliage waves in the wind. I wonder if the animation timer for plants is ticking while we generate light probes, causing the plants to move from one probe render to the next. This would cause most light probes to be different every time we bake them, regardless of other factors.

Now, there's a bigger issue. In addition to light probes, we have pre-baked lightmaps all over all geometry in the game:

In a way that is reminiscent of the light probes (but a bit different), we generate these by placing a camera at many points along each surface, rendering the scene from each camera, and interpolating those into a final result.

So, just like light probes change when entities move slightly, lightmaps are going to change also. Lightmaps will also change if shaders change, but furthermore, lightmaps will change if light probes change (because light probes determine the look of entities that shine on the surfaces we are lightmapping), and light probes will change if lightmaps change (because the lightmaps influence the objects rendered in the light probes). So we have a circular dependency, which is not so bad in practice, except it means that if you change one entity in the game, then repeat this process iteratively, you may well end up changing most lightmaps and light probes in the game.
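
If you squint, a complete rebake is a fixed-point iteration. Here is a toy model of the circularity (stand-in arithmetic, obviously nothing like the real bake):

#include <cstdio>
#include <cmath>

// Toy model of the circular dependency: each bake folds in a fraction of the
// other's result, so we iterate until nothing changes. In practice a pass or
// two gets close enough, but each pass can touch most of the data.
int main() {
    double probes = 0, lightmaps = 0;
    const double direct = 1.0;  // stand-in for direct lighting, which is fixed
    for (int pass = 0; pass < 64; pass++) {
        double p = direct + 0.5 * lightmaps;  // probes see lightmapped surfaces
        double l = direct + 0.5 * probes;     // surfaces see probe-lit objects
        double delta = fabs(p - probes) + fabs(l - lightmaps);
        probes = p; lightmaps = l;
        printf("pass %d: probes=%.6f lightmaps=%.6f\n", pass, probes, lightmaps);
        if (delta < 1e-9) break;  // converged
    }
}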

The light probes are not so big, but last I checked there are between 500MB and 700MB of lightmaps. However, re-baking the lightmaps is not part of the automated build process; we have to trigger a bake manually. But it seems like good policy to do a full rebake any time we change the game significantly. So, we are probably looking at a minimum patch size in the 500MB range.

I guess that is not such a big deal, since we are not an online game that would be patching all the time. Hopefully once we release the initial version of the game, the "content" won't change very much (sorry, but the word "content" as used in video games is kind of absurd, so it's hard for me to use it without sarcasm quotes; it's almost as bad as "title", which is a word I refuse to use, except to make fun of its use)... the "content" won't change very much, so patches would be rare.

We'll see. We have a while to figure these things out.

A note about programming language design

This is not about The Witness, but it may be relevant to future games that we do! In my spare time I am building a new programming language. Today I made a bunch of tweets about something I was thinking about there... but the ideas are hard to follow on Twitter, so I am crossposting them here. I started out by lamenting that so many people thought that a language without member functions was unthinkable, then continued:

Even the idea of UFCS is just a massive overcomplication. It only seems simple compared to baseline C++.

I should say that the biggest problem I find with the idea of member functions is that they are extremely anti-code-reuse. I know this may seem paradoxical, because the whole point is to encapsulate code in an easy-to-reuse way, but I'll give an example.

Let's say you have some basic data type like a 3D Vector called Vector3. You want everyone to be motivated to use Vector3 so that there's no friction when passing data between the main program and various libraries, etc.

All that is necessary for compatibility in a fast program is that everyone agrees on the data layout of Vector3. But as soon as you tie member functions to the data, you are insisting that everyone needs to think the same way about the data that is in a Vector3, and operate on it the same way. And that will never be true, because different code needs to think about data types in different ways; and also because programmers express their personality in programs, and different programmers want to think in different ways about the most common data types.

So if you force people to think a certain way about Vector3 or other basic data types, you are motivating them (or *requiring* them) to go against that grain and make their own alternatives. Then the system is fragmented and you have all these data types that need to be converted again. Or, if people don't do that, they just end up passive-aggressively hating the task of programming, which is not what you want.

So in reality, you want Vector3 to define only a common piece of data to be exchanged. You can provide default functions for operating on Vector3, but in a way that people can easily shadow them with their own functions or else use a completely disjoint set. That is just not what most "object models" encourage. Thus they put huge amounts of friction on reuse in this particular kind of case, which is very common and very important.

(I wish people would stop telling me that member functions are good because you can type "obj." and get a list of procs in the IDE after the dot. You can obviously do this with flat functions too. Please stop telling me this kthx.)
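
To put the Vector3 point in code (C++ syntax just for familiarity; the language I'm building is not C++):

#include <cmath>

// Everyone agrees on the data layout, and on nothing else:
struct Vector3 { float x, y, z; };

// Default free functions are available, but they are not welded to the type;
// any module can ignore them or shadow them with its own.
inline float length(Vector3 v) { return sqrtf(v.x*v.x + v.y*v.y + v.z*v.z); }

// Another library thinks about the same data differently, with no wrapper
// type and no conversion step:
namespace fast_math {
    inline float length_sq(Vector3 v) { return v.x*v.x + v.y*v.y + v.z*v.z; }
}

The data stays interchangeable; only the operations are up for grabs.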

Because this came out as a series of tweets, it is probably less coherent than what I would have written if it were a straight-up essay, but hey, this is how it went.

Island Snapshot

You can see Ignacio's new water in this shot (which will continue to improve, and already we have a lot of control over water visuals that we are not using yet). There's a bunch of new audio in the game. Art folks are working on revising various areas and getting them into fully playable shape. Lately I have been thinking hard about how to improve various puzzles (a couple of puzzles already just got *way* better this week), and I have also started rearranging the jungle area, which you see in the left foreground. We've had a lot of discussions with the landscape architects, and they've given us a detailed topography for the island that helps define some of the features we think will be good to have, like this river you can see to the left of the mountain (solid blue for now; it's temporary!).

Also, a couple of us have been on vacation!

Regarding Water

(Here's an internal email that Ignacio sent today that may provide some flavor on what we're doing lately.)

From: Ignacio Castaño
Date: 24 August 2012

I just checked in a few changes to the water shader. Note that this is still a work in progress; there are some artifacts and unfinished features.

From the art point of view, the main change is that there's now a water_color_map and a water_flow_map that you can use to change the behavior of the shader.

The RGB channels of the color map control the water extinction color and the alpha controls the opacity factor. The previous underwater shading model was simply wrong; the new one works nicely and is easy to tweak. Note that I assume the water color changes very smoothly, so avoid sharp color changes!
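
In sketch form, the shading is Beer-Lambert style attenuation (hypothetical names, not the actual shader code):

#include <cmath>

struct Color { float r, g, b; };

// Light traveling 'dist' through water is attenuated per channel by the
// extinction color, with the map's alpha scaling the overall opacity.
Color apply_extinction(Color scene, Color extinction, float opacity, float dist) {
    return { scene.r * expf(-extinction.r * opacity * dist),
             scene.g * expf(-extinction.g * opacity * dist),
             scene.b * expf(-extinction.b * opacity * dist) };
}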

The flow_map is intended to contain additional water attributes. Right now it's only used for the stillness, but I will certainly add more in the future.

You will also notice that the shoreline now has a foam layer:

The way it's computed is by comparing the screenspace depth difference. This is not very correct: it looks OK from some angles, but it breaks badly when the underwater geometry has screen-space depth discontinuities:
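
Mechanically, the comparison amounts to something like this (hypothetical names):

#include <algorithm>

// Foam from the screenspace depth difference: the closer the underwater
// geometry is to the surface, the more foam. Wherever the scene depth jumps,
// this value jumps too, which is exactly the artifact described above.
float foam_amount(float scene_depth, float water_depth, float falloff) {
    float d = (scene_depth - water_depth) / falloff;
    return 1.0f - std::clamp(d, 0.0f, 1.0f);  // 1 at the shoreline, 0 past the falloff
}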

If the foam is a desirable feature, I'm considering modifying the print_map command to render a foam map, or maybe this is something that can be painted manually and added to the water_flow_map. One of my concerns is that this map may not have enough resolution.

Just Cause 2 seems to use this method to render foam, but they apply a big screenspace distortion to hide the artifacts; I find it sickening when you stare at it closely...

The way the water is shaded has also changed somewhat. First, I now approximate the specular exponent based on the position of the camera with respect to the water and the water roughness. Somewhat like the LEAN paper does, but assuming the roughness of the water is the same for the whole texture. Depending on the value of the exponent I either sample the mirrored reflection map, a filtered environment map, or blend between the two. My goal was to be able to use the mirrored reflection map only on a small subset of the lake entities. The physically based approach does not allow us to cull many, so I'll probably have to introduce some hacks. BTW, I'm doing all this work not just as an optimization, but also so that we can use the light probe reflection model in the river and lakes inside the island, where the mirrored reflection map is not available.
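
The selection logic is roughly this (threshold values made up for illustration):

// A sharp (high) specular exponent needs the mirrored reflection map; a
// blurry (low) one is fine with the filtered environment map; in between,
// blend the two.
enum ReflectionSource { MIRROR_ONLY, ENV_ONLY, BLEND_BOTH };

ReflectionSource pick_reflection(float specular_exponent) {
    const float lo = 64.0f, hi = 512.0f;  // placeholder thresholds
    if (specular_exponent >= hi) return MIRROR_ONLY;
    if (specular_exponent <= lo) return ENV_ONLY;
    return BLEND_BOTH;
}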

The main problem with the filtered environment map is that it does not take local reflections and occlusion from the environment into account. I have two ideas to work around this. One is to store an ambient occlusion map in the lightmaps, though this may be a bit too smooth and will lack directional information. The other is to do screenspace raycasting of the depth buffer to decide whether the environment map is occluded or not.
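
The raycasting idea, in sketch form (hypothetical interface; the real version would march in screen space against the depth buffer):

struct Vec2 { float x, y; };

// March a few samples along the reflected ray's screenspace projection; if
// the depth buffer is ever in front of the ray, something local occludes the
// environment map and we should not use it for this pixel.
bool env_map_occluded(Vec2 origin, Vec2 step, int num_samples,
                      float (*depth_buffer_at)(Vec2), float (*ray_depth_at)(int)) {
    for (int i = 1; i <= num_samples; i++) {
        Vec2 p = { origin.x + step.x * i, origin.y + step.y * i };
        if (depth_buffer_at(p) < ray_depth_at(i)) return true;
    }
    return false;
}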

This is somewhat like what Crysis 2 does. Insomniac also uses a similar approach with just a single sample:

http://www.insomniacgames.com/tech/articles/0409/files/water.pdf

I'm hoping we can get away with a crude approximation too. I did a quick test, where the maths are not exactly right, with only 4 samples and no smoothing, and it was starting to look promising:

This won't allow us to use the filtered environment maps closer to the camera, but if the screenspace raycasting works well, I'm thinking about using non-filtered environment maps in place of the mirrored reflection map, whenever those are not available.

Finally, I've also been playing with Valve's water flow model. I have the basic implementation working, but I haven't yet been able to reproduce the workaround for the pulsing and repetition artifacts that Alex proposes.

http://www.valvesoftware.com/publications/2010/siggraph2010_vlachos_waterflow.pdf
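
The basic scheme I have working looks roughly like this (my sketch, not the exact shader):

#include <cmath>

struct Vec2 { float x, y; };

// Basic flow mapping: scroll the detail UVs along the flow vector in two
// phases half a cycle apart, and cross-fade between them so neither phase is
// visible at the moment it snaps back to its start. The cross-fade is where
// the pulsing comes from; Alex's workaround is what I haven't reproduced yet.
void flow_uvs(Vec2 uv, Vec2 flow, float time, Vec2 *uv0, Vec2 *uv1, float *blend) {
    float phase0 = time - floorf(time);             // 0..1, repeating
    float phase1 = phase0 + 0.5f;
    phase1 -= floorf(phase1);
    *uv0 = { uv.x + flow.x * phase0, uv.y + flow.y * phase0 };
    *uv1 = { uv.x + flow.x * phase1, uv.y + flow.y * phase1 };
    *blend = fabsf(phase0 - 0.5f) * 2.0f;           // weight of layer 1: full when layer 0 resets
}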

Fun with package sorting

As you see in the previous post, lately I've been getting the game set up on Steam. Now that it's in there and working, there are still things to think about and make better.

A couple of months ago I was playing Super Monday Night Combat; a much-anticipated weekly update came along, but its file size was over 800MB! MNC is an Unreal Engine game and Unreal, like many game engines (and like The Witness), packages its data files together for robustness and speed. This package format interacts with the patch delivery system of any online distribution service, though.

Steam's system performs a binary-data comparison between the old version of your file and the new version, sending only the pieces that have changed. To make this analysis tractable, files are thought of as collections of 1 megabyte chunks; if anything in a particular chunk changes, even one byte, the whole chunk gets included in the patch. If you aren't careful about this, small details of your package format can trigger very large patches, and this is what was happening with Super MNC (not really the developers' fault since they were using a licensed engine; I have heard that the Unreal guys are addressing the problem.)
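
You can get a feel for the chunk behavior with a trivial comparison like this (illustrative code reflecting my understanding of the system, not Steam's actual implementation):

#include <cstdio>
#include <cstring>
#include <fstream>
#include <vector>

// Compare two builds the way a chunk-based patcher would: one changed byte
// dirties its entire 1MB chunk, so scattered small changes cost much more
// than clustered ones.
int main(int argc, char **argv) {
    if (argc != 3) { fprintf(stderr, "usage: %s old_build new_build\n", argv[0]); return 1; }
    const size_t CHUNK = 1024 * 1024;
    std::ifstream a(argv[1], std::ios::binary), b(argv[2], std::ios::binary);
    std::vector<char> buf_a(CHUNK), buf_b(CHUNK);
    size_t total = 0, changed = 0;
    for (;;) {
        a.read(buf_a.data(), CHUNK);
        b.read(buf_b.data(), CHUNK);
        size_t na = (size_t)a.gcount(), nb = (size_t)b.gcount();
        if (na == 0 && nb == 0) break;
        if (na != nb || memcmp(buf_a.data(), buf_b.data(), na) != 0) changed++;
        total++;
    }
    printf("%zu of %zu chunks changed; the patch would be about %zu MB\n",
           changed, total, changed);
    return 0;
}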

The Witness packs most of its data into a single 2-gigabyte file stored in the .zip format. Before uploading the initial version to Steam, I put a safeguard in place to minimize patch sizes, or so I thought: files are sorted lexicographically and then placed into the zip file in that specific order. (Imagine if files were stored in random order: probably every patch would involve every player re-downloading the entire game!)

Last night, as a test, I pushed a new build to Steam. The hope was that the patch would be small, since we had only done two days of work. Instead, here's what we got:

The patch was 660 megabytes. Clearly I had not done enough to prevent this. I opened up a zip file reader and took a look at the file order. In many cases, it was as you'd expect, but in the case of meshes and lightmaps we see this:

'save' is the world name and the 5-digit number is the identifier for each entity. (The world name is redundant right now, since The Witness takes place entirely in one world). We re-bake lightmaps pretty often, but we change meshes much less often. So, the packing order you see here will result in us patching more chunks than we should. Not necessarily by much, in this case, since meshes are much smaller than lightmaps, but that might not be true for all meshes, and other data types may have a similar issue but exhibit more-severe problems. So, I had the thought that instead of sorting things like this:

a.lightmap
a.mesh
b.lightmap
b.mesh

What we really want is to sort by file extension first, then sort by name:

a.lightmap
b.lightmap
a.mesh
b.mesh

... but not always. Sometimes we have multiple files associated with one source file, files that will all change when that source file changes but that have different extensions. This is the case with shaders:

So I hacked together a solution that sorts anything with an extension of .shader*, in terms of the global file list, as though the extension were only ".shader" -- that means all these files will stick together -- but locally, you still need to keep the full extension, so that the files for one particular shader do not randomize their order every time you make a patch.

The way I do this is to tack a modified extension onto the front of every filename, for sorting purposes only. So "hello.texture" becomes ".texture hello.texture", "whatever.shader_ps_text" becomes ".shader whatever.shader_ps_text", "whatever.shader_dump" becomes ".shader whatever.shader_dump".

If this doesn't make sense to you, don't worry too much about it, because it's not that big of a problem (and I am not even sure if those ps_text and vs_text files are needed for the running game; if we should be discarding them it would certainly mitigate the issue.) These kinds of problems would modestly increase the patch size, but I would not expect the increase to be terrible.

The whole time I was doing this sorting stuff, I knew it wasn't the real problem, and in the back of my mind I was thinking, "I hope the Steam patch mechanism ignores timestamps on files so that we don't have to go touching all files to a known time. (Or I hope we aren't packing a newly-generated build ID into each file or something.)" But this was an example of my brain not working properly: because we are packing all this stuff into a zip file, Steam doesn't know they are separate files! Since the zip file stores timestamps for all the files, of course that data changes every build (since our build process generates new files every time).

At first I was thinking that this wouldn't be the real problem, either, because the zip file format has a main directory that is stored compactly, and that is where timestamps would be, because you want to be able to access that data quickly. So I would have guessed that, if all the timestamps changed, the worst that would happen is that the directory would change every build, but that is at most a couple of megabytes. But looking at the zip file specification, I discovered how wrong I was. Inexplicably, timestamps for each file in a zip archive are stored both in the central directory and in the local header for each file, so if you change timestamps for all your files, you are strewing changes everywhere.
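
For reference, here is the local file header layout from the zip specification. Note the modification time and date sitting in every single one of them:

#include <cstdint>

// Zip local file header (one precedes every member in the archive). The same
// mod time/date fields also appear in the central directory at the end of the
// file. Real headers are little-endian and unaligned; this struct is only for
// illustration.
struct ZipLocalFileHeader {
    uint32_t signature;            // always 0x04034b50
    uint16_t version_needed;
    uint16_t flags;
    uint16_t compression_method;
    uint16_t last_mod_time;        // MS-DOS time; changes every build...
    uint16_t last_mod_date;        // ...in every member's header
    uint32_t crc32;
    uint32_t compressed_size;
    uint32_t uncompressed_size;
    uint16_t filename_length;
    uint16_t extra_field_length;
    // the filename and extra field bytes follow immediately
};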

So I made the packaging script set all timestamps to 0, but the zip library we are using didn't like that, so I set the timestamp to January 1, 2012 00:00:00 GMT.

After fixing this, as a test I built two patches in a row from scratch, with the game recompiling all shaders and re-compressing all textures each time. In theory the results should be exactly the same (certainly if this still generated a huge patch, we would still have some kind of big problem). Happily, we have found at least some bit of success:

I am not sure what it is saying there about 2046 of 2262 matching, or 60 changed chunks, while also saying it uploaded 0 chunks. In any case it's better than before. I am going to ask the Steam people what these various numbers actually mean.

Feeling good about this, I ran a final test where I moved around a few objects in the world and changed a constant in one shader:

That's once again not so good (140 MB!). I had been sweeping the 18MB from the previous run under the rug, thinking "hey, it says 0 chunks sent, so maybe that 18MB is protocol overhead involved in figuring out which chunks changed, and it doesn't impact patch size." Apparently not. The next step involves asking the Steam folks for advice, hunting around for a binary diff tool with which to analyze our own packages to see if we are doing something unsavory, etc.

Here's the current Perl code for the part of our script that builds the zip file:

use File::Basename qw(fileparse);   # for fileparse() below

sub modify_name {
    my $name = $_[0];

    my ($file, $dir, $ext) = fileparse(lc($name), qr/\.[^.]*/);
    my $orig_ext = $ext;

    if ($ext =~ m/^\.shader/) { 
        $ext = '.shader';
    }

    # Throw away 'dir' since we discard it when writing the file, anyway.
    return "$ext $file$orig_ext";
}

sub make_uncompressed_zip {  # Sort zip members alphabetically so that binary diffing will have an easier time.
    my ($source, $prefix, $dest_zip_file) = @_;

    my @unsorted = glob("$source/*.*");
    my @sorted = sort { 
        modify_name($a) cmp modify_name($b)
    } @unsorted;

    print("Sorted $#sorted files.\n");

    use Archive::Zip qw(:ERROR_CODES :CONSTANTS);   # :CONSTANTS provides COMPRESSION_STORED
    my $zip = Archive::Zip->new();

    
    # Zip API complains if time is before 1980, so use an arbitrary timestamp in that range (yuck).
    my $timestamp_2012_jan1 = 1325376000;
    foreach my $fullname (@sorted) {
        my ($file, $dir, $ext) = fileparse($fullname, qr/\.[^.]*/);

        # print("Adding '$fullname' as '$prefix$file$ext'\n");
        my $member = $zip->addFile("$fullname", "$prefix$file$ext");
        $member->setLastModFileDateTimeFromUnix($timestamp_2012_jan1);  # Clobber timestamp. 
    }

    # Set the compression level on all the members.

    my @members = $zip->members();
    foreach my $member (@members) {
        $member->desiredCompressionMethod(COMPRESSION_STORED);
    }

    $zip->writeToFileNamed($dest_zip_file);
}

Steam is cool.

Last night I put together a test build of The Witness for Steam. I'd been wanting to do this for a while, but the game's ability to run in full-release mode had been broken. From day to day, we run the game with unpackaged data files, because we change them a lot. For a release you generally want to pack them together.

We finally got that working well enough; here's me getting ready to download the game on Steam:

And here I am after having launched the game:

I don't want people out there to get too excited about these screenshots: this does not mean the game is close to release. You wouldn't want to play it right now. But this will help us a lot when it comes to getting the game out to playtesters and keeping it updated.

For actually shipping our game off to Steam, we are using the new version of the Steam developer tools, and they are really good. The old stuff (which I had used for Braid) was serviceable but had a lot of undesirable things going on. The new stuff is exactly what you want: you write a couple of very small, easy-to-read config files, then you run a command and it uploads your game. This process is now fully automated, which means developers can modify their game and push changes out to users faster and with less hassle than with any other online distribution service. (It is basically as fast as if you host the game yourself.)

Fun with in-engine color grading

Quite some time ago, Ignacio put some color grading features into the engine, so that you could tell the postprocessor to tweak colors in particular ways. For a while these features were very experimental; we just had them on a debug menu and there wasn't a way to save the settings. We didn't really use the features (in part because we were pretty busy just getting the basic parts of the game together).

Lately, Ignacio and Salvador took the features all the way so that we can use them easily in normal workflow. (This happened over some period of time, but the last parts came together this week). Now you can take a screenshot, screw around with the colors in Photoshop, and load it back into the game to control the grading. I am pretty sure this is not our idea, and we got it from the Uncharted guys or someone like that (Update: The idea was developed by Naty Hoffman and others; see the comments to this article).

Here's how it goes. First, I find an area whose look I would like to change:

Next I hold down the control key while taking the screenshot, causing it to be saved with a special color bar along the top of the image:

Then I go into Photoshop and screw around with the colors. Here I am using the Hue/Saturation tool to shift the palette away from red:

Or, instead, I may wish to shape the histogram of the image:

That color bar at the top of the screen acts as a look-up table. The game doesn't need to understand what operations we have performed in Photoshop; it just knows what that color bar looked like when the screenshot was saved out, and, by comparing it to the colors in the result, knows how to transform any color in the rendered scene.
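
To give the flavor of the lookup, here is a sketch (hypothetical layout; I'm assuming a 3D color cube unwrapped into a 2D strip, which is the usual way to do this):

struct Color { float r, g, b; };

static int quantize(float v, int n) {
    int i = (int)(v * (n - 1) + 0.5f);
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// The bar starts as an identity ramp over the whole color cube: blue picks a
// slice, red and green index within the slice. After Photoshop, the texel
// where a color used to live holds that color's graded version, whatever
// operations were applied. 'bar' is N slices of N x N texels side by side
// (N*N wide, N tall).
Color grade(const Color *bar, int N, Color c) {
    int r = quantize(c.r, N), g = quantize(c.g, N), b = quantize(c.b, N);
    int x = b * N + r;            // slice offset plus red index
    int y = g;
    return bar[y * (N * N) + x];  // real code would interpolate between texels
}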

In the game, you can now attach these grading operations to spatial markers, so that color grading kicks in when you are inside specific areas. Here I am setting up some markers in the editor (the marker is the big white box):

After setting up two of these boxes, one for each of the two weird color operations I created, I can walk around in the game and have these operations apply to everything:

This gives us a really nice degree of control over how the game looks. We plan to use it to help with the color theming that will tie together specific areas.

Island Update, with bonus map

It's time for another island update:

As you'll notice, we are starting to work on that big mountain that is closest to the camera. (Finally!) This is still an early concept; it will change a lot.

Ignacio recently implemented a clean way to print an overhead map of the island, in flat colors, which is probably useful for marking things up:

On the modeling side, Eric has been refining an area that's far from the camera in these shots (you'll see some screenshots of the interior soon). Luis has been working on the style of our outdoor scenes; we have interesting problems that involve maintaining a visual style that is good for gameplay while also creating images that are as striking as possible. Orsi has been refining a few particular areas, especially adding some deeper puzzle complexity to the area with the tall trees (off to the right in the screenshot image).

Until recently I had been bothered by some unanswered questions regarding the endgame. There were things I wanted to do that didn't fit together and that felt clumsy. But thankfully, after taking a relaxing weekend where I barely worked on the game at all, on Monday solutions to most of these problems just popped into my head. So that's cool.

There are still some questions about the gameplay but I think the biggest ones are now nailed down. Over the next couple of weeks I will be implementing the remaining parts of this endgame.

Tech-wise, Andy is pretty far along with the OpenGL renderer, which will help us hit other platforms, and Salvador has tied up the basics of asset streaming and is doing some rendering optimization stuff. Ignacio is refining the LOD system in addition to many miscellaneous tasks.