Fun with package sorting

As you saw in the previous post, lately I've been getting the game set up on Steam. Now that it's in there and working, there are still things to think about and improve.

A couple of months ago I was playing Super Monday Night Combat; a much-anticipated weekly update came along, but its file size was over 800MB! MNC is an Unreal Engine game and Unreal, like many game engines (and like The Witness), packages its data files together for robustness and speed. This package format interacts with the patch delivery system of any online distribution service, though.

Steam's system performs a binary-data comparison between the old version of your file and the new version, sending only the pieces that have changed. To make this analysis tractable, files are thought of as collections of 1 megabyte chunks; if anything in a particular chunk changes, even one byte, the whole chunk gets included in the patch. If you aren't careful about this, small details of your package format can trigger very large patches, and this is what was happening with Super MNC (not really the developers' fault since they were using a licensed engine; I have heard that the Unreal guys are addressing the problem.)
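To make the chunk arithmetic concrete, here is a minimal Python sketch of the behavior described above. It is only a model: the assumption that chunks sit at fixed 1 MB offsets, and the names `CHUNK_SIZE` and `changed_chunks`, are mine for illustration and are not taken from Steam's actual implementation.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size described above


def changed_chunks(old, new, chunk_size=CHUNK_SIZE):
    """Return the indices of fixed-offset chunks whose bytes differ."""
    n_chunks = (max(len(old), len(new)) + chunk_size - 1) // chunk_size
    changed = []
    for i in range(n_chunks):
        start = i * chunk_size
        # A chunk counts as changed if even one byte inside it differs.
        if old[start:start + chunk_size] != new[start:start + chunk_size]:
            changed.append(i)
    return changed


old = bytes(4 * CHUNK_SIZE)        # a 4 MB file of zeros (made-up data)
new = bytearray(old)
new[2 * CHUNK_SIZE + 5] = 0xFF     # change a single byte in the third chunk

dirty = changed_chunks(old, bytes(new))
patch_bytes = len(dirty) * CHUNK_SIZE
print(dirty, patch_bytes)
```

Under this model a one-byte change costs a full megabyte of patch, which is why a package format that sprinkles small changes across the file gets expensive so quickly.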

The Witness packs most of its data into a single 2-gigabyte file stored in the .zip format. Before uploading the initial version to Steam, I took a safeguard to minimize patch sizes, or so I thought: Files are sorted lexicographically and then placed into the zip file in that specific order. (Imagine that files are stored in random order: probably every patch involves every player re-downloading the entire game!)

Last night, as a test, I pushed a new build to Steam. The hope was that the patch would be small, since we had only done two days of work. Instead, here's what we got:

The patch was 660 megabytes. Clearly I had not done enough to prevent this. I opened up a zip file reader and took a look at the file order. In many cases, it was as you'd expect, but in the case of meshes and lightmaps we see this:

'save' is the world name and the 5-digit number is the identifier for each entity. (The world name is redundant right now, since The Witness takes place entirely in one world). We re-bake lightmaps pretty often, but we change meshes much less often. So, the packing order you see here will result in us patching more chunks than we should. Not necessarily by much, in this case, since meshes are much smaller than lightmaps, but that might not be true for all meshes, and other data types may have a similar issue but exhibit more-severe problems. So, I had the thought that instead of sorting things like this:

a.lightmap
a.mesh
b.lightmap
b.mesh

What we really want is to sort by file extension first, then sort by name:

a.lightmap
b.lightmap
a.mesh
b.mesh

... but not always. Sometimes we have multiple files associated with one source file; they all change when that source file changes, but they have different extensions. This is the case with shaders:

So I hacked together a solution that sorts anything with an extension of .shader*, in terms of the global file list, as though the extension were only ".shader" -- that means all these files will stick together -- but locally, you still need to keep the full extension, so that the files for one particular shader do not randomize their order every time you make a patch.

The way I do this is to tack a modified extension onto the front of every filename, for sorting purposes only. So "hello.texture" becomes ".texture hello.texture", "whatever.shader_ps_text" becomes ".shader whatever.shader_ps_text", "whatever.shader_dump" becomes ".shader whatever.shader_dump".
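For readers who find the idea easier to follow in code, the same sort-only renaming can be sketched in a few lines of Python. This is an illustration of the technique, not the script we actually run (that one is in Perl); the function name `sort_key` and the sample file list are made up for the example.

```python
import os


def sort_key(path):
    """Build the sort-only name: '<grouping extension> <file name>'."""
    name = os.path.basename(path).lower()
    _stem, ext = os.path.splitext(name)
    # Every .shader* extension is grouped under plain '.shader',
    # but the full original name is kept after the space.
    group_ext = ".shader" if ext.startswith(".shader") else ext
    return f"{group_ext} {name}"


files = [
    "b.mesh", "whatever.shader_ps_text", "a.lightmap", "hello.texture",
    "whatever.shader_dump", "a.mesh", "b.lightmap",
]
ordered = sorted(files, key=sort_key)
for name in ordered:
    print(name)
```

The lightmaps end up together, then the meshes, then both shader files side by side in a stable order, then the textures.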

If this doesn't make sense to you, don't worry too much about it, because it's not that big of a problem (and I am not even sure if those ps_text and vs_text files are needed for the running game; if we should be discarding them it would certainly mitigate the issue.) These kinds of problems would modestly increase the patch size, but I would not expect the increase to be terrible.

The whole time I was doing this sorting stuff, I knew it wasn't the real problem, and in the back of my mind I was thinking, "I hope the Steam patch mechanism ignores timestamps on files so that we don't have to go touching all files to a known time. (Or I hope we aren't packing a newly-generated build ID into each file or something.)" But this was an example of my brain not working properly: because we are packing all this stuff into a zip file, Steam doesn't know they are separate files! Since the zip file stores timestamps for all the files, of course that data changes every build (since our build process generates new files every time).

At first I was thinking that this wouldn't be the real problem, either, because the zip file format has a main directory that is stored compactly, and that is where timestamps would be, because you want to be able to access that data quickly. So I would have guessed that, if all the timestamps changed, the worst that would happen is that the directory would change every build, but that is at most a couple of megabytes. But looking at the zip file specification, I discovered how wrong I was. Inexplicably, timestamps for each file in a zip archive are stored both in the central directory and in the local header for each file, so if you change timestamps for all your files, you are strewing changes everywhere.

So I made the packaging script set all timestamps to 0, but the zip library we are using didn't like that, so I set the timestamp to January 1, 2012 00:00:00 GMT.
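The effect is easy to reproduce on a toy archive. The sketch below uses Python's standard `zipfile` module rather than the Perl library our script uses, and the file names and contents are made-up stand-ins. It builds the same uncompressed archive twice with a fixed timestamp, then once more with a timestamp one day later, and lists the byte offsets that differ.

```python
import io
import zipfile

FIXED_TIME = (2012, 1, 1, 0, 0, 0)  # the same instant the packaging script uses


def build_zip(files, date_time):
    """Write files (name -> bytes) uncompressed, in sorted order, with one timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=date_time)
            info.compress_type = zipfile.ZIP_STORED
            zf.writestr(info, files[name])
    return buf.getvalue()


files = {"a.mesh": b"M" * 100, "b.mesh": b"N" * 100}  # hypothetical stand-in data

same_a = build_zip(files, FIXED_TIME)
same_b = build_zip(files, FIXED_TIME)
later = build_zip(files, (2012, 1, 2, 0, 0, 0))  # simulates a rebuild a day later

# Offsets where the fixed-time archive and the "rebuilt" archive disagree.
diff_offsets = [i for i, (x, y) in enumerate(zip(same_a, later)) if x != y]
print(same_a == same_b)
print(diff_offsets)
```

With a clobbered timestamp the two builds are byte-identical. With a changed timestamp and identical contents, the differences show up once in each file's local header and once in each central directory entry, so they are spread through the whole archive instead of being confined to the directory at the end.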

After fixing this, as a test I built two patches in a row from scratch, with the game recompiling all shaders and re-compressing all textures each time. In theory the results should be exactly the same (certainly if this still generated a huge patch, we would still have some kind of big problem). Happily, we have found at least some bit of success:

I am not sure what it is saying there about 2046 of 2262 matching, or 60 changed chunks, but saying it uploaded 0 chunks. In any case it's better than before. I am going to ask the Steam people what these various numbers actually mean.

Feeling good about this, I ran a final test where I moved around a few objects in the world and changed a constant in one shader:

That's once again not so good (140 MB!). I had been sweeping the 18MB from the previous run under the rug, thinking "hey, it says 0 chunks sent, so maybe that 18MB is protocol overhead involved in figuring out which chunks changed, and it doesn't impact patch size." Apparently not. The next step involves asking the Steam folks for advice, hunting around for a binary diff tool with which to analyze our own packages to see if we are doing something unsavory, etc.
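One way to start that analysis, short of a full binary diff tool, is to compare two builds member by member instead of as one opaque blob. The following Python sketch does that; it is not the tool we ended up using, and `differing_members`, `make_zip`, and the sample data are hypothetical names and values chosen for the example.

```python
import io
import zipfile


def differing_members(old_zip, new_zip):
    """Map member name -> count of differing bytes, or a note if added/removed/resized."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(old_zip)) as old, \
            zipfile.ZipFile(io.BytesIO(new_zip)) as new:
        old_names, new_names = set(old.namelist()), set(new.namelist())
        for name in sorted(old_names | new_names):
            if name not in old_names:
                report[name] = "added"
            elif name not in new_names:
                report[name] = "removed"
            else:
                a, b = old.read(name), new.read(name)
                if len(a) != len(b):
                    report[name] = "size changed"
                elif a != b:
                    report[name] = sum(x != y for x, y in zip(a, b))
    return report


def make_zip(files):
    """Build a small uncompressed archive with a fixed timestamp (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in sorted(files):
            zf.writestr(zipfile.ZipInfo(name, date_time=(2012, 1, 1, 0, 0, 0)), files[name])
    return buf.getvalue()


mesh = bytes(1000)
patched = bytearray(mesh)
patched[100:132] = b"\x01" * 32  # pretend 32 bytes of the mesh changed between builds

build_1 = make_zip({"a.lightmap": b"L" * 500, "b.mesh": mesh})
build_2 = make_zip({"a.lightmap": b"L" * 500, "b.mesh": bytes(patched)})
report = differing_members(build_1, build_2)
print(report)
```

A report like this points at which files are changing unexpectedly and by how much, which narrows down where to look in the exporter.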

Here's the current Perl code for the part of our script that builds the zip file:

use File::Basename;  # Provides fileparse().

sub modify_name {
    my $name = $_[0];

    my ($file, $dir, $ext) = fileparse(lc($name), qr/\.[^.]*/);
    my $orig_ext = $ext;

    if ($ext =~ m/^\.shader/) { 
        $ext = '.shader';
    }

    # Throw away 'dir' since we discard it when writing the file, anyway.
    return "$ext $file$orig_ext";
}

sub make_uncompressed_zip {  # Sort zip members alphabetically so that binary diffing will have an easier time.
    my ($source, $prefix, $dest_zip_file) = @_;

    my @unsorted = glob("$source/*.*");
    my @sorted = sort { 
        modify_name($a) cmp modify_name($b)
    } @unsorted;

    # scalar(@sorted) is the element count; $#sorted is the last index (count - 1).
    print("Sorted " . scalar(@sorted) . " files.\n");

    # :CONSTANTS exports COMPRESSION_STORED; :ERROR_CODES exports AZ_OK.
    use Archive::Zip qw(:ERROR_CODES :CONSTANTS);
    my $zip = Archive::Zip->new();

    
    # Zip API complains if time is before 1980, so use an arbitrary timestamp in that range (yuck).
    my $timestamp_2012_jan1 = 1325376000;
    foreach my $fullname (@sorted) {
        my ($file, $dir, $ext) = fileparse($fullname, qr/\.[^.]*/);

        # print("Adding '$fullname' as '$prefix$file$ext'\n");
        my $member = $zip->addFile("$fullname", "$prefix$file$ext");
        $member->setLastModFileDateTimeFromUnix($timestamp_2012_jan1);  # Clobber timestamp. 
    }

    # Store every member uncompressed; the zip is used only as a container.

    my @members = $zip->members();
    foreach my $member (@members) {
        $member->desiredCompressionMethod(COMPRESSION_STORED);
    }

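    # writeToFileNamed returns a status code; fail loudly instead of shipping a bad package.
    unless ($zip->writeToFileNamed($dest_zip_file) == AZ_OK) {
        die "Failed to write '$dest_zip_file'\n";
    }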
}

71 Comments:

  1. Thanks for sharing all of this Jon. Really interesting stuff I probably would not have thought about otherwise.

  2. Just thought I would drop by and say I love these posts you make. They are so interesting and you write so clearly! Really looking forward to playing The Witness. Another masterpiece, no doubt.

  3. Hey Jon, I’ve been looking forward to The Witness ever since I first heard you were making a new game. I really appreciate these posts as they always teach me something and they keep me interested in the game :D
    My question is, what are you going to do in regard to YouTube videos and the like? Are you going to allow people like me to make Let’s Play series of the game for commercial use? I understand that copyright is always a dodgy business, but surely there is a way that you can support the Let’s Play and walkthrough industry, especially considering how much free marketing it gets you.

    • I am not a money-motivated person and I am not very sympathetic to people in money-motivated situations.

      I mean, in principle I don’t mind people doing whatever they want, but when people email me with requests like this it is just kind of gross.

      • I think you have serious misconceptions about how some people use YouTube and other sites to make money. While many people of course have making loads of money as their only goal when entering any potential money-making venture, for many (including myself) there are many other reasons to be making YouTube videos and making money off of those. I for instance plan on continuing to work in my petrol station job until such a time as I can be self-employed off of YouTube/other entertainment avenues via gaming. At that point I am going to begin setting up a framework for myself and others to more easily produce gaming videos on video-sharing sites, and make it very easy to donate some of that money to charity. My only personal monetary goal in doing this is to be self-employed from making my videos, after that all of my money and the other people I help will be going to charities.

        • All I am saying is that the tone of the emails I get often makes me queasy.

          It’s not a personal judgement (I know very little about people who randomly email!)

        • Why not just get a real job? Or even better, why not learn a skill set that is actually applicable in the real world to be a productive human being, like say… Programming or learning a science? Why not gain a skill that is usable in life to live?

          Capcha: Ate mooney

        • Please forgive me, your intentions seem genuine, but there are better ways to make money than off the back of YT videos. Asking permission for such a thing under the premise of it helping to promote a game sounds a bit desperate, let people post and share freely is what will help promote a game, not a money driven attempt with promises of donating to charity. It’s yet another layer of obfuscation a developer doesn’t want to have to deal with.

          Jonathan, I watched Indie Game twice on the weekend, your story is inspiring, whilst I’m an app developer, not a game developer, it was interesting to see how much you stuck to your prototype. It’s like you had a clear vision from the outset. I have trouble with indecisiveness, especially when it comes to the design of an application, I have in my mind how I want something to look and work, but sticking to those principles I find hard and often I meander down the path of many fruitless projects due to my indecision. How do you go about managing your ideas, do you draw out each level as thoughts come to you, I have literally hundreds of PSD mock-ups for apps that will never come to anything, maybe I hesitate too much, but I find that trying to be productive when you have so many ideas often leads to failure.

          It’s like Braid and Fez, both stuck to their original game mechanic ideas, which was the key point of interest to the games and something that you can then build a story around. App design is similar in that you have a core idea, a selling point, but this often gets lost into the ether by future ideas. It’s like having the idea for Braid, only to keep building and building on that key idea until it is no longer relevant. Keeping things simple and disregarding irrelevance is hard, my only hope is to gauge my ideas, but that in itself is hard when you are only a one man team.

          • it’s good to know that if i’m nuts for watching the doco back to back, i’m not the only one :p

  4. Hmm, interesting. I never thought about how Steam handles updates before. No wonder a lot of updates I’ve seen for Steam games are in the gigabyte rather than megabyte range.
    Because of this, would you recommend multiple zip archives, or using hand-made compressed file types rather than one big .zip? I know that the sizes of the individual zip headers would stack up eventually to be greater than the savings of compression in the first place if you had too many of them, but what about one zip for maps, one for shaders, one for textures (assuming these are uncompressed in the first place), etc? I would think this would cut down, at least slightly, on the final update size, assuming updates aren’t changing all of the files anyway.

    • I will post recommendations when I get everything figured out. I don’t think having multiple zip archives would really solve anything. I also don’t think zip header size would matter tremendously — it would be a relatively tiny bit of overhead.

  5. Super pedantic perl bug: $#sorted is the index of the last entry in @sorted. Thus it is one less than the number of entries.

    • Oops!

      • So I guess we never escape the off-by-one errors, at least not without foreach or some such to handle that nonsense for us.

        • It’s just that I never program in Perl so I don’t remember what $# really means. If you are using a language that you almost never use, especially one as crazy as Perl, off-by-one errors are one of your milder problems.

          • (e.g. my frequency of off-by-one errors in C++ is pretty damn asymptotically close to 0%).

          • Ah, well I thought that the rarely used nature of a Perl script was probably the culprit. It is nice to know that there may be hope for me someday to not make those kinds of mistakes in my common coding.

            The only reason that I found one strange bug was because I upgraded my compiler and it now warns about assignment inside of an if statement. I have been considering turning on all of GCC’s compiler warnings, as that might be conducive to a safer coding style. I am very worried about falling prey to the “it compiles, so it’s fine,” style of programming. John Carmack’s recent focus on static analysis has piqued my interest, as that may be one of the best ways to encourage safer coding.

            Either way, my coding style has certainly improved over the course of the project I’m currently working on. I have to put aside my desire to refactor things for the sake of completion though. Maybe on the next game, I say.

  6. Huh. The way that you describe Steam doing patches is totally brain-dead and it would be pretty trivial to make patches only send the bytes that have changed (not one megabyte chunks). Then developers wouldn’t have to go through crazy rigmarole like this.

    There are already lots of archivers that provide good incremental patch facilities, so I have no idea why they have rolled their own terrible version of this.

    • Hey man, I hear Valve is hiring…

      • And there is a plus side to this, which is that we will be looking into why the contents of our mesh and shader files are changing in ways we don’t expect. Maybe we will find uninitialized memory bugs or something like that.

        But yeah, it is better if you are doing that on your own schedule as you want to, rather than needing to do it to keep patch size down. It looks like whatever is causing our mesh files to differ is changing just 32 bytes at a time (4 8-byte chunks) … but often the meshes are hundreds of K or a couple of megabytes…

  7. Loving the regular updates these last couple of months! Much appreciated. Even though a lot of it goes over my head

  8. Wow, interesting. I never really thought about how Steam updated and installed patches. And even though it doesn’t have faults like this, you have to admit it’s pretty clever. I bet that a lot of updates that would be massive otherwise (e.g. if you were to download a patch installer) are squished pretty well.

  9. Not so relevant to this post, but maybe The Witness and Braid in general.
    Have you seen this? http://penny-arcade.com/patv/episode/mechanics-as-metaphor-part-1

  10. This sounds a lot like the problem the Linux packaging guys have solved to some degree with delta RPMs realised in .

  11. Wow. I would not have thought about this. Specifically, I would have dismissed it because from one compression to another, the Huffman tree might change, causing all subsequent bytes to change. Is that only a theoretical problem, or maybe even not a problem at all?

    Definitely something I’ll try out, though. Thanks for the tip!

    • We don’t compress the contents of the zip file; it is just used as a storage format.

      • Since you aren’t using compression and you know that Steam uses a 1 megabyte chunk you could try forcing certain groups of files to start exactly on the chunk boundary. The zip file format looks like it would be fairly easy to calculate the size of dummy files required.

        Assuming you keep sorting by file extension, you could force alignment at the start of each extension. And for larger file types (eg: textures), even at the start of each letter of the alphabet.

        This would increase your overall package size a little, but it would also give a little breathing room for files to expand before they cause a shift in subsequent chunks.

        • That still strikes me as an insane workaround on the dev-side. Given the frequency of patches, it wouldn’t surprise me if Valve is working on improving this procedure.

  12. I don’t know anything about Steam’s compression format but I’d be willing to guess it uses some form of RLE first. The naive thing would be to align every file to megabyte chunks and waste all the space, letting Steam’s compression take care of the zero runs. That’d minimize patch size, but you’d end up wasting tons of space on people’s computers. A step up from that could be to lay everything out like an OS kernel allocating memory, with different chunks for different sizes of data. You’d then pad each chunk with zeroes as before, but you’d waste much less space. Then you’d at least benefit from changes to files only changing (at most) the entire chunk for data of that size (e.g. changing a 1KB file may change an entire 1MB “small” chunk, but it won’t affect a huge 100MB chunk sitting next to it).

  13. Completely off-topic, but I have a gripe with the layout of this site’s comments section.

    I don’t understand why answers to comments are being put into their own frame, thus separating them from the parent comment. My brain is having a hard time associating the right comments with each other.

    Why not put every new comment into its own box and putting answers to it into that same box? And maybe let child comments only be one layer deep for simplicity’s sake?

    Just a thought. Great informative post as always.

  14. Why don’t you just use quake style pak scheme, where patch just adds another *.pak file. It would be much more Steam friendly, flexible and faster to implement than file sorting or similar hacks.

  15. Hey Jonathan – kind of off-topic – just watched an interesting talk of yours to some college students where you mentioned you wrote a lot of compilers when you were younger. Was wondering if that knowledge has been useful in your game codebases, for special domain-specific languages, custom tailored scripting syntaxes etc., and if you had any thoughts on that as a game tool in general.

    Had some ideas for a scripting language in a game I’m working on, but not sure the investment of time up-front would pay off over just keeping things in the primary language.

    • One of the biggest things I learned in making all those languages is that using custom languages is a pain in the ass and is usually not worth it. Ideas behind extension languages are usually neat and cool, but the overhead of using the thing usually far outweighs any benefit (for competent, productive programmers).

      There are some definite use-cases for this kind of language (e.g. one wants designers to be able to script things in a game without crashing the game or infinite-looping, etc), but outside that kind of domain, I have yet to see the value. (Well, there is just the value of being able to code changes in a running system, but that is sort of independent of language choice). Since I am the designer on The Witness, I just type all the “script” code in C++. It’s totally fine.

      • Thanks for the insight – sounds like it’s probably more trouble than it’s worth in my situation. Faster iterations by being able to modify a script at runtime was definitely something I had my eye on – but most modern IDEs can do that in some form now with the compiled languages, so not an overly compelling reason on its own.

        • Actually the “Edit and Continue” stuff in Visual C++ just doesn’t work right. I think it works for trivially small programs, but for reasonably-sized projects it is just broken so nobody tries to use it.

          What we do is just keep our compile and startup times low. The game takes a few seconds to do an incremental compile and maybe 5 seconds to start up.

  16. Nice piece on CBS. It was cool to see your studio. I’m glad the game is generating interest outside the gamer world.

  17. Surprisingly well researched for a mainstream news story. Although I still cringe when every game related story has to start with how much money the industry earns as a whole. Nobody starts stories about developed mediums this way. We are still a cultural niche until we break the mould of what games are and can be. We can’t just accept cheap junk food experiences, we need real nourishment.

    Thanks for the link.

  18. Just for a better understanding of what happens here:

    If you were to only insert a single byte in the middle of the file, and then upload it to Steam/let Steam patch it: How much data would be transferred?

    Every chunk after the inserted byte? Or just 1 or 2 chunks like it would happen with the rsync algorithm, which uses rolling crc-checksums to find the byte-offset where existing chunks should go with minimal data transfer

  19. A really interesting blog post. Just wanted to speak out and say that I’m looking forward to buying your game on Steam when it’s ready. I really enjoyed the depth to Braid so excited to see what’s next.

  20. Hi Jonathan,
    I’m curious. When you work with C++, what API’s did you like to use for development? I am a personal fan of Allegro 5 for 2d games, but with my inexperience in game development, I don’t know if there is a “better” way to code. I imagine that most of it comes down to your style of coding though.

    • Aside from Direct3D to interface with graphics (or OpenGL for other platforms, which is now in progress), it is pretty much all our own code. We use Bink to play movies.

      As a beginner it makes sense to heavily leverage a lot of APIs to do whatever, but as you try and do more sophisticated things, you often find that said APIs are preventing you from doing what you want, or making it more work than necessary, or making your game behave unreliably or feel bad or are crappy in some other way. When you have control over the code yourself you can fix those things.

      • “We use Bink to play movies.” So you will have video recordings as well as voice ones?

        • It depends. We may, may not; we’ll see how it goes.

          • That would be so amazing! I remember a picture in a magazine where there was a lone chair in front of a big screen in a room. What would the videos be about? It brings so many questions! It just sounds so cool!

            I could see the player just sitting down and looking at the big black screen and then one of Jonathan Blow’s speeches is shown, That would be awesome!

  21. Daniel Ribeiro Macie

    Why Perl? (I loved what you said about XML, and I believe Perl is in the exact same category as XML.)

    • Perl is kind of an icky language and I would never use it for serious programming, but it is crazy and fun and it works differently enough from other languages that it is a little mind-expanding to use.

      XML, on the other hand, is like something invented by an IP lawyer who has no tastebuds and no sense of smell.

  22. Shout out to my boy justin, for keepin’ it creepy and weird on every single one of these blog posts.

    • I don’t know if Jonathan has any desire to address you publicly, but I’ve been following this blog for as long as you’ve been posting on it and I can tell you Justin that your behavior has become increasingly erratic and rude.

      Talking down on people and responding in a negative or judgmental way is not necessary on a public forum – or anywhere else in life. Some of the people you’ve responded to are surely industry professionals who would otherwise never tolerate being addressed in such a manner.

      I know for certain that if I replied to a client email, comment or request in a manner similar to how you’ve spoken to people on Jonathan’s blog that I’d either get fired or, more importantly, ruin my relationship with the client.

      In any industry, your most valuable asset isn’t the product you produce or service you offer, but the people who value what you do. After all, they pay your bills, land you contracts and offer you a wide network of support when you need it.

      That has almost nothing to do with you personally, but it affects Jonathan’s blog in that it makes it difficult for it to be a place where moderate, professionally-minded people can express their opinion without having to feel somewhat embarrassed or confused as to the responses they receive.

      And so please, out of respect for Jonathan’s blog, take a few moments, or perhaps even a day, to review your commentary before posting it. There’s no rush. Take your time.

    • @Justin: Wow! No omg! This is not me, justin! I don’t know who this is but its not funny, why is there so much heat on me here.

      @ Mr. Verdon: huh… err.. ok, not being creepy, just trying to learn more about the game, yes i comment on every post but not to be weird!!

      @Pritchard: Hmm… Yes I understand. I never saw me commenting as having a big influence on the game and the buyers of it it self. Never thought that addressing people here and their questions would hurt any body. I was trying to just answer questions…

      • Hi justin,

        I apologize if I mistakenly associated you with the “justin” above. Since this blog permits anonymous posts (on the user-end, at least), I had assumed you were the one who had replied to Mr. Verdon and may have spoken out of turn.

        If the post above isn’t you, shame on them for making that post. I don’t think anyone is fine with the idea of someone unjustly tarnishing their public reputation.

        Since your other posts on this blog are fairly moderate – although a few remarks could be misinterpreted as being snarky – I encourage you to continue posting and taking interest in Jonathan’s game. I only hope that in the future, no one decides to act so foolishly while using another user’s handle.

        – Pritchard

        • I know where Justin (the real Justin) is coming from. It’s easy to become so enthusiastic about a game that it appears to border on obsession. This isn’t a bad thing, per se, but it can be interpreted wrongly, especially when the posts can veer into the off-topic, skimming on the surface and not deep into the substance of the stuff. I’ve done it many times in my career on the internet.

          Justin is very prolific on this blog, as anyone who follows it knows. My suggestion to you, Justin, is to approach this forum with a lighter touch. I myself check this blog every day and its comments’ section, but I never find myself having anything to post; this isn’t my area of expertise. Consider all the posts above and below yours: it’s mostly techie stuff. So you stand out. That’s not a bad thing, but it’s a noticeable contrast, and that can be jarring.

          So if this ends up bothering you, just take a step back, see what’s going on. Where are you needed? Where do you want to express yourself? There are plenty of opportunities, right here. I guess it’s a matter of extent. :-)

  23. I check this blog every day for posts and comments. I have been following this blog for more than two years already (since Jonathan linked from the Braid blog which I was following first)

    I am very excited for this game I have never anticipated anything so much in my entire life and I like Jonathan Blow a whole lot and he has influenced things I do and care about that have nothing to do with video games.

    I post on this blog to start conversations about the game and what it could be and might be about (before we knew what it was or that it was 3D, we thought earlier it would be a point and click adventure) No body really cared to talk about the game so them I was commenting to voice opinions about what I thought would be interesting ideas in games that would be original and haven’t seen. Basically I commented for trying to communicate with Jonathan and other people here (I guessed if you are here: 1. You liked Braid, 2. You like Jonh, 3. You are smart, 4. you like good games)

    If you look at my comments since the very first post I was just curious and discussing what this could become and later sharing links to Jonathan’s talks, interview, The Witness previews, Youtube videos, Braid talk and Jonathan’s name being made fun of. I even once saw a Preview of The Witness in The Official PlayStation Magazine and I immediately bought it, ripped out the pages, went to my friend’s house, scanned it and posted it here in the comment section (and it was only a one-page mention). Then I did the same with the GameInformer Mag preview (Which were 8 pages) I scanned them and posted them here, not to “spam” or whatever but to help people here look and give more information about it. I never meant to hurt the game, John or anyone else here. From the beginning I just wanted to help.

    I even post these blogs post one news hubs to let people know this exist and tell them about Braid, about John and the Witness. I go on the news hubs that I have an account as “contributor” and post the interviews, talks, Magazine scans, lectures, previews and just about everything I can find about Jonathan and his games. This is a person I care deeply about and I want more people to care and stop talking bad about him and stop calling him names and saying he is “pretentious” or “has a huge ego” because is not true! (actually in a certain game hub channel someone must *really* like Jonathan because every entry passes trough reviews and needs votes. 10 votes to be approved and get passed to the main page, but when I post there about The Witness they get 3 or 5 votes and the submission is immediately accepted! must be a moderator?)

    I share these posts wherever I can! I actually rented a server for Battlefield 3 for PS3 and you know how you can put your clan picture and rules? I put the picture of The Witness “Spawn point door” and in the rules I put “The Witness by Thekla Inc. Coming to PC, iOS in late 2012” I made it a Conquest Domination map on the Hardcore mode rotations, because that’s the only mode really good players play in! And yes I know Jonathan hates advertising and all that stuff but I think this is totally OK, because it is us, the fans, doing it and not him, and in my case I’m not being paid or doing anything for money but this is just something that I like and care about a whole lot and this is why I want to share it because I think it’s really worthwhile and important and good. So me “Advertising” is ok because there is no money involved and it’s just me, the fan, just some kid talking about video games and a man he respects a lot.

    Something else I wanted to explain: I know people here ask a lot of questions in the comments and these are directed to Jonathan, and I have answered a lot of those and later said something like “I know you didn’t ask *me* but I answered so sue me : P ”. I do this often because I know John has a reputation for not answering questions (he doesn’t want to or is busy making a game), so since I pretty much know everything about Jonathan and the games I just answer the questions, but to help, not to be rude or whatever! I know when the game will be released, so when someone is new here and asks “When does TW come out? What platforms?” I just answer TO HELP: “It comes out ‘whenever it’s done (sometime 2013)’, right now for PC (Steam) and iOS (iPad)”, because I know!

    And yes, I’m not a technical person, so on tech-heavy posts I have asked “wtf is that supposed to mean” and people answer and explain! Because I’m 18, in college and working on a master’s in Computer Science because I want to be a programmer like Jonathan too! So that’s why I comment on tech posts: to better understand, and reading the posts here about development has helped me irl!

    I’m not here to start internet wars or spam links or whatever. I’m just here to help out, and if you say that I should stay out of the comment section, read the posts only and leave, maybe you guys are the ones that shouldn’t scroll down to this section, if you are too sensitive or have too weak a heart for internet arguments. I’m the guy that’s here to help…

  24. Let’s go with ‘infamous’ ;)

  25. oh lol. completely missed that :P

    p.s. anyone else feel like they must be a spam-bot because they can’t decipher the reCaptcha? or just me?

    • Yeah, just got confused with the “Most Dangerous Gamer” article, which is only slightly better in titling, but much better in terms of content.

      Yeah, have you ever clicked the little question mark and read about reCAPTCHA in depth? I think it’s like one of the coolest things in the world. Kills two birds with one stone. It’s sorta marvelous from a standpoint of engineering. I always love solutions to unsolved computing problems that put humans in the loop, even better if most of those humans don’t even know they are doing work.

      • my god that is awesome :O

        • but now i am confused…. i don’t get how you can solve both problems at once =\ if you don’t know what the word is, how do you test if people are getting it right in the first place? I get that it’s easy enough to use everyone’s guesses to choose the most likely word, but that does mean either letting everyone through to start with regardless what they typed, or having someone type out all the words to start with anyway.

          Also, have you ever tried to use the audio challenge? surely no one has ever been able to do anything with that

  26. Definitely very valuable information. Some thoughts:

    As already asked above, what happens if you insert or remove something in between? How much of the file gets updated? Hopefully only a few chunks to ‘fix’ those areas, otherwise it would be impossible to make a patch that doesn’t just replace everything.

    You mentioned that you don’t use the zip compression, only the zip format for archiving. Honestly, that is the worst decision you could have made. The zip format is horribly inefficient. Just to locate a file you have to jump back and forth several times. So as an archive it’s crap. If you then don’t even use its compression, that’s just a total waste.

    My advice: Use zlib for the compression/decompression algorithms, but implement your own archive format. It’s easily done in 2-3 days (including tools), you have full control over the file layout, you can optimize the header and the overhead and actually using the compression helps a lot. With typical content you can at least halve your package size. Compress every file individually (just as zip does). That should also tremendously reduce your patch size.

    And of course, please run a few more tests of how Steam handles such changes and post the results ;-) This is very helpful for people that don’t have access to the Steam tools.

    • You actually don’t want to compress your data with e.g. entropy coding because this tends to defeat patching. Stuff is compressed *after* diffs are done, by Steam, so we don’t worry about that.

      For distribution on other services that don’t compress, we would do something else, but it would probably just be to wrap the whole game in an installer package that decompresses at install time, so nothing about the game’s actual data changes.

      As for zip files, I don’t remember exactly how the format is set up, but I am pretty sure it does not impact our performance profile, so it doesn’t matter. Plus we get the benefit that third-party tools like Beyond Compare understand the format (I have already used this fact; was pleasantly surprised!)
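
The two questions in this thread (what an insertion does to the chunks, and why compressing before the diff tends to hurt) can be explored with a toy model. The sketch below is only an illustration under stated assumptions: it compares chunks at fixed offsets, which may not be how Steam actually matches data; `CHUNK` is shrunk from 1 MB to 1 KB; and the word list, `changed_chunks` and the variable names are made up for the example.

```python
import random
import zlib

CHUNK = 1024  # small stand-in for the 1 MB chunks described in the post


def changed_chunks(old: bytes, new: bytes, chunk: int = CHUNK) -> int:
    """Count fixed-offset chunks whose bytes differ between old and new."""
    longest = max(len(old), len(new))
    return sum(
        old[off:off + chunk] != new[off:off + chunk]
        for off in range(0, longest, chunk)
    )


# Hypothetical package contents: 64 chunks of compressible, text-like data.
rng = random.Random(0)
words = [b"mesh", b"lightmap", b"texture", b"sound", b"entity", b"shader"]
original = b" ".join(rng.choice(words) for _ in range(20000))[:64 * CHUNK]

# Case 1: overwrite a single byte in place.
buf = bytearray(original)
buf[10] ^= 0xFF
edited = bytes(buf)

# Case 2: insert a single byte at the front, shifting everything after it.
shifted = b"\x00" + original

in_place = changed_chunks(original, edited)
after_insert = changed_chunks(original, shifted)

# Case 3: the same one-byte edit, but diffing the zlib-compressed streams.
packed_old = zlib.compress(original)
packed_new = zlib.compress(edited)
packed_changed = changed_chunks(packed_old, packed_new)
packed_total = -(-len(packed_new) // CHUNK)  # ceiling division

print("in-place edit, raw chunks changed:", in_place)
print("one-byte insert, raw chunks changed:", after_insert)
print("in-place edit, compressed chunks changed:",
      packed_changed, "of", packed_total)
```

In this model an in-place edit touches one raw chunk, while a one-byte insertion misaligns every chunk after it. How far the damage spreads in the compressed stream depends on the data and the compressor, so the last figure is something to measure rather than assume; at minimum the stream checksum at the end changes too.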
