A note about programming language design

This is not about The Witness, but it may be relevant to future games that we do! In my spare time I am building a new programming language. Today I made a bunch of tweets about something I was thinking about there... but the ideas are hard to follow on Twitter, so I am crossposting them here. I started out by lamenting that so many people thought that a language without member functions was unthinkable, then continued:


 

Even the idea of UFCS is just a massive overcomplication. It only seems simple compared to baseline C++.

I should say that the biggest problem I find with the idea of member functions is that they are extremely anti-code-reuse. I know this may seem paradoxical, because the whole point is to encapsulate code in an easy-to-reuse way, but I'll give an example.

Let's say you have some basic data type like a 3D Vector called Vector3. You want everyone to be motivated to use Vector3 so that there's no friction when passing data between the main program and various libraries, etc.

All that is necessary for compatibility in a fast program is that everyone agrees on the data layout of Vector3. But as soon as you tie member functions to the data, you are insisting that everyone needs to think the same way about the data that is in a Vector3, and operate on it the same way. And that will never be true, because different code needs to think about data types in different ways; and also because programmers express their personality in programs, and different programmers want to think in different ways about the most common data types.
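As a rough illustration of "agreeing on the data layout", here is a minimal C sketch. The names `PhysVec3`, `phys_sum`, and `to_phys` are hypothetical, invented for this example; the only thing it tries to show is that two pieces of code can declare the data separately and still interoperate, as long as the layouts are checked to match.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

/* Hypothetical: the layout the main program uses. */
typedef struct { float x, y, z; } Vector3;

/* Hypothetical: a library's own declaration of the same three floats. */
typedef struct { float x, y, z; } PhysVec3;

/* Compile-time checks that the two declarations agree on layout. */
static_assert(sizeof(Vector3) == sizeof(PhysVec3), "size mismatch");
static_assert(offsetof(Vector3, y) == offsetof(PhysVec3, y), "y offset mismatch");
static_assert(offsetof(Vector3, z) == offsetof(PhysVec3, z), "z offset mismatch");

/* The library only ever sees its own type and its own functions. */
static float phys_sum(const PhysVec3 *v)
{
    return v->x + v->y + v->z;
}

/* Crossing the boundary is a plain byte copy because the layouts match. */
static PhysVec3 to_phys(Vector3 v)
{
    PhysVec3 out;
    memcpy(&out, &v, sizeof out);
    return out;
}
```

Nothing here requires the two sides to share any functions; the layout checks are the whole contract.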

So if you force people to think a certain way about Vector3 or other basic data types, you are motivating them (or *requiring* them) to go against that grain and make their own alternatives. Then the system is fragmented and you have all these data types that need to be converted again. Or, if people don't do that, they just end up passive-aggressively hating the task of programming, which is not what you want.

So in reality, you want Vector3 to define only a common piece of data to be exchanged. You can provide default functions for operating on Vector3, but in a way that people can easily shadow them with their own functions or else use a completely disjoint set. That is just not what most "object models" encourage. Thus they put huge amounts of friction on reuse in this particular kind of case, which is very common and very important.
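To make that concrete, here is a small sketch in C, which has no member functions to begin with. All function names (`vec3_add`, `vec3_dot`, `gameplay_length_squared`, `phys_add_scaled`) are hypothetical and chosen for illustration. C has no overloading or shadowing, so the sketch uses distinct prefixes where a language designed for this could let callers reuse the same names.

```c
#include <assert.h>
#include <stddef.h>

/* Shared definition: data only, no behavior attached. */
typedef struct { float x, y, z; } Vector3;

/* "Default" helpers that might ship alongside the type (hypothetical). */
static Vector3 vec3_add(Vector3 a, Vector3 b)
{
    Vector3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

static float vec3_dot(Vector3 a, Vector3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* One module only ever compares distances, so it thinks in squared
   lengths and defines that view of the data itself. */
static float gameplay_length_squared(Vector3 v)
{
    return vec3_dot(v, v);
}

/* Another module ignores the defaults entirely and prefers updating
   a vector in place: v += d * t. */
static void phys_add_scaled(Vector3 *v, Vector3 d, float t)
{
    assert(v != NULL);   /* this module's own policy, not the type's */
    v->x += d.x * t;
    v->y += d.y * t;
    v->z += d.z * t;
}
```

Both modules pass the same `Vector3` around without conversion, yet neither is forced to adopt the other's set of operations.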

(I wish people would stop telling me that member functions are good because you can do obj . and get a list of procs in the IDE after typing the dot. You can obviously do this with flat functions too. Please stop telling me this kthx.)


 

Because this came out as a series of tweets, it is probably less coherent than what I would have written if it were a straight-up essay, but hey, this is how it went.

29 Comments:

  1. i absolutely agree with your sentiments about methods, in my language though i used the familiar method-like syntax (kind of an expansion of golang’s receiver argument) for a totally different purpose, and in the process saved some syntax elsewhere (const refs and ptrs). you can define a func like this:

    func (i32 by_ref_arg1, i32 by_ref_arg2)my_func(i32 by_val_arg1, i32 by_val_arg2) {…}

    so at the call site it’s clear which args could be modified by the function, and which are read-only:

    (a, b).my_func(2, 4)

  2. This seems obvious. I haven’t read any of Bjarne Stroustrup’s work, unfortunately, so I can’t really discuss it. The modern object model has become a de facto standard in business applications for defining these relationships. Concepts have been slightly modified or extended to support defining contracts and APIs between services.

    I believe that your APIs do matter, and that the way things are expressed does matter. I also don’t buy into the idea that I shouldn’t need to know how anything works under the hood, because more often than not, that’s where the actually difficult problems lie.

    I think OOP and OOD have largely been a failure, though they rigidly and inflexibly provide certain relationships or expressive mechanisms which some people find useful. I haven’t followed your language enough to comment much further, and my language design experience is next to none, but I’m not opposed to closures and some alternative means to the OO mess for establishing contracts between data and processing algorithms.

  3. Are you actually arguing against data abstraction in general? That surprises me. Maybe I’m misunderstanding.

    A Vector3 is a very simple case. It’s unlikely to ever consist of anything but 3 numbers, probably indexed as 0, 1, and 2. And any three numbers make a legal vector, and callers probably want to access the individual numbers, so you might as well let them (though you might want to make them read-only). But for anything much more complicated, if you just let all callers access the raw data without imposing some rules, you’re asking for trouble. (Sorry to be so vague; don’t have time to write an essay on the reasons for data abstraction.)

    Not that I think you should always impose strict abstraction on every data object. There are different considerations for every project.

  4. If I get the sentiment right, your problem is that code reuse is hampered by the rigid data structures (really, data layout in memory) functions and methods are designed around.
    The solution to this problem in C++, Rust etc. is generic functions. Although ideally, they should be combined with “concepts” to allow the compiler to check proper usage. What You’re suggesting – disconnecting algorithms from data structures – is already possible, if at times unwieldy. Is this the problem You’re after?

  5. I think what you’re getting at is the “expression problem” – see https://en.wikipedia.org/wiki/Expression_problem . In functional languages, you often fix the data type and allow defining new operations against the data, while in OO languages you often fix the operations, and allow defining new data types that can implement the operation. Allowing extension in both operations and data at once is difficult…

  6. Haskell has an interesting take on this with typeclasses, Clojure with multi-methods, and Objective-C, less interestingly, with Categories.

    The idea is to allow types to be extensible, and to use functions on them by properly controlling a static namespace. Therefore they feel like functions, but they are polymorphic on the object’s type (or, with multi-methods, on the type of all or any of the arguments).

    Worthy mention: Prolog replaces this whole notion with unification, which is a more general and powerful concept than pattern matching / ad-hoc polymorphism (it is also a lot harder to reason about and implement).

  7. It is such a pain when programming comes to be about conforming to rigid, massive APIs, and worst of all “Frameworks” (at this point it doesn’t feel like programming at all).

    Recently I figured out that OOP is kinda BS, and I’m unlearning it. Golang has been great for me on that matter (I’m using it in both work and in a toy project).

    Recently I was assigned a task that, by its nature, was well suited to Python (the “company’s native language”). At some point an “Object pattern” started to appear heavily, and I said: “OK, time to be mature and reasonable, take advantage of some real OOP, and put aside my recent feelings about it”. For the first couple of days it was all great, and the more functionality I stuffed into my “main object” the easier things became, but it didn’t take long before it started to smell.

    This should be really obvious but: this kind of tight coupling starts to fall apart when you have to change things more deeply. You usually have to do that once in a while, unless you’re doing BigDesignUpFront (which is another counter-productive pain in the ass in most contexts).

  8. The have already tweeted the following:

    This is why I like the way Go deals with member functions. It is a good compromise in the sense that you get clean data structures to share and reason about with the benefits of function fluidity, code reuse, in addition to a convenient mechanism for operating on that data without the rigidity of C++ member function. An example can be found here:

    https://golang.org/ref/spec#Method_expressions

    I use strictly C in the embedded space and I get real tired of the following pattern:

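    #define MAX_DATA_SIZE 64  /* placeholder value so the example compiles */

    struct hardware;

    /* Table of operations; each one takes the device explicitly. */
    struct hardware_ops {
        int (*read)(struct hardware *hw);
    };

    struct hardware {
        char data[MAX_DATA_SIZE];
        struct hardware_ops ops;
    };

    static int hardware_read(struct hardware *hw)
    {
        (void)hw;  /* stub: a real driver would read from the device here */
        return 0;
    }

    void hardware_init(struct hardware *hw)
    {
        hw->ops.read = hardware_read;
    }

    int main(void)
    {
        struct hardware hw;
        hardware_init(&hw);

        /* Just to do this: */
        return hw.ops.read(&hw);
    }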

    Lastly, golang got so much right in so many ways but the GC – can’t have pauses in critical medical devices.

  9. Slight typo: I meant “I have” instead of “The have” in the very first sentence.

  10. I pretty much agree with what you are saying here; however, the method call syntax (object.method()) is very convenient, understandable and nice looking. Sometimes it really makes the code look a lot cleaner.

    One way to solve this problem is to make regular functions callable with the dot syntax, so that you could rewrite this:
    function(object, somethingA, somethingB);
    as:
    object.function(somethingA, somethingB)
    Basically, when using the dot syntax, the parameter before the dot goes as the first parameter of the function.

    This semantics would allow for things like:
    x.factorial();
    “%, %”.format(x, y);
    and so on.

  11. I see your point with a Vector3, where the data storage is equivalent to the data structure. But don’t you find OOP useful for structures that don’t well reflect the underlying storage, or that need maintenance? Examples might be storing a binary tree as a flat array (where the indexing can get confusing), or a red-black tree that needs to be rebalanced according to some hidden information.

    • Not really. I think though you are confusing “OOP” with “providing an abstract interface for some data you don’t need to think about the specifics of”. The latter I do all the time, yes.

    • In my opinion “extension methods” are just a further complication of something that should not have been complicated to begin with.

      • Not really:

        > So in reality, you want Vector3 to define only a common piece of data to be exchanged

        Precisely how it’s done in MonoGame PCL.

        > You can provide default functions for operating on Vector3, but in a way that people can easily shadow them with their own functions or else use a completely disjoint set.

        Precisely achieved by extension methods.

        > I wish people would stop telling me that member functions are good because you can do obj . and get a list of procs in the IDE after typing the dot.

        I wish people would stop saying object model is bad just because it can be abused :).

  12. I find prototypal languages (ECMAScript, Lua, etc.) seem to provide the most expressiveness in this scenario. Since the inheritance hierarchy is dependent on objects, and the objects are mutable (including their functions), you are able to extend the objects as needed for the scope needed.

    However, I’m not aware of any prototypal languages that would be suitable for performance critical systems like game engines (not saying they don’t exist).

  13. I like to think that member functions are just syntactic sugar.
    1. any function is static (not member)
    2. you can call any function both bar(foo) and foo.bar(), and the only difference – is where you put your first argument.
    So name “member function”, or method, is just a synonym to “use this object as the first argument”. And declaring function as “member” – is just sugar for declaring function with specific first argument. And “dot” – is just a binary operator that gives obj from the left to the function on the right.

    There are many benefits:
    1. you can easily add any “extension” to any object by just declaring usual function.
    2. your IDE knows what to show after your “obj.”
    3. it is simplification for user, not a complication (as extensions, for ex.)

    My ideal language will behave just in that way.

    (I think that ArrayList would be a better illustration than Vector3)

    • Being able to call member functions with the same syntax as regular functions (and vice versa) is very nice – just so long as you don’t lose sight of the value of having a closed set of member functions.

      Member functions have privileged access to the data inside an object. It is often crucial that object data be accessed in a particular way to ensure correct function (eg: locking mutexes in the correct order). If the designer of a class cannot restrict the access to a known set of functions at design time it becomes impossible to write robust code.

      Still – I do like the idea of being able to call free functions, which are themselves only implemented using the original class interface, as if they are members.

  14. Hi Jon!
    Are you aware of the Jai Primer written by Jorge?
    https://sites.google.com/site/jailanguageprimer/

    Regards,
    Tom.

  15. I find UFCS to be a really good take on programming. It’s “only” a syntactic construction that allows expressing code as clearly as possible in its context. Almost.
    I’m currently working on Onyx language, where I’m pondering (leaning heavily towards) implementing UFCS.
    The pro is, in Onyx, _only_ member funcs (“methods”) of a type can access private state – but you can of course make getters/setters simply with sugar, and they of course compile to direct accesses in machine code – no loss. And _if_ you need to add logic, snap: done, _if_ you need to add locking for concurrency, snap: done.

    The model works around defining a type and the smallest set of methods necessary to work with its state, the rest of logic is likely best off implemented as “free functions” (which then brings us to UFCS), since they can match and cover several types horizontally too (ducktyped – “it has the methods needed? It works.” [Of course sumtypes, virtual (“ancestor”) types, generics and unique typing is fully available also – pick and choose, mix and match.]) (DRY!), and has to go through accessor methods for the data (encapsulation).

    Pretty much win win in my book.

  16. You say: “the idea of UFCS is just a massive overcomplication”

    This seems to me like a bold statement in need of some justification. Like, I’m not advocating for writing games in lisp, the library support isn’t there. But from the perspective of a brand new programmer, how is uniform function call syntax an overcomplication?

    btw: I loved The Witness, thanks for making it :)

    • How is it *not* an overcomplication?

      You have two totally parallel systems and syntax-sets that do the same thing. (“Object method calls” and “procedures taking a struct pointer as their first argument”). Why?

      • Oh, nevermind. I thought UFCS was something else for some reason. I was thinking of the way lisp’s built-in semantics have the same syntax as user-defined semantics. Yeah, I agree UFCS doesn’t solve the problem you are describing here. Python has something like that and it always irked me. Like, if you are locked into a classical OOP mindset and everything is always a method.. maybe it makes sense.. but then why did you want UFCS to begin with.

        I kind of like the way JavaScript does this (I also hate the way JavaScript does this for other reasons). In JavaScript methods are just functions that happen to also be properties of objects. In a weird way it’s almost the opposite of UFCS..
