Concatenative language/Name code not values

In a stack language, your program is a sequence of words and literals -- literals are pushed on the stack when encountered by the evaluator, and words are either primitive or they are subroutines, themselves sequences of words and literals.

Values which are passed between words -- "parameters" -- are not named, and are not referenced directly by the language. Your values can be as rich and complex as you want: Factor supports a rich set of collections (arrays, hashtables, etc.) as well as user-defined classes with named slots. However the values themselves are not named. Just like in a language like Lisp or Java, in Factor we still name the following:

  • Words (functions)
  • Classes
  • Slots (instance variables of classes)
  • Global variables
  • ... and more

However, we don't name parameters to words.

Because your program is just a sequence of words and literals -- and any sequence of words and literals can be "factored out" into a new word, you end up naming operations, not values. And any repetition at all can be factored out and given a name. The importance of this cannot be overstated. In applicative languages, it is sometimes difficult to take a set of statements and expressions, and extract it into a new function. At the very least, it requires identifying the free variables in this expression, and replacing the expression with a function call that passes these variables to a new function. However, if the code snippet assigns to some local variables, you might be unable to factor it out, or you will need to pass the mutated values back in some kind of data structure. All the modern Java IDEs have "extract method" commands which automate this process. In a stack language, "extract method" is just cut and paste.

When programming in a stack language, many patterns become apparent that you otherwise would not have thought of factoring out at all. For example, the plumbing between words -- the data flow -- becomes explicit, and it too, can be factored out.

Here is a common pattern in applicative languages, using a C-like pseudocode:

var x = ...;

You might not even consider this to be much of a pattern, and certainly not something you would bother naming or abstracting out. In Factor, we have a combinator (higher-order function) called bi, which does exactly this: it takes a value and two quotations (anonymous functions), and applies each quotation to that single value. For example, if the value x is at the top of the stack,

[ foo ] [ bar ] bi

applies foo to x, then applies bar to x.

Note that instead of naming the value (x), we name the data flow pattern (bi).

Here is another dataflow pattern:

var x = ...;

In Factor, we call this dup. If the value x is at the top of the stack,

dup foo bar

applies foo to x, then applies bar to x and the result from foo.

Now consider this applicative code, in a language that uses the . infix operator for instance variable access and method invocation:

var customer = ...;
var price = customer.orders[0].price;

In Factor, this type of code looks pretty natural, because you start with the first object on the stack, then you continue to access members (accessors consume one stack value and leave with one stack value) until you're at the end of the path. So if the customer object is at the top of the stack, the above code translates to:

orders>> first price>>

Accessors in Factor are named slot>>; various other stack languages use different conventions, but the idea is the same.

However, let's try taking the above example further. Suppose we want to perform the same operation, except any one of the intermediate objects -- the list of orders, the first order, the price, even the customer object itself -- could be null (in Factor-speak, f, the boolean false and canonical "missing value" marker). Well, we can't have a nice one-liner anymore:

var customer = ...;
var orders = (customer == null ? null : customer.orders);
var order = (orders == null ? null : orders[0]);
var price = (order == null ? null : price);

So we've had to turn a one-liner into 4 lines of code, we've introduced 3 new local variables, each one is referenced twice but probably never again after this code is done, and we have some repetition that we cannot abstract out, the [javascript{(foo == null ? null :}] that appears three times. Handling incomplete data in general and null values specifically is a common issue in mainstream languages, so many workarounds have been developed. Let's consider the three common ones:

  • Sweep the problem under the rug and invoke the "no true scotsman" defence. No real programmer would ever want such a feature, because if you need it, your code is badly designed.
  • Using "null objects" instead of real nulls. The idea here is that you implement things such that customer.orders is never null, but it may be an empty list; and customer.orders[0] is never null, but it may be an empty order object, and the price of an empty order object is null. This is common practice in Java, C# and C++.
  • Adding a new syntax, like the . infix operator, but which returns null if the left hand side is null, instead of raising an error. The Groovy scripting language has this feature (the "Elvis" operator "?").

The problem is that none of these solutions are satisfactory; they either require unnecessary work on the developer's part, or they complicate the language with a one-off syntax designed to solve a very narrow problem.

In Factor, you would write the second example as follows, just by using the when combinator, which is your run-of-the-mill if statement with one branch:

dup [ orders>> ] when dup [ first ] when dup [ price>> ] when

If this came up often enough, you could get rid of the dup [ ... ] when repetition too, by defining a new combinator; let's call it maybe, since it is similar in spirit to Haskell's Maybe monad:

MACRO: maybe ( quots -- ) [ '[ dup _ when ] ] map [ ] join ;

Now, it is just

{ [ orders>> ] [ first ] [ price>> ] } maybe

That's pretty good considering we didn't need to use any kind of "design pattern", we didn't need to add new syntax to the language, and we didn't need to write any unnecessary code. This is just a very basic example where we can see stack languages have the upper hand, but it comes up again and again: with stack languages you can avoid unnecessary work.

This revision created on Sat, 23 Apr 2011 14:30:45 by airolson (Corrected understated to overstated.)