Concatenative language

There are many ways to categorize programming languages, and one way is contrasting concatenative and applicative. In an applicative language, things are evaluated by applying functions to arguments. This includes almost all programming languages in wide use, such as C, Python, ML, Haskell, and Java. In a concatenative programming language, things are evaluated by composing several functions which all operate on a single piece of data, passed from function to function. This piece of data is usually in the form of a stack. Additionally, in concatenative languages, this function composition is indicated by concatenating programs. Examples of concatenative languages include Forth, Joy, PostScript, Cat, and Factor.

Concatenative and stack languages

There are two terms that get thrown around, stack language and concatenative language. Both define similar but not equal classes of languages. For the most part though, they are identical.

Stacks are a pretty fundamental concept in computer science, and many languages use stacks internally in the implementation. Any language that allows recursive definitions uses some type of call stack to save return addresses between function calls, and often the same stack is used to spill values which cannot be allocated in registers. However, this is just implementation detail, and this call stack is not exposed directly to the programmer (except in languages with first-class continuations; I'll touch upon this later).

So what makes stack languages different? The key concept here is that there are multiple stacks: all stack languages have a call stack to support recursion, but they also have a data stack (sometimes called an operand stack) to pass values between functions. The latter is what stack language programmers mean when they talk about "the" stack.

Most languages in widespread use today are applicative languages: the central construct in the language is some form of function call, where a function is applied to a set of parameters, where each parameter is itself the result of a function call, the name of a variable, or a constant. In stack languages, a function call is made by simply writing the name of the function; the parameters are implicit, and they have to already be on the stack when the call is made. The result of the function call (if any) is then left on the stack after the function returns, for the next function to consume, and so on. Because functions are invoked simply by mentioning their name without any additional syntax, Forth and Factor refer to functions as "words", because in the syntax they really are just words.

One mutable stack or functions from stacks to stacks?

Sometimes, people talk about words pushing and popping values on "the stack". Other programmers, mostly those who prefer the term "concatenative language", instead talk about words as being functions which take a stack as input, and return a whole new, possibly different, stack as output. The first point of view is more intuitive, and it is also how most implementations work: there really is a location in memory that is the data stack. The latter is more amenable to formal reasoning: it is easier to work with rewrite rules and type systems if your functions are pure. However, these two points of view are equivalent: because in a given thread of execution, only one stack is "live" at any given point in time, updating the stack in-place has the same semantics as the purely functional world view. More details about this can be found in Manfred von Thun's paper Mathematical Foundations of Joy.

Fundamentals of concatenative languages

Concatenative idioms

Other interesting properties

Publications

research paper

Publications

This revision created on Wed, 27 Apr 2011 09:08:23 by PhilipWalker