Concatenative language

There are many ways to categorize programming languages; one is to define them as either "concatenative" or "applicative". In an applicative language, things are evaluated by applying functions to arguments. This includes almost all programming languages in wide use, such as C, Python, ML, Haskell, and Java. In a concatenative programming language, things are evaluated by composing several functions which all operate on a single piece of data, passed from function to function. This piece of data is usually in the form of a stack. Additionally, in concatenative languages, this function composition is indicated by concatenating programs. Examples of concatenative languages include Forth, Joy, PostScript, Cat, and Factor.

Concatenative and stack languages

Although the terms stack language and concatenative language sometimes get thrown around interchangeably, they actually represent similar but distinct classes of languages. A concatenative language is not necessarily a stack language. For example, Om uses prefix notation, rather than postfix, and passes the remainder of the program as the data from function to function. See Deque for another example of a stack-free concatenative language.

Stacks are a pretty fundamental concept in computer science, and many languages use stacks internally in the implementation. Any language that allows recursive definitions uses some type of call stack to save return addresses between function calls, and often the same stack is used to spill values which cannot be allocated in registers. However, this is just implementation detail, and this call stack is not exposed directly to the programmer (except in languages with first-class continuations; I'll touch upon this later).

So what makes stack languages different? The key concept here is that there are multiple stacks: all stack languages have a call stack to support recursion, but they also have a data stack (sometimes called an operand stack) to pass values between functions. The latter is what stack language programmers mean when they talk about "the" stack.

Most languages in widespread use today are applicative languages: the central construct in the language is some form of function call, where a function is applied to a set of parameters, where each parameter is itself the result of a function call, the name of a variable, or a constant. In stack languages, a function call is made by simply writing the name of the function; the parameters are implicit, and they have to already be on the stack when the call is made. The result of the function call (if any) is then left on the stack after the function returns, for the next function to consume, and so on. Because functions are invoked simply by mentioning their name without any additional syntax, Forth and Factor refer to functions as "words", because in the syntax they really are just words.

One mutable stack or functions from stacks to stacks?

Sometimes, people talk about words pushing and popping values on "the stack". Other programmers, mostly those who prefer the term "concatenative language", instead talk about words as being functions which take a stack as input, and return a whole new stack as output. The first point of view is more intuitive, and it is also how most implementations work: there really is a location in memory that is the data stack. The latter is more amenable to formal reasoning: it is easier to work with rewrite rules and type systems if your functions are pure. However, these two points of view are equivalent: because in a given thread of execution, only one stack is "live" at any given point in time, updating the stack in-place has the same semantics as the purely functional world view. More details about this can be found in Manfred von Thun's paper Mathematical Foundations of Joy.

Fundamentals of concatenative languages

Concatenative idioms

Other interesting properties

Publications

Publications

This revision created on Fri, 10 Sep 2021 18:36:34 by AliAsshole (Replaced dead link with live one)