The Forth Language

Added 31 Jul 2008

Viewed strictly as a language, Forth is clearly in the camp of the modern, structured languages such as C, Pascal, and Modula-2. It has all the proper flow of control constructs and, unlike C, does not have a GOTO. Like C, but unlike Pascal and Modula-2, Forth draws no distinction between procedures and functions. A more important difference between Forth and the other structured languages is that Forth has no support for data typing either at compile time or at runtime.

The key difference between Forth and the other structured languages is that the Forth compiler is extensible. Forth allows the programmer to declare new data types and to define new compiler keywords (such as control structures) and then use them immediately, even within the very next procedure. Forth even has a compiler facility for creating new kinds of language objects, if you will: the programmer can separately define what happens at compile time (when the object is instantiated) and what happens at runtime (when the object is invoked). The closest C comes to being extensible is to allow you to build record formats out of existing data types with struct.
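As a rough illustration of this extensibility, the sketch below assumes an ANS-style Forth system. The names ARRAY, SCORES, UNLESS, and REPORT are invented for the example and are not part of any standard. ARRAY is a defining word whose compile-time and runtime behaviors are given separately; UNLESS is a new control-structure keyword that can be used in the very next definition.

```forth
\ A defining word. The code before DOES> runs when ARRAY is used to
\ create a new word; the code after DOES> runs when that word executes.
: ARRAY ( n "name" -- )
   CREATE CELLS ALLOT        \ defining time: reserve n cells
   DOES> ( i -- addr )       \ run time: body address is on the stack
   SWAP CELLS + ;            \ convert index i to the element's address

10 ARRAY SCORES              \ instantiate a 10-cell array
42 3 SCORES !                \ store 42 in element 3
3 SCORES @ .                 \ prints 42

\ A new compiler keyword: UNLESS compiles "0= IF" into the word
\ currently being defined, so its body runs when the flag is false.
: UNLESS ( flag -- )  POSTPONE 0=  POSTPONE IF ; IMMEDIATE

\ Used immediately, in the very next definition.
: REPORT ( n -- )  UNLESS ." zero" THEN ;
0 REPORT                     \ prints zero
```

Nothing here required a change to the compiler itself; both ARRAY and UNLESS are ordinary definitions built from words the system already provides.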

But Forth is not simply a set of keywords and syntax rules; it is a description of a virtual machine. The most important aspect of the virtual machine is that it has exactly two stacks: a parameter stack and a return stack. Nearly all Forth words use the parameter stack for both their arguments and their results; the return stack is used for flow of control within a program and (occasionally) for temporary storage of working values. Forth lends itself naturally to recursive algorithms because of this stack-based architecture; in "classic" Forth programming, global variables are frowned on and rarely used.
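A small sketch may make the two stacks concrete. It assumes an ANS-style Forth; the word names SUM-OF-SQUARES and FACTORIAL are chosen only for illustration. The first word parks a value on the return stack with >R and retrieves it with R>; the second recurses with no variables at all.

```forth
\ Arguments arrive on the parameter stack; the result is left there.
: SUM-OF-SQUARES ( a b -- n )
   >R DUP *                  \ park b on the return stack, square a
   R> DUP *                  \ bring b back, square it
   + ;                       \ a*a + b*b

3 4 SUM-OF-SQUARES .         \ prints 25

\ Recursion needs no local or global variables: each level's working
\ value simply stays on the parameter stack. Intended for n >= 1.
: FACTORIAL ( n -- n! )
   DUP 1 > IF  DUP 1- RECURSE *  THEN ;

5 FACTORIAL .                \ prints 120
```

Uses of >R and R> have to balance within a definition, because the return stack also holds the return addresses that control the flow of the program.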

The extant Forth standards specify the names and actions of fewer than 200 words, which suffice to define the Forth virtual machine and the behavior of the interpreter/compiler. Commercial Forth development systems are much more elaborate, and have at least 500 words resident and available in the interpreter/compiler. Unfortunately, in many Forth manuals the words are simply documented in ASCII collating sequence, and are discussed as functional groups only as an afterthought (if at all). This makes the prospect of mastering a Forth system quite forbidding.

Actually, 500-600 language elements is almost exactly the same as the combined number of operators, keywords, and runtime library functions in Microsoft C or Borland C++, and you can study Forth initially with much the same approach as you would use for C. As a first approximation, you can view the relatively few "magic" words known to the Forth compiler as the "language," and treat the rest as a runtime library divided into functional categories such as stack operators, arithmetic/logical operators, memory access operators, and so on. As with any language, you will use 10% of the library 90% of the time, so you can just learn the most common functions first, and look up the rest in the manual when you need them.

Once you've got a basic grasp of what's available in the Forth language, it's better to think about the system using a "layered" model, because this is the way the system is actually built up. The lowest layer consists of the primitives, which are written in the assembly language of the host CPU and implement the Forth virtual machine. The words coded as primitives tend to be the most frequently used arithmetic/logical, comparison, stack, and memory operators along with a few text-parsing building blocks for the interpreter/compiler. Each additional layer contains increasingly complex functions built out of the words defined in the layers below: console I/O, mass storage, formatting and string management, the interpreter, the compiler, and at the top, utilities such as the editor and assembler.
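The layering can be sketched in miniature, assuming an ANS-style Forth with the core extension word ?DO. The words SQUARE, CUBE, STARS, BAR, and BOX are made up for this example; whether DUP, *, and EMIT are actually coded as primitives depends on the particular system.

```forth
\ Bottom layer: DUP * EMIT CR are supplied by the system,
\ typically as primitives or words close to them.

\ Next layer: simple words built directly on those.
: SQUARE ( n -- n*n )    DUP * ;
: STARS  ( n -- )        0 ?DO  [CHAR] * EMIT  LOOP ;

\ Higher layers: each word uses only the words defined below it.
: CUBE   ( n -- n*n*n )  DUP SQUARE * ;
: BAR    ( n -- )        STARS CR ;
: BOX    ( w h -- )      0 ?DO  DUP BAR  LOOP  DROP ;

3 CUBE .                 \ prints 27
3 2 BOX                  \ prints two rows of three asterisks
```

A full system is built the same way, only with many more layers between the primitives and the editor or assembler at the top.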

Two aspects of Forth that are frequently criticized are its cryptic names and its postfix (or reverse Polish) syntax. To address the former: when the fundamental Forth names were assigned, 10-character-per-second Teletypes were common and so were minicomputer systems with 8 or 16 KB of RAM. Consequently, short names were highly desirable to conserve memory and keystrokes, and we ended up with symbols such as @ for a memory fetch, ! for a memory store, and so on. I won't make any attempt to defend the historical Forth names; each language has its conventions, and Forth's are no stranger than some found in C, LISP, or APL. Readable, maintainable programs can be written in any language, as can write-only, unmaintainable programs. The secret of obtaining the former rather than the latter is good design, discipline, and documentation, not use of a particular language. At least in Forth, since it is extensible, you can rename any language element to anything you like!
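One simple way to do such renaming, sketched here for an ANS-style Forth, is to wrap the terse word in a new definition. The names FETCH, STORE, and COUNTER are hypothetical choices for this example, not standard words.

```forth
\ Longer names for the terse memory operators.
: FETCH ( addr -- x )   @ ;
: STORE ( x addr -- )   ! ;

VARIABLE COUNTER
5 COUNTER STORE          \ same effect as  5 COUNTER !
COUNTER FETCH .          \ prints 5
```

The wrappers add one level of call overhead on most systems, which is usually a fair price for names a maintainer can read.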

The complaints about Forth's postfix syntax are more to the point, and deserve a more cogent response. In postfix systems, the arguments precede the operator; for example, to add 1 and 2 in Forth you would write: