This article appears as it did in the November-December 1997 issue of Forth Dimensions. I have attempted to make the HTML look as much like the printed version as possible. This article is copyrighted, 1997.

Misty Beach Forth:

An Implementation in Java

This article provides an overview of Misty Beach Forth, a Forth implementation running under Java. While, at first glance, the two technologies would seem an easy fit, the Java Virtual Machine (JVM) has peculiarities that make the fit awkward. Further, in the spirit of Java's well-defined semantics, I added some design constraints of my own that made implementing Misty Beach Forth harder than it needed to be.

What, another Forth implementation
I have known of Forth for many years, but did little with it. Why bother? Wasn’t Forth just an HP calculator on steroids? Then in early 1996, I read History of Programming Languages - II, by Bergin and Gibson. The chapter on Forth hooked me and I decided it was time to learn Forth. I decided the best way to truly understand the language was to build an implementation myself, so I started coding one in C++.

I was about 10 hours into this project when I realized two things: (1) the world didn’t really need another low end x86 Forth, and (2) I had been looking for a project to do in Java for a while and maybe what the world really needed was a Java implementation of Forth.

I scrapped the C++ implementation and began a Java implementation. The first version was a hack designed to unearth all the problems I would have. I finished this in late 1996. After finishing, I went back and did a design for the implementation I am working on now. I expect this design to hold up.

Design Goals
It is misleading to speak of design goals as if I sat down and decided up front what I wanted out of Misty Beach Forth. I didn’t. I started with a general idea of what I wanted, but no grand plan. Today, I am implementing Misty Beach Forth to meet the following requirements:

  1. Run as a Java applet using neither processor- nor OS-specific code.
  2. Comply with the ANS Forth standard
  3. Run multi-threaded Forth programs
  4. Implement better defined semantics than typical Forths
  5. Provide a pleasant development environment
  6. Run at speeds comparable to native Forths
Not surprisingly, running at speeds comparable to native Forths turns out to be the hardest to accomplish.

Run as a Java applet using neither processor- nor OS-specific code
Part of what allows untrusted Java applets to run securely on a client machine is a multi-level security mechanism. One of the tiers of this mechanism performs a simple proof on each method in each Java class. Among other things, this proof checks that all the operations are type-safe. I want a Forth that runs with, or on top of, the JVM and that passes all the security checks so that I can run Forth programs as applets. This requirement contributes to the difficulty in implementing Forth directly on the JVM. Forth allows words to have variable stack effects; the Java security model does not.

Comply with the ANS Forth standard
This is self explanatory.

Run multi-threaded Forth programs One of my interests is parallel programming, and I want a Forth that allows multi-threading.

Implement better defined semantics than typical Forths
Most current Forth implementations provide no protection against indexing off the ends of arrays and generally stomping all over memory. In a protected operating system like Unix or Windows NT, the damage is contained to the Forth program and other programs continue to execute. I want to contain the damage even further because I dislike memory corruption bugs manifesting themselves long after the offending corruption.

Provide a pleasant development environment
My programming background is primarily in Pascal, C, C++, and Java. All of these languages have inexpensive implementations with IDEs, source code debuggers, profiles, etc. I want Misty Beach Forth to provide an equally pleasant and powerful development environment.

Run at speeds comparable to native Forths
Not surprisingly, this is the hardest goal to achieve and I am unsure whether it is actually achievable. The restrictions provided by the JVM, combined with the fact that Java applets are not coded to the native instruction set of the processor they run on are two hurdles to overcome. The largest barrier, however, is my desire to implement a Forth with more defined semantics than typical Forths. I may have to relax the precise semantic requirment to get even close to the speed I desire.

Restrictions of the JVM Forth is stack based. Java is stack based. At first glance, implementing Forth on the JVM appears to be no more difficult than implementing Forth on other CPUs. However, unlike most other CPUs, the JVM is designed only to run Java programs and little consideration is paid to other programming languages. In this sense, the JVM is a special purpose CPU, much like Forth and Lisp engines. The clash between programming models comes in several flavors:

Java and the JVM are strongly typed, Forth is not
Most programming languages provide typing either for variables or values. Some provide typing for both. Forth provides typing for neither, relying instead on the programmer to know how to interpret blocks of bits. Table One illustrates the differences:

Table One
Language Typing Example Explanation
C variables int x = 5; Declare a block of memory large enough to hold an integer, and treat the bits as if they are an integer unless told otherwise, e.g., (float)x. Other values assigned to this variable will be converted to an integer before being assigned, e.g., x = (int)9.45;.
Smalltalk values x := 1. Declare a variable, x, and make it refer to a new object of type Number. If you tell the runtime to treat the Number object as if it were a Set, you get an error. You can make x refer to something else, and this something else can be a Set, e.g., x = Set new..
Java variables
and values
Integer x = new Integer (1); Declare a variable, x, and make it reference a new object of type Integer. x can only reference objects of type Integer or objects of child classes of type Integer. Trying to make x reference a Set generates an error. You cannot tell the runtime to treat Integer as if it were a Set without getting an error.
Forth neither VARIABLE X
1 X !
Declare a new variable, x. Then store an integer 1 value in it. It is just as legal to store float values or pointers or characters in x. Further, the integer stored in x can be treated as a pointer, a character or a floating-point number. The Forth runtime doesn't care.

This typing for both variables and values contributes to the security provided by Java applets. The Forth approach of treating memory as a sea of bits that the programmer can interpret as desired maps poorly onto the Java model. Adding to the mismatch, the JVM implements the strong typing called for in the Java language. Writing the sort of untyped programs allowed by Forth is legal, but means that the resulting program and implementation will not pass the Java applet verifier and will not run as a Java applet.

Java and the JVM have no pointer math
Forth, like C, is built on pointer math. Simple arrays in Forth are pointers that are indexed using pointer arithmetic. An example of Forth code creating an array of two elements and storing two values in it:

    CREATE TESTARRAY 2 CELLS ALLOT
    220 TESTARRAY !
    340 TESTARRAY CELL+ !

This has a simple equivalent in C:

    int *p = (int *) malloc(2 * sizeof(int));
    *p = 220;
    *(p + 1) = 340;

But the Java equivelent eliminates the pointer math:

    int[] p = new int[2];
    p[0] = 220;
    p[1] = 340;

This difference is not just syntactic sugar! Arrays in Java, unlike those in C, are objects with a well defined API. In C, this sort of rewrite improves readability, but does not change the underlying operations. In Java, the array orientation is necessary because p is not a pointer to memory, but instead is a reference to an array object. This can be worked around, with tremendous effort, but the resulting Java byte codes once again will not pass the Java applet verifier and will not even run on all JVMs.

The JVM stack can be discontinuous
While Java and the JVM are stack based, in the sense that operations take place on a stack of operands, the JVM breaks the stack up into stack frames, one frame per function call. These stack frames are all independent. To quote from The Java Virtual Machine Specification:

Because the stack is never manipulated directly except to push and pop frames, it may actually be implemented as a heap, and Java frames may be heap allocated. The memory for a Java stack does not need to be contiguous.
This causes no trouble for programs written in the Java programming language, since the Java language has no way to express the notion of accessing variables in a caller’s stack frame. It does, however, mean that a subroutine-threaded Forth implementation using the JVM stack is effectively impossible, and any Forth implementation will be hard pressed to use the JVM stack as the data stack.

The Java applet security mechanism does not allow variable stack effects
To pass the applet-verification process, functions and blocks can not have variable stack effects. The depth of the stack upon leaving a function or block must be a fixed value relative to its depth upon entering the function or block. This restriction provides yet another hindrance to using the JVM stack as the data stack for Misty Beach Forth.

Implementation details
Since it appeared that using the JVM stack as the data stack was not possible, I decided to build a Forth implementation roughly comparable to native-mode Forths written in C. The data and return stacks are simple arrays. I don’t worry about optimal register allocation.

Some implementation details were easy to decide. Java defines the underlying virtual machine as a 32-bit, two’s complement, big endian machine. This is a legal ANS configuration, so that is what Misty Beach Forth uses. The next big decision was deciding what Cell would look like.

Several observations drove the implementation of Cell:

  1. Most operations are on integers.
  2. Pointer math is usually used for array references.
  3. Referencing off the end of an array leads to undefined behavior, so I can do what I want when this happens.
The first observation led to the decision to make integers the fundamental type. This leads to slower access times for character types, but treating characters as the fundamental type leads to even slower access times for integers. Since integer operations seem more common, integer operations are given priority. Interestingly, this leads to an architecture that is very similar to a word-oriented machine.

The second and third observations led to the creation of a Cell that contained an integer an array (possibly null). Misty Beach Forth cells look like this:

    public class Cell
    {
        int    intValue;
        Cell[] arrayValue;
    }
The implementation of +, for example, adds the intValue element of two cells together. The arrayValue element is needed to deal with address operations. To see how, we’ll walk through the array example from above:
    CREATE TESTARRAY 2 CELLS ALLOT
Create a new dictionary entry. The dictionary entry is an array of two Cells.
    220 TESTARRAY !
Place 220 on the stack:
intValue arrayValue
220
null
-
-

Place TESTARRAY on the stack:
intValue arrayValue
0
reference to
TESTARRAY Cell[]
220
null

Set ‘the value of TESTARRAY’ to 220. This executes Java code that looks like this:

    TESTARRAY.arrayValue[TESTARRAY.intValue / 4].intValue     = 220;
    TESTARRAY.arrayValue[TESTARRAY.intValue / 4].arrayValue = null;
    Top Of Stack = Top Of Stack - 2;
We divide by 4 in the array index, because cells are four bytes wide.

    340 TESTARRAY CELL+ !
Place 340 on the stack:
intValue arrayValue
340
null
-
-

Place TESTARRAY on the stack:
intValue arrayValue
0
reference to
TESTARRAY Cell[]
340
null

Add the size of a Cell to the value on the top of the stack:
intValue arrayValue
4
reference to
TESTARRAY Cell[]
340
null

Set ‘the value of TESTARRAY + 4’ to 340. This executes Java code that looks like this:

    TESTARRAY.arrayValue[TESTARRAY.intValue / 4].intValue     = 340;
    TESTARRAY.arrayValue[TESTARRAY.intValue / 4].arrayValue = null;
    Top Of Stack = Top Of Stack - 2;
This approach works well for reasonable Forth programs. Adding two addresses, incrementing that value and then subtracting one of the original addresses produces a valid address in most native Forth implementations. In Misty Beach Forth, it may or may not produce a valid address. Hopefully, code of this nature will be rare.

The Misty Beach Forth architecture has the notion of a Forth Engine. This engine contains the data and return stacks, is responsible for tokenizing input strings, maintains the current state (interpreting, compiling) and contains the dictionary. The engine is almost completely ignorant of the variously defined words. With some work, I expect to make the engine ignorant of all the defined words and thus completely decoupled from them.

The object-oriented nature of Java provides a framework for implementing the Forth words. Forth words can be thought of as objects with both data and code. Misty Beach Forth implements each Forth word as an object of a Java class descending from a common base class, ForthWord. Each word contains methods to:

Performance
Designing good benchmarks is difficult. I have little experience in benchmarking, and do not wish to take the time required to become good at it. Nevertheless, without some performance numbers it is impossible to discuss relative speeds. I thus present the following toy benchmark results comparing Misty Beach Forth against two shareware Forths and a commercial Forth. There are numerous problems with them, so great care should be used in drawing conclusions from them.

Table Two. Speed of stack operations

    : INNER 10000 0 DO 34 DROP LOOP ;
    : OUTER 10000 0 DO INNER LOOP ;
    OUTER
Forth Implementation Time in seconds
Jax4th 1.25144
Misty Beach Forth V0.30 on Netscape 3.x92
Misty Beach Forth V0.30 on Internet Explorer 3.x67
Misty Beach Forth V0.30 with Symantec JIT 2.0.5459
Win32Forth 3.2 build 081959
Misty Beach Forth V0.30 on Netscape 4.x54
LMI Forth (no optimizations)27
LMI Forth (optimized with NCC and COMPILE:)7

Table Three. Speed of variable operations

    VARIABLE TEMP
    : INNER 10000 0 DO 34 TEMP ! LOOP ;
    : OUTER 10000 0 DO INNER LOOP ;
    OUTER
Forth Implementation Time in seconds
Misty Beach Forth V0.30 on Netscape 3.x203
Jax4th 1.25186
Misty Beach Forth V0.30 with Symantec JIT 2.0.54144
Misty Beach Forth V0.30 on Internet Explorer 3.x133
Misty Beach Forth V0.30 on Netscape 4.x116
Win32Forth 3.2 build 081968
LMI Forth (no optimizations)35
LMI Forth (optimized with NCC and COMPILE:)3

All tests ran on a 90 MHz Pentium with 64 MB RAM running Windows NT 4.0 with no patches applied. No other applications were running.

I selected the native code Forths, based on the belief that Jax4th was a simple shareware Forth (a few hundred man hours of effort), Win32Forth was a serious shareware Forth (thousands of man hours of effort) and LMI was a serious commercial Forth implementation. What I did not realize was that, while Win32Forth is a serious shareware Forth, it emphasizes maximum functionality instead of speed. Serious shareware Forths emphasizing speed should perform much better.

The variable operation time for Misty Beach Forth (Table Three) is much worse than the stack operation time (Table 2) . This is primarily caused by the extra indirection and memory accesses introduced to provide the better semantics I want. Preliminary tests indicate that the stack operation time can be brought down to about 15 seconds (half as fast as LMI) and the variable operation time can be brought down to about 18 seconds if I relax the safety requirment and implement a few other speed optimizations. I expect to investigate this over the next few months. An environment allowing safer runtime during test and debug with the option to convert over to a faster, but more dangerous, runtime environment later might be the best mix of speed and safety.

Conclusion
I am pleasantly surprised at how fast Misty Beach Forth runs without serious performance tuning, and expect that, with sufficient work, it can eventually run within a factor of two of serious native-mode Forths. As Java Just-In-Time compiler technology improves, the performance gap may be even lower. One of the advantages that Misty Beach Forth has is that companies like Borland, Microsoft, Sun and Symantec are pouring millions of dollars into improving their Java Just-In-Time compilers. As these improve, Misty Beach Forth’s speed improves as well. Misty Beach Forth’s speed has already seen vast improvement going from Netscape 3.x to Netscape 4.x, and future improvements seem certain.

Misty Beach Forth can be found at: http://www.mistybeach.com

References
The Java Virtual Machine, Tim Lindholm and Frank Yellin. Addison Wesley, 1997, ISBN 0-201-63452-X

Smalltalk/V Tutorial and Programming Handbook, Digitalk, Inc. 1987 (user manual, no author or ISBN number)

History of Programming Languages - II, Thomas J. Bergin, Jr. And Richard G. Gibson, Jr. Editors, Addison Wesley, 1996, ISBN 0-201-89502-1