EAS - The ETA Assembler

Designed by Mike Taylor, beginning Tuesday 7th September 1999
Copyright © Mike Taylor, 1999.

This document may be freely redistributed provided it is not modified in any way. Specifically, the authorship must remain clear. All feedback is extremely welcome, and should be emailed to the author at mike@tecc.co.uk.

This document describes version 1.0 of EAS.
Document SCCSID("@(#)/export/home/staff/mike/src/language/eta/doc/SCCS/s.eas.html 1.6")

Introduction

Although ETA is an elegant language, one or two of its features make it slightly difficult to write in: for example, the base-seven representation of numbers, using letters instead of digits; and the necessity to embed absolute line numbers in the code as targets for the Transfer-execution instruction.

To ameliorate this problem, an ETA assembler, EAS, is provided: this reads a program in a simple syntax which includes line labels, decimal constants, character-value constants and file inclusion, and writes an equivalent pure-ETA program.
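
To make the base-seven problem concrete, here is a minimal Python sketch of the translation an assembler has to perform for a decimal constant. It is not taken from the EAS source. The digit mapping (H=0, T=1, A=2, O=3, I=4, N=5, S=6) and the use of E as the terminator are assumptions recalled from the ETA specification and should be checked against it; number_instruction is a hypothetical name.

```python
# Assumed digit letters for values 0..6, and E as the assumed terminator.
# Both should be checked against the ETA specification.
DIGITS = "HTAOINS"


def number_instruction(value: int) -> str:
    """Return a pure-ETA Number instruction that pushes a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch handles non-negative constants only")
    letters = ""
    while True:
        # Peel off base-seven digits, least significant first.
        value, digit = divmod(value, 7)
        letters = DIGITS[digit] + letters
        if value == 0:
            break
    return "N" + letters + "E"
```

Under these assumptions, the decimal constant 10 (one seven and three units) comes out as NTOE.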

This document specifies the EAS syntax, gives some examples, and describes how to invoke the assembler.

Some familiarity with ETA is assumed.

Syntax

Overview

The relationship between the EAS source code and the ETA code is deliberately very close, since EAS has the same relationship to ETA that (say) a 6502 assembler has to 6502 machine code. For example, each line of EAS (except blank lines and those which are blank after comment-stripping) yields a single, corresponding line of ETA.

It would be perfectly possible to write a compiler for a high-level language that produces ETA as its object code, but EAS isn't it. (Or here's a great idea: someone with a lot of time to kill could re-target GCC for the ETA virtual machine!)

Comments

Comments are introduced by a hash character (#) and continue to the end of the line.
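
As a rough illustration, comment-stripping can be sketched in a couple of lines of Python. This is an assumption about behaviour, not the assembler's actual code: in particular, it does not consider whether a hash could appear inside a character-value constant.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first hash to the end of the line."""
    code, _, _ = line.partition("#")
    # Trailing whitespace before the comment is of no interest either.
    return code.rstrip()
```

A line for which strip_comment returns the empty string is one that is "blank after comment-stripping", and so yields no ETA line.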

Instructions

The ETA instructions E, T, A, O, I, N, S and H may be represented by themselves, or by the keywords dividE, Transfer, Address, Output, Input, Number, Subtract and Halibut respectively. The single-letter and whole-word forms may be freely intermixed.

All instructions are recognised case-insensitively, so (for example), divide, DIVIDE, Divide, dividE and DiViDe are all equivalent.

Many instructions may occur on a single line, but they must be separated by whitespace (i.e., a sequence of one or more SPACE and/or TAB characters.)
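
The rules above amount to a case-insensitive lookup on whitespace-separated tokens. The following Python sketch shows one way to do it; the names KEYWORDS, normalise and normalise_line are hypothetical, the Halibut keyword for H is assumed from the tutorial's mention of "the Halibut instruction", and arguments to N are deliberately ignored here.

```python
# Whole-word forms, keyed in lower case for case-insensitive lookup.
KEYWORDS = {
    "divide": "E", "transfer": "T", "address": "A", "output": "O",
    "input": "I", "number": "N", "subtract": "S",
    "halibut": "H",  # assumed keyword for H
}


def normalise(token: str) -> str:
    """Map one instruction token to its single upper-case ETA letter."""
    lowered = token.lower()
    if lowered in KEYWORDS:
        return KEYWORDS[lowered]
    if len(token) == 1 and token.upper() in "ETAOINSH":
        return token.upper()
    raise ValueError(f"not an instruction: {token!r}")


def normalise_line(line: str) -> list:
    """Split on runs of spaces and tabs, then normalise each token."""
    return [normalise(token) for token in line.split()]
```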

Numeric Constants

The N, or Number, instruction is unique in that it alone takes an argument from the EAS program text. The argument may follow immediately after the letter N if the single-letter form is used, but must be separated from the whole-word form by whitespace.

Numeric constants may be expressed in three ways:

Labels

Labels are provided as an abstraction of addresses, so that portions of code can be relocated without needing to re-count lines.

A line may be labelled by prefixing it with the sequence >NAME: (i.e. greater-than, label name, colon.) There is no limit to the length of labels; they may contain any non-white characters other than the colon.

A label may not be defined more than once.

A line may carry multiple labels (e.g. >FOO: >BAR: code). This is useful primarily when defining functions: the first line of a function always needs a public name, used as the entry point; and sometimes also needs a private name, used only within the text of the function.

Labels may be used as the argument to the N instruction, as in N<NAME (i.e. less-than, label name).

Labels may be used both before and after definition.
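
Because labels may be used before they are defined, resolving them naturally takes two passes: one to record the line number of every definition, and one to substitute those numbers for the references. The Python sketch below illustrates the idea only. It assumes that its input has already been comment-stripped with blank lines removed, that ETA lines are numbered from 1, and that every label prefixes some code; resolve_labels is a hypothetical name, and the result still contains decimal constants rather than pure ETA.

```python
import re

LABEL_DEF = re.compile(r">([^\s:]+):")
LABEL_USE = re.compile(r"<([^\s:]+)")


def resolve_labels(lines):
    """Replace each <NAME reference with the number of the line labelled NAME."""
    addresses = {}
    bodies = []
    # Pass 1: peel label definitions off the front of each line.
    for number, line in enumerate(lines, start=1):
        rest = line.lstrip()
        while (match := LABEL_DEF.match(rest)):
            name = match.group(1)
            if name in addresses:
                raise ValueError(f"label {name} defined more than once")
            addresses[name] = number
            rest = rest[match.end():].lstrip()
        bodies.append(rest)

    # Pass 2: substitute line numbers for references.
    def lookup(match):
        name = match.group(1)
        if name not in addresses:
            raise ValueError(f"label {name} used but not defined")
        return str(addresses[name])

    return [LABEL_USE.sub(lookup, body) for body in bodies]
```

For instance, a three-line input whose last line carries the two labels DONE and END resolves both names to line 3.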

File Inclusion

A line starting with a star (*) requests the inclusion at that point of the file whose name immediately follows the star. No intervening whitespace should be used. Compilation proceeds as though the starred line were replaced by the contents of the named file.

Included files may include other files, and so on ad infinitum.

Labels defined in an included file may be used in the including file, and vice versa. The same label may not be defined in more than one file contributing to a single compilation.

The paths of included files are always interpreted relative to the working directory of the EAS process rather than (for example) relative to a well-known library directory, or relative to the location of the file containing the inclusion request. This may be a bug. Ask me again next week.
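
The inclusion rule is simple textual substitution, which can be sketched recursively. In the Python below, expand_includes and its read_file parameter are hypothetical: read_file stands in for "open this name relative to the working directory and return its lines". The sketch makes no attempt to detect a file that includes itself.

```python
def expand_includes(lines, read_file):
    """Replace every starred line with the (recursively expanded) named file."""
    expanded = []
    for line in lines:
        if line.startswith("*"):
            # The filename follows the star immediately.
            name = line[1:].strip()
            expanded.extend(expand_includes(read_file(name), read_file))
        else:
            expanded.append(line)
    return expanded
```

Since the result is one flat sequence of lines, labels are naturally shared between including and included files, and a label defined in two contributing files shows up as an ordinary duplicate definition.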

Invocation

The ETA Assembler is invoked as follows:

eas [ -d ] [ -O ] [ eas-file [ eas-file2 ... ] ]

It assembles the eas-files named on the command-line, concatenating them into a single program if there is more than one, and reading the program from standard input if no files are named. It writes the assembled ETA program to standard output, so it can be used as a filter.

The options have the following effects:

-d
Print debugging output to standard error.
-O
Emit OIL code (q.v.) instead of ETA code.

The debugging output contains details such as the line numbers assigned to each label; it is unlikely to be of interest to anyone except the maintainer of the assembler.

The Standard Library

Overview

Since most ETA programs need access to the same basic facilities such as multiplication, numeric input, etc., there is an obvious need for a standard library of EAS files which can provide these facilities. The result is a set of EAS source files, distributed along with the assembler itself, which have well-known filenames (for inclusion), and define well-known function names (labels) for calling. This section describes the standard EAS library.

It would be nice also to provide these routines in the form of compiled ETA code, ideally padded to make good prose or poetry. Unfortunately, compiled ETA code will in general be different each time it's used, since it will be Transferring to addresses in itself and in other standard routines which may be at different locations. We could ameliorate (but not solve) this problem by getting EAS to generate position-independent code by calculating Transfer addresses relative to the current address. In the meantime, the standard library is provided in EAS form only.

Conventions

File and Label Names

Standard library files must be named all in lower case, with names no more than eight letters long, and with a .eas extension. (This restriction is so that the filenames are unambiguous for use on an MS-DOS or ISO 9660 filesystem.) For example, multiply.eas, writenum.eas and sum.eas would all be legitimate filenames for standard functions: Multiply.eas, writenumber.eas and sum_num.eas are not.

Functions defined in the standard library must be named exactly the same as the files that contain them except that they must be all in upper case (and of course omit the .eas extension.) For example, a standard library file called multiply.eas must define a function labelled MULTIPLY.

All the labels used within a function labelled FOO, say, must begin with FOO, followed by a lower-case letter or underscore, followed by any sequence of upper- and lower-case letters and underscores - there is no length limit. For example, the MULTIPLY function might internally use the labels MULTIPLYloop and MULTIPLY_done.

These conventions mean that a standard library file called foo.eas, say, has complete control over the namespace of labels beginning with FOO and not followed by another capital letter; so no clash could arise from a pair of standard library files called foo.eas and food.eas, for example.
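
These naming rules are mechanical enough to check automatically. The following Python sketch expresses them as regular expressions; the function names are hypothetical, and "letters" is read as the lower-case letters a to z only, which is what the sum_num.eas counter-example suggests.

```python
import re


def is_library_filename(name: str) -> bool:
    """One to eight lower-case letters, then the .eas extension."""
    return re.fullmatch(r"[a-z]{1,8}\.eas", name) is not None


def function_label(filename: str) -> str:
    """The label a library file must define: its name, upper-cased."""
    return filename[: -len(".eas")].upper()


def is_internal_label(label: str, function: str) -> bool:
    """FOO, then a lower-case letter or underscore, then letters/underscores."""
    pattern = re.escape(function) + r"[a-z_][A-Za-z_]*"
    return re.fullmatch(pattern, label) is not None
```

Note how the second character class keeps the namespaces apart: FOODloop is not an internal label of FOO, because the character after FOO is a capital letter.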

Nested Inclusion

When one library routine uses another, should it include the file that defines it? For example, the standard WRITENUM routine uses WRITESTR; does that mean that writenum.eas should include writestr.eas and save the top-level programmer the bother?

No, because in the case where a top-level program includes two or more complex routines, each of which includes the same lower-level one, two or more copies of the low-level routine would be included. Then before you know it, you're committing Microsoft-level stupidities like having seventy-odd copies of the getchar() code in MS-Word (see Note 1).

It's better, though admittedly clumsy, to require the top-level program to include all the prerequisite functions of the functions it uses. At least the assembler helps with this, by complaining when a used label is not defined.

Comments

Each standard library file must begin with a comment line stating the name of the file, followed by a space, two hyphens, a space, and a one-line description of the file. Examples:

# multiply.eas -- multiply two numbers
# readnum.eas -- read a decimal number
# writenum.eas -- write a decimal integer

The next line should be a comment containing version-control information, if appropriate. Examples:

# SCCSID("@(#)/home/mike/eta/easpit/SCCS/s.writestr.eas	1.1")
# SCCSID("@(#)/home/mike/eta/easpit/SCCS/s.writenum.eas	1.2")

If the file requires any other files to be included in order to provide lower-level routines which it uses, these should be listed, separated by commas, on a subsequent comment line beginning # Requires:. Examples:

# Requires: WRITESTR
# Requires: READNUM, MULTIPLY

(These requirements may seem unnecessarily draconian, but they do facilitate automatic processing of the library files to produce tables like the one below.)
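
As an illustration of the sort of automatic processing these header conventions allow, here is a small Python sketch that pulls the filename, description and prerequisites out of the leading comment lines of a library file. It is not the tool actually used to build the table; parse_header and the dictionary keys are hypothetical.

```python
def parse_header(lines):
    """Extract file name, description and requirements from leading comments."""
    # First line: "# name.eas -- one-line description"
    filename, _, description = lines[0][1:].strip().partition(" -- ")
    requires = []
    for line in lines[1:]:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        text = line[1:].strip()
        if text.startswith("Requires:"):
            names = text[len("Requires:"):].split(",")
            requires = [name.strip() for name in names if name.strip()]
    return {"file": filename, "description": description, "requires": requires}
```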

Routines

Here, then, are the routines currently provided in the standard library:

Label: MULTIPLY
Requires: nothing
Description: Multiplies together the top two numbers on the stack, leaving the product on top of the stack in their place. The multiplication takes time proportional to the argument on top of the stack (i.e. the second argument to be pushed), so (for example) N50 N5 N1 N<MULTIPLY T is about ten times faster than N5 N50 N1 N<MULTIPLY T.

Label: WRITESTR
Requires: nothing
Description: Writes to the standard output stream the characters that have been pushed onto the stack, up to but not including a terminating NUL character (zero), and consuming all the characters including the NUL. No implicit newline character is written: this must be done explicitly (N10 O) if required. Note that the characters must be stacked in reverse order from how they are to be displayed, as in N0 N'o N'l N'l N'e N'H N1 N<WRITESTR T.

Label: WRITENUM
Requires: WRITESTR
Description: Writes to the standard output stream the decimal representation of the number on top of the stack, consuming it in the process. No implicit newline character is written: this must be done explicitly (N10 O) if required.

Label: READNUM
Requires: MULTIPLY
Description: Reads a decimal integer from the standard input stream and leaves it on the stack. An optional leading sequence of space characters is consumed, followed by a sequence of digits and a terminating space or newline character.

(Source code for some of these routines is provided below in the Tutorial Examples section.)

Development/Debugging Hints

It's worth using the -d 4 option of the Reference ETA interpreter to get a debugging trace: before each instruction is run, this shows the current line-number, the contents of the stack, and the instruction itself.

It turns out to be very useful to formalise an invariant that is always true at the top of a loop. I always used to look down on this sort of computer-science-for-the-sake-of-it behaviour, with Bound Functions and Weakest Preconditions and all that, but I can see where it comes from now I'm programming a really primitive machine!

Another useful technique is to comment each line of your EAS program with a picture of how the stack's expected to look after it executes. This gives you something to compare the debugging output with.

Future Directions

Possible enhancements for the future include:

Tutorial Examples

Introduction

This section presents a sequence of twelve increasingly complex EAS code fragments - some entire programs, some functions - which together constitute a small but complete tutorial in the use of EAS to write Real World ETA programs. The later examples build on the earlier, to yield programs of some complexity.

true

This is a re-implementation of the Unix utility /bin/true, the purpose of which is to return a "success" exit-status to the operating system. No instructions at all are needed, since falling off the end of an ETA program is equivalent to an explicit Transfer to address zero, causing the interpreter to return "success" to the operating system. So here is the program in its entirety:

Hello, World!

This program prints everyone's favourite message. The interleaving of the Number and Output instructions is obviously pretty arbitrary. I've chosen to avoid the fenceposts: the selected interleaving is neither n × (Number; Output) nor n × Number; n × Output.

An alternative approach would be to push a marker character on to the stack - NUL would be an obvious choice - followed by all the characters of the message, then call a well-known function to output the stacked characters: see below for this program.

Copying Input to Output

This program demonstrates the use of the Transfer instruction for conditionals (line 4), discarding unwanted values from the stack (line 5), termination (line 6) and looping (line 7).

(Not all the features of the original CP/M pip are implemented!)

Function Definition

Although this code performs perfectly good addition, its main purpose is to demonstrate function definition. (After all, addition is pretty easy in EAS: just subtract the negation of the number you want to add. Make the negation by subtracting the number from zero.)

The prologue N2 H N2 H can be considered as a standard idiom for entry to a two-argument function; equivalent sequences for functions of zero, one, three or more arguments are obvious.

Similarly, the epilogue N1 N2 H T can be considered as a standard idiom for exit from a function of a single return-value.
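
The effect of the prologue and epilogue is easier to see by tracing the stack. The Python sketch below simulates just enough of the machine to do so, under two assumptions recalled from the ETA specification that should be checked against it: Halibut pops n and, if n is positive, rolls the element n below the top up to the top (otherwise it copies the element -n below the top); Subtract pops b, then a, and pushes a - b. The addition body N0 N1 H S S is a hypothetical reconstruction from the description above, not the original listing.

```python
def halibut(stack):
    n = stack.pop()
    if n > 0:
        stack.append(stack.pop(-1 - n))  # roll: move element n below top to top
    else:
        stack.append(stack[-1 + n])      # copy element -n below top


def subtract(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a - b)


def run(tokens, stack):
    """Run a whitespace-separated sequence of N<decimal>, H and S tokens."""
    for token in tokens.split():
        if token == "H":
            halibut(stack)
        elif token == "S":
            subtract(stack)
        elif token.startswith("N"):
            stack.append(int(token[1:]))
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return stack
```

Starting from a stack of 3, 4 and a return address of 99 (top last), the prologue leaves 99, 3, 4; the assumed body reduces that to 99, 7; and N1 N2 H leaves 7, 1, 99, ready for T to consume the address and the condition and leave the sum behind.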

Function Call

Here, we demonstrate how to call the addition function defined in the previous example. Note that the arguments are pushed on the stack before the call address, so the calling sequence itself (N1 N<ADD T) can be considered as atomic.

(This program is not actually particularly useful as it stands, since the numbers to be added are read as the ASCII values of consecutive input characters, and the sum is written as the character with the appropriate ASCII value.)

Writing a String (Variadic Function)

This is an unusual function, in that it is variadic. This means that we can't use a standard prologue to roll the return address down under the arguments. Instead we maintain the invariant that the top of the stack is the return address, and the characters remaining to be written are immediately below it.

This invariant is true on entry, so there is no prologue as such.

The epilogue is still called an epilogue, even though it's in the middle of the code.
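
In higher-level terms, the behaviour of this function is just "pop and write until a zero turns up". The Python sketch below models that behaviour only: the return-address juggling described above is abstracted away, and the output is returned as a string instead of being written. The name writestr is used for illustration.

```python
def writestr(stack):
    """Consume characters from the top of the stack down to and including NUL."""
    written = []
    while True:
        code = stack.pop()
        if code == 0:
            break  # the terminating NUL is consumed but not written
        written.append(chr(code))
    return "".join(written)
```

This also shows why the caller must stack the characters in reverse: the last one pushed is the first one written.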

A Better Implementation of Hello, World!

This re-implementation of the "Hello, World!" program uses the WRITESTR function defined in the previous example. It merely stacks up the characters of its message, then calls WRITESTR to write them out.

Writing a Number (yes, I know it's pathetic)

In addition to being a very useful routine for real programs, this demonstrates some fairly sophisticated stack management (with extensive use of the Halibut instruction), and consequently, the technique of using a semi-formal loop invariant and stack pictures as an aid to development and debugging. This routine also introduces the dividE instruction.
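
One plausible shape for such a routine, sketched here in Python, is to push a NUL, then repeatedly divide by ten and push the character for each remainder, leaving a string in exactly the form WRITESTR expects. Whether the library routine works precisely like this is an assumption (the table above records only that WRITENUM requires WRITESTR); writenum_stack is a hypothetical name, and only non-negative numbers are handled.

```python
def writenum_stack(number):
    """Return the stack of character codes that WRITESTR would then print."""
    stack = [0]  # terminating NUL goes on first, so it ends up at the bottom
    while True:
        # One division yields both quotient and remainder.
        number, digit = divmod(number, 10)
        stack.append(ord("0") + digit)
        if number == 0:
            break
    return stack
```

The least significant digit is pushed first and the most significant last, so popping from the top yields the digits in reading order.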

Multiplication

Here we multiply two numbers, x and y, by starting with a zero-valued accumulator and adding x to it y times. This routine, or an equivalent, is all but indispensable when writing real programs of any substance.

Note the use of the idiom

A N0 N1 S S T

to skip a single line if the number on top of the stack is true. This is a classic case of a non-trivial sequence of operations being treated as an indivisible unit.
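
The arithmetic behind the idiom can be checked step by step. The Python sketch below assumes that A pushes the number of the line following the current one and that S pops b, then a, and pushes a - b; both are recollections of the ETA specification and should be verified against it. skip_target is a hypothetical name.

```python
def skip_target(current_line):
    """Return the address that A N0 N1 S S leaves on the stack for T."""
    stack = [current_line + 1]  # A: assumed to push the next line's number
    stack.append(0)             # N0
    stack.append(1)             # N1
    b, a = stack.pop(), stack.pop()
    stack.append(a - b)         # S: 0 - 1 = -1
    b, a = stack.pop(), stack.pop()
    stack.append(a - b)         # S: (n + 1) - (-1) = n + 2
    return stack.pop()
```

So on line n the Transfer is to line n + 2, which skips exactly line n + 1 when the condition beneath the address is non-zero.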

(This routine is an execution bottleneck for many EAS programs, since it runs in O(arg2) time. A faster multiplication routine would be a great boon: but is one even theoretically possible?)
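
For reference, the repeated-addition algorithm looks like this in Python, with addition written the ETA way, as subtraction of a negation. This is a sketch of the algorithm as described, not a translation of the library source; it assumes a non-negative second argument and returns the iteration count purely to show where the time goes.

```python
def multiply(x, y):
    """Multiply by adding x to an accumulator y times; also count iterations."""
    accumulator = 0
    iterations = 0
    while y != 0:
        accumulator = accumulator - (0 - x)  # add x by subtracting its negation
        y = y - 1
        iterations += 1
    return accumulator, iterations
```

Calling multiply(50, 5) takes five trips round the loop and multiply(5, 50) takes fifty, which is the factor of ten mentioned in the library table.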

Reading a Number

This is written as a function of no arguments, returning a single value.

The numeric input convention that we use is that any leading spaces are skipped, then a sequence of non-spaces (assumed but not checked to be decimal digits) is accumulated numerically, then a terminating space is consumed. Pragmatic considerations dictate that a trailing newline or EOF must be treated as an alternative terminator.

This is not perfect: we'd like to treat all whitespace equivalently (but that would be dull); we'd like to check that the digits are between 0 and 9 (but we have no inequality check); we'd like not to consume the trailing space (but we don't have a PEEK or UNGET facility). Nevertheless, this is an invaluable routine.

Note that the MULTIPLY routine, defined in a previous example, is used here: so any program that includes this file must also include multiply.eas.
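
The input convention described above can be sketched in Python as follows. This models the convention, not the EAS code: read_number is a hypothetical name, the stream is any object with a read method, and (as in the description) non-space characters are assumed rather than checked to be decimal digits.

```python
import io


def read_number(stream):
    """Skip leading spaces, accumulate digits, consume one terminator."""
    ch = stream.read(1)
    while ch == " ":
        ch = stream.read(1)
    value = 0
    # An empty string from read() means EOF; space and newline also terminate.
    while ch not in ("", " ", "\n"):
        value = value * 10 + (ord(ch) - ord("0"))  # this is where MULTIPLY comes in
        ch = stream.read(1)
    return value
```

For example, read_number(io.StringIO("  42 7")) returns 42 and leaves only the 7 unread: the terminating space has been consumed, exactly the imperfection noted above.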

Factorial (Recursive)

We make extensive use of function-calls here: first, to obtain the number whose factorial we wish to calculate; second, to perform that calculation - recursively as it happens; and third, to write the result in a human-readable form.

99 Bottles of Beer on the Wall

This program, again making use of pre-defined routines, prints the words to the classic song 99 Bottles of Beer on the Wall:

Note that the function BoBotW, which prints the message number bottles of beer on the wall, works by calling BoB to print the first part of the message, and appending the latter part.

Appendix: Additional ETA-Programming Tools

EASy

There is a wrapper, EASy, that assembles an EAS program and executes the resulting ETA code. As a side-effect, it leaves the ETA program in a file in the same directory as the EAS program, and whose name is formed from its name by removing the .eas extension (if any) and adding a .eta extension.
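
The naming rule for the output file can be written down in a few lines of Python. This is a sketch of the rule as stated, not the wrapper's code, and eta_output_path is a hypothetical name.

```python
def eta_output_path(eas_path: str) -> str:
    """Remove a trailing .eas extension, if any, and add .eta."""
    if eas_path.endswith(".eas"):
        eas_path = eas_path[: -len(".eas")]
    return eas_path + ".eta"
```

Because only the extension changes, the ETA file lands in the same directory as the EAS program.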

ETAword

This is a trivial hack to find words from the system dictionary that can be incorporated into an ETA program containing a known sequence of significant characters. The command-line arguments are sequences of consecutive instructions to be incorporated into single words.

For example:

etaword nen → drunken, gunmen
etaword tes → cutesy, Rutgers, uterus
etaword nt → blunt, burnt, grunt
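
The search can be sketched in Python as a filter over a word list. The sketch assumes that a word matches when its significant characters (the letters E, T, A, O, I, N, S and H, in either case) spell exactly the requested sequence, which is consistent with the examples above; the real tool reads the system dictionary, whereas here the words are passed in, and the function names are hypothetical.

```python
def significant(word: str) -> str:
    """Keep only the characters that ETA treats as instructions."""
    return "".join(c for c in word if c.lower() in "etaoinsh")


def etaword(sequence: str, words):
    """Return the words whose significant characters spell the sequence."""
    target = sequence.lower()
    return [word for word in words if significant(word).lower() == target]
```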

ETAinst

This is a truly trivial hack to find the significant characters in candidate words for an ETA program. This will tell you (for example) what your name does, so that you can interpolate it in an ETA program.

For example:

etainst Mike Taylor → ie, Tao
etainst Programming → oain
etainst Language → anae
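
The same filtering, applied word by word, reproduces the examples above. This Python sketch is an illustration and not the utility's source; etainst is used as the function name for convenience.

```python
def etainst(text: str):
    """For each word, return its significant characters, case preserved."""
    return [
        "".join(c for c in word if c.lower() in "etaoinsh")
        for word in text.split()
    ]
```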

[Your Utility Here]

There are plenty of other ETA-programming utilities yet to be written.

An obvious one would be the program that takes an unpadded ETA program (such as the output of EAS) and writes an equivalent program consisting entirely of words found in the system dictionary. A simple version of this program could be written just by making repeated calls to etaword; although a cleverer version would know something about grammar, and perhaps even about style and taste.

There are other possibilities, including (for example): a program to generate no-op sequences of ETA instructions for harmless inclusion in programs; a program to re-locate ETA code to start at a different line; and of course, a disassembler.

Creators of new ETA-programming utilities are encouraged to email them to the author (address above) for attributed inclusion in future releases of the ETA distribution.

Note 1
Disclaimer: in no way does the author think that Microsoft products are of a uniformly low quality in terms both of design and implementation. Nothing could be further from the truth than to assume that he spends large portions of the typical working day in a raging fury at arbitrary stupidities in Microsoft applications, nor that he routinely experiences several crashes per day. In fact, the author is extremely fond of the Microsoft corporation, and especially likes its lawyers.