Designed by Mike Taylor, beginning Tuesday 7th September 1999
Copyright © Mike Taylor, 1999.
This document may be freely redistributed provided it is not modified in any way. Specifically, the authorship must remain clear. All feedback is extremely welcome, and should be emailed to the author at mike@tecc.co.uk.
This document describes version 1.0 of EAS.
Document SCCSID("@(#)/export/home/staff/mike/src/language/eta/doc/SCCS/s.eas.html 1.6")
Although ETA is an elegant language, one or two of its features make it slightly difficult to write in: for example, the base-seven representation of numbers, using letters instead of digits; and the necessity to embed absolute line-number in the code as targets for the Transfer-execution instruction.
To ameliorate this problem, an ETA assembler, EAS, is provided: this reads a program in a simple syntax which includes line labels, decimal constants, character-value constants and file inclusion, and writes an equivalent pure-ETA program.
This document specifies the EAS syntax, gives some examples, and describes how to invoke the assembler.
Some familiarity with ETA is assumed.
The relationship between the EAS source code and the ETA code is deliberately very close, since EAS has the same relationship to ETA that (say) a 6502 assembler has to 6502 machine code. For example, each line of EAS (except blank lines and those which are blank after comment-stripping) yields a single, corresponding line of ETA.
It would be perfectly possible to write a compiler for a high-level language that produces ETA as its object code, but EAS isn't it. (Or here's a great idea: someone with a lot of time to kill could re-target GCC for the ETA virtual machine!)
Comments are introduced by a hash character (#) and continue to the end of the line.
The ETA instruction E, T, A, O, I, N, S and H may be represented by themselves, or by the keywords dividE, Transfer, Address, Output, Input, Number and Subtract respectively. The single-letter and whole-word forms may be freely intermixed.
All instructions are recognised case-insensitively, so (for example), divide, DIVIDE, Divide, dividE and DiViDe are all equivalent.
Many instructions may occur on a single line, but they must be separated by whitespace (i.e., a sequence of one or more SPACE and/or TAB characters.)
The N, or Number, instruction is unique in that it alone takes an argument from the EAS program text. The argument may follow immediately after the letter N if the single-letter form is used, but must be separated from the whole-word form by whitespace.
Numeric constants may be expressed in three ways:
Labels are provided as an abstraction of addresses, so that portions of code can be relocated without needing to re-count lines.
A line may be labelled by prefixing it with the sequence >NAME: (i.e. greater-than, label name, colon.) There is no limit to the length of labels; they may be contain any non-white characters other than the colon.
A label may not be defined more than once.
A line may carry multiple labels (e.g. >FOO: >BAR: code). This is useful primarily when defining functions: the first line of a function always needs a public name, used as the entry point; and sometimes also needs a private name, used only within the text of the function.
Labels may be used as the argument to the N instruction as in N<NAME (i.e. less-than, label name)
Labels may be used both before and after definition.
A line starting with a star (*) requests the inclusion at that point of the file whose name immediately follows the star. No intervening whitespace should be used. Compilation proceeds as though the starred line were replaced by the contents of the named file.
Included files may include other files, and so on ad infinitum.
Labels defined in an included file may be used in the including file, and vice versa. The same label may not be defined in more than one file contributing to a single compilation.
The paths of included files are always interpreted relative to the working directory of the EAS process rather than (for example) relative to a well-known library directory, or relative the location of the file containing the inclusion request. This may be a bug. Ask me again next week.
The ETA Assembler is invoked as follows:
eas [ -d ] [ -O ] [ eas-file [ eas-file2 ... ] ]
It assembles the eas-files named on the command-line, concatenating them into a single program if there are more than one, and reading the program from standard input if no files are named. It writes the assembled ETA program to standard output; so it can be used as a filter.
The options have the following effects:
The debugging output contains details such as the line numbers assigned to each label; it is unlikely to be of interest to anyone except the maintainer of the assembler.
Since most ETA programs need access to the same basic facilities such as multiplication, numeric input, etc., there is an obvious need for a standard library of EAS files which can provide these facilities. The result is a set of EAS source files, distributed along with the assembler itself, which have well-known filenames (for inclusion), and define well-known function names (labels) for calling. This section describes the standard EAS library.
It would be nice also to provide these routines in the form of compiled ETA code, ideally padded to make good prose or poetry. Unfortunately, compiled ETA code will in general be different each time it's used, since it will be Transferring to addresses in itself and in other standard routines which may be at different locations. We could ameliorate (but not solve) this problem by getting EAS to generate position-independent code by calculating Transfer addresses relative to the current address. In the mean time, the standard library is provided in EAS form only.
Functions defined in the standard library must be named exactly the same as the files that contain them except that they must be all in upper case (and of course omit the .eas extension.) For example, a standard library file called multiply.eas must define function labelled MULTIPLY.
All the labels used within a function labelled FOO, say, must begin with FOO, followed by a lower-case letter or underscore, followed by any sequence of upper- and lower-case letters and underscores - there is no length limit. For example, the MULTIPLY function might internally use the labels MULTIPLYloop and MULTIPLY_done.
These conventions mean that a standard library file called foo.eas, say, has complete control over the namespace of labels beginning with FOO and not followed by another capital letter; so no clash could arise from a pair of standard library files called foo.eas and food.eas, for example.
No, because in the case where a top-level program includes two or more complex routines, each of which includes the same lower-level one, two or more copies of the low-level routine would be included. Then before you know it, you're committing Microsoft-level stupidities like having seventy-odd copies of the getchar() code in MS-Word1.
It's better, though admittedly clumsy, to require the top-level program to include all the prerequisite functions of the functions it uses. At least the assembler helps with this, by complaining when a used label is not defined.
# multiply.eas -- multiply two numbers # readnum.eas -- read a decimal number # writenum.eas -- write a decimal integer
The next line should be a comment containing version-control information, if appropriate. Examples:
# SCCSID("@(#)/home/mike/eta/easpit/SCCS/s.writestr.eas 1.1") # SCCSID("@(#)/home/mike/eta/easpit/SCCS/s.writenum.eas 1.2")
If the file requires any other files to be included in order to provide lower-level routines which it uses, these should be listed, separated by commas, on a subsequent comment line beginning # Requires:. Examples:
# Requires: WRITESTR # Requires: READNUM, MULTIPLY
(These requirements may seem unnecessarily draconian, but they do facilitate automatic processing of the library files to produce tables like the one below.)
Here, then, are the routines currently provided in the standard library:
Label | Requires | Description |
---|---|---|
MULTIPLY | nothing | Multiplies together the top two numbers on the stack, leaving the product on top of the stack in their place. The multiplication takes time proportional to the argument on top of the stack (i.e. the second argument to be pushed), so (for example) N50 N5 N1 N<MULTIPLY T is about ten times faster than N5 N50 N1 N<MULTIPLY T. |
WRITESTR | nothing |
Writes to the standard output stream the characters that have
been pushed onto the stack, up to but not including a
terminating NUL character (zero), and consuming all the
characters including the NUL. No implicit newline character
is written: this must be done explicitly (N10 O) if
required.
Note that the characters must be stacked in reverse order from how they are to be displayed, as in N0 N'o N'l N'l N'e N'H N1 N<WRITESTR T. |
WRITENUM | WRITESTR | Writes to the standard output stream the decimal representation of the number on top of the stack, consuming it in the process. No implicit newline character is written: this must be done explicitly (N10 O) if required. |
READNUM | MULTIPLY | Reads a decimal integer from the standard input stream and leaves it on the stack. An optional leading sequence of space characters is consumed, followed by a sequence of digits and a terminating space or newline character. |
(Source code for some of these routines is provided below in the Tutorial Examples section.)
It's worth using the -d 4 option of the Reference ETA interpreter to get a debugging trace: before each instruction is run, this shows the current line-number, the contents of the stack, and the instruction itself.
It turns out to be very useful to formalise an invariant that is always true at the top of a loop. I always used to look down on this sort of computer-science-for-the-sake-of-it behaviour, with Bound Functions and Weakest Preconditions and all that, but I can see where it comes from now I'm programming a really primitive machine!
Another useful technique is to comment each line of your EAS program with a picture of how the stack's expected to look after it executes. This gives you something to compare the debugging output with.
Possible enhancements for the future include:
This section presents a sequence of twelve increasingly complex EAS code fragments - some entire programs, some functions - which together constitute a small but complete tutorial in the use of EAS to write Real World ETA programs. The later examples build on the earlier, to yield programs of some complexity.
This is a re-implementation of the Unix utility /bin/true, the purpose of which is to return a ``success'' exit-status to the operating system. No instructions at all are needed, since falling off the end of an ETA program is equivalent to an explicit Transfer to address zero, causing the interpreter to return ``success'' to the operating system. So here is the program in its entirety:
This program prints everyone's favourite message. The interleaving of
the Number and Output instructions is obviously pretty
arbitrary. I've chosen to avoid the fenceposts: the selected
interleaving is neither
n × (Number; Output)
nor
n × Number; n × Output.
An alternative approach would be push a marker character on to the
stack - NUL would be an obvious choice - followed by all the
characters of the message, then call a well-known function to output
the stacked characters: see below for this program.
This program demonstrates the use of the Transfer instruction
for conditionals (line 4), discarding unwanted values from the stack
(line 5), termination (line 6) and looping (line 7).
(Not all the features of the original CP/M pip are
implemented!)
Although this code performs perfectly good addition, its main purpose
is to demonstrate function definition. (After all, addition is pretty
easy in EAS: just subtract the negation of the number you want to add.
Make the negation by subtracting the number from zero.)
The prologue N2 H N2 H can be considered as a standard
idiom for entry to a two-argument function; equivalent sequences for
functions of zero, one, three or more arguments are obvious.
Similarly, the epilogue N1 N2 H T can be considered as a
standard idiom for exit from a function of a single return-value.
Here, we demonstrate how to call the addition function defined in the
previous example. Note that the arguments are pushed on the stack
before the call address, so the calling sequence itself
(N1 N<ADD T) can be considered as atomic.
(This program is not actually particularly useful as it stands, since
the numbers to be added are read as the ASCII values of consecutive
input characters, and the sum is written as the character with the
appropriate ASCII value.)
This is an unusual function, in that it is variadic. This means
that we can't use a standard prologue to roll the return address
down under the arguments. Instead we maintain the invariant that the
top of the stack is the return address, and the characters remaining
to be written are immediately below it.
This invariant is true on entry, so there is no prologue as such.
The epilogue is still called an epilogue, even though it's in the
middle of the code.
This re-implementation of the ``Hello, World!'' program uses the
WRITESTR function defined in the previous example. It merely
stacks up the characters of its message, then calls WRITESTR to write
them out.
In addition to being a very useful routine for real programs, this
demonstrates some fairly sophisticated stack management (with
extensive use of the Halibut instruction), and consequently,
the technique of using a semi-formal loop invariant and stack
pictures as an aid to development and debugging. This routine also
introduces the dividE instruction.
Here we multiply two numbers, x and y by starting with a
zero-valued accumulator and adding x to it y times.
This routine, or an equivalent, is all but indispensable when writing
real programs of any substance.
Note the use of the idiom
to skip a single line if the number on top of the stack is true. This
is a classic case of a non-trivial sequence of operations being
treated as an indivisible unit.
(This routine is an execution bottleneck for many EAS programs, since
it runs in O(arg2) time. A faster multiplication routine would
be a great boon: but is one even theoretically possible?)
This is written as a function of no arguments, returning a single
value.
The numeric input convention that we use is that any leading spaces
are skipped, then a sequence of non-spaces (assumed but not checked
to be decimal digits) is accumulated numerically, then a terminating
space is consumed. Pragmatic considerations dictate that a trailing
newline or EOF must be treated as an alternative terminator.
This is not perfect: we'd like to treat all whitespace equivalently
(but that would be dull); we'd like to check that the digits are
between 0 and 9 (but we have no inequality check); we'd like not to
consume the trailing space (but we don't have a PEEK or
UNGET) facility. Nevertheless, this is an invaluable
routine.
Note that the MULTIPLY routine, defined in a previous
example, is used here: so any program that includes this file must
also include multiply.eas
We make extensive use of function-calls here: first, to obtain the
number whose factorial we wish to calculate; second, to perform that
calculation - recursively as it happens; and third, to write the
result in a human-readable form.
This program, again making use of pre-defined routines, prints the
words to the classic song 99 Bottles of Beer on the Wall:
Note that the function BoBotW, which prints the message
number bottles of beer on the wall, works by calling
BoB to print the first part of the message, and appending the
latter part.
There is a wrapper, EASy, that assembles an EAS program and executes
the resulting ETA code. As a side-effect, it leaves the ETA program
in a file in the same directory as the EAS program, and whose name is
formed from its name by removing the .eas extension (if any)
and adding a .eta extension.
This is a trivial hack to find words from the system dictionary that
can be incorporated into an ETA program containing a known sequence
of significant characters. The command-line arguments are sequences
of consecutive instructions to be incorporated into single words.
For example:
This is a truly trivial hack to find the significant characters in
candidate words for an ETA program. This will tell you (for example)
what your name does, so that you can interpolate it in an ETA program.
For example:
There are plenty of other ETA-programming utilities yet to be
written.
An obvious one would be the program that takes an unpadded ETA program
(such as the output of EAS) and writes an equivalent program
consisting entirely of words found in the system dictionary. A simple
version of this program could be written just by making repeated calls
to etaword; although a cleverer version would know something
about grammar, and perhaps even about style and taste.
There are other possibilities, including (for example): a program to
generate no-op sequences of ETA instructions for harmless inclusion in
programs; a program to re-locate ETA code to start at a different
line; and of course, a disassembler.
Creators of new ETA-programming utilities are encouraged to email them
to the author (address above) for attributed inclusion in future
releases of the ETA distribution.
Hello, World!
Copying Input to Output
Function Definition
Function Call
Writing a String (Variadic Function)
A Better Implementation of Hello, World!
Writing a Number (yes, I know it's pathetic)
Multiplication
A N0 N1 S S T
Reading a Number
Factorial (Recursive)
99 Bottles of Beer on the Wall
Appendix: Additional ETA-Programming Tools
EASy
ETAword
etaword nen → drunken, gunmen
etaword tes → cutesy, Rutgers, uterus
etaword nt → blunt, burnt, grunt
ETAinst
etainst Mike Taylor → ie, Tao
etainst Programming → oain
etainst Language → anae
[Your Utility Here]
Note 1
Disclaimer: in no way does the author think that Microsoft products
are of a uniformly low quality in terms both of design and
implementation. Nothing could be further from the truth that to
assume that he spends large portions of the typical working day in a
raging fury at arbitrary stupidities in Microsoft applications, nor
that he routinely experiences several crashes per day. In fact, the
author is extremely fond of the Microsoft corporation, and especially
likes its lawyers.
[Back]