Designed by Mike Taylor, beginning Tuesday 7th September 1999
Copyright © Mike Taylor, 1999.
This document may be freely redistributed provided it is not modified in any way. Specifically, the authorship must remain clear. All feedback is extremely welcome, and should be emailed to the author at mike@tecc.co.uk.
This document describes version 1.0 of EAS.
Document SCCSID("@(#)/export/home/staff/mike/src/language/eta/doc/SCCS/s.eas.html 1.6")
Although ETA is an elegant language, one or two of its features make it slightly difficult to write in: for example, the base-seven representation of numbers, using letters instead of digits; and the necessity to embed absolute line-number in the code as targets for the Transfer-execution instruction.
To ameliorate this problem, an ETA assembler, EAS, is provided: this reads a program in a simple syntax which includes line labels, decimal constants, character-value constants and file inclusion, and writes an equivalent pure-ETA program.
This document specifies the EAS syntax, gives some examples, and describes how to invoke the assembler.
Some familiarity with ETA is assumed.
The relationship between the EAS source code and the ETA code is deliberately very close, since EAS has the same relationship to ETA that (say) a 6502 assembler has to 6502 machine code. For example, each line of EAS (except blank lines and those which are blank after comment-stripping) yields a single, corresponding line of ETA.
It would be perfectly possible to write a compiler for a high-level language that produces ETA as its object code, but EAS isn't it. (Or here's a great idea: someone with a lot of time to kill could re-target GCC for the ETA virtual machine!)
Comments are introduced by a hash character (#) and continue to the end of the line.
The ETA instruction E, T, A, O, I, N, S and H may be represented by themselves, or by the keywords dividE, Transfer, Address, Output, Input, Number and Subtract respectively. The single-letter and whole-word forms may be freely intermixed.
All instructions are recognised case-insensitively, so (for example), divide, DIVIDE, Divide, dividE and DiViDe are all equivalent.
Many instructions may occur on a single line, but they must be separated by whitespace (i.e., a sequence of one or more SPACE and/or TAB characters.)
The N, or Number, instruction is unique in that it alone takes an argument from the EAS program text. The argument may follow immediately after the letter N if the single-letter form is used, but must be separated from the whole-word form by whitespace.
Numeric constants may be expressed in three ways:
Labels are provided as an abstraction of addresses, so that portions of code can be relocated without needing to re-count lines.
A line may be labelled by prefixing it with the sequence >NAME: (i.e. greater-than, label name, colon.) There is no limit to the length of labels; they may be contain any non-white characters other than the colon.
A label may not be defined more than once.
A line may carry multiple labels (e.g. >FOO: >BAR: code). This is useful primarily when defining functions: the first line of a function always needs a public name, used as the entry point; and sometimes also needs a private name, used only within the text of the function.
Labels may be used as the argument to the N instruction as in N<NAME (i.e. less-than, label name)
Labels may be used both before and after definition.
A line starting with a star (*) requests the inclusion at that point of the file whose name immediately follows the star. No intervening whitespace should be used. Compilation proceeds as though the starred line were replaced by the contents of the named file.
Included files may include other files, and so on ad infinitum.
Labels defined in an included file may be used in the including file, and vice versa. The same label may not be defined in more than one file contributing to a single compilation.
The paths of included files are always interpreted relative to the working directory of the EAS process rather than (for example) relative to a well-known library directory, or relative the location of the file containing the inclusion request. This may be a bug. Ask me again next week.
The ETA Assembler is invoked as follows:
eas [ -d ] [ -O ] [ eas-file [ eas-file2 ... ] ]
It assembles the eas-files named on the command-line, concatenating them into a single program if there are more than one, and reading the program from standard input if no files are named. It writes the assembled ETA program to standard output; so it can be used as a filter.
The options have the following effects:
The debugging output contains details such as the line numbers assigned to each label; it is unlikely to be of interest to anyone except the maintainer of the assembler.
Since most ETA programs need access to the same basic facilities such as multiplication, numeric input, etc., there is an obvious need for a standard library of EAS files which can provide these facilities. The result is a set of EAS source files, distributed along with the assembler itself, which have well-known filenames (for inclusion), and define well-known function names (labels) for calling. This section describes the standard EAS library.
It would be nice also to provide these routines in the form of compiled ETA code, ideally padded to make good prose or poetry. Unfortunately, compiled ETA code will in general be different each time it's used, since it will be Transferring to addresses in itself and in other standard routines which may be at different locations. We could ameliorate (but not solve) this problem by getting EAS to generate position-independent code by calculating Transfer addresses relative to the current address. In the mean time, the standard library is provided in EAS form only.
Functions defined in the standard library must be named exactly the same as the files that contain them except that they must be all in upper case (and of course omit the .eas extension.) For example, a standard library file called multiply.eas must define function labelled MULTIPLY.
All the labels used within a function labelled FOO, say, must begin with FOO, followed by a lower-case letter or underscore, followed by any sequence of upper- and lower-case letters and underscores - there is no length limit. For example, the MULTIPLY function might internally use the labels MULTIPLYloop and MULTIPLY_done.
These conventions mean that a standard library file called foo.eas, say, has complete control over the namespace of labels beginning with FOO and not followed by another capital letter; so no clash could arise from a pair of standard library files called foo.eas and food.eas, for example.
No, because in the case where a top-level program includes two or more complex routines, each of which includes the same lower-level one, two or more copies of the low-level routine would be included. Then before you know it, you're committing Microsoft-level stupidities like having seventy-odd copies of the getchar() code in MS-Word1.
It's better, though admittedly clumsy, to require the top-level program to include all the prerequisite functions of the functions it uses. At least the assembler helps with this, by complaining when a used label is not defined.
# multiply.eas -- multiply two numbers # readnum.eas -- read a decimal number # writenum.eas -- write a decimal integer
The next line should be a comment containing version-control information, if appropriate. Examples:
# SCCSID("@(#)/home/mike/eta/easpit/SCCS/s.writestr.eas 1.1") # SCCSID("@(#)/home/mike/eta/easpit/SCCS/s.writenum.eas 1.2")
If the file requires any other files to be included in order to provide lower-level routines which it uses, these should be listed, separated by commas, on a subsequent comment line beginning # Requires:. Examples:
# Requires: WRITESTR # Requires: READNUM, MULTIPLY
(These requirements may seem unnecessarily draconian, but they do facilitate automatic processing of the library files to produce tables like the one below.)
Here, then, are the routines currently provided in the standard library:
Label | Requires | Description |
---|---|---|
MULTIPLY | nothing | Multiplies together the top two numbers on the stack, leaving the product on top of the stack in their place. The multiplication takes time proportional to the argument on top of the stack (i.e. the second argument to be pushed), so (for example) N50 N5 N1 N<MULTIPLY T is about ten times faster than N5 N50 N1 N<MULTIPLY T. |
WRITESTR | nothing |
Writes to the standard output stream the characters that have
been pushed onto the stack, up to but not including a
terminating NUL character (zero), and consuming all the
characters including the NUL. No implicit newline character
is written: this must be done explicitly (N10 O) if
required.
Note that the characters must be stacked in reverse order from how they are to be displayed, as in N0 N'o N'l N'l N'e N'H N1 N<WRITESTR T. |
WRITENUM | WRITESTR | Writes to the standard output stream the decimal representation of the number on top of the stack, consuming it in the process. No implicit newline character is written: this must be done explicitly (N10 O) if required. |
READNUM | MULTIPLY | Reads a decimal integer from the standard input stream and leaves it on the stack. An optional leading sequence of space characters is consumed, followed by a sequence of digits and a terminating space or newline character. |
(Source code for some of these routines is provided below in the Tutorial Examples section.)
It's worth using the -d 4 option of the Reference ETA interpreter to get a debugging trace: before each instruction is run, this shows the current line-number, the contents of the stack, and the instruction itself.
It turns out to be very useful to formalise an invariant that is always true at the top of a loop. I always used to look down on this sort of computer-science-for-the-sake-of-it behaviour, with Bound Functions and Weakest Preconditions and all that, but I can see where it comes from now I'm programming a really primitive machine!
Another useful technique is to comment each line of your EAS program with a picture of how the stack's expected to look after it executes. This gives you something to compare the debugging output with.
Possible enhancements for the future include:
This section presents a sequence of twelve increasingly complex EAS code fragments - some entire programs, some functions - which together constitute a small but complete tutorial in the use of EAS to write Real World ETA programs. The later examples build on the earlier, to yield programs of some complexity.
This is a re-implementation of the Unix utility /bin/true, the purpose of which is to return a ``success'' exit-status to the operating system. No instructions at all are needed, since falling off the end of an ETA program is equivalent to an explicit Transfer to address zero, causing the interpreter to return ``success'' to the operating system. So here is the program in its entirety:
# true.eas -- /bin/true reimplemented in ETA Assembler. # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.true.eas 1.1")
This program prints everyone's favourite message. The interleaving of the Number and Output instructions is obviously pretty arbitrary. I've chosen to avoid the fenceposts: the selected interleaving is neither n × (Number; Output) nor n × Number; n × Output.
# hello.eas -- the "Hello, world!" program in ETA Assembler. # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.hello.eas 1.1") Number 32 # can't quote a space character: use ASCII value Number ', Number 'o Number 'l Number 'l Number 'e Number 'H Output Output Output Output Output Output Output Number 10 # can't quote a newline either Number '! Number 'd Number 'l Number 'r Number 'o Number 'w Output Output Output Output Output Output Output
An alternative approach would be push a marker character on to the stack - NUL would be an obvious choice - followed by all the characters of the message, then call a well-known function to output the stacked characters: see below for this program.
This program demonstrates the use of the Transfer instruction for conditionals (line 4), discarding unwanted values from the stack (line 5), termination (line 6) and looping (line 7).
# pip.eas -- CP/M's "pip" program, copies input to output. # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.pip.eas 1.1") >LOOP: Input # Read a character Number 0 Halibut # Duplicate it for the test Number 0 Number 1 Subtract Subtract # Compare with -1 (EOF) Number <WRITE Transfer # If not equal, continue ... N 0 N 1 H T # ... otherwise discard -1 Number 1 Number 0 Transfer # ... and make a successful exit >WRITE: Output # Copy the character Number 1 Number <LOOP Transfer # Go back and read the next one
(Not all the features of the original CP/M pip are implemented!)
Although this code performs perfectly good addition, its main purpose is to demonstrate function definition. (After all, addition is pretty easy in EAS: just subtract the negation of the number you want to add. Make the negation by subtracting the number from zero.)
# function.eas -- define a function to add two numbers together. # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.function.eas 1.1") # On entry, stack = ... x y addr # Prologue: roll args above return address >ADD: Number 2 Halibut # ... y addr x Number 2 Halibut # ... addr x y # Add the numbers: subtract y from zero, then the result from a. Number 0 # ... addr x y 0 Number 1 Halibut # ... addr x 0 y Subtract # ... addr x -y Subtract # ... addr x+y # Epilogue: tidy up stack to leave result on top, then return Number 1 # ... addr x+y TRUE Number 2 Halibut # ... x+y TRUE addr Transfer # Return to addr, leaving acc on stack
The prologue N2 H N2 H can be considered as a standard idiom for entry to a two-argument function; equivalent sequences for functions of zero, one, three or more arguments are obvious.
Similarly, the epilogue N1 N2 H T can be considered as a standard idiom for exit from a function of a single return-value.
Here, we demonstrate how to call the addition function defined in the previous example. Note that the arguments are pushed on the stack before the call address, so the calling sequence itself (N1 N<ADD T) can be considered as atomic.
# add.eas -- test-harness for addition function. # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.add.eas 1.1") Input # stack: x Input # stack: x y Address N 1 N <ADD Transfer Output # print result Number 1 Number 0 Transfer 0 # jump to location zero, i.e. stop # include the addition function *function.eas
(This program is not actually particularly useful as it stands, since the numbers to be added are read as the ASCII values of consecutive input characters, and the sum is written as the character with the appropriate ASCII value.)
This is an unusual function, in that it is variadic. This means that we can't use a standard prologue to roll the return address down under the arguments. Instead we maintain the invariant that the top of the stack is the return address, and the characters remaining to be written are immediately below it.
This invariant is true on entry, so there is no prologue as such.
# writestr.eas -- write a NUL-terminated string # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.writestr.eas 1.1") >WRITESTR: N1 Halibut # move next character above addr N0 Halibut # duplicate it for the test below Address N0 N1 S S T # skip if non-zero # We only get here if we've seen the NUL, and by that point, the stack # is: ... <addr> NUL. We need to strip the trailing NUL and return. # This epilogue is in the middle of the function (why not?) Subtract N1 N1 Halibut Transfer # The character is on top of the stack output and loop. # (Or think of it as tail-recursion if you're a Lisp hacker :-) Output N1 N<WRITESTR Transfer
The epilogue is still called an epilogue, even though it's in the middle of the code.
This re-implementation of the ``Hello, World!'' program uses the WRITESTR function defined in the previous example. It merely stacks up the characters of its message, then calls WRITESTR to write them out.
# hello2.eas -- A better "Hello, world!" program, using an output function. # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.hello2.eas 1.1") N0 N10 N'! N'd N'l N'r N'o N'w N32 N', N'o N'l N'l N'e N'H Address N1 N<WRITESTR Transfer N1 N0 T # include the string-output function *writestr.eas
In addition to being a very useful routine for real programs, this demonstrates some fairly sophisticated stack management (with extensive use of the Halibut instruction), and consequently, the technique of using a semi-formal loop invariant and stack pictures as an aid to development and debugging. This routine also introduces the dividE instruction.
# writenum.eas -- write a decimal integer # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.writenum.eas 1.2") # # Requires: WRITESTR # Prologue (called with arg). # Start state # ... arg addr >WRITENUM: N0 N2 Halibut # ... addr NUL arg N0 H A N0 N1 S S T # skip to real start if non-zero N'0 O T N1 N1 Halibut Transfer # special case for arg=0 # Invariant: stack contains a NUL, then the ASCII codes of the digits # rendered so far (least significant first), then the remaining part # of the number being rendered. >WRITENUMloop: N0 Halibut # ... addr NUL <digits> arg arg Address N0 N1 S S Transfer # if not done, skip a line S N1 N<WRITENUMdone Transfer # done: strip NUL, jump to output phase N10 dividE # find next digit N0 N'0 Subtract Subtract # calculate its ASCII code # Stack now contains \0 <digits> quotient ASCII(remainder) # We just need to swap the last two to attain the invariant above. N1 Halibut # swap N1 N<WRITENUMloop Transfer # next iteration # Call existing code to output the accumulated digits. >WRITENUMdone: A N1 N<WRITESTR Transfer # Epilogue: return to the stacked address N1 N1 Halibut Transfer
Here we multiply two numbers, x and y by starting with a zero-valued accumulator and adding x to it y times. This routine, or an equivalent, is all but indispensable when writing real programs of any substance.
# multiply.eas -- multiply two numbers # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.multiply.eas 1.1") # # Caveat callor: since this loops on the second argument to be pushed, # it should be minimised where possible: it's (roughly) a hundred # times faster to multiply 200*2 than 2*200. # Prologue for a function of two arguments # Initial state: # ... x y addr >MULTIPLY: N2 H N2 H # ... addr x y # Create an accumulator (initially zero), push it below the working numbers. N0 # ... addr x y acc N2 Halibut # ... addr y acc x N2 Halibut # ... addr acc x y # Loop invariant: stack is of the form: ... addr acc x y' # Where y' is gradually decremented, and acc increases by x each time. >MULTIPLYloop: N0 Halibut # dup A N0 N1 S S T # if not done, skip a line N1 N<MULTIPLYdone Transfer # done: jump to end N1 Subtract # ... addr acc x y (but y--) N2 Halibut # ... addr x y acc N0 N0 N3 Subtract Halibut # ... addr x y acc 0 x Subtract Subtract # ... addr x y acc (but acc+=x) N2 Halibut # ... addr y acc x N2 Halibut # ... addr acc x y N1 N<MULTIPLYloop Transfer # next # Stack is: ... addr acc x y(=0); we have to discard two values >MULTIPLYdone: N1 H T # ... addr acc # Standard function-call epilogue. N1 N2 H T
Note the use of the idiom
A N0 N1 S S T
to skip a single line if the number on top of the stack is true. This is a classic case of a non-trivial sequence of operations being treated as an indivisible unit.
(This routine is an execution bottleneck for many EAS programs, since it runs in O(arg2) time. A faster multiplication routine would be a great boon: but is one even theoretically possible?)
This is written as a function of no arguments, returning a single value.
The numeric input convention that we use is that any leading spaces are skipped, then a sequence of non-spaces (assumed but not checked to be decimal digits) is accumulated numerically, then a terminating space is consumed. Pragmatic considerations dictate that a trailing newline or EOF must be treated as an alternative terminator.
This is not perfect: we'd like to treat all whitespace equivalently (but that would be dull); we'd like to check that the digits are between 0 and 9 (but we have no inequality check); we'd like not to consume the trailing space (but we don't have a PEEK or UNGET) facility. Nevertheless, this is an invaluable routine.
# readnum.eas -- read a decimal number # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.readnum.eas 1.1") # # Requires: MULTIPLY # No input args => no function prologue # Skip leading spaces >READNUM: >READNUMspace: Input N0 Halibut N32 Subtract # duplicate input, compare to SPACE A N0 N1 S S Transfer # Skip if non-zero (i.e. != ' ') N0 N1 T N<READNUMspace T # discard NUL and loop # Initialise accumulator and push it below initial digit N0 N1 Halibut # Accumulate digits: we already have the first >READNUMloop: N'0' Subtract # convert char to number N1 Halibut # stack: num acc N10 A N1 N<MULTIPLY Transfer # multiply by 10 N0 N1 Halibut # stack: num 0 10*acc Subtract Subtract # add accumular back onto new digit Input N0 H N32 S N<READNUMnl T # if not a space, continue N1 N<READNUMdone T # it _was_ a space: out of here >READNUMnl: N0 H N10 S N<READNUMeof T # if not a newline, continue N1 N<READNUMdone T # it _was_ a newline: out of here >READNUMeof: N0 H N0 N1 S S N<READNUMloop T # if not a EOF, loop # Otherwise, we're finished: discard spare space character >READNUMdone: N0 N1 H T # epilogue N1 N2 H T
Note that the MULTIPLY routine, defined in a previous example, is used here: so any program that includes this file must also include multiply.eas
We make extensive use of function-calls here: first, to obtain the number whose factorial we wish to calculate; second, to perform that calculation - recursively as it happens; and third, to write the result in a human-readable form.
# fact.eas -- factorial, via recursion # SCCSID("%Z%%P% %I%") # Caller A N1 N<READNUM T # obtain argument A N1 N<FACT T # call factorial function A N1 N<WRITENUM T # print result N10 O N1 N0 T # print a newline and exit # One input arg: function prologue >FACT: N1 H # roll argument above return address # ... addr arg # Function body # Factorial of 1 is just 1 N0 H # duplicate top for test # ... addr arg arg N1 S A N0 N1 S S T # skip if non-zero (ie. top wasn't 1) # ... addr arg arg-1 here+1 T N1 N<FACTdone T # argument _was_ 1: we're finished # ... addr arg N0 H # dup # ... addr arg arg N1 S # stack is now: ... addr n n-1 # ... addr arg arg-1 A N1 N<FACT T N1 H # recursive call (tail recursion!) # ... addr fact(arg-1) arg A N1 N<MULTIPLY T # multiply Nby fact(n-1) # ... addr fact(arg-1)*arg # result is on the stack: fall through to the epilogue # Finished: function epilogue >FACTdone: N1 N2 H T *readnum.eas *writenum.eas *multiply.eas *writestr.eas
This program, again making use of pre-defined routines, prints the words to the classic song 99 Bottles of Beer on the Wall:
# bottles.eas - sing "99 bottles of beer on the wall" # SCCSID("@(#)/export/home/staff/mike/src/language/eta/easpit/SCCS/s.bottles.eas 1.2") # Entry point: jump past included and defined routines N1 N<MAIN T # Standard library routines *writestr.eas *writenum.eas # Subroutine: write "<n> bottles of beer" # Sole argument: number of bottles. >nBoB: N1 H # prologue A N1 N<WRITENUM T N0 N'r N'e N'e N'b N32 N'f N'o N32 N's N'e N'l N't N't N'o N'b N32 A N1 N<WRITESTR T N1 N1 H T # Subroutine: write "<n> bottles of beer on the wall" # Sole argument: number of bottles. >nBoBotW: N1 H # prologue A N1 N<nBoB T N0 N'l N'l N'a N'w N32 N'e N'h N't N32 N'n N'o N32 A N1 N<WRITESTR T N1 N1 H T >MAIN: N99 # Stack invariant is trivial: top is number of bottles left. >LOOP: N0 H A N1 N<nBoBotW T N', O N32 O N0 H A N1 N<nBoB T N10 O N0 N10 N'd N'n N'u N'o N'r N'a N32 N't N'i N32 N's N's N'a N'p N32 N', N'n N'w N'o N'd N32 N'e N'n N'o N32 N'e N'k N'a N'T A N1 N<WRITESTR T N1 S N0 H A N1 N<nBoBotW T N10 N10 O O N0 H N<LOOP T # Done: tidy up by dropping the spare 0 from the stack N0 T
Note that the function BoBotW, which prints the message number bottles of beer on the wall, works by calling BoB to print the first part of the message, and appending the latter part.
There is a wrapper, EASy, that assembles an EAS program and executes the resulting ETA code. As a side-effect, it leaves the ETA program in a file in the same directory as the EAS program, and whose name is formed from its name by removing the .eas extension (if any) and adding a .eta extension.
This is a trivial hack to find words from the system dictionary that can be incorporated into an ETA program containing a known sequence of significant characters. The command-line arguments are sequences of consecutive instructions to be incorporated into single words.
For example:
etaword nen → drunken, gunmen
etaword tes → cutesy, Rutgers, uterus
etaword nt → blunt, burnt, grunt
This is a truly trivial hack to find the significant characters in candidate words for an ETA program. This will tell you (for example) what your name does, so that you can interpolate it in an ETA program.
For example:
etainst Mike Taylor → ie, Tao
etainst Programming → oain
etainst Language → anae
There are plenty of other ETA-programming utilities yet to be written.
An obvious one would be the program that takes an unpadded ETA program (such as the output of EAS) and writes an equivalent program consisting entirely of words found in the system dictionary. A simple version of this program could be written just by making repeated calls to etaword; although a cleverer version would know something about grammar, and perhaps even about style and taste.
There are other possibilities, including (for example): a program to generate no-op sequences of ETA instructions for harmless inclusion in programs; a program to re-locate ETA code to start at a different line; and of course, a disassembler.
Creators of new ETA-programming utilities are encouraged to email them to the author (address above) for attributed inclusion in future releases of the ETA distribution.
Source filename: eas.html
Last update: Mon Sep 14 23:22:06 2009