For those following my original question, I went with the Hand Coding architectural approach, with some table-driven elements where common tables existed. That was the kind of information I was originally looking for.
I found the kind of book I was looking for in the end, though it's more about compilers and is fairly generic.
https://dl.acm.org/doi/pdf/10.5555/2737838 (Engineering a Compiler) - It discusses formal approaches that are relevant to both assemblers and compilers.
I went with a hand-coded approach with some direct coding and some table use, aiming at small size and efficiency as the objective.
The approach I used has a simple lexer with bare-minimum rules to separate tokens for the program to use. It has two modes - Whitespace Sensitive and Whitespace Insensitive - though the same code is used for both and only a switch changes the operation of the routine.
The lexer then drops the extracted tokens and other objects into a buffer, which provides a whitespace-free and consistent token for matching, along with an operator if one was found. It doesn't care what was found; it only performs this function, with the exception that special operators are recognized but are not otherwise recorded in either the buffer or the operator field.
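To make that concrete, here's a rough Python sketch of the idea (purely illustrative - the real thing is hand-coded z80, and the function name and operator set here are stand-ins I've picked for the example):

```python
# Illustrative sketch of the lexer described above: split off the next
# token, normalise away whitespace (in insensitive mode), and report a
# trailing operator if one was found.
OPERATORS = set(",()+")          # assumed operator set, for illustration only

def next_token(line, ws_sensitive=False):
    """Return (token, operator, rest_of_line)."""
    i = 0
    if not ws_sensitive:
        while i < len(line) and line[i] in " \t":   # skip leading whitespace
            i += 1
    start = i
    while i < len(line) and line[i] not in " \t" and line[i] not in OPERATORS:
        i += 1
    token = line[start:i].upper()   # consistent, whitespace-free token
    j = i
    if not ws_sensitive:
        while j < len(line) and line[j] in " \t":   # whitespace before operator
            j += 1
    op = ""
    if j < len(line) and line[j] in OPERATORS:
        op = line[j]
        j += 1
    return token, op, line[j:]
```

So `next_token("  ld a,5")` hands back `LD` with no operator, and a second call on the remainder hands back `A` with the comma.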
Token matching could have gone either way - tables or hand-coded - but a match is little more than a field-scanning exercise that returns with the zero flag set on a match, and matching is performed outside the initial token validation, so it made more sense to go hand-coded: with tables, matching against the table results would still have been required.
So once the initial command is pulled from the lexer, the pass routine calls a "scanning routine", which is itself just a series of calls followed by a hard-stop error generator. This works well for the fixed, finite set of expressions that z80 assembly requires. To simplify the explanation: Pass 1 calls Test-Instructions, which calls all the tests in turn, then errors and bombs out.
Something like this;
CALL TEST_LD
CALL TEST_INC
CALL TEST_DEC
;etc
JP ERROR_OUT
This reduces a test to a combination of a DB and a Test matching loop - eg,
LD: DB 2,'LD' ;( Table Component )
TEST_LD:
LD DE,LD
CALL TESTLOOP ;(Matching test - tests the buffer content and length against the stored example at the initial label. )
RET NZ
;Code here for what to do with LD
POP HL ;(Dispose of the ret to the scanning routine.)
RET ;(And return to the main program)
In the end, this was the most efficient structure I could find to break apart the tokens from the lexer and seemed to use the least code for any data structure analysis routine I could think of.
There are some tables where they're relevant - for example, the B,C,D,E,H,L,(HL),A table and the BC,DE,HL,AF/SP table. These work just like TESTLOOP but with tables, and return a value that I can splice into a root instruction. So, for example, it calls the 8-bit register checks, then the 16-bit register checks, then it goes looking for things like brackets, and finally assumes it's either a number, a label, or an error. Otherwise, there's a routine for every step of every instruction, which tests syntax and operator state at each level.
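As a rough Python illustration of the table idea (the helper names are mine, but the table order and bit fields are the standard z80 encodings):

```python
# The 8-bit register table encodes B..A as 0..7 in the z80 instruction
# encoding; the table position IS the value to splice into the opcode.
REG8  = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]   # encodes as 0..7
REG16 = ["BC", "DE", "HL", "SP"]   # 0..3 (AF replaces SP for PUSH/POP)

def reg8_value(tok):
    """Return the register's encoding, or None if not an 8-bit register."""
    return REG8.index(tok) if tok in REG8 else None

def encode_ld_r_r(dst, src):
    """Splice both register fields into the LD r,r' root opcode (0x40)."""
    return 0x40 | (reg8_value(dst) << 3) | reg8_value(src)
```

For example, splicing A (7) and B (0) into the LD root gives 0x78, the opcode for LD A,B.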
z80 assembly is a mess. Some instructions take a comma, some don't, and some can go either way, which really taxes this approach: without a table it's difficult to address cleanly, and I had to make exceptions in the code to deal with it. Which means complicated and messy code.
The stuff I didn't handle well was the bracketed forms - eg, (IX+d) or (NNNN) or similar - since they break the code structure. Other instruction sets, such as Intel's, have much nicer syntax for indirect modes. IX/IY processing, though, turned out to be surprisingly easy with this approach and was very code-efficient.
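For illustration, here's one way those bracketed forms could be peeled apart, sketched in Python (simplified and hypothetical - register-indirect forms like (HL) and symbolic displacements aren't handled, and the real code is hand-written z80):

```python
# Sketch of handling the awkward bracketed operands: strip the
# brackets, check for IX/IY plus a displacement, else treat the
# contents as a plain (NNNN) address.
def parse_indirect(operand):
    """Return ('IX'|'IY', displacement), ('MEM', address), or None."""
    if not (operand.startswith("(") and operand.endswith(")")):
        return None
    inner = operand[1:-1].strip()
    for reg in ("IX", "IY"):
        if inner.upper().startswith(reg):
            disp = inner[len(reg):].strip() or "+0"   # bare (IX) means +0
            return (reg, int(disp))                   # int() accepts "+5"/"-3"
    if inner.isdigit():
        return ("MEM", int(inner))                    # plain (NNNN) address
    return None                                       # not handled in this sketch
```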
Errors are handled in an unstructured way, since the path taken to find an error is unpredictable, so the error handler restores the stack to the program-entry default, generates the message, and drops straight back into the OS. Later I'll provide switch options to keep scanning the code for more errors instead of bombing out.
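In Python terms, the equivalent of resetting SP and jumping out is just letting an exception unwind to the entry point (illustrative only - the names and the toy syntax check are stand-ins):

```python
# Analogue of the unstructured error path: wherever an error is found,
# unwind straight back to the entry point. The z80 version resets SP to
# the program-entry value instead of unwinding stack frames.
class AsmError(Exception):
    pass

def assemble(lines):
    try:
        for n, line in enumerate(lines, 1):
            if line.startswith("XYZZY"):          # stand-in for any syntax test
                raise AsmError("line %d: bad instruction" % n)
    except AsmError as e:
        return str(e)     # generate the message and drop back to the "OS"
    return "OK"
```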
Overall, it's a very simple assembler architecture... The interesting thing is that as I'm writing it, I'm finding a real need for some of its capabilities - eg, conditional assembly, macros, etc. - which my original cross-assembler doesn't support. It's strange how I never needed those functions before I wrote this. Anyway, that's roughly what it looks like at a high level.
I thought I'd share, since it's an interesting way to build an assembler for z80 under CP/M... Though I'm going to need some comparable code samples and existing CP/M assemblers to test against for speed comparison purposes. If anyone has any ideas, I'd welcome them. I'm not really sure how fast an assembler should be. My hazy memory tells me they weren't that fast back in the 80s, though.
David