Writing Assemblers... What should a good assembler do?

cjs · Apr 10, 2024

cj7hawk said:
I will check out RunCPM when I get some time. Is there a PC x64 installer / binaries available, or is it a case of always needing to build an appropriate environment to assemble it?

I don't know of binaries, but just checking out the repo and typing make posix build worked fine in my PC x64 environment. It's quite small (it compiled in literally seconds, even on my slow old laptop), so there are probably few if any dependencies beyond what's in the Debian/Ubuntu build-essentials package or the equivalent for whatever OS you're using.

Also, there's just the one binary, which can be run from anywhere, so there's no "install" to do for that. The only setup you need to do is to have your CP/M files in a directory A/0/ under the current working directory when start it.

cj7hawk said:
Not really. It just seems impractical to use a z80 assembler when I have a perfectly good cross assembler that is more suited to some of the task.

Well, I'm not clear on how writing a whole second assembler in a vastly different language, testing it to make sure it does even roughly the same thing as the first assembler, and continuing to maintain and update the second assembler with every change to the first, is "practical." But each to his own, I suppose.

cj7hawk · Apr 10, 2024

cjs said:
Well, I'm not clear on how writing a whole second assembler in a vastly different language, testing it to make sure it does even roughly the same thing as the first assembler, and continuing to maintain and update the second assembler with every change to the first, is "practical." But each to his own, I suppose.

I picked up a surprising number of bugs this way when using the full assembly file that didn't show when using small specific test assembly files...

It wasn't a planned activity. It was a response to the kinds of problems I was hitting when rewriting large sections of code.

cjs · Apr 10, 2024

cj7hawk said:
I picked up a surprising number of bugs this way when using the full assembly file that didn't show when using small specific test assembly files...

I am not surprised; from what I've read to date, it doesn't sound as if you have a comprehensive test suite.

But while picking up bugs is all well and good, I still don't see any advantage of having to pick up bugs twice, once in your assembler written in Z80 and once again in your second assembler written in BASIC. That's just twice as much testing because you have two separate programs that (you hope) do the same thing.

Svenska · Apr 11, 2024

cj7hawk said:
I can only assume your CP/M emulator is way faster than the one I wrote, which is only a few times faster than a fast real-z80 based system.

My i8080 emulator runs the (comprehensive) 8080 test suite about 260x faster than a true 8080 would (~42 seconds instead of >3 hours), on a decade-old laptop. A modern laptop improves this by another large factor because of improvements in modern processors.

If you are concerned about emulator speed, you are doing something seriously wrong. Keep in mind that assemblers are not known for doing heavy stuff.

cj7hawk said:
A simple goal is that the assembler must be able to assemble itself -

A noble goal, but one I would never consider for something like this. You are essentially wasting a lot of your own time to produce surprisingly low-quality code. As for the next point...

cj7hawk said:
To create a cross-assembler / native-assembler pair

...that is even less understandable to me. You are writing two buggy assemblers which are at most partially compatible with each other (and nothing else).

A smart approach would have been to at least reuse as much code as possible in the CP/M and Windows versions, to reduce the number of bugs you will cause. But since you are writing one of them in Z80 assembler (and the other one most likely not), you can't even do that. Conclusion: Your assemblers will not even be bug-compatible.

Have you thought about code revisioning and automated regression testing?

cj7hawk said:
As a rule of thumb the labels typically consume around the same memory as the assembled size, so source that generates a 16K ROM will need around 16 to 20K of system memory if long labels are used.

That sounds like a lot of memory, but understandable if you like long identifiers. They bloat the symbol table.

cj7hawk · Apr 11, 2024

Svenska said:
A smart approach would have been to at least reuse as much code as possible in the CP/M and Windows versions, to reduce the number of bugs you will cause. But since you are writing one of them in Z80 assembler (and the other one most likely not), you can't even do that. Conclusion: Your assemblers will not even be bug-compatible.

Writing both in a high level language would have been impractical. They did that in the 80s. The result is a binary three times as big as what I presently have, for similar functionality.

That's what I'm trying to avoid.

Svenska said:
Have you thought about code revisioning and automated regression testing?

For the moment, I'm just aiming at function. I do hit some problems, but the testing tends to pick it up pretty quickly.

cj7hawk · Apr 11, 2024

cjs said:
I am not surprised; from what I've read to date, it doesn't sound as if you have a comprehensive test suite.

But while picking up bugs is all well and good, I still don't see any advantage of having to pick up bugs twice, once in your assembler written in Z80 and once again in your second assembler written in BASIC. That's just twice as much testing because you have two separate programs that (you hope) do the same thing.

I tried RunCPM.

It doesn't do what the label suggests it should and it doesn't compile under Windows 11 as per it's own instructions, so it's not fit for my purposes. I loaded the recommended TDM GCC and there's just too much missing from Windows 11 to compile it, and the instructions don't bridge the gap.

I looked around and found others had the same issues, then found some Windows binaries created by someone called Guidol who I've encountered on other forums, and I managed to get those to work (thank you Guidol), but file operation was *very* slow - way slower than my emulator for just basic file commands ( maybe intentionally? I don't know. It didn't seem quite slow enough for that ) then my program couldn't access the disk and I found it had issues with direct disk access, so I gave up on it for the moment.

cjs · Apr 11, 2024

cj7hawk said:
Writing both in a high level language would have been impractical. They did that in the 80s. The result is a binary three times as big as what I presently have, for similar functionality.

That's what I'm trying to avoid.

Well, it's easily avoided. Write one in Z80 assembler, and use that on both native Z80 CP/M systems and under emulation on modern platforms. As two of us have now said. (It should tell you something that Svenska's response is basically telling you the same things that I have. We're not throwing out crazy ideas here; we're telling you the way software is normally developed in the modern age. (And even back in the '70s when people had enough machine power: the original MS-BASIC was extensively tested under simulation on a PDP-10 long before it was ever run on an actual Altair.)

cj7hawk said:
For the moment, I'm just aiming at function. I do hit some problems, but the testing tends to pick it up pretty quickly.

Yeah, that's the slow and painful way to do it. Building a test suite seems like a lot of work (especially if you've not experienced with doing that sort of thing), but that work hits a break-even point very quickly (within days, or a couple of weeks at worst) and then lets you move much faster on development from that point on.

cj7hawk said:
[RunCPM] doesn't do what the label suggests it should and it doesn't compile under Windows 11 as per it's own instructions, so it's not fit for my purposes. I loaded the recommended TDM GCC and there's just too much missing from Windows 11 to compile it, and the instructions don't bridge the gap.

Well, I've not tried building or using it on Windows, but are you sure you really had TDM GCC and MINGW set up correctly? (Have you used these before?) And, if you insist on using Windows, why do a MINGW build, anyway? The instructions start with, "RunCPM builds natively on Visual Studio," which I understand is the standard tool for building native binaries under Windows.

cj7hawk said:
I looked around and found others had the same issues, then found some Windows binaries created by someone called Guidol who I've encountered on other forums, and I managed to get those to work (thank you Guidol), but file operation was *very* slow - way slower than my emulator for just basic file commands ( maybe intentionally? I don't know. It didn't seem quite slow enough for that ) then my program couldn't access the disk and I found it had issues with direct disk access, so I gave up on it for the moment.

I would suspect a bad build, or something else weird going wrong. I've not done a huge number of file operations under RunCPM yet, but from just a little playing about with it I did not notice any file speed problems with my Linux build. But I'll try to do some testing later.

cjs · Apr 11, 2024

I just consed up a quick, rough test of file I/O on RunCPM: copy a 512 KB file to a new file. This takes less than 100 ms on one core of a decade-old laptop (ThinkPad X230).

I've attached here the script and the log output.

Svenska · Apr 11, 2024

cj7hawk said:
Writing both in a high level language would have been impractical. They did that in the 80s. The result is a binary three times as big as what I presently have, for similar functionality.

Compiler technology has improved substantially since the early 80s, and so have the high-level languages themselves. Remember the video I linked you in the beginning? His ASM.COM reimplementation (done in C) is of comparable size, similar functionality and likely faster than DRI's implementation.

In assembly language, more advanced algorithms - such as hash tables or graphs - are much harder to develop and debug, so they are often avoided. Naive solutions are easy to understand, they rule the code. Doing so also avoids the advantages of better algorithms, and those savings can be substantial, in both binary size and speed.

cj7hawk said:
For the moment, I'm just aiming at function. I do hit some problems, but the testing tends to pick it up pretty quickly.

For trivial bugs, that is often the case, but it doesn't work for more complex failure modes.

cjs · Apr 11, 2024

cj7hawk said:
For the moment, I'm just aiming at function. I do hit some problems, but the testing tends to pick it up pretty quickly.

Svenska said:
For trivial bugs, that is often the case, but it doesn't work for more complex failure modes.

Also, what I've found particularly useful about having a good automated test suite is that it quickly and easily identifies regressions (i.e., things that used to be working that no longer are). My experience has been that this makes it much easier to fix bugs with confidence, refactor, and generally keep your code clean, since you don't have to worry about (re-)introducing bugs when changing and refactoring existing code.

Bruce Tomlin · Apr 15, 2024

cjs said:
Yeah, that's the slow and painful way to do it. Building a test suite seems like a lot of work (especially if you've not experienced with doing that sort of thing), but that work hits a break-even point very quickly (within days, or a couple of weeks at worst) and then lets you move much faster on development from that point on.

A test suite could be as simple as a few .asm files and .hex files generated from them. You then manually run a script that runs the current version and compares that it produces the same .hex files. In some ways it can be a rather leaky test, but you're still testing something, so it makes sure that the core functionality still works.

cj7hawk · Apr 15, 2024

Bruce Tomlin said:
A test suite could be as simple as a few .asm files and .hex files generated from them. You then manually run a script that runs the current version and compares that it produces the same .hex files. In some ways it can be a rather leaky test, but you're still testing something, so it makes sure that the core functionality still works.

I did do that at the beginning since I had to test instructions to make sure they were assembling correctly. I still use those files from time to time, but I don't automate that. I just do it in between major revisions since I'm still finalising the label manipulation directives, improving access to local variables, looking to add arguments to includes and finalising testing of the Macros...

Though I haven't had a moment to code it for a couple of weeks due to work commitments...

But more recently I'm testing assembler directives, so I didn't keep expanding those files, rather I just use the files I need. I found shorted files were easier to process. Also, when I do large debugs, then the debug file is proportional to the test ASM file - Another reason I want a z80 and Windows 10/11 version.

The architecture of the assembly file of the assembler means that once something is assembling an instruction correctly, it's unlikely to assemble it incorrectly without erroring-out. The structure of the lexical analyser and assembler is like that. And if I introduce something that messes with instructions (eg, order of execution ) it is going to show up on just about all test ASM files, not just an instructions specific one.

There are some quirks, but correct assembly of opcodes is not one of the issues I've had since locking that code down. And even if it's a single file, it's very modular and the few times I do introduce a new bug that I didn't notice, I can quickly go back to when it last tested correctly and work out where it went wrong.

I have tested it from time to time and the kinds of bugs I get are all around how it processes stuff, labels etc. Which makes sense since I use labels for everything, but even then, it fails hard with an error, rather than introducing something into the code.

I'll make some space on my desk in the coming weeks so I can test it on my Amstrad ( The test files are a bit big for the Osborne ) to make sure I haven't introduced any weird OS related bugs.

Though I suspect I'm going to need to write some very custom test files when it comes to writing the third version ( Updated Windows version ). Even if it's in basic, I took a very different path with the z80 version to how my original Windows 10/11 assembler went.

My current objective is to close out the "Feature Additions" to the z80 Assembler in May so I can start writing the instruction manual... It's gotten to the point that when I want to add a new feature, I go to implement it, and find I've already done it. While other stubs remain incomplete... Writing a single piece of code for too long is problematic. At least for me anyway.

cjs · Apr 15, 2024

You seem to be making unwarranted and very limited assumptions about what people are suggesting you test.

cj7hawk said:
I did do that at the beginning since I had to test instructions to make sure they were assembling correctly. I still use those files from time to time, but I don't automate that. I just do it in between major revisions since I'm still finalising the label manipulation directives, improving access to local variables, looking to add arguments to includes and finalising testing of the Macros...

So you're saying you feel the tests are only for instructions, and not for "label manipulation directives, ...access to local variables, ...arguments to includes and ...Macros", and that's why they're not useful? There's a very simple solution for this: build test files that test all those other things, too.

cj7hawk said:
But more recently I'm testing assembler directives, so I didn't keep expanding those files, rather I just use the files I need. I found shorted files were easier to process.

So make your automated tests process short files. There's nothing (except you) saying that you must have all your test code in one file.

cj7hawk said:
Also, when I do large debugs, then the debug file is proportional to the test ASM file....

In other words, it's small if you use small test files. Problem solved.

cj7hawk said:
Though I suspect I'm going to need to write some very custom test files when it comes to writing the third version...

I'm pretty sure that it's the assumption of all of us suggesting that you automate your testing that you will need some "very custom test files." I know that every project I've ever done has needed "very custom test files." It's a perfectly normal thing, not some special case.

cj7hawk said:
It's gotten to the point that when I want to add a new feature, I go to implement it, and find I've already done it. While other stubs remain incomplete...

This absolutely never happens to me. If I wonder if something is implemented, and exactly how it works, I just look at the test for it. If it's there, I see what's done. If it's not there, I know it's not done.

cj7hawk · Apr 15, 2024

cjs said:
You seem to be making unwarranted and very limited assumptions about what people are suggesting you test.

Actually I was getting the feeling that the impression I gave was that I didn't do testing, so thought I'd correct that to better inform suggestions.

cjs said:
So you're saying you feel the tests are only for instructions, and not for "label manipulation directives, ...access to local variables, ...arguments to includes and ...Macros", and that's why they're not useful? There's a very simple solution for this: build test files that test all those other things, too.

Not at all. I started writing a single set of tests, but over time I found that using a compartmentalised approach to testing was more helpful than broad testing. Also using code, whether written by me or others to test was also helpful, as was comparing binaries to those produced by other assemblers.

cjs said:
So make your automated tests process short files. There's nothing (except you) saying that you must have all your test code in one file.

True, and I still have some of those early test files, but because of the approach I never formalised them into a structure for testing. I can always go back and rewrite/adapt them, but if it gets in the way of completing the code to the first "checkpoint" then easier to leave that for later. Also, I don't feel the need to retest some of what was tested again because of modularised coding practices. Once the module is complete, if I don't change it, it should still work as designed.

cjs said:
In other words, it's small if you use small test files. Problem solved.

Yes. But that leads to a *lot* of test files, and lots of changes to test files, so I didn't always keep track of the final test files.

cjs said:
I'm pretty sure that it's the assumption of all of us suggesting that you automate your testing that you will need some "very custom test files." I know that every project I've ever done has needed "very custom test files." It's a perfectly normal thing, not some special case.

I don't disagree. It's not how I'd run a different project at work either. But then I can assign dedicated testers to those projects.

cjs said:
This absolutely never happens to me. If I wonder if something is implemented, and exactly how it works, I just look at the test for it. If it's there, I see what's done. If it's not there, I know it's not done.

I do go back and test other functions from time to time... But as I mentioned, only after completing a new module. Otherwise I find it gets in my way. So far, it hasn't resulted in any further bugs - the biggest issue I have had is allowing it to grow organically rather than planning the entire thing out. That means adapting previously written code rather than writing to a plan. That slows things down a lot, but like most fun projects, I'm winging it to a great extent.

When I'm writing a module, I only test the module until it's complete. Then I usually code-test with large assembly files. If that produces the correct binary, I go onto the next module. Sometimes a new module breaks older stuff, but that comes up in tests between modules. I find trying not to break the module makes it more difficult to debug.

It's not a great approach, but it's working well enough for the moment

whartung · Apr 16, 2024

To be honest, since the goal of writing the assembler is to write an assembler that can assemble the assembler, if he assembles the assembler successfully, there isn't any other test that really matters. Assemble the assembler, have that assembler assemble the assembler again, if the output files are identical, then "pass".

If somehow, someway, sometime, someone else tries to use the assembler and assembles something and it fails, they can open a ticket and now you have a new test case to check against.

Until then, its your project, your time, you have your own sense of its completeness, it's capability, and everything else. As long as those ring your "I'm happy with it bell", that's as far as it needs to go.

When I wrote my 6502 simulator, I managed to get it to run a publicly available program that, it said on the tin, test 6502s. It found some issues, I fixed them.

While I was writing the assembler for use with that 6502, it's primary criteria was to assemble the FigForth listing that was available.

Testing a buggy CPU against a buggy assembler while using a buggy web based simulator to check your findings is a special kind of journey.

But I managed to get all of the stars aligned, got Forth assembled with no errors, it ran, it did Forth things, and that was that. My assembler passed assembling my little 10 line programs, and the 4000 lines of FigForth. "It wasn't just good, it was good enough". And that's where it ended, didn't have much drive to do much else with it. No doubt demons may well lurk within that code, but I'm just not inclined to root them out.

Svenska · Apr 17, 2024

I found a good blog post evaluating a set of 6502 cross-assemblers at https://bumbershootsoft.wordpress.com/2016/01/31/a-tour-of-6502-cross-assemblers/. It might be a good read in order to understand how other people think about assemblers, and how they rank features.

No matter what you do, you won't ever properly test features you don't use yourself.

Beyond that, I agree with whartung. If assembling the assembler is the goal, then it is an important milestone. The video I linked to very early in this discussion worked the assembler until it could handle the full CP/M source code.

cj7hawk · Apr 17, 2024

Svenska said:
I found a good blog post evaluating a set of 6502 cross-assemblers at https://bumbershootsoft.wordpress.com/2016/01/31/a-tour-of-6502-cross-assemblers/. It might be a good read in order to understand how other people think about assemblers, and how they rank features.

No matter what you do, you won't ever properly test features you don't use yourself.

Beyond that, I agree with whartung. If assembling the assembler is the goal, then it is an important milestone. The video I linked to very early in this discussion worked the assembler until it could handle the full CP/M source code.

That's a great link, thank you. One of the best and exactly in line with some of what I was looking for ( and didn't previously find ). Even if it is for the 6502.

One of the difficult balance points in assembler design is memory. Cross-assembly doesn't have this limitation, which is why cross-assembly is so much more powerful than native assembly, and why I'm writing a pair.

I find it's interesting how, with cross-assembly, the system extends all the way up to handling programming your programming. A little beyond my objective, but impressive nonetheless and not something I'm including as an objective.

My original goal was assembling the CP/M source code. That involved translating the CP/M source to z80, then adjusting the syntax until it matched my assembler syntax, then I checked until I got near binary compatability with some CP/M binary images taken elsewhere, and manually checked they were all just vendor customsisation issues.

Then I needed it to assemble itself, which is more complicated, because I started to run into meta problems where the outcome of assembly begins to interfere with the original code base of the assembler on which it's being assembled. It's a Meta-problem. Once in a while I hit it, and have to correct it. But generally it's powerful enough to allow stuff like INC: DB 3,'INC' and then I can do things like LD DE,INC and it knows enough to realise this is valid and handle it correctly.

It gets more complex when I start declaring labels that refer to labels it creates to hold elements of assembly, because I can actually cause the assembler to break the "fourth wall" of assembly, and the act of assembling a file that is syntactically correct conflicts with existing data structures in the assembler itself due to collisions with system labels causing an unresolved chicken and egg problem - Which came first - the label in the ASM or the label in the ASSEMBLER?

This is driven by the use of a single variable system using labels, so declaring system labels that conflict with using system labels, which already exist within the assembler that is assembling itself causes a failure to assembler.

As an example, if "ARG1" is a system variable, it's also a label in the assembler once assembled. The cross assembler has no problem with it, but the native assembler already considers this label reserved, so reassigning it in the ASM file is like reusing the same variable in the same context - ie, I'm declaring a label that is already declared by the assembler. So I have to allow ARG to be defined at a meta level by only defining it as text in memory, manually creating the content of the next iteration of assembler and offset the issue it causes by referring to it as VARG1: ( or something similar that's not ARG1 ) to get around the self-assembly-destorying-it's-own-memory-issue.

I could also just suspend use of the system variables too and will make that possible before I finish the current development cycle, but it's nice to be able to retain and use system variables and it's only going to cause issues when the assembler assembles itself, it won't affect any other software. Hence why this was a goal one step beyond assembling CP/M. I do include z80 conversions of the original CP/M system with my current distro BTW, which it can assemble, which is important to me since I would want a user to be able to replace my OS with CP/M live, and customise either LokiOS or CP/M to their own liking... Well, the potential for them to do so - I doubt it will ever happen like that in this era.

whartung said:
Testing a buggy CPU against a buggy assembler while using a buggy web based simulator to check your findings is a special kind of journey.

Yes, this is 100% the case, and I was still debugging the emulator while debugging the OS, while writing the assembler to assemble everything, including itself in the future. I've hit many problems and bugs when I've asked myself,

* Is this the BASIC compiler I'm using? ( sometime it was ! )
* Is this a bug caused by Windows or my PC? ( sometimes it was this also ! )
* Is this the emulator?
* Is this the Cross-Assembler?
* Is this the OS?
* Is this the Assembler?
* Is this still compatible with CP/M when I fix the bug ( when I FIND the bug ) ?

Thank goodness bug entropy decreases as source entropy increases.

Svenska · Apr 19, 2024

cj7hawk said:
This is driven by the use of a single variable system using labels, so declaring system labels that conflict with using system labels, which already exist within the assembler that is assembling itself causes a failure to assembler.

If you are repeatedly hitting limitations and they annoy you, you should maybe consider doing something about them.

Scoping (or namespaces) are an issue which has been named before, especially in the macro context. I'm not suggesting that you should build a complex system, but a few flags marking symbols appropriately might be enough to resolve some of these issues. Not all assembler allow having a label "ADD", for example - which is stupid, in my opinion.

cj7hawk said:
Yes, this is 100% the case, and I was still debugging the emulator while debugging the OS, while writing the assembler to assemble everything, including itself in the future.

You are not the first person in the world to see a Z80 and you don't live in a vacuum, so you don't need to invent everything at the same time.

Nobody will hold a grudge if you use a modern IDE on a high-resolution, flicker-free color screen instead of Wordstar to write your code. Tools are available, feel free to use them. Even if you eventually will replace all of them for fun. Debugging an emulator is done far better using modern and proven test suites; debugging the assembler is much easier with a decent debugger.

cj7hawk · Apr 20, 2024

Svenska said:
If you are repeatedly hitting limitations and they annoy you, you should maybe consider doing something about them.

It's more that there's been lots of good ideas, and I'm working out which ones I can fit into the current architecture. Some I still like, but they aren't going to make the first cut, if ever. I've set myself an 11.75K hard limit on the assembler size, on disk, and I've just cracked 10K with the addition of sending the console output to a file, since exporting things like the entire label list is a big output.

But lately, it feels like I'm approaching a final specification for the design.

The biggest challenge though is that I didn't have this information when I started, so changing some things is essentially the same as rewriting the assembler and it's architecture from scratch. It's not impossible, but doesn't make sense to do when it now does everything I wanted and a whole bunch of stuff I didn't know I wanted when I set out.

Svenska said:
Scoping (or namespaces) are an issue which has been named before, especially in the macro context. I'm not suggesting that you should build a complex system, but a few flags marking symbols appropriately might be enough to resolve some of these issues. Not all assembler allow having a label "ADD", for example - which is stupid, in my opinion.

Actually I've never heard of an "ADD" function for labels? Can you tell me how labels can be added together?

Or do you mean an "ADD" label as in something like ADD: or EQU ADD,VALUE? An ADD label is possible on my assembler, but not an ADD macro since the ADD opcode would take preference in the parser. Well, it is possible to have an ADD macro too, but it would be impossible to call it.

Given it's a z80 assembler, you can imagine that I have labels already like A, HL, DE, BC etc as well as labels line INC, DEC, LD and all the other opcodes are labels. Only system variables are reserved, and even that can be bypassed when I finish the current objective list. I support some pretty crazy label names. The only exceptions are no spaces (though not always) and no operators. But labels can start with numbers, be numbers, and also include some non-alphanumerics. Underscore is an obvious one.

Maximum label length is 120 characters. It used to be 256, but I made it more practical. That's the buffer processing limit for anything that represents a single element in the source.

I've set up a flag system also for special labels to act as group markers with the capability to slot in and out functions, so an include could create it's own label space and turn off the other label spaces, but still recover it's own label space for Pass 2. Or it can delete it's label space between passes, or it can use temporary labels... Or it can just use global labels. Nothing is set in stone there, it's all up to the programmer and it's easy enough to mix and match. But reuse of the same label over and over is definitely possible.

The switch function creates a "group" of subsequently declared labels, and can have the label search bypass or include the group. Label groups can even be nested - eg, a group can turn on and off groups, even hiding groups from group commands to create group hierarchies. Though I'm still writing the supporting code for that. It's intended to allow included modules to have access to their own labels, always in memory.

Also, the final change to the current plans is to add in some minimal nesting capability to macros. They can't nest like includes, but I now want macros to be able to call macros rather than just chaining them. Fortunately there's not a lot of state to save when nesting macros which is unlike includes, so I only need to reserve a small amount of memory for nesting ( and it counts as "program memory" within my imposed limit ). This will make better use of local variables so that programmers can use macros they didn't write. Programmers can also *force* other includes to operate in local mode even if the include or macro wasn't designed that way. It's very flexible.

I settled on 9 post-call arguments per include or macro in the end, and the argument names are always assigned to the same labels (ie, ARGC, ARG1, ARG2 .... ARG9) the same just like they are in many shells, but the programmer can rename/reassign/declare them for a specific routine which makes them permanent and/or local to that routine. The arguments set by the last function ( either an include or a macro ) exist in the variable "ARGC" so code can quickly check that the right number of arguments exist or error and fail assembly. Speaking of which, I still need a "FATAL" opcode to trigger a fatal assembly error when that happens.

Groups and other label functions can also exist outside of includes and macros, so there's no reason individual routines can't be declared locally, and groups can be turned on and off at will by any code, which means hiding group names allows reuse of even group names from other group controllers within called subroutines.

Svenska said:
You are not the first person in the world to see a Z80 and you don't live in a vacuum, so you don't need to invent everything at the same time.

Nobody will hold a grudge if you use a modern IDE on a high-resolution, flicker-free color screen instead of Wordstar to write your code. Tools are available, feel free to use them. Even if you eventually will replace all of them for fun. Debugging an emulator is done far better using modern and proven test suites; debugging the assembler is much easier with a decent debugger.

That's why I want two assemblers - a paired z80 assembler and a Windows10/11/Linux/Mac compatible cross-assembler. I want to have my cake and eat it too. And I'll put more into the cross-assembling IDE than the z80 version has, but functionally, I want them both to use the same syntax and assemble the exact same code from source. A pair so I can assemble locally ( keep in mind I'm intending to support JIT assembly in the OS ) and still do my development on my PC without the memory limitations. The only time that won't work is when the z80 version runs out of memory, although as noted, the target architecture supports 1Mb of memory and even virtual memory, so even that limit may not be an issue in the long term.

I hit a problem with my original first-gen cross-assembler today. I wanted to do something like "BUFFER: BLOCK BUFFERSIZE+1" and it kept failing since the BLOCK command expects only a single number, not a formula. It works nicely under the new assembler, but the old cross-assemble while mostly compatible doesn't recognize mathematical declarations in the same way as the z80 assembler does. It has a very rigid data structure, while the new z80 assembler treats things in a more intuitive way. If a number if expected, then all that matters is that a number is provided. HOW it is provided doesn't matter. It can be a specifc number, a label, an operator or a long formular involving combinations of all of these, which is evaluated on assembly. When I write the new cross-assembler to be compatible with the z80 assembler that limitation will go away, though both will be backwardly compatible to the original cross-assembler.

One of the best things that came out of this thread is the includes and macros and especially the binary includes. I was wondering how to create extensive system functions without using up the RSTs, interrupts or other non-dedicated zero page hooks. Now I can just have a SYSTEM.ASM file on my boot disk, and code can INCLUDE "SYSTEM.ASM" and it will define a bunch of macros to do common system tasks, eg, install drivers, hook interrupts, access extended code space beyond 64K, identify code blocks in the Memory management units, etc. And I can create another called "GRAPHICS.ASM" with graphics extensions. And another "CALCULAT.ASM" for Maths Functions. That lets me predefine all the system functions in an editable, modifiable way without imposing any of it on the programmer as a requirement.

This is important since the target architecture of the system this is designed for has a unified memory/disk architecture to allow all memory use to show up on M: and memory blocks need not be contiguous. So there needs to be a way to address memory blocks directly and page in/out parts of a file directly, which means having routines to do that in memory just like random access to a file is important - though it's also possible just to use random access through the BDOS to achieve the same affect, but blocks in memory are 4K each, so being able to page in Block-N from File <X> to memory-page Y can be turned into a macro, extending the z80 architecture to some extent.

It seems quite an elegant, modular, approach to the problem.

I may not need to invent everything at the same time, but I got a huge number of ideas from this thread, so I'm grateful. It would have been nice to have them all before I started, but that's part of the learning curve.

cjs · Apr 20, 2024

cj7hawk said:
An ADD label is possible on my assembler, but not an ADD macro since the ADD opcode would take preference in the parser. Well, it is possible to have an ADD macro too, but it would be impossible to call it.

I would suggest that you want macros to take preference over internal mnemonics. There can be good reasons for wanting to "replace" a mnemonic, and allowing the user to do that does not remove other functionality, since doing so is completely optional.

cj7hawk said:
I have labels already like A, HL, DE, BC etc as well as labels line INC, DEC, LD and all the other opcodes are labels.

This sounds awkward and unnecessary. Unless I've missed something about Z80 assembly language you can always tell from context whether an instruction, location or a register name is required for any particular value, so there's no need to have instruction mnemonics or register names taking up label space or interfering with user labels.

cj7hawk said:
But labels can start with numbers, be numbers...

How do you distinguish between 12 as a number indicating a location and as a label defined as, e.g., 12 equ 456?

cj7hawk said:
, and also include some non-alphanumerics. Underscore is an obvious one.

There are few assemblers out there that do not allow any non-alphanumerics, and I doubt that there are any for microprocessors that do not allow underscore and several others.

Have you defined character sets for your assemblers. In particular, I would expect a cross assembler running on a modern system to accept Unicode (typically UTF-8) input, and correctly determine which non-ASCII characters are alphabetic. I don't know if I've ever used accented characters in labels (e.g., ârret), but I've definitely used Greek letters, and have regularly wanted to use the prime characters.

cj7hawk said:
I've set up a flag system also for special labels to act as group markers with the capability to slot in and out functions, so an include could create it's own label space and turn off the other label spaces, but still recover it's own label space for Pass 2.

This sounds overly complex and confusing compared to just having scopes and simple scoping rules.

cj7hawk said:
Also, the final change to the current plans is to add in some minimal nesting capability to macros.

Nesting macros is pretty much a basic requirement of any modern macro-assembler.

cj7hawk said:
I settled on 9 post-call arguments per include or macro in the end, and the argument names are always assigned to the same labels (ie, ARGC, ARG1, ARG2 .... ARG9) the same just like they are in many shells, but the programmer can rename/reassign/declare them for a specific routine which makes them permanent and/or local to that routine.

That sounds confusing, too. Why not just do what any sensible programming language does and have the user provide a formal parameter list for the macro or function, and to these symbols bind the argument values supplied at the call site?

cj7hawk said:
...but the old cross-assemble while mostly compatible doesn't recognize mathematical declarations in the same way as the z80 assembler does.

As we all said: if you're writing two separate versions of the assembler, they will be different assemblers.

cj7hawk said:
I was wondering how to create extensive system functions without using up the RSTs, interrupts or other non-dedicated zero page hooks. Now I can just have a SYSTEM.ASM file on my boot disk, and code can INCLUDE "SYSTEM.ASM" and it will define a bunch of macros...

Yes, this is a very common thing to do. It's also normal in these files to use conditional assembly to allow the client (the program including SYSTEM.ASM or whatever) to configure exactly what the included file will generate.

Writing Assemblers... What should a good assembler do?

Experienced Member

Veteran Member

Experienced Member

Veteran Member

Veteran Member

Veteran Member

Experienced Member

Experienced Member

Attachments

Veteran Member

Experienced Member

Experienced Member

Veteran Member

Experienced Member

Veteran Member

Veteran Member

Veteran Member

Veteran Member

Veteran Member

Veteran Member

Experienced Member