For those following my original question, I went with the Hand Coding architectural approach, with some table-driven elements where common tables existed. That was the kind of information I was originally looking for.
I found the kind of book I was looking for in the end, though it's more about compilers and is fairly generic.
https://dl.acm.org/doi/pdf/10.5555/2737838 (Engineering a Compiler) - It discusses formal approaches that are relevant to both assemblers and compilers.
I went with a hand-coded approach with some direct coding and some table use, aiming at small size and efficiency as the objective.
The approach I used has a simple lexer with bare-minimum rules to separate tokens for the program to use. It has two modes - Whitespace Sensitive and Whitespace Insensitive - though the same code is used for both and only a switch changes the operation of the routine.
The lexer then drops the extracted tokens and other objects into a buffer, which provides a whitespace-free and consistent token for matching, along with an operator if one was found. It doesn't care what was found; it only performs this function, with the exception that special operators are recognized but are not otherwise recorded in either the buffer or the operator field.
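To make that concrete, here's a rough Python sketch of the idea (purely illustrative - the real thing is hand-coded z80, and the function name and operator set here are stand-ins I've picked for the example):

```python
# Illustrative sketch of the lexer described above: split off the next
# token, normalise away whitespace (in insensitive mode), and report a
# trailing operator if one was found.
OPERATORS = set(",()+")          # assumed operator set, for illustration only

def next_token(line, ws_sensitive=False):
    """Return (token, operator, rest_of_line)."""
    i = 0
    if not ws_sensitive:
        while i < len(line) and line[i] in " \t":   # skip leading whitespace
            i += 1
    start = i
    while i < len(line) and line[i] not in " \t" and line[i] not in OPERATORS:
        i += 1
    token = line[start:i].upper()   # consistent, whitespace-free token
    j = i
    if not ws_sensitive:
        while j < len(line) and line[j] in " \t":   # whitespace before operator
            j += 1
    op = ""
    if j < len(line) and line[j] in OPERATORS:
        op = line[j]
        j += 1
    return token, op, line[j:]
```

So `next_token("  ld a,5")` hands back `LD` with no operator, and a second call on the remainder hands back `A` with the comma.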
Token matching could have gone either way - tables or hand-coded - but a match is little more than a field-scanning exercise that returns with the zero flag set on a match, and matching is performed outside the initial token validation, so it made more sense to go hand-coded: with tables, matching against the table results would still have been required.
So once the initial command is pulled from the lexer, the pass routine calls a "scanning routine", which is itself just a series of calls followed by a hard-stop error generator. This works well for the fixed, finite set of expressions that z80 assembly requires. To simplify the explanation: Pass 1 calls Test-Instructions, which calls all the tests in turn, then errors and bombs out.
Something like this;
CALL TEST_LD
CALL TEST_INC
CALL TEST_DEC
;etc
JP ERROR_OUT
This reduces a test to a combination of a DB and a Test matching loop - eg,
LD: DB 2,'LD' ;( Table Component )
TEST_LD:
LD DE,LD
CALL TESTLOOP ;(Matching test - tests the buffer content and length against the stored example at the initial label. )
RET NZ
;Code here for what to do with LD
POP HL ;(Dispose of the ret to the scanning routine.)
RET ;(And return to the main program)
In the end, this was the most efficient structure I could find to break apart the tokens from the lexer and seemed to use the least code for any data structure analysis routine I could think of.
There are some tables where they're relevant - for example, the B,C,D,E,H,L,(HL),A table and the BC,DE,HL,AF/SP table. These work just like TESTLOOP but with tables, and return a value that I can splice into a root instruction. So, for example, it calls the 8-bit register checks, then the 16-bit register checks, then it goes looking for things like brackets, and finally assumes it's either a number, a label, or an error. Otherwise, there's a routine for every step of every instruction, which tests syntax and operator state at each level.
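As a rough Python illustration of the table idea (the helper names are mine, but the table order and bit fields are the standard z80 encodings):

```python
# The 8-bit register table encodes B..A as 0..7 in the z80 instruction
# encoding; the table position IS the value to splice into the opcode.
REG8  = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]   # encodes as 0..7
REG16 = ["BC", "DE", "HL", "SP"]   # 0..3 (AF replaces SP for PUSH/POP)

def reg8_value(tok):
    """Return the register's encoding, or None if not an 8-bit register."""
    return REG8.index(tok) if tok in REG8 else None

def encode_ld_r_r(dst, src):
    """Splice both register fields into the LD r,r' root opcode (0x40)."""
    return 0x40 | (reg8_value(dst) << 3) | reg8_value(src)
```

For example, splicing A (7) and B (0) into the LD root gives 0x78, the opcode for LD A,B.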
z80 assembly is a mess. Some instructions take a comma, some don't, and some can go either way, which really taxes this approach: without a table it's difficult to address cleanly, and I had to make exceptions in the code to deal with it. Which means complicated and messy code.
The stuff I didn't handle well was the bracketed forms - eg, (IX+d) or (NNNN) or similar - since they break the code structure. Other instruction sets, such as Intel's, have much nicer syntax for indirect modes. IX/IY processing, though, turned out to be surprisingly easy with this approach and was very code-efficient.
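For illustration, here's one way those bracketed forms could be peeled apart, sketched in Python (simplified and hypothetical - register-indirect forms like (HL) and symbolic displacements aren't handled, and the real code is hand-written z80):

```python
# Sketch of handling the awkward bracketed operands: strip the
# brackets, check for IX/IY plus a displacement, else treat the
# contents as a plain (NNNN) address.
def parse_indirect(operand):
    """Return ('IX'|'IY', displacement), ('MEM', address), or None."""
    if not (operand.startswith("(") and operand.endswith(")")):
        return None
    inner = operand[1:-1].strip()
    for reg in ("IX", "IY"):
        if inner.upper().startswith(reg):
            disp = inner[len(reg):].strip() or "+0"   # bare (IX) means +0
            return (reg, int(disp))                   # int() accepts "+5"/"-3"
    if inner.isdigit():
        return ("MEM", int(inner))                    # plain (NNNN) address
    return None                                       # not handled in this sketch
```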
Errors are handled in an unstructured way, since the path taken to find an error is unpredictable, so the error handler restores the stack to the program-entry default, generates the message, and drops straight back into the OS. Later I'll provide switch options to keep scanning the code for more errors instead of bombing out.
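In Python terms, the equivalent of resetting SP and jumping out is just letting an exception unwind to the entry point (illustrative only - the names and the toy syntax check are stand-ins):

```python
# Analogue of the unstructured error path: wherever an error is found,
# unwind straight back to the entry point. The z80 version resets SP to
# the program-entry value instead of unwinding stack frames.
class AsmError(Exception):
    pass

def assemble(lines):
    try:
        for n, line in enumerate(lines, 1):
            if line.startswith("XYZZY"):          # stand-in for any syntax test
                raise AsmError("line %d: bad instruction" % n)
    except AsmError as e:
        return str(e)     # generate the message and drop back to the "OS"
    return "OK"
```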
Overall, it's a very simple assembler architecture... The interesting thing is that as I'm writing it, I'm finding a real need for some of its capabilities - eg, conditional assembly, macros, etc. - which my original cross-assembler doesn't support. It's strange how I never needed those functions before I wrote this. Anyway, that's roughly what it looks like at a high level.
I thought I'd share, since it's an interesting way to build an assembler for z80 under CP/M... Though I'm going to need some comparable code samples and existing CP/M assemblers to test against for speed comparison purposes. If anyone has any ideas, I'd welcome them. I'm not really sure how fast an assembler should be. My hazy memory tells me they weren't that fast back in the 80s, though.
David