Learning to disassemble Z80

kb2syd · May 3, 2024

I'm looking at some existing Z80 code, trying to take it apart and learn more about what it is doing. How do you figure out the starting offset? This file has already been analyzed by others, but I'm trying to learn.

cjs · May 3, 2024

Generally you have to figure it out from context. If it's a CP/M .COM file, it almost invariably starts at $100. If it's a ROM image, you look at the address decoding circuits that enable the ROM and that tells you the address at which the ROM is mapped.

If you have absolutely no information at all about the code, you need to look around for absolute addresses that are arguments of jp instructions, loads of table data in the code, and similar, and try different offsets until those start to make sense.
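The "try different offsets" step can be partly automated. Below is a minimal Python sketch. It assumes the file is a straight memory image, so any loader header must be stripped first, and it assumes that absolute `JP nn` targets (opcode C3 followed by a little-endian address) should land inside the image. It then scores every page-aligned base address. The function names are made up for illustration. Naive byte scanning also counts C3 bytes that are really data, so treat the ranking as a hint rather than an answer.

```python
def score_base(image: bytes, base: int) -> int:
    """Count C3 lo hi (JP nn) byte sequences whose target falls inside the
    image when the image is assumed to be loaded at `base`."""
    end = base + len(image)
    hits = 0
    for i in range(len(image) - 2):
        if image[i] == 0xC3:                       # JP nn opcode
            target = image[i + 1] | (image[i + 2] << 8)
            if base <= target < end:
                hits += 1
    return hits


def rank_bases(image: bytes, step: int = 0x100) -> list:
    """Score every `step`-aligned base in the 64K space, best first."""
    scores = [(score_base(image, b), b) for b in range(0, 0x10000, step)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

For a real binary, look at the top few candidates rather than just the first, and check whether the strings and tables line up at those bases too.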

For this particular binary, there's a whole bunch of textual information in it:

Code:

00000230: 00 00 00 00 c3 7e 40 5b 5a 38 30 20 43 6f 6e 74  .....~@[Z80 Cont
00000240: 72 6f 6c 20 53 79 73 74 65 6d 20 20 20 56 65 72  rol System   Ver
00000250: 73 69 6f 6e 20 20 33 28 31 33 31 29 20 20 31 30  sion  3(131)  10
00000260: 2d 4d 61 72 2d 38 38 20 2d 20 2f 4d 5d 20 20 20  -Mar-88 - /M]   
00000270: 20 20 00 43 6f 70 79 72 69 67 68 74 20 31 39 38    .Copyright 198
00000280: 34 2c 38 35 2c 38 36 2c 38 37 20 54 61 6e 64 79  4,85,86,87 Tandy
00000290: 20 43 6f 72 70 6f 72 61 74 69 6f 6e 0a 41 6c 6c   Corporation.All

Starting with some web searches for more information on that and how it was loaded into a Tandy Z80 machine will probably get you the starting address.

lowen · May 3, 2024

Ah, z80ctl for the MUX boards....

So, my analysis would start at the boot.

Reconstructed Model II ROM source in several versions can be found at https://electrickery.nl/comp/trs80m2/trs80m2boot.zip which you probably already have.

Reading this source, for hard disk boot, 512-byte sectors are loaded from the disk starting at head 0 cylinder 0 sector 0 and going for at least 9 sectors, loaded under the ROM into RAM beginning at 0000H until a signature of

Code:

/* END BOOT */

is found. This binary does not contain that string, so it's not the actual boot track. If this is a hard disk boot, you'll need to dump the boot track up to the signature to see where z80ctl is being loaded.
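Once you have a raw dump of the boot area, a quick check tells you whether the signature is there and how much the ROM would pull in. Below is a minimal Python sketch based on the description above. It assumes three things: the dump is in order starting at head 0, cylinder 0, sector 0, so a file offset equals the RAM address it lands at; the ROM accepts the signature anywhere in the loaded data; and "at least 9 sectors" is a floor. The function name is hypothetical.

```python
import math
from typing import Optional

SECTOR_SIZE = 512
MIN_SECTORS = 9                      # "at least 9 sectors", per the description
SIGNATURE = b"/* END BOOT */"


def boot_sectors_needed(disk: bytes) -> Optional[int]:
    """How many 512-byte sectors the ROM would load from an in-order dump
    of the boot area, or None if the signature never appears."""
    pos = disk.find(SIGNATURE)
    if pos < 0:
        return None                  # not a boot track (like the z80ctl binary)
    end = pos + len(SIGNATURE)       # one past the last signature byte
    return max(MIN_SECTORS, math.ceil(end / SECTOR_SIZE))
```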

Floppy disk boot loads differently, beginning at address 0E00H, loading 128 bytes per sector (single-density), 26 sectors (full track), looking for a dual signature of

Code:

BOOT

at 1000H and

Code:

DIAG

at 1400H; the routine found at the end of the second signature is CALLed, and then a JP to the routine at the end of the first signature is made. Yeah, a bit convoluted.
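The floppy case can be checked in the same way. The track loads at 0E00H, so the BOOT signature at 1000H sits at offset 200H in a track image, and DIAG at 1400H sits at offset 600H. Below is a minimal Python sketch. It assumes a single-density track image of 26 × 128 bytes, in sector order, and it assumes that "the routine found at the end of the signature" begins immediately after the 4-byte signature. The function name is hypothetical.

```python
from typing import Optional, Tuple

LOAD_ADDR = 0x0E00
SECTOR_SIZE = 128                    # single density
SECTORS_PER_TRACK = 26


def floppy_boot_entries(track: bytes) -> Optional[Tuple[int, int]]:
    """Return (diag_entry, boot_entry) if both signatures are present.
    As described, the ROM CALLs diag_entry, then JPs to boot_entry."""
    if len(track) < SECTOR_SIZE * SECTORS_PER_TRACK:
        return None

    def has_sig(addr: int, sig: bytes) -> bool:
        off = addr - LOAD_ADDR       # RAM address -> offset in the track image
        return track[off:off + len(sig)] == sig

    if has_sig(0x1000, b"BOOT") and has_sig(0x1400, b"DIAG"):
        # Assumption: each routine starts right after its 4-byte signature.
        return 0x1400 + 4, 0x1000 + 4
    return None
```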

The boot track isn't z80ctl; z80ctl is loaded by the boot.

Looking through some files that I have, I believe z80ctl will load to 4000H. So you would start the disassembly there. This being z80ctl, you're very right that analysis has been done elsewhere. But doing it yourself is always fun. This is a large program, and the assembly source is almost certainly in several source modules. Because of this, strings and such belonging to modules will be with those modules and thus scattered through the binary.

Good luck!

cjs · May 3, 2024

lowen said:
Because of this, strings and such belonging to modules will be with those modules and thus scattered through the binary.

Ah, right; I recently had a worst-case experience with this disassembling the Supersoft CPU Diagnostics, which extensively use a print routine that takes its argument in memory immediately after the call. That involved creating well over a hundred z80dasm block entries manually, and was a miserable experience.

This caused me to ask around for a tracing disassembler for Z80, which didn't produce much in the way of results, but did turn up a project new to me, Reko, that I'll be trying out for my next Z80 disassembly project.

BTW, I find the best way to handle disassembly is to get through the "identify data areas" part as quickly as possible, perhaps adding a few annotations beyond that where convenient, and then go straight to assembling your own source code and comparing the output with the binary. That, for example, was what I did for the Supersoft diagnostics linked above. You can see from the Git history that I had to update the block map and re-disassemble a few times after I'd moved to assembling my own code, copying bits of the newly disassembled code into my source. That was a pain, but I think it was less painful than the alternative, which is to try to do as much as you can with annotation files.

You can see an example of the annotation file approach in my JR-200 BIOS disassembly. This uses f9dasm, which has a relatively good annotation format, but even there, it's a pain in the you-know-what to be editing that rather than source code, especially where formatting is concerned. z80dasm has basically no annotations at all beyond marking data blocks (which still get spit out as one byte per line, ugh!) and adding symbols, so I just use a sed script to add comments, etc., where necessary; you can see an example of this in my PC-8031-2W disassembly (the annotation files are under the info/ directory).
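A post-processing pass like that sed script can also be sketched in Python. This is not cjs's script. It uses a made-up annotation format with one `ADDR comment` entry per line, and it assumes that every listing line you want to annotate begins with a 4-digit hex address column. Check that assumption against your disassembler's real output, because a mnemonic like `defb` also looks like four hex digits.

```python
import re

# Assumption: annotated listing lines start with a 4-hex-digit address column.
ADDR_RE = re.compile(r"^([0-9A-Fa-f]{4})\b")


def load_annotations(text: str) -> dict:
    """Parse a hypothetical 'ADDR comment text' file; '#' lines are ignored."""
    notes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        addr, _, comment = line.partition(" ")
        notes[int(addr, 16)] = comment.strip()
    return notes


def annotate(listing: str, notes: dict) -> str:
    """Append '; comment' to each listing line whose address has a note."""
    out = []
    for line in listing.splitlines():
        m = ADDR_RE.match(line)
        if m and int(m.group(1), 16) in notes:
            line = f"{line:<40}; {notes[int(m.group(1), 16)]}"
        out.append(line)
    return "\n".join(out)
```

Because the annotation file is plain text, it can be committed and diffed alongside the regenerated listing.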

Chuck(G) · May 3, 2024

If you're looking for a pro-level disassembler, you may want to consider IDA Pro. It's not cheap but is very good.

lowen · May 3, 2024

Chuck(G) said:
If you're looking for a pro-level disassembler, you may want to consider IDA Pro. It's not cheap but is very good.

IDA is an excellent disassembler.

The NSA's open source and cross-platform Ghidra reverse engineering tool supports the Z80, among many others, including 8080, 8085, 6502, and 68000. Of course x86 is supported.

cjs said:
Ah, right; I recently had a worst-case experience with this disassembling the Supersoft CPU Diagnostics, which extensively use a print routine that takes its argument in memory immediately after the call. That involved creating well over a hundred z80dasm block entries manually, and was a miserable experience.

This technique was fairly common back in the day.

I wrote my own disassembler back in high school to disassemble the TRS-80's TRSDOS; it did what I wanted it to do, but it wasn't a tracing disassembler by any stretch, and there were much better disassemblers. But I learned a lot writing it.

Chuck(G) · May 4, 2024

lowen said:
This technique was fairly common back in the day.

It was common even in the x80 days, particularly when ROM code was involved.
Another gotcha from the old days is "plugging" code in RAM. The idea is this:
instead of this:

Code:

    push    psw
    lda     item
    mov     c,a
    pop     psw
...
item  db  ....

this:

Code:

    mvi     c,0
item equ $-1

Where speed and space are important, you does what ya hafta. But it is confusing to someone disassembling your code.

One of the reasons that I have a soft spot for Harvard-architecture MCUs.
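The reason this confuses a disassembler is that the static image only holds the operand's initial value. Below is a small Python model of the `mvi c,0` / `item equ $-1` trick. `MVI C,d8` is opcode 0EH followed by its immediate byte, and `$-1` is the address of that immediate byte. The addresses and the value stored are made up for illustration.

```python
MVI_C = 0x0E  # 8080 MVI C,d8: opcode byte followed by one immediate byte


def disassemble_mvi_c(mem: bytearray, pc: int) -> str:
    """Decode a single MVI C,d8 at pc (enough for this illustration)."""
    if mem[pc] != MVI_C:
        raise ValueError("not MVI C")
    return f"MVI C,{mem[pc + 1]:02X}H"


mem = bytearray(0x100)
code_at = 0x40                                  # hypothetical address
mem[code_at:code_at + 2] = bytes([MVI_C, 0x00])  # mvi c,0
item = code_at + 1                              # item equ $-1 ($ = code_at + 2)

print(disassemble_mvi_c(mem, code_at))  # what the ROM/file shows: MVI C,00H
mem[item] = 0x2A                        # elsewhere: lda ... / sta item
print(disassemble_mvi_c(mem, code_at))  # what actually executes: MVI C,2AH
```

In a listing, the giveaway is usually another instruction that stores to an address one byte past an opcode.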

Chuck(G) · May 4, 2024

One of the more popular disassemblers in the x80 CP/M days was Dazzlestar. Still have a copy somewhere.

Looks like someone has re-implemented it: https://github.com/durgadas311/dazzlestar Wonder who that could be?

cjs · May 4, 2024

Chuck(G) said:
Another gotcha from the old days, is "plugging" code in RAM.

I more commonly hear this called "self-modifying code." And for the simple things it's often used for, I've never found it to be any big deal at all; it's usually quite obvious where and why it's being used.

The two most common cases I can think of are for I/O to arbitrary ports on 8080/8085 and for a massive increase in speed and ease of use for certain sorts of copy routines on the 6502.
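The 8080 port case exists because `IN` and `OUT` only take the port number as an immediate byte (`IN p` is DB p, `OUT p` is D3 p). Code that needs a port chosen at run time therefore builds a tiny stub in RAM. Below is a minimal Python model of building such a stub. The function name is hypothetical. In a disassembly you would see only whatever placeholder port number the program was assembled with.

```python
IN_OP, OUT_OP, RET_OP = 0xDB, 0xD3, 0xC9  # 8080: IN p / OUT p / RET


def build_port_stub(mem: bytearray, addr: int, port: int, output: bool) -> None:
    """Write 'IN port / RET' or 'OUT port / RET' into RAM at addr,
    the way self-modifying 8080 code does before CALLing it."""
    if not 0 <= port <= 0xFF:
        raise ValueError("8080 port numbers are 8 bits")
    mem[addr:addr + 3] = bytes([OUT_OP if output else IN_OP, port, RET_OP])
```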

Bruce Tomlin · May 4, 2024

I don't think it's CP/M, since it has a weird header with legalese that says "Tandy" in it. It also has a 1988 date, a bit late even for Tandy to be doing things with CP/M. I'm expecting it to be in one of those multi-record TRSDOS formats, but it's not obviously in the Model I/III .CMD format. You can have the best disassembler in the world, but it won't help if your binary is in a weird format that isn't a straight dump. But it's obviously Z80 code, plenty of C3 xx xx in there. Hmm, interesting that one of the strings is "68k crashed", could this be from a Model 16 or 6000?

I'm going to use my own disassembler on it (see sig) once I figure out what the heck this even is. If it really is in a multi-record format, I should see the gaps as glitches and misalignments. Then I would have to make a stripped down binary from it.
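One quick test for "is this a multi-record format?" is to try walking it as one, and then flatten the load blocks into a stripped binary if that works. Below is a minimal Python sketch of a TRS-80 Model I/III /CMD-style record walker. The record layout follows the format as it is commonly documented, and you should verify it before relying on it. Each record is a type byte and a length byte followed by the body. Type 01 is a load block with a 2-byte little-endian address, and a length of 0, 1, or 2 means 256, 257, or 258. Type 02 is the transfer (entry) address. Other record types are skipped. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_cmd(data: bytes) -> Tuple[List[Tuple[int, bytes]], Optional[int]]:
    """Walk /CMD-style records; raise ValueError if the file doesn't parse."""
    blocks: List[Tuple[int, bytes]] = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError(f"truncated record header at {i:#x}")
        rtype, rlen = data[i], data[i + 1]
        i += 2
        if rtype == 0x01 and rlen < 3:
            rlen += 256              # assumption: 0, 1, 2 mean 256, 257, 258
        if i + rlen > len(data):
            raise ValueError(f"record {rtype:#04x} at {i - 2:#x} runs past EOF")
        body = data[i:i + rlen]
        if rtype == 0x01:            # load block: address, then data bytes
            blocks.append((body[0] | (body[1] << 8), bytes(body[2:])))
        elif rtype == 0x02:          # transfer address ends the file
            if rlen < 2:
                raise ValueError("short transfer record")
            return blocks, body[0] | (body[1] << 8)
        i += rlen                    # other record types: skip the body
    return blocks, None


def flatten(blocks: List[Tuple[int, bytes]]) -> Tuple[int, bytes]:
    """Lay the load blocks out as one image; gaps are zero-filled."""
    lo = min(addr for addr, _ in blocks)
    hi = max(addr + len(d) for addr, d in blocks)
    img = bytearray(hi - lo)
    for addr, d in blocks:
        img[addr - lo:addr - lo + len(d)] = d
    return lo, bytes(img)
```

A straight binary such as z80ctl will usually fail quickly. Its first byte C3 would be read as a record type with length 7EH, and that record would run off the end of a short file or produce nonsense blocks.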

cjs said:
Ah, right; I recently had a worst-case experience with this disassembling the Supersoft CPU Diagnostics, which extensively use a print routine that takes its argument in memory immediately after the call. That involved creating well over a hundred z80dasm block entries manually, and was a miserable experience.

That's why my disassembler is a manual tracing disassembler. I can just go search for "CALL Lxxxx" and manually touch up each of those. I'm making no attempt at a stuffy call tree; I just keep track of what each byte of the source binary needs to be.

One of my dream disassembler features would be a list of "subroutine 1234 always has weird crap after it", but usually each one is so bespoke that all you can do is just have it stop disassembly. But RST instructions done that way usually have a fixed number of bytes after them, and are quite common, so I actually have handling for those.
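Fixed-length RST arguments are easy to fold into a tracer. `RST p` is always one byte with bit pattern 11ppp111, and the vector address is that byte ANDed with 38H. Below is a minimal Python sketch. The table saying RST 08H takes one inline byte and RST 28H takes two is a made-up, per-project example, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical per-project table: RST vector -> number of inline data bytes.
RST_INLINE: Dict[int, int] = {0x08: 1, 0x28: 2}


def is_rst(op: int) -> bool:
    """RST p is encoded as binary 11ppp111 (C7, CF, D7, ... FF)."""
    return (op & 0xC7) == 0xC7


def resume_after_rst(mem: bytes, pc: int, inline: Dict[int, int]) -> Tuple[int, range]:
    """Return (pc where decoding resumes, offsets to emit as DB data)."""
    op = mem[pc]
    if not is_rst(op):
        raise ValueError(f"{op:02X} at {pc:#x} is not an RST")
    n = inline.get(op & 0x38, 0)  # vector 00H..38H; unknown vectors take no data
    return pc + 1 + n, range(pc + 1, pc + 1 + n)
```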

Bruce Tomlin · May 4, 2024

It turns out that it really was straight binary after offset 0x234 = 0x4000. Here's a rough (by my standards) disassembly. Pretty good for one hour. Now to go find breakfast.
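With that mapping, converting between file offsets and Z80 addresses is plain arithmetic. The hex dump earlier in the thread shows `c3 7e 40` at file offset 234H, which decodes to `JP 407EH` at address 4000H. That JP is plausibly skipping over the version/copyright banner that follows it. Below is a minimal Python sketch. Only the 234H/4000H mapping and those three bytes come from the thread. The zero-filled header is a stand-in and the function names are hypothetical.

```python
HEADER_SIZE = 0x234   # file offset of the first byte of straight binary
LOAD_ADDR = 0x4000    # Z80 address that byte is loaded to


def offset_to_addr(offset: int) -> int:
    return offset - HEADER_SIZE + LOAD_ADDR


def addr_to_offset(addr: int) -> int:
    return addr - LOAD_ADDR + HEADER_SIZE


def jp_target(image: bytes, offset: int) -> int:
    """Decode JP nn (C3 lo hi) at a file offset."""
    if image[offset] != 0xC3:
        raise ValueError(f"no JP nn at offset {offset:#x}")
    return image[offset + 1] | (image[offset + 2] << 8)


# Stand-in image: zero header, then the three bytes from the dump at 234H.
first = bytes(HEADER_SIZE) + bytes([0xC3, 0x7E, 0x40])
target = jp_target(first, HEADER_SIZE)
print(f"{offset_to_addr(HEADER_SIZE):04X}H: JP {target:04X}H "
      f"(file offset {addr_to_offset(target):#x})")
# prints: 4000H: JP 407EH (file offset 0x2b2)
```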

lowen · May 4, 2024

Bruce Tomlin said:
.... But it's obviously Z80 code, plenty of C3 xx xx in there. Hmm, interesting that one of the strings is "68k crashed", could this be from a Model 16 or 6000?

I'm going to use my own disassembler on it (see sig) once I figure out what the heck this even is.

This binary is the z80ctl program for Tandy's Xenix System III for the Tandy 6000. This is the code that turns the Model II/12 Z80 computer into an I/O coprocessor for the 68000 CPU. This particular binary supports the so-called 'mux' boards, which are multiport serial cards. There were two different multiport serial cards for the Model II series computers. One version of z80ctl supports the three-port cards, and the other version supports the four-port cards. They cannot coexist because there is not enough memory in z80ctl for both drivers.

Frank Durda IV was the primary programmer working on this code, even as late as 1993, although a new group of highly capable individuals collectively known as Tandy Emeritus is now maintaining this code. Frank's initials 'FDIV' are in this code in plaintext.

Bruce Tomlin said:
That's why my disassembler is a manual tracing disassembler. I can just go search for "CALL Lxxxx" and manually touch up each of those.

Interesting. Back in the day I used several disassemblers, most notably DIS'N'DATA, once I got a copy. My own disassembler was primitive but fast, and DIS'N'DATA was slower but more thorough.

I've not really done any heavy duty disassembly in a long time, but even with my sporadic usage I've found that no one disassembler does everything I want.

That's part of the fun.

But this particular binary is pretty complex: it uses tamper detection with checksums, and even obfuscation in places. It also interfaces with the Xenix kernel drivers running on the 68000. It's a great binary to learn from!

But what we're missing, if we want to know what format the binary is in, is the bootloader. This binary could be booted from floppy or hard disk, by the way. The bootloader would load this z80ctl binary as one option during boot (the typical second option was the diskutil program for preparing disks for use). Any formatting of any disk had to be done by the Z80, since the 68000 didn't have direct access to the hardware, and then there's the pesky memory size limitation. So you needed to boot diskutil.

cjs · May 4, 2024

Bruce Tomlin said:
One of my dream disassembler features would be a list of "subroutine 1234 always has weird crap after it", but usually each one is so bespoke that all you can do is just have it stop disassembly. But RST instructions done that way usually have a fixed number of bytes after them, and are quite common, so I actually have handling for those.

I would go with extending that just slightly: allow marking any given RST or CALL to a particular address as, "data after RST/CALL, terminated with $nn." (Or fixed length of n.) That would cover pretty much everything I've seen.

Thanks for the reminder about your disassembler; I still need to have a look at it. I think the one thing that put me off it last time I had a look was that I can't just edit a file with Vim and re-run it; I need to use a whole different set of keystrokes, etc. And it wasn't clear to me if binfile.ctl is something that can be reasonably committed to a VCS and diffed.

Bruce Tomlin · May 4, 2024

cjs said:
That would cover pretty much everything I've seen.

You've not seen the more complicated stuff that I've seen, heh. But at the very least, "terminated with bit 7 set" is also rather common. The main problem is the UI necessary to specify all of those call addresses. It's easier for me to simply deal with them manually than to write code to do it. Everything is at the level of "what this byte does", not trying to make an enormous tree of all the calls.

And no, the .ctl file is very binary, in an IFF sort of layout. The core is two byte arrays that describe each byte of the binary. That would be a 30x mess in any text-based format. I decided that just having it survive adding new instruction sets would be plenty. This is meant to be used more like a hand tool vs a power tool, and that would be sort of like putting your hammers and saws into a CAD program.

It's the result of having myself use my previous disassembler (which strictly used a hand-entered text file) for over a decade. Trust me, it was quite user-hostile. But yes, you could have put its control files into version control.

cjs · May 4, 2024

Bruce Tomlin said:
You've not seen the more complicated stuff that I've seen, heh. But at the very least, "terminated with bit 7 set" is also rather common.

Ah, well, that's easy enough to deal with: just add a mask for the termination byte as well.

Bruce Tomlin said:
The main problem is the UI necessary to specify all of those call addresses.

Seems pretty simple to me! :-)

Code:

data call-suffix $1234 term $80 mask $80 dbtype char

It's too bad about the binary format for the annotation data. I don't mind a visual interface to a disassembler, but my ideal one would let me update the annotation file with an editor and then detect the changed file, reload and redisplay, and regenerate the output file, so I could work in both the visual interface and in my editor simultaneously. (I consider a disassembler annotation to be essentially a program that converts a binary file into source, obviously in a domain-specific language.)
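cjs's one-line directive is simple enough to implement directly. Below is a minimal Python sketch of what such a rule could mean for a tracer. The bytes after `CALL $1234` are data up to and including the first byte where `(byte & mask) == term`. With `term $80 mask $80`, that gives the common "bit 7 set ends the string" convention. The `len n` option is a made-up extension covering the "(Or fixed length of n.)" case, `dbtype` is only carried along, and all function names are hypothetical. No existing disassembler is known to use this exact syntax.

```python
def _num(tok: str) -> int:
    """'$80' is hex, as in the directive; anything else is decimal."""
    return int(tok[1:], 16) if tok.startswith("$") else int(tok)


def parse_directive(line: str) -> dict:
    words = line.split()
    if len(words) < 3 or words[:2] != ["data", "call-suffix"]:
        raise ValueError(f"not a call-suffix directive: {line!r}")
    rule = {"target": _num(words[2]), "mask": 0xFF, "dbtype": "byte"}
    opts = words[3:]
    if len(opts) % 2:
        raise ValueError(f"option without a value in: {line!r}")
    for key, value in zip(opts[::2], opts[1::2]):
        if key in ("term", "mask", "len"):
            rule[key] = _num(value)
        elif key == "dbtype":
            rule[key] = value
        else:
            raise ValueError(f"unknown option {key!r}")
    if "term" not in rule and "len" not in rule:
        raise ValueError("need either 'term' or 'len'")
    return rule


def suffix_length(mem: bytes, start: int, rule: dict) -> int:
    """Number of inline data bytes at `start`, terminator included."""
    if "len" in rule:
        return rule["len"]
    for i in range(start, len(mem)):
        if (mem[i] & rule["mask"]) == rule["term"]:
            return i - start + 1
    raise ValueError("ran off the end of the image looking for a terminator")


def resume_after_call(mem: bytes, base: int, pc: int, rules: list) -> int:
    """Address where decoding resumes after the CALL nn (CD lo hi) at pc."""
    off = pc - base
    if mem[off] != 0xCD:
        raise ValueError(f"no CALL nn at {pc:#x}")
    target = mem[off + 1] | (mem[off + 2] << 8)
    for rule in rules:
        if rule["target"] == target:
            return pc + 3 + suffix_length(mem, off + 3, rule)
    return pc + 3                    # ordinary call: no inline data
```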

Bruce Tomlin · May 4, 2024

That's actually why I added comments, that way I can at least do some annotation before I have to let go and create proper .ASM source.

Chuck(G) · May 4, 2024

cjs said:
The two most common cases I can think of are for I/O to arbitrary ports on 8080/8085 and for a massive increase in speed and ease of use for certain sorts of copy routines on the 6502.

It was the only way to modify addresses on many first- and second-generation CPUs.

whartung · May 4, 2024

cjs said:
This caused me to ask around for a tracing disassembler for Z80,

Funny, I would think a tracing disassembler wouldn't have much luck with arguments after the call, since it would be the routine that, I guess, tweaks the PC on the stack for when it returns from the subroutine, not necessarily something the disassembler could see.

Maybe with a peephole checker it could look for that kind of behavior and special case it.

Chuck(G) · May 4, 2024

whartung said:
Funny, I would think a tracing disassembler wouldn't have much luck with arguments after the call,

Particularly in ROM code, it's very convenient, not just for message display; tables can also follow the call (e.g. a computed GOTO). The advantage is that no registers are destroyed (any saving needed is left to the called routine). In 8080 code, the called routine can use the XTHL instruction to swap the HL pair with what's pointed to by SP, increment HL through the literal, then XTHL back and return. A similar scheme applies in x86 code. In the case of the 8080, it's also self-relocating, although the location of the servicing routine is not, obviously.
And if an RST/INT instruction is used instead of a call, the resulting code is very compact indeed.
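This also answers whartung's point: the only place the string's length is "known" is inside the called routine, through the return address it rewrites. Below is a minimal Python simulation of that XTHL pattern for a `CALL PRINT` followed by a string. The zero terminator and all names are assumptions for illustration. A linear disassembler would decode the bytes `'H' 'I' 00` as `MOV C,B`, `MOV C,C`, `NOP`.

```python
from typing import List, Tuple


def run_inline_print(mem: bytearray, call_pc: int, sp: int,
                     out: List[str]) -> Tuple[int, int]:
    """Simulate 'CALL PRINT' at call_pc followed by a zero-terminated string,
    with PRINT written in the XTHL style. Returns (pc after RET, sp)."""
    # CALL nn pushes the address of the byte after the 3-byte instruction.
    ret = call_pc + 3
    sp -= 2
    mem[sp], mem[sp + 1] = ret & 0xFF, ret >> 8
    # XTHL: HL <- word at (SP). The caller's HL goes to the stack and comes
    # back with the second XTHL; that half is not modeled here.
    hl = mem[sp] | (mem[sp + 1] << 8)
    while True:                  # MOV A,M / INX H / ORA A / JZ done / output
        ch = mem[hl]
        hl += 1
        if ch == 0:
            break
        out.append(chr(ch))
    # XTHL again: the stacked return address now points past the terminator.
    mem[sp], mem[sp + 1] = hl & 0xFF, hl >> 8
    # RET pops that rewritten address into PC.
    pc = mem[sp] | (mem[sp + 1] << 8)
    return pc, sp + 2
```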

An interesting twist on this is with the Harvard-architecture lower PIC MCUs. If one wants to implement a lookup table in flash, it can be interesting. The solution is to use a table of RETLW instructions, each of which returns with the literal in its operand loaded into the W register. To do a table lookup, you do a computed CALL into a table of RETLW instructions. (Instruction words on, say, the PIC 12 are 14 bits wide, but data/RAM is 8 bits wide.)
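Below is a minimal Python model of such a table. It assumes the 14-bit mid-range core, where `RETLW k` is encoded as binary 11 01xx kkkk kkkk, that is, 3400H | k. The data byte lives in the low 8 bits of each instruction word, which is how an 8-bit table fits into 14-bit program memory. The function names are hypothetical, and the "computed CALL" is reduced to indexing.

```python
from typing import List

RETLW = 0x3400        # mid-range (14-bit) RETLW k: 11 01xx kkkk kkkk
RETLW_MASK = 0x3C00   # the fixed opcode bits 13..10


def build_retlw_table(values: bytes) -> List[int]:
    """One RETLW instruction word per table entry."""
    return [RETLW | v for v in values]


def lookup(table: List[int], index: int) -> int:
    """Model a computed CALL into the table: 'execute' the word at index
    and return what would end up in W."""
    word = table[index]
    if word & RETLW_MASK != RETLW:
        raise ValueError(f"word {word:04X} is not RETLW")
    return word & 0xFF
```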

Dwight Elvey · May 4, 2024

I've not done a Z80 but I usually just look for absolute jump instructions first. Most code rarely jumps out of its own memory. Look for the possible target locations. Look to see if the target location makes sense. In other words that it doesn't look like the middle of an instruction. Another is to look for call instructions. Look again at possible target locations. These will usually be preceded by a return or a jump instruction.
Most code will begin at an address of the form xx00H or x000H.
Just my thoughts.
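The CALL-target check can be scripted too. Below is a rough Python sketch. For a candidate base, it counts `CALL nn` (CD lo hi) targets that land just after a `RET` (C9) or just after a 3-byte `JP nn` (C3), and it tries only page-aligned bases as suggested. The names are made up for illustration. Stray C3/C9 bytes in data will occasionally fool it, so compare scores across candidates rather than trusting one number.

```python
CALL_OP, JP_OP, RET_OP = 0xCD, 0xC3, 0xC9


def plausible_entry(image: bytes, base: int, target: int) -> bool:
    """True if target is inside the image and preceded by RET or JP nn."""
    off = target - base
    if not 0 < off < len(image):
        return False
    if image[off - 1] == RET_OP:
        return True
    return off >= 3 and image[off - 3] == JP_OP


def score_call_targets(image: bytes, base: int) -> int:
    hits = 0
    for i in range(len(image) - 2):
        if image[i] == CALL_OP:
            target = image[i + 1] | (image[i + 2] << 8)
            if plausible_entry(image, base, target):
                hits += 1
    return hits


def best_bases(image: bytes, top: int = 5) -> list:
    """Rank page-aligned (xx00H) bases by plausible CALL targets."""
    scored = [(score_call_targets(image, b), b) for b in range(0, 0x10000, 0x100)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))[:top]
```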
