So when the first instruction is 31 xx xx (e.g. DRI ASM, GENHEX, etc.) or 01 xx xx (DDT/ZSID), what is it identified as? C3 is very common, particularly when the initialization code sits at the high addresses and will be overlaid (e.g. EBAS). A relative jump at the beginning is only useful when the jump distance is short.
A program can use the flags register (slightly different behavior between 8080 and Z80) to determine what the host CPU is, but few do. Many just use 8080 code to be safe.
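To illustrate the flag difference I mean, here is a rough Python model (the function names are mine and purely illustrative, not taken from any real program). It relies on one fact: after an 8-bit increment the 8080 sets P from the parity of the result, while the Z80 uses the same flag bit (P/V) to report signed overflow.

```python
def parity_even(value: int) -> bool:
    """True when the low 8 bits contain an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def p_flag_after_increment(a: int, cpu: str) -> bool:
    """Model the P (8080) or P/V (Z80) flag after INR A / INC A."""
    result = (a + 1) & 0xFF
    if cpu == "8080":
        # 8080: P always reflects the parity of the result.
        return parity_even(result)
    if cpu == "z80":
        # Z80: P/V reports signed overflow, which for INC only
        # happens on the step from 7Fh to 80h.
        return (a & 0xFF) == 0x7F
    raise ValueError("cpu must be '8080' or 'z80'")


def looks_like_z80(cpu: str) -> bool:
    """Hypothetical probe: load 7Fh, increment, test the flag."""
    return p_flag_after_increment(0x7F, cpu)
```

With 7Fh as the probe value the result is 80h, which has odd parity, so the flag ends up clear on an 8080 and set on a Z80. A real program would do the same thing in three or four 8080 instructions.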
01 xx xx = 'LXI B,xxxx' (or 'LD BC,xxxx' in Z80 syntax). On x86 it would be 'ADD ew,rw' (ADD r/m16,r16 in current Intel notation), which makes no sense as a starting instruction since the register contents are undefined.
If the first byte is 31h (XOR) or 2Ah (SUB), you look at the second byte, the ModR/M byte shown below as mod:reg:r/m, to see if it is clearing a register:
11:000:000 (C0h) XOR AX,AX / SUB AL,AL
11:001:001 (C9h) XOR CX,CX / SUB CL,CL
11:010:010 (D2h) XOR DX,DX / SUB DL,DL
11:011:011 (DBh) XOR BX,BX / SUB BL,BL
11:100:100 (E4h) XOR SP,SP / SUB AH,AH
11:101:101 (EDh) XOR BP,BP / SUB CH,CH
11:110:110 (F6h) XOR SI,SI / SUB DH,DH
11:111:111 (FFh) XOR DI,DI / SUB BH,BH
If it's not one of these, it should clearly be 8080 code ('LXI SP,xxxx' for 31h or 'LHLD xxxx' for 2Ah). That leaves 8 second-byte values out of 256 where a mis-identification is possible.
Yes, it is not entirely perfect in theory, but good enough in practice, especially compared to OP's plan of interpreting all .COM files as 8080 code.
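For anyone who wants to try the rule, here is a minimal sketch in Python. The function and constant names are hypothetical, and it only covers the first bytes discussed in this post (31h, 2Ah and 01h); everything else comes back as "unknown" rather than guessed.

```python
# ModR/M values with mod=11 and reg == r/m: C0h, C9h, D2h, DBh, E4h, EDh, F6h, FFh.
CLEARING_MODRM = {0xC0 | (r << 3) | r for r in range(8)}


def guess_cpu(image: bytes) -> str:
    """Guess 'x86' or '8080' from the first two bytes of a .COM image.

    Returns 'unknown' for any first byte this sketch does not cover.
    """
    if len(image) < 2:
        return "unknown"
    first, second = image[0], image[1]
    if first in (0x31, 0x2A):
        # x86 XOR r/m16,r16 (31h) or SUB r8,r/m8 (2Ah) that clears a register
        # is a plausible first instruction; anything else is taken to be
        # 8080 LXI SP,xxxx or LHLD xxxx.
        return "x86" if second in CLEARING_MODRM else "8080"
    if first == 0x01:
        # LXI B,xxxx on the 8080; ADD r/m16,r16 on undefined registers on x86.
        return "8080"
    return "unknown"


if __name__ == "__main__":
    print(guess_cpu(bytes([0x31, 0xC0])))        # XOR AX,AX
    print(guess_cpu(bytes([0x31, 0x00, 0x02])))  # LXI SP,0200h
```

Generating the set from the bit pattern instead of typing the eight values keeps it in step with the table above.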
edit:
Since we're discussing "interesting" starting bytes of CP/M programs, SLR Systems' Z80ASM.COM starts with EB 18 5F (and a religious message). On x86, that will jump forward to another jump, which leads to printing a message saying "CP/M emulator required" and exiting. On the Z80 the first byte is a "garbage" instruction (EX DE,HL) followed by a short jump, to another regular jump, to the actual entry point. There is no consideration for what happens on an 8080 at all.
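The two jump targets are easy to work out by hand, but here is the arithmetic as a small Python sketch. It assumes the usual 0100h load address for a .COM file; the helper names are made up for illustration. Both x86 JMP SHORT and Z80 JR are two-byte instructions whose signed 8-bit displacement is counted from the following instruction.

```python
LOAD_ADDRESS = 0x0100  # assumed: standard .COM load address


def signed8(byte: int) -> int:
    """Interpret a byte as a signed 8-bit displacement."""
    return byte - 0x100 if byte & 0x80 else byte


def short_jump_target(opcode_address: int, displacement: int) -> int:
    """Target of a 2-byte relative jump located at opcode_address."""
    return (opcode_address + 2 + signed8(displacement)) & 0xFFFF


header = bytes([0xEB, 0x18, 0x5F])

# x86 decodes EB 18 as JMP SHORT starting at the first byte.
x86_target = short_jump_target(LOAD_ADDRESS, header[1])

# The Z80 executes EB (EX DE,HL) first, then 18 5F as JR at the second byte.
z80_target = short_jump_target(LOAD_ADDRESS + 1, header[2])

print(f"x86 lands at {x86_target:04X}h, Z80 lands at {z80_target:04X}h")
# prints "x86 lands at 011Ah, Z80 lands at 0162h"
```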
A more elegant way I've thought of doing CPU detection at the start of a .COM file would be 81 C3 xx xx.
x80: ADD C ; JMP xxxx
x86: ADD BX,xxxx
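As a sanity check on that idea, here is a short Python sketch of where each CPU ends up after those four bytes. The function name and the returned keys are my own invention, and the 0100h load address is assumed. The word xx xx is little-endian on both sides.

```python
def decode_81_c3(prefix: bytes, load_address: int = 0x0100) -> dict:
    """Show where 8080/Z80 and x86 continue after an 81 C3 xx xx header."""
    if len(prefix) < 4 or prefix[0] != 0x81 or prefix[1] != 0xC3:
        raise ValueError("expected a header starting with 81 C3 xx xx")
    word = prefix[2] | (prefix[3] << 8)
    return {
        # x80: ADD C (1 byte), then JMP xxxx goes to the absolute address.
        "x80_entry": word,
        # x86: ADD BX,xxxx is one 4-byte instruction, so execution
        # simply falls through to the byte after it.
        "x86_entry": load_address + 4,
        # Side effect on x86: this value has been added to BX.
        "x86_bx_added": word,
    }


if __name__ == "__main__":
    print(decode_81_c3(bytes([0x81, 0xC3, 0x00, 0x02])))
```

So the x86 code would start right after the header, and the x80 code wherever xxxx points. The x86 side has to tolerate BX having been changed, and the x80 side A and the flags.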