Assemblers, Origins, Offsets and Targets... Offset Assembly question.

cj7hawk

So I'm looking to ask about consensus or even what people think should happen when assembling code for one location, and writing it to another place in memory.

I'm just updating my assembler, and presently I use the ORG to define where the PC should be, and OFFSET that can be added to PC which defines where in memory it gets written.

So far, so good, so if I did,

.ORG $0100
.OFFSET $0100
HERE:
JP HERE

I would see the following code generated for memory location $0200 as
0200 HERE:
0200 C30001 JP HERE
0203

OK, so far, so good I hope - You can see from the above, in Z80, that the code thinks it's at $0100 and it assembles at $0200

ie, the HEX for the output would be:
:03020000C3000137
And the value for HERE would be $0100 and the disassembly would show JP $0100
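
As a rough illustration of the ORG/OFFSET split described above, here is a minimal Python sketch (not the author's assembler; `intel_hex_record`, `pc`, `offset` and `write_address` are names made up for this example). It assumes the label takes the value of the PC, the bytes are written at PC plus OFFSET, and the record's final byte is the standard Intel HEX checksum, i.e. the two's complement of the sum of the preceding bytes.

```python
def intel_hex_record(address, data):
    """Build one Intel HEX data record (type 00) for the given bytes."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, 0x00]) + bytes(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + body.hex().upper() + f"{checksum:02X}"


pc = 0x0100      # .ORG: the address the code believes it runs at
offset = 0x0100  # .OFFSET: added to the PC to get the write address

here = pc                              # label HERE takes the PC value ($0100)
code = [0xC3, here & 0xFF, here >> 8]  # JP HERE, operand stored low byte first
write_address = pc + offset            # the bytes land at $0200

record = intel_hex_record(write_address, code)
print(record)
```

The point of the sketch is only that two numbers are in play: the one that labels see, and the one that ends up in the address field of the HEX record.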

However, while that's easy to understand, I want to add a third directive, "TARGET", to calculate what the offset should be, without forcing the programmer to use a macro or calculation each time. This is because code that will be relocated doesn't occur at nice segments. It can start anywhere and needs to be relocated to a specific location. So using "TARGET" is easier than using something like "OFFSET $0200-PCORG" (subtract the current PC from the desired location to assemble to).

Here's the problem I'm looking for help with.
Do I change OFFSET or the PC (ORG) with the directive?

ORG (Origin) generally means where we're writing in memory, so one line of thinking says to achieve the above, I should do
.ORG $0200
.TARGET $0100

The benefit of this way is that ORG maintains its primary meaning, and the targets for jumps would be automatically offset, so that code assembled at the currently selected memory location can be relocated to the target location later and will work correctly there.

However ORG is so strongly connected to assembly programming as where the Program Counter is looking to be, it might make more sense to do something like;
.ORG $0100
.TARGET $0200

Ie, Target is now where it should assemble... Org is where the PC is, and target now refers to the pre-relocated memory location - ie, where we want to actually assemble to, while ORG is where it will end up post-copy-and-relocation once it executes.

The internal variables are the same in both cases, and the .HEX file generated would be the same in both cases. IE,
:03020000C3000137

The question then is which way makes more sense to assembler programmers?

Way 1.
.TARGET means where the code will be "relocated to" and follows what the Program Counter would be expected to think.
.ORG means where the code will be initially assembled to, and likely where it will be loaded to from the disk.

or

Way 2.
.ORG means where the code will be "relocated to" and follows what the Program Counter would be expected to think.
.TARGET means where the code will be initially assembled to, and likely where it will be loaded to from the disk.
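
To make the comparison concrete, here is a small Python sketch of the two readings. The function names `way1` and `way2` and the `(pc, offset)` pair are assumptions for illustration only; the sketch just shows that both spellings reduce to the same internal state, so the choice is purely about which one reads more naturally.

```python
def way1(org, target):
    """Way 1: .ORG is the load address, .TARGET is where the code will run."""
    pc = target            # labels and jumps use the run-time address
    offset = org - target  # write address = pc + offset = org
    return pc, offset


def way2(org, target):
    """Way 2: .ORG is where the code will run, .TARGET is the load address."""
    pc = org               # labels and jumps use the run-time address
    offset = target - org  # write address = pc + offset = target
    return pc, offset


state1 = way1(org=0x0200, target=0x0100)
state2 = way2(org=0x0100, target=0x0200)
print(state1 == state2)  # both describe "run at $0100, write at $0200"
```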

Thanks for that - looking for opinions, preferences, the way you would naturally think it should work, or how some other assembler handles the same situation.

All input appreciated.

David
 
That's what an "RORG" is for. Here's how I described it for my assembler:
RORG
Sets the relocated origin address of the following code. Code
in the object file still goes to the same addresses that follow
the previous ORG, but labels and branches are handled as though
the code were being assembled starting at the RORG address.

REND
Ends an RORG block. A label in front of REND receives the relocated
address + 1 of the last relocated byte in the RORG / REND block.

So the regular ORG address is where stuff goes in the object code, but RORG tells it to pretend that the current address for code generation is different.
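
A minimal Python sketch of that behaviour, assuming two counters (the class and attribute names `load_addr` and `code_addr` are invented here and are not taken from the assembler being described):

```python
class Assembler:
    """Toy model of ORG / RORG / REND with two counters."""

    def __init__(self):
        self.load_addr = 0  # where bytes go in the object file
        self.code_addr = 0  # address used for labels and branches
        self.image = {}     # load address -> byte
        self.labels = {}

    def org(self, addr):
        self.load_addr = self.code_addr = addr

    def rorg(self, addr):
        # Only the address seen by labels changes; output continues after ORG.
        self.code_addr = addr

    def rend(self, label=None):
        # A label on REND gets the relocated address one past the last byte.
        if label is not None:
            self.labels[label] = self.code_addr
        self.code_addr = self.load_addr  # back to ordinary addressing

    def label(self, name):
        self.labels[name] = self.code_addr

    def emit(self, *data):
        for byte in data:
            self.image[self.load_addr] = byte
            self.load_addr += 1
            self.code_addr += 1


asm = Assembler()
asm.org(0x0200)
asm.rorg(0x0100)
asm.label("HERE")
here = asm.labels["HERE"]
asm.emit(0xC3, here & 0xFF, here >> 8)  # JP HERE
asm.rend("BLOCK_END")
asm.emit(0x00)                          # NOP, written at $0203
```

Here `BLOCK_END` is a hypothetical label name; the bytes land at $0200 to $0203 while `HERE` evaluates to $0100.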
 
Thanks @Bruce Tomlin. As I don't generate object code outside of the .HEX format, just binary, do I understand this correctly?

.ORG is the MEMORY location of initial loading - eg, if I .ORG $0200, then the assembler will create bytes starting at $0200 no matter what address the instructions think they are writing to.

But if I .RORG $0100 subsequently, then I'm generating code to be relocated to $0100 later - hence I'm changing the assembler "PC" from what .ORG assumes to .RORG and setting the offset so the code is generated at the .ORG location ( plus any intermediate steps. ) - In this case, the offset would be $0100 also so that code would be generated at $0200

That brings up an obvious question though - What is done to discontinue the .RORG addressing mode in that context?

Previously, I'd just set OFFSET to zero, and it would continue generating contiguous code - very simple. But if I use .RORG, then the de facto assembled code location will move as it adds bytes to the relocated-ORG address.

What function did .RORG use to drop the offset back to zero when the relocatable section of code was complete? ie, What command turns it off?

eg. ( If I understand .RORG correctly )

.ORG $0200 ; Code will start at $200
.RORG $0100 ; Code will be relocated to $100 after execution, so use this for positional labels.
HERE:
JP HERE ; This should write 3 bytes, so I need a command to set the .RORG to $0203 for the next byte written. Except I normally wouldn't know the address.
.?????? ; ( What command would look at the target address, and change the PC in the assembler to $0203 for the next byte? )
NOP ; This byte should be written at $0203
.END

Should produce the following binary.

0200 HERE:
0200 C30001 JP HERE <--- Jump goes to 0100, not 0200.
0203 00 NOP

My thinking is that a system variable would be required to address the issue - eg, in my current assembler, it would look something like this;

.RORG ORGPC+PCOFF

Which would cancel out the offset, but requires a formula to be known... The main issue being the handling of the "PC" inside the assembler, and how it relates to .ORG commands.

Normally, I'd just cancel it out with something like

.OFFSET $0000 ; Cancel relocatable code offset

But it seems that with the .RORG syntax, it's a bit more complex - eg,

.ORG - Sets the PC, and offset to $0
.RORG - Sets the PC, and changes the offset to match what the previous .ORG would have set.
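
The one-counter-plus-offset bookkeeping above can be sketched in Python as follows. The field names `orgpc` and `pcoff` echo the ORGPC and PCOFF variables mentioned earlier, but the class itself is an assumption for illustration, not the actual assembler code.

```python
class Counters:
    """One PC plus one offset; the write address is always orgpc + pcoff."""

    def __init__(self):
        self.orgpc = 0  # the PC that labels and jumps see
        self.pcoff = 0  # added to the PC to find where bytes are written

    def write_address(self):
        return self.orgpc + self.pcoff

    def org(self, addr):
        self.orgpc = addr
        self.pcoff = 0

    def rorg(self, addr):
        # Keep writing where we already were; only the PC changes.
        self.pcoff = self.write_address() - addr
        self.orgpc = addr

    def advance(self, nbytes):
        self.orgpc += nbytes


c = Counters()
c.org(0x0200)
c.rorg(0x0100)                 # PC = $0100, offset = $0100
c.advance(3)                   # JP HERE occupies three bytes
c.rorg(c.orgpc + c.pcoff)      # the ".RORG ORGPC+PCOFF" trick: offset drops to zero
print(hex(c.orgpc), hex(c.pcoff))
```

In this model, ending the relocated block is just a RORG to the current write address, which is what a dedicated end directive would do without making the programmer spell out the formula.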

Thanks,
David
 
The CDC COMPASS assembler maintained three counters--the location counter, the origin counter and the position counter. See Section 2.3 (PDF page 8) in this old manual from 1969. The old mainframe assemblers had a lot of good ideas--because much, if not all, operating system software was written in assembly.
 
That brings up an obvious question though - What is done to discontinue the .RORG addressing mode in that context?
Click to expand my quote block... it says REND right there.

Anyhow, the real fun begins when you both RORG and switch CPU type because of code that you're downloading to a second CPU, and you don't need another source file to do that.
 
The same thing proved 'amusing' when testing my Cromemco S-100 system out with the Z80/68K dual processor...

That is what comments are for of course :)!

The system starts off with the Z80 running and the Z80 memory image contains bootable/executable code for both the Z80 and the 68K (including any relocating offset that may apply). It took a few attempts to get it right!

Dave
 
For the z80, two counters should be enough, but for simplicity and reliability, I use one counter and one offset, and achieve three

Yeah, I completely missed that the quote block wasn't expanded, and then after using .RORG, I settled on REND anyway when I had to think up a name, without even knowing I had missed it. I guess that's a pretty normal directive name to settle on once you have .RORG.

So I've updated my code last night and added .RORG and .REND - I left .TARGET and .OFFSET in there anyway, since they might find alternative uses to cover any RORG gaps. And of course, .ORG works the same.

It was helpful - :) even if I didn't read the entire answer, it set me on the same path of thinking.
 
So did you use two assemblers, or one assembler that understands both processors?
 
If you have a decent macro assembler, you don't need two assemblers.

I can create new instructions with different opcodes with my assembler also, but I'm not entirely sure that's practical and there's always the possibility of syntactical or instructional collisions in different architectures.

Though that's just generalising and I don't know 68K assembly, so I can't say for certain that it's not.

Table driven assemblers might be more practical, but I'm curious about the kinds of assembler that would be used for such a project.
 
I recall that the MDS ISIS-II variant of the Intel 8086 assembler had to read all of its opcodes/formats from a separate file. Lots and lots of OPDEF statements. Yes, it was slow, but it worked.
 
I just did the obvious thing and basically crammed all the assemblers into one executable. Each one does things however it needs to. The main assembler core just switches between whichever one is currently selected. There are only a few things that the average CPU-specific assembler needs to do: initialize before a new pass, do a line with an opcode, and sometimes special handling for an opcode with a label in front of it.
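
A hedged Python sketch of that structure, with a core loop that hands each opcode line to whichever CPU backend is selected. Everything here is invented for illustration: the `.cpu` directive, the `Backend` class and its method names are assumptions, and only a couple of one-byte opcodes are included per CPU.

```python
class Backend:
    """Per-CPU assembler: knows its own opcodes and nothing else."""

    def __init__(self, opcodes):
        self.opcodes = opcodes

    def start_pass(self):
        pass  # a real backend would reset mode flags, etc.

    def assemble_line(self, mnemonic, operands):
        return self.opcodes[mnemonic]  # KeyError means "unknown for this CPU"


BACKENDS = {
    "z80": Backend({"NOP": [0x00], "RET": [0xC9]}),
    "6502": Backend({"NOP": [0xEA], "RTS": [0x60]}),
}


def assemble(lines, default="z80"):
    """Core loop: handle directives itself, dispatch opcodes to the backend."""
    for backend in BACKENDS.values():
        backend.start_pass()
    current = BACKENDS[default]
    out = []
    for line in lines:
        word, *operands = line.split()
        if word == ".cpu":  # hypothetical CPU-switch directive
            current = BACKENDS[operands[0]]
        else:
            out.extend(current.assemble_line(word.upper(), operands))
    return bytes(out)


image = assemble(["nop", "ret", ".cpu 6502", "nop", "rts"])
print(image.hex())
```

Because the same mnemonic (`NOP` here) maps to different bytes per CPU, the core never needs to know about opcodes at all, which sidesteps the collision worry raised earlier.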
 
I created my assembler, MP-ASM, towards the end of the '80s, just for my Commodore 64, thus 6502. Then I got a 128 so I added the Z80. Two directives, .p6502 and .pZ80, tell the assembler which opcode set is valid, and this enables me to use both 6502 and Z80 assembly inside one program. Later I added support for the newer variants of the 6502, for the 6800 and more, each using their own directive.
But lately I found my assembler behaving a bit sluggishly on my older systems and the executable was becoming rather large. So about a week ago I split the program up into a main part and a unit for each CPU. Using the Pascal directive $DEFINE enables me to create an executable for only one (or more) CPUs, reducing the size and increasing the speed.
FYI: written in FreePascal, open source.
 
So thinking of this further, what are the reasons for putting two sets of different assembly source in the same file, rather than having it assembled in a second file and bring back the system variables ( labels ) to interconnect the two - eg, any calls or memory actions?
 
That is the way you would do it 'properly' by putting the code in separate files.

However, this doesn't necessarily relieve the assembler of keeping track of the location counter it is using - as this is not necessarily where the code should be stored or executed from.

However, the better solution is to complicate the assembler by having 'groups' of code, data, etc., and using a location counter for each group. The actual decision about how things are grouped together, and their location, can then be deferred to the linker (or locator).

This can be seen in Intel's ASM86 for example.
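
As a rough sketch of the idea (not ASM86 syntax or its object format; the class, the group names and the `locate` function are all assumptions), each group keeps its own location counter, symbols are recorded relative to their group, and absolute addresses only appear once a locator step assigns base addresses:

```python
from collections import defaultdict


class GroupedAssembler:
    """Each group has its own location counter: the length of its chunk."""

    def __init__(self):
        self.current = "CODE"
        self.chunks = defaultdict(bytearray)
        self.symbols = {}  # name -> (group, offset within that group)

    def group(self, name):
        self.current = name

    def label(self, name):
        self.symbols[name] = (self.current, len(self.chunks[self.current]))

    def emit(self, *data):
        self.chunks[self.current].extend(data)


def locate(asm, bases):
    """Locator step: turn group-relative symbols into absolute addresses."""
    return {name: bases[grp] + off for name, (grp, off) in asm.symbols.items()}


asm = GroupedAssembler()
asm.label("start")
asm.emit(0x00, 0x00, 0x00)
asm.group("DATA")
asm.label("msg")
asm.emit(0x48, 0x69)
asm.group("CODE")          # switching back resumes CODE's own counter
asm.label("loop")
asm.emit(0x00)

addresses = locate(asm, {"CODE": 0x0100, "DATA": 0x8000})
print({name: hex(addr) for name, addr in addresses.items()})
```

The same chunks could be located again with different bases for a second processor's view of memory, without reassembling.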

I write assembler software for a multiprocessor environment, so where the code for one group is physically located in one memory space is not necessarily where another processor sees the code, or executes it.

Dave
 
The M80 assembler had two pseudo-opcodes to support relocation of code segments: .phase and .dephase. They allowed code to be generated for one address and located at a different one.
 
So thinking of this further, what are the reasons for putting two sets of different assembly source in the same file, rather than having it assembled in a second file and bring back the system variables ( labels ) to interconnect the two - eg, any calls or memory actions?
Ever do an emulator?
Code:
    subttl    Emulated BIOS area
    page

;*    The following table contains a list of locations where the
;    interrupt number is to be "plugged".
;


PLUG_COUNT    equ    24        ; total slots to plug


    public    Plug_Table

Plug_Table    label    word
    dw    PLUG_COUNT dup (0)    ; gets filled in later

??plug_ctr    = 0            ; plug table location counter


;    The emulated BIOS area - in 8080 code.

    public    BIOS_Table

BIOS_Table    label    byte

bibase    label    byte        ; where we really are

;    register pair numbers

reg80_b        equ    00h
reg80_c        equ    08h
reg80_d        equ    10h
reg80_e        equ    18h
reg80_h        equ    20h
reg80_l        equ    28h
reg80_m        equ    30h
reg80_psw    equ    30h
reg80_a        equ    38h

;    General Macro to generate an emulator TRAP.

emt    macro    class,subclass    ;;    generate an emulator trap
    local    ??emt
    db    0edh,0edh
??emt    db    0
    db    subclass,class
??emtlc =    $
    org    Plug_Table+2*??plug_ctr
    dw    ??emt
    org    ??emtlc
??plug_ctr    =    ??plug_ctr+1
    if    ??plug_ctr ge PLUG_COUNT
    err    Exceeded plug table length
    endif
    endm

;    Generate a BIOS emt.

biocall macro    func
    emt    0,func
    db    0c9h        ;; an 8080 RET
    endm

jmp80    macro    where        ; generate a 8080 jump, with offsets
    db    0c3h
    dw    where-bibase + _bios
    endm

jc80    macro    where        ; jump on carry
    db    0dah
    dw    where-bibase + _bios
    endm

call80    macro    where        ; generate 8080 call, with offsets
    db    0cdh
    dw    where-bibase + _bios
    endm

lxi80    macro    reg,what
    db    01h+reg80_&reg
    dw    what
    endm

calln    macro    what,plug       ; call native mode emulation
    db    0edh,0edh
plug    db    what
    endm

mov80    macro    to,from        ; move reg to reg
    db    040h+reg80_&to+(reg80_&from shr 3)
    endm

ret80    macro            ; do an 8080 return
    db    0c9h
    endm

pop80    macro    reg        ; pop a register pair
    db    0c1h+reg80_&reg
    endm

push80    macro    reg        ; push a register pair
    db    0c5h+reg80_&reg
    endm

mvi80    macro    reg,what    ; move immediate to 8-bit register
    db    06h+reg80_&reg,what
    endm

dad80    macro    reg        ; double add to hl
    db    09h+reg80_&reg
    endm

shld80    macro    where        ; store (HL) direct
    db    022h
    dw    where-bibase + _bios
    endm

sta80    macro    where        ; store (A) direct
    db    032h
    dw    where-bibase + _bios
    endm

;    the bios jump table

biostab label    byte
    jmp80    ebios0        ; cold start
wbootx: jmp80    ebios1        ; warm start
    jmp80    ebios2        ; console status
...
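
For readers who don't read MASM macros, here is a Python sketch of what a few of the macros above compute. The register codes are copied from the `reg80_*` equates; the addresses passed to `jmp80` (`bibase` and `bios`, the latter standing in for `_bios`) are made-up example values, since the listing doesn't show them.

```python
# Register field values, as in the reg80_* equates above.
REG80 = {"b": 0x00, "c": 0x08, "d": 0x10, "e": 0x18,
         "h": 0x20, "l": 0x28, "m": 0x30, "psw": 0x30, "a": 0x38}


def mov80(to, frm):
    """8080 MOV: destination in bits 5-3, source in bits 2-0."""
    return bytes([0x40 + REG80[to] + (REG80[frm] >> 3)])


def mvi80(reg, what):
    """8080 MVI: move an immediate byte to an 8-bit register."""
    return bytes([0x06 + REG80[reg], what])


def lxi80(reg, what):
    """8080 LXI: load a 16-bit immediate, stored low byte first."""
    return bytes([0x01 + REG80[reg], what & 0xFF, what >> 8])


def jmp80(where, bibase, bios):
    """8080 JMP whose target is rebased from where the table is built
    (bibase) to where the emulated CPU will see it (bios)."""
    target = where - bibase + bios
    return bytes([0xC3, target & 0xFF, target >> 8])


print(mov80("a", "b").hex())                 # MOV A,B
print(lxi80("h", 0x1234).hex())              # LXI H,1234h
print(jmp80(0x4033, 0x4000, 0xF200).hex())   # example addresses only
```

The `where - bibase + bios` expression is the same relocation arithmetic discussed earlier in the thread, done by hand inside a macro.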
 
So thinking of this further, what are the reasons for putting two sets of different assembly source in the same file, rather than
In the case of the Commodore 128, even in CP/M mode, the machine sometimes has to switch to 6502 mode (IIRC, for example, when reading from and writing to disks) and back again. Sorry, it was over 30 years ago. But it was done using one program, and thus the assembler had to be told somehow when to assemble instructions for which CPU.
 
I have a "game copier" thing (called Multi Game Hunter) that works with Sega Genesis and Super NES. (but unfortunately I don't have the SNES cartridge slot adapter for it) Its 64K of ROM is half Z-80 and half 65816, along with a lot of common data. It uses the Z-80 on Sega rather than the 68K, presumably so that it can load Sega Master System games.

It was actually my motivation to get 65816 support working in my assembler and disassembler. Assembly-language tools are more tedious to work on when the architecture has mode bit flags that can cause instruction lengths to be different depending on the runtime mode.
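
A small Python sketch of why that is tedious, assuming a disassembler that tracks the 65816's M and X flags statically by watching REP and SEP. The function name and the tiny opcode subset are chosen for illustration; a real tool also has to cope with code paths where the mode isn't knowable from the instruction stream alone.

```python
M_FLAG = 0x20  # status bit: 1 = 8-bit accumulator
X_FLAG = 0x10  # status bit: 1 = 8-bit index registers
REP, SEP = 0xC2, 0xE2          # clear / set status bits from an immediate mask
LDA_IMM, LDX_IMM = 0xA9, 0xA2  # immediate loads whose size follows M / X


def instruction_lengths(code, flags=M_FLAG | X_FLAG):
    """Return the length of each instruction, tracking M/X as we go."""
    lengths = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == LDA_IMM:
            n = 2 if flags & M_FLAG else 3
        elif op == LDX_IMM:
            n = 2 if flags & X_FLAG else 3
        elif op == REP:
            flags &= ~code[i + 1] & 0xFF
            n = 2
        elif op == SEP:
            flags |= code[i + 1]
            n = 2
        else:
            raise ValueError(f"opcode {op:02X} not in this sketch")
        lengths.append(n)
        i += n
    return lengths


# LDA #$01 / REP #$20 / LDA #$1234 / SEP #$20 / LDA #$01
program = bytes([0xA9, 0x01, 0xC2, 0x20, 0xA9, 0x34, 0x12, 0xE2, 0x20, 0xA9, 0x01])
print(instruction_lengths(program))
```

The same `A9` opcode is two bytes long in one place and three in another, so getting the flag state wrong desynchronises everything that follows.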
 