
System call parameter passing conventions

johnx993
Experienced Member; joined Oct 26, 2008; 246 messages; location: Texas (mostly)
In CP/M and MS-DOS, parameters are passed in various registers. I've been playing around with Altair DOS, and found that there it's done by placing the parameters as data bytes after the system call. Presumably the OS picks them off by reference to the stack pointer. This seems like an elegant way to do it. Was there any rationale for switching to registers? Back when I was doing assembly programming, I always had to save registers before loading them up for the call, then get my data back to resume the code flow. This method seems to make things more seamless for the programmer.
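The inline-parameter scheme described above can be sketched as a toy model: the OS entry point fetches the return address that CALL pushed, reads the parameters from the bytes at that address, and pushes an adjusted return address so execution resumes after the data. The byte layout (one function byte plus one little-endian argument word) and all names are assumptions for illustration, not Altair DOS's documented format.

```python
# Toy model of "parameters as data bytes after the CALL".
# The layout (1 function byte + 1 little-endian argument word) is a
# hypothetical example, not Altair DOS's documented format.


class Machine:
    """8-bit style machine: 64 KiB of memory plus a list used as the stack."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.stack = []  # return addresses; the top of stack is the last item

    def call(self, return_addr, handler):
        # CALL pushes the address of the byte after the 3-byte CALL
        # instruction, which is exactly where the inline parameters begin.
        self.stack.append(return_addr)
        return handler(self)

    def ret(self):
        # RET pops whatever return address is now on top of the stack.
        return self.stack.pop()


def syscall_handler(m):
    """Hypothetical OS entry point that reads its parameters inline."""
    addr = m.stack.pop()                            # fetch return address via SP
    func = m.mem[addr]                              # 1 byte: function number
    arg = m.mem[addr + 1] | (m.mem[addr + 2] << 8)  # 2 bytes, little-endian
    m.stack.append(addr + 3)                        # resume after the data
    return func, arg


# Caller: a CALL at 0100h, so the inline data starts at 0103h.
m = Machine()
m.mem[0x0103:0x0106] = bytes([0x09, 0x34, 0x12])   # DB 09h / DW 1234h
func, arg = m.call(0x0103, syscall_handler)
resume = m.ret()
```

Note that the caller's registers are never touched at the call site; whether the OS handler itself preserves them is a separate design choice not modelled here.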
 
My only experience is with the 68000, but I would say that passing parameters in registers tends to be the fastest method. The parameters tend to be at hand in some register anyway, for either the caller or the called code, or in the best case both. And you need to save anything important in the registers when calling an API anyway.

For processors with lots of registers, a tricky question is which registers a called function should preserve and which it may clobber. As an example, IIRC on the Amiga A0-A1 and D0-D1 are treated as scratch registers (they also carry parameters, and D0 holds the return code), while the other registers are preserved across OS API calls. Also, A6 always has to point to the "library" structure, which contains a jump table for the APIs in the library being called but may also contain data that the called API uses, so A6 must hold the correct value. (A7 is the user-space stack pointer, so it obviously mustn't be tampered with.)
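The scratch-versus-preserved split described above can be illustrated with a small checker. This is a sketch under stated assumptions: the library object, the offsets -30/-36 and both functions are hypothetical, and only the register roles (D0/D1/A0/A1 scratch, D0 result, A6 library base) come from the post.

```python
# Toy model of an Amiga-style register convention (scratch vs preserved
# registers, library base in A6). The library, its offsets and the
# functions are hypothetical.

SCRATCH = {"d0", "d1", "a0", "a1"}  # the callee may change these freely
ALL_REGS = {f"d{i}" for i in range(8)} | {f"a{i}" for i in range(8)}


class Library:
    def __init__(self):
        self.jump_table = {}  # negative offset from the base -> function
        self.calls = 0        # private data the API reaches through A6


def call_library(regs, offset):
    """Call the entry at `offset` from the base in A6 and check that the
    callee honoured the convention; return D0 (the result register)."""
    base = regs["a6"]
    before = dict(regs)
    base.jump_table[offset](regs)
    for r in sorted(ALL_REGS - SCRATCH):
        if regs[r] != before[r]:
            raise RuntimeError(f"callee clobbered preserved register {r}")
    return regs["d0"]


def lib_add(regs):
    # D0 = D0 + D1; uses library data via A6 and trashes scratch A0.
    regs["a6"].calls += 1
    regs["a0"] = 0xDEAD
    regs["d0"] = (regs["d0"] + regs["d1"]) & 0xFFFFFFFF


def bad_add(regs):
    # Breaks the convention by overwriting D2, which must be preserved.
    regs["d2"] = 0
    regs["d0"] = regs["d0"] + regs["d1"]


lib = Library()
lib.jump_table[-30] = lib_add
lib.jump_table[-36] = bad_add
```

The caller only has to save D0/D1/A0/A1 if it still needs their values; everything else is the callee's responsibility.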

For processors with fewer registers, like the 6502, I think there is no reason whatsoever not to pass parameters in registers: you only have three registers anyway, and you can be sure that any API will need at least one and likely all three to do anything meaningful, so there is almost zero cost in also using the registers to pass parameters.

Confession: I've never used the LINK/UNLK instructions on the 68000.
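For anyone else who hasn't used them: LINK An,#disp pushes An, copies the stack pointer into An (making it a frame pointer) and adds the usually negative displacement to SP to reserve local space; UNLK An undoes this. A minimal sketch follows, using A5 as the frame pointer (an arbitrary choice here, since the post notes A6 holds the library base on the Amiga); the memory model and addresses are made up.

```python
# Toy model of the 68000 LINK/UNLK stack-frame instructions.
# Memory is a dict of long words; addresses and values are made up.


class CPU68k:
    def __init__(self, sp=0x1000):
        self.a = [0] * 8
        self.a[7] = sp  # A7 is the stack pointer
        self.mem = {}   # long-word memory: address -> value

    def push_l(self, value):
        self.a[7] -= 4
        self.mem[self.a[7]] = value

    def pop_l(self):
        value = self.mem[self.a[7]]
        self.a[7] += 4
        return value

    def link(self, n, disp):
        """LINK An,#disp: push An, copy SP into An, then add the
        (normally negative) displacement to SP to reserve locals."""
        self.push_l(self.a[n])
        self.a[n] = self.a[7]
        self.a[7] += disp

    def unlk(self, n):
        """UNLK An: reload SP from An (discarding locals), then pop An."""
        self.a[7] = self.a[n]
        self.a[n] = self.pop_l()


cpu = CPU68k()
cpu.a[5] = 0x5555       # caller's frame pointer, to be preserved
cpu.push_l(42)          # caller pushes one argument
cpu.push_l(0x2000)      # JSR pushes the return address
cpu.link(5, -8)         # callee reserves 8 bytes of locals
arg = cpu.mem[cpu.a[5] + 8]  # 0(A5)=old A5, 4(A5)=return addr, 8(A5)=arg
```

With the frame pointer fixed in A5, arguments stay at constant offsets even if the routine pushes and pops things on A7 in between.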
 
The Altair DOS way of doing things is pretty much aligned with how the 'larger' machines and operating systems did things.

The parameters are not actually passed on the stack, but in a parameter block that is pointed to by a pointer following the system call. The return address (on the stack when the operating system entry point is entered) points at that pointer, so the OS can find the parameter block through it. The return address can then be advanced past the pointer for the subsequent subroutine return.

This means that the code is 'clean' (and can be set to execute-and-read-only where the CPU provides memory management), whereas the parameter block can be located in read/write data space. It has to be read/write because the status byte (which reports the execution status of the function call) is stored within the parameter block.
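The parameter-block variant can be sketched the same way: the CALL is followed by a 2-byte pointer, the OS follows it, writes a status byte into the block, and returns past the pointer. A simulated write-protected code region shows why the block must live in RAM while the code can stay read-only. The block layout (byte 0 function, byte 1 status), the 4000h boundary and all names are assumptions.

```python
# Toy model of "a pointer to a parameter block follows the CALL".
# Block layout (byte 0 = function, byte 1 = status) and the read-only
# boundary at 4000h are hypothetical.

ROM_END = 0x4000  # pretend everything below 4000h is write-protected code


class Memory:
    def __init__(self):
        self.data = bytearray(0x10000)

    def load(self, addr, values):
        # Initial image load; bypasses write protection.
        self.data[addr:addr + len(values)] = bytes(values)

    def read(self, addr):
        return self.data[addr]

    def read16(self, addr):
        return self.data[addr] | (self.data[addr + 1] << 8)

    def write(self, addr, value):
        if addr < ROM_END:
            raise PermissionError(f"write to read-only code at {addr:04X}h")
        self.data[addr] = value & 0xFF


def os_entry(mem, ret):
    """`ret` is the return address pushed by CALL; it points at a 2-byte
    pointer to the parameter block. Returns the adjusted return address."""
    block = mem.read16(ret)                # follow the inline pointer
    func = mem.read(block)                 # byte 0: function code
    status = 0x00 if func == 1 else 0xFF   # pretend only function 1 exists
    mem.write(block + 1, status)           # byte 1: status, so block must be RAM
    return ret + 2                         # skip the pointer on return


mem = Memory()
mem.load(0x0100, [0xCD, 0x00, 0x30, 0x00, 0x80])  # CALL 3000h / DW 8000h
mem.load(0x8000, [0x01, 0xAA])                    # parameter block in RAM
resume = os_entry(mem, 0x0103)
```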

Clearly, microprocessor CPUs of the time did not possess memory management, but they came along later...



I suspect that passing parameters in registers probably makes the code shorter (especially important in a memory-limited micro machine) but a nightmare to port to another processor.

Dave
 
>>> The parameters are not actually passed on the stack, but in a parameter block that is pointed to by a pointer following the system call. The return address (on the stack when the operating system entry point is entered) contains the address of the parameter block. This address can be updated for the subsequent subroutine return.

>>> I suspect that passing parameters in registers probably makes the code shorter (especially important in a memory-limited micro machine) but a nightmare to port to another processor.

Regarding the stack, that's exactly what I meant by 'referencing the stack pointer'.
As far as shorter... not so sure. I'd have to run some benchmarks. Intuitively it would seem to be shorter, but then you have to add in saving the registers used by the 'real work' in your code, setting up the registers for the call, and restoring them afterwards, either via the stack or in dedicated memory locations. Either way this has to be done, but the register method seems to put that burden on the programmer, making more work for him. The 'register method' system call internals would be shorter, with no need to move parameters into registers, but it adds size to the application programs by making the programmer do it. Hopefully somewhere back in the mists of time some Comp Sci gurus worked out the pros and cons of that trade-off, probably without even considering ease of use (of programming, in this case)!
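A quick way to start on that comparison is to count call-site bytes using 8080 instruction sizes (MVI 2, LXI 3, CALL 3, PUSH/POP 1, plus DB 1 and DW 2 for inline data). The sketch assumes a CP/M-style BDOS call (function in C, argument in DE) against a hypothetical inline form (CALL, function byte, argument word); it measures only the caller's side, not the extra work the OS handler does to fetch inline parameters.

```python
# Call-site size comparison using 8080 instruction/data sizes in bytes.
SIZE = {"MVI": 2, "LXI": 3, "CALL": 3, "PUSH": 1, "POP": 1, "DB": 1, "DW": 2}


def size(seq):
    # Each entry is (mnemonic, operands...); only the mnemonic matters here.
    return sum(SIZE[op] for op, *_ in seq)


# CP/M-style call: function number in C, argument in DE, entry at 0005h.
register_call = [("MVI", "C", 9), ("LXI", "D", "msg"), ("CALL", 5)]

# Same call when the caller must keep live values in BC and DE.
register_call_saved = (
    [("PUSH", "B"), ("PUSH", "D")] + register_call + [("POP", "D"), ("POP", "B")]
)

# Hypothetical inline form: CALL followed by a function byte and an address.
inline_call = [("CALL", "sys"), ("DB", 9), ("DW", "msg")]
```

Under these assumptions the register form costs 8 bytes (12 if BC and DE must be preserved) against 6 for the inline form; the picture changes if the caller's values happen to be in the right registers already.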
 
>>> Regarding the stack, that's exactly what I meant by 'referencing the stack pointer'.

OK.

>>> As far as shorter... not so sure.

It probably depends upon whether you are talking about 'professional code' or 'amateur code' - for want of a better descriptive term for each.

For example, a lot of 'games' or 'graphics' programs try to wring the maximum amount of speed out of the implementation.

In the case of the Cromemco Dazzler card (for example), the Cromemco graphics library is used, but the registers are not saved and reloaded around every call; instead, the user programs are largely designed to keep (and manipulate) the values in the registers used by the library, saving and restoring them only where necessary.

Dave
 
Passing parameters in registers usually leads to smaller, faster code. (There can be exceptions, depending on what CPU it is.) I would argue it's a moot point though, because the mere overhead of calling a routine should not become a drag on performance if an API is being used properly. For instance, it isn't advisable to animate an image on the screen by calling a SetPixel routine millions of times.
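The point about per-call overhead can be made concrete by counting API entries. In this sketch the Framebuffer class and its set_pixel/fill_rect methods are hypothetical; filling a 320×200 screen pixel by pixel enters the API 64,000 times, while a single rectangle fill enters it once, so whatever the calling convention costs per call is paid 64,000 times in one case and once in the other.

```python
# Hypothetical graphics API that counts how often it is entered.


class Framebuffer:
    def __init__(self, w, h):
        self.w, self.h = w, h
        self.pixels = bytearray(w * h)
        self.calls = 0  # number of API entries (each one pays call overhead)

    def set_pixel(self, x, y, c):
        self.calls += 1
        self.pixels[y * self.w + x] = c

    def fill_rect(self, x, y, w, h, c):
        self.calls += 1
        for row in range(y, y + h):
            start = row * self.w + x
            self.pixels[start:start + w] = bytes([c]) * w


per_pixel = Framebuffer(320, 200)
for y in range(200):
    for x in range(320):
        per_pixel.set_pixel(x, y, 1)

one_call = Framebuffer(320, 200)
one_call.fill_rect(0, 0, 320, 200, 1)
```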

One potential downside of using registers is that a compiler will then need information on every API that is used, specifying which parameters go in which registers, whether this takes the form of function definitions in the program source, or one giant database that ships with the compiler. It's much easier to avoid this with something like the win32 convention, where calling any routine is only a question of pushing the correct number of things on the stack. (Of course there are compilers for win32 that include giant databases anyway, but they weren't written by me...)
 
>>> In the case of the Cromemco Dazzler card (for example), the Cromemco graphics library is used, but the registers are not saved and reloaded, but the user programs are largely designed to keep (and manipulate) the registers that are used by the library. Saved and restored only where necessary.
Ohh nice!
I didn't even know there was a Cromemco graphics library.
All I do is run canned demos on mine.
Can you point me in the right direction?
 
>>> I suspect that passing parameters in registers probably makes the code shorter (especially important in a memory-limited micro machine) but a nightmare to port to another processor.
Yes, shorter in general. The BIOS is smaller because it doesn't have to pull stuff off the stack, and the caller is often shorter because it doesn't have to put stuff back on the stack. It can also help with a series of calls:

Code:
            call kbscan     ; wait for keypress and return scan code in A
            call kb2ch      ; convert scan code in A to ASCII returned in A
            call prchar     ; echo the character entered
            cp   A,' '      ; did the user type a control char?
            jp   c,.ctrl    ;   yes: handle it differently
                            ;    no: do whatever

No, not at all a nightmare to port for many things. Somehow it turned out that a lot of the old microprocessors have an A register, so not only are the ports simple, but you can even use the exact same unit tests for all of them:

Code:
    m.call(m.symtab.qdigit, R(a=ord('F')))  # parse digit: ASCII 'F' in A
    assert R(a=0x0F) == m.regs              # A now holds the value 0x0F

Though this breaks down if you want to get super-anal and flag errors or something like that, not because of the register names but because, e.g., the 8080/6800 has a carry/borrow flag while the 6502 has a carry/not-borrow flag. And the tricks you use for dealing with that in common unit tests handle different register names for different platforms as well. (See src/generic/qdigit.py and the files that import it in 8bitdev.)
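The carry difference is easy to model. On the 8080 (and 6800) a subtract sets the carry flag when a borrow occurs; on the 6502, SBC after SEC clears carry on a borrow. The sketch below covers binary mode only (no 6502 decimal mode), and the `borrowed` helper is a hypothetical example of the kind of normalisation a shared test might use, not the actual code in 8bitdev.

```python
# Subtract-and-flag semantics, binary mode only.


def sub_8080(a, b):
    """8080 SUB (6800 SUB behaves alike): carry is SET when a borrow occurs."""
    return (a - b) & 0xFF, a < b


def sbc_6502(a, b, carry_in=True):
    """6502 SBC computes a - b - (1 - C). After SEC (carry_in=True),
    carry comes out CLEAR on a borrow and set otherwise."""
    diff = a - b - (0 if carry_in else 1)
    return diff & 0xFF, diff >= 0


def borrowed(cpu, carry):
    """Hypothetical normaliser so a shared unit test can ask 'did it borrow?'."""
    return carry if cpu in ("8080", "6800") else not carry
```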

>>> It probably depends upon whether you are talking about 'professional code' or 'amateur code' - for want of a better descriptive term for each.
I use "industrial code" for what I think you mean by the former: code that is robust and efficient, handling all the cases that are necessary to keep it from exploding, generally at the cost of clarity. I guess I would call code that makes for clear examples, while eliding all the extra work needed for industrial code, "naïve," both because it's naïve about what code like that really needs to do in the real world and because it's much easier for naïve users to read and understand the core functionality.

There's a third kind of code, "academic" code, which is the worst of them all because it's written by people who aren't even concerned about clarity but just, "can I convince a computer that my idea works, at least in an ideal world." Probably you won't even be able to get it to build unless you happen to be on the exact original system it was written on.
 
I have seen a graphics program, developed by a professional software programmer, that did all its drawing through the equivalent of an OS put-pixel call. Needless to say, it ran like a brick, and he didn't work for us for long!

If we are talking about a compiler, then things are even worse, because the compiled code will probably push all of the parameters on the stack and call a library routine, which has to pull them all off again and call the OS. The returned parameters would then be pushed onto the stack before returning. Some compilers pass the first 'few' parameters in registers.
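The double handling described above can be sketched as a toy runtime: the compiled call site pushes the argument, a library glue routine pops it into the register the OS expects, makes the register-based call, and pushes the result back. The names (os_conout, lib_putchar, compiled_putchar), the made-up return address and the "character in A, status in A" convention are all assumptions.

```python
# Toy model of a compiler runtime wrapping a register-based OS call.
# Names and the "A in, status in A out" convention are hypothetical.


def os_conout(regs, console):
    """Register-based OS call: character code in A, status returned in A."""
    console.append(chr(regs["a"]))
    regs["a"] = 0


def lib_putchar(stack, regs, console):
    """Library glue: pop the stacked argument into A, call the OS, and push
    the result back where the compiled caller expects it."""
    ret = stack.pop()               # return address sits above the argument
    regs["a"] = stack.pop() & 0xFF
    os_conout(regs, console)
    stack.append(regs["a"])         # result goes back on the stack...
    stack.append(ret)               # ...under the return address


def compiled_putchar(ch, stack, regs, console):
    """What the compiled call site does: push, call, then pop the result."""
    stack.append(ord(ch))           # push the parameter
    stack.append(0x1234)            # CALL pushes a (made-up) return address
    lib_putchar(stack, regs, console)
    stack.pop()                     # RET
    return stack.pop()              # fetch the returned value
```

Every argument is moved twice (onto the stack, then into a register) and every result twice again, which is the overhead that passing the first few parameters in registers avoids.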

Some ports are easy, others not. It all depends.

You need to search for graphz80.rel. If you look through our threads on recreating the Dazzler board, there should be a reference to the disks we used somewhere. I will hunt the links out later if you can't find them.

EDIT1: Documentation: http://www.s100computers.com/My System Pages/Dazzler II Board/Cromemco Dazzler Graphics Instruction Manual.pdf

EDIT2: You want disk 948 from the well-known Cromemco repository. If you don't know where that is, DM me for details... An even better source is post #247 here: https://forum.vcfed.org/index.php?threads/cromemco-dazzler-replica-project.77906/page-13. Enjoy!

Dave
 
>>> The Altair DOS way of doing things is pretty much aligned with how the 'larger' machines and operating systems did things.

>>> The parameters are not actually passed on the stack, but in a parameter block that is pointed to by a pointer following the system call. The return address (on the stack when the operating system entry point is entered) contains the address of the parameter block. This address can be updated for the subsequent subroutine return.

>>> This means that the code is 'clean' (and can be set to execute and read only where the CPU provides for memory management) whereas the parameter block can be located in read/write data space. Read/write because the status byte (related to the function call execution status) is stored within the parameter block - so has to be writeable).
The code can be "clean" in either case, in that the called API would use the caller's stack.
>>> One potential downside of using registers is that a compiler will then need information on every API that is used, specifying which parameters go in which registers, whether this takes the form of function definitions in the program source, or one giant database that ships with the compiler. It's much easier to avoid this with something like the win32 convention, where calling any routine is only a question of pushing the correct number of things on the stack. (Of course there are compilers for win32 that include giant databases anyway, but they weren't written by me...)
You need include files anyway, and they can contain this information too. For example, SAS/C on the Amiga uses #pragma directives for this. Also, with any API that is intended for general use, I assume the way to go would be to have one source defining the API and then automatically generate the relevant files for each language (and for each vendor of language implementations, in case one C compiler can use the register-based calling convention directly while another needs a .lib intermediate file to translate stack-based calls into register-based calls).
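The "one source, many generated files" idea might look like this: a single table records each function's jump-table offset, C types and argument registers, and small generators emit a C prototype plus a register map. The API entries are hypothetical, and the register-map line format is invented for illustration; it is not SAS/C's actual `#pragma` syntax.

```python
# Generate per-language bindings from one API definition table.
# The API entries and the register-map format are hypothetical.
from dataclasses import dataclass


@dataclass
class Func:
    name: str
    offset: int  # jump-table offset from the library base
    ret: str     # C return type
    args: list   # (C type, parameter name, register) tuples


API = [
    Func("OpenThing", -30, "long",
         [("const char *", "name", "d1"), ("long", "mode", "d2")]),
    Func("CloseThing", -36, "void", [("long", "handle", "d1")]),
]


def c_prototype(f):
    # Plain C prototype: all a stack-based compiler needs.
    params = ", ".join(f"{t} {n}" for t, n, _ in f.args) or "void"
    return f"{f.ret} {f.name}({params});"


def register_map(f):
    # Extra information a register-aware compiler or glue generator needs.
    regs = ",".join(r for _, _, r in f.args)
    return f"{f.name} base{f.offset:+d} ({regs})"
```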

I would think that a major reason for 32-bit Windows not to use a register-based API is that it was intended for multiple processors, and Microsoft might not have been that keen on creating a separate register-based calling convention for each processor family (x86, MIPS, PowerPC, Alpha).

==============

Btw, back in the day run-time performance wasn't the only performance consideration. Compile-time performance mattered too. A great example is Turbo Pascal, which had a really fast code-compile-test turnaround but also generated some of the slowest run-time code. (Source: vague memory from the not-so-well-known Swedish magazine "Industriell datorteknik" (= "Industrial computer technology", roughly translated).)
 
>>> I would think that a major reason for 32-bit Windows to not use a register based API is that it was intended for multiple processors,
Well, and maybe also that they never expected to (and as far as I know, never have) run on an ISA that didn't have useful indirect access through the stack pointer. When you don't have ld a,[sp+4] and similar, it's significantly more expensive to access parameters on the stack.
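To put a number on "significantly more expensive": on the 8080, which has no SP-relative load, fetching a byte parameter 4 bytes up the stack takes LXI H,0004h / DAD SP / MOV A,M. Summing the standard 8080 sizes and state counts (3/10, 1/10, 1/7) gives 5 bytes and 27 states, and the sequence also destroys HL and the carry flag; a parameter already in A costs nothing.

```python
# Cost of reading a stacked byte parameter on the 8080: (bytes, states).
COST = {
    "LXI H,0004h": (3, 10),  # HL = offset of the parameter
    "DAD SP":      (1, 10),  # HL = HL + SP (also sets/clears carry)
    "MOV A,M":     (1, 7),   # A = byte at address HL
}


def total(seq):
    # Sum each column (bytes, states) across the instruction sequence.
    return tuple(sum(col) for col in zip(*(COST[i] for i in seq)))


load_stack_param = ["LXI H,0004h", "DAD SP", "MOV A,M"]
```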
 
>>> Well, and maybe also that they never expected to (and as far as I know, never have) run on an ISA that didn't have useful indirect access through the stack pointer. When you don't have ld a,[sp+4] and similar, it's significantly more expensive to access parameters on the stack.
That probably depends on the architecture, though. Sure, stack-pointer-relative addressing would be the fastest in cases like this (but it would be hard to use from assembler, as the pointer obviously moves if you push/pull things to/from the stack). But on architectures where there are enough registers and transferring the stack pointer to another register takes just one instruction, there might not have been much of a problem.

A thing that I think hasn't been mentioned yet in this thread is a comparison of the "performance cost" of various things. In general, for OS APIs that involve a context switch anyway, I don't think the calling convention has much influence on overall execution speed. But for any API that doesn't do context switching (say, a library that runs in the caller's context, or for that matter an OS without memory protection, like AmigaOS), the calling convention is more or less the only thing that affects API call performance. (Obviously other things, like designing an API so it doesn't have to be called more often than necessary, would have an even larger impact on performance, but that seems out of scope for this thread.)
 
>>> Sure, stack pointer relative addressing would be the fastest in cases like this (but would be hard to use from assembler, as the pointer obviously moves if you push/pull things to/from the stack).
Right, but on software systems designed for this sort of C-like calling convention you'd not generally be pushing or popping during a single routine (or perhaps you'd be loading another register with your pointer into the stack frame).

>>> But on architectures where there are enough registers and transferring the stack pointer to another register is just an instruction, there might not had been much of a problem.
Sure, but that effectively puts it into the "ISA suited for this" category. That would include ISAs such as PDP-11, 68k, etc. I'd imagined this thread was more about the category of "ISAs not so well suited for this," such as 8080 and friends, given the mention of CP/M at the start.
 
For the Z80 processor, I saw two main approaches: parameters stored in registers (e.g. the CP/M BIOS), or parameters stored on the stack (e.g. the HI-TECH C compiler).

Recently, I have also noticed a mixed approach (e.g. the Cowgol language compiler): if a routine takes a single parameter, it is loaded into registers; if it takes more than one, only the first parameter goes into registers and the others go onto the stack.
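The mixed scheme can be sketched as a pair of marshalling functions. This follows the description above, not Cowgol's actual implementation; the choice of HL for the first parameter and the right-to-left push order are assumptions.

```python
# Sketch of a mixed convention: first argument in a register, rest on the
# stack. HL and the push order are assumptions, not Cowgol's real ABI.


def marshal(args):
    """Caller side: first argument into HL, the others pushed right to
    left so that the second argument ends up on top of the stack."""
    regs, stack = {}, []
    if args:
        regs["hl"] = args[0]
        for value in reversed(args[1:]):
            stack.append(value)
    return regs, stack


def unmarshal(regs, stack, count):
    """Callee side: recover the arguments in their original order."""
    if count == 0:
        return []
    return [regs["hl"]] + [stack.pop() for _ in range(count - 1)]
```

The common single-argument case never touches the stack, which is presumably the point of the scheme.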

Of course, it also depends on who "unloads" the stack: the caller or the called routine.
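The two cleanup options can be compared in a few lines: with callee cleanup (as in Win32's stdcall, where the routine returns with RET n) the called routine discards its own arguments; with caller cleanup (as in C's cdecl) the caller adjusts the stack after the call. The `invoke` and `add_all` names are hypothetical, and the stack is just a Python list.

```python
# Caller-cleanup vs callee-cleanup, modelled on a Python list as the stack.


def invoke(stack, args, callee, callee_cleans):
    """Push args right to left, call, and rebalance the stack either in the
    callee (stdcall-style) or in the caller (cdecl-style)."""
    depth = len(stack)
    for a in reversed(args):
        stack.append(a)
    result = callee(stack, len(args), callee_cleans)
    if not callee_cleans:
        del stack[len(stack) - len(args):]  # caller drops its own arguments
    assert len(stack) == depth, "stack unbalanced"
    return result


def add_all(stack, n, callee_cleans):
    # Read the n arguments in place (top of stack is the first argument).
    total = sum(stack[-n:]) if n else 0
    if callee_cleans:
        del stack[len(stack) - n:]          # callee pops its arguments
    return total
```

Caller cleanup is what makes variable-argument functions easy, since only the caller knows how many arguments it pushed; callee cleanup saves the stack adjustment at every call site.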

When the system in question is, for example, a real-time or multitasking system, extensive use of registers to pass parameters is key to the solution.

However, to save/restore the context of a task, it is essential to also have a standard structure of register values and program counter, on the stack.
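A standard context structure on the stack might be modelled like this: each task keeps its own stack, a switch pushes the running task's registers in one agreed order and pops the next task's in reverse. The Z80-flavoured register names are real, but the frame order and the Task/switch names are assumptions for illustration.

```python
# Toy model of a fixed context frame saved on each task's own stack.
# The frame order is hypothetical; on a real Z80 an interrupt pushes PC
# first and the switch routine pushes the remaining registers.

FRAME = ("pc", "af", "bc", "de", "hl", "ix", "iy")


class Task:
    def __init__(self, name):
        self.name = name
        self.stack = []


def save_context(task, regs):
    # Push registers in the agreed order so any task can be resumed later.
    for r in FRAME:
        task.stack.append(regs[r])


def restore_context(task):
    # Pop in the reverse order to rebuild the register set.
    regs = {}
    for r in reversed(FRAME):
        regs[r] = task.stack.pop()
    return regs


def switch(regs, current, nxt):
    """Save the running task's registers and return those of the next task."""
    save_context(current, regs)
    return restore_context(nxt)
```

Because every task's frame has the same layout, the scheduler never needs to know what a task was doing when it was interrupted.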

So, ultimately, it all depends on the nature of that 'system software'...

Ladislau
 