I've been working on a fun little research project over the last few weeks. It continues a project I last worked on about five and a half years ago: a set of subroutines that allow somewhat efficient use of the PDP-8 as a stack machine. Of course, we all know that the PDP-8 does not have a stack (except for the 6120 CPU, which has two). I decided to document my work and go over it all again; I have a Google Doc which I will post a link to here at some point. What I did today was revisit the idea of using one of the auto-index registers (locations 10 through 17 octal) as the stack pointer. My conclusion five years ago was that I couldn't see any way to make it an advantage. An indirect reference through an auto-index location increments the pointer before using it, so with a downward-growing stack the pop-type operations get faster, but the push operations get slower. You can think of all stack operations as symmetrical, in that for every call there is a return and for every push there is a pop, so it is easy to compare the execution times of the pairs of operations. Here are the call and return routines, first without and then with the auto-index register.
CALL=JMS I ZCALL
RET=JMP I ZRET
*20
SP, 7600 /INITIAL STACK POINTER
ZCALL, ICALL /POINTER TO THE CALL SUBROUTINE.
ZRET, IRET /POINTER TO THE RETURN CODE
*200 /CURSORY TEST
CLA
LP, CALL; INCAC /CALL THE SUBROUTINE TO INCREMENT THE AC
HLT
JMP LP
/KIND OF A LAME EXAMPLE IF I SAY SO MYSELF.
INCAC, IAC /INCREMENT THE AC
RET /AND RETURN TO CALLER
*2000 /PUT CODE SOMEWHERE.
ICALL, .-.
DCA SAVAC /SAVE THE AC
CLA CMA /SP <- SP-1
TAD SP
DCA SP
CLA IAC /AC <- 1
TAD ICALL /AC <- REAL RETURN ADDRESS
DCA I SP /PUT ON STACK
TAD I ICALL /GET CALLED ROUTINE ADDRESS
DCA ICALL /SAVE IT SO WE CAN JMP INDIRECT TO IT
TAD SAVAC /RESTORE AC
JMP I ICALL /TRANSFER TO THE REAL SUBROUTINE
IRET, DCA SAVAC /SAVE THE AC
TAD I SP /GET THE RETURN ADDRESS
ISZ SP /SP <- SP+1
DCA ICALL /USE THE CALL ENTRY POINT AS A TEMP
TAD SAVAC /RESTORE THE AC
JMP I ICALL /TRANSFER CONTROL BACK TO THE CALLER
SAVAC, .-. /TEMP STORAGE FOR AC
$
I had to fix up the formatting by hand; I hope I didn't mess anything up. The code above is the version of call/return that does not place SP in an auto-index register. The stack pointer points at the top of the stack, and the stack grows downward.
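As a cross-check on the listing above, here is a rough Python model of what ICALL and IRET do to memory. It is a sketch under simplifying assumptions: a single 4K field of 12-bit words, the skip side of ISZ ignored (SP never reaches zero here), and the octal addresses 201 through 205 taken from where the cursory test would assemble starting at *200. The Python function names are illustrative only, not part of the listing.

```python
MASK = 0o7777  # PDP-8 words are 12 bits


def push(mem, sp, value):
    """CLA CMA / TAD SP / DCA SP, then DCA I SP: decrement first, then store."""
    mem[sp] = (mem[sp] - 1) & MASK
    mem[mem[sp]] = value & MASK


def pop(mem, sp):
    """TAD I SP, then ISZ SP: read the top item, then move SP back up.

    The skip side of ISZ is ignored; SP never wraps to zero in practice.
    """
    value = mem[mem[sp]]
    mem[sp] = (mem[sp] + 1) & MASK
    return value


def call(mem, sp, jms_addr):
    """Model of 'CALL; TARGET' with the JMS I ZCALL at jms_addr.

    JMS leaves jms_addr+1 (the address of the TARGET word) in ICALL.
    ICALL pushes ICALL+1, the word after TARGET, as the return address,
    and jumps to the target fetched with TAD I ICALL.
    """
    icall = (jms_addr + 1) & MASK
    push(mem, sp, icall + 1)
    return mem[icall]


def ret(mem, sp):
    """Model of RET: pop the return address."""
    return pop(mem, sp)


mem = [0] * 4096
SP = 0o20             # page-zero location of SP in the first listing
mem[SP] = 0o7600      # initial stack pointer
mem[0o202] = 0o205    # the INCAC word that follows CALL at LP (0201)

target = call(mem, SP, 0o201)
print(oct(target), oct(mem[SP]))  # 0o205 0o7577
back = ret(mem, SP)
print(oct(back), oct(mem[SP]))    # 0o203 0o7600
```

So the CALL at 0201 pushes 0203 and transfers to INCAC at 0205, and the RET pops 0203, returning to the HLT with SP back at 7600.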
CALL=JMS I ZCALL
RET=JMP I ZRET
*17
SP, 7600-2 /INITIAL STACK POINTER
ZCALL, ICALL /POINTER TO THE CALL SUBROUTINE.
ZRET, IRET /POINTER TO THE RETURN CODE
*200 /CURSORY TEST
CLA
LP, CALL; INCAC /CALL THE SUBROUTINE TO INCREMENT THE AC
HLT
JMP LP
INCAC, IAC /INCREMENT THE AC
RET /AND RETURN TO CALLER
*2000 /PUT CODE SOMEWHERE.
ICALL, .-.
DCA SAVAC /SAVE THE AC
TAD SP /MAKE A COPY OF SP. IT ALREADY POINTS
DCA TSP /AT THE PUSH ADDRESS
CLA IAC /AC <- 1 (DOES NOT AFFECT THE LINK)
TAD ICALL /AC <- REAL RET ADDR
DCA I TSP /PUT ON STACK
CLA CMA /SP <- SP-1
TAD SP
DCA SP
TAD I ICALL /GET CALLED ROUTINE ADDRESS
DCA ICALL /SAVE CALLED ROUTINE ADDR FOR IND JMP
TAD SAVAC /RESTORE AC
JMP I ICALL /GO TO CORRECT PLACE
IRET, DCA SAVAC /SAVE THE AC
TAD I SP /SP <- SP+1 (AUTO INDEX), THEN GET THE RETURN ADDRESS
DCA ICALL /USE THE CALL ENTRY POINT AS A TEMP
TAD SAVAC /RESTORE THE AC
JMP I ICALL /TRANSFER CONTROL BACK TO THE CALLER
SAVAC, .-. /TEMP STORAGE FOR AC
TSP, .-. /COPY OF SP TO AVOID THE AUTO INDEX
$
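The second listing relies on a rule that is easy to miss: an indirect reference through locations 10 through 17 increments the pointer and writes it back before using it. Below is a rough Python model of just that rule and of the push/pop discipline it forces, assuming a single 4K field of 12-bit words. The address given to TSP is a hypothetical stand-in, since any word outside 10 through 17 would do. It shows why a pop is a single TAD I SP while a push has to go through the copy in TSP.

```python
MASK = 0o7777  # 12-bit words


def indirect_ea(mem, ptr):
    """Effective address of an indirect reference through location ptr.

    Locations 10-17 (octal) are auto-index registers: the pointer is
    incremented and written back *before* it is used.
    """
    if 0o10 <= ptr <= 0o17:
        mem[ptr] = (mem[ptr] + 1) & MASK
    return mem[ptr]


SP = 0o17      # auto-index register 17, as in the second listing
TSP = 0o2100   # hypothetical address for TSP; any non-auto-index word works


def pop(mem):
    """TAD I SP with the AC clear: one instruction, the pop comes for free."""
    return mem[indirect_ea(mem, SP)]


def push(mem, value):
    """TAD SP / DCA TSP, DCA I TSP, then SP <- SP-1 by hand.

    Storing through SP itself would pre-increment it, so the store goes
    through the non-auto-index copy TSP.
    """
    mem[TSP] = mem[SP]
    mem[indirect_ea(mem, TSP)] = value & MASK
    mem[SP] = (mem[SP] - 1) & MASK


mem = [0] * 4096
mem[SP] = 0o7600 - 2   # initial SP: address of the first free slot
push(mem, 0o1234)
push(mem, 0o4321)
print(oct(pop(mem)), oct(pop(mem)), oct(mem[SP]))  # 0o4321 0o1234 0o7576
```

In this discipline SP always holds the address of the next free slot, one below the top item, which is why its initial value sits below 7600 rather than at 7600.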
The version that does not use the auto-index register takes 25 cycles for the call and 15 cycles for the return, for a total of 40 cycles. The version that uses the auto-index register takes 29 cycles for the call and 13 cycles for the return, for a total of 42 cycles. This is a 2-cycle improvement over my best from five years ago, and the same 2-cycle difference between the two versions shows up in all the paired operations. It gets a little more complicated, though, because I realized that some operations no longer need to be implemented as a JMS subroutine but can be placed inline as a single instruction. For example, if SP is in an auto-index register, AND I SP performs an AND against the top of the stack and gets the pop for free. This takes only three cycles, whereas the conventional routine looks like this:
STKAND, .-.
AND I SP /AC <- AC & TOS
ISZ SP /SP <- SP+1
JMP I STKAND /RETURN
This will be 9 cycles if called directly with JMS STKAND (10 if called indirectly through a pointer), versus 3 inline. POP and ADD are very similar, with the inline form about 6 cycles faster most of the time. I don't know whether this is enough to make the auto-index register a net win, but it makes the answer a lot less certain.
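Because the whole comparison rests on counting memory cycles, here is a small Python tally of the listings above. It assumes the classic PDP-8 counts: a memory-reference instruction takes a fetch and an execute cycle, plus a defer cycle when indirect; JMP has no execute cycle; operate-group instructions are fetch only. It counts cycles, not microseconds, so it ignores any extra time a particular model spends on the auto-index increment. PSTKAND is a hypothetical page-zero pointer to STKAND.

```python
def cycles(instr: str) -> int:
    """Memory cycles for one PDP-8 instruction: fetch, plus defer, plus execute."""
    op, *rest = instr.split()
    indirect = bool(rest) and rest[0] == "I"
    if op in ("AND", "TAD", "ISZ", "DCA", "JMS"):
        return 3 if indirect else 2
    if op == "JMP":
        return 2 if indirect else 1
    return 1  # operate-group microinstructions: CLA, CMA, IAC, ...


def total(seq):
    return sum(cycles(i) for i in seq)


PLAIN_CALL = ["JMS I ZCALL", "DCA SAVAC", "CLA CMA", "TAD SP", "DCA SP",
              "CLA IAC", "TAD ICALL", "DCA I SP", "TAD I ICALL",
              "DCA ICALL", "TAD SAVAC", "JMP I ICALL"]
PLAIN_RET = ["JMP I ZRET", "DCA SAVAC", "TAD I SP", "ISZ SP",
             "DCA ICALL", "TAD SAVAC", "JMP I ICALL"]
AUTO_CALL = ["JMS I ZCALL", "DCA SAVAC", "TAD SP", "DCA TSP", "CLA IAC",
             "TAD ICALL", "DCA I TSP", "CLA CMA", "TAD SP", "DCA SP",
             "TAD I ICALL", "DCA ICALL", "TAD SAVAC", "JMP I ICALL"]
AUTO_RET = ["JMP I ZRET", "DCA SAVAC", "TAD I SP", "DCA ICALL",
            "TAD SAVAC", "JMP I ICALL"]
STKAND_BODY = ["AND I SP", "ISZ SP", "JMP I STKAND"]

print(total(PLAIN_CALL), total(PLAIN_RET), total(PLAIN_CALL + PLAIN_RET))  # 25 15 40
print(total(AUTO_CALL), total(AUTO_RET), total(AUTO_CALL + AUTO_RET))      # 29 13 42
print(total(["JMS STKAND"] + STKAND_BODY),     # direct call: 9
      total(["JMS I PSTKAND"] + STKAND_BODY),  # through a pointer: 10
      total(["AND I SP"]))                     # inline, SP auto-indexed: 3
```

With these counts the call/return pairs come to 40 and 42 cycles, matching the figures above, and STKAND costs 9 cycles with a direct JMS and 10 through a pointer, against 3 for the inline AND I SP.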
I'm still sun-bleaching the VR210 case. The rate of change appears to have slowed, but since I have to wait for the CRT glass to separate anyway, I can afford to keep putting it out.
OK, there is clearly something I don't understand about this editor. The formatting looked fine until I posted it. Sorry it looks so broken.
Doug