"Retro" Propeller computer project

commodorejohn · Jun 21, 2017

This might sorta fit better under "Vintage Computer Programming," but it concerns a machine that A. doesn't exist yet and B. isn't technically vintage, so I'll just throw it here instead.

Anyway, this was inspired by a comment Chuck made a while back about the lack of a good starter system for people to learn machine-language programming on - in that most 8-bit CPU architectures require too many steps to do anything complex, and most modern 32-bit microcontrollers require too much busywork and boilerplate just to get things up and running. I had made the offhand suggestion that someone should look at doing a sort of retro design in an FPGA or something based on a simple yet comfortable 16-bit architecture like the PDP-11.

Well, I don't have any knowledge of FPGA development, but what I do have is a Parallax Propeller chip sitting around not doing anything at the moment. So I've begun to brainstorm ideas for developing such a system to be emulated on the Propeller. Specifics are sketchy at the moment, but I've started off with trying to work out a nice, hobbyist-friendly CPU architecture that looks like it could potentially be emulated in a single Propeller cog (since I want the entire 32KB of hub RAM for the emulated machine's address space.) Ideally it should achieve something in the neighborhood of 150-300 KIPS - comparable to a 1MHz 6502, but capable of doing a good deal more per instruction. As suggested, it's heavily derived from the PDP-11, though it's not actually binary-compatible. Before I get too much into the grunt work, I thought it might be nice to get some comments on the proposed architecture and a few little details I'm still trying to make up my mind on.
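For a rough sense of what that target means in practice, here is a back-of-the-envelope sketch in Python. It assumes an 80 MHz system clock and 4 clocks per ordinary cog instruction; both figures are assumptions for illustration rather than anything measured here, and hub accesses would eat further into the budget. The function name is made up.

```python
# Rough budget: how many native cog instructions fit inside one emulated
# instruction at a given target speed. Clock rate and clocks-per-instruction
# are assumed values, not measurements.

def native_ops_per_emulated(clock_hz, clocks_per_native_op, target_ips):
    """Return the number of native instructions available per emulated one."""
    native_ips = clock_hz / clocks_per_native_op
    return native_ips / target_ips


if __name__ == "__main__":
    for kips in (150, 300):
        budget = native_ops_per_emulated(80_000_000, 4, kips * 1000)
        print(f"{kips} KIPS leaves about {budget:.0f} native instructions each")
```

Under those assumptions the fetch, decode, operand handling, and execute steps for one emulated instruction would have to fit in somewhere around 67 to 133 native instructions.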

- - -

So the CPU is a 16-bit design with eight main registers and eight addressing modes. Six of the registers are purely general-purpose, but R6 is the system stack pointer and R7 is the instruction pointer. This is all lifted directly from the PDP-11, as are the addressing modes (register, register indirect pre-decrement, register indirect post-increment, and register indirect indexed, optionally with an additional level of indirection.) The instruction set is very similar, although I've stripped out the more advanced functionality like memory mapping/protection and the like, as well as floating-point math (not gonna fit that in and still run in a single cog - although it might be possible to add a separate "FPU" in another cog.) I've also tweaked things a bit so that everything fits neatly into octal-based instruction formats (the PDP-11 was mostly this way, but broke its own rule with 8-bit branch offsets.) The instruction set, at present, is as follows:

Code:

Legend: m = addressing mode, d = destination register, s = source register, r = other register, o = branch offset
Opcodes are presented in octal.

Two-address group B:

Opcode	Mnem.	Description
17mdms	MOVB	Move byte source to destination
16mdms	CMPB	Compare byte source to destination (subtract, no result)
15mdms	SBCB	Subtract byte source from destination with borrow
14mdms	ADCB	Add byte source to destination with carry
13mdms	SUBB	Subtract byte source from destination
12mdms	ADDB	Add byte source to destination
11mdms	TSTB	Test byte source against destination (AND, no result)

Branch group:

Opcode	Mnem.	Description
107ooo	BCC	Branch if carry clear
106ooo	BCS	Branch if carry set
105ooo	BMI	Branch if negative
104ooo	BPL	Branch if positive
103ooo	BNE	Branch if not equal to zero
102ooo	BEQ	Branch if equal to zero
101ooo	BRA	Branch always
100roo	DBZ	Decrement Rr and branch until zero

Two-address group A:

Opcode	Mnem.	Description
07mdms	MOV	Move source to destination
06mdms	CMP	Compare source to destination (subtract, no result)
05mdms	SBC	Subtract source from destination with borrow
04mdms	ADC	Add source to destination with carry
03mdms	SUB	Subtract source from destination
02mdms	ADD	Add source to destination
01mdms	TST	Test source against destination (AND, no result)

Address-and-register group:

Opcode	Mnem.	Description
007rms	ASH	Arithmetic shift Rr by source
006rms	ROT	Rotate Rr by source
005rms	DIV	Divide Rr by source
004rms	MUL	Multiply Rr by source
003rms	XOR	XOR Rr with source
002rms	OR	OR Rr with source
001rms	AND	AND Rr with source
	
One-address group:

Opcode	Mnem.	Description
0007ms	DECB	Decrement byte in source
0006ms	INCB	Increment byte in source
0005ms	DEC	Decrement source
0004ms	INC	Increment source
0003ms	JSR	Jump to subroutine
0002ms	JMP	Jump to source
0001ms	NOT	Ones-complement source

Register group:

Opcode	Mnem.	Description
00007r	DUB	Duplicate byte in Rr to MSB
00006r	SXB	Sign-extend byte in Rr
00005r	RCR	Rotate Rr right with carry
00004r	RCL	Rotate Rr left with carry
00003r	LSR	Logical shift Rr right
00002r	LSL	Logical shift Rr left
00001r	ASR	Arithmetic shift Rr right

Parameterless group:

Opcode	Mnem.	Description
000007	SEC	Set carry flag
000006	CLC	Clear carry flag
000005	PLS	Pull status register from stack
000004	PHS	Push status register to stack
000003	RTI	Return from interrupt
000002	WAI	Wait for external interrupt
000001	RST	Reset system (CPU & peripherals)
000000	BRK	Break (software interrupt)
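
One consequence of this layout is that the group can be picked out by finding the first non-zero octal digit. A minimal Python sketch of that idea follows; the function and table names are invented for illustration, and the real emulator would be written in Propeller assembly and need not look anything like this.

```python
# Sketch: classify a 16-bit word into the instruction groups listed above by
# walking its six octal digits from the top. All names are illustrative.

TWO_ADDRESS = {7: "MOV", 6: "CMP", 5: "SBC", 4: "ADC", 3: "SUB", 2: "ADD", 1: "TST"}
BRANCH = {7: "BCC", 6: "BCS", 5: "BMI", 4: "BPL", 3: "BNE", 2: "BEQ", 1: "BRA"}
ADDRESS_REGISTER = {7: "ASH", 6: "ROT", 5: "DIV", 4: "MUL", 3: "XOR", 2: "OR", 1: "AND"}
ONE_ADDRESS = {7: "DECB", 6: "INCB", 5: "DEC", 4: "INC", 3: "JSR", 2: "JMP", 1: "NOT"}
REGISTER = {7: "DUB", 6: "SXB", 5: "RCR", 4: "RCL", 3: "LSR", 2: "LSL", 1: "ASR"}
PARAMETERLESS = {7: "SEC", 6: "CLC", 5: "PLS", 4: "PHS", 3: "RTI", 2: "WAI", 1: "RST", 0: "BRK"}


def octal_digits(word):
    """Split a 16-bit word into six octal digits, most significant first.

    The top digit only covers bit 15, so it is always 0 or 1.
    """
    return [(word >> shift) & 7 for shift in (15, 12, 9, 6, 3, 0)]


def classify(word):
    """Return (group, mnemonic) for a 16-bit instruction word."""
    d5, d4, d3, d2, d1, d0 = octal_digits(word & 0xFFFF)
    if d5 == 1:
        if d4 != 0:
            # 11mdms..17mdms: byte variants of the two-address group
            return "two-address B", TWO_ADDRESS[d4] + "B"
        if d3 != 0:
            return "branch", BRANCH[d3]  # 101ooo..107ooo
        return "branch", "DBZ"           # 100roo
    if d4 != 0:
        return "two-address A", TWO_ADDRESS[d4]
    if d3 != 0:
        return "address-and-register", ADDRESS_REGISTER[d3]
    if d2 != 0:
        return "one-address", ONE_ADDRESS[d2]
    if d1 != 0:
        return "register", REGISTER[d1]
    return "parameterless", PARAMETERLESS[d0]


if __name__ == "__main__":
    for word in (0o170201, 0o103012, 0o004301, 0o000052, 0o000000):
        print(f"{word:06o}", classify(word))
```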

So I'm looking for feedback - how does this look as an instruction set? Anything obvious missing? Anything here seem unnecessary? And I'm specifically looking for input on a couple little implementation details:

Should byte moves to a register zero-extend the value, or leave the MSB of the destination untouched? What about byte arithmetic operations?
I want to make the order of operations in instruction decoding explicit, so that there's no ambiguity about, say, MOV R4, R4+ as there was across different PDP-11 models. I'm looking at a scenario where the source operand is decoded first, and both pre-decrement and post-increment write back to the register at decode time - so in the hypothetical most-complicated scenario of, say, a loop of ADDB R0+, R0+ the effect would be to add, say, address 0 to address 1, then address 2 to address 3, then address 4 to address 5, etc. Does this seem like an intuitive way to do things?
General policy is for word operations on odd addresses to silently AND out the low bit, dropping back to the next-lowest word-aligned address. This seems potentially unwise with the instruction pointer, though - should instruction fetches from odd addresses do a proper trap? What about the stack pointer?
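
To make the proposed ordering concrete, here is a small Python sketch of the ADDB R0+, R0+ loop described above. It assumes the source operand is resolved first and that post-increment writes back to the register at decode time; flags are not modelled, and the helper names are made up for illustration.

```python
# Sketch of the proposed operand ordering: the source is decoded first, and a
# post-increment updates the register as soon as that operand is decoded.
# Condition flags are deliberately left out.

def postinc_byte_operand(regs, r):
    """Return the byte address held in register r, then bump the register."""
    addr = regs[r]
    regs[r] = (regs[r] + 1) & 0xFFFF
    return addr


def addb_postinc_postinc(mem, regs, src_reg, dst_reg):
    """Emulate ADDB Rs+, Rd+ and return the (source, destination) addresses."""
    src_addr = postinc_byte_operand(regs, src_reg)  # decoded first
    dst_addr = postinc_byte_operand(regs, dst_reg)  # sees the updated register
    mem[dst_addr] = (mem[dst_addr] + mem[src_addr]) & 0xFF
    return src_addr, dst_addr


def run_demo(iterations=3):
    """Run ADDB R0+, R0+ a few times over a small test memory."""
    mem = bytearray([10, 1, 20, 2, 30, 3, 0, 0])
    regs = [0] * 8
    pairs = [addb_postinc_postinc(mem, regs, 0, 0) for _ in range(iterations)]
    return pairs, list(mem), regs[0]


if __name__ == "__main__":
    pairs, mem, r0 = run_demo()
    print("address pairs:", pairs)
    print("memory after:", mem)
    print("R0 after:", r0)
```

With this ordering, each pass adds the byte at an even address into the byte at the following odd address and leaves R0 advanced by two.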

Curious to get some feedback on this!

Chuck(G) · Jun 21, 2017

Well, you asked.

I don't know why 2-address (or even 1-address) machines are popular. I like three-address (like ARM, CDC 6000, PA-RISC, Cray, i860, etc.). So you have, for example, ADD reg1+reg2->reg3. Further, there's no need for condition codes in all of this; just branch on the contents of a register, (zero, nonzero, positive or negative); since you're using a 3-address architecture, there's no need for "compare" instructions. Your ISA should include provision for immediate operands, even if nothing other than "load immediate". Addressing modes are okay, but really you need little more than register-indirect. A "CALL" that deposits the current P-counter into a register before branching is nice, but you could also make the P-counter a predefined register, as well as making a selected register hardwired to zero. Be generous with the register file. Let the user determine how a stack is to be implemented.
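
As a toy illustration of that style, here is a Python sketch of a three-address machine with no condition codes, a load-immediate, a branch that tests a register directly, and register 0 hardwired to zero. The opcode names and the tuple encoding are invented for this example; they are not taken from any of the real machines mentioned above.

```python
# Toy three-address machine: no flags, no compare instruction. Each
# instruction is a tuple (op, a, b, c). Opcode names are hypothetical.

def run(program, num_regs=8, max_steps=1000):
    """Execute the program and return the final register file."""
    regs = [0] * num_regs
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, a, b, c = program[pc]
        pc += 1
        steps += 1
        if op == "LI":        # regs[a] = immediate value b
            regs[a] = b & 0xFFFF
        elif op == "ADD":     # regs[a] = regs[b] + regs[c]
            regs[a] = (regs[b] + regs[c]) & 0xFFFF
        elif op == "SUB":     # regs[a] = regs[b] - regs[c]
            regs[a] = (regs[b] - regs[c]) & 0xFFFF
        elif op == "BNZ":     # branch to instruction b if regs[a] is nonzero
            if regs[a] != 0:
                pc = b
        else:
            raise ValueError(f"unknown opcode {op!r}")
        regs[0] = 0           # register 0 always reads as zero
    return regs


# Sum 5 + 4 + 3 + 2 + 1 using only a register test for loop control.
SUM_TO_FIVE = [
    ("LI", 1, 5, 0),    # r1 = loop counter
    ("ADD", 2, 0, 0),   # r2 = 0 + 0, clears the accumulator via r0
    ("LI", 3, 1, 0),    # r3 = constant 1
    ("ADD", 2, 2, 1),   # r2 = r2 + r1
    ("SUB", 1, 1, 3),   # r1 = r1 - 1
    ("BNZ", 1, 3, 0),   # loop back to the ADD while r1 != 0
]

if __name__ == "__main__":
    print(run(SUM_TO_FIVE))
```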

If this looks like something designed by Seymour Cray, well, it should. He really was the genius of modern CPU ISAs.

After all, the Propeller was inspired by one of Seymour's designs.

KC9UDX · Jun 21, 2017

Silly semantics, and others will probably disagree. But as a beginner, I found the non-orthogonality of LOAD, STORE, TRANSFER more user-friendly than MOVE.

Chuck(G) · Jun 22, 2017

My big quibble with x86 and Z80 assembly mnemonics is that MOV or LD covers too much territory. Especially Z80 LD to memory, whattheheck--it was called "store" for decades before the Z80. With x86 assembly, the mnemonics are non-isomorphic. It's not unusual to have DEBUG assemble the same syntax differently from, say, MASM--and short of explicitly coding the bytes, there's no way to tell which one you want.

commodorejohn · Jun 22, 2017

I asked for opinions, and opinions did I get! :lol:

While I think there's good arguments to be made for the merits of three-address machines, there's two primary reasons why I don't want to go that route here: simplicity, and instruction size. The Propeller only has 32KB of hub RAM (where the emulated machine will be running,) and three-address architectures really need a 32-bit instruction word, especially if you want a large register file. Additionally, the goal here isn't to have the most powerful/capable design that can be done on the Propeller, but rather a design that's modestly capable and doesn't impose many arbitrary limits, but is also simple enough that it would make for a nice learning environment for people who want to play around with learning to program in assembly language.
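
The instruction-size point can be put as simple bit arithmetic. The Python sketch below counts the bits left over for the opcode once the operand fields are paid for. The 6-bit operand (3-bit mode plus 3-bit register) comes from the formats posted earlier; treating a "large" register file as 32 registers (5 bits per operand) is an assumption for illustration.

```python
# Sketch: bits left for the opcode after the operand fields are allocated.
# A negative result means the operands alone do not fit in the word.

def opcode_bits_left(word_bits, operands, bits_per_operand):
    """Return word_bits minus the total width of the operand fields."""
    return word_bits - operands * bits_per_operand


if __name__ == "__main__":
    cases = [
        ("two operands, mode + register each", 16, 2, 6),
        ("three operands, mode + register each", 16, 3, 6),
        ("three register-only operands, 32 registers", 16, 3, 5),
    ]
    for label, word_bits, operands, width in cases:
        left = opcode_bits_left(word_bits, operands, width)
        print(f"{label}: {left} bits left in a {word_bits}-bit word")
```

In a 16-bit word the two-operand format leaves 4 bits (the byte flag plus a 3-bit opcode), three mode-plus-register operands overflow the word by 2 bits, and three 5-bit register fields leave a single bit for the opcode.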

Chuck(G) · Jun 22, 2017

If it's an educational tool, why even use binary for your number system? Thousands learned their computer skills on 1401s and 1620s. As a matter of fact, many started out by coding machine language on those machines; no assembler needed.

If you want simplicity, consider a one-address machine. Use, say, the CDC 160A as a model. Simple, straightforward, with most instructions only 12 bits in length. Or the PDP-8--or any of a host of other machines.

Or a no-register machine--after all, registers are architecturally nothing more than shortened memory addresses. There have been many registerless machines.

As far as 3-address machines having inefficient coding, most of the CDC 6000's instructions were 15 bits (6 opcode+9 register specifiers). Only immediate and jump instructions were 30 bits.

Or, if you want utter simplicity, consider a move machine.

Memory should not be a concern. Your Propeller will probably need a way to save work. An SD card might be the simplest, given that SPI would be perfectly adequate and can even be bit-banged. No reason that you can't execute programs out of that, rather than main memory.

But I despise the use of condition codes. I didn't like them on any of the IBM systems and I still don't like them. You spend endless hours explaining which instructions affect which flags--for large architectures, it's a pain in the posterior when it comes to out-of-order scheduling, as each condition has to be treated as a result. And then you get "bugs" like the Z80 OTDR instruction that shouldn't affect the carry flag, but does.

A pet peeve is that we still don't teach vector architectures at an early time, yet it's clear that many future systems will need that facility. A student who learns APL at an early time is leaps and bounds ahead of those who do not. (And the MCM/70 implemented it on an 8008--your Propeller is a couple of orders of magnitude faster, with more memory).

commodorejohn · Jun 22, 2017

I think our differences of opinion on ISA design may be slightly irreconcilable :lol:

Chuck(G) · Jun 22, 2017

I speak only from experience, having seen the good and bad over nearly 50 years.

Charles Anthony · Jun 23, 2017

It seems to me that if you are going to implement an "almost PDP-11" that it would be worthwhile to actually implement one of the simpler PDP-11 models. This buys you a lot of existing software -- assemblers, compilers, debuggers, OSes.

Even if you skipped some of the hard bits needed for full support of the software, the ability to cross-compile or cross-assemble could be a plus.

-- Charles

tingo · Jun 25, 2017

I don't have any input on the ISA or the machine / CPU choices. But - have you looked at the existing emulator designs for the Propeller?
Like the DracBlade: http://www.smarthome.jigsy.com/propeller
This, and others, use external memory to overcome the HUB RAM limit of the Propeller. Sure, external memory brings other challenges.

commodorejohn · Jun 25, 2017

Charles Anthony said:
It seems to me that if you are going to implement an "almost PDP-11" that it would be worthwhile to actually implement one of the simpler PDP-11 models. This buys you a lot of existing software -- assemblers, compilers, debuggers, OSes.

Yeah, there's a lot to be said for this perspective. My reasoning on not doing this comes down to a few things:

Subtle incompatibilities between PDP-11 models (like the tricky issues of when exactly register auto-increment/decrement happens) make it hard to pick one "right" model to emulate.
I like having the freedom to change things about the architecture that I don't care for or feel aren't necessary for the project (for example, changing the branch instructions to use 9-bit offsets means that every instruction can be readably expressed in octal, which I think would be useful for teaching people about the relationship between instructions in assembly language and the actual opcode values that are produced as a result.)
Having the liberty to change things at will also makes it easier to meet the goal of having the emulator run entirely in cog RAM.
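
As a quick illustration of the octal-readability point, here is a Python sketch that packs the fields of a two-address instruction and of a branch into a 16-bit word and prints it as six octal digits. The encoder names are made up, the mode values are treated as opaque 3-bit numbers (which number means which addressing mode is not settled here), and the branch offset is handled as a raw 9-bit field with no assumption about sign or units.

```python
# Sketch: every field is a whole number of octal digits, so the printed
# opcode can be read digit by digit. Encoder names are illustrative.

def encode_two_address(opcode, dst_mode, dst_reg, src_mode, src_reg, byte=False):
    """Pack a two-address instruction: [byte flag][opcode][m][d][m][s]."""
    fields = (opcode, dst_mode, dst_reg, src_mode, src_reg)
    if any(not 0 <= f <= 7 for f in fields):
        raise ValueError("each field must fit in one octal digit")
    if opcode == 0:
        raise ValueError("opcode 0 belongs to the other instruction groups")
    word = 1 if byte else 0
    for f in fields:
        word = (word << 3) | f
    return word


def encode_branch(condition, offset9):
    """Pack a branch: octal 10, one condition digit, three offset digits."""
    if not 1 <= condition <= 7:
        raise ValueError("condition must be 1..7 (0 is DBZ)")
    if not 0 <= offset9 <= 0o777:
        raise ValueError("offset must fit in 9 bits")
    return (0o10 << 12) | (condition << 9) | offset9


if __name__ == "__main__":
    # Opcode 7 is MOV; destination mode 0 / register 2, source mode 0 / register 1.
    print(f"{encode_two_address(7, 0, 2, 0, 1):06o}")
    # The same fields with the byte flag set (MOVB).
    print(f"{encode_two_address(7, 0, 2, 0, 1, byte=True):06o}")
    # Condition 3 is BNE; the raw offset field is octal 012.
    print(f"{encode_branch(3, 0o012):06o}")
```

An 8-bit offset would cover two full octal digits plus two bits of a third, so that digit would be shared with the opcode bits, which is the readability problem the 9-bit offset avoids.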

tingo said:
This, and others, use external memory to overcome the HUB RAM limit of the Propeller. Sure, external memory brings other challenges.

There is that, but as you said, the other challenges kinda put me off the prospect. (In particular, I'm not clear on what kind of performance these projects manage out of external memory access.) But we'll see where development takes me.