TMS-9900 question

commodorejohn · May 25, 2011

So I've got a couple of these CPUs that I've scavenged from broken TI-99/4As, and I've kind of been toying with the idea of building a little homebrew computer around them. My question is, is there any reason at all to use the CRU I/O bus? I've been looking at the manual for the CPU, and the CRU seems like the most needlessly complicated way to access peripherals that I've ever seen. Is there a compelling reason not to just ignore it and settle for memory-mapped I/O?

Chuck(G) · May 26, 2011

It depends. The CRU is sort of a cool bit-serial bus, but without the right peripheral support, it's pretty useless.
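For anyone weighing the CRU against memory-mapped I/O in an emulator or FPGA design, here is a rough Python sketch of the CRU's bit-addressing model. It assumes a 4096-bit CRU space and a software base address taken from R12 bits 3-14 (R12 shifted right by one). It also assumes a signed 8-bit displacement for the single-bit instructions (SBO, SBZ, TB) and LSB-first multi-bit transfers for LDCR/STCR, where a count of 0 means 16. The class and method names are made up for illustration, and the sketch ignores how the CPU fetches a byte versus a word operand for short transfers.

```python
CRU_BITS = 4096  # assumption: 12-bit CRU bit address


class CRU:
    """Hypothetical emulator-side model of the CRU bit space."""

    def __init__(self):
        self.outputs = [0] * CRU_BITS  # bits driven by SBO/SBZ/LDCR (CRUOUT side)
        self.inputs = [0] * CRU_BITS   # bits sampled by TB/STCR (CRUIN side, set by hardware)

    @staticmethod
    def base(r12):
        # Software base address: R12 bits 3-14 in TI numbering (bit 0 = MSB).
        return (r12 >> 1) & (CRU_BITS - 1)

    def _addr(self, r12, disp):
        # disp is the signed 8-bit displacement from the instruction word.
        return (self.base(r12) + disp) & (CRU_BITS - 1)

    def sbo(self, r12, disp):
        self.outputs[self._addr(r12, disp)] = 1

    def sbz(self, r12, disp):
        self.outputs[self._addr(r12, disp)] = 0

    def tb(self, r12, disp):
        return self.inputs[self._addr(r12, disp)]

    def ldcr(self, r12, value, count):
        count = count or 16  # a count field of 0 means 16 bits
        for i in range(count):  # least significant bit goes out first
            self.outputs[self._addr(r12, i)] = (value >> i) & 1

    def stcr(self, r12, count):
        count = count or 16
        value = 0
        for i in range(count):  # first bit read lands in the least significant position
            value |= self.inputs[self._addr(r12, i)] << i
        return value
```

The separate `outputs` and `inputs` arrays mirror the chip's separate CRUOUT and CRUIN lines: a bit set with SBO is not necessarily what TB reads back at the same address.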

You have to understand that the TMS9900 was derived from TI's 990 minicomputers. All but a couple of models of the 990 also used the TILine bus--a nice parallel bus similar to DEC's Unibus. However, the TMS9900 doesn't implement TILine.

So, yeah, go ahead and use memory-mapped I/O unless there's some compelling reason to do otherwise.

commodorejohn · May 26, 2011

That's what I figured - it looked like some kind of legacy holdover. Was it included on the TI-990 for compatibility with something older, or was it just a weird design decision to have two I/O buses, one of which was far more complicated?

Chuck(G) · May 26, 2011

commodorejohn said:
That's what I figured - it looked like some kind of legacy holdover. Was it included on the TI-990 for compatibility with something older, or was it just a weird design decision to have two I/O buses, one of which was far more complicated?

Ah, good thing you asked--I'd almost forgotten about it. Yes, the TI 960 was an early 1970s mini that TI called its "bit-pusher", and TI played up the CRU as the perfect process-control interface. Note that the 960 instruction set was completely different from the 990's. The 980 was the general-purpose mini marketed by TI before the 990 and did not have a CRU.

commodorejohn · May 26, 2011

Ah, interesting. Process control would explain a lot about the "large numbers of individually-addressable bits" design (though still...yikes.)

matthew180 · Jun 29, 2011

The CRU design is definitely convoluted, but there *are* some instructions dedicated to using it. But like Chuck(G) said, it really depends on your design.

To me it seems like the 9900 was the test-bed for a lot of ideas about CPU design, and unfortunately the designers decided to try them all out in a single CPU. The CRU I/O, no stack pointer, having the general purpose registers stored in external RAM (allegedly to speed up context switching), read-before-write on all memory operations, etc.

Also, the read-before-write makes memory-mapped I/O more complicated, since every write to a port is preceded by a read of the same address.
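To make the read-before-write problem concrete, here is a hypothetical Python model of a memory-mapped receive FIFO whose read cycle pops a byte. Assuming a byte write goes out as read word, merge byte, write word, a single MOVB-style write to the port silently eats a received byte. `FifoPort` and `movb_to_port` are illustrative names, not anything from TI documentation.

```python
from collections import deque


class FifoPort:
    """Hypothetical memory-mapped port: each read cycle pops a received byte."""

    def __init__(self, received):
        self.rx = deque(received)  # bytes waiting for the CPU
        self.tx = []               # bytes the CPU has written

    def read_word(self):
        # Side effect on read: the next received byte is consumed (even/high byte lane).
        return (self.rx.popleft() << 8) if self.rx else 0xFF00

    def write_word(self, word):
        self.tx.append((word >> 8) & 0xFF)


def movb_to_port(port, byte):
    """A byte write done as read word, merge byte, write word."""
    word = port.read_word()                        # the read-before-write cycle
    word = (word & 0x00FF) | ((byte & 0xFF) << 8)  # merge into the even (high) byte
    port.write_word(word)


port = FifoPort([0x41, 0x42])
movb_to_port(port, 0x55)
```

After the write, `port.rx` holds only 0x42: the 0x41 that arrived earlier is gone even though the program never read the port. Decoding reads and writes of such a port at separate addresses is one way around it.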

I have thought a lot about making a home-brew around the 9900 too, since I really like the assembly language on the CPU, but they really messed up the hardware implementation. But we can fix that now.

With an FPGA you can implement the 9900, correct those little problems, and run it faster (100MHz 9900, anyone?). I've actually started work on a 9900 core (it seems no one has done that yet), so we'll see how it goes.

If you do make a system around the 9900, please make all the RAM 16-bit and zero wait-state. That is probably the single biggest kludge of the 99/4A.

Chuck(G) · Jun 29, 2011

"No stack pointer" is perhaps an oversimplification. The 990/9900's idea of a workspace pointer allows one to form a linked list of workspaces (each new workspace's R13 holds the caller's workspace pointer), creating a stack. It's quite efficient in concept, but not so good in execution, as register references are also (slow) memory references.
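The workspace chaining can be sketched in a few lines of Python. This follows the BLWP/RTWP behavior as I understand it from the data manual: BLWP loads a new WP and PC from a two-word vector, then saves the old WP, PC and ST in R13, R14 and R15 of the new workspace, and RTWP reloads all three from there. The class name is hypothetical, and `pc` is assumed to already point past the BLWP instruction.

```python
class TMS9900Context:
    """Minimal sketch: workspace registers live in RAM at WP + 2*n."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.wp = self.pc = self.st = 0

    def read(self, addr):
        addr &= 0xFFFE  # word accesses ignore the low address bit
        return (self.mem[addr] << 8) | self.mem[addr + 1]  # big-endian

    def write(self, addr, value):
        addr &= 0xFFFE
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def reg(self, n):
        return self.read(self.wp + 2 * n)

    def blwp(self, vector):
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = self.read(vector)
        self.pc = self.read(vector + 2)
        # The old context is saved in the *new* workspace, forming the chain.
        self.write(self.wp + 2 * 13, old_wp)
        self.write(self.wp + 2 * 14, old_pc)
        self.write(self.wp + 2 * 15, old_st)

    def rtwp(self):
        # Read R15 and R14 before R13 replaces the workspace pointer.
        self.st = self.reg(15)
        self.pc = self.reg(14)
        self.wp = self.reg(13)


ctx = TMS9900Context()
ctx.wp, ctx.pc, ctx.st = 0x8300, 0x6000, 0x0002
ctx.write(0xA000, 0x8320)  # vector word 1: new WP
ctx.write(0xA002, 0x7000)  # vector word 2: new PC
ctx.blwp(0xA000)
```

Nothing is copied but three words: a "context switch" is mostly a change of `wp`, which is the appeal, while every `reg()` access is still a RAM access, which is the cost.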

When the 990 came out, there were plenty of systems without a hardware stack, and a look at the architectures in use at the time will give you an idea of the thinking.

Consider, for example, the IBM S/360. It had no stack, and its subroutine linkage convention was very similar to that of the 990/9900. A routine is "called" via a branch and link (BAL); the called routine stores (STM) the caller's registers in the caller's save area and reloads them (LM) from that save area before returning. The 990 skips the separate store and load and simply uses a pointer to a new workspace that's linked to the workspace of the calling routine. On the CDC STAR-100, we had 256 64-bit registers and a SWAP instruction that loaded the registers from one location while storing them to a different location (very efficient on a core-based machine, not so much on semiconductor RAM).

You can see how the 9900 instruction set has evolved by inspecting the TI MSP430 microcontroller family. Gone are the workspace pointer and the CRU, replaced by a traditional stack architecture, but the instruction set will otherwise look very familiar.

TI's got some very inexpensive dev kits for the MSP430 if you're curious about the microcontroller.

matthew180 · Jun 29, 2011

Yeah, I saw the MSP430 and I even have one of the devkits (unopened as of yet). I chuckled when I saw the instruction set and how similar to the 9900 it is. Makes me wonder whether the MSP430 designers worked on the 9900 series. I need to crack open my devkit; should be fun to mess with.

Chuck(G) · Jun 29, 2011

Old architectures never die; they just mutate.

The National PACE was a single-chip implementation of the IMP-16 multi-chip processor, which was modeled on the Data General NOVA architecture. The GI CP1600 was modeled on the PDP-11. Both made fatal errors, however. PACE required a bunch of fairly exotic support chips and a 4-phase clock, and was very, very slow. The CP1600 squandered code space by using only 10 bits of a 16-bit instruction word--and, again, it was slow. The Fairchild 9440 was a fairly complete implementation of a DG MicroNOVA and was also very slow.

PIC µCs still use a design based on a 1975 GI support chip for the CP1600, except for the PIC32, which uses the MIPS32 instruction set (MIPS isn't exactly a spring chicken, either).

...and of course, the x86 chips of today are descended from the lowly 8008, and you can still recognize some of its features and instructions.

Old designs don't, it seems, die or even fade away...

commodorejohn · Jun 29, 2011

matthew180 said:
To me it seems like the 9900 was the test-bed for a lot of ideas about CPU design, and unfortunately the designers decided to try them all out in a single CPU. The CRU I/O, no stack pointer, having the general purpose registers stored in external RAM (allegedly to speed up context switching), read-before-write on all memory operations, etc.

Yeah, it's definitely an interesting design. The stackless approach isn't too bad - the link register idea is used in a number of different architectures, and while it's moderately annoying having to save it yourself when making nested subroutine calls, it does actually save you a stack push if a called subroutine isn't going to call anything else. (That's not quite such an advantage given that registers are in memory anyway, but oh well.)
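A small Python sketch of the point about BL: the return address goes into R11 instead of onto a stack, but because R11 itself lives in RAM, even a leaf call costs one memory write, and a routine that calls further down has to spill R11 somewhere first. Using R10 as a software stack pointer (DECT R10 / MOV R11,*R10) is an assumed convention here, and the class is hypothetical.

```python
class BLSketch:
    """Hypothetical model of BL linkage; workspace registers are words in RAM at WP + 2*n."""

    def __init__(self, wp=0x8300, stack_top=0x8400):
        self.wp = wp
        self.ram = {self.wp + 2 * 10: stack_top}  # assumption: R10 holds a software stack pointer
        self.ram_writes = 0                       # memory write cycles, register writes included

    def _store(self, addr, value):
        self.ram[addr & 0xFFFF] = value & 0xFFFF
        self.ram_writes += 1

    def reg(self, n):
        return self.ram.get(self.wp + 2 * n, 0)

    def bl(self, return_addr):
        # BL: the return address goes into R11 (still a RAM write, since R11 is in RAM).
        self._store(self.wp + 2 * 11, return_addr)

    def save_r11(self):
        # What a non-leaf routine does before its own BL: DECT R10 / MOV R11,*R10.
        sp = (self.reg(10) - 2) & 0xFFFF
        self._store(self.wp + 2 * 10, sp)
        self._store(sp, self.reg(11))


leaf = BLSketch()
leaf.bl(0x6010)          # leaf call: one write, to R11

nested = BLSketch()
nested.bl(0x6010)        # outer call
nested.save_r11()        # spill R11 before calling deeper
nested.bl(0x7020)        # inner call overwrites R11
```

The leaf call costs one RAM write, about what a stack push would have cost; the nested case adds two more for the spill.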

The registers-in-memory thing is kind of weird - definitely an interesting idea with some potentially useful applications (fast task switching, for instance), but then again, it kinda bottlenecks things...huh.

matthew180 said:
If you do make a system around the 9900, please make all the RAM 16-bit and zero wait-state. That is probably the single biggest kludge of the 99/4A.

Oh, definitely. Half the reason I'm considering this is I've got a couple of 32Kx8 FRAM chips I got as samples that I figure I can put to use - all I've got to do is double them up, and presto! 16-bit nonvolatile main memory. (And way more than fast enough for zero-wait-state, to boot - it might even be possible to run two CPUs off of it, but let's see if I can even get one working first ;D)
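Pairing two 8-bit parts into a 16-bit memory can be sketched like this, assuming the usual 9900 conventions: the CPU drives only A0-A14 (a word address), and the even byte is the most significant one, on D0-D7. Two 32Kx8 parts side by side then cover the full 64KB address space. The class name and the `high`/`low` split are illustrative, not a specific schematic.

```python
CHIP_SIZE = 32 * 1024  # one 32K x 8 part


class PairedByteMemory:
    """Two 32Kx8 parts side by side as one 32K x 16 memory (hypothetical wiring)."""

    def __init__(self):
        self.high = bytearray(CHIP_SIZE)  # on D0-D7: even (most significant) bytes
        self.low = bytearray(CHIP_SIZE)   # on D8-D15: odd bytes

    @staticmethod
    def word_index(addr):
        # A0-A14 carry the byte address with its low bit dropped.
        return (addr & 0xFFFF) >> 1

    def read_word(self, addr):
        i = self.word_index(addr)
        return (self.high[i] << 8) | self.low[i]

    def write_word(self, addr, value):
        i = self.word_index(addr)
        self.high[i] = (value >> 8) & 0xFF
        self.low[i] = value & 0xFF

    def read_byte(self, addr):
        # The CPU reads the whole word and picks the byte internally.
        word = self.read_word(addr)
        return word >> 8 if addr % 2 == 0 else word & 0xFF


m = PairedByteMemory()
m.write_word(0x1234, 0xBEEF)
```

Both parts see the same 15 address bits and are always enabled together, so every bus cycle is a full 16-bit access with no byte steering needed on the memory side.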

matthew180 · Jun 29, 2011

Chuck(G): you have quite the knowledge base stored away there!

Speaking of DG, one of my favorite books recently was "The Soul of a New Machine", which chronicled the building of Data General's Eclipse MV/8000, the company's first 32-bit mini.

It's funny (or not so much) how much influence the "old" systems and designs have on modern systems. And everyone thinks we are *so* advanced from the good-old-days... ha.

I never programmed an 8008, but I did learn 8088/8086 assembly after the 9900, and you can definitely see the remnants of those processors in modern x86 CPUs. I can't decide whether that is a good thing or depressing.

commodorejohn: if you have not picked a video subsystem yet, you may be interested in my little project, the F18A (http://codehackcreate.com/archives/30), which is a pin-compatible replacement for the 9918A VDP with VGA output. I'm almost done with board revisions (I hope) and it should be ready this summer (again, I hope).

commodorejohn · Jun 14, 2012

Hey, coming back to this thread, I've been trying to build a TMS-9900 emulator so's I can test some ideas for this project, and I've been trying to figure out the read-before-write behavior. I can understand why it's done on byte writes, but I can't figure out whether it does that on word writes (which seems wasteful and pointless) or only on byte writes. I've even seen someone state that it applies to word writes only on instructions that have a byte variant. I've seen claims made for each behavior online, and I can't find anything about it in the manual. Does anyone know which behavior is actually implemented in the CPU?

billdeg · Jun 14, 2012

Not sure if this will give clues for your project. I sold this off years ago after losing interest. It's a four-board homebrew system based around the 9900:

http://vintagecomputer.net/ti/TI-990-101/

Bill

geoffm3 · Jun 19, 2012

Wasn't the CRU used for the GROM support in the cartridge software?

marc.hull · Jun 21, 2012

commodorejohn said:
Hey, coming back to this thread, I've been trying to build a TMS-9900 emulator so's I can test some ideas for this project, and I've been trying to figure out the read-before-write behavior. I can understand why it's done on byte writes, but I can't figure out whether it does that on word writes (which seems wasteful and pointless) or only on byte writes. I've even seen someone state that it applies to word writes only on instructions that have a byte variant. I've seen claims made for each behavior online, and I can't find anything about it in the manual. Does anyone know which behavior is actually implemented in the CPU?

I am fairly certain that the read-before-write occurs on every write access, regardless of whether or not the instruction has a byte variant. It is fairly invisible, as it is performed by the CPU and requires no other support.

The CRU is a pretty neat design. It allows a programmer to easily control hardware without sacrificing memory space; it is more or less a bus under the normal bus. It's probably a PITA to implement, though, as you need support ICs (the 9901).

Memory mapping is an easy strategy to implement on the 9900. If you keep your ports on word boundaries and access them through the even byte, then you don't risk hitting a second port accidentally.
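A rough sketch of why keeping one port per word helps, assuming every 9900 byte write is a full-word read followed by a full-word write. If two byte-wide ports share a word, writing one of them also strobes the other with whatever the read returned. For a write-only latch that is probably bus garbage, modeled here as 0xFF. The read cycle itself still happens either way. `Port` and `movb` are hypothetical names.

```python
class Port:
    """Hypothetical 8-bit output latch that counts the write strobes it receives."""

    def __init__(self, readable=True):
        self.value = 0
        self.readable = readable
        self.strobes = 0

    def read(self):
        # Assumption: a write-only latch doesn't drive the bus, which floats to 0xFF.
        return self.value if self.readable else 0xFF

    def write(self, byte):
        self.value = byte & 0xFF
        self.strobes += 1


def movb(even, odd, addr, byte):
    """Byte write as word read, merge, word write; None marks an unused byte lane."""
    hi = even.read() if even is not None else 0xFF
    lo = odd.read() if odd is not None else 0xFF
    if addr % 2 == 0:
        hi = byte
    else:
        lo = byte
    # The write cycle covers both lanes, so the *other* port is rewritten too.
    if even is not None:
        even.write(hi)
    if odd is not None:
        odd.write(lo)


data, ctrl = Port(), Port(readable=False)  # two ports sharing one word
ctrl.write(0x3C)
movb(data, ctrl, 0x8000, 0x12)             # clobbers ctrl

solo = Port()                              # one port per word, odd lane unused
movb(solo, None, 0x8000, 0x12)
```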

How far are you on your implementation?

commodorejohn · Jun 21, 2012

Oh, it's wholly theoretical at this point; I haven't even desoldered the CPUs from the junk boards yet. The emulator is coming along well in terms of basic coding, but we'll see how long it takes to get into any kind of usable, non-buggy state.

geoffm3 · Jun 21, 2012

commodorejohn said:
Oh, it's wholly theoretical at this point; I haven't even desoldered the CPUs from the junk boards yet. The emulator is coming along well in terms of basic coding, but we'll see how long it takes to get into any kind of usable, non-buggy state.

Are you starting from scratch, or using bits and pieces from MESS or one of the TI emulators?

commodorejohn · Jun 21, 2012

Starting from scratch. The whole reason I'm doing it is that interfacing to any of the existing ones seems like more trouble than it's worth. (The homebrew TI emulators are less so than MESS, which is a nightmare of intertwined macros from countless different files, but they still all seem to be designed to fit their own particular project, not for general-purpose use.)

commodorejohn · Jun 21, 2012

Okay, I think I've figured it out; the TMS-9900 data manual lists the number of memory accesses alongside the instruction times, and it's not too difficult to work out the answer from there. MOV and MOVB both have four memory accesses (instruction fetch, source fetch, destination fetch, destination write, I theorize), while LI, which has no byte variant, has three (instruction fetch, immediate fetch, destination write). (Other immediate instructions have four, but they have to read the register they operate on anyway.) Most other instructions seem to fit the pattern for the "read-before-write only on instructions with a byte variant" theory, so I'm going to go out on a limb and say that's the correct one.
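The counting argument above can be written down as a tiny Python model. It encodes the "read-before-write only on instructions with a byte variant" hypothesis and reproduces the register-operand figures quoted in the post (MOV/MOVB 4, LI 3, AI 4). It is a restatement of the theory, not confirmed chip behavior, and the table only covers the instructions mentioned.

```python
# (has_byte_variant, operation_needs_old_destination) for the instructions discussed above
INSTRUCTIONS = {
    "MOV":  (True,  False),
    "MOVB": (True,  False),
    "LI":   (False, False),
    "AI":   (False, True),
}


def predicted_accesses(mnemonic):
    """Memory accesses for register operands under the byte-variant read-before-write theory."""
    has_byte_variant, needs_old_dest = INSTRUCTIONS[mnemonic]
    count = 1   # instruction fetch
    count += 1  # source operand read, or the immediate word for LI/AI
    if needs_old_dest or has_byte_variant:
        count += 1  # destination read: needed by the operation, or a read-before-write
    count += 1  # destination write
    return count


counts = {m: predicted_accesses(m) for m in INSTRUCTIONS}
```

Single-operand instructions without a byte variant that don't need the old value, such as CLR and SETO, would be a natural next check of this rule against the manual's access counts.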

commodorejohn · Jun 25, 2012

Okay, next question: RESET has its own separate line on the chip with immediate response (i.e. it doesn't finish out the current instruction), but it's also interrupt #0. The data manual calls interrupt #1 "the highest external-priority interrupt" (they number everything backwards), but the only criterion it gives for interrupt acknowledgement is that the interrupt code being signaled to the CPU is less than or equal to the value of the status-register interrupt mask. That is satisfied when the interrupt code is 0, no matter what the interrupt mask is, so I have to wonder whether it's possible to signal RESET via the normal interrupt lines. You'd lose the immediate acknowledgement, and setting the mask to the level minus one would wrap it to 15 (masking no interrupts), but the manual doesn't seem to say anywhere that it wouldn't work...
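Here is the acceptance rule as read from the data manual, as a Python sketch. A request at `level` is taken when `level <= mask`, and the mask then drops to `level - 1`, wrapping to 15 for level 0 per the reading above. Each level's two-word vector sits at address 4 * level. Whether the real chip accepts level 0 on the interrupt lines is exactly the open question, so this only models the hypothesis. The function names are made up.

```python
def accept_interrupt(level, mask):
    """Return the new interrupt mask if a request at `level` is taken, or None if it is masked."""
    if level > mask:
        return None            # only levels numerically <= the mask are acknowledged
    return (level - 1) & 0xF   # mask becomes level - 1; level 0 wraps to 15


def vector_address(level):
    # Each level's vector is two words (new WP, then new PC) starting at 4 * level.
    return 4 * level
```

For example, taking a level-2 request with the mask at 5 leaves the mask at 1, so only levels 0 and 1 can interrupt the handler; a level-0 request under this model would leave nothing masked.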
