
Why did IBM create CGA? What user was their target?

The early PC is NUMA. In CPU-to-CGA interaction you have two execution units and two memory localities on a shared bus.
 
I am not an expert on UMA, but doesn't it require symmetrical access to memory resources without arbitrating in favor of one processor over another?
So shared memory is not, by default then, UMA?
By that definition, UMA is quite rare. Even with dual-ported RAM, if a device on one port tries to read a row that a device on the other port wants to write, one of them is going to be blocked for a short period of time. 6502 systems that use memory significantly faster than the CPU and alternate accesses between ϕ0 and ϕ1 would count as UMA, but what Eudimorphodon was talking about in his post above would not be "more UMA" (or UMA at all) because it still has the same issue as standard CGA: if both the graphics circuitry and the CPU try to access the same address at the same time, one will be slowed or fail to read. (The latter case is what produces "snow" on the display.)
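The ϕ0/ϕ1 interleave versus the contended-bus case can be sketched with a toy stall counter. This is purely illustrative, not cycle-accurate, and the function names are invented for this sketch:

```python
# Toy model of the two sharing schemes discussed above (illustrative only,
# not cycle-accurate). In a 6502-style interleave the video circuitry uses
# phase 0 of every cycle and the CPU uses phase 1, so nobody ever waits.
# On a contended bus like CGA's, simultaneous requests collide and one side
# is stalled (or, in the 80-column case, a read is corrupted: "snow").

def interleaved_stalls(cpu_requests, video_requests):
    """ph0/ph1 interleave: both sides are always served, zero stalls."""
    return 0  # zero by construction -- the two phases never overlap

def contended_stalls(cpu_requests, video_requests):
    """Shared bus: count cycles where both sides want memory at once."""
    return sum(1 for c, v in zip(cpu_requests, video_requests) if c and v)

video = [True] * 8                       # video fetches every cycle of the line
cpu   = [i % 2 == 0 for i in range(8)]   # CPU wants memory every other cycle

print(interleaved_stalls(cpu, video))    # 0
print(contended_stalls(cpu, video))      # 4
```

The interleave wins not because it has more bandwidth, but because the two parties are never asking for the bus at the same instant.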

I was using "UMA" in the sense that Eudimorphodon seems to have meant it: the CPU has direct access to the frame buffer RAM.

UMA = Unified Memory Architecture, also called "shared memory", where the CPU and the graphics frame buffer are in the same RAM chips. So the CPU and the graphics chip share the same memory for software, data, and the frame buffer, and access it alternately.
By this definition, CGA is UMA. You can put instructions in the CGA memory and run them directly from there.

NUMA = Non-Uniform Memory Access - it's not the opposite of UMA, but something completely different. It means: in a multiprocessor system each CPU has its own locally connected memory, but the other CPUs in the system can access it through the CPU to which that memory is connected.
I agree with this, but again, that makes CGA non-NUMA. The CPU does not have to go through the graphics processing hardware to access the frame buffer; it simply reads and writes it.

And a much earlier example (by more than two decades) of NUMA than the AMD Opteron is any system using the TMS9918 graphics chip. It had its own separate memory, and the CPU would have to access it by making a request to the chip to read a particular address and return the data, or write given data to a particular address.
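For illustration, here's a minimal Python model of that indirect-access scheme. The class and method names are invented for this sketch; on the real chip the two address bytes go to a single control port one after the other, with a flag bit in the second byte, and the address auto-increments after each data transfer:

```python
class TMS9918Model:
    """Minimal sketch of the TMS9918's indirect VRAM access.

    The CPU never sees VRAM in its own address space: it first latches a
    14-bit address into the chip (low byte, then high byte carrying a
    read/write flag in its top bits), then moves bytes one at a time
    through the data port, the address auto-incrementing as it goes.
    Names here are illustrative, not from any real API.
    """
    def __init__(self):
        self.vram = bytearray(16 * 1024)
        self.addr = 0

    def control_write(self, lo, hi):
        # hi & 0x40 set means "set up for write"; either way the address latches
        self.addr = ((hi & 0x3F) << 8) | lo

    def data_write(self, value):
        self.vram[self.addr] = value
        self.addr = (self.addr + 1) & 0x3FFF   # auto-increment, 14-bit wrap

    def data_read(self):
        value = self.vram[self.addr]
        self.addr = (self.addr + 1) & 0x3FFF
        return value

vdp = TMS9918Model()
vdp.control_write(0x00, 0x40 | 0x10)   # point the write address at 0x1000
for b in b"HELLO":
    vdp.data_write(b)                  # each byte is a separate port transaction
vdp.control_write(0x00, 0x10)          # point the read address back at 0x1000
print(bytes(vdp.data_read() for _ in range(5)))   # b'HELLO'
```

Every byte moved costs port transactions through the chip, which is exactly the "non-local memory" flavor of access being described.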
 
The CPU does not have to go through the graphics processing hardware to access the frame buffer; it simply reads and writes it.

NUMA does not specify the path constraint. NUMA is primarily associated with CPU-socket<->memory locality, but in the abstract sense it just means the CPU's memory I/O has different timing throughout the address space.

(NUMA) A memory architecture, used in multiprocessors, where the access time depends on the memory location. A processor can access its own local memory faster than non-local memory (memory which is local to another processor or shared between processors).

I think of this in very practical terms, using a succinct definition: if you write resource-intensive multithreaded software and expect it will just scale with a multi-socket upgrade, you will experience the same issue as if you were to just use unused VRAM as extra RAM.
 
Yup, the bus was dubbed ISA once PCs became prevalent and there were masses of clones out there, hence "industry standard".
Perhaps the first 'cloners' didn't want to use IBM's terminology - PC bus, AT bus - for potential legal issues?
Calling it ISA was a reaction to IBM's introduction of the Microchannel bus in the PS/2 line. Big Blue was trying to reset the definition of PC hardware to be IBM-centric, and the rest of the industry basically went "LOL, no". They declared the existing XT/AT bus to be "Industry Standard Architecture" and introduced EISA as a 32-bit extension to it.

 
If IBM had actually gone through with this plan to use a more "UMA" design for video memory we might have actually ended up with a 5150 that was more like the PCjr or Tandy 1000. There definitely would have been pros and cons to such an approach, but if IBM had done it correctly (by that I mean like how Tandy did with the 1000, where they strategically used 16 bit access for video with storage latches to cut down on the contention compared to the PCjr) then it might have been a bit more palatable to allow for 32K of video memory in 1981, since it could have done double-duty as system memory in lower spec configurations. But...



The Plantronics Colorplus is the best known example of a "Super-CGA" card for 16 color graphics that predates the PCjr and Tandy 1000, but it might not be the first. Note that it's not *compatible* with Tandy or PCjr, it implements the color planes differently so the memory layout doesn't match up.

Here's the thing, though: if you look at the Plantronics adapter you'll see it occupies *two* full length boards. CGA is actually a *pretty darn complicated* standard; if you look at the details it basically has three separate pixel generation chains, and the PCjr and the Tandy 1000 both rely on some custom ICs to get the parts count down. Custom ICs were more mainstream in 1984 than they were in 1980, and while you could probably make an argument that if IBM had really wanted to they could have commissioned some custom glue to make CGA fancier while still fitting the target price they were after, it would have front-loaded a lot more investment in what was kind of a skunkworks project... and that probably would have torpedoed the whole thing.

Judged by the standards of its time CGA was plenty good for what it cost, and its flexibility and low cost to implement (by the mid-80s they'd compressed it down to blob chips that basically just needed to be slapped on a card with a couple of RAM chips) kept it in the game right up until the end of the '80s. IBM deserves at least a silver medal for it. They might have gotten gold if, instead of those gross pre-set palettes they had for the 320x200 mode, they'd just slapped a 74LS670 addressable register chip on it and let you pick which four colors you wanted.
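To make that thought experiment concrete: a 74LS670 is a 4x4-bit register file, so it could hold four freely chosen 4-bit RGBI color numbers and translate each 2-bit pixel value on the fly. A toy sketch of that hypothetical palette follows; this is speculation about an alternate design, not a description of real CGA hardware:

```python
# Sketch of the programmable-palette idea above: a 74LS670 is a 4x4-bit
# register file, so it could map each 2-bit pixel value to any of the 16
# RGBI colors, instead of CGA's fixed 320x200 palettes.
# Hypothetical hardware -- the real CGA card has no such chip.

class PaletteRegisterFile:
    """Four 4-bit entries, independently writable and readable, like a '670."""
    def __init__(self):
        self.entries = [0, 0, 0, 0]

    def write(self, index, rgbi):
        self.entries[index & 3] = rgbi & 0xF   # store a 4-bit color number

    def lookup(self, pixel):
        return self.entries[pixel & 3]         # 2-bit pixel in, 4-bit color out

pal = PaletteRegisterFile()
# pick any four of the 16 RGBI colors, e.g. black, red, yellow, white
for i, color in enumerate([0x0, 0x4, 0xE, 0xF]):
    pal.write(i, color)

pixels = [0, 1, 2, 3, 1]
print([pal.lookup(p) for p in pixels])   # [0, 4, 14, 15, 4]
```

One extra chip and a few I/O decode lines buy you any four of sixteen colors instead of cyan/magenta/white forever.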

Understand that when the PC 5150 was being designed, DRAM was quite poopy, to put it mildly, and despite RAM still being quite mediocre in '84, things were exponentially better from a speed and consistency standpoint by that point. The additional video circuitry on the main board had also been simplified and commodified by then, cutting its cost.

In other words, UMA even in '81 would have driven a larger motherboard size, would have impacted speed, performance, complexity, and machine capabilities, and could, oddly, have increased system cost.

Having the CGA card separate, with what is in effect its own video RAM/bus, was the right decision for 1980 and opened up the architecture to new video standards. (The PCjr had a fixed spot for video memory in an awkward place; the openness was the 5150's saving grace, making it a platform and a family of tech rather than a bespoke footnote.)
 
The CGA card has extra memory for the frame buffer and does not use mainboard memory, so this is not UMA. NUMA is about multiprocessor systems where CPUs have local memory and every CPU can access the local memories of all the other CPUs, but more slowly than its own local memory. NUMA-aware systems need to know this, because if one processor were to run program code or use data stored in the local memory of another CPU, it would run slower than necessary. This has nothing to do with graphics cards. It does not make sense for me to explain UMA and NUMA again; everything I have to say on this topic has been said, and some of you don't understand.
 
My conjecture was that the design document that suggested having no memory on the main board meant that something similar to the System/23 would have been attempted. The System/23 had no memory on its main board but only on expansion cards. The video system was on the main board and shared the expansion card memory with the CPU. The hypothetical personal IBM PC would have had to give the unused portion of CGA memory over to the CPU because there wouldn't be any other RAM.

Getting it to work would have been complicated. Not worth the slight savings especially since extra RAM would have needed to be purchased to make the PC more useful than the cheaper competition. I doubt there was a marketer out there in 1981 that could move a $1,000 desktop computer with the programmable memory of the T/S-1000.
 
Understand that when the PC 5150 was being designed, DRAM was quite poopy, to put it mildly, and despite RAM still being quite mediocre in '84, things were exponentially better from a speed and consistency standpoint by that point. The additional video circuitry on the main board had also been simplified and commodified by then, cutting its cost.

In other words, UMA even in '81 would have driven a larger motherboard size, would have impacted speed, performance, complexity, and machine capabilities, and could, oddly, have increased system cost.
This has nothing to do with the year these were created. Consider that four years earlier, in 1977 (when DRAM was even more "poopy"), the Apple II had what was unquestionably UMA access to the part of main memory that was shared with the video subsystem, which was also on the main board.

The real issue here was simply the difference between 6800/6502 memory access timing and that of 8080-style systems.

Having the CGA card separate with what is in effect its own separate video ram/bus....
CGA did not have its own video RAM/bus; that's exactly why you get snow on the display when the CPU is accessing the RAM.
 
I was using "UMA" in the sense that Eudimorphodon seems to have meant it: the CPU has direct access to the frame buffer RAM.

That is... an interesting definition of "UMA" that nobody else uses.

UMA = Unified Memory Architecture, also called "shared memory", where the CPU and the graphics frame buffer are in the same RAM chips. So the CPU and the graphics chip share the same memory for software, data, and the frame buffer, and access it alternately. Meaning: if the CPU is accessing it, it has to wait until the video chip has read its part.

Yes, this. To repeat, AGAIN: UMA in this context is "Unified Memory Architecture", a term that appears to have been invented by, or at least latched onto by, SGI to describe the architecture of their O2 workstation, and quickly picked up by AMD and Intel to describe computers with integrated video chipsets. (And, almost forgot, it's also *widely* used when discussing the architecture of the ARM-based Macintoshes.) This is a term that's been pretty well known since the late 1990s (it's been used to describe Intel chipsets since the i810), but it's clear the concept it describes, IE a system where under "normal circumstances" you have a common pool of memory shared between CPU code execution and video refresh, goes back to, yes, machines like the Apple II. Unfortunately this acronym shares "UMA" with Uniform Memory Access, which describes multiprocessor systems that all have equal-priority shared access to a common pool of memory (versus "NUMA", where the concepts of "local" and "non-local" memory apply), and it's clear that this clash is causing some brainrot here. For the rest of this discussion I am referring *solely* to "Unified Memory Architecture" as it applies to graphics.

Anyway. I mean, sure, if you really want to pick a fight about it you could argue that "UMA" as described in the SGI literature describes more than just framebuffers, it also refers to things like texture memory... but this is functionally an evolution of the tricks that many old 8-bit computers did with walking around the base address of the video buffer to do things like hardware scrolling and double-buffering. So as far as I'm concerned "UMA" applies as well to an Amiga or Atari 800 as it does to an M2 Macintosh.

but what Eudimorphodon was talking about in his post above would not be "more UMA" (or UMA at all) because it still has the same issue as standard CGA: if both the graphics circuitry and the CPU try to access the same address at the same time, one will be slowed or fail to read. (The latter case is what produces "snow" on the display.)

I mentioned this in the same post, but here's a point about CGA: It doesn't snow in either the 40 column text mode or the graphics modes. From the tech manual for CGA:

[screenshot: excerpt from the CGA technical reference manual]

If you look at the technical details you'll find that the CGA card implements the IOREADY line correctly to insert just enough wait states that *almost* all the time the CPU can read/write the RAM without serious performance hits *and* without causing screen disturbances, in every mode other than the 80-column text mode. That mode requires twice the bandwidth on the video refresh side as all the other supported modes, and if priority were given to video instead of the CPU you'd basically have to hold the CPU in a wait state through the entirety of the active horizontal line period. So instead of fixing the IOREADY mechanism to do that in that specific mode, IBM implemented a polling method that lets you manually check whether you're in the active area, if you're really dying to avoid snow.
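The bandwidth asymmetry described above is simple arithmetic; here's a quick sketch. The geometries are the standard CGA text modes, but this is a back-of-envelope illustration, not a cycle-accurate model of the card:

```python
# Back-of-envelope comparison of video-refresh fetch demand per scan line
# in CGA's two text modes. Each text cell needs a character byte and an
# attribute byte fetched on every scan line it occupies, so 80-column
# mode doubles the video side's demand on the same RAM in the same time.

def bytes_per_scanline(columns, bytes_per_cell=2):
    # character byte + attribute byte per visible text cell
    return columns * bytes_per_cell

forty  = bytes_per_scanline(40)   # 80 bytes per scan line
eighty = bytes_per_scanline(80)   # 160 bytes per scan line

print(forty, eighty, eighty / forty)   # 80 160 2.0
```

Twice the fetches in the same horizontal line time is exactly why the wait-state trick that works everywhere else runs out of slack in 80-column text.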

Anyway, regarding this:

The reasons for not using the video card RAM as the sole system RAM are pretty obvious if you start to work out the design: you need a kilobyte of memory at $000-$3FF for the interrupt vector table, and working out how to share that with the frame buffer and the program, while it could be done, would be rather a pain.

I kind of hope whoever posted that "Prototype PC" design document sees this thread and re-links it, because apparently I completely fail at using the site's search engine. Because I don't have it in front of me I cannot comment on whether the document actually addresses that, or if I'm misremembering the "0K scenario" entirely, but I do recall that the version of the PC described in said document had a lot of differences from what actually shipped; among other things, the "proto-CGA" card described might not have had the 80-column mode yet. It's possible the idea of running from CGA memory was chucked in there by someone who didn't know about the interrupt vector limitation, or they might have considered a scenario where the CGA memory could have been addressed from 0K.

Back to this nitpicking about "UMA": the definition I used in that post was a machine that "normally" uses the same physical memory bank(s) for both video refresh and executing CPU code. Applying it to any machine that has a memory-mapped framebuffer is stretching it to absurdity, because outside of some weird special cases (say, some really old machines like the original TRS-80 that leave out a bit), sure, you can run code from video memory if you *really* want to. Machines like the Apple II, Commodore VIC-20 and 64 (almost any home computer using the 6502, actually), the Sinclair ZX Spectrum, etc., completely fit this definition, as do 16-bit computers like the Commodore Amiga and the 128K IBM PCjr and Tandy 1000. If this is what a computer is doing then I don't think it matters whether the memory that's being shared is physically on the same PCB as the CPU; that's just a construction detail, if said memory is the "main" memory. This hypothetical prehistoric PC prototype just running from a proto-CGA card would thus qualify as UMA, at least until you add dedicated CPU RAM to it. At which point it... stops being UMA? Maybe we can have a whole separate argument about what happens when you add more memory to a UMA machine and said memory is NOT accessible to the video hardware.

Most of the machines described above can *only* use memory directly connected to their video hardware for video refresh; IE, if you sit down and pick through how they're put together the video hardware is essentially acting as the DRAM controller and providing refresh to the attached memory; the video hardware is *not* a self-contained DMA busmaster sharing a "neutral" pool of memory with the CPU(*). In these machines if you slap a memory card on the bus it's only there for the CPU, and depending on the architecture of the machine it may or may not be faster than the shared memory; Amiga people *specifically* make a distinction between "fast ram" and "chip ram", for instance.

Here's an edge case: going back to the S100 era, there were video systems you could call UMA that *were* implemented as neutral busmasters using RAM not on the video card. The classic example is in fact the Cromemco Dazzler; it worked as a DMA master to periodically grab a line's worth of data from whatever memory card you pointed it at for the video refresh, and as a result it could do things like double-buffering by just poking a new base address into its register. Here's the thing, though: it also demonstrates why this wasn't a very good solution. The Dazzler would only work with SRAM cards because it had no patience for getting interrupted by anything else, and even then it still relied on a line cache for storing a blob of pixels to keep from starving out the CPU, which is a strategy that only works for a low-resolution/text system. (There was a fairly common Intel CRTC, the 8275, that also relied on doing DMA to external memory just like the Dazzler, using a built-in 80-character memory cache. That works great if you only need to fill that cache every 8 scanlines or whatever, not so well if you're *constantly* having to let the DMA controller own the shared memory bank.) Because of the scale of the contention issues you run into with this, I personally consider "UMA" to only really apply to machines where the shared memory in question is controlled/refreshed specifically with the timing requirements of the video circuitry in mind.
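The line-cache tradeoff mentioned for the 8275-style approach is easy to see with a little arithmetic. Illustrative numbers only, not timing data for any specific machine:

```python
# Rough illustration of why a row buffer helps a DMA-based CRTC like the
# Intel 8275: with an 8-scan-line character cell, the 80-byte row buffer
# only has to be refilled once per character row, so the DMA bursts take
# a small fraction of the line time instead of hammering the shared bank
# on every single scan line.

chars_per_row     = 80   # one text row's worth of character bytes
scanlines_per_row = 8    # scan lines drawn from each buffered row

fetches_with_cache    = chars_per_row                      # one burst per row
fetches_without_cache = chars_per_row * scanlines_per_row  # refetch every line

print(fetches_with_cache, fetches_without_cache,
      fetches_without_cache // fetches_with_cache)   # 80 640 8
```

An eight-fold reduction in bus traffic is what makes the neutral-busmaster design tolerable at all, and only for low-demand modes.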


Yeah, again, I mostly agree it was a dumb/unworkable idea with CGA only having 16K on it, but I do think it's *interesting* to think about an alternative universe where they decided the solution was to slap another 16K on the card. 32K minus 2K for the 40 column text mode would have almost been competitive at the time, as dumb as that is, and having that extra RAM on there *could* have enabled PCjr-level color in an expanded machine. But yeah, didn't happen.
 
CGA did not have its own video RAM/bus; that's exactly why you get snow on the display when the CPU is accessing the RAM.

Again, see the CGA technical manual. IBM is a little optimistic when they describe it as "dual ported", it's not really, but they did have a pretty fancy system of latches and timing that could strategically knock out a few wait states to sync it up with the CPU in an almost hitless and snowless manner in every mode but 80 column text. And, again, I hope I can find that document again, because I'm around 70% sure the proto-CGA described in it didn't *have* the 80 column mode.
 
Hate to quote myself, but:

32K minus 2K for the 40 column text mode would have almost been competitive at the time, as dumb as that is,

Food for thought here: three years later IBM actually thought they could sell a cassette version of the PCjr, and said version with 64K also didn't support the 80 column text mode.

The preliminary sketches for what became the IBM PC are at https://www.os2museum.com/wp/the-ibm-pc-41-years-ago/ The PDF is linked through the image of lined paper.

THANK YOU! I thought I might have also seen it at the os2museum web site but I failed when selecting the right sweet nothings to whisper to Google. And yes, there it is on page 2 of said PDF, "NO RAM!!!" is in caps on the Planar configuration list and the "Color TV adapter" specs... aren't even 40x25, they're 40x16, with Apple II 280x192 graphics. (I guess they were thinking of using a 12 line character cell like a TRS-80?)...

Also of note that the document's description of MDA specifically describes supporting "TV Freq" monitors with a "smaller box", presumably 8x8 characters. So yeah, proto-CGA was 40 columns only, in the original vision if you wanted 80 column text you were supposed to have bought the MDA card to do it on your mono (presumably composite) monitor.
 
Also of note that the document's description of MDA specifically describes supporting "TV Freq" monitors with a "smaller box", presumably 8x8 characters. So yeah, proto-CGA was 40 columns only, in the original vision if you wanted 80 column text you were supposed to have bought the MDA card to do it on your mono (presumably composite) monitor.
Not long after that document, IBM made acceptable 80-column text a design requirement; output like the Apple II's was considered too inferior in quality to wear an IBM label.


CGA did not have its own video RAM/bus; that's exactly why you get snow on the display when the CPU is accessing the RAM.

It certainly had its own RAM soldered right on, and didn't require the PC memory controller in any way.

During writes you could get an access collision, but the PC certainly isn't required to be continuously interacting with the video card; it will hum along displaying whatever is in the buffer on its own.

The video card's behavior doesn't affect the PC or its system RAM in any way; the card sits off at the end of the DOS memory area as a latched port with a direct address in the memory range.
 
Yeah there seems to be some confusion going on about the terminology.

Especially because NUMA is too new to be compared to IBM PC.

CGA isn't an execution context per se; there aren't caches involved. DRAM access might be granted to either CGA or the CPU - an execution core always prioritizes itself. But even in the case of fixed CPU priority, the CPU would always have faster access to system RAM. That's why, if you want to map the IBM PC architecture onto NUMA or UMA, it is NUMA.

The key difference is in the programming POV - modern NUMA-aware software keeps track of memory locality and doesn't use non-local memory for high-throughput cases. Old DOS software that steals CGA VRAM for itself should be aware of the same kind of penalty.
 
Again, see the CGA technical manual. IBM is a little optimistic when they describe it as "dual ported", it's not really, but they did have a pretty fancy system of latches and timing that could strategically knock out a few wait states to sync it up with the CPU in an almost hitless and snowless manner in every mode but 80 column text.
I have seen that manual, particularly the schematic on page 28 where it's very clear that this isn't dual-ported RAM, but just regular old RAM with some '374s in front of the ISA bus and the 6845 to allow switching access between the two.

And yes, the timing was fancy, but of course it had to be, because the CPU didn't have as simple timing as a 6502. But the Apple II still did the same thing, with easier timing (just ϕ0/ϕ1).

But I guess I see what you're getting at here: though to the CPU the video RAM is just RAM (albeit sometimes slightly slower than other RAM), that RAM must be on the video card because it needs to be behind those tri-state buffers.

The definition I used in that post was a machine that "normally" uses the same physical memory bank(s) for both video refresh and executing CPU code.

Well, the Apple II did or didn't, depending on what you're doing. Lots of people had a Language card, and when they were using that they were running code from a physically separate memory bank to which the video system didn't have access. (And heck, the ROM probably even qualifies as that.) This is why I think the "physically separate" thing is an incoherent definition; it's a distinction that doesn't help at all with understanding the taxonomy of video systems, but instead gets in the way.

Maybe we can have a whole separate argument about what happens when you add more memory to a UMA machine and said memory is NOT accessible to the video hardware.
Exactly.
 
Well, the Apple II did or didn't, depending on what you're doing. Lots of people had a Language card, and when they were using that they were running code from a physically separate memory bank to which the video system didn't have access. (And heck, the ROM probably even qualifies as that.) This is why I think the "physically separate" thing is an incoherent definition; it's a distinction that doesn't help at all with understanding the taxonomy of video systems, but instead gets in the way.

Okay, but honestly that mostly sounds like a problem for you, not for most of the rest of the world. Obviously there are degrees of compliance with any arbitrary definition you care to cook up, but nitpicking like this isn't exactly useful to anyone's understanding of the world either. I would venture a very confident guess that all of those systems their manufacturers define as using "UMA" have sections of memory, like ROM chips or whatever, that can't directly be mapped into framebuffer/texture space. I guess you'd better get on the phone and tell them that unless everything is completely, utterly, 100% uniform, they need to come up with some other acronym that makes you happy.

I mean, sure, if you want to play this game of arguing about what happens when you add dedicated-to-the-CPU memory to a "UMA" machine, I will tongue-in-cheek make an arbitrary case that, even though they're nearly identical machines in most respects, adding memory to a Tandy 1000 turns it into a "non-UMA" computer while a PCjr is stuck forever with its "UMA" curse. Why? When you add memory to a Tandy 1000 the new memory goes at the *bottom* of the memory map and the chunk that's shared with video floats upward. This has the effect of moving all the vital low-memory globals and tables that reside in the bottom pages of memory in an 8088 machine into "dedicated" memory; the part that's shared with video is now just tacked onto the end of the DOS "TPA", and therefore it's *very possible* to run programs that, other than when they write to the CGA access window shadowed at B8000, never touch "video memory". The PCjr, on the other hand, puts expansion memory *after* the built-in 128K, which means the system globals are forever stuck in the "video memory", and the fact that it still has to assign video buffers out of that low memory space creates a bunch of headaches that have to be hacked around and have performance implications...
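The layout difference being described can be sketched like this. The helper functions and addresses are simplified illustrations of the scheme, not exact chipset behavior:

```python
# Sketch of the memory-map difference described above. Simplified
# illustration of the scheme only, not exact chipset behavior.

def tandy1000_map(expansion_kb):
    # Expansion RAM goes at the *bottom* of the map; the built-in block
    # shared with video floats up. With any expansion installed, page 0,
    # the interrupt vectors, and DOS itself land in dedicated memory.
    return {"low_globals_in_shared_ram": expansion_kb == 0,
            "shared_block_base_kb": expansion_kb}

def pcjr_map(expansion_kb):
    # Expansion RAM goes *above* the built-in 128K, so page 0, the
    # interrupt vectors, and the video buffers stay in the shared block
    # no matter how much memory you add.
    return {"low_globals_in_shared_ram": True,
            "shared_block_base_kb": 0}

print(tandy1000_map(384))  # shared block floats up to 384K; low memory dedicated
print(pcjr_map(512))       # low memory still lives in the video-shared 128K
```

Same video hardware lineage, opposite answers to "is the critical low memory contended?", purely because of where the expansion lands in the map.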

But no, I'm not going to pretend this distinction is anything but mostly-arbitrary. That said, I will note in your example of the Apple II that whether you have a Language Card or not the zero page, the stack, and most other vitally important bits of the system are still in the RAM that's shared with and refreshed by the video system. The Apple II's utterly baroque memory map is entirely dictated by how it shares memory and, on top of that, how it uses the absolute minimum of supporting hardware and counters to do it.

Whatever. If you don't like the term "UMA" don't use it.
 
I would venture a very confident guess that all of those systems which are defined by their manufacturers as using "UMA" have sections of memory like ROM chips or whatever that can't directly be mapped into framebuffer/texture space...
Apple never called their system "UMA," and the Apple II had large sections of on-board RAM (most of it, in fact) that could not be mapped into video RAM. In fact, of the three physical banks of memory on the motherboard, only two could be used for video RAM, and only parts of those. So are you saying that the Apple II is not "UMA"? What would a UMA microcomputer from the late '70s or early '80s be?

So this is where I'm finding no point to these definitions of "UMA" that look at physical memory banks rather than how the video system and the CPU interact. You can gain some clear understanding by distinguishing between, e.g., a TMS9918 system and a CGA system, but given that neither is "UMA," what use is the term?
 
I have seen that manual, particularly the schematic on page 28 where it's very clear that this isn't dual-ported RAM, but just regular old RAM with some '374s in front of the ISA bus and the 6845 to allow switching access between the two.
Dual ported video memory as we now know it was being invented in 1980. Two ported RAM was available earlier. Cromemco offered a card with 48K of two ported RAM for use with the SDI video system for $995 in 1983. Applying the typical 4 fold increase for something being delivered in 1980 would mean that the 16K of two ported RAM would cost about $1,200. That is a lot just to stop snow in 80 column text mode.
 
So this is where I'm finding no point to these definitions of "UMA"

Then don't use it. I'm sorry there are things on the Internet you don't agree with.

Apple never called their system "UMA,"

Do you mean the ones they sell today that everyone else calls "UMA"? Oh, gee, you're right, on their tech spec page they leave out the "A".

[screenshot: Apple's current tech-spec page listing "unified memory"]
And yes, the term wasn't around in 1977, it's a retronym, and an imprecise one at that. The fact that it bothers you so much is fascinating.
 
Do you mean the ones they sell today that everyone else calls "UMA"?
No, I meant the Apple II, of course. Remind me again, what era of technology do we discuss in this forum, and particularly what era are we discussing in this thread?
 