I'm a little surprised that nobody built one. Third-party PCGs were quite popular in Japan.
HAL Laboratory started out by building and releasing a PCG for the PET 2001, and went on to release seven more for various systems.
Worth remembering is that this was the only way to freely display arbitrary shapes on a PET or other text-mode-only computers. It would have made more sense to have this as an add-on for the MDA card than for the CGA card.
Yeah, an SRAM character generator would have been a big improvement (it could have been just 1K for the upper character set, with the "standard" characters in a low 1K ROM, though I don't know if that would have saved any money). Complicated dual-ported access wouldn't really be necessary - just require apps to change the SRAM only while the display is disabled; updates to the SRAM don't need to be fast. The ROM BIOS could have held multiple compressed character sets and a shared decompression routine (it is interesting that contemporary ROMs rarely compressed anything, probably reflecting that the premium on RAM at the time made it not worth it).
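For what it's worth, here's a minimal sketch of what such a shared decompression routine could have looked like - in C rather than period assembly, and with a run-length encoding I just made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: 0x00 ends the stream; a byte with the high
 * bit set means "repeat the next byte (ctrl & 0x7F) times"; any other
 * byte means "copy the next ctrl bytes literally". Returns the number
 * of bytes written into the character generator buffer. */
size_t font_decompress(const uint8_t *src, uint8_t *dst, size_t dst_len)
{
    size_t out = 0;
    uint8_t ctrl;

    while ((ctrl = *src++) != 0x00 && out < dst_len) {
        if (ctrl & 0x80) {                  /* run of one repeated byte */
            uint8_t count = ctrl & 0x7F;
            uint8_t value = *src++;
            while (count-- && out < dst_len)
                dst[out++] = value;
        } else {                            /* literal stretch */
            while (ctrl-- && out < dst_len)
                dst[out++] = *src++;
        }
    }
    return out;
}
```

An 8x8 font for 256 characters is 2K, and font bitmaps are mostly blank scanlines, so even a scheme this trivial would have shaved off a fair fraction.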
A way to update the contents of the character generator RAM could have been to just write to the regular screen RAM addresses, and have a mode where the character generator is written rather than read, kind of sort of.
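Driving that from software could have looked something like this (period DOS C, Turbo C / MSC style outp() and far pointers; the register value that redirects writes to the character generator is entirely made up - no real CGA has such a mode):

```c
#include <conio.h>   /* outp() - Turbo C / MSC style */
#include <dos.h>     /* MK_FP() */

#define CRT_MODE_PORT 0x3D8   /* CGA mode control register */
#define MODE_CGEN_WR  0x00    /* hypothetical: display off, writes go to char-gen SRAM */
#define MODE_TEXT_80  0x29    /* the normal 80-column text mode value */

/* Load one 8x8 glyph into the character generator SRAM by writing
 * through the regular B800:0000 screen RAM window. */
void load_glyph(unsigned char code, const unsigned char rows[8])
{
    volatile unsigned char far *vram =
        (volatile unsigned char far *)MK_FP(0xB800, 0);
    int i;

    outp(CRT_MODE_PORT, MODE_CGEN_WR);   /* blank display, redirect writes */
    for (i = 0; i < 8; i++)
        vram[code * 8 + i] = rows[i];    /* one byte per scanline */
    outp(CRT_MODE_PORT, MODE_TEXT_80);   /* back to normal text mode */
}
```

The nice part is that no extra address decoding would be needed: the SRAM simply reuses the screen RAM's address space while the display is blanked.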
==========================
Some general thoughts:
We have to remember that at the time the microcomputer companies more or less just winged it, and some things became successes while others became failures. IBM, in-house, knew that compatibility was key to selling computers, from their 360 mainframe line where all models could run software written for the lowest-spec models, and other software would run on all but the lowest-spec ones.
All the other companies, lacking this experience, seemed to just wing it. Commodore and Apple had successes with the PET and the Apple II, and both tried an incompatible, in-theory-better follow-up that flopped: the CBM-II/B series and the Apple III. Later on, many companies learned the hard way that a microcomputer incompatible with the established ones really needed killer features to take any market share. I.e. once the PC was well established as a business computer, and a few of the successful home computer models were available in sufficient quantity, there was almost no market for anything else unless it was superior to what already existed (say, the Amiga with its superior features, or the Atari ST, which beat the price of more or less every similarly specced system at the time, and so on).
IBM simply realized that they wanted both a monochrome and a color option.
A thing this thread seems to miss is that some early MDA cards are said to actually support color text, i.e. all four attribute bits are routed all the way to the DE-9 connector. (Don't quote me on these being early cards, just that a few cards do this.) This makes me think that IBM initially had some thoughts of making one graphics card and one text-only card, both usable with either monochrome or color displays, but then reality hit: they really wanted a better-than-15 kHz monitor for text, and it was likely hard to buy a color monitor chassis (or have an OEM build monitors to your spec) at anything other than 15 kHz TV frequencies. Technically I don't think it would have been that hard for a qualified TV/monitor manufacturer to create a color monitor running at something other than 15 kHz, but it would likely have taken time, and that wouldn't have fitted IBM's intention to use off-the-shelf parts.
There are so many "what if"s here. For example, since MDA has a full 8-bit attribute byte, what if the signal to the monitor had been analog instead of two-bit digital? That way we could have had 8 or 16 grey levels rather than 4.
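Just to put numbers on that "what if": the four foreground attribute bits fed through a linear resistor-ladder DAC would give 16 steps across the standard 0.7 V video range. Purely illustrative arithmetic:

```c
#include <stdint.h>

/* Map the four foreground attribute bits to a grey level on the
 * 0..0.7 V video range, as a linear resistor-ladder DAC might do. */
double grey_voltage(uint8_t attr)
{
    uint8_t level = attr & 0x0F;    /* low nibble: 16 levels, 0..15 */
    return 0.7 * level / 15.0;      /* 0 V = black, 0.7 V = full white */
}
```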
========================
Re memory and whatnot: if IBM had opted for the 8086 rather than the 8088, the PC might have had a 16-bit data bus, and that would have made a 32K CGA card more reasonable, or for that matter UMA graphics on the main board.
A problem with UMA is how it interacts with faster versions of a computer. The AT would either have required non-UMA graphics hardware, or the bus clock would have had to be a multiple of the PC/XT bus clock, more or less.
Also, re the UMA vs. NUMA discussion: to make things more complicated, there were computers where the main/default memory was UMA but add-on memory was separate. In particular, on a VIC-20 the internal RAM is UMA and sits on one side of the bus buffers (shared with the video chip and the character ROM), while expansion RAM sits on the other side (shared with the CPU, I/O chips and ROM).
========================
Speaking of "what if"s, and of the comment that some Intel support chips were less than stellar (the DRAM controller vs. the 8085 vs. 64 kbit DRAM chips not meeting the timing margins, and so on), I would say that a major mistake with the PC was using the Intel-style active-high interrupt lines (as opposed to active-low, open-collector lines with a pull-up, which can be shared). IBM should, for example, have added an inverter to the IRQ lines on the ISA bus, removing all the problems with conflicting interrupts and whatnot. 6800/6502 systems use active-low interrupts, and you just connect all interrupt sources to the same line (and poll each chip to determine what caused the interrupt). This could have been a thing on the PC too.

Btw, since IDE/ATA is basically a stripped-down ISA bus, it's worth mentioning that Commodore solved the interrupt polarity issue for the IDE/ATA interface on the Amiga 600, 1200 and 4000 by simply having a pull-down resistor on the interrupt line. I.e. if you didn't have a hard disk connected, it wouldn't try to hog the interrupt line. (Can't remember the details, but IIRC that interrupt line is shared with one of the CIA I/O chips.)
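As a software-side sketch of that shared active-low discipline - the device table and status-register layout here are invented for illustration; the point is the level-triggered, wired-OR polling loop:

```c
#include <stdint.h>

/* Sketch of the 6800/6502-style shared-IRQ handler. Each device that
 * can pull the (active-low, open-collector) line gets an entry here. */
typedef struct {
    volatile uint8_t *status;   /* device status register */
    uint8_t pending_mask;       /* bit meaning "I am pulling the line" */
    void (*service)(void);      /* handler that also clears the request */
} irq_source;

extern irq_source irq_sources[];
extern int num_irq_sources;

void irq_handler(void)
{
    int found;
    do {
        found = 0;
        for (int i = 0; i < num_irq_sources; i++) {
            if (*irq_sources[i].status & irq_sources[i].pending_mask) {
                irq_sources[i].service();   /* device releases the line */
                found = 1;
            }
        }
    } while (found);    /* re-scan until no source still asserts IRQ */
}
```

Because the line is level-triggered, the handler just re-scans until nobody asserts it any more, which is part of why sharing worked there but not on the edge-triggered ISA IRQs.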
I think that, in comparison, the issues with the CGA card seem minuscule next to the interrupt polarity blunder.