
Memory addressing/architecture and system responsiveness

Uniballer

In the 1980s, I recall shipping a bunch of systems based on PDP-11/23s and /53s, and the occasional 11/44 or /73. A Q-bus PDP-11 with 512KB and an ST-506-interfaced disk running RSX11M-PLUS was pretty responsive. You could easily put half a dozen or more people on it and none of them would complain that it was too slow, because loading a new program (task image) selected from a menu happened pretty fast. Small VAXen or UNIX boxes needed a lot more hardware capability to get anywhere close to that level of responsiveness. How come?

One thing that the RSX11 family did (and RT-11, too) is keep task images, both in memory and as files, generally contiguous (shared libraries, shared regions, and supervisor-mode libraries didn't change the overall picture much). So when you are ready to load a task into memory, you first allocate a contiguous block of RAM, then you find the file on disk and tell the disk driver to DMA it right into place, hook up the TCB and task header, and go. No scatter/gather I/O. No faulting the image into memory. So the process of starting a new task is way faster than on VMS or UNIX or Windows. I don't know enough about how TOPS-10 did things to compare that system, but it seemed pretty responsive, too. Of course, there were costs to doing this (remember the SHUFFLER that ran when RSX11 couldn't find enough contiguous RAM: it moved task images and so on around, trying to make holes big enough to fit in more tasks). And you had to swap complete tasks out rather than just the memory pages that weren't getting used.
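
Roughly what that load path looks like, as a toy sketch in C -- the allocator, "disk driver", and TCB fields are simplified stand-ins for illustration, not actual RSX-11 code or structures:

Code:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t physmem[256 * 1024];     /* pretend physical memory pool     */
static size_t  next_free = 0;           /* trivial bump allocator           */

static uint8_t *alloc_contiguous(size_t bytes)
{
    if (next_free + bytes > sizeof physmem)
        return NULL;                    /* real RSX would run the SHUFFLER  */
    uint8_t *p = physmem + next_free;
    next_free += bytes;
    return p;
}

/* stand-in for "tell the disk driver to DMA the contiguous file into place" */
static int disk_read_contiguous(uint32_t lbn, void *dst, size_t bytes)
{
    (void)lbn;
    memset(dst, 0, bytes);              /* pretend the image arrived        */
    return 0;
}

struct tcb {                            /* task control block, simplified   */
    uint8_t *base;
    size_t   size;
    int      runnable;
};

int load_task(struct tcb *t, uint32_t image_lbn, size_t image_bytes)
{
    uint8_t *mem = alloc_contiguous(image_bytes);
    if (mem == NULL)
        return -1;                      /* no hole big enough               */
    if (disk_read_contiguous(image_lbn, mem, image_bytes) != 0)
        return -1;                      /* one transfer, no scatter/gather  */
    t->base = mem;
    t->size = image_bytes;
    t->runnable = 1;                    /* "hook up the TCB" and go         */
    return 0;
}

int main(void)
{
    struct tcb t;
    return load_task(&t, 0 /* image file's starting block */, 64 * 1024);
}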

I was thinking about how responsive a general-purpose system built on current PC hardware with this kind of memory/filesystem architecture could be. Certainly, you could build a file system with contiguous files, but it would need the kind of maintenance that ODS-1 filesystems needed (frequent defragmentation, etc.). But would you really want to implement a memory management scheme that managed contiguous blocks of RAM and then updated page tables to make the thing go? With 64K or so task images this is not so bad, but the system I am on right now says my firefox browser process has 290MB resident, out of a total image of 733MB. You could easily build a system that just grabbed pages from a free list as needed and did not allow page faulting to occur (i.e., all pages are resident), but it seems to me that would lose the benefit of fast image loading, because you need to load all those scattered pages before a task can be started. I guess if the allocated pages were big enough (64K?) this wouldn't hurt too much, but wasted RAM is still an issue (isn't it)?
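
And a toy C sketch of the other half of that thought -- grab a contiguous physical block, then just fill in page table entries so every page is resident up front (no demand paging). The 4KB page size, PTE format, and function names here are assumptions for illustration only:

Code:
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
typedef uint64_t pte_t;

static pte_t phys_to_pte(uintptr_t phys)
{
    return (pte_t)phys | 0x1;   /* "present" bit set: the page is always resident */
}

void map_resident_image(pte_t *page_table, uintptr_t phys_base, size_t bytes)
{
    size_t pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    for (size_t i = 0; i < pages; i++)
        page_table[i] = phys_to_pte(phys_base + i * PAGE_SIZE);
    /* For a 733MB image this is ~180,000 entries; the real cost, though, is
     * finding (or compacting to create) the contiguous physical block. */
}

int main(void)
{
    static pte_t pt[512];                            /* room for a 2MB image */
    map_resident_image(pt, 0x100000, 2u * 1024 * 1024);
    return 0;
}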

Certainly, embedded systems with tiny to medium filesystems can be quite responsive (everything important is resident in memory), but can you scale up the kind of historical O/S memory/filesystem architecture I am talking about to general purpose computing with the kind of programs we now have? Or has that ship sailed forever? Has the cost of hardware dropped to the point that it just doesn't matter much anymore? What do you think?
 
Real memory (not virtual) persisted for quite a long time (right into the 80s), even on large machines. Programs have to be implemented to work with it, however. So you leave memory management to the program itself and overlay or segment code, and use memory-resident tables for large data. This can be very efficient, but a bother to code. On the other hand, if you leave paging to the operating system to manage, you can end up with inefficient code, but it is easily implemented. Consider, for example, DOS/360--the supervisor can run in as little as 8KB, but there are hundreds of transients. This does have the advantage that a relatively small addressing space is needed.

There are disk systems that never need automatic defragmentation. These either rely on pre-declaring the estimated size of a file (with perhaps a limited option for extending it, and also for returning excess space); if a file reaches its size limit, you allocate a new one, copy the data over, then delete the old one. This works well when data is mostly static and its maximum size is known.

Or, on a multi-programmed system, they never defragment at all--just satisfy the closest disk I/O requests first and let the individual programs fight it out.

I've worked with both these types of systems. I don't know if they'd work well with modern bloated code, however.
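
For what it's worth, here's a rough C sketch of the pre-declared-size scheme: one contiguous run of blocks per file, and "growing" means allocate a bigger run, copy, and free the old one. The bitmap, alloc_run, extend_by_copy, and copy_blocks are all invented for illustration, not any particular filesystem:

Code:
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 65536u
static bool used[NBLOCKS];              /* block allocation bitmap */

/* first-fit search for 'count' adjacent free blocks; returns start or -1 */
static long alloc_run(uint32_t count)
{
    uint32_t run = 0;
    for (uint32_t b = 0; b < NBLOCKS; b++) {
        run = used[b] ? 0 : run + 1;
        if (run == count) {
            uint32_t start = b + 1 - count;
            for (uint32_t i = start; i <= b; i++)
                used[i] = true;
            return (long)start;
        }
    }
    return -1;                          /* no hole big enough */
}

/* stand-in for copying 'count' blocks from one run to another on disk */
static void copy_blocks(long from, long to, uint32_t count)
{
    (void)from; (void)to; (void)count;
}

/* the file outgrew its declared size: bigger run, copy, delete the old one */
static long extend_by_copy(long old_start, uint32_t old_count, uint32_t new_count)
{
    long new_start = alloc_run(new_count);
    if (new_start < 0)
        return -1;
    copy_blocks(old_start, new_start, old_count);
    for (uint32_t i = 0; i < old_count; i++)
        used[old_start + i] = false;
    return new_start;
}

int main(void)
{
    long f = alloc_run(100);                 /* declare the file at 100 blocks */
    if (f >= 0)
        f = extend_by_copy(f, 100, 400);     /* outgrew it                     */
    return f < 0;
}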
 
Bloat-ware is certainly a problem in such a system, but it might be worth considering for a system built on an eZ80 with a 16MB address space and memory banking. Considering that an 8MB NOR flash ROM only costs about $1.20 in small volume, you could easily fill it with banks of ready-to-go image code, to be run directly in support of a user page with a wait state or two at 50MHz, or simply copied quickly into a static RAM chip (about $4.50 per 512KB) and run at full speed.

I personally think that an eZ80 design is better with all silicon memory and perhaps a SATA drive on a shared server link... but I'd expect demand for the SATA drive would be high, so thinking about a new way to manage that data wisely could be useful.

It would be nice to design a new operating system for more medium-scale applications, rather than being forced to use overloaded PCs running Meltdown-vulnerable processors for simple tasks. Something closer to the CP/M origins and less from the Windows end of the spectrum.
 
Real memory (not virtual) persisted for quite a long time (right into the 80s), even on large machines. Programs have to be implemented to work with it, however. So you leave memory management to the program itself and overlay or segment code, and use memory-resident tables for large data. This can be very efficient, but a bother to code. On the other hand, if you leave paging to the operating system to manage, you can end up with inefficient code, but it is easily implemented. Consider, for example, DOS/360--the supervisor can run in as little as 8KB, but there are hundreds of transients. This does have the advantage that a relatively small addressing space is needed.
Yes. I recall having to deal with overlays and other small-address-space hassles as the worst part of programming the PDP-11. But even on the PDP-11 under RSX, user-mode code only dealt with virtual (i.e., translated) addresses--just 64K of them, plus I/D space. And everything was resident, so there was no demand paging. Of course, you could remap addresses in certain ways (e.g., managing "large" data in memory by remapping as needed), send part of your address space by reference to another task, etc.

There are disk systems that never need automatic defragmentation. These either rely on pre-declaring the estimated size of a file (with perhaps a limited option for extending it, and also for returning excess space); if a file reaches its size limit, you allocate a new one, copy the data over, then delete the old one. This works well when data is mostly static and its maximum size is known.

Or, on a multi-programmed system, they never defragment at all--just satisfy the closest disk I/O requests first and let the individual programs fight it out.

I've worked with both these types of systems. I don't know if they'd work well with modern bloated code, however.
Right. The Berkeley Fast File System made a huge difference compared to earlier UNIX filesystems by placing file fragments on the disk using algorithms that take disk geometry into account, and many other file systems took ideas from that work. My system here can read and write large files at over 100MB per second, and defragmentation is really not needed until the disk gets pretty close to full. And the best part is that user-mode programs don't have to know or care how the filesystem is implemented. But UNIX does all of its filesystem I/O through buffers allocated for that purpose (not DMA directly to/from user process space); that is simpler to manage, but more wasteful of both memory and compute cycles moving stuff back and forth.
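
For contrast, a small Linux example of the two paths: a normal read() is staged through the kernel's page cache, while opening with O_DIRECT asks the kernel to DMA into a (suitably aligned) user buffer. The filename and the 4096-byte alignment are placeholder assumptions, and error handling is abbreviated:

Code:
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Buffered path: the kernel copies from its cache into buf. */
    char buf[4096];
    int fd = open("bigfile", O_RDONLY);
    if (fd >= 0) {
        read(fd, buf, sizeof buf);
        close(fd);
    }

    /* Direct path: buffer, length, and offset must be suitably aligned. */
    void *dbuf;
    if (posix_memalign(&dbuf, 4096, 4096) != 0)
        return 1;
    int dfd = open("bigfile", O_RDONLY | O_DIRECT);
    if (dfd >= 0) {
        read(dfd, dbuf, 4096);
        close(dfd);
    }
    free(dbuf);
    return 0;
}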
 
It depends to a very large degree on the application.

If you're running the typical business applications (AP, AR, GL, Inventory and Payroll), you can estimate pretty accurately what the storage requirements will be and pre-allocate appropriately. No need for elaborate dynamic allocation. Similarly, if you're writing in COBOL, program sections can be treated by the compiler and run-time as segmented code.
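
On a modern POSIX system you can get that pre-allocation effect explicitly; here's a minimal sketch (the filename and the 64MB figure are arbitrary examples):

Code:
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("payroll.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Reserve 64MB now, so later writes never have to grow the file.
     * posix_fallocate returns an error number rather than setting errno. */
    int err = posix_fallocate(fd, 0, 64L * 1024 * 1024);
    if (err != 0)
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
    close(fd);
    return 0;
}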

Sometimes you just have to do things a certain way. Consider the CDC 6000 series architecture, one of the earlier Cray-designed supercomputers. Most of the operating system resides in peripheral processors (4K of 12-bit words) that have access to main memory, but which run at one-tenth the speed of the CPU. Memory is monolithic, with each user given a relocation address added to all memory addresses and a field length that limits addresses on the high end. Multiprogramming is performed by "rolling" whole user areas in and out of disk storage--and the user area must be locked down during I/O (for PP access). The only real CPU part of the operating system consists of a storage move routine to squeeze out gaps in main memory allocation.
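
The relocation-address/field-length scheme is simple enough to model in a few lines of C; this is just an illustration of the add-and-compare translation, not CDC code:

Code:
#include <stdint.h>
#include <stdio.h>

struct job {
    uint32_t ra;    /* relocation address: where the job sits in central memory */
    uint32_t fl;    /* field length: size of the job's allocation               */
};

/* returns the absolute address, or -1 if the job stepped outside its field */
static long translate(const struct job *j, uint32_t user_addr)
{
    if (user_addr >= j->fl)
        return -1;                          /* out of bounds: abort the job */
    return (long)(j->ra + user_addr);
}

int main(void)
{
    struct job j = { .ra = 040000, .fl = 020000 };  /* octal, in CDC spirit */
    printf("%ld\n", translate(&j, 0100));           /* inside the field     */
    printf("%ld\n", translate(&j, 030000));         /* outside: -1          */
    return 0;
}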

You'd think that running real-time interactive code on a setup like this would be terrible--but it's the OS behind the PLATO project.

Similarly, consider the CDC STAR--a massive (for the time) supercomputer. Virtual memory, paged in either 512-word or 64K-word pages of 64-bit words. Disk allocation is contiguous, with up to 4 extensions allowed. Up to 512KW total memory. Most of the instruction set is vector (SIMD) memory-to-memory operations with 48-bit virtual addresses.

For the typical application workload, this worked pretty well. The main applications were simulation-type jobs with large arrays that would run for hours or days (you need this kind of thing when designing nukes)--and, most importantly, usually a single job at a time. The biggest downside was that scalar operations were performed as vectors of length 1--and the startup overhead for an instruction was a killer. That, however, was a problem with the hardware architecture, not the operating system.
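
A back-of-the-envelope model of that startup problem -- the cycle counts below are made up for illustration, not measured STAR timings:

Code:
#include <stdio.h>

int main(void)
{
    double startup = 100.0;     /* assumed fixed startup cost per vector instruction */
    double per_elt = 1.0;       /* assumed cost per element once streaming           */

    int lengths[] = { 1, 10, 100, 10000 };
    for (int i = 0; i < 4; i++) {
        int n = lengths[i];
        double cycles = startup + per_elt * n;
        printf("length %5d: %.1f cycles per result\n", n, cycles / n);
    }
    /* length 1 costs ~101 cycles per result; length 10000 costs ~1.01 */
    return 0;
}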

Again, it all depends on the application.
 
Bloat-ware is certainly a problem in such a system, but it might be worth considering for a system built on an eZ80 with a 16MB address space and memory banking.
The eZ80 sounds pretty interesting for an 8-bit micro. The large addressing capability would have benefits for a lot of embedded applications. But without memory management of any type, and no protected mode capabilities it is hard for me to see it as a candidate for general purpose (multi-user/multi-programming) computing.

It would be nice to design a new operating system for more medium-scale applications, rather than being forced to use overloaded PCs running Meltdown-vulnerable processors for simple tasks. Something closer to the CP/M origins and less from the Windows end of the spectrum.
To me, CP/M has always looked like a derivative of RT-11, OS/8 or maybe TOPS-10, but I know what you mean.
 
My Crazy eZ80 Project

Uniballer said:
The eZ80 sounds pretty interesting for an 8-bit micro. The large addressing capability would have benefits for a lot of embedded applications. But without memory management of any type, and no protected mode capabilities it is hard for me to see it as a candidate for general purpose (multi-user/multi-programming) computing.

I'm doing the schematic and printed circuit board layout for an eZ80 design prototype for an industrial application, but it could also be used in a more amusing way: running many 64KByte user pages in a time-slice multi-user environment controlled by the kernel. I'll probably have the ability to run all my vintage computers' software on it, to a point.

I have simple hardware solutions to trap a user in a 64KByte page who enters ADL mode (16MByte access) and then addresses memory or I/O outside his page. The hardware detection interrupts the eZ80 to invoke the kernel, which stops the user's process. ADL mode can be allowed within a contiguous block of 1 to 16 64KByte pages, with no access permitted outside that assignment.
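
In software terms, the check that hardware makes is roughly the following (a C sketch; the field names and the 24-bit address model are just for illustration):

Code:
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct user_region {
    uint8_t first_page;     /* first 64KByte page owned (address bits 23..16) */
    uint8_t page_count;     /* 1 to 16 contiguous pages                       */
};

/* would a 24-bit ADL-mode access at addr24 stay inside the user's block? */
static bool adl_access_ok(const struct user_region *r, uint32_t addr24)
{
    unsigned page = (addr24 >> 16) & 0xFF;      /* which 64KByte page */
    return page >= r->first_page &&
           page < (unsigned)r->first_page + r->page_count;
}

int main(void)
{
    struct user_region r = { .first_page = 4, .page_count = 2 };  /* 0x040000..0x05FFFF */
    printf("%d\n", adl_access_ok(&r, 0x045000));   /* 1: allowed            */
    printf("%d\n", adl_access_ok(&r, 0x060000));   /* 0: trap to the kernel */
    return 0;
}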

There is still one hole I've not plugged.

With this project, you can probably see why your topic caught my attention.
 
The eZ80 sounds pretty interesting for an 8-bit micro. The large addressing capability would have benefits for a lot of embedded applications. But without memory management of any type, and no protected mode capabilities it is hard for me to see it as a candidate for general purpose (multi-user/multi-programming) computing.

Well, yes and no. Remember that you're using an 8-bit MCU here. That makes compiling real-world applications directly to machine code less of an advantage than you'd think. Consider, for example, a business or scientific application. You'll need floating point, probably string manipulation, etc. Well, if you code that into machine code, it looks like a bunch of "set up operands", "call the right routine", and "store the result". That sort of thing makes a p-code implementation very attractive--the compiled result is smaller and runs nearly as fast as a native-code application. Note that an awful lot of CP/M applications were coded in CBASIC, which is one such p-code compiler.
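
To make that concrete, here's a toy p-code dispatcher in C; the opcodes are invented for illustration and have nothing to do with CBASIC's actual format:

Code:
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

/* a tiny stack-machine interpreter: each opcode is one byte, so the
 * "compiled" program is far smaller than the equivalent native code */
static void run(const double *consts, const unsigned char *code)
{
    double stack[32];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH:  stack[sp++] = consts[*code++];        break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];      break;
        case OP_MUL:   sp--; stack[sp - 1] *= stack[sp];      break;
        case OP_PRINT: printf("%g\n", stack[--sp]);           break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const double consts[] = { 2.5, 4.0, 1.5 };
    /* computes (2.5 + 4.0) * 1.5 and prints 9.75 */
    const unsigned char code[] = {
        OP_PUSH, 0, OP_PUSH, 1, OP_ADD, OP_PUSH, 2, OP_MUL, OP_PRINT, OP_HALT
    };
    run(consts, code);
    return 0;
}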

If you take that approach, then you can implement any sort of virtual memory scheme that you'd like. Back in the 70s, I had to code an 8-bit BASIC to run on an 8085, and it could accommodate up to 5 users (on a 3.5 MHz 8085, yet). The 8085 is not an architecture that allows for movable code, so a p-code type BASIC was almost mandatory--and it worked very well. In an informal test by customers at one of the computer shows, it actually beat BillG's compiled BASIC. And it could fetch code and operands from disk or page-mapped memory as it ran. Eventually, it was ported to Xenix. The last installation that I was aware of using it only moved to more modern applications about 5 years ago.
 