
Dual Pentium III

For years, I ran a SuperMicro P6 dual Slot 1 P3 board with SL4BS CPUs. Ran DOS, 98, 2K, XP, and Debian Woody on it. Storage was a 250GB SATA SSD with a PATA-SATA adapter. Still have the board; the lack of SSE2 sort of limits your options.
That one is a true classic.
 
Multi-core doesn't mean all of the cores are on the same die.

Sure, I'm aware of that. And as you note, in some instances even when they *are* on the same die it's still topologically the same as if it were two separate packages linked only by the FSB. So... sure, I guess I'm guilty of oversimplifying, but the point I was responding to was a claim that two separate Pentium IIIs is somehow "better" than a "dual-core" CPU because they're "separate". In the very worst case they're exactly the same, and a modern multi-core system with more tightly integrated caches is likely to be better.

You won't see it in benchmarks and it doesn't make sense on paper, but there are performance advantages you'll see in day-to-day usage. If you don't believe me, build one yourself sometime and give it a whirl.

Ever hear of something called the "placebo effect"? If you don't see it in benchmarks it's not real, sorry. And there's no need, I have had plenty of dual socket systems, going back to an ABIT BP6 dual Celeron, circa 1999, iterating through "industrial strength" Serverworks-based Pentium III and P4 Xeons, and in fact right now I have both an old Mac Pro tower and a gnarly 2018-ish vintage Lenovo dual-socket Xeon with a gawdawful total number of cores within swift-kicking range.

They might be innately somehow "cooler" than a mere single socket system, but if that single socket has as many or more cores in it, it's probably objectively better. Is the argument here that the extra sockets are effectively "speed holes"?



The performance-gain comes from being able to dedicate 1 CPU to the operating system and 1 CPU to the program being run.

No. Look up how Windows NT works. Windows multiprocessing is *entirely* about the kernel scheduler picking CPUs to run threads on, and threading has been part of Win32 since the beginning. (With the exception of the "Win32s" subset that gave Windows 3.11 limited 32-bit powers.) Even plain old Windows 95 encourages programmers to use threads to spin off tasks that are supposed to run concurrently instead of trying to implement said concurrency themselves cooperatively, and thus a well-written Windows 95 program can, at least in theory, gain something from running on NT. The asterisk here is that depending on Windows to "auto-magically" handle scheduling your threads wasn't necessarily the most performant or predictable way of doing it, which is a good reason why games in particular might avoid it and just depend on a monolithic CPU-hogging process.
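To make the point concrete, here's a toy sketch of that model (Python standing in for Win32 C, purely illustrative): the program spins off worker threads and never once says which CPU anything runs on; placement is entirely the kernel scheduler's call, which is the NT model in a nutshell.

```python
import threading

# Each worker burns through a chunk of a computation; the program never
# picks a CPU for any thread -- the kernel scheduler decides, the same
# way NT places Win32 threads created with CreateThread().
def partial_sum(lo, hi, out, idx):
    out[idx] = sum(range(lo, hi))

results = [0, 0]
threads = [
    threading.Thread(target=partial_sum, args=(0, 500_000, results, 0)),
    threading.Thread(target=partial_sum, args=(500_000, 1_000_000, results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results)
print(total)  # same answer whether the OS ran the threads on 1 CPU or 2
```

On a single-CPU machine the two threads simply time-slice; on a dual-socket or dual-core one the scheduler is free to run them in parallel, with no change to the program.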

"Photoshop" was cited earlier as an early example of a "multiprocessing aware" program, but I think when people chuck this out there they might be getting confused about the differences between the Macintosh and Windows versions. Back in the mid-1990's Apple sold a few multi-CPU Power Macs well before OS X came out, and the pre-OS X consumer Macintosh OS wasn't even really a (preemptive) multitasking operating system, let alone a multiprocessor-aware one. To utilize the additional CPUs they came up with a crude hack that allowed the secondary CPU(s) to essentially be used vaguely similarly to how you might use a GPU today, IE, you could write your program to spin off specific tasks that can use that extra CPU as an accelerator. You can "properly" say that for instances like this you have one CPU "dedicated" to the program that's being run and the other "dedicated" to the operating system, but even that isn't really true, because in the Mac case there still needs to be a lot of the "MP aware" program running in the "main" OS space. Windows never worked this way.

Adobe *was* one of the first companies to specifically tout Windows NT's multiprocessing support, but their first 32-bit version of Photoshop (which ran on both NT and 95) "just" took advantage of the threading support Microsoft added to the API and therefore left it up to the OS to schedule its subtasks like a modern OS would. In other words, it was written the same way you'd write it today: thread it up and leave it to the OS to decide where the threads go. The only thing that's going to be different when running this program on a multi-CPU machine instead of a single CPU is that the program might opt to spin up more threads in parallel to accomplish a given task if it thinks there are more CPUs, but ultimately it's up to the OS if it decides to drop them all on the same CPU because it has other things vying for its attention.
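A minimal sketch of that "MP aware" pattern (Python standing in for the Win32 API; the filter function is a made-up stand-in): the only multiprocessor-specific decision the program makes is how many workers to spin up, based on the CPU count the OS reports. Where those workers actually run remains the OS's call.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Scale the worker count to whatever the OS reports, then hand the
# scheduling problem straight back to the OS -- the worker count is the
# program's ONLY multiprocessor-specific decision.
n_workers = os.cpu_count() or 1

def apply_filter(tile):
    # hypothetical stand-in for a per-tile image operation
    return [px * 2 for px in tile]

tiles = [[1, 2], [3, 4], [5, 6], [7, 8]]
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    filtered = list(pool.map(apply_filter, tiles))

print(filtered)  # [[2, 4], [6, 8], [10, 12], [14, 16]]
```

Identical code, identical results on one CPU or eight; only the wall-clock time differs, and only if the OS chooses to spread the workers out.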

There certainly might be some discussions to be had about the merits of "single fast CPU vs. two slower ones", and there's also a whole rabbit hole you can fall down when discussing the evolution of multiprocessing support in various operating systems (in terms of interrupt handling, fine-grained locking, etc), but, no, Windows NT-based multiprocessing was never as simple as "two CPUs, your 'main' program gets one and the OS gets the other!".

Edit: And per @GiGaBiTe's complaining about NUMA, that's a valid example of both why two sockets might be *worse* than one, and why the evolution of operating systems *does* matter. In the late 2000-aughts my work workstation was "upgraded" from a Mac Pro, in which all the cores are symmetric, to a dual-socket AMD system that had twice as many (supposedly faster) cores, but, yes, NUMA support kind of sucked at the time and that machine would randomly bog down compared to the older Mac. Shortly before I ditched that machine for the next one, iterative Linux upgrades had significantly improved it. I am far less familiar with the evolution of the NT kernel when it comes to multiprocessing support, but I do remember, for instance, Microsoft crowing about there being significant improvements in the process scheduler in Windows 7...
 
Windows has morphed in the current multi-core world to sling processes to the next free core, so even when your machine is pretty much idle, more cores are doing something all the time.

The way CPUs work today, not all cores run at the same speed, and it is kind of random on the die which ones are better than the rest. You also have the issue of Intel using real cores and mini ones (P and E cores), so MS's scheduler needs to know which thread needs more horsepower than the others.

Correct me if I am wrong, but in the multi-CPU Athlon days I think each CPU got its own RAM (so that if you only had one CPU installed in a dual-CPU motherboard, only half the RAM sockets were usable, because the RAM controller was in the CPU). Makes me wonder if MS knew which bank of RAM went to which CPU.

The only reason you would need a 600W PS is because anything made since the P4 era has all its power on the 12VDC rail and only a little on the 5VDC rail that older systems still need. If you look at powerhouse supplies in the P4 era most of their power is on the 5VDC and 3.3VDC side since video cards back then were not power hungry.
 
It will be fun, mostly just waiting on some small cables to finish out the build. One of the CPU fans was bad; it was a plastic-clip jobby, so I have two new ones with metal clips. Still have the option on this one to add an IDE HD, I have a stack of them laying around. I also have a PCI-to-SATA card that could be used if I can find it and dig it out, and a pile of SATA 7200 RPM drives as well. And even if XP doesn't run all that well, I already have a Core 2 Duo 3.1GHz system I am running it on, so no worry for me there.

The ideal is to have 3 very nice retro systems: real hardware with very good era-appropriate video and as good a sound card as I can reasonably obtain. Completed drive images backed up safely for restoration whenever needed.

The 486 System
I already built a nice Gateway that is going to be hell to sell, but I do have a classic full tower for the one that will be the keeper when the motherboard of choice finally arrives this summer. I actually have 3 cool 486 MBs to choose from for that build, plus all the aforementioned sound cards:

  • 486 MB TBD, likely the ASUS PVI-486AP4, a "VIP" (VLB/ISA/PCI) motherboard
  • 4 ISA, 1 VLB, 3 PCI slots
  • 486 DX4/100 CPU - this may have to be reduced for the Ultra GUS Card - TBD
  • Full Retro but New in Box Full Tower Case
  • New PSU TBD - likely a 300W Seasonic I have laying around
  • Gravis Ultrasound Clone Sound Card + XGS 16 DreamBlaster Addon
  • Adlib Gold Sound Card Stereo Surround Addon Module
  • PCMIDI 8-bit MPU-401 with Roland MT-32 McCake Emulator
  • 10BT Ethernet - ehhhh, maybe. Don't see myself using this one on the network.
  • Gotek Drive
  • 52X CD ROM
  • DOS 6.22, WFW 3.11, Win 95, OS/2 Warp 3, maybe 4.52 if I get it working on the new machine.

In this situation the two Gateway DX2-66s and extra motherboards get sold. May keep one stored as a backup option.

The Pentium III
I could keep this one; however, I do have another PIII board coming as well, with an AGP slot. We'll have to see which I like best. The one I like least gets sold. I have a couple of Dell OptiPlexes I am almost done refurbing and upgrading that will go up for sale. They will help fund some of this, as I got them cheap, cleaned them up, and added some nice upgrades.

The Core 2 Duo - completed minus getting all the OSes running multiboot.
I do have a system built already - I may change MB's someday, maybe not.
  • Gigabyte G31 motherboard with Intel Core 2 Duo E8500 3.16GHz CPU
  • 2GB Hynix DDR2 800 RAM
  • EVGA Geforce 8800GTX 768MB Video Card
  • Samsung 840 EVO SSD in removable front hot-swap bay
  • Lexar 256GB SSD also in front hot swap bay.
  • Sound Blaster X-fi Sound Card
  • DVD +/- RW drive 52X
  • Thermaltake 500W PSU (new)
  • Copper Heat Pipe CPU fan (new)
  • Fractal Mid Tower ATX Case (new)
  • DOS 6.22, XP PRO, VISTA PRO, OS2 ARCA, Maybe Win 7
Beyond that, I wish I had kept my i7 system I recently gave away; it would have been fun for W7, W8.1 & W10. Maybe I will eventually build a killer quad-CPU system some day.

E
 
"Photoshop" was cited earlier as an early example of a "multiprocessing aware" program, but I think when people chuck this out there they might be getting confused about the differences between the Macintosh and Windows versions. Back in the mid-1990's Apple sold a few multi-CPU Power Macs well before OS X came out, and the pre-OS X consumer Macintosh OS wasn't even really a (preemptive) multitasking operating system, let alone a multiprocessor-aware one. To utilize the additional CPUs they came up with a crude hack that allowed the secondary CPU(s) to essentially be used vaguely similarly to how you might use a GPU today, IE, you could write your program to spin off specific tasks that can use that extra CPU as an accelerator. You can "properly" say that for instances like this you have one CPU "dedicated" to the program that's being run and the other "dedicated" to the operating system, but even that isn't really true, because in the Mac case there still needs to be a lot of the "MP aware" program running in the "main" OS space. Windows never worked this way.

Adobe *was* one of the first companies to specifically tout Windows NT's multiprocessing support, but their first 32-bit version of Photoshop (which ran on both NT and 95) "just" took advantage of the threading support Microsoft added to the API and therefore left it up to the OS to schedule its subtasks like a modern OS would. In other words, it was written the same way you'd write it today: thread it up and leave it to the OS to decide where the threads go. The only thing that's going to be different when running this program on a multi-CPU machine instead of a single CPU is that the program might opt to spin up more threads in parallel to accomplish a given task if it thinks there are more CPUs, but ultimately it's up to the OS if it decides to drop them all on the same CPU because it has other things vying for its attention.

I do not recall which dual-CPU system I had at the time, but the speed increase for Adobe was real and very evident when working on a single versus dual CPU system. I was huge into photography back then and my camera had huge 25MB files; it was the reason I started building workstation-level systems versus single-CPU systems. It really helped chug through all those Fuji S3 photos I was processing back at that time.

E
 
If you don't see it in benchmarks it's not real, sorry.

Benchmarks are what's not real. Seriously. Do you not remember the whole SCSI thing with NT vs. 2000, where throughput always scored artificially high, to the point where a ton of people refused to make the switch-over because they thought something was "wrong" with 2K? It eventually turned out the benchmarks were the problem. There are few things that have less in common with real-world performance than the benchmark.

but if that single socket has as many or more cores in it it's probably objectively better.
Part of what makes this difficult to compare is the relative clock speeds. Even a single-core 3GHz P4 will run circles around a 1.4GHz dual PIII because it has so many more cycles. A Core 2 Duo will run circles around it for the same reason - faster architecture, more advanced technology.
 
Benchmarks attempt to measure performance; however, as I said earlier, performance-benchmark lust can rot people's minds into buying more hardware than they need. Also, they really can't account for the numerous real-world scenarios. People use systems in so many different ways that benchmarks can't account for them all. Then there was some software that was written to exploit a system's hardware to its max, where other similar software made the worst of it. Benchmarks are sort of directional, not necessarily factual for everyone's needs. My humble two cents on the topic.

E
 
Benchmarks that show good performance gains for the software a given user plans on running are an indicator that certain hardware should be tried. Any company that cheats on a benchmark should be excluded from potential purchases.
 
Benchmarks are what's not real. Seriously. Do you not remember the whole SCSI thing with NT vs. 2000, where throughput always scored artificially high, to the point where a ton of people refused to make the switch-over because they thought something was "wrong" with 2K? It eventually turned out the benchmarks were the problem. There are few things that have less in common with real-world performance than the benchmark.

Uh, huh. Sure.

Seriously, though, I'm trying to understand what you're even trying to claim here. If you were claiming that a dual socket PIII system "feels smoother" or something than a single-CPU system of roughly the same aggregate horsepower there might be some room in there to grant that there might be some subtle difference in system responsivity or something that you could only experience if you had the two side by side. But that's not what you're claiming; you said this:

The advantage of a dual CPU system for gaming is 1 CPU can run the operating system, freeing the full might of the other CPU up for running the games. In some ways this actually works better than a dual-core system.

IE, you're trying to say that a dual-socket Pentium III system is different from and better than a "dual core" system. I ask again... HOW?

Here's a cruddy diagram of a couple Pentium III's in an SMP configuration:

[attached diagram: two Pentium IIIs on a shared frontside bus]
IE, you have your two CPUs sitting there with their CPU cores, which in this case have a 400MHz private bus to their L2 caches (increase this to 533MHz for Pentium IIIs with 133MHz FSBs), tied together in parallel on a frontside bus that has a maximum data transfer rate of ~1GB/sec and that, very importantly, only one CPU can use at a time. IE, every time a thread running on one of those CPUs requires access to something in RAM that's not already in its cache, it has to lock up that frontside bus getting it; a thread running on the other CPU has to wait until it's done. Even worse, let's say these two CPUs are separately working on tasks that might share memory, and one of them writes a change to that potentially shared memory. A protocol called MESI has to be constantly working away in the background snooping for these transactions and ordering cache flushes/refills, and all of these have to happen over that shared bus.
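If you want to see the shape of the protocol, here's a deliberately crude toy model of those MESI states (just the state transitions; real MESI also involves write-backs, read-for-ownership bus commands, and so on):

```python
# Crude model of the MESI snooping described above: two caches tracking
# one memory line, with states Modified / Exclusive / Shared / Invalid.
# Every state change below corresponds to a snoop or transfer on the
# single shared frontside bus the CPUs take turns using.
class ToyCache:
    def __init__(self):
        self.state = "I"                    # start out Invalid

    def read(self, others):
        if self.state == "I":               # miss: needs a bus transaction
            shared = any(c.state != "I" for c in others)
            for c in others:
                if c.state in ("M", "E"):   # a Modified copy must be written
                    c.state = "S"           # back first; both end up Shared
            self.state = "S" if shared else "E"

    def write(self, others):
        for c in others:                    # snoop: invalidate other copies
            c.state = "I"
        self.state = "M"                    # this cache holds the dirty copy

a, b = ToyCache(), ToyCache()
a.read([b])     # a misses, nobody else has the line -> a is Exclusive
b.read([a])     # b misses, a has it -> bus transaction, both Shared
a.write([b])    # a writes -> a Modified, b Invalidated (more bus traffic)
print(a.state, b.state)  # M I
```

The point of the toy is the last line: the moment one CPU writes a shared line, the other CPU's copy is dead and has to be re-fetched over that one shared bus.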

In short, despite being "two CPUs" your dual Pentium IIIs are very much joined at the hip and have to behave as one unit. Almost as if they were, in fact, a "dual core CPU". If you dig into the Intel Multiprocessor Specification and its accompanying APIC architecture, the entire design philosophy is making the rest of the system fairly agnostic about how many CPUs are sitting there sharing the frontside bus. Up to the limits of the old Intel MP specification it didn't matter if you had 1 CPU or 8; the rest of the system was all the same.

Intel's first "dual core" CPU, the "Pentium D", was literally just this (except with Pentium 4s) in one socket. Diagrams of it look like this:

[attached diagram: Pentium D - two P4 cores sharing the frontside bus in one package]

If we ignore the fact that the Pentium 4 frontside bus was like 8 times faster than the Pentium III's, this is exactly the same setup, and it should behave exactly like a pair of Pentium IIIs. IE, all the cores in this system have to use MESI to keep an eye on the state of each other's caches, and if something gets invalidated it has to be flushed all the way to main memory through that same shared frontside bus and pulled back in at the 800MHz rate the bus supports vs. the 3GHz-or-so you get between the cores and their private caches.

Let's look at Intel's first *real* multi-core chip, "Yonah", IE, the first Core Duo and the closest thing to a Pentium III:

[attached diagram: Yonah/Core Duo - two cores sharing one on-die L2 cache]

With this Intel kept essentially the same architecture of the processor cores, but the L2 cache on the die is shared between the two cores. This comes with a *tiny* bit of overhead on the bus segment between the cores and the cache, but in return it means that cache updates can be handled a lot more efficiently; if Core A accesses a location of memory and gets it cached and Core B subsequently uses it the cache logic steers it directly to the already cached copy. And likewise, if one of the cores updates a chunk of memory that the other CPU might be interested in it can be "lazier" about having to grab the slower FSB and push it out to main memory because the only "dirty" path that needs immediate attention is the more local L1 connection.(*)

(* in case you're wondering, Intel's related multi-socket-capable Xeon CPUs related to the Core Duos continued to run MESI on the frontside bus to ensure coherency across sockets. So they essentially still act like a multi-socket Pentium III system, IE, a bunch of cores that to the rest of the computer basically look like one CPU. The modern i-Core Xeons are NUMA and use a different crossbar system.)
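To put a number on the "lazier" part, here's a crude counting sketch (the numbers are illustrative, not measured): treat every cross-core handoff of a cache line as one slow frontside-bus trip when the caches are split, versus a free steer-to-the-shared-copy when the L2 is shared.

```python
# Sketch of why Yonah's shared L2 helps: count "slow path" trips for a
# cache line that two cores keep passing back and forth. With split
# caches (dual PIII / Pentium D style) every handoff costs a frontside-
# bus transaction; with one shared L2 the second core's access is simply
# steered to the copy already sitting in the cache.
def handoffs(pattern, shared_l2):
    bus_trips, cached_by = 0, None
    for core in pattern:
        if cached_by is None:
            bus_trips += 1              # first touch always misses to RAM
        elif core != cached_by and not shared_l2:
            bus_trips += 1              # other core must re-fetch over FSB
        cached_by = core                # line now "belongs" to this core
    return bus_trips

pattern = ["A", "B", "A", "B", "A", "B"]   # cores ping-pong one line
print(handoffs(pattern, shared_l2=False))  # 6 bus transactions
print(handoffs(pattern, shared_l2=True))   # 1 bus transaction
```

Obviously real traffic patterns are messier than a pure ping-pong, but the direction of the difference is the whole argument: the shared cache can only ever do as well or better.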

In short, like I said, worst case a "dual core" CPU behaves exactly like a dual socket Pentium III, but most of the dual core CPUs actually in circulation are better because they have less bus overhead and faster interconnects. What is your basis for claiming a dual Pentium III "works better than dual-core" system? They are the same thing.
 
Any company that cheats on a benchmark should be excluded from potential purchases.
Such as Saxpy? (Whose apparent raison d'être was to corner the market on the SAXPY portion of LINPACK.)

"There are lies, damned lies and benchmarks..."

I was in that (supercomputer) racket for some time. "Maybe we can make a tweak to the (Fortran) compiler that will be absolutely useless to everyone, but give us an edge on this benchmark" kind of stuff.
 
Benchmarks are useful but people get into wars over minor differences we wouldn't even notice in real world usage.
 
Benchmarks attempt to measure performance, however, as I said earlier, performance benchmark lust can rot peoples minds into buying more hardware than they need. Also, they really cant account for the numerous real world scenarios.
Benchmarks are useful but people get into wars over minor differences we wouldn't even notice in real world usage.
This has been my point all along. Benchmarks give you pretty numbers but not a lot else. You'll have some fun with that Dual PIII.
 
This has been my point all along. Benchmarks give you pretty numbers but not a lot else. You'll have some fun with that Dual PIII.

A very specific claim was made; all I'd like to know is the basis for it. Again, I'm serious: read up on how the Intel MP standard used for multi-socket systems from the 486 through the P6 era works and you'll see that these machines *are* "multi-core" systems that from an electrical/bus standpoint masquerade as a single CPU. Pentium Pro family CPUs (like the PIII) use a bus called AGTL(+), and in a dual-CPU system 90+% of the signal lines going to the two sockets are literally a daisy chain. (Intel recommended putting the system chipset in the middle instead of at one end because of how the internal termination on the CPU cores was set up, but make no mistake, those two CPUs are wired up such that they're taking turns pretending to be the only CPU on the bus.)

I mean, I'm not saying this to say there's no point in playing with a multi-socket Pentium III system, if this vintage of thing counts as "retro fun" for the OP then by all means have fun with it. But this idea that there's some kind of fundamental difference between a dual-socket PIII and a single socket multi-core Intel CPU like a Core Duo is bonkers and just not true.
 
I always wanted a 6-way Pentium Pro but you need the complete setup and some kind of use for it. Dual PPro was good enough same with dual P3 and P2 slot 1 systems.
 