XP will fully support those PIIIs 100%. I used a pair of 1.4 GHz Tualatins for years on XP.
You want AT LEAST 600W for a dual PIII.
If you stay within software of the era, yes, the PIII will be fine. However, if you want to fully patch the system to SP3 and run more modern software, you're going to run into headaches. Late in the XP era, software started requiring SSE2, which the PIII doesn't have.
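If you're not sure whether a given program will run, you can check for SSE2 at runtime instead of guessing. A minimal sketch using GCC/Clang's `__builtin_cpu_supports` (x86 targets only; a Tualatin PIII would report SSE but not SSE2 — the function name here is my own):

```c
/* Ask the CPU we're actually running on whether it has SSE2.
 * __builtin_cpu_supports is a GCC/Clang built-in for x86 targets
 * and returns nonzero if the feature is present. */
int has_sse2(void)
{
    return __builtin_cpu_supports("sse2");
}
```

On a PIII this returns 0, which is exactly why SSE2-only binaries crash with an illegal-instruction fault there.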
Also, you in no way need a 600 W power supply for a dual PIII system. You will, however, need a PSU with a strong +5V rail. If an older 400-450W unit can be sourced, that'd be the best option.
Just to be clear, the two CPUs in a dual Pentium III system are not the same as having two separate computers. They sit on the same front-side bus in parallel, and an arbitration scheme makes them take turns accessing every resource outside their onboard caches. They are not somehow "more independent" than a dual-core CPU; in fact they're at a significant disadvantage here, because a multi-core system with the cores and cache sharing one die can resolve cache coherence traffic internally much faster than over the bus.
Multi-core doesn't mean all of the cores are on the same die. The Pentium D was two Pentium 4 dies on the same package and was basically the same as a dual-socket system: both cores had to go over the FSB to talk to each other. The Core 2 Duo moved the cores to a monolithic die, but still had an FSB. The Core 2 Quad was two dual-core dies on a shared FSB. The C2D's cores could keep coherency on-die through the shared L2 (as could each C2Q die internally), but coherency between the C2Q's two dies, and all arbitration for resource access, still went over the FSB. It wasn't until Nehalem that Intel did away with the FSB, something AMD had done years earlier with the Athlon 64.
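You can actually feel that coherence traffic from software. A minimal sketch (structure names are my own) of "false sharing": two threads bump two counters that share one 64-byte cache line, so every increment forces the line to ping-pong between cores; the padded version gives each counter its own line. On split-die or multi-socket parts that ping-pong crosses the FSB, on a monolithic die it stays internal, which is the speed advantage described above. Timing the two cases with `time` typically shows the unpadded one running far slower:

```c
#include <pthread.h>
#include <stdint.h>

#define ITERS 10000000ULL

/* Both counters land in the same 64-byte cache line -> false sharing. */
struct same_line {
    volatile uint64_t a;
    volatile uint64_t b;
};

/* Padding pushes b onto its own cache line -> no line ping-pong. */
struct own_line {
    volatile uint64_t a;
    char pad[64 - sizeof(uint64_t)];
    volatile uint64_t b;
};

static void *bump(void *p)
{
    volatile uint64_t *c = p;
    for (uint64_t i = 0; i < ITERS; i++)
        (*c)++;                      /* each ++ may invalidate the other
                                        core's copy of the cache line */
    return NULL;
}

/* Hammer the two counters from two threads; returns 1 if both
 * reached ITERS, i.e. the run completed correctly. */
int run_pair(volatile uint64_t *a, volatile uint64_t *b)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, (void *)a);
    pthread_create(&t2, NULL, bump, (void *)b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return *a == ITERS && *b == ITERS;
}
```

Note the results are identical either way; only the coherence traffic (and therefore the wall-clock time) differs.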
(I think multicore AMD “Hammer” based systems were the first “consumer-grade” x86 hardware with a NUMA memory architecture?)
The AMD Opteron in 2003 was the first AMD CPU that was NUMA-capable. There really was no other way to implement a multi-socket motherboard, since the memory controller had been moved into the CPU itself.
NUMA was pretty terrible up until the Windows 7 era, because nothing supported it properly, not even Linux. Kernel NUMA support landed during the 2.5 development series (shipping with 2.6 in 2003), and it wasn't really made usable until automatic NUMA balancing arrived in 3.8 around 2013. I've had a few NUMA servers, and performance issues were always a headache. Sometimes the task scheduler forgets which node a process is running on, splits a multi-threaded process across two nodes, and falls flat on its face. It's a similar but worse problem than the "faildozer," where two integer cores share one FPU and resource contention causes terrible performance.
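One blunt workaround for a scheduler that keeps bouncing a process between nodes is to pin it yourself. A minimal Linux-only sketch using `sched_setaffinity` (function name is my own; in practice you'd likely reach for `numactl --cpunodebind`/`--membind` or libnuma so memory gets pinned too):

```c
#define _GNU_SOURCE          /* needed for cpu_set_t and the CPU_* macros */
#include <sched.h>

/* Restrict the calling process (pid 0 = self) to a single CPU so the
 * scheduler can't migrate it off that CPU's NUMA node.
 * Returns 0 on success, -1 on failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

Pinning only the CPU is half the job: memory already allocated on the wrong node stays remote, which is why the libnuma/`numactl` route is usually the better fix.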
Another major headache is the PCIe bus. Since PCIe lanes come directly off the CPU, server motherboards can have quite literally any mapping of PCIe lanes: one CPU connected to all of the slots, the slots split between the CPUs, or some slots on a CPU and others hanging off the PCH. This can create some pretty severe performance penalties. If, say, you have a video card on CPU0 and an application running on CPU1 that wants to use it, the traffic has to go to the local CPU, then over the inter-socket link (HyperTransport/QPI) to CPU0, and then out to the PCIe slot. Since that link is often A LOT slower than a PCIe slot, it cripples performance and backs up traffic for the rest of the system.
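On Linux you can check which node a device hangs off before you hit this: every PCI device exposes a `numa_node` file in sysfs (e.g. `/sys/bus/pci/devices/0000:01:00.0/numa_node` — path illustrative; it holds a single integer, with -1 meaning no node affinity). A small sketch (function name is my own) that reads such a file:

```c
#include <stdio.h>

/* Read a sysfs-style numa_node file: a single integer plus newline.
 * Returns the node number, -1 if the kernel reports no affinity,
 * or -2 if the file can't be read. */
int read_numa_node(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -2;
    int node;
    int ok = fscanf(f, "%d", &node);
    fclose(f);
    return ok == 1 ? node : -2;
}
```

Once you know the node, you can pin the application to the same node as its video card and keep the traffic off the inter-socket link entirely.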
That's one thing I miss about FSB systems: they at least guaranteed equal access to all hardware from every CPU, even if the bus could get quite congested.