
Performance Comparison (Qbus Systems)

Steve Toner (Experienced Member; joined Apr 6, 2020; 378 messages; California Central Coast)
The question of how an 11/53 compares with an 11/73 came up on another thread, so I decided to run a simple compute-bound program (FORTRAN IV for anyone not familiar with the language used) on a variety of machines:
Code:
C     COMPUTE PI USING NILAKANTHA SERIES
      DOUBLE PRECISION NUMER, DENOM, PI, OLDPI

      TYPE 5
    5 FORMAT(' PLEASE WAIT WHILE I COMPUTE THE VALUE OF PI...')

      PI    = 3D0
      NUMER = 4D0
      DENOM = 2D0

   10 OLDPI = PI
      PI = PI + NUMER/(DENOM*(DENOM+1)*(DENOM+2))
      IF (PI .EQ. OLDPI) GOTO 20
      NUMER = -NUMER
      DENOM = DENOM + 2D0
      GOTO 10

   20 TYPE 25,DENOM/2D0
   25 FORMAT(' AFTER ',F8.0,' ITERATIONS, I HAVE DETERMINED')
      TYPE 30,PI
   30 FORMAT(' PI IS APPROXIMATELY ',F17.15)
      END
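For anyone without an RT-11 machine handy, here's a rough Python port of the same loop. It is a sketch for following the algorithm, not the program that was timed. One assumption worth flagging: Python floats are IEEE 754 doubles with a 53-bit significand, while PDP-11 DOUBLE PRECISION (D_floating) carries 56 bits, so the port stops after fewer iterations than the RT-11 runs, and its speed says nothing about the PDP-11s.

```python
import math


def nilakantha_pi() -> tuple[int, float]:
    """Port of the FORTRAN IV loop using IEEE 754 doubles (not PDP-11 D_floating)."""
    pi = 3.0
    numer = 4.0
    denom = 2.0
    while True:
        old_pi = pi
        # Next term of the series: +/-4 / (D * (D+1) * (D+2))
        pi = pi + numer / (denom * (denom + 1) * (denom + 2))
        if pi == old_pi:  # term too small to change PI any more
            break
        numer = -numer    # terms alternate in sign
        denom += 2.0
    # Same quantity the FORTRAN program prints as the iteration count
    return int(denom / 2.0), pi


if __name__ == "__main__":
    iterations, pi = nilakantha_pi()
    print(f"AFTER {iterations} ITERATIONS, I HAVE DETERMINED")
    print(f"PI IS APPROXIMATELY {pi:.15f}")
    print(f"(math.pi is          {math.pi:.15f})")
```

The function name `nilakantha_pi` is just a label for this sketch; the control flow mirrors the GOTO loop line for line.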

This is obviously floating point heavy, so maybe not a good measure of general system performance, but it's what I had available. I know @Hunta has posted some comparisons of RT-11 generation times, which also provide a useful data point.

I ran this program on six different configurations of KDF and KDJ CPUs:
  1. M8186 w/KEF11 floating point chip, National NS23C memory (11/23)
  2. M8189 w/M8188 floating point processor, M8067 memory (11/23+)
  3. M7554 - 15MHz, no cache, no FPJ11, on-board RAM (11/53)
  4. M8192 - 15MHz w/no FPJ11, M7551 memory (11/73)
  5. M8192 - 15MHz w/FPJ11 floating point chip, M7551 memory (11/73 FP)
  6. M8190 - 15MHz w/no FPJ11, Clearpoint PMI memory (11/73+)
Results from slowest to fastest:

11/23: 8 min 27 sec
11/23+: 4 min 20 sec
11/73: 2 min 44 sec
11/53: 2 min 43 sec
11/73+: 2 min 20 sec
11/73 FP: 1 min 56 sec

These results show that the test is not purely a test of floating point performance: the PMI memory of the 11/73+ cuts the runtime by about 15% compared with the plain vanilla 11/73 (although the 11/73+ also has a dual-tag cache vs. the 11/73's single-tag cache, and I'm not sure what effect that has on these results).
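To put numbers on that, here's a small Python sketch that converts the results into seconds and compares pairs of systems; the dictionary keys and helper names are purely illustrative. The "about 15%" is the reduction in runtime (24 s out of 164 s); expressed as a speed ratio, the 11/73+ runs roughly 1.17x as fast as the 11/73.

```python
# Runtimes from the results above, converted to seconds
RUNTIMES = {
    "11/23": 8 * 60 + 27,
    "11/23+": 4 * 60 + 20,
    "11/73": 2 * 60 + 44,
    "11/53": 2 * 60 + 43,
    "11/73+": 2 * 60 + 20,
    "11/73 FP": 1 * 60 + 56,
}


def runtime_reduction(baseline: str, other: str) -> float:
    """Percent of the baseline's runtime saved by the other system."""
    base, new = RUNTIMES[baseline], RUNTIMES[other]
    return 100.0 * (base - new) / base


def speed_ratio(baseline: str, other: str) -> float:
    """How many times faster the other system runs than the baseline."""
    return RUNTIMES[baseline] / RUNTIMES[other]


if __name__ == "__main__":
    for base, other in [("11/73", "11/73+"), ("11/73", "11/53"), ("11/73", "11/73 FP")]:
        print(f"{other} vs {base}: {runtime_reduction(base, other):.1f}% less time, "
              f"{speed_ratio(base, other):.2f}x speed")
```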

Note that the 11/53, despite its lack of cache, provides basically the same performance as the basic 11/73. I'm going to speculate here that this is a result of faster access to the on-board RAM on the 11/53 vs. over the Qbus on the 11/73, but I'm certainly open to other interpretations...
 
...and to demonstrate that RAM speed does indeed make a difference in this test, I replaced the M8067 boards in the 11/23+ with an M7551. The frainresearch RAM Modules page(*) reports M8067 access time as 260 ns and M7551 access time as 358 ns, so we'd expect it to be slower with the M7551. And it is. Add to the above table:

11/23+ w/M7551: 4 min 29 sec

(*) I'm too lazy to look up the numbers in the actual DEC documentation, so just have to trust this source...
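A quick sketch of that comparison in numbers, taking the frainresearch access times at face value. The last step uses a deliberately crude linear model (runtime = compute time + memory accesses x access time), which is an assumption for illustration, not something measured here; under that model, roughly 9% of the 11/23+ runtime would be spent on memory access.

```python
# 11/23+ runtimes with each memory type, in seconds
run_m8067 = 4 * 60 + 20
run_m7551 = 4 * 60 + 29

# Access times as quoted from the frainresearch RAM Modules page, in ns
access_m8067 = 260
access_m7551 = 358

runtime_change = (run_m7551 - run_m8067) / run_m8067           # relative slowdown
access_change = (access_m7551 - access_m8067) / access_m8067   # relative access-time increase

# Crude linear model (assumption): runtime = compute + N * access_time,
# so runtime_change = memory_fraction * access_change.
memory_fraction = runtime_change / access_change

print(f"runtime +{100 * runtime_change:.1f}%, access time +{100 * access_change:.1f}%")
print(f"implied share of runtime spent on memory access: {100 * memory_fraction:.0f}%")
```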
 
Microcode J-11:
Code:
34567.89022+32109.754321 ->      97 825 op/sec
34567.89022*32109.754321 ->      29 653 op/sec
34567.89022/32109.754321 ->      30 044 op/sec
FPA
Code:
34567.89022+32109.754321 ->     395 440 op/sec
34567.89022*32109.754321 ->     395 452 op/sec
34567.89022/32109.754321 ->     322 778 op/sec
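Dividing the two sets of figures gives the FPA's advantage per operation. A minimal Python sketch, with the figures copied as posted (the post doesn't say which benchmark produced them):

```python
# Operations per second as posted
MICROCODE = {"add": 97_825, "multiply": 29_653, "divide": 30_044}
FPA = {"add": 395_440, "multiply": 395_452, "divide": 322_778}


def fpa_gain(op: str) -> float:
    """How many times more operations per second the FPA delivers."""
    return FPA[op] / MICROCODE[op]


for op in MICROCODE:
    print(f"{op:>8}: {fpa_gain(op):5.1f}x")
```

Add gains about 4x and multiply/divide more than 10x, yet the whole PI program on the 11/73 gains only about 1.4x from the FPJ11 (2 min 44 sec vs. 1 min 56 sec), presumably because the loop also spends time on loads, stores, compares and branches that the FPA doesn't speed up.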
 
Very interesting. I had asked about this, wondering whether the lack of cache made the 11/53 more like a Pro/380. Granted, the 380 runs at 10 MHz instead of 15, but it looks like local memory can keep up with a traditional cache plus Qbus memory.

Pity; it would mean that a 20 MHz Pro/380 with local memory boards would probably have been as fast as an 11/93. Ah well, is there really no way, even with extra cooling, to run a DCJ11 at 20 MHz?

I can also see why they abandoned PMI: Why bother when you can just stuff the memory on the motherboard and run every cycle at full speed.

Quick clarity point: Was test #6 done with a FPJ11 or without? And was test 5 done with PMI memory or without?
 
Test 5 uses M7551, which is not PMI. In any case, the M8192 does not support PMI memory.
Test 6, as stated, is without FPJ11. I have not tried transferring the chip (I only have one) over to the M8190 board to see how well it will perform in that system...
 
It doesn't support PMI memory, but it works. Just like the LSI-11/2, KDF11-A, or KDF11-B... :)
Well yes, you can use PMI memory as Qbus memory (in a Q/CD backplane), but I don't think that was the question.
I wouldn't try an LSI-11/2 in a 22-bit backplane with any 22-bit cards (which would certainly include any PMI memory board), as it uses those four high-order address extension lines for other things:

[Attached image: Qbus.jpg]

(for anyone not familiar, the KD11-HA is the 11/02 processor)
 
Interesting. One of the last PDPs I ordered and configured for a customer, I had 2 options: 11/73 with a smaller disk (probably an RD53) and a little more memory or an 11/53 with a newer disk (RD54 probably) and less memory than the 11/73. I remember weighing the two and selected the 11/53 with the higher-performing (at least, that's what our internal specs listed) disk. Wonder if I made the right choice.
 
Hi Steve - Very interesting information - so a couple of things...
In "real world" usage, the 11/53 was generally slower than the 11/73 due to the lack of cache, even when only using the onboard 0.5 or 1.5MB RAM (depending on the version), and it got slower still if extra Q-Bus RAM was added. Of course, different workloads highlight different areas of system performance, but I am surprised that the 11/53 and 11/73 come out at about the same speed.

If you have an executable (.SAV) file for RT-11 - I can run it on my 11/83 with FPU and PMI - or other things for comparison (I can also try on another 11/53) - I say the executable to avoid compiler differences and so on for a fair comparison...

I do think some standalone benchmarks like this for different aspects of system performance would be useful and interesting, e.g. fixed and floating point performance, as well as disk throughput etc.

If it can't time itself, you could put it in a batch file like:

tim 0:00
test.sav
tim

to reset the clock and let the system do the timing for you, of course

Robin
 
In "real world" usage, the 11/53 was generally slower than the 11/73 due to the lack of cache, even when only using the onboard 0.5 or 1.5MB RAM (depending on the version), and it got slower still if extra Q-Bus RAM was added.
Going off on this tangent: Did any OS have some support for using the slower ram for less often accessed things, and the faster ram for the more often accessed things?
 
On disks, yes; in RAM, I don't recall anything. But back in the distant 1980s, a rumor circulated in a small Soviet team that the French had developed an operating system that handled RAM using the write-once-read-many principle. I'm not sure if it was just a rumor or if such an operating system actually existed, but I've encountered some pretty exotic operating systems, so the scenario from the question above could have existed as well...
 
Going off on this tangent: Did any OS have some support for using the slower ram for less often accessed things, and the faster ram for the more often accessed things?
Sure. In RSX-11 you can set up a memory partition and load your stuff into that. Putting the OS, cache, and whatnot in the "fast" memory areas is optimal; leave GEN for the slug users.
 
If you have an executable (.SAV) file for RT-11 - I can run it on my 11/83 with FPU and PMI - or other things for comparison (I can also try on another 11/53) - I say the executable to avoid compiler differences and so on for a fair comparison...
Yes, I'm running RT-11 (V5.07) in all cases. I pulled the files off one of the machines with kermit & attached a zip file for anyone who wants to try it.

I already had a batch file to execute it, though I don't set the time to 0:00 at the start, so it does require you to subtract time values as supplied.

I'll be curious to see if there is any measurable difference between 50Hz and 60Hz LTC - it is interrupting to keep track of the time, and that may also be enough to impact the cache efficiency, which might help explain the 11/73 vs. 11/53 results.

For anyone needing instructions:
  1. Copy the files to the DK: drive on your RT-11 machine. Make sure to SET FILE TYPE BINARY (for kermit, or equivalent for whatever you're using).
  2. Enter @PI at the system console to run the batch file
  3. Subtract start time from end time
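If you'd rather not do the subtraction by hand, here's a small Python helper (hypothetical, not part of the attached files) that takes two hh:mm:ss times, assuming that is the format the RT-11 TIME command prints, and returns the elapsed seconds, allowing for one pass through midnight.

```python
def clock_to_seconds(t: str) -> int:
    """Convert an hh:mm:ss string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in t.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def elapsed_seconds(start: str, end: str) -> int:
    """Elapsed time between two readings, allowing for one pass through midnight."""
    diff = clock_to_seconds(end) - clock_to_seconds(start)
    if diff < 0:
        diff += 24 * 3600
    return diff


if __name__ == "__main__":
    secs = elapsed_seconds("10:15:30", "10:17:46")  # example readings, not real results
    print(f"{secs // 60} min {secs % 60} sec")
```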
 


Something I said in my last post made me wonder if system overhead is having any effect on the results, so I tried a couple more experiments:

  • I've been running XM on every machine but the 11/53, which is running ZM. I tried running XM on the 11/53 and there was no change in the execution time.
  • I tried disabling the LTC and timing the process by hand. No measurable difference on either the 11/53 or the 11/73.
 
So - Some results from my PDP-11/83, with 1 x 1MB MSV11-J and 1 x 2MB MSV11-J, FPU etc...

(Time is reset before test runs in each case)

Under TSX there are some other (idle) tasks running and the OS overhead etc.

Regards, Robin


Under RT-11SB:

.SET TT QUIET
PLEASE WAIT WHILE I COMPUTE THE VALUE OF PI...
AFTER 262144. ITERATIONS, I HAVE DETERMINED
PI IS APPROXIMATELY 3.141592653589796
STOP --
00:01:16


Under TSX:

.@pi
.SET TT QUIET
PLEASE WAIT WHILE I COMPUTE THE VALUE OF PI...
AFTER 262144. ITERATIONS, I HAVE DETERMINED
PI IS APPROXIMATELY 3.141592653589796
STOP --
00:01:32
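The iteration count is a nice sanity check on the floating point format. The loop exits once the term 4/(D(D+1)(D+2)) is smaller than half an ulp of PI, and since PI stays between 2 and 4, that half-ulp is 2^(1-p) for a p-bit significand. Assuming round-to-nearest, this Python sketch finds the first even DENOM past that point. With p = 56 (PDP-11 D_floating) it gives 262144 = 2^18, the count reported above; with p = 53 (IEEE double) it gives 131072.

```python
def predicted_iterations(significand_bits: int) -> int:
    """Iteration count the PI program should print for a given significand width.

    Assumes round-to-nearest and that PI stays between 2 and 4, where one ulp
    is 2**(2 - p) and half an ulp is 2**(1 - p) for a p-bit significand.
    The loop stops at the first DENOM whose term no longer changes PI:
        4 / (D*(D+1)*(D+2)) < 2**(1 - p)  <=>  D*(D+1)*(D+2) > 2**(p + 1)
    """
    limit = 2 ** (significand_bits + 1)
    denom = 2
    while denom * (denom + 1) * (denom + 2) <= limit:
        denom += 2  # DENOM steps by 2, as in the FORTRAN loop
    return denom // 2  # the program prints DENOM/2


if __name__ == "__main__":
    for name, bits in [("PDP-11 D_floating", 56), ("IEEE 754 double", 53)]:
        print(f"{name} ({bits}-bit significand): {predicted_iterations(bits)} iterations")
```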
 
Was going to try this on my POS 3.2 Pro/380, but I can't find my Pro/Fortran disks. Is there a copy online somewhere? (I also have Pro/Basic and Pro/Cobol, but they are packed away and on my 2.0 disk.)
 
Two more data points:

I got curious about the performance of the Clearpoint PMI memory vs. the DEC M8637, so swapped an M8637 into the 11/73+ system.
Runtime: 2 min 19 sec

And I pulled the floating point chip off the M8192-YC and plugged it into the M8190 (running with the Clearpoint memory). That makes this system almost an 11/83; only the processor speed is different (15.2 MHz vs. 18 MHz):
Runtime: 1 min 30 sec (which matches exactly the clock speed difference between this machine and Radix's 11/83 :) )
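The arithmetic behind that, as a sketch assuming runtime scales inversely with CPU clock and that the two memory systems are close enough not to matter (it uses Radix's RT-11SB figure, not the TSX one):

```python
# Radix's 11/83 under RT-11SB: 1 min 16 sec at 18 MHz
runtime_1183_s = 76
clock_1183_mhz = 18.0
clock_m8190_mhz = 15.2

# Assumption: runtime scales inversely with CPU clock, everything else equal
predicted_s = runtime_1183_s * clock_1183_mhz / clock_m8190_mhz
print(f"predicted M8190 + FPJ11 runtime: {predicted_s:.1f} s")
```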
 
Interesting. I'd expect the CP memory to be faster, as it's basically a 64-bit memory board that fetches into its own cache, then double bucks into the CPU over 4 cycles. I.e., cycle 1 is at memory speed (the fetch), then cycles 2, 3 and 4 are at zero-wait-state speed. The DEC memory does cycle 1 at memory speed, then cycle 2 at zero wait state.
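A rough way to see why the longer burst should help: average the cost of one full-speed memory cycle plus the zero-wait-state cycles over the words delivered. This is a sketch with made-up timings; the 400 ns and 200 ns below are hypothetical, not measured or datasheet values for either board.

```python
def avg_ns_per_word(first_cycle_ns: float, burst_cycle_ns: float, words: int) -> float:
    """Average cost per word when the first cycle runs at memory speed and the
    remaining words of the burst arrive at zero-wait-state speed."""
    return (first_cycle_ns + (words - 1) * burst_cycle_ns) / words


# Hypothetical timings for illustration only, not specs for either board
FIRST_NS = 400.0
BURST_NS = 200.0

clearpoint = avg_ns_per_word(FIRST_NS, BURST_NS, words=4)  # 4-word burst
dec_m8637 = avg_ns_per_word(FIRST_NS, BURST_NS, words=2)   # 2-word burst
print(f"Clearpoint-style: {clearpoint:.0f} ns/word, DEC-style: {dec_m8637:.0f} ns/word")
```

For any timings where the zero-wait-state cycle is faster than a full memory cycle, the 4-word burst comes out ahead per word, but only if the CPU actually uses the extra words. A tight loop that already lives in the J-11's cache, like the PI program, may barely notice, which would fit the 2 min 20 sec vs. 2 min 19 sec results.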

I have noticed boots are faster with the CP memory than the DEC memories.
 