
PDP-9 at the RICM

After lots of studying of the Maintenance Manual and schematics we learned that the CPU pauses when executing IOT instructions and lets the I/O controller handle the I/O functions. When the IOT instruction is complete, the I/O controller tells the CPU that it can resume executing instructions.

In our case when running at full speed, and after running for a few minutes, the IOF instruction would not turn interrupts off. We could see three IOT timing pulses for the ION instruction, then two of the three pulses for the IOF instruction, and then three pulses for the IORS instruction. There is a Gray Code counter in the I/O controller that counts the IOP states and after IOP 4 generates an IO RESTART signal to restart the CPU.
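
For readers who have not met a Gray Code counter: only one flip-flop changes state per step, so a single weak flip-flop can spoil exactly one transition. The sketch below is a generic reflected Gray code in Python, not the actual PDP-9 I/O controller logic. How the counter states map onto the IOP pulses is not shown here, and the function names are made up for illustration.

```python
def gray(n: int) -> int:
    """Return the reflected binary Gray code of n."""
    return n ^ (n >> 1)


def gray_sequence(bits: int) -> list:
    """All states of a Gray code counter built from `bits` flip-flops."""
    return [gray(i) for i in range(1 << bits)]


def changed_bits(a: int, b: int) -> int:
    """Count how many flip-flops differ between two counter states."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    seq = gray_sequence(2)
    # Walk the full cycle, including the wrap from the last state to the first.
    for prev, cur in zip(seq, seq[1:] + seq[:1]):
        print(f"{prev:02b} -> {cur:02b}  ({changed_bits(prev, cur)} flip-flop changes)")
```

With two flip-flops the cycle is 00, 01, 11, 10, and every step flips exactly one of them.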

In the 'scope image below you can see the IO 0(1) flip-flop part of the Gray Code counter in the purple trace looks OK, but the IO 1(1) flip-flop has a weak transistor. The weak transistor looked OK when measuring the diode-drops and Beta with a DVM. In our Tektronix transistor curve tracer the weak transistor barely turned on. We replaced both 2N3639 transistors in that half of the S203 flipchip and now the system runs OK.
DS1Z_QuickPrint6.png

The Gray Code counter signals look much better with the new transistors installed.

DS1Z_QuickPrint8.png
 
Hey Mike:

Were both transistors weak or was it just the one that had no gain?

Does the PDP-9 have the margin test setup like the Straight 8? This seems like a perfect time to run diags at margins to find weak transistors.

The last time I ran margin tests on my Straight 8 was 2008. And I found nothing with them at the time. It was actually a little disappointing to find nothing. The maint manual seems to think that the margin tests should be run monthly. Our DEC guy would only show up when called as he had to travel 400 miles. And I don't remember him messing with the margin tests.
 
Doug,

I need to investigate the broken transistors a little further. One of the transistors on the S202 appeared to behave normally, but the beta was lower than what you see with a modern transistor. The other misbehaving transistor had no beta on the transistor curve tracer. That transistor had normal diode drops of about 1.0V B->E and B->C, and infinite in the opposite direction. I am not sure what is broken inside of the transistor, and it needs further investigation.

Yes, the PDP-9 has margin power supplies for the +10V and the -15V. There are switches in the fan trays that disconnect a section of the chassis from the normal power supply and connect it to the margin supplies. If I remember correctly one margin test will find leaky transistors, and the other test will find low beta transistors.

I am really afraid that running margin tests will find hundreds of transistors that should be replaced.
 
I remember looking at the characteristics of one of the original transistors. I don't remember which one exactly but it was on the R211 board which is a bit of the AC. When new the Beta was about 25 and the one I was having problems with was down to about 15. This was sometime around 1985.

I know what you mean about being afraid of finding hundreds of transistors. But if you run it and find nothing like I have every time it is a little depressing. I've wanted it to find something and never have.

For those that don't know what we are talking about, there is a variac that generates a voltage that is fed into a diode bridge and a big cap. There is an analog meter across the output voltage. It only has enough power to drive two rows of cards at a time. This was probably so you could margin test the double wide cards. There is a switch on each row of cards that will disconnect the regular supply and use the marginal supply for that row. One switch for the -15 and another for the +10. You need to make certain the voltage is below certain limits before you switch it in. The machine should be halted when you flip the switch. You run diagnostics while changing the voltage. If I remember correctly the -15 can be tested as high as -18 and as low as -12 and the +10 has a more limited range.

The idea was to find failing components before they failed.
 
The Maintenance Manual suggests periodically running MAINDEC diagnostics and recording the margin voltage levels that cause problems. If the problematic voltage level gets lower over time, then components are aging and will fail.
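
A minimal sketch of that record keeping, in Python, assuming you log the voltage at which a diagnostic first fails each time you run the margins. The class name, the 0.25 V threshold, and the sample readings are all made up for illustration; none of them come from the Maintenance Manual.

```python
from dataclasses import dataclass

NOMINAL_VOLTS = 10.0  # the +10V supply; use -15.0 for the other margin supply


@dataclass
class MarginReading:
    label: str         # free-form note identifying the test run
    fail_volts: float  # margin voltage at which the diagnostic first failed


def margin(reading: MarginReading, nominal: float = NOMINAL_VOLTS) -> float:
    """Distance in volts between the nominal supply and the failing voltage."""
    return abs(reading.fail_volts - nominal)


def margin_is_shrinking(readings, min_change: float = 0.25) -> bool:
    """True if the newest margin is smaller than the oldest by min_change volts or more."""
    if len(readings) < 2:
        return False
    return margin(readings[0]) - margin(readings[-1]) >= min_change


if __name__ == "__main__":
    # Hypothetical log entries, oldest first.
    log = [
        MarginReading("run 1", 14.0),
        MarginReading("run 2", 13.5),
        MarginReading("run 3", 12.5),
    ]
    for r in log:
        print(f"{r.label}: fails at {r.fail_volts:+.1f} V, margin {margin(r):.1f} V")
    print("margin shrinking:", margin_is_shrinking(log))
```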

Increasing the +10V will cause low gain (Beta) transistors to fail, and decreasing the +10V will cause high leakage transistors to fail. Lowering the +10V has the same operational characteristics as high temperature and can be used to check for thermal runaway. The manual is vague about what the -15V margin is for, and just says that it reduces the output signal levels from the modules.

The system schematics have a diagram that shows what modules are powered from what margin switches.

These images show the front and back of the PDP-9 Margin Voltage Control.
RICM_PDP-9_Margin-Control-Front.jpg
RICM_PDP-9_Margin-Control-Back.jpg

This image shows the switches on the fan trays that enable Margin Checking.
RICM_PDP-9_Margin-Control-Switches.jpg
 
We were able to run ADSS V5A on the PDP-9 today, and run the "HELLO WORLD" program written in Fortran IV.

I guess that we can declare it the only running PDP-9, at least until Mattis or Anders gets theirs working.
 
Now that would be cool. How many flipchips does it take to add the EAE? Can't be too hard to write a dectape driver, can it? This must be an amazing machine to see. I have never seen one. And not likely to ever see one. I applaud you getting it running.
 
Argh. It ran fine early this afternoon and after an hour would not boot ADSS. It passes MAINDEC 9A-D01A INSTRUCTION TEST PART 1, so most of the processor is functional. It halted at E1135 in MAINDEC-9A-D02A Instruction Test Part 2. This part loads 777777 from memory into the AC, deposits zeros into memory at 17777, complements the AC, and skips over the HALT instruction if the AC = 0. The memory location contained 773777 instead of 777777 so it failed. Now I need to determine if the paper tape loader microcode failed or if there is a problem with memory. I will fix this on Saturday if we don't have too many visitors.
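
To see why the test halts, here is a small Python sketch of the arithmetic on an 18-bit word. It is not the MAINDEC code. The bit numbering (bit 0 as the most significant bit, bit 17 as the least) is an assumption about the usual DEC convention, and the helper names are made up.

```python
WORD_MASK = 0o777777  # 18-bit word


def complement(word: int) -> int:
    """One's complement of an 18-bit word."""
    return ~word & WORD_MASK


def differing_bits(expected: int, actual: int) -> list:
    """Bit numbers that differ, assuming bit 0 is the MSB and bit 17 the LSB."""
    diff = (expected ^ actual) & WORD_MASK
    return [17 - i for i in range(17, -1, -1) if (diff >> i) & 1]


if __name__ == "__main__":
    good, bad = 0o777777, 0o773777
    print(f"complement of good word: {complement(good):06o}")  # zero, so the skip is taken
    print(f"complement of bad word:  {complement(bad):06o}")   # nonzero, so the HALT executes
    print("differing bit numbers:", differing_bits(good, bad))
```

Under that numbering assumption the two words differ in a single bit, bit 6.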
 
We reloaded MAINDEC-9A-D02A Instruction Test Part 2. Where the constants for the program start in core, 006307, the first 7 constants are missing, and the rest are at 7 locations lower in core than they belong. This really confuses the program.

We tried MAINDEC-9A-D0DB JMP SELF TEST, and it works just fine. This tests the CLK and the PIE, so much of the system is working OK.
 
We ran MAINDEC-9A-D0FA JMS-Y Interrupt Test, MAINDEC-9A-D0BA ISZ Test, MAINDEC-9A-D0CA Memory Address Test, and MAINDEC-9A-D1AA PDP-9 Basic Memory Checkerboard Test. The Checkerboard test consistently reported a bit-7 error at address 17515. We disabled bit-7 error checking, ran the test for about 30 minutes to make sure everything else was OK, and enabled bit-7 error checking. The Checkerboard ran fine and reported no errors. I am sure that this memory error will be back.
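
For anyone unfamiliar with this kind of test, below is a simplified Python sketch of a checkerboard pass over a simulated memory with one bit stuck at 0, including the option to ignore a bit the way we did. It is not the MAINDEC algorithm, which exercises core in ways this does not. The 8K memory size, the patterns, the bit numbering (bit 0 as the MSB), and all names are assumptions for illustration.

```python
WORD_MASK = 0o777777  # 18-bit word
MEM_SIZE = 0o20000    # assumed 8K words


class FaultyCore:
    """Simulated memory where selected bits at one address always store 0."""

    def __init__(self, size, bad_addr=None, bad_mask=0):
        self.words = [0] * size
        self.bad_addr = bad_addr
        self.bad_mask = bad_mask

    def write(self, addr, value):
        if addr == self.bad_addr:
            value &= ~self.bad_mask  # the faulty bit never gets set
        self.words[addr] = value & WORD_MASK

    def read(self, addr):
        return self.words[addr]


def expected_word(addr, pattern):
    """Alternate the pattern and its complement on even and odd addresses."""
    return pattern if addr % 2 == 0 else pattern ^ WORD_MASK


def checkerboard(mem, size, ignore_mask=0):
    """Write and verify two complementary passes; return (address, bad bits) pairs."""
    errors = []
    for pattern in (0o525252, 0o252525):
        for addr in range(size):
            mem.write(addr, expected_word(addr, pattern))
        for addr in range(size):
            diff = (mem.read(addr) ^ expected_word(addr, pattern)) & WORD_MASK & ~ignore_mask
            if diff:
                errors.append((addr, diff))
    return errors


if __name__ == "__main__":
    bit7 = 1 << (17 - 7)  # bit 7 when bit 0 is the MSB
    core = FaultyCore(MEM_SIZE, bad_addr=0o17515, bad_mask=bit7)
    for addr, diff in checkerboard(core, MEM_SIZE):
        print(f"error at {addr:05o}, bad bits {diff:06o}")
    print("errors with bit 7 ignored:", checkerboard(core, MEM_SIZE, ignore_mask=bit7))
```

Masking a known-bad bit lets the rest of memory be checked, which is what the 30 minute run did for us.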

We tried to boot ADSS from DECtape and consistently got a timing error on the TC02 DECtape controller. We tried several DECtapes and always got the same error. We didn't try another TU55 DECtape drive, so we will do that next time. If we still get a timing error, we will start debugging the TC02 DECtape controller.
 
A different TU55 DECtape drive didn't make a difference, so the timing error problem is in the TC02 DECtape controller.

We ran the MAINDEC-9A-D3BB TC02 Basic Exerciser, and were surprised to see that the Move Scope Loop would move the DECtape end to end, so it could read the Timing and Mark tracks. Any other test would result in a Timing Error. Then the TC02 diag would not run, so something else was broken.

MAINDEC 9A-D01A INSTRUCTION TEST PART 1 ran OK. MAINDEC-9A-D02A Instruction Test Part 2 didn't load correctly. While we were examining memory and comparing it to the listing we found that Address Register bit-13 was stuck on. We swapped the B213 FlipChip in slot C30 for the one in slot C34 to see if the stuck bit moved from bit-13 to bit-15. That actually fixed the stuck bit problem. Maybe dirty fingers on the FlipChip? I bet that this problem will be back. The diag ran fine after reloading it from paper tape.
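
A quick Python sketch of what a stuck-on address bit does to memory references. The bit numbering (bit 0 as the MSB of an 18-bit word, so bit 13 has weight 020 octal) is an assumption, and the helper names are made up. Whether this is what corrupted the earlier load is not something we have established.

```python
def dec_bit_mask(bit: int, width: int = 18) -> int:
    """Mask for a bit number, assuming bit 0 is the MSB of a `width`-bit word."""
    return 1 << (width - 1 - bit)


def effective_address(addr: int, stuck_on_mask: int) -> int:
    """Address actually selected when the masked bits are stuck at 1."""
    return addr | stuck_on_mask


if __name__ == "__main__":
    stuck = dec_bit_mask(13)
    for addr in (0o6307, 0o6327, 0o17777):
        print(f"{addr:05o} -> {effective_address(addr, stuck):05o}")
```

Any address with that bit clear lands on the location 020 higher, so pairs of locations collide.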

MAINDEC-9A-D1AA PDP-9 Basic Memory Checkerboard Test ran OK, so we tried to boot the ADSS monitor. That failed with a Timing Error in the DECtape controller, so we are back to debugging the TC02.
 
Well, the next debugging session didn't go so well. There is an instability in the processor when it is running the TC02 Basic Exerciser that makes it stick in a loop instead of halting when it detects an error. In the middle of single-stepping through the TC02 Basic Exerciser to find the problem the CONTINUE switch stopped working. More debugging on Saturday.
 
The TC02 manual says that there are three possible sources of the timing (TIM) error:
  1. A DataBreak request is not serviced within 66 us (+/-30%)
  2. The DECtape Flag (DTF) was not cleared by the processor before the TC02 tried to set it again
  3. A Read or Write Data function was entered while the DECtape was positioned in the Data portion of the tape block
We checked the behavior of the DF flip-flop that requests a data-break. It is only on for 2-6 us, so the data-break request is getting processed quickly. It looks like we are back to investigating cause #2. The DTF light is on when the TIM error light goes on. Normally the processor executes a DTXA instruction to turn off the DTF flip-flop. I looked through the code in the D3BB Maindec, and found a DTXA to make sure that the GO bit is on, but nothing to turn off the DTF. I guess that means that the DTF is unexpected, and some hardware is broken. We will look at that Saturday.
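
The arithmetic behind ruling out cause #1, as a small Python sketch. The 66 us and 30% figures are from the TC02 manual as quoted above; the function names and the choice to compare against the earliest possible timeout are mine.

```python
NOMINAL_US = 66.0  # data-break service limit from the TC02 manual
TOLERANCE = 0.30   # +/-30%


def service_window(nominal: float = NOMINAL_US, tol: float = TOLERANCE):
    """Earliest and latest point at which the timeout could fire, in microseconds."""
    return nominal * (1 - tol), nominal * (1 + tol)


def serviced_in_time(latency_us: float) -> bool:
    """Conservative check: the request must clear before the earliest timeout."""
    earliest, _ = service_window()
    return latency_us < earliest


if __name__ == "__main__":
    earliest, latest = service_window()
    print(f"timeout fires between {earliest:.1f} and {latest:.1f} us")
    for latency in (2.0, 6.0):
        print(f"{latency:.0f} us request serviced in time: {serviced_in_time(latency)}")
```

Even at the short end of the tolerance the limit is roughly 46 us, so a request that clears in 2-6 us is nowhere near it.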
 
After several more hours of debugging the TC02, especially the W104 module that handles the data-break interface, we determined that the problem isn't in the TC02. The DCH RQ from the TC02 goes active when a block number is found, then the DCH GRANT signal from the processor goes active, SELECT(76) goes active to indicate that the TC02 has been selected and it can put the Word Count address on the I/O Bus, and then it hangs like that. The processor stays running the MainDEC code, but the data-break never finishes. Since the DCH GRANT signal from the processor stays active the DTEN A and DTEN B signals stay active, and the ROTATE DTB 12-17/RWB signal finally goes active and turns on the TIM error flip-flop. The TIM error stops everything.

If we halt the processor, turn on SING STEP, and try to single step using the CONTINUE switch, nothing happens. We suspect that the processor is in a weird state because the data-break never finished.
 
Which components usually fail? How do they fail?

I'm trying to understand what a tester must be able to find.
 
Which components usually fail? How do they fail?
I have only seen failed transistors and diodes. Well, sometimes bad solder joints that make connections open.

The most common failure modes for diodes and transistors are shorts and opens. The FlipChip just stops working as expected. A tester that had pulldown resistors on the input signals and pulled the inputs to ground should be enough. To sense the output signals the negative logic to TTL converter would need to have thresholds similar to what is defined for DEC's negative logic.

Transistors can have high leakage or low gain so the signals don't make it to ground or to -3V. Testing for gain and leakage would be difficult, and might not be worth the effort. I think that the work that you started would be enough to get a prototype working and would be very helpful.
 
Which components usually fail? How do they fail?
Agree with above. For diodes, one of the big failure modes I have seen is leakage: on my DVM the forward voltage drop is lower than normal (< 0.5V) and the reverse direction doesn't show as over range. When the diode is the input to the logic it loads down the entire net and is a pain to find. The proper -3V low will look like -1V due to the leakage. Checking input current would be a good test to find this fault.
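
A rough Python sketch of how a tester might flag that kind of level, assuming it can measure the output voltage. The two thresholds are placeholders I picked for illustration, not DEC's published limits for negative logic; a real tester should take them from the module specifications.

```python
# Placeholder thresholds, NOT taken from DEC documentation.
GROUND_MIN_V = -0.3  # at or above this counts as the ground level
MINUS3_MAX_V = -2.5  # at or below this counts as the -3V level


def classify(volts: float) -> str:
    """Sort a measured level into ground, -3V, or marginal."""
    if volts >= GROUND_MIN_V:
        return "ground"
    if volts <= MINUS3_MAX_V:
        return "-3V"
    return "marginal"


if __name__ == "__main__":
    for v in (0.0, -3.0, -1.0):
        print(f"{v:+.1f} V -> {classify(v)}")
```

A net pulled up to -1V by a leaky input diode falls between the two thresholds and gets reported as marginal instead of being silently read as one level or the other.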
 