SIMH/os8diskserver/PDP-8

m_thompson · Mar 31, 2016

Mike_Z said:
Mike, sorry to hear that during your spruce up your machine failed. I bet something with the front panel either broke or didn't get reconnected. Probably a user error, that's never happened to me. Mike

The -15V supply is only reading -4.22V. -15V biases the switches on the front panel, so the switches should not work. I don't know why the LA is working. Time to take that beast of a power supply apart again.

daver2 · Mar 31, 2016

Mike,

It is either a CPU instruction, Extended arithmetic instruction or memory that is causing the problem... You just need to identify which!

The 'correct' sequence for the MAINDEC diagnostics is something like:

CPU #1
CPU #2
Remaining CPU tests (e.g. random AND etc.).
Memory management logic.
Memory
Extended instruction #1
Extended instruction #2.

I think you can run Extended instruction #1 at any time (i.e. after the CPU tests) but Extended instruction #2 must be run after you have tested the memory.

Within each MAINDEC document it identifies the tests that should have been run prior to the test that it documents.
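For anyone who wants to sanity-check a planned run order against the sequence above, here is a minimal sketch. The test names are just the labels from the list, and the prerequisite table is my reading of the post (in particular, making Extended instruction #2 depend on #1 is an assumption); the MAINDEC documents themselves are the authority.

```python
# Prerequisites as read from the post above (an assumption, not taken from the MAINDEC docs).
PREREQS = {
    "CPU #1": [],
    "CPU #2": ["CPU #1"],
    "Remaining CPU tests": ["CPU #2"],
    "Memory management logic": ["Remaining CPU tests"],
    "Memory": ["Memory management logic"],
    "Extended instruction #1": ["Remaining CPU tests"],
    "Extended instruction #2": ["Memory", "Extended instruction #1"],
}


def first_violation(order):
    """Return (test, missing_prerequisite) for the first out-of-order test, or None."""
    done = set()
    for test in order:
        for needed in PREREQS[test]:
            if needed not in done:
                return test, needed
        done.add(test)
    return None


suggested = list(PREREQS)  # dict order matches the sequence in the post
print(first_violation(suggested))
```

Running Extended instruction #1 straight after the CPU tests passes this check, while moving Extended instruction #2 ahead of the memory test is reported as a violation.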

If all the MAINDECs pass - and you still have FP problems - this becomes more interesting!

Gut feeling would be an extended instruction.

Dave

Mike_Z · Mar 31, 2016

Dave, I have started.
D0AB CPU#1 PASS
D0BB CPU#2 PASS
D0CC Adder test PASS --- Did this one this morning. It ran through twice and reported no errors.

DHKHA Extended memory & checkerboard PASS
DHMCA Memory extension PASS

You mentioned 'Memory Management Logic'. Was this covered in either DHKMA or DHMCA?

Hope to run a few more CPU tests, maybe

D0DB Random AND
D0EB Random TAD
D0FC Random ISZ

If these run quickly maybe I'll get to more, depends on what else comes up today. Thanks for the help. Hopefully I can get this machine in shape. Mike

daver2 · Mar 31, 2016

Memory Management Logic = DHMCA.

Sorry, I was running off to a meeting at work so didn't have time to lookup the test identifier.

If the memory management logic and memory tests run OK - I would be inclined to run the two EAE tests (Extended Instruction tests) and see what happens... I would like to bet that these instructions are used heavily within the floating point library code for speed purposes.

Start one running - nice day - model 'T' drive time!

Dave

Mike_Z · Mar 31, 2016

I continued on with the extra CPU tests, but for some reason, again, I had to power down and up to get BINLDR to work correctly. I loaded the RIM loader, then the BINLDR, which both appeared to load correctly. I have a couple of addresses that I look at prior to running them. Then I tried to load D0DB. The program seemed to run, but I never received any "A's" on the TTY. Not sure whether or not it was running. So I tried the same with D0EB and had the same behavior. I then checked the loaded code against the listing and it was not correct. Powered the machine down and up. Loaded the programs and everything worked as advertised.

D0DB runs and about every 2-3 seconds the TTY will print an 'A'. Ran for an hour.
D0EB runs and about every 5-6 seconds the TTY will print a 'T'. Ran for an hour.
D0FC runs and I did both parts. It is difficult to tell whether or not this program is doing anything. There were no error print outs in either case.

Tomorrow I'll attempt more of the tests. Mike

daver2 · Mar 31, 2016

D0DB (AND) and D0EB (TAD) seem to be doing what I would have expected.

D0FC on the other hand isn't. It should be printing FC out on a successful completion.

Are you setting SWR to 0000 before running the program? This should be the 'typical' setting.

Don't do a PART 2 test - it takes "many days" (according to the documentation)! Is this your problem? Try PART 1 only (SWR = 0000) and see what happens.

D0FC should print out an FC at about 3 times the interval of D0DB (6-9 seconds at a guess) for an SWR setting of 0000.

Dave

Mike_Z · Mar 31, 2016

Yes I had SR=0000 before running the test. I'll reload it and try again. Mike

AK6DN · Apr 1, 2016

I don't recall (and can't find in back posts) whether your machine has the two-board EAE option (M8340, M8341). If so, it could uniquely affect the FORTRAN program output without affecting the normal operation of OS8, as OS8 does not use any EAE instructions, but the FORTRAN libraries will/can, if the option is detected.

I have had more problems keeping the EAE functions running correctly in my PDP-8m(s) than any of the other boards (including core memory).

I would recommend you run the EAE diagnostics as well, assuming you have those modules: http://ak6dn.dyndns.org/PDP-8/MAINDEC/KE8-E_extended_arithmetic_element/
They are D0LB and D0MB (D0LA is an earlier revision of the first).

Don

m_thompson · Apr 1, 2016

There are many different revisions of the CPU and EAE boards, and many ECOs to the boards. If you don't have the right combination of board revisions you can get unpredictable behavior. Run the EAE diags to make sure that it is working correctly.

Mike_Z · Apr 1, 2016

This morning I loaded D0FC again and then checked the code against the listing. D0FC failed every time I tried it yesterday. I found that the last 5 addresses in the program held incorrect data. I checked my downloaded file and the values in the file are correct, but for some reason my machine either loaded the wrong data or just stopped prematurely. This is what I found

Code:

ADDR  Should Be     What My Machine had
7610   5202            7777
7611   5217            5217

7617   6042            3042
7620   6001            4143
7621   5437            1335

I hand keyed in the correct values and the program ran correctly. I would see 'FC' on the TTY every 7-8 seconds. Ran the program for an hour with no errors.
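A minimal sketch of the same listing-versus-memory check done in software rather than by eye. The values are copied from the table above; the function name is made up for illustration.

```python
# address: (value in the listing, value found in memory), all octal, from the table above
observed = {
    0o7610: (0o5202, 0o7777),
    0o7611: (0o5217, 0o5217),
    0o7617: (0o6042, 0o3042),
    0o7620: (0o6001, 0o4143),
    0o7621: (0o5437, 0o1335),
}


def mismatches(table):
    """Return (address, expected, actual) for every word that differs from the listing."""
    return [(addr, exp, got) for addr, (exp, got) in sorted(table.items()) if exp != got]


for addr, expected, actual in mismatches(observed):
    print(f"{addr:04o}: listing {expected:04o}, memory {actual:04o}")
```

Note that 7611 actually matches, so four of the five listed words differ.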

I think I'll finish all the extra CPU tests, then move onto the EAE tests. I have the M837 - KM8E memory extension control board installed in my machine. I have a spare card and it doesn't seem to make a difference. Mike

Mike_Z · Apr 1, 2016

The D0GC and D0HC programs loaded and worked fine. D0GC would beep every 4-5 seconds and D0HC would display HC every 9-10 seconds. Then I could not load BINLDR again. I had to power down and up before BINLDR would be deposited into memory correctly. This time I only had the machine off for 5 minutes. I have reduced the off time each time this has happened. Next time I'm just going to turn it off and on right away, to see if that will correct the problem. I figure if that does work, then there is something set that is not being reset and it's not a heat problem. Anyway we will see.

After restarting I got D0IC to load, but had the following errors

Code:

ADDR Should have been Was in my machine
0200 5601             5441
0201 0600             5556

0600 7200             5663

The program beeps every 8-9 seconds.

D0JC loaded with one data error, but ran fine. It would display 'JC' every 10 seconds or so. Tomorrow I'll try the EAE programs.

Maybe I need an I/O test program also? Mike

daver2 · Apr 1, 2016

It's getting late in the UK - I'll try and digest this tomorrow.

Just as a matter of interest - can you explain the process you are using to load the diagnostic and execute it.

If the BINLDR halts, AC should be 0000 if no checksum error occurred. Non-zero if a checksum error occurred.

So, after the BINLDR halts - a non-zero value in AC indicates an error and you need to reload the program. Don't try and execute it - it won't work.
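To make the checksum idea concrete, here is a rough sketch of how a BIN-format tape can be decoded and checked off the machine. It assumes the usual BIN conventions as I understand them: 0200 frames are leader/trailer, a first frame with the 0100 bit marks an origin, frames of the form 11fff000 select a field and are not summed, and the last word before the trailer is the 12-bit sum of all the other data and origin frames. The helper names are invented for this example, and a real loader's handling of odd tapes may differ.

```python
def load_bin(frames):
    """Decode BIN frames; return (memory, residue). residue == 0 means the checksum matched."""
    words, field, pending = [], 0, None
    for f in frames:
        if f & 0o300 == 0o300:        # field-setting frame, not part of the checksum
            field = (f >> 3) & 0o7
        elif f & 0o200:               # leader/trailer
            continue
        elif pending is None:         # first frame of a two-frame word
            pending = f
        else:                         # second frame completes the word
            words.append((pending, f, field))
            pending = None
    if not words:
        raise ValueError("no data on tape")
    *body, (chi, clo, _) = words      # the last word is the tape's checksum
    expected = ((chi << 6) | clo) & 0o7777
    memory, origin, csum = {}, 0, 0
    for hi, lo, fld in body:
        csum = (csum + hi + lo) & 0o7777
        if hi & 0o100:                # origin word: set the load address
            origin = ((hi & 0o77) << 6) | lo
        else:                         # data word: store and advance
            memory[(fld << 12) | origin] = (hi << 6) | lo
            origin = (origin + 1) & 0o7777
    return memory, (csum - expected) & 0o7777


def make_tape(origin, words):
    """Build a hypothetical single-field test tape: leader, origin, data, checksum, trailer."""
    frames = [0o100 | (origin >> 6), origin & 0o77]
    for w in words:
        frames += [(w >> 6) & 0o77, w & 0o77]
    csum = sum(frames) & 0o7777
    frames += [csum >> 6, csum & 0o77]
    return [0o200] * 4 + frames + [0o200] * 4


memory, residue = load_bin(make_tape(0o200, [0o7200, 0o7402]))
print(f"residue {residue:04o}", {f"{a:04o}": f"{v:04o}" for a, v in memory.items()})
```

The residue here plays the role of the AC value after the loader halts: zero for a clean load, non-zero if any frame was misread.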

Depending on the binary loader you are using - there may be a way to stop the loader from executing the loaded program by setting SR<0> to a 1 before/during the load. You can then check the program that was loaded before it is executed (and ends up corrupting stuff before you can look at it).

When you say you couldn't load the BINLOADER again - do you mean that RIM is corrupt and won't load the BINLOADER or the CPU won't load the BINLOADER even though the RIM code looks OK in memory when you examine it?

Dave

AK6DN · Apr 2, 2016

daver2 said:
It's getting late in the UK - I'll try and digest this tomorrow.

Just as a matter of interest - can you explain the process you are using to load the diagnostic and execute it.

If the BINLDR halts, AC should be 0000 if no checksum error occurred. Non-zero if a checksum error occurred.

So, after the BINLDR halts - a non-zero value in AC indicates an error and you need to reload the program. Don't try and execute it - it won't work.

Depending on the binary loader you are using - there may be a way to stop the loader from executing the loaded program by setting SR<0> to a 1 before/during the load. You can then check the program that was loaded before it is executed (and ends up corrupting stuff before you can look at it).

When you say you couldn't load the BINLOADER again - do you mean that RIM is corrupt and won't load the BINLOADER or the CPU won't load the BINLOADER even though the RIM code looks OK in memory when you examine it?

Dave

A bit of a mea culpa here. I may have added some confusion with my posting of my customized/adapted low speed binary loader (BINLDR.*) on my MAINDEC page: http://ak6dn.dyndns.org/PDP-8/MAINDEC/

After looking at the discrepancies noted in post #130 upon loading the D0FC diagnostic (ie, some locations in the 76xx region were different from expected) I looked at both my BINLDR listing, the DEC standard lo/hi speed bin loader listing, and the diagnostic listing for D0FC. Turns out DEC needed some extra space in the D0FC diagnostic, and used a few words in the low 76xx region of memory. These words were unused in the 'DEC standard binary' loader, but due to some code shuffling on my part were used in my custom low speed loader BINLDR as posted on my website. The noted issue of memory content differences in post #130 is thus completely explained. No other diagnostics seem to be affected, as they all only use locations below 7600(8).

So, I did go ahead and fix my custom low speed binary loader to not use these locations (ie, 7600-7611,7617-7625) and only use locations 7612-7616 and 7626-7752 as are used by the DEC standard binary loader. I deleted references to BINLDR.* on my page, and now call the updated version DNNBIN.* to note that it is customized.
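As a quick cross-check of the explanation above, this sketch tests the mismatching D0FC addresses reported earlier in the thread against the location ranges quoted in this post. The ranges are copied from the text; the variable names are only for illustration.

```python
# Ranges quoted in the post (octal, inclusive).
DEC_STANDARD = [(0o7612, 0o7616), (0o7626, 0o7752)]   # used by the DEC standard loader
NO_LONGER_USED = [(0o7600, 0o7611), (0o7617, 0o7625)]  # avoided by the updated loader

# Addresses where memory differed from the D0FC listing, from the earlier post.
D0FC_MISMATCHES = [0o7610, 0o7617, 0o7620, 0o7621]


def in_ranges(addr, ranges):
    """True if addr falls inside any inclusive (low, high) range."""
    return any(low <= addr <= high for low, high in ranges)


for addr in D0FC_MISMATCHES:
    where = "loader-only region" if in_ranges(addr, NO_LONGER_USED) else "elsewhere"
    print(f"{addr:04o}: {where}")
```

Every mismatching address falls in the ranges the updated loader avoids and none falls in the DEC standard loader's ranges, which is consistent with the overlap described above.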

I also placed the standard DEC RIM and DEC lo/hi speed BIN loaders on that page as DECRIM.* and DECBIN.*, for use or reference.

A further note: my custom low speed BIN loader DNNBIN is customized in several ways:

it supports the lo speed reader (ie, console) interface only
if you load it with the standard RIM loader it will auto run
after you load a BIN, and if the computed checksum is OK (matches) and SR<0>=0, it will autostart the loaded program at location 200(8) (ie, the contents of location 7740(8)).
if SR<0>=1 it will go back to the start of the BIN loader waiting for another block of code to be loaded. If there is a checksum error it will HALT independent of SR<0> setting.
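The end-of-load behaviour in the list above boils down to a small decision table. A sketch in Python, with made-up names, purely to show the logic as described in this post:

```python
from enum import Enum


class Action(Enum):
    HALT = "halt"            # checksum error
    AUTOSTART = "autostart"  # start the loaded program
    RELOAD = "reload"        # return to the loader and wait for another block


def after_load(checksum_ok: bool, sr0: int) -> Action:
    """What the custom loader does once a BIN block has been read, per the description above."""
    if not checksum_ok:
        return Action.HALT   # a checksum error halts regardless of SR<0>
    return Action.RELOAD if sr0 else Action.AUTOSTART


for ok in (True, False):
    for sr0 in (0, 1):
        print(f"checksum_ok={ok} SR<0>={sr0} -> {after_load(ok, sr0).value}")
```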

Don

daver2 · Apr 2, 2016

Problem solved then.

I thought I hadn't seen a BINLDR that had the SR<0> trick. Now I know why...

So at least it isn't a problem with what Mike's doing or his machine - so that should allow him to run the diagnostics without corruption from now on.

See what happens then.

Thanks for a new take on the BINLDR though!

Dave

Mike_Z · Apr 2, 2016

This morning I downloaded both the DECBIN and DNNBIN. I tried

D0GC, D0HC, D0IB, D0JC

And they all worked. The DECBIN halted at the finish of loading and DNNBIN would start the loaded program. I did have some trouble with D0IB; I think it overwrote some addresses in the bin and rim loaders. I tried it twice and had the same result, could have been my error, but....

So, I'm moving on to D0LB. I loaded D0LB, set the start address to 0200 and set SR=5000. The program ran and almost immediately HALTED at 5010. The program description says that a halt at 5010 is

SWAB failed to set mode B or DPSZ failed.

I looked up SWAB (I have very little information on these) and it says 'Switch from Mode A to Mode B'. I don't know what Mode A or B is.

DPSZ is Double Precision Skip if zero. This is obviously something that pertains to Floating Point Math.

I had the impression that this program would print out the error, but nothing was printed. I tried a few different SR settings, but the result was always the same.

So, what's next? I don't know what to do about the SWAB. Maybe I can think of something to directly test DPSZ. Ideas? Thanks Mike

daver2 · Apr 2, 2016

Better news (although you may not think so at the moment).

If you look at the code for D0LB (in the PDF manual) you will see the failing bit of code at address 5001 onwards.

The CAM instruction (7621) should clear AC and MQ to zero.

The DPSZ checks that both AC and MQ are both zero and skips the next instruction if they are.

There are two modes of operation of the EAE - mode 'A' and mode 'B'. Mode 'A' (the default on power up) is a form of compatibility mode with the PDP-8/I where the extended arithmetic instructions were performed by software. Mode 'B' is where the extended arithmetic instructions are performed in hardware.

SWAB (instruction 7431) switches from mode 'A' to 'B'.

SWBA (instruction 7447) switches from mode 'B' to 'A'.

The 'mode' register is a single J/K flip-flop within the EAE hardware.

Armed with this knowledge, I would load D0LB. Set PC=5001 and single step through the instructions and see what doesn't work.

You might find this pocket reference card useful - http://homepage.cs.uiowa.edu/~jones/pdp8/refcard/74.html.

After the SWAB at 5005, the CAM at 5006 should set the AC and MQ registers to zero. If not (as seen from the front panel) there's your problem.

If the AC and MQ registers are both zero after the CAM, the DPSZ at 5007 should skip the halt instruction at 5010. I assume this is not the case and you are crashing into the HALT at 5010.
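Here is a toy model of those three steps, just to show what to look for while single stepping. It is not an emulator. The addresses are the ones quoted above, and the no-EAE branch rests on the assumption that, without the EAE hardware, SWAB has no effect and DPSZ never skips.

```python
HLT_ADDR = 0o5010    # the HALT that a failed DPSZ falls into
SKIP_ADDR = 0o5011   # where execution continues if DPSZ skips


def step_swab_cam_dpsz(eae_present, ac=0o1234, mq=0o4321):
    """Model 5005 SWAB, 5006 CAM, 5007 DPSZ; return (next_pc, ac, mq)."""
    # 5005 SWAB: enters mode B only if EAE hardware is fitted (assumption).
    mode_b = bool(eae_present)
    # 5006 CAM: clears both AC and MQ.
    ac, mq = 0, 0
    # 5007 DPSZ: skip the next instruction if AC and MQ are both zero (mode B only).
    skip = mode_b and ac == 0 and mq == 0
    return (SKIP_ADDR if skip else HLT_ADDR), ac, mq


for fitted in (True, False):
    pc, ac, mq = step_swab_cam_dpsz(fitted)
    print(f"EAE fitted={fitted}: next PC {pc:04o}, AC {ac:04o}, MQ {mq:04o}")
```

In this model, a halt at 5010 with AC and MQ both showing 0000 would point at SWAB or DPSZ rather than CAM.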

If you didn't want to single step it - I suppose you could wait for it to HALT and tell us what the values are in the AC and MQ registers.

Some errors produce HALTS and no messages - other errors produce messages. I assume the difference is whether a fault is classed as 'fatal' or not (i.e. you can continue after the fault). Also, too many error messages just makes the diagnostic too big!

Dave

EDIT: Just had a thought. You have got the little PCBs fitted linking the EAE cards together and the cards to the major registers, haven't you?

Mike_Z · Apr 2, 2016

Dave, I'll give it a try. I have also found some EAE documentation, DEC-8E-HR2B-D KM8, which seems to be for the M837. I want to read this and see if it will aid my understanding of what is going on. There is also a document for the M8340/41; are these boards similar to the M837? Obviously they must be a later version. Would there be any benefit to reading that also, or would it confuse the issue? Thanks for the help. Mike

Mike_Z · Apr 2, 2016

I think that I have just had an epiphany. I think that the problem that I'm having is I do not have a M8340/41 board in my machine. I have been confusing the M837, which is extended memory and time sharing, with the M8340/41, which is EAE. Geeezzz. So I suppose that without the M8340/41, Fortran will not function properly? Am I correct? Mike

daver2 · Apr 2, 2016

In the words of Mythbuster Jamie Hyneman - "There's yer problem"...

Yes. Without the M8340/41 EAE boards you are stuffed with FORTRAN (unless you use the variant that doesn't expect the EAE hardware of course).

Dave

Mike_Z · Apr 2, 2016

Rats! I can only claim ignorance, but I'm not as ignorant as I was. Mike
