
Help troubleshooting a multi-board, multi-processor system

Klyball

Experienced Member
Joined
Feb 15, 2014
Messages
162
Location
Surrey, BC
I am finally getting around to tackling my GRiD server unit, and could use some advice on troubleshooting one of the boards.

It's a three-board system: a diagnostic board with an 8031 microcontroller, a file server board with an 80186, and a com server board with an 80186.
As far as I can tell there are no schematics or manuals for this beast.
The first problem I need to tackle is getting the com server board recognized by the diagnostic board. When the system is powered on, the diagnostic board scans the bus for installed boards and reports them. The diagnostic board and file server board are recognized, but the com server board is not.
From visual inspection, the com and file boards have the same basic interface circuits, so that is good. The com board seems to start and run, but I assume that when the diagnostic board requests data from it, the whole board locks up and then goes back to running whatever it's running.

My question is: how would you go about tackling this with no schematic or information?

Here is a screen capture of the boot-up. The com board is in slot 2.

The errors are from having no drives attached; that's a whole different issue for down the road.

Code:
SYSTEM RESET
0D ERR 18 00
0D TEST DONE
1F TEST DONE
2 SLOT EMPTY
3 SLOT EMPTY
4 SLOT EMPTY
5 SLOT EMPTY
6 SLOT EMPTY
7 SLOT EMPTY
*?

Boot          - Boot system from system disk
Boot x        - Boot system from device 'x'
Clear         - Clear display
Configuration - Display card slot configuration
Connect x     - Connect to processor in slot X
Log           - List display log on screen
Exec          - Connect to file processor
Test          - Run high-level diagnostics
<break>       - Return to diagnostic processor prompt
*CONFIG
Slot    Card Type       PROM Version    RAM Size
                                   (in 128kB blocks)

 0      diag processor       3.0           1
 1      file processor       3.0           2
 2      Empty
 3      Empty
 4      Empty
 5      Empty
 6      Empty
 7      Empty

*CONNECT 1
Error: 1F 452 BOOTw
Error: 1F 107 BOOT0
Error: 1F 107 BOOT1
Error: 1F 107 BOOT2
Error: 1F 107 BOOT3
Error: 1F 107 BOOT4
Error: 1F 107 BOOT5
Error: 1F 107 BOOT6
Error: 1F 107 BOOT7
Error: 1F 452 BOOTc
Error: 1F 452 BOOTf
 
I suggest that you post some pictures and hope that somebody recognizes this stuff. Any chance you have a standard interconnecting bus?
 
When it comes to GRiD stuff you're in the right place, literally. There's a guy down in Seattle who is pretty experienced in GRiD hardware and I've worked with the later machines myself.
Personally though I've heard rumors about the GRiD server and seen it mentioned in documentation but never seen it before outside of a low res product photo, so I couldn't even begin to help you unless I dropped by and began poking at the thing.

That being said, pictures would be nice to see as nobody here has really seen one.
 
Oh man, indeed it's way more custom than I expected.
Well, I see no signs of a fuse or anything that might kill the board's power if something shorted out, and presumably you've checked that at least logic voltage is present. Have you tried reseating the EPROMs? Also, being near the coast, it wouldn't hurt to verify the landings on the 186 are not oxidized. I've had issues on 286 machines from down at the port where the machines were absolute crap for reliability until the CPU was pulled and the socket and chip cleaned with an eraser and contact cleaner.
 
As far as I can tell, the three boards are SBCs that run independently but also communicate with each other. The diag board seems to be fully functioning other than the error 18, and I'm going to guess that it means lost config due to the dead battery, but that's just a guess. The file board seems to be fine too; it reports to the diag board and attempts to boot from HD or floppy. The comm board seems to be running its ROM code but is not communicating with the diag board. I've dumped all the ROMs and swapped the processors and also the memory controller chips (I think that's what they are). I'm going to guess the problem is that either the diag board addressing the comm board fails, or the comm board's attempt to communicate back fails, so I need to figure out how to test for this.
 
What do the boards use for the bus interface? Could there be a bad buffer somewhere that data is getting lost at? Like you said you've basically confirmed the boards work but they aren't talking which does help narrow it down a little.
 
It looks like 74LS573s on the data lines and 74LS645s on the address lines, controlled by an M8289 bus arbiter and an M8288 bus controller, with a lot of logic to run the 8289 and 8288. By probing a few data lines and address lines on boot-up, I found there is a time period where the address and data lines go dead on the comm board side of the bus. The file board has the same bus configuration and does not have this dead period.
It looks like data only flows outward.
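
If a logic analyzer capture of those lines can be exported, the dead period could be located and measured instead of eyeballed. Here is a minimal sketch, assuming the capture is simply a list of bus values, one per sample tick. `find_dead_periods` and `min_len` are made-up names for illustration, and the sample values are invented, not measured from the board.

```python
def find_dead_periods(samples, min_len):
    """Return (start, end) index ranges where the bus value never changes.

    samples: list of ints, one captured bus value per sample tick (assumed format).
    min_len: shortest unchanged run, in samples, worth reporting.
    'end' is exclusive, so the run covers samples[start:end].
    """
    periods = []
    start = 0
    for i in range(1, len(samples) + 1):
        # A run ends at the end of the capture or when the value changes.
        if i == len(samples) or samples[i] != samples[start]:
            if i - start >= min_len:
                periods.append((start, i))
            start = i
    return periods


# Invented example data: the value sits at 0x3 for four ticks.
capture = [0x1, 0x2, 0x3, 0x3, 0x3, 0x3, 0x4, 0x5]
print(find_dead_periods(capture, 3))  # [(2, 6)]
```

Running the same function over a capture from the file board and one from the comm board, taken over the same part of boot-up, would show whether the dead period lines up with the diag board's slot scan.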
 
I think I might be getting closer to the problem. After probing around, I am pretty sure all the buffers and transceivers are working as they should; I checked a lot of logic in and around the area of interest. I have narrowed it down to the M8288 bus controller that controls the 74LS645s on the bus. The enable line on the 645s is stuck high; it goes through a 7404 to the 8288, whose output is stuck low. All the other signals going to and from the 8288 seem to have activity; whether it is the right activity is hard to say at this time. Can one output go bad? I hope so. I have an 8288 coming, so I will replace it and hope that solves it. If not, it will be a signal going to it. Just for fun, I put my logic pulser on the enable line of the 645s, and the system randomly detected the board (see below), so that leads me to believe the board is otherwise functioning.

The errors are from having the boards in the wrong slots; it wants them in a specific order, but I have them backwards for testing purposes.


Code:
SYSTEM RESET
?D ERR 18 00
0C ERR 50 00
1C ERR 50 00
2F ERR 50 00
3D ERR 20 00
3D TEST DONE
4 SLOT EMPTY
5 SLOT EMPTY
6 SLOT EMPTY
7 SLOT EMPTY
*config
Slot    Card Type       PROM Version    RAM Size
                                   (in 128kB blocks)

 0      comm processor       3.0           1
 1      comm processor       3.0           1
 2      file processor       3.0           2
 3      diag processor       3.0           1
 4      Empty
 5      Empty
 6      Empty
 7      Empty
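
Comparing the self-test lines with the *config table in both captures, the first character looks like the slot number and the letter looks like the card type (D = diag, F = file, C = comm). That mapping is inferred from this thread only, not from any GRiD documentation. Here is a rough sketch of a parser for logging and comparing boot runs, with `parse_boot_line` as a hypothetical helper name:

```python
import re

# Assumed mapping, inferred only by comparing the self-test lines with the
# *config table in this thread; not taken from any GRiD documentation.
CARD_TYPES = {"D": "diag processor", "F": "file processor", "C": "comm processor"}

# Slot digit (or '?'), optional card letter, then one of the three statuses seen.
LINE_RE = re.compile(r"^([0-7?])([A-Z]?)\s+(ERR|TEST DONE|SLOT EMPTY)\s*(.*)$")


def parse_boot_line(line):
    """Parse one self-test line; return None for anything that does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    slot, card, status, rest = m.groups()
    return {
        "slot": None if slot == "?" else int(slot),
        "card": CARD_TYPES.get(card, "unknown") if card else None,
        "status": status,
        "codes": rest.split(),  # e.g. ['50', '00'] after ERR; meaning unknown
    }


for text in ["SYSTEM RESET", "?D ERR 18 00", "2F ERR 50 00", "4 SLOT EMPTY"]:
    print(text, "->", parse_boot_line(text))
```

Saving the parsed result of each power-up makes it easier to see which slots come and go between runs, for example while pulsing the 645 enable line.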
 
It looks like the 8288 is most likely OK, as there is some activity on the line I thought was stuck. Any suggestions on how one might tackle something like this?
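
One way to tackle it without schematics: since the file board has the same bus interface and works, it can serve as a reference. Probe the same pins on both boards during the slot scan and list the ones that behave differently. Here is a minimal sketch of that bookkeeping, assuming each pin has been captured as a list of 0/1 levels. The pin labels follow 8288 datasheet naming, but the sample levels are invented for illustration, not measurements from these boards.

```python
def classify(levels):
    """Classify one captured pin as 'stuck low', 'stuck high', or 'active'."""
    if not levels:
        return "no data"
    if all(v == 0 for v in levels):
        return "stuck low"
    if all(v == 1 for v in levels):
        return "stuck high"
    return "active"


def compare_boards(good, suspect):
    """Return {pin: (good_state, suspect_state)} for pins whose state differs."""
    diffs = {}
    for pin in sorted(set(good) | set(suspect)):
        g = classify(good.get(pin, []))
        s = classify(suspect.get(pin, []))
        if g != s:
            diffs[pin] = (g, s)
    return diffs


# Invented example captures (not real measurements).
file_board = {"ALE": [0, 1, 0, 1], "DEN": [0, 1, 1, 0], "DT/R": [1, 0, 1, 1]}
comm_board = {"ALE": [0, 1, 0, 1], "DEN": [0, 0, 0, 0], "DT/R": [1, 0, 1, 1]}
print(compare_boards(file_board, comm_board))  # {'DEN': ('active', 'stuck low')}
```

A pin that toggles on the good board but is stuck on the suspect one is worth tracing back to its source. A pin that is "active" on both still needs a closer look at timing, since this check says nothing about whether the activity is the right activity.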
 