I'm not sure how long is too long to revive an old thread, but here goes anyway. Please let me know if it's better to start a new thread.
I have had a few questions PM'd to me about this system, and I was eventually able to get the thing working. The short answer is that I worked out the functionality of the RAM card and still ended up mass desoldering chips using a paint stripper gun and socketing them. It's now working. I plan to document the board in some detail in the future. It will be along the lines of the Ardent Tools of Capitalism pages for the PS/2s.
@yuhong - My question "Does anyone know which chip may be located at 100000h" was a bit sloppy.
The long (rambling) version:
The board responds in 32-bit chunks, or 64 bits (one 32-bit chunk followed by another; it's interleaved). So given an input address with a RAS+CAS format, IIRC first the board responds with all the bits from the chips on the right, giving 32 bits, and 60ns later does the same with all the chips on the left.
However, before probing LITERALLY EVERY SOLDER POINT, I didn't know how the physical RAM layout mapped to the logical one. It seemed reasonable to say that all the chips on one side were accessed first, followed by the other side, but I didn't know for sure. It could have been some other configuration.
The board is effectively addressed in 64-bit chunks. Change one bit in the address sent with the RAS or CAS, and the board responds with a new 64 bits sent in two 32-bit groups.
The "Memory write/read failure at 100002, read FF30 expecting FF10." message indicates that the BIOS numbers addresses in bytes. 100000h is 1024KB. So 100000h refers to the first byte, 100001h the second, 100002h the third and 100003h the fourth. The "read FF30 expecting FF10" part, along with "100002", indicates that the BIOS is counting in words. So 100000h refers to the first word and 100002h the second. That is, it's never going to respond with the message "Memory write/read failure at 100001..." It's only ever going to show an even number.
Confusion #1: numbering in bytes, counting in words. To an ASM hacker, this is entirely logical when you consider the AL AH AX registers. To a simpleton trying to replace a RAM chip, it's confusing.
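To make the byte/word numbering concrete, here is a rough Python sketch that splits a reported address into the units discussed above. The function name `locate` and the dictionary keys are made up for illustration, and it assumes the 64-bit chunks are aligned on 8-byte boundaries with the two 32-bit groups being consecutive dwords, which is a guess about the card rather than something confirmed here.

```python
def locate(addr: int) -> dict:
    """Split a byte address into word, dword and 64-bit chunk units.

    Assumption (hypothetical): chunks are 8-byte aligned and hold two
    consecutive 32-bit groups.
    """
    return {
        "word_addr": addr & ~0x1,            # even address a word-wise test reports
        "dword_addr": addr & ~0x3,           # start of the 32-bit group
        "chunk_addr": addr & ~0x7,           # start of the 64-bit chunk
        "byte_in_word": addr & 0x1,          # 0 or 1
        "word_in_dword": (addr >> 1) & 0x1,  # 0 = first word, 1 = second word
        "dword_in_chunk": (addr >> 2) & 0x1, # 0 = first group, 1 = second group
    }


info = locate(0x100002)
print(hex(info["dword_addr"]), info["word_in_dword"])
```

For 100002h this reports the dword starting at 100000h and the second word within it, which is why the failing word gets looked at from 100000h onwards.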
So from this we can determine that at address 100000h there are only 8 chips. (Let's forget the interleaving for the moment.)
There is still the problem of which 8 chips, and of those 8, which is bad?
Code:
FF30 is: 11111111 00110000, and
FF10 is: 11111111 00010000, xor'd giving
00000000 00100000
and this all starts at 100002h. So looking at it from a 32-bit perspective, at 100000h:
Code:
00000000 00000000 00000000 00100000
Confusion #2: numbering in bytes, counting in words, with a dword interface. To an ASM hacker, this is entirely logical when you consider the AL AH AX EAX registers. To someone realizing they're out of their depth, trying to replace a RAM chip, it's confusing.
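The XOR step used above to isolate the bad bit can be sketched in Python as follows. The helper name `differing_bits` is hypothetical, and bit 0 is taken to be the least significant bit of the 16-bit value the BIOS reported, which says nothing yet about which physical chip that is.

```python
def differing_bits(read: int, expected: int, width: int = 16) -> list[int]:
    """Return the bit numbers (0 = least significant) where the values differ."""
    diff = read ^ expected  # a 1 wherever the two values disagree
    return [bit for bit in range(width) if (diff >> bit) & 1]


diff = 0xFF30 ^ 0xFF10
print(format(diff, "016b"))            # the XOR result as 16 binary digits
print(differing_bits(0xFF30, 0xFF10))  # bit numbers that came back wrong
```

A single differing bit is consistent with one bad chip, assuming each chip supplies one data bit.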
Confusion #3: By the way it's interleaved! <facepalm>
So now we have:
Code:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00100000 or perhaps:
00000000 00000000 00000000 00100000 00000000 00000000 00000000 00000000
On top of this I didn't know which bit was the LSB, so you could reverse the two previous patterns. And on top of that there are also parity chips. So let's try mapping this to the card. Let's assume all the parity chips are on the bottom and the LSB is the top left; then we can mark an x on the bad chips matching each of these patterns. The problem is that maybe the LSB is the bottom right, or maybe it starts at column 4 at the top (or bottom), maybe it counts from column 4 at the top to column 1, or perhaps the parity bits are at the top, shifting my x-marks-the-spot pattern one chip up. Essentially what I came up with was this:
Code:
P? = Parity? x = Likely spot ? = possibility
______________________________________________________________________________________________________ ___
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||_P?__| |_P?__| |_P?__| |_P?__| |_____| |_____| |_____| |_____| |_P?__| |_P?__| |_P?__| |_P?__| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||__x__| |__?__| |__?__| |__x__| |_____| |_____| |_____| |_____| |__x__| |__?__| |__?__| |__x__| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||__x__| |__?__| |__?__| |__x__| |_____| |_____| |_____| |_____| |__x__| |__?__| |__?__| |__x__| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||__x__| |__?__| |__?__| |__x__| |_____| |_____| |_____| |_____| |__x__| |__?__| |__?__| |__x__| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||__x__| |__?__| |__?__| |__x__| |_____| |_____| |_____| |_____| |__x__| |__?__| |__?__| |__X__| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____| ||
| _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ ||
||_P?__| |_P?__| |_P?__| |_P?__| |_____| |_____| |_____| |_P?__| |_P?__| |_P?__| |_P?__| ||
|______________________________________________________________________________________________________||
|_____________________________||___________________________| |
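The first two unknowns behind the diagram (which 32-bit group the bad bit is in, and whether the pattern is written LSB-first or MSB-first) can be enumerated with a short Python sketch. The names `candidate_positions` and `as_pattern` are made up for illustration, and position 0 means the leftmost digit of the 64-bit pattern as written earlier. It does not cover the parity row or column-order unknowns, which is why the diagram carries more marks than this produces.

```python
from itertools import product


def candidate_positions(bit_in_dword: int, dword_bits: int = 32) -> set[int]:
    """Positions in the written 64-bit pattern where the bad bit could sit."""
    positions = set()
    for half, msb_first in product((0, 1), (False, True)):
        # MSB-first writing puts bit 0 at the right-hand end of its group.
        within = dword_bits - 1 - bit_in_dword if msb_first else bit_in_dword
        positions.add(half * dword_bits + within)
    return positions


def as_pattern(position: int, total_bits: int = 64) -> str:
    """Render a single set bit as space-separated groups of 8 digits."""
    bits = ["0"] * total_bits
    bits[position] = "1"
    joined = "".join(bits)
    return " ".join(joined[i:i + 8] for i in range(0, total_bits, 8))


for pos in sorted(candidate_positions(5)):
    print(pos, as_pattern(pos))
```

Two of the four lines printed are the two patterns listed before the diagram, and the other two are their reversed forms.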
IIRC they're 14-pin DIPs, so that's at the very least 16 X's: 14x16=224 pins to desolder. Say I find the right chip halfway through, that's still over 100 pins to desolder. There's no way I can possibly remove 100 pins without destroying a solder pad, leaving the board in a state that just has more variables leading to non-functionality. But the problem could be with a parity chip? Or perhaps the chips' physical layout doesn't map consecutively to the bit pattern (the ? marks). So even if I do all this without screwing up once, it still may not work. On top of all this I'm not 100% sure my reasoning is sound; it *seems* rational. But there is always the possibility I'm a crazy person with the delusion that I *seem* rational. I mean, just a paragraph ago I created an ASCII-art diagram of a memory card.
So I decided to work out the entire schematic of the memory card, and then worked out the logic of the card. I've still got all the notes, but they are far from an easy-to-understand description of the card.
This was an interesting exercise, but not terribly fruitful. I discovered that the parity chips are at the top. Also, the chips on the right are read as the first 32-bit word, followed by the left chips. On the right set of chips, the one at the bottom left has its data pin connected to the first pin (component side) of the memory connector. The chip above this is the next pin, and so on. So either this is the LSB or the MSB. On the left side the arrangement is inverted, so the LSB/MSB is the bottom right.
So then I tried to use the pinout of the 386 to track back to the memory card to work out what was the LSB and the MSB. I got lost at some tri-state chips, and the process started to become very complicated.
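The chip ordering described above can be written down as a small Python sketch. Treat it as a guess at the shape of the mapping rather than the card's actual wiring: the function name `chip_for_signal` is hypothetical, and it assumes four columns of eight data chips per 32-bit group (parity row excluded) and that the count moves on to the neighbouring column once a column is exhausted. Only the starting corner and the upward direction come from the tracing.

```python
ROWS_PER_COLUMN = 8    # assumption: eight data chips per column, parity row excluded
COLUMNS_PER_GROUP = 4  # assumption: four chip columns per 32-bit group


def chip_for_signal(group: str, n: int) -> tuple[int, int]:
    """Map the n-th data signal (0-31) of a group to (column, row).

    Column 0 is the leftmost column of that group, row 0 the bottom data row.
    The right group starts bottom left, the left group starts bottom right.
    """
    if group not in ("right", "left"):
        raise ValueError("group must be 'right' or 'left'")
    if not 0 <= n < ROWS_PER_COLUMN * COLUMNS_PER_GROUP:
        raise ValueError("signal index out of range")
    step, row = divmod(n, ROWS_PER_COLUMN)  # count up a column, then move on
    column = step if group == "right" else COLUMNS_PER_GROUP - 1 - step
    return column, row


print(chip_for_signal("right", 0))  # first connector pin of the right group
print(chip_for_signal("left", 0))   # first connector pin of the left group
```

Whether signal 0 is the LSB or the MSB is still open at this point, so the sketch only orders the chips; it does not say which data bit each one carries.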
I then looked at alternative ways of desoldering chips, and I ended up using a paint stripper gun. There is another RAM card in existence, so worst case I would order another. In hindsight it would have been much quicker to start with this, but I've at least gained a better understanding of this system. My first plan was to remove a column of RAM, socket them, then test. I was running out of DIP sockets, so next I removed some rows of chips, socketed them, and bingo, now I had an error in a different place. Some switching around and I found the faulty chip.
Anyway, it all seems to be quite happy now. The only thing now is that some of the chips in the DIP sockets still have some solder on them, so some work loose occasionally.