ajgriff
Experienced Member
Have you turned it off and on again yet?
I agree. Our OP is a trooper, but is getting a little frazzled with the many problems in his PET.
I'd be surprised if he's resisted the temptation to change something and, to be fair, I think I would have cracked by now. Chianti and pasta time perhaps?
Alan
Ha!
This is not an IT “no help desk”, Gary!
Dave
Please, I need an investigator!
I would like to understand how it is possible that this works now ... before it often crashed, even after a few minutes sometimes ...
Thanks Dave! Great explanation!
The first thing to say is that it isn’t a “hard” fault, as you now have a “working” system. That is both good and bad. Good in the sense that there is no IC that is permanently faulty (e.g. a logic gate that is bad). Bad in the sense that the fault is intermittent and random.
There are a whole load of problems that it could have been. A bad socket connection is one such problem. The same symptoms could also be caused by a faulty solder joint, a solder splash (or some other conductive contaminant) or flux residue. Temperature sensitivity is another, as are faulty bulk decoupling capacitors or increased power supply noise.
Then there are internal effects within the ICs themselves. Some parts were manufactured with a bonding agent (bonding the silicon die to the carrier) that leaked chemicals into the IC that damaged the silicon or aluminium connections over time. Other IC packages leaked, letting in contaminants that started to damage the internals of the IC.
All of the above didn’t just ‘kill’ the IC, but made it unreliable before it died.
Logic gates have two binary states: 0 and 1. These are defined to be voltage levels below a certain threshold (for a logic 0) and above a threshold (for a logic 1). The presence of contaminants changes these thresholds so that what is supposed to be a logic 0 is actually seen as a logic 1 (or vice versa). This introduces a ‘glitch’ into the processing and can cause the CPU to crash. However, it also means that the CPU can carry on with incorrect data (and misbehave) before something major causes it to crash totally.
If you end up with one (or more) logic gates like this, the effect can be either improved or worsened by environmental factors (such as temperature, electrical noise etc.).
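To make the threshold idea concrete, here is a minimal Python sketch. The 0.8 V and 2.0 V figures are the standard TTL input thresholds; the `drift` parameter, the function name and the deliberately exaggerated drift value are assumptions made up for illustration, not measurements from this PET.

```python
# Standard TTL input levels: at or below V_IL_MAX reads as a 0,
# at or above V_IH_MIN reads as a 1, anything in between is undefined.
V_IL_MAX = 0.8
V_IH_MIN = 2.0


def read_logic_level(voltage, drift=0.0):
    """Return 0, 1, or None (undefined) for an input voltage in volts.

    drift is a hypothetical shift applied to both thresholds, standing in
    for the effect of contamination or a marginal IC.
    """
    low = V_IL_MAX + drift
    high = V_IH_MIN + drift
    if voltage <= low:
        return 0
    if voltage >= high:
        return 1
    return None  # between the thresholds: the gate may read either way


healthy = read_logic_level(0.6)                # good gate sees a valid 0
marginal = read_logic_level(0.6, drift=-1.5)   # same 0.6 V, shifted thresholds
print(healthy, marginal)
```

The same 0.6 V input is read as a 0 by the healthy gate and as a 1 by the drifted one, which is the kind of single wrong bit that lets a CPU run on bad data for a while before it finally crashes.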
Another interesting thing I have to contend with at work is “tin whiskers” that grow and can short out IC pins and PCB connections. We also get “zinc whiskers” that make a forest of zinc trees on zinc-plated metal and cause untold havoc in power supplies etc. if they become dislodged...
All of these effects are well-documented with a bit of googling. Tin whiskers on the NASA website, internal IC failures on the JEDEC website etc.
In the industry I work in, we collect all of the failed components from repairs. If we see an upturn in a failure rate, we will get some of the components forensically analysed to see what is going on.
The pain of my life at the moment is firmware failures (either through EPROM bit-rot or fuse regrowth in PALs, fuse-programmable ROMs and other devices).
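For anyone wanting to check their own firmware for bit-rot, one approach is to compare a dump of the suspect EPROM against a known-good image, bit by bit. A rough Python sketch follows; the function name and the three sample bytes are invented for illustration, and it assumes you already have both images in memory as bytes.

```python
def find_flipped_bits(reference: bytes, dump: bytes):
    """Return a list of (offset, bit, expected, actual) for every differing bit."""
    if len(reference) != len(dump):
        raise ValueError("images differ in length")
    flips = []
    for offset, (ref, got) in enumerate(zip(reference, dump)):
        diff = ref ^ got  # a set bit in diff marks a mismatch at that position
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((offset, bit, (ref >> bit) & 1, (got >> bit) & 1))
    return flips


# Hypothetical data: the second byte has bit 2 changed from 0 to 1.
good = bytes([0xA9, 0x00, 0x8D])
suspect = bytes([0xA9, 0x04, 0x8D])
flips = find_flipped_bits(good, suspect)
print(flips)
```

Seeing only one or two isolated bits differ, rather than whole blocks, points towards a decaying device rather than a wrong ROM version.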
As to what your specific problem was/is, we will never know without the “smoking gun”. It may have been a bit of solder that had become lodged somewhere and we have now dislodged it. It may have been a bad socket connection that resolved itself when removing and reinserting the EPROM wiped the contacts.
Who knows?!
It may be a ‘logic threshold’ problem (marginal IC) that is going to come back sooner or later. Who knows?
This is the issue with vintage computers: they require constant maintenance (like a vintage car does).
We just have to hope it was a bit of debris that has now gone; but only time will tell, I am afraid...
I am watching the Formula 1 race this afternoon. Mercedes have little hope of winning, so it is either going to be Ferrari - or the Flying Dutchman!
Dave
But why machine code now???
That’s TIM, the machine code monitor.