Cbm 2001 Pet strange boot

ajgriff · May 21, 2022

Gary C said:
Have you turned it off and on again yet ?

I'd be surprised if he's resisted the temptation to change something and to be fair I think I would have cracked by now. Chianti and pasta time perhaps?

Alan

dave_m · May 21, 2022

ajgriff said:
I'd be surprised if he's resisted the temptation to change something and to be fair I think I would have cracked by now. Chianti and pasta time perhaps?

Alan

I agree. Our Op is a trooper, but is getting a little frazzled with the many problems in his PET.

But he is listening to daver2 very well now.

Desperado · May 21, 2022

I haven't changed anything yet... I've learned to follow your suggestions.

At the moment PETTESTER seems to be running without freezing... Tomorrow I'll try inserting the original ROMs, PIA and VIA, and I'll cross my fingers... also because tomorrow my favourite soccer team, AC Milan, can win the Italian league.

Good night guys!

Gary C · May 22, 2022

daver2 said:
This is not an IT “no help desk” Gary !

Dave

Ha !

I was thinking the freezing must have made a good connection somewhere, so it needs a few cycles to shake it loose.

What about resoldering the connections in the area that was sprayed?

Desperado · May 22, 2022

Hello, I inserted all the ROMs, PIA and VIA and turned it on... At the moment it seems that everything is working... I'll leave it on for a while and let's see how it behaves!
Could it be that the heat has somehow "fixed" bad sockets?

ajgriff · May 22, 2022

Sockets, dry solder joint, short from debris, cracked track etc? Who knows? Also, we don't know what's fixed it up to this point. Unfortunately the problem may yet return. Good that all seems to be well for now at least. Well done!

Alan

Desperado · May 22, 2022

Please, I need an investigator!
I would like to understand how it is possible that this works now... Before, it often crashed, sometimes even after a few minutes...

ajgriff · May 22, 2022

Further investigation is unlikely to reveal the cause now as there are just too many possibilities. Anyway, you are the only valid investigator because you have the PET in front of you; anyone else would just be guessing from afar. Relax and enjoy the football. Hope your team wins.

Alan

Desperado · May 22, 2022

Thanks so much to all of you for the moment, but I may
still need your help with the 3040 dual floppy... I don't know yet if it works.

daver2 · May 22, 2022

The first thing to say is that it isn’t a “hard” fault, as you now have a “working” system. That is both good and bad. Good in the sense that there is no IC that is permanently faulty (e.g. a logic gate that is bad). Bad in the sense that the fault is intermittent and random.

There is a whole load of problems that it could have been. A bad socket connection is one such problem. The fault could also be caused by a faulty solder joint, a solder splash (or some other conductive contaminant) or flux residue. Temperature sensitivity is another, as are faulty bulk decoupling capacitors or increased power supply noise.

Then there are internal effects within the ICs themselves. Some parts were manufactured with a bonding agent (bonding the silicon die to the carrier) that leaked chemicals into the IC that damaged the silicon or aluminium connections over time. Other IC packages leaked, letting in contaminants that started to damage the internals of the IC.

All of the above didn’t just ‘kill’ the IC, but made it unreliable before it died.

Logic gates have two binary states - 0 and 1. These are defined to be voltage levels below a certain threshold (for a logic 0) and above a threshold (for a logic 1). The presence of contaminants changes these thresholds so that what is supposed to be a logic 0 is actually seen as a logic 1 (or vice versa). This introduces a ‘glitch’ into the processing and can cause the CPU to crash. However, it also means that the CPU can carry on with incorrect data (and misbehave) before something major causes it to crash totally.

If you end up with one (or more) logic gates like this, the effect can be either improved (or worsened) by environmental factors (such as temperature, electrical noise etc.).
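For anyone who finds the threshold idea easier to follow in code, here is a minimal Python sketch. The 0.8 V and 2.0 V defaults are the standard TTL input levels; the drifted thresholds for the "marginal" gate and the 0.6 V signal are invented purely for illustration and were not measured on any PET part.

```python
# Illustrative only: standard TTL input levels (V_IL max 0.8 V, V_IH min 2.0 V).
# The drifted thresholds used for the "marginal" gate are made up.

def read_logic_level(volts, v_il_max=0.8, v_ih_min=2.0):
    """Return 0, 1, or None (undefined region) for an input voltage."""
    if volts <= v_il_max:
        return 0
    if volts >= v_ih_min:
        return 1
    return None  # between the thresholds: the gate may read either state


signal = 0.6  # hypothetical voltage: a valid logic 0 for a healthy TTL input

healthy = read_logic_level(signal)
# Hypothetical marginal gate whose thresholds have drifted downwards.
marginal = read_logic_level(signal, v_il_max=0.3, v_ih_min=0.5)

print(healthy, marginal)  # the same voltage is read as 0 by one gate and 1 by the other
```

The same voltage is read differently by the two gates, which is the kind of glitch described above. Temperature or noise nudging either the signal or the thresholds would make the error come and go.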

Another interesting thing I have to contend with at work is “tin whiskers” that grow and can short out IC pins and PCB connections. We also get “zinc whiskers” that make a forest of zinc trees on zinc-plated metal and cause untold havoc in power supplies etc. if they become dislodged...

All of these effects are well-documented with a bit of googling. Tin whiskers on the NASA website, internal IC failures on the JEDEC website etc.

In the industry I work in, we collect all of the failed components from repairs. If we see an upturn in a failure rate, we will get some of the components forensically analysed to see what is going on.

The pain of my life at the moment is firmware failures (either through EPROM bit-rot or fuse regrowth in PALs, fuse-programmable ROMs and other devices).

As to what your specific problem was/is we will never know without the “smoking gun”. It may have been a bit of solder that had become lodged somewhere and we have now dislodged it. It may have been a bad socket connection that has resolved itself by wiping the contacts by removing and inserting the EPROM.

Who knows?!

It may be a ‘logic threshold’ problem (marginal IC) that is going to come back sooner or later. Who knows?

This is the issue with vintage computers, they require constant maintenance (like a vintage car does).

We just have to hope it was a bit of debris that has now gone; but only time will tell, I am afraid...

I am watching the Formula 1 race this afternoon. Mercedes have little hope of winning, so it is either going to be Ferrari - or the Flying Dutchman!

Dave

Hugo Holden · May 22, 2022

Dave_C78 said:
please i need an investigator!
I would like to understand how it is possible that this works now ... before it often crashed, even after a few minutes sometimes ...

If there is a bad connection anywhere, either inside an IC, in its internal circuitry or in the PCB wiring, IC socket connections, soldered connections, crimped connections, a single pulse or signal can just disappear, causing a failure. If it happens some time after turn on, likely it is a thermal effect where expansion of the materials causes a connection to break.

It is not unheard of for ICs to fail in this manner.

Also, in a ROM, a memory cell can be borderline or intermittent, and the byte it contains could sometimes change. I wouldn't trust your TMS2532As that have had the wrong programming voltage applied to them. It is possible that they are the source of the trouble: even though you have used them before without any trouble, there is always a first time. There is no proof of this... yet, but I think this could possibly be the cause.
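One way to look for a borderline cell, assuming you have an EPROM programmer that can read the chip back, is to read the same chip several times and compare the dumps byte by byte. The sketch below is only an illustration of that comparison: the function name and the tiny four-byte "dumps" are made up, and real dumps would be loaded from the files your programmer saves.

```python
def unstable_addresses(dumps):
    """Return offsets whose byte differs between any two reads of the same ROM."""
    first = dumps[0]
    if any(len(d) != len(first) for d in dumps):
        raise ValueError("all dumps must be the same length")
    return [i for i in range(len(first))
            if any(d[i] != first[i] for d in dumps[1:])]


# Made-up 4-byte "dumps" standing in for three reads of one EPROM.
reads = [
    bytes([0xA9, 0x00, 0x8D, 0x40]),
    bytes([0xA9, 0x00, 0x8D, 0x40]),
    bytes([0xA9, 0x00, 0x8D, 0x00]),  # last byte read differently this time
]
flaky = unstable_addresses(reads)
print([hex(offset) for offset in flaky])
```

Identical dumps do not prove the EPROM is good, since the cell may only misbehave at a different temperature or supply voltage, but any offset that does show up here is a strong hint.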

The only way to get to the bottom of an infrequent intermittent fault inside a component is to replace the suspect component or IC and then trial the computer for weeks and months. If the fault appears cured, put the suspect component back in and trial it again for weeks and months. If you have found the defective IC or component, the fault should occur only while that IC is in place and never when it is not present. Repeat the trial again and again, with and without the part present, to confirm that the findings are consistent.
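The swap-and-trial procedure is easy to lose track of over months, so it helps to keep a log. Here is a small Python sketch of the decision rule, assuming each trial period is written down as a pair (suspect fitted?, fault seen?). The function name and the example logs are hypothetical.

```python
def part_is_confirmed_faulty(trials):
    """trials: list of (suspect_fitted, fault_seen) booleans, one per trial period."""
    with_part = [fault for fitted, fault in trials if fitted]
    without_part = [fault for fitted, fault in trials if not fitted]
    if not with_part or not without_part:
        return False  # need trials in both configurations
    # Faults must show up with the part fitted and never without it.
    return any(with_part) and not any(without_part)


# Hypothetical logs, one entry per trial period.
consistent = [(False, False), (True, True), (False, False), (True, True)]
inconclusive = [(False, True), (True, True)]  # fault also seen without the part

print(part_is_confirmed_faulty(consistent))
print(part_is_confirmed_faulty(inconclusive))
```

A single crash with the suspect part removed is enough to clear it, or at least to show that it is not the only fault.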

It is fair enough to say that intermittent faults are the most difficult of all, especially when they are not there long enough to track them down. But, in my experience, if the fault is still there, it will appear again, sooner or later, when you least expect it. So be ready for it, with the scope at hand.

A while back I had an intermittent fault in a SOL-20 computer where the keyboard randomly just stopped working. Sometimes it didn't do it for weeks. I had the scope ready though, and one day when it stopped I found that an output from one of the 4-bit counter ICs in the key-scan circuit had just disappeared; that IC was intermittently faulty. Sometimes these things can take weeks to find, and it is very frustrating when the fault goes away for long periods, because if the fault is not active it is impossible to find unless something (such as heat, cooling or vibration) can be made to precipitate it. In many cases the fault can be independent of these factors.

Then, sometimes, a fault can appear due to a "borderline condition", meaning part of the circuitry is only just on the verge of operating correctly. In digital circuitry this sort of thing is not as common as it is in analog circuitry. But one example of it on a computer PCB could be the crystal master oscillator, where the crystal loses activity and the oscillator can stop. I once had a Tek scope with this problem: the master oscillator on the logic board stopped at times.

This is why Daver2 wanted you to check the clock when the fault appeared.

(Daver2 also just mentioned tin whiskers, which are more of a problem in modern gear with lead (Pb) free solder and inside vintage transistors. I have had things fail in my workshop due to these: one Tek 2465B scope, one modern computer VDU and a large number of germanium transistors made in the 1960s. Probably not a tin whisker fault in the PET, but it just goes to show all the things that can go wrong in electronic circuits.)

Desperado · May 22, 2022

daver2 said:
The first thing to say is that it isn’t a “hard” fault, as you now have a “working” system. That is both good and bad. Good in the sense that there is no IC that is permanently faulty (e.g. a logic gate that is bad). Bad in the sense that the fault is intermittent and random.

Thanks Dave! Great explanation!
Maybe the old white sockets lost their connection, and with the heat they settled down...

Let's see what happens over the next few days. Now I have to move on to restoring the metal case!
Ferrari <3

Desperado · May 22, 2022

I AM DESPERATE!
I turned on the PET just now and I get this screen...
The computer was cold because it had been idle for at least an hour...

daver2 · May 22, 2022

That’s TIM - the machine code monitor.

Your ‘friend’ has come back again!

I suggest you watch your football team win, and I’ll watch the F1, and we will reconvene at some point...

Dave

Desperado · May 22, 2022

daver2 said:
That’s TIM - the machine code monitor.

But why machine code now???

Please, what can I do to solve this mysterious fault???

daver2 · May 22, 2022

Think about what you have just done with the configuration...

You have taken the PETTESTER ROM out and replaced it with the full complement of Commodore PET ROMs.

TIM is only present within the Commodore PET ROMs.

When the CPU executes a BRK (break) instruction ($00), it fetches the address of the “break handler” from the vector area of the $Fxxx Kernal ROM. In the case of the Commodore PET ROMs, this invokes TIM.

In the case of the PETTESTER ROM, it is never intended to execute a BRK instruction (unlike the Commodore PET ROMs where you can deliberately enter TIM from BASIC and debug at the machine code level). In this case, with the PETTESTER ROM, either the machine will crash, or it could do random things.
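To make the vector fetch concrete: on a 6502 the BRK opcode is $00 and the BRK/IRQ vector lives at $FFFE/$FFFF, low byte first. The Python sketch below just shows how the handler address is assembled from those two bytes. The vector contents are invented for illustration and are not taken from any real PET or PETTESTER ROM.

```python
def brk_target(memory):
    """Return the address a 6502 loads into the PC on BRK (vector at $FFFE/$FFFF)."""
    low = memory[0xFFFE]
    high = memory[0xFFFF]
    return (high << 8) | low  # little-endian: low byte first


# Hypothetical vector contents; the real values depend on the ROM fitted.
rom = {0xFFFE: 0x34, 0xFFFF: 0xF2}
print(hex(brk_target(rom)))
```

Since $00 is BRK, any fault that makes the CPU fetch a zero byte where an opcode should be (a bad address line, a flaky ROM cell, a dropped data bit) sends it through this vector, and what happens next depends entirely on which ROM is in the $Fxxx socket.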

Think back to how the PETTESTER failed...

Changing the configuration = changing the fault behaviour!

Now, why the CPU was executing a BRK instruction when it shouldn’t is another matter - and we need to understand this to identify your fault.

In a commercial organisation, this PET mainboard would have been consigned to the bin a while ago.

I also don’t feel as though you have the test equipment to find this sort of fault.

I was a bit lucky with my Cromemco Z2 in this respect. I had an intermittently faulty CPU card, memory card AND backplane sockets. Fortunately, I have a stack of CPU and memory cards. So, with a bit of swapping and documentation, and 100 or so crashes later I have a good CPU, memory and set of backplane sockets that work. I can now concentrate on finding the faults on the other cards.

You don’t have that luxury with a single PET mainboard.

Let me think what you may be able to do with the test equipment you have.

Dave

Desperado · May 22, 2022

Meanwhile I changed the old white socket of Ud9 and cleaned the area with alcohol before soldering in the new one. Now I'm waiting to see if it crashes again so that I can then test CPU pins 37 and 39.

Desperado · May 22, 2022

Freeze! Pins 7, 37 and 39 are low!

Desperado · May 22, 2022

Now it is already the second time that it has frozen after two minutes, and a character appears under the blinking cursor...

daver2 · May 22, 2022

So we now know it wasn't the CPU socket...

Hang on, so pins 37 and 39 are LOW?!

This implies the clock circuitry has stopped...

This points to G5. But let me think about that as the 1MHz clock (CLK1) is used elsewhere.

CLK1 is used by the CPU (sheet 1), the DRAM (sheet 5) and the video circuitry (sheet 7). However, I think the video circuitry may still function with the signal being faulty.

Check for activity on G5 pins 14, 11, 3 and 7.
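If you note down what you see on each pin while the machine is frozen, a few lines of Python can pick out the signals that never change state. This is only a sketch: the function name is hypothetical, and the G5 readings below are invented (only the "pins 37 and 39 low" observation comes from the thread). It also assumes the readings come from a scope or logic probe fast enough to catch a 1MHz clock.

```python
def stuck_pins(samples):
    """samples: {pin_name: [0/1 readings]}. Return pins that never change state."""
    return sorted(pin for pin, levels in samples.items()
                  if len(set(levels)) <= 1)


# Hypothetical readings taken while the machine is frozen.
readings = {
    "CPU pin 37": [0, 0, 0, 0],
    "CPU pin 39": [0, 0, 0, 0],
    "G5 pin 14": [0, 1, 0, 1],  # invented: still toggling
}
stuck = stuck_pins(readings)
print(stuck)
```

Working back from the stuck pins towards the last signal that is still toggling narrows down where the clock chain has stopped.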

Dave
