What's wrong with my timer chaining code? (3Com PCI packet driver mystery)

I just took a look at the Agilent 16700 series logic analyzer software CD, and fortunately it does contain the Inverse Assembler software for the E2457A Pentium probe. It looks like it also contains the firmware for an E3491A Pentium probe, which can connect to the JTAG interface of the processor through the E2457A. I'm not sure what help the E3491A might be in this scenario, if any.

I wonder what disabling the Pentium L1 cache might do to your repro scenario. Without getting this all set up and working, I don't know whether the bus traces would be easier to understand with the L1 cache disabled.
 
I'm sorry that SoftICE ended up being a bust. I even looked up whether SoftICE changes the environment in a significant way, and the only thing that came up was that it runs the system in V86 mode. Why that would mask the error condition is quite perplexing; it shouldn't be affecting interrupt stacks or anything like that. If you're looking for additional test machines, I could probably put one or two together.
 
gslick - I'm pretty sure it's not timing dependent. I can reproduce it on a P133 and on a Celeron 1100.


resman - No problem at all on SoftICE ... it still looks useful and I had to drag myself to do the installation and go through the basics, so as a learning exercise it was definitely worth it.

I added code to my interrupt handler to detect it being re-entered; if it detected that it would bump a second counter and exit immediately. And then magically nothing crashed, so I was sure that my code had been re-entered on the timing interrupt when it should not have been. But I got suspicious because I could never get the second counter to increment, so it wasn't actually being re-entered; it was just working. Then I removed SoftICE and poof - instant crashing again.
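For illustration, a re-entry check of the kind described above might look like the sketch below. The names inHandler and reentryCount are hypothetical, and chaining to the next handler on re-entry is an assumption about what "exit immediately" meant; L$40 is the saved far pointer to the next handler from the listing posted later in the thread.

```asm
; Hypothetical sketch; inHandler and reentryCount are illustrative names.
timerInt:
    cmp         byte ptr cs:inHandler,0x00
    jne         reentered
    mov         byte ptr cs:inHandler,0x01
    ; ... countdown timer, stack switch and packet send go here ...
    mov         byte ptr cs:inHandler,0x00
    jmp         dword ptr cs:L$40
reentered:
    ; Nested entry: count it and leave without touching the private stack.
    inc         word ptr cs:reentryCount
    jmp         dword ptr cs:L$40

inHandler       db 0
reentryCount    dw 0
```

If reentryCount never moves off zero while the crash still reproduces, re-entry of this handler is probably not the cause.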

V86 mode might have changed instruction timings. I could also still be chasing an uninitialized storage problem in the packet driver.


My next plan of attack is probably to go back to static code analysis using a disassembler.
 
Which 3Com PCI adapter are you using for your testing?

I have a spare 3C905C-TXM EtherLink10/100 PCI in front of me right now that I could use for testing. I might have some other flavors of 3Com PCI adapters around here too.
 
So to sum it up:

* jumping to the original int 08h handler (which would be the default BIOS one, if your code was loaded before the 3Com driver) after sending a packet will crash
* but it works when calling the old handler first
* in V86 mode (at least under SoftICE, maybe not with EMM386?), both versions of the code work

I'm just guessing here, but the cause could be that interrupts are being enabled at a point where they shouldn't be? What is the exact code for <switch to private stack> / <restore original stack>?
 
I experimented with a few different trials of disabling or re-enabling interrupts; it would be worth doing that again in a more controlled manner and taking notes.

Here is the failing code with some annotations

Code:
timerInt:
    ; If the count down timer is active decrement it
    cmp         word ptr cs:L$42,0x0000
    je          L$158
    dec         word ptr cs:L$42
L$158:
    ; If no ARP response is pending skip to the end where we jump to the next handler
    cmp         byte ptr cs:L$39,0x00
    je          L$159
    ; Stack switch code
    push        ax
    mov         word ptr cs:L$44,ss
    mov         word ptr cs:L$43,sp
    mov         ax,cs
    mov         ss,ax
    mov         sp,offset L$45
    ; Save a few registers on the new stack
    push        bx
    push        cx
    push        dx
    push        bp
    push        si
    push        di
    push        ds
    push        es
    ; Re-establish our data segment so we can point at our outgoing packet.
    ; Then do a software INT to the packet driver to send it.
    mov         ax,cs
    mov         ds,ax
    mov         ah,0x04
    mov         cx,0x003c
    lea         si,L$24
    call        doPacketInterrupt
    mov         byte ptr L$39,0x00
    add         word ptr L$22,0x0001
    adc         word ptr L$23,0x0000
    ; Restore registers to previous state.
    pop         es
    pop         ds
    pop         di
    pop         si
    pop         bp
    pop         dx
    pop         cx
    pop         bx
    ; Restore previous stack
    mov         ss,word ptr cs:L$44
    mov         sp,word ptr cs:L$43
    pop         ax
L$159:
    ; Chain to the next handler
    jmp         dword ptr cs:L$40

doPacketInterrupt:
    ; Save bx, then use it to compute where to jump to invoke the correct
    ; software interrupt.  (INT 0x60 through 0x6F are supported)
    push        bx
    xor         bx,bx
    mov         bl,byte ptr L$8
    sub         bl,0x60
    shl         bl,0x01
    shl         bl,0x01
    add         bx,offset L$160
    jmp         bx
L$160:
    pop         bx
    int         0x60
    ret
    pop         bx
    int         0x61
    ret
    ; Repeated code continues to int 0x6f

The order of execution is:
  • Timer INT (IRQ0, INT 08h) fires.
  • The 3Com packet driver is first in the chain.
  • The 3Com packet driver pushes flags (PUSHF) and then does a far call to the next handler (my code). Interrupts are still disabled at this point. I don't touch them.
  • If there is no ARP packet to be sent my code might decrement my countdown timer, but it will skip ahead to the end where it does a far jmp to the next handler.
  • The next handler (DOS or the BIOS if STACKS=0,0 is set in CONFIG.SYS) handles the timer tick and executes INT 1C so that anything that hooked the timer tick gets to run.
  • As things unwind they do IRET instructions, which restore CS:IP and the saved flags on the stack.
If there is a pending ARP response then things get slightly more complicated:
  • I switch stacks. AX gets pushed onto the existing stack, and then I manipulate the SS and SP registers. A move to the SS register suppresses interrupts through the next instruction (where SP gets set) so that I don't have to disable interrupts myself. And interrupts are still disabled here anyway. My stack for this is 128 bytes in size.
  • I save the registers I might touch. (I could try saving every possible register just in case the packet driver is messing one up; I'll add that to the list.)
  • The packet is pre-formed so I just need to set a few registers and do INT 60h.
  • I update a 32 bit counter. (16 bit code, so an add and an add with carry.)
  • I restore the registers I saved, and then switch back to the original stack
  • Then I chain to the next interrupt handler via the far jump.
That works on every other packet driver I've tried.

After writing this out the only thing I think I want to test again is saving and restoring more registers right around the INT that sends the packet. If their packet driver is messing with a register and not saving and restoring it that would leave it corrupted in my code, and possibly cause the crashes. But you can see I've saved the common registers there so there are not too many more registers they could have trashed for me.
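One way the "save more registers" test could be sketched, assuming a 386 or later CPU and assuming the driver might be touching the upper halves of the 32-bit registers or FS/GS (nothing in the thread confirms that it does). This would stand in for the lines from `mov ah,0x04` through `call doPacketInterrupt` in the listing above:

```asm
    ; Assumes a 386 or later CPU (.386); not usable on 8088/286 machines.
    pushad                          ; EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI (32 bytes)
    push        fs
    push        gs
    mov         ah,0x04
    mov         cx,0x003c
    lea         si,L$24
    call        doPacketInterrupt
    pop         gs
    pop         fs
    popad                           ; also discards anything the driver returned in registers
```

This costs 36 more bytes of the 128-byte private stack, on top of the 16 bytes already used by the existing pushes.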

And for reference, here is the working variation:

Code:
timerInt:
    pushf
    call        dword ptr cs:L$40
    cmp         word ptr cs:L$42,0x0000
    je          L$158
    dec         word ptr cs:L$42
L$158:
    cmp         byte ptr cs:L$39,0x00
    je          L$159
    push        ax
    mov         word ptr cs:L$44,ss
    mov         word ptr cs:L$43,sp
    mov         ax,cs
    mov         ss,ax
    mov         sp,offset L$45
    push        bx
    push        cx
    push        dx
    push        bp
    push        si
    push        di
    push        ds
    push        es
    mov         ax,cs
    mov         ds,ax
    mov         ah,0x04
    mov         cx,0x003c
    lea         si,L$24
    call        doPacketInterrupt
    mov         byte ptr L$39,0x00
    add         word ptr L$22,0x0001
    adc         word ptr L$23,0x0000
    pop         es
    pop         ds
    pop         di
    pop         si
    pop         bp
    pop         dx
    pop         cx
    pop         bx
    mov         ss,word ptr cs:L$44
    mov         sp,word ptr cs:L$43
    pop         ax
L$159:
    iret

That's basically the same code, but just restructured to push flags and do a far call of the next handler first before trying to send a packet. It also uses IRET at the end where the non-working code would let the next handler do the IRET.

Edit: I tried saving and restoring the flags around the call to send the packet but it didn't make a difference, nor should it ... the INT and IRET from the packet driver would do that anyway.
 
Just a stupid idea... can you try saving the registers in some static memory block outside the stack?
Can you try staying on the same stack instead of switching?
Also, can you figure out if any other interrupts fire during the timer handling?
 
I'm not really concerned about the stack switching code, except that it is not safe if it is re-entered. I put in some code to detect re-entry and I couldn't get it to fire. Based on that, it really doesn't matter where I save registers to - the stack or a private data area are equivalent, as long as there is not re-entry.

The timer interrupt should be the highest priority hardware interrupt and other interrupts should not be honored unless other code does something stupid and enables interrupts. The 3Com code is not, at least not before it calls my code. The BIOS is enabling interrupts again before calling INT 1C, but after that is done it will return and the interrupts should be disabled again. My code would have come and gone long before then, and the 3Com packet driver code hasn't even really had a chance to get started; it will also have interrupts disabled when it does run. (I think ... being able to prove it would be nice.)

Unless somebody can point out a tragic error, I'm going with they are using uninitialized memory or trashing a register and that my code change to chain first has side-stepped the issue. That's the most plausible explanation at this point.
 
It's now seeming more unlikely that something like this is the case, but one thing I noticed is that the packet driver interface, like many others, returns an error status in the carry flag. One common trick that interrupt handlers use is to return using RETF 2 instead of IRET, so that they don't have to poke the carry bit into the saved flags on the stack. Only problem with this is that of course it doesn't restore any other flags including the interrupt enable one, so it could be that interrupts will be left enabled after INT 60h.
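The two return styles being contrasted can be sketched like this. It is illustrative only and not taken from the 3Com driver; the labels are hypothetical.

```asm
; Illustrative only; not taken from the 3Com driver.

; Style 1: patch the carry bit into the saved FLAGS image, then IRET.
; After INT the stack holds IP, CS, FLAGS; with BP pushed, FLAGS is at [bp+6].
returnErrorIret:
    push        bp
    mov         bp,sp
    or          word ptr [bp+6],0x0001      ; set CF in the caller's saved flags
    pop         bp
    iret                                    ; caller's IF, DF, etc. are restored

; Style 2: set carry in the live flags and discard the saved FLAGS word.
returnErrorRetf:
    stc
    retf        2                           ; caller gets the handler's current IF and DF
```

Since INT clears IF on entry, style 2 only returns with interrupts enabled if the handler executed STI somewhere along the way, which many handlers do.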

When restoring the registers and jumping to the old timer handler, your code assumes interrupts are still disabled. If the problem is really something that simple, then an explicit CLI might be all that's needed to fix it.
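A sketch of how that could be both tested and worked around on the caller's side, placed right after the packet send in the failing listing. The counter name ifSetCount is hypothetical.

```asm
    call        doPacketInterrupt
    ; Hypothetical diagnostic: ifSetCount is an illustrative name.
    pushf
    pop         ax                          ; AX is scratch here; the caller's AX is on the old stack
    test        ax,0x0200                   ; bit 9 of FLAGS is IF
    jz          ifWasClear
    inc         word ptr cs:ifSetCount      ; driver returned with interrupts enabled
ifWasClear:
    cli                                     ; harmless if IF was already clear
    ; ... original code continues with the counter update and register restore ...

ifSetCount      dw 0
```

The CLI only narrows the window. If the driver does return with IF set, an interrupt can still arrive between the return and the CLI, but by then the handler is on its private stack, so a non-zero ifSetCount is the more useful result of this experiment.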
 
In this particular case I was able to confirm that interrupts are disabled when my code gets control - on entry from the timer interrupt the 3Com packet driver just pushes flags and does the far call to my code. The number of instructions was minimal and there was no opportunity to enable interrupts.

I've also experimented with enabling and disabling interrupts explicitly. Honestly, that tends to cause more harm than it is worth. (The PCjr BIOS keyboard handler is an excellent example; even with interrupts disabled you can get an NMI interrupt to service the keyboard, and a BIOS bug re-enables other interrupts while that is happening.)

(Ranting)

An interrupt handler that actually gets invoked by an interrupt and doesn't do an IRET to return is just kind of broken ... the carry flag and interrupts enabled/flags are just two flags that need to be restored. The direction flag is an important one too. A pox on anybody who takes an interrupt and doesn't restore the state back to what it was when they interrupted.

Software interrupts are a little different because they are more of an explicit call, not an interrupt. But preserving registers and flags is part of the job, which is why I was so diligent in doing it and leaving a generous amount of stack space.
 
Mike, I'd like to take a look at the 3Com driver. Can you point me to a download, please?
 
Mike, which version of the 3C90XPD.COM driver are you using?

Someone in the thread linked below reported back on 2023-02-04 that the 3C90XPD.COM driver version from the 3Com Etherdisk CD 5.4 caused mTCP applications to hard hang the system, while the driver from the older version 5.1 CD worked fine.

https://www.vogons.org/viewtopic.php?t=92487

I started to set up a Pentium MMX 200 based setup, running DOS at least for the time being. As for networking I have 3Com Etherlink XL PCI, model 3C900-COMBO. I have used mTCP as the IP stack for my earlier builds, so wanted to use the same this time. To start with, I downloaded 3Com Etherdisk CD 5.4 from Vogons library:

http://vogonsdrivers.com/getfile.php?fileid=9 … &menustate=34,0

The 3C90XPD.COM packet driver from that package loaded fine, but when trying to run any of the mTCP applications, the system just froze. Recovery possible only via hard reset.

After some research, I was able to find an older version of EtherCD from here:

https://archive.org/download/ethercd-5-1

I mounted the BIN/CUE image, extracted 3C90XPD.COM from there, and that's that: with the older version, mTCP now works perfectly. I'm attaching the extracted packet driver .com and the related config utility extracted from the CD in this message, in case someone stumbles upon the same issue.

I'd be happy to upload also the whole cue/bin image to the Vogons library, but not sure what's the process, so if anyone can help with that, it would be great.

==========

Yeah, the packet driver from the 5.4 CD seems to be buggy. Besides causing random freezes, it also defaults to 10 Mbit/s speed even if the card is capable of more.

Like you, I tried a few older versions of the packet driver, including the one from the 5.1 CD and they all work much better. If you want to try some of those, @Grzyb has kindly posted versions 2.0c and 4.0d in this thread.

I am still curious to try looking into what exactly is happening with the driver using a logic analyzer and a Pentium processor probe to capture an instruction execution trace. I'm waiting on the arrival of some 296-pin SPGA sockets before I can try to get this set up. I'll need to stack a couple sockets between the processor probe and the motherboard CPU socket for physical clearance (and also to physically protect the pins on the bottom of the processor probe).
 
Glen - I'm using 5.2.6 and I'll post the exact binary today. (I've been meaning to do a page for common packet drivers with annotations and notes, so this is an excuse to get it done. Krille was also asking about the binary.)

I'm familiar with the Vogons threads, but keep in mind those threads predate my specific interrupt handling problem with NetDrive. 5.2.6 seems to work correctly with the original mTCP programs, except for being finicky about DHCP right after loading the packet driver. It only crashes when used with NetDrive. The reports from Vogons did not give me warm fuzzies.

When the time comes let me know and I'll give you everything you need to play with this particular problem. It will probably be useful for the code to do a dummy OUT to a port to trigger the logic analyzer; that will be easy to add.
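A possible shape for that trigger, assuming port 0x80 (the POST diagnostic port on AT-class machines) is safe to write on the test system; the marker value 0xa5 is arbitrary:

```asm
    ; Assumption: port 0x80 is free to write on this machine;
    ; 0xa5 is an arbitrary marker value for the analyzer to match on.
    push        ax
    mov         al,0xa5
    out         0x80,al
    pop         ax
```

Dropping this just before the call to doPacketInterrupt would give the analyzer an I/O write cycle with a known address and data pattern to trigger on.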

Side note: the technical specs for the 3C905 variants are online. Too bad they didn't post source code for their drivers.


-Mike
 
I loaded the driver in IDA Pro and I was not impressed with what I saw. The driver is huge, with a massive jump table, inefficiently coded and bloated by lots of code sequences that appear to be CPU delays. I also found lots of signed jcc:s which in my experience is a tell-tale sign of poor quality code.

Things like this don't exactly instill confidence either:

Code:
seg000:6C60 loc_16C60:                              ; CODE XREF: seg000:6D3Dj
seg000:6C60                 inc     cs:word_1444D
seg000:6C65                 mov     ds, cs:SavedStackSegment2		; Load DS with the segment address of a saved stack and
seg000:6C6A                 lds     si, cs:dword_10BB3				; then immediately overwrite it by loading a far pointer.
seg000:6C6F                 mov     word ptr es:[di+12h], ds
seg000:6C73                 mov     es:[di+10h], si
seg000:6C77                 mov     word ptr cs:dword_10BB3+2, es
seg000:6C7C                 mov     word ptr cs:dword_10BB3, di

It also has a huge table of 56 counters for all kinds of events, with a 14-byte label for each counter. With the 2 bytes for the actual counter, that's 56 × 16 = 896 bytes used for statistics, which I suppose is good for debugging but not much else. Add to that the code to update the counters and you quickly realize that they didn't care much about memory usage at 3Com. How much memory does it actually use when loaded?

BTW, does it print anything on screen when loading? I'm asking because it seems the code just prints CRLF instead of the actual strings. I'm probably missing something.

Anyway, I'm not wasting more time on this driver, neither should anyone else in my opinion. I'm sorry I couldn't be of any help Mike!
 
A point to note: I've _always_ had an issue with mTCP and 3C90XPD when the card is configured for full duplex. In half duplex I've never had an issue, so it might be something to try.
 
It outputs the standard messages when loading - checking to see the connection type, and then the resources (I/O addrs, interrupts) that it thinks it found the card at.

The jump table that I saw was fairly large (20+ functions) because it complies with a later version of the packet driver specification that has a whole bunch more functions. One of the newer functions allows the packet driver to do a callback to user code after the packet driver is completely done with the receive flow, including the user part of that. (It allows you to send a response to an incoming packet right after the packet is fully processed, but most packet drivers allow you to do that right from the original receive function, not yet another new function.)

The code you show above is sad, but what scares me more is the comment (yours?) about a saved stack pointer. Did you find evidence of them manipulating the SS and SP registers anywhere? Were you able to see if they have just one stack or did they use multiple stacks?


-Mike
 
I see now that it calls a procedure to print CRLF and that then falls through to another procedure that prints the actual string. So yeah, I had missed something.

I saw the jump table with 29 packet driver functions but immediately after that there's a massive jump table of 2048 entries using up 4096 bytes of RAM.

I searched for accesses to SS:SP and found this (this is after renaming but you get the idea):
Code:
seg000:69E0                 mov     cs:SavedStackSegment1, ss
seg000:69E5                 mov     cs:SavedStackPointer1, sp
seg000:69EA                 mov     ss, cs:SavedStackSegment2
seg000:69EF                 mov     sp, offset SavedStackPointer2
so I renamed the memory locations so they would be easier to see and search for. Note, SavedStackPointer2 is actually not saved, it's just a fixed offset. (I should probably give it a different name). So yes, there is code to switch stacks.

Searching for "SavedStack" finds the following:
Code:
seg000:0A36 SavedStackSegment2 dw 0                 ; DATA XREF: seg000:5B39r seg000:5B67r ... 
seg000:0A38 SavedStackSegment1 dw 0                 ; DATA XREF: seg000:69C7r seg000:69E0w ... 
seg000:0A3A SavedStackPointer1 dw 0                 ; DATA XREF: seg000:69CCr seg000:69E5w ... 
seg000:0B06 SavedStackPointer2 dw 0                 ; DATA XREF: seg000:5B5Br                   
seg000:5B39                 mov     es, SavedStackSegment2                                       
seg000:5B5B                 test    SavedStackPointer2, 4                                        
seg000:5B67                 mov     es, cs:SavedStackSegment2                                    
seg000:5B74                 mov     ds, cs:SavedStackSegment2                                    
seg000:5B8D                 mov     ds, cs:SavedStackSegment2                                    
seg000:5C57                 mov     es, SavedStackSegment2                                       
seg000:5E35                 mov     es, cs:SavedStackSegment2                                    
seg000:5F39                 test    SavedStackPointer2, 80h                                      
seg000:5FCF                 mov     es, SavedStackSegment2                                       
seg000:6069                 mov     ds, cs:SavedStackSegment2                                    
seg000:6770                 test    SavedStackPointer2, 80h                                      
seg000:69C7                 mov     ss, cs:SavedStackSegment1                                    
seg000:69CC                 mov     sp, cs:SavedStackPointer1                                    
seg000:69E0                 mov     cs:SavedStackSegment1, ss                                    
seg000:69E5                 mov     cs:SavedStackPointer1, sp                                    
seg000:69EA                 mov     ss, cs:SavedStackSegment2                                    
seg000:69EF                 mov     sp, offset SavedStackPointer2                                
seg000:6A01                 mov     ds, cs:SavedStackSegment2                                    
seg000:6A89                 mov     ss, cs:SavedStackSegment1                                    
seg000:6A8E                 mov     sp, cs:SavedStackPointer1                                    
seg000:6BD7                 mov     ds, cs:SavedStackSegment2                                    
seg000:6C56                 mov     ds, cs:SavedStackSegment2                                    
seg000:6C65                 mov     ds, cs:SavedStackSegment2                                    
seg000:6CDB                 test    SavedStackPointer2, 2                                        
seg000:6EE9                 mov     ds, cs:SavedStackSegment2                                    
seg000:7324                 cmp     bx, cs:SavedStackSegment2                                    
seg000:734A                 cmp     ax, cs:SavedStackSegment2                                    
seg000:74F1                 test    SavedStackPointer2, 2                                        
seg000:79A1                 mov     ds, SavedStackSegment2                                       
seg000:79D8                 test    SavedStackPointer2, 10h                                      
seg000:7A18                 test    SavedStackPointer2, 20h                                      
seg000:8742                 test    SavedStackPointer2, 80h                                      
seg000:88A6                 mov     SavedStackSegment2, ax                                       
seg000:8AED                 test    SavedStackPointer2, 1                                        
seg000:8B88                 test    SavedStackPointer2, 8                                        
seg000:8BB2                 test    SavedStackPointer2, 8                                        
seg000:8BE4                 mov     ds, cs:SavedStackSegment2                                    
seg000:8BF7                 test    SavedStackPointer2, 8                                        
seg000:8C1A                 or      SavedStackPointer2, 1                                        
seg000:8C20                 or      SavedStackPointer2, 2                                        
seg000:8C26                 or      SavedStackPointer2, 4                                        
seg000:8C2C                 or      SavedStackPointer2, 8                                        
seg000:8C32                 or      SavedStackPointer2, 10h                                      
seg000:8C3E                 or      SavedStackPointer2, 20h                                      
seg000:8C4A                 or      SavedStackPointer2, 40h                                      
seg000:8C50                 or      SavedStackPointer2, 80h                                      
seg000:8C69                 mov     ds, cs:SavedStackSegment2                                    
seg000:90B1                 test    SavedStackPointer2, 8                                        
seg000:90C1                 test    SavedStackPointer2, 10h                                      
seg000:90D1                 test    SavedStackPointer2, 20h                                      
seg000:959E                 test    SavedStackPointer2, 40h                                      
seg000:95C4                 mov     ax, SavedStackSegment2
 
Hi, I found this forum searching "3Com 3C905C-TX-M mass hardware failure".

I'm reporting what happened to me because it has similarities with the topic, in the hope that it could be useful.

I've a Linux computer that acts as a little local server/NAT/gateway, the 3Com card is the second/added card in the system (the other one is an Intel m/b integrated ethernet interface).
For years, until a few months ago, everything worked flawlessly, with the computer providing NAT, dnsmasq and DHCP services to the private network. Then problems suddenly appeared: some downloads from the Internet slowed down markedly (to as little as 1/50 or 1/100 of the expected speed) and DHCP leasing to private network clients became slightly unstable. What is really strange is that the slowdown depended on which remote server was involved. For example, speedtest-cli never showed a slowdown. Things worsened until the DHCP service became unusable.
It was difficult to isolate the problem. I verified that it was not a problem with the service provider (other computers connected directly to the router showed no problems). The OS version was quite old (OpenSUSE 13.2, kernel 3.16.7; the hardware, an Intel Core Duo Quad CPU on a DG965OT motherboard, is also "vintage" in that sense), but an updated version had the same problem. There was no evidence in the log files.
In the end, excluding this network card from operation solved the problem. But so far I haven't found an explanation of what could have happened.
Even worse: I have a number of network cards of the identical model, so I tried some of them on another system and... the problem arose again with each of them!
I still don't know what to think. This hardware is very old; could a single type of aged component cause this behavior? Or a sort of "Y24" bug in the firmware or the driver (which probably hasn't changed much in a long time)?

The similarities I see are that DHCP operation involves ARP packets, and that the lock-ups on a system like DOS could correspond to the slowdown of network streams in Linux.

Gianluca
 