
mTCP NetDrive: Faster code available for testing

mbbrutman

About a year ago I started experimenting with how to make NetDrive faster. I have the next revision of that experiment available for testing, and I am excited by the results. The new code was 1.4 to 2.7x faster when talking to a server on the same network, and 12x to 21x faster when talking to a remote machine across the internet.

I'd like to get some help testing it though. It is rock solid, and I'm not concerned about data corruption. I'd just like to hear what kind of improvements other people see on their machines. The new code requires both the DOS and server side be updated. The downloads are at:

Some quick notes:
  • You must update to the new server to test this code. (Your DOS machines using older code can use the new servers; if they can't that is a bug.)
  • Slower machines do not see as much of a benefit, as they can barely keep up with packet processing. But even an XT connecting to a remote server saw more than a 2x improvement when reading and more than a 3x improvement when writing.
  • I need your feedback before I go further and make this widely available.
Thanks,
Mike
 
This is the result of my testing against your server. The OS is MS-DOS 6.22 running under QEMU on my netbook (HP Mini 5101) over wifi. I've included the batch file used for testing. Let me know if you want me to do more testing or test something specific.
 


Thanks for looking - you are the first person to report back.

It looks like you got a 5x speedup (29.8KB/sec vs. 5.7KB/sec) when connecting to a remote server, which in your case was very remote - Europe going to the central US. But the card did not handle more than 7 extra read-ahead packets per read request, so your machine wasn't keeping up with the incoming packets and the buffer space on the card was limited. (And realizing this is all emulated, who knows what is going on ...)

Also, I need to remove the IOTEST program because heavy network traffic can cause the timer tick to be missed, distorting the clock on the computer and ruining the speed computation. I wouldn't believe those numbers at all.
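
To illustrate why missed timer ticks inflate a speed figure: the standard PC timer interrupt fires about 18.2 times per second (a 1,193,182 Hz clock divided by 65,536), and DOS tracks time by counting those ticks. The sketch below is Python for readability and is not taken from IOTEST; the transfer size and the number of lost ticks are made-up values.

```python
# Standard PC timer: 1,193,182 Hz input clock divided by 65,536.
PIT_HZ = 1_193_182
TICKS_PER_SECOND = PIT_HZ / 65_536   # about 18.2 ticks per second


def measured_kb_per_sec(kbytes, real_seconds, lost_ticks):
    """Throughput as seen by a program that times itself with the tick count."""
    real_ticks = real_seconds * TICKS_PER_SECOND
    counted_ticks = real_ticks - lost_ticks      # ticks the clock actually saw
    counted_seconds = counted_ticks / TICKS_PER_SECOND
    return kbytes / counted_seconds


# Hypothetical transfer: 400 KB in 10 real seconds.
true_speed = measured_kb_per_sec(400, 10, lost_ticks=0)
skewed_speed = measured_kb_per_sec(400, 10, lost_ticks=91)  # roughly half the ticks lost

print(f"true:   {true_speed:.1f} KB/sec")
print(f"skewed: {skewed_speed:.1f} KB/sec")
```

With about half of the ticks lost, the same transfer appears to run roughly twice as fast, because the program believes less time has passed.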

Very good on checking verify on vs. verify off ... there is a story or two about the verify changes. (I'm using CRC16 instead of sending all of the data back for a byte-by-byte comparison.)


Mike
 
Thanks for looking - you are the first person to report back.
I hope I won't be the only one. I actually don't understand why people aren't more interested in NetDrive. It's just about the best thing since sliced bread, at least when it comes to vintage PCs. And I'm not saying that to butter you up; it is the reason why I've spent a huge amount of time working on it myself.

Anyway...
It looks like you got a 5x speedup (29.8KB/sec vs. 5.7KB/sec) when connecting to a remote server, which in your case was very remote - Europe going to the central US. But the card did not handle more than 7 extra read-ahead packets per read request, so your machine wasn't keeping up with the incoming packets and the buffer space on the card was limited. (And realizing this is all emulated, who knows what is going on ...)
Those results were lower than what I usually see, but not by much. This test is a more representative example:
nd_cache_test_d_30.png

Also, I need to remove the IOTEST program because heavy network traffic can cause the timer tick to be missed, distorting the clock on the computer and ruining the speed computation. I wouldn't believe those numbers at all.
For some reason those very inflated numbers only happen when I do the testing using the batch file. When running the tests manually, writes (with verify on or off) are in the 42-44 kilobyte/s range and reads are slightly slower at just over 38 kilobytes/s. I guess the clock is still lagging because of missed timer ticks.

The fact that heavy network traffic causes the clock to lag is a problem. I've noticed that you keep interrupts disabled a lot and I hope that it's not necessary to avoid data corruption? I would rather have slower transfer speeds than a lagging clock.

Very good on checking verify on vs. verify off ... there is a story or two about the verify changes. (I'm using CRC16 instead of sending all of the data back for a byte-by-byte comparison.)
You know, I almost had a hissy fit when I first saw that routine because I thought it was part of some kind of anti-tampering code in an attempt to prevent disassembly/reverse engineering. ;) But then I discovered that it's only called from the Write with Verify routine and I've come to see the beauty of it. The client can send the data to the server and then do the CRC16 calculation while waiting for the server to 1) receive the data and store it 2) re-read the data and calculate the CRC16 value (presumably using the same seemingly random data as the client?) and 3) send the resulting 16-bit value back to the client for verification. The result packet being a lot smaller means that some of the processing that used to be spent on handling the "returning data" packet can now be spent on doing the CRC16 calculation instead. It's quite brilliant actually. (OK, now I *am* buttering you up. LOL)
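
A rough sketch of the write-with-verify exchange described above, in Python for readability. The thread does not say which CRC16 variant NetDrive uses, so CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is an assumption here, and the two function names and the dictionary standing in for the server's disk image are hypothetical.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (assumed variant: poly 0x1021, init 0xFFFF)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def server_write_with_verify(storage: dict, block_no: int, data: bytes) -> int:
    """Hypothetical server side: store, re-read, reply with a 16-bit CRC."""
    storage[block_no] = data
    return crc16_ccitt(storage[block_no])


def client_write_with_verify(storage: dict, block_no: int, data: bytes) -> bool:
    """Hypothetical client side: send the data, then compare CRCs."""
    # The direct call stands in for the network round trip.
    reply_crc = server_write_with_verify(storage, block_no, data)
    local_crc = crc16_ccitt(data)
    return local_crc == reply_crc


disk = {}
print(client_write_with_verify(disk, 0, bytes(512)))
```

The reply carries two bytes instead of the whole block, which is the saving being praised here. In this sketch the client computes its CRC after the reply arrives only because a function call cannot overlap with anything; on the real client that calculation is what can overlap with the server's work.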
 
I hadn't used NetDrive yet and wanted to test it out over SLIP. Not sure how useful the testing feedback would be since the 38400 serial port connection is likely to be the biggest bottleneck.

However, I'm having a problem on connect.. when testing with your hosted drive I get:

Code:
Resolving brutman.com, press [ESC] to abort.
Server ip address is: 35.206.78.10
Next hop address: FF:FF:FF:FF:FF:FF
Error: Timeout waiting for Connect message response.

The timeout message comes after just like 2 or 3 seconds.

I get the exact same thing when attempting to connect to a local server as well. This is using my WiRSa wifi modem with my own SLIP implementation, so I'm not sure if I'm missing something or what's going on. Mind you, all other mTCP software (ping, telnet & ftp) is working fine.
 
I actually don't disable interrupts often, except on the write-verify path.
  • Interrupts are disabled for a few cycles at the beginning of a read while I clear out the read-ahead cache bitmap.
  • Interrupts are disabled for a few cycles on each block being read to check the cache.
  • Interrupts are disabled while computing the CRC for write verify blocks - the time depends on the size of the write.
Ignoring the write-verify path, the problem is probably in the packet driver. Not all packet drivers are the same, but the ones I'm using look like they are calling my receiver routine as fast as they can when they have packets, and they are not giving other interrupts time to process. They really should give the receiver one packet, then mark the interrupt as done - that would allow for the timer interrupt.

I actually tested this by re-enabling interrupts in the receiver, which fixed the time problem but is unsafe to do because a packet driver may also hook the timer interrupt, and if they didn't put the reentrancy checks in their code that's going to crash the machine. It's not something I can do much about.

The CRC calculation after the send was supposed to save some time, and I'm sure it does. But on large writes it is possible (and even likely!) that the server will return a response *before* the CRC has been calculated. That causes an immediate mis-compare. The only way to avoid this is to calculate the CRC before sending the last packet of the write, or to hold off interrupts until the CRC is calculated. I've chosen the second option for now while I investigate lighter weight checksum/CRC checks.

In practice this is only an issue on slower machines doing large writes. File copies might cause some timer ticks to be lost, as they use larger buffers. Limiting the size of the write to 8 or 16KB instead of 32KB would help, at the expense of network speed.
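
A minimal sketch of the write-size limit mentioned above, in Python for readability. The helper name and the way the limit is passed in are hypothetical, not NetDrive's actual option; the point is only that each smaller write needs a shorter CRC pass, so interrupts are held off for less time per write.

```python
def split_write(data: bytes, max_write: int):
    """Yield (offset, chunk) pieces no larger than max_write bytes."""
    if max_write <= 0:
        raise ValueError("max_write must be positive")
    for offset in range(0, len(data), max_write):
        yield offset, data[offset:offset + max_write]


buffer_32k = bytes(32 * 1024)
pieces = list(split_write(buffer_32k, 8 * 1024))
print(len(pieces), [len(chunk) for _, chunk in pieces])
# prints: 4 [8192, 8192, 8192, 8192]
```

The cost is the one already named: four request/response round trips instead of one for the same 32KB.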
 
@nullvalue

Did you run through all of the instructions for setting up SLIP with mTCP? In particular, the address assignments are static and have to be chosen as per the instructions, and you need to set an environment variable to tell mTCP that it is on a SLIP connection. And of course, you need to set up forwarding and proxy ARP on the machine in the middle.

I use a patched version of EtherSlip that you can find at https://www.brutman.com/Device_Drivers/packet_drivers/EtherSlip_v11.7_patched.zip. The standard version of EtherSlip has a bug that prevents NetDrive from working. (EtherSlip also has another serious bug in it that may crash the machine or corrupt data depending on how things land in memory. I need to get that patched and posted as well.)

mTCP and NetDrive definitely work, but start with ping first. Also check to make sure you are connecting to the correct port at brutman.com - it is port 2002.


Mike
 
Oh geez! Yes, my mTCP implementation is fully working on this machine, but I was entering the port as 2020 instead of 2002. OK, I at least got a bit further this time: it starts the session and I see the connection on the server side, but then I get a "Read fault error reading drive F". I'm going to try your patched EtherSlip version and see if that helps.
 
OK, the patched version of EtherSlip worked! I was able to connect to your drive, list contents... loaded 3DSTARS and even played Castle Adventure! 😁 (That first DIR takes quite a while to come back, which is probably expected.)

running nd_cache test now..
 
You definitely need the patched EtherSlip version ...

The standard one allows for the connection to be made, but then fails to work when the device driver tries to send data. That's because the device driver is checking the MAC address in the packets, and the EtherSlip driver is putting in a different fake MAC address in the packets than what it told mTCP the MAC address was at startup.

If you are using that new test code I wouldn't bother trying it with the read-ahead turned on. For a SLIP connection you are better off with no-readahead at all, as you can't buffer packets. I'll change the code to detect that and disable read-ahead on SLIP connections. For now just disable read-ahead in the test code, or use the existing published code included in mTCP.
 
I think this just confirms what you were saying but figured I'd paste the results anyways:

Code:
mTCP nd_cache by M Brutman (mbbrutman@gmail.com) (C)opyright 2025-2026
Version: Apr  9 2026

NetDrive device opened, current read-ahead cache size is 64 KB

Running trials for 30 seconds, use Ctrl-Break to stop early.
A ** after a measurement indicates a new best speed.

Cache:  0 KB,     88 KB read in 30 seconds,     2.9 KB/sec  **
Cache:  1 KB,     96 KB read in 30 seconds,     3.2 KB/sec  **
Cache:  2 KB,     42 KB read in 31 seconds,     1.3 KB/sec
Cache:  3 KB,     84 KB read in 30 seconds,     2.8 KB/sec
Cache:  4 KB,     55 KB read in 30 seconds,     1.8 KB/sec
Cache:  5 KB,     48 KB read in 33 seconds,     1.4 KB/sec
Cache:  6 KB,     42 KB read in 35 seconds,     1.2 KB/sec
Cache:  7 KB,     40 KB read in 33 seconds,     1.2 KB/sec
Cache:  8 KB,     45 KB read in 33 seconds,     1.3 KB/sec
Cache: 10 KB,     44 KB read in 33 seconds,     1.3 KB/sec
Cache: 12 KB,     39 KB read in 32 seconds,     1.2 KB/sec
Cache: 14 KB,     45 KB read in 41 seconds,     1.0 KB/sec
Cache: 16 KB,     34 KB read in 30 seconds,     1.1 KB/sec
Cache: 18 KB,     38 KB read in 34 seconds,     1.1 KB/sec
Cache: 20 KB,     42 KB read in 32 seconds,     1.3 KB/sec
Cache: 22 KB,     46 KB read in 35 seconds,     1.3 KB/sec
Cache: 24 KB,     50 KB read in 41 seconds,     1.2 KB/sec
Cache: 26 KB,     54 KB read in 45 seconds,     1.2 KB/sec
Cache: 28 KB,     58 KB read in 43 seconds,     1.3 KB/sec
Cache: 30 KB,     62 KB read in 43 seconds,     1.4 KB/sec
Cache: 32 KB,     66 KB read in 51 seconds,     1.2 KB/sec 

Read-ahead cache restored to 64 KB, use the set_max_ra command
to set it to a new value.
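
For anyone comparing several runs, a small sketch for picking the best cache size out of output like the above. It is Python run on a modern machine, not part of mTCP; the sample text is a few lines copied from this run, and the regular expression assumes the line layout stays as shown.

```python
import re

# A few lines copied from the nd_cache output above.
SAMPLE = """\
Cache:  0 KB,     88 KB read in 30 seconds,     2.9 KB/sec  **
Cache:  1 KB,     96 KB read in 30 seconds,     3.2 KB/sec  **
Cache:  2 KB,     42 KB read in 31 seconds,     1.3 KB/sec
Cache:  8 KB,     45 KB read in 33 seconds,     1.3 KB/sec
Cache: 32 KB,     66 KB read in 51 seconds,     1.2 KB/sec
"""

LINE = re.compile(r"Cache:\s*(\d+) KB,\s*(\d+) KB read in (\d+) seconds")


def parse_trials(text):
    """Return a list of (cache_kb, kb_read, seconds) tuples."""
    return [tuple(int(g) for g in m.groups()) for m in LINE.finditer(text)]


def best_trial(trials):
    """Pick the trial with the highest KB/sec."""
    return max(trials, key=lambda t: t[1] / t[2])


trials = parse_trials(SAMPLE)
cache_kb, kb_read, seconds = best_trial(trials)
print(f"best: cache {cache_kb} KB at {kb_read / seconds:.1f} KB/sec")
# prints: best: cache 1 KB at 3.2 KB/sec
```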
 
I love NetDrive, but I'm currently on a business trip. I will do some testing as soon as I get back. Great work BTW!
 
Interrupts are disabled for a few cycles at the beginning of a read while I clear out the read-ahead cache bitmap.
I don't see why that is necessary. The server(s) won't be sending anything before getting a request from the client. And DOS is not a multitasking OS so there will never be more than one request at a time. But even if the server(s) (or someone else, maliciously*) would send a NetDrive packet to the client, there are extensive checks in the receiver that will cause the packet to be ignored since it has not been requested by the client. Other packets can also arrive such as broadcasts (ARP Requests) or packets destined for the foreground application. None of these packets will affect the read-ahead cache bitmap.

* In theory, I guess someone could spam the client with spoofing packets that could eventually match a valid packet all the way down to the last used Unit Sequence but that seems unlikely, to say the least. It's also a possibility that someone along the path between the server and client could, quite easily, spoof a valid packet but if that happens you have bigger problems anyway.
Interrupts are disabled for a few cycles on each block being read to check the cache.
This makes sense as this is all in the outer read loop where packets can arrive from the server anytime.
Interrupts are disabled while computing the CRC for write verify blocks - the time depends on the size of the write.
I guess this is to keep the next arriving packet from overwriting the packet currently being processed.

Ignoring the write-verify path, the problem is probably in the packet driver. Not all packet drivers are the same, but the ones I'm using look like they are calling my receiver routine as fast as they can when they have packets, and they are not giving other interrupts time to process. They really should give the receiver one packet, then mark the interrupt as done - that would allow for the timer interrupt.

I actually tested this by re-enabling interrupts in the receiver, which fixed the time problem but is unsafe to do because a packet driver may also hook the timer interrupt, and if they didn't put the reentrancy checks in their code that's going to crash the machine. It's not something I can do much about.
Do you know if any packet driver like this actually exists? I'm asking because the packet driver specification says that interrupts are allowed to be enabled while in the receiver, so if any packet driver crashes because of this then that's on the driver developer. Packet driver specification v1.11 even mentions that older versions of PC/TCP enable interrupts in the receiver, so it sounds like it has always been this way. In other words, I would expect most, if not all, packet drivers to be able to handle this.

The CRC calculation after the send was supposed to save some time, and I'm sure it does. But on large writes it is possible (and even likely!) that the server will return a response *before* the CRC has been calculated. That causes an immediate mis-compare. The only way to avoid this is to calculate the CRC before sending the last packet of the write, or to hold off interrupts until the CRC is calculated. I've chosen the second option for now while I investigate lighter weight checksum/CRC checks.

In practice this is only an issue on slower machines doing large writes. File copies might cause some timer ticks to be lost, as they use larger buffers. Limiting the size of the write to 8 or 16KB instead of 32KB would help, at the expense of network speed.
For me personally, clock lag is not acceptable. Speedy writes are not going to be a requirement or even high on the wish list for most people, I think. Limiting write sizes seems like the right choice if that's what it takes.

Regarding lighter weight checksum/CRC checks: since the routine is only called from one place, I would inline it to avoid the call/ret overhead. Also, the routine itself is 4 bytes larger than it has to be in total, and 5 bytes larger than it has to be in the actual checksumming loop. I'm sure that adds up when running on the PCjr. ;)
 
I don't see why that is necessary. The server(s) won't be sending anything before getting a request from the client. And DOS is not a multitasking OS so there will never be more than one request at a time. But even if the server(s) (or someone else, maliciously*) would send a NetDrive packet to the client, there are extensive checks in the receiver that will cause the packet to be ignored since it has not been requested by the client. Other packets can also arrive such as broadcasts (ARP Requests) or packets destined for the foreground application. None of these packets will affect the read-ahead cache bitmap.

You have to think beyond the simple "happy" path. This is UDP. Packets can arrive out of order, duplicated, or late. You can't risk having the device driver being interrupted while setting up a read - that could lead to data corruption.
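
To make the three cases concrete, here is a generic sketch of bitmap-based tracking for one outstanding read, in Python for readability. It is not NetDrive's actual code: the class, field names, and the idea of matching on a per-request sequence number are assumptions for illustration.

```python
class ReadRequest:
    """Tracks which blocks of one outstanding read have arrived (hypothetical model)."""

    def __init__(self, sequence: int, block_count: int):
        self.sequence = sequence
        self.blocks = [None] * block_count   # data for each block, by index
        self.received = 0                    # bitmap: bit i set once block i has arrived

    def on_packet(self, sequence: int, index: int, data: bytes) -> str:
        if sequence != self.sequence:
            return "late"                    # belongs to an earlier request: ignore
        if not 0 <= index < len(self.blocks):
            return "invalid"
        if self.received & (1 << index):
            return "duplicate"               # already have this block: ignore
        self.blocks[index] = data            # index, not arrival order, decides placement
        self.received |= 1 << index
        return "stored"

    def complete(self) -> bool:
        return self.received == (1 << len(self.blocks)) - 1


req = ReadRequest(sequence=7, block_count=3)
events = [(6, 0, b"old"), (7, 2, b"C"), (7, 0, b"A"), (7, 2, b"C"), (7, 1, b"B")]
results = [req.on_packet(*event) for event in events]
print(results, req.complete())
```

In this model the constructor is the step that must not be interrupted: if a packet handler ran while the sequence number and bitmap were only half reset, a stale packet could be checked against a mix of old and new state.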

Do you know if any packet driver like this actually exists? I'm asking because the packet driver specification says that interrupts are allowed to be enabled while in the receiver, so if any packet driver crashes because of this then that's on the driver developer. Packet driver specification v1.11 even mentions that older versions of PC/TCP enable interrupts in the receiver, so it sounds like it has always been this way. In other words, I would expect most, if not all, packet drivers to be able to handle this.

I don't know, but I also don't want to find out. My responsibility as a device driver writer is to defend against data corruption and crashes. We've all seen how bad the 3Com 3C905 packet driver is. I assume there are other bad ones out there, not necessarily due to malice. Guaranteeing that every packet driver is re-entrant safe is not on my todo list.

For me personally, clock lag is not acceptable. Speedy writes are not going to be a requirement or even high on the wish list for most people, I think. Limiting write sizes seems like the right choice if that's what it takes.

That's why I included an option to limit the write size. I'll figure out what the default should be another time; for now I need to ensure it is functional.
 
You definitely need the patched EtherSlip version ...

The standard one allows for the connection to be made, but then fails to work when the device driver tries to send data. That's because the device driver is checking the MAC address in the packets, and the EtherSlip driver is putting in a different fake MAC address in the packets than what it told mTCP the MAC address was at startup.

If you are using that new test code I wouldn't bother trying it with the read-ahead turned on. For a SLIP connection you are better off with no-readahead at all, as you can't buffer packets. I'll change the code to detect that and disable read-ahead on SLIP connections. For now just disable read-ahead in the test code, or use the existing published code included in mTCP.

@nullvalue - I've posted a fixed EtherSlip at https://www.brutman.com/Device_Drivers/packet_drivers/EtherSlip_v11.8.zip . It has two bug fixes, including a data corruption bug. (See the readme file in the ZIP for details.)
 
You have to think beyond the simple "happy" path. This is UDP. Packets can arrive out of order, duplicated, or late. You can't risk having the device driver being interrupted while setting up a read - that could lead to data corruption.
Ok, I can see how a late packet could cause problems. How are duplicated and out-of-order packets handled by the driver?

I don't know, but I also don't want to find out. My responsibility as a device driver writer is to defend against data corruption and crashes. We've all seen how bad the 3Com 3C905 packet driver is. I assume there are other bad ones out there, not necessarily due to malice. Guaranteeing that every packet driver is re-entrant safe is not on my todo list.
No one is asking you to guarantee that. You're not responsible for other people's bugs. Trying to defend against such a scenario by introducing a bug in your own code doesn't make sense. (I consider a lagging clock to be a bug.) Data corruption is very bad, yes. But the machine hanging because of a problematic packet driver is just a nuisance. And that's assuming this problem even exists to begin with.

That's why I included an option to limit the write size. I'll figure out what the default should be another time; for now I need to ensure it is functional.
Oh right. I forgot about that option. BTW, would it make sense to have a speed test for writes just like for reads?
 
No one is asking you to guarantee that. You're not responsible for other people's bugs. Trying to defend against such a scenario by introducing a bug in your own code doesn't make sense. (I consider a lagging clock to be a bug.) Data corruption is very bad, yes. But the machine hanging because of a problematic packet driver is just a nuisance. And that's assuming this problem even exists to begin with.

The machine won't hang; I don't know where you are getting that from. We are talking about losing some timer ticks which affects the DOS time of day tracking, not a frozen machine.

The trade-off is between possibly losing some timer ticks, or re-enabling interrupts in a place where it may not be safe. Even if the packet driver is broken, there are many where source code is not available and nobody is maintaining them. I need to keep my code safe and work around other people's existing bugs.

Oh right. I forgot about that option. BTW, would it make sense to have a speed test for writes just like for reads?

I have that code already, but it needs to be updated to use a different time source just like the read performance testing code does.

Unlike reads, writes will always benefit from larger write sizes. The only consideration is how fast the machine is and how much time slippage it causes due to missed timer ticks. With reads there is a different problem: packet flooding causes lost packets on the DOS machine. That's generally not possible when doing writes because the current servers are so fast.
 
The machine won't hang; I don't know where you are getting that from. We are talking about losing some timer ticks which affects the DOS time of day tracking, not a frozen machine.
You misunderstood me. With a packet driver that hooks the timer interrupt and that does not check for re-entrancy, the machine will hang.

The trade-off is between possibly losing some timer ticks, or re-enabling interrupts in a place where it may not be safe. Even if the packet driver is broken, there are many where source code is not available and nobody is maintaining them. I need to keep my code safe and work around other people's existing bugs.
Having interrupts disabled for too long doesn't just affect the timer. It very well might break other things. Though, the way I see it, it's bad enough if only the timer lags. If this problem actually exists (and that's a big if), there are many options for the presumably very few users affected by it: change the driver to another version, change the NIC to something else with a less broken packet driver, or fix the driver (with or without source). I would be willing to help with the latter. The other option is that *everyone* suffers from a lagging clock, or worse.

I have that code already, but it needs to be updated to use a different time source just like the read performance testing code does.

Unlike reads, writes will always benefit from larger write sizes. The only consideration is how fast the machine is and how much time slippage it causes due to missed timer ticks. With reads there is a different problem: packet flooding causes lost packets on the DOS machine. That's generally not possible when doing writes because the current servers are so fast.
Regarding packet flooding on reads: would it be possible to implement some kind of flow control on the server side?
 
Flow control ... yes, but only up to a certain point. That's why I have the read test. (And the read test is actually more sophisticated than what I released; I simplified it a little bit to keep it manageable.)

Super-tight flow control fails on noisy or bursty networks, especially with network cards that don't buffer a lot of packets.
 
Having interrupts disabled for too long doesn't just affect the timer. It very well might break other things.
You sound like our management: "we have concerns" (whether or not it matters in reality).

Losing a few ticks here and there generally does not break things. There are lots of badly written applications blocking interrupts unnecessarily (games in particular) and nobody seems to notice. DOS will also synchronize with the RTC after a reboot, so there is no permanent damage, either. Exceptions may apply, but nothing prevents you from running NTPDATE every hour or so.

I much prefer a few lost timer ticks, with a configurable trade-off, to some users having to watch their systems crash and burn with no way to avoid it. Defensive coding is a good thing. Dealing with old hardware doesn't have the luxury of a modern Linux environment, where you just file a GitHub issue, fix the bug and have everyone update. That doesn't work for decades-old binary-only drivers.
 