mTCP NetDrive - An alternative version!

Krille

Veteran Member
Joined
Aug 14, 2010
Messages
1,268
Location
Sweden
Background
When I discovered that Mike Brutman had released his first version of NetDrive I became curious to see what the code looked like, how it worked etc, so I loaded the device driver (NETDRIVE.SYS) in IDA Pro to take a quick peek. As I was reading the code I saw plenty of room for improvement in the form of optimizations so I figured it would be a fun project to properly reverse engineer it just to see how much I could reduce the size of the driver.

As time went by, Mike would release updates and I would update my version of the driver to keep it up to date with his changes while continuing work on the optimizations. Eventually, the project became more of a fork than just optimizations because I started adding functionality before Mike. (I wanted the ability to use other packet driver applications while having drives connected with NetDrive.) This is why the packet driver emulation is so different between the two drivers and there are pros and cons to each (more on that below).

Well over a year later, this is the result.

Differences
There are many differences between my driver and Mike's, but I'll just mention the major ones:

* My driver can be customized with up to eleven different defines when building it. The overall objective is to reduce memory usage, so most of the defines cut some functionality - for example, the packet driver emulation, the read-only RAMDrive, the code for collecting statistics, etc.

Here's a table for easy comparison:
Code:
                             My driver                      Mike's driver
                             (smallest)     (largest)
   File size:                4170 bytes     5263 bytes      6768 bytes
   Memory usage (1 drive):   4336 bytes     5440 bytes      6144 bytes
* If you hold Control while my driver is loading you will get 1) additional info about the build flags (defines) used when the driver was built (useful for determining the exact version for debugging, or for recreating the driver at a later time), and 2) the option of not loading the driver (useful if the driver causes a hang at boot under older DOS versions where stepping through CONFIG.SYS is not possible).

* My driver has HLT instructions in the polling loops where the driver is waiting for a response from the server. The idea is to reduce power usage and heat emission. Note that this can, in rare cases, cause problems (hangs) on some buggy CPUs, such as early revisions of the 486DX4-100. If so, adding NOHALT to the end of the command line in CONFIG.SYS will disable this functionality. (Please let me know if this is required for any of your machines.)

* The packet driver emulation in Mike's driver can only accept one protocol registration at a time. This means that it will only work with applications that use just one protocol, or with applications that register themselves as receivers of all packet types - the mTCP applications all belong to this group. Any application that uses more than one protocol and does not register itself as a receiver of all packet types will not work with Mike's driver.

My driver will accept one registration each for IP and ARP, or one registration for all packet types. Any registration attempts for other protocols will be passed on to the real packet driver (where they will probably fail since NETDRIVE.SYS is already registered for all packet types).

In practice there's very little difference between the drivers in this regard. The mTCP applications seem to work just fine with my driver, but I've not been able to get ARACHNE or LINKS to work. Please let me know if you find a packet driver application that works with my driver but not Mike's (or vice versa).

* My driver's packet driver emulation feature hooks the same interrupt as the actual packet driver. In other words, it won't use the packet driver shim interrupt in your mTCP configuration file. It should still work just fine with that enabled, though, so no configuration changes are required.

* Due to the way my driver is initialized, you cannot unload the packet driver once you've connected a NetDrive, even if you disconnect all drives. If you need to be able to unload the packet driver without a reboot, then you must use Mike's driver.

Disclaimer
This code has not been extensively tested. I will not accept responsibility for any data corruption or data loss. Make backups before using this driver.

Download
I do plan to release this as open source but for now it's just binaries. The source still needs a lot of work - clean ups, adding more comments and such to make it easier to read and maintain. Meanwhile, if you're really curious, just do what I did - reverse engineer it! :)

The download contains two versions of the device driver, and you can probably guess the differences just by looking at the file name extensions:
NETDRIVE.086 - For use with 8088 and 8086 CPUs. You can use this version with any CPU, but it requires slightly more memory.
NETDRIVE.186 - For use with everything else, including the NEC V20/V30 CPUs.

These drivers are the full-featured versions - no defines have been used to reduce memory usage, which means they are functionally equivalent to Mike's latest released version (250110).

Feedback
I would love to get all kinds of feedback - bug reports, benchmarks, feature requests and so on.

Thanks!
 

Attachments

Hi,

Not to be too nit-picky here, but my code is GPL 3.0 and if you started with my code, then your code is also GPL 3.0. Which means you are bound by the GPL 3.0 license terms too:
  1. You have to ship the source code, not just the binaries.
  2. You need to include the GPL 3.0 license file in whatever you distribute.
  3. You need to attribute the original author and their work in whatever you ship.
Also, I would encourage you to do more testing. I do extensive testing on a variety of machines and virtual machines to ensure it works and doesn't corrupt data. I really don't want alternative versions of NetDrive doing something bad, and then I have to deal with the fallout or my project's name gets tarnished.
 
OK, so it's been a couple of weeks and I've been working hard on getting my source code presentable to the world. So here it is, as promised. Be warned though, there's so much conditional assembly that trying to read it might cause your head to explode. :)

@mbbrutman: Mike, I think this should be in alignment with the license. Please let me know if I've missed something.
 

Attachments

OK, so I'm officially an idiot. The two versions I've released so far both have serious bugs in them.

The first didn't support multiple drives due to the -D:xx parameter not being recognized because of a very simple last minute change that I didn't actually test (because it was, after all, a *very* simple change).

The second release was worse - I reduced the size of the OutgoingPacket buffer too much, forgetting to account for the space required by the NetDrive protocol header, so the very first two-sector write overwrote the StrategyRoutine as well as the start of the InterruptRoutine. On the next call to the driver it would crash, hanging the machine. There was also another bug: I had removed the filler needed to pad ARP response packets to the minimum Ethernet frame size (60 bytes). That in itself would have been fine, were it not for the fact that NETDRIVE.EXE writes to that area, overwriting some variables in the driver.

So apologies to everyone for wasting your time. Not that there have been any complaints. In fact, there's been no feedback at all, which makes me wonder if all the downloads are made by bots? :D

Anyway, third time's the charm?

This release supports the new read-ahead cache functionality so it is equivalent to Mike's latest release (250428).
 

Attachments

FYI, that zip file with the read-ahead code isn't what I'd call a release. It's test code that I wanted people to try out to see if it was interesting to them or if they could see any speed improvements. As you noted elsewhere there has been no feedback, so I'm going to assume the feature is not interesting to people.

Going forward I would advise you to not bother reverse engineering or re-implementing changes like that because they may not stick. If I release source code and do a formal release, then you know it's going to stick around.

(That particular code was tricky, even with the comments and source code. So unless you copied everything I did including the order of some of the instruction sequences, you might have more unpleasant bugs. The tricky parts were ensuring that late UDP packets did not introduce stale data into the read-ahead cache and keeping the read-ahead streams of multiple different drive letters from interfering with each other.)
 
> FYI, that zip file with the read-ahead code isn't what I'd call a release. It's test code that I wanted people to try out to see if it was interesting to them or if they could see any speed improvements. As you noted elsewhere there has been no feedback, so I'm going to assume the feature is not interesting to people.

The downside of the read-ahead cache, at least in its current form, is that it requires a lot of memory. Presumably very few people will connect to a server outside of their own home network so the performance is going to be acceptable as-is without read-ahead caching. For most people the speed increase is just not going to be worth the additional memory usage.

Yes, I had realized that this is test code and that you have plans for future changes. I don't know exactly what you had in mind, but I would like to suggest the following idea I had (and I'm guessing your plans involve something similar): add another parameter to the CONFIG.SYS command line to enable the read-ahead cache and configure its buffer size. For example, -B:x where x is a digit in the range 0 to 4 (or perhaps 1 to 4). Then the cache buffer size and the size of the array of StartingSectorNumbers can be set during initialization. Doing this would allow people to adjust the memory requirement of the driver to their own liking.

> Going forward I would advise you to not bother reverse engineering or re-implementing changes like that because they may not stick. If I release source code and do a formal release, then you know it's going to stick around.

No worries, Mike. I do this for fun and for the challenge of making the driver as small as possible. If I'm having fun it's not a waste of time, is it? Besides, I've made the driver so modular that it's actually absurd. I can now make the driver with the NOREADAHEAD define and it's like it never existed. Since it's all in blocks of conditional assembly it's very easy to rip out if needed.

> (That particular code was tricky, even with the comments and source code. So unless you copied everything I did including the order of some of the instruction sequences, you might have more unpleasant bugs. The tricky parts were ensuring that late UDP packets did not introduce stale data into the read-ahead cache and keeping the read-ahead streams of multiple different drive letters from interfering with each other.)
I did start to wonder why you copied DriveData.wUnitSequence (as I call it) to a separate variable, and I almost undid that to optimize the variable away, but I figured there was probably a good reason for it. I later found out that all the read-ahead sectors are transferred under the same sequence number as the original request, so I eventually worked it out.

I did, however, optimize away the two word variables after the array of StartingSectorNumbers - they are now kept entirely in registers. I also removed the shift-by-CL in the read loop, since it's very slow and the access pattern was linear anyway (the index is simply incremented, so there's no point in shifting all the way from bit 0 every time).

However, despite (or perhaps because of) all my optimizations, my driver is consistently slower than yours for all read-ahead sizes larger than 0. In fact, the larger the read-ahead size is, the slower my driver is, relatively speaking. My driver is slightly faster when the read-ahead is set to 0 though. My flabbers are thoroughly gasted. :)

I don't know if it's because I'm doing all my testing in QEMU (and QEMU, for some weird reason, emulates the longer and slower instructions faster than they run on real hardware) or if it's just that the loop churns through the read-ahead sectors faster than they can arrive, resulting in the driver sending a request for more to the server. BTW, how does the server handle the read-ahead sector streaming? Does it just send everything at once as fast as possible, or does it insert a delay between packets? I'm guessing the latter.

Anyway, here's some feedback from me at least. As I said, the testing is all done under QEMU. The results from the nd_cache utility are actually quite misleading due to the way it does rounding so I've also used IOTEST for testing transfer speeds.
 

Attachments

That's funny because I wrote ND_CACHE to get around limitations I found in IOTEST.

ND_CACHE should be pretty accurate, except it is constrained by the 55ms granularity of the BIOS timer tick.
  • It waits for the BIOS timer tick counter to be updated before starting a test, eliminating on average 27.5 ms of rounding error.
  • It uses DOS absolute sector reads to avoid the overhead of any FAT lookups needed during the test.
  • It rotates through sectors on the disk to eliminate the possibility of help from DOS BUFFERS or old read-ahead data.
  • It waits for the last issued read to complete, and then computes the rate based on the total elapsed time, not just cutting off exactly at 5 seconds or whatever the test length was.
  • Even though it's using integer math it's scaling the results so that you can get millisecond accuracy. (It should be off by no more than 55ms, and the average error is 27.5ms which over a 5 or 10 second test is minimal.)
IOTEST is reasonable code but it has some limitations, like not bypassing DOS for reads. That makes it susceptible to DOS overhead for FAT lookups; going through DOS is the only safe way to test write speed, but it limits IOTEST's utility as a device-level benchmark.
 
Since I mentioned rounding, it should've been obvious that I was talking about the presentation of the results (in KB/sec), not the actual measurements. I'm sure it's a simple mistake, so there's no need to get defensive about it.
 
The presentation of the results is in 10ths of a KB/sec ... I'm failing to see where the mistake is, or how much precision you think should be there.

[Screenshot: VirtualBox_DOS_31_05_2025_19_34_53.png]
 
I expect it to round to the closest tenth of a kilobyte, since that's how it's presented. Let me show you an example from one of the text files I posted above:
Code:
Cache size: 0 KB,     64 KB read in   9020 ms,     7.0 KB/sec
Cache size: 1 KB,     96 KB read in   6930 ms,    13.0 KB/sec
Cache size: 2 KB,     96 KB read in   5225 ms,    18.0 KB/sec
Cache size: 3 KB,    128 KB read in   6435 ms,    19.0 KB/sec
Cache size: 4 KB,    128 KB read in   5115 ms,    25.0 KB/sec
That's from KRILLE.TXT. Here's what I would expect to see:
Code:
Cache size: 0 KB,     64 KB read in   9020 ms,     7.1 KB/sec
Cache size: 1 KB,     96 KB read in   6930 ms,    13.9 KB/sec
Cache size: 2 KB,     96 KB read in   5225 ms,    18.4 KB/sec
Cache size: 3 KB,    128 KB read in   6435 ms,    19.9 KB/sec
Cache size: 4 KB,    128 KB read in   5115 ms,    25.0 KB/sec
Actually, let's also look at another example, MIKE.TXT:
Code:
Cache size: 0 KB,     64 KB read in   9295 ms,     6.0 KB/sec
Cache size: 1 KB,     96 KB read in   6930 ms,    13.0 KB/sec
Cache size: 2 KB,    128 KB read in   6490 ms,    19.0 KB/sec
Cache size: 3 KB,    128 KB read in   5060 ms,    25.0 KB/sec
Cache size: 4 KB,    160 KB read in   5830 ms,    27.0 KB/sec
And what I would expect to see:
Code:
Cache size: 0 KB,     64 KB read in   9295 ms,     6.9 KB/sec
Cache size: 1 KB,     96 KB read in   6930 ms,    13.9 KB/sec
Cache size: 2 KB,    128 KB read in   6490 ms,    19.7 KB/sec
Cache size: 3 KB,    128 KB read in   5060 ms,    25.3 KB/sec
Cache size: 4 KB,    160 KB read in   5830 ms,    27.4 KB/sec
As you can see, the presented results can sometimes deviate as much as 13% from the actual transfer speed which is why I think it's misleading.
 
Ok, I see the 6.0 vs. 6.9 KB/sec discrepancy ... that's a bug I'll look into.

That's a much better way of describing a problem than just an off-hand comment saying that something is "quite misleading." There is no intention to mislead.
 