
mTCP Backup Program/Script Generator (v 0.01)

cr1901

Veteran Member
Joined
Dec 28, 2011
Messages
817
Location
NJ
Like the title says, call this version 0.01. The attached program creates an mTCP FTP script that backs up your FAT16 file system tree to your private (or perhaps public) FTP server. It also generates the corresponding restore script, to be executed in the directory specified as the local root (meaning that on a newly formatted disk, you need to manually create the directories down to the root of the backup tree). You need to set the following mTCP variables in your MTCPCFG file (a sample excerpt follows the list):
  • NWBACKUP_SERVER - IP address/name of the server. Not used, but must be set.
  • NWBACKUP_USERNAME - Username on the server to which you wish to send backups.
  • NWBACKUP_PASSWORD - Password for said server.
  • NWBACKUP_DIR - Directory on the server that holds your backup directories.
  • NWBACKUP_FORMAT - Set to "msdos3.3". Not used, but must be set.
  • NWTEMP - Not used, and need not be set. Will eventually hold the directory for temporary control files.
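
For reference, a minimal MTCPCFG excerpt with these variables might look like the following (server name, credentials, and directory are placeholder examples; mTCP config files are simple "NAME value" lines):
Code:
NWBACKUP_SERVER ftp.example.home
NWBACKUP_USERNAME backupuser
NWBACKUP_PASSWORD secret
NWBACKUP_DIR /backups/pcat
NWBACKUP_FORMAT msdos3.3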

Quite honestly, this is really a hack more than anything. These scripts are prone to error on the server side, since they assume that each FTP command executes successfully, and features/functionality are limited (directories are hardcoded into the scripts). A better, more elegant solution would be one that I didn't code in less than 24 hours XD, and one that sends FTP commands directly to the server rather than piggybacking off of Mike's work. BUT... since his FTP client has a batch mode, I might as well put it to use. It could also potentially be added as a command to the FTP program itself. Additionally, and more importantly, I would've needed to implement nearly all the functionality I already have (directory traversal and name conversion) had I decided to create my own client from scratch (which I have done and discarded no less than 3 times already), so this is certainly a start to work up to that.
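
To give a flavor of the output, a generated backup script might look roughly like this (a hypothetical example with made-up paths, not actual program output; the commands are just ordinary mTCP FTP client commands read from a file, with connection setup omitted):
Code:
mkdir DOS
cd DOS
lcd C:\DOS
mput *.*
cd ..
lcd C:\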

This will probably go through a rewrite; traversing the directory tree and creating the file system data structure was HELL, especially converting path names. I'm not proud of the function I've created that does it (there HAS to be an easier, more elegant way than my solution, which is attached); the code is a bit difficult to read and littered with commented-out debug statements, and it requires this mess of arguments:
Code:
int8_t allocDirs(const char * name, const uint16_t indexMaxSize,
                 const uint16_t dirNameMaxSize, char *** dirStr,
                 uint16_t * numDirs, uint16_t * dirStrSize, int level)
I had to pass a number of variables by address so they could be modified :/. Of course, the code is also most likely a bit inefficient; there are 11 different error codes and checks for directory traversal alone. In light of Heartbleed, I've been programming more defensively. This uses OpenWatcom's POSIX implementation functions to traverse the directory tree. For some reason, POSIX opendir() will fail on hardware (not DOSBox) if the initial opendir() path does not have a trailing backslash - at least I THINK it does... it acts so weird :/!
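
For what it's worth, a minimal sketch of that quirk under OpenWatcom's <direct.h> looks like this (the path is a placeholder, and the trailing-backslash requirement is only what I observed on hardware):
Code:
/* Minimal sketch: list one directory with OpenWatcom's POSIX-style
 * opendir()/readdir(). Note the trailing backslash on the initial
 * opendir() path, which seems to be required on real hardware. */
#include <stdio.h>
#include <direct.h>

int main(void)
{
    struct dirent *entry;
    DIR *dir = opendir("C:\\BACKUP\\");   /* trailing backslash */

    if (dir == NULL) {
        fprintf(stderr, "opendir failed\n");
        return 1;
    }
    while ((entry = readdir(dir)) != NULL) {
        printf("%s\n", entry->d_name);
    }
    closedir(dir);
    return 0;
}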

I have verified that this script (backup) works on my PC AT - it takes 2 to 5 seconds to generate a script for a 20MB/40ms-seek drive with a dozen or so directories. MPUTs with a large number of files - 500+ - will fail. I haven't decided how to handle this yet, or whether there is a way to query the minimum length of FTP STOR strings that the server must support.

I'm not ready to create a git repo just yet, but I will post the source and the exe (put it into its own MTCP APP directory in the MTCP tree). However, I hope some people are willing to test on their FAT16 drives if they have Network-Attached Storage (NAS) with FTP.

Enjoy! :D
 

Attachments

  • GENMTCP.zip
    77.8 KB
I have not taken a very deep look at this but I have some initial thoughts:

The scripting support in the mTCP FTP client is minimal. It is not even really scripting support - "scripting" implies that it is programmable. What I have there now is basically just the ability to read commands from a file instead of the keyboard. It is good for repetitive tasks, but it is not robust.

One serious drawback to this approach is that in case of an error, it will just keep plowing ahead. It can not detect an error at all. That's not great for a backup program.

I think that a better way to do this is to use the FTP client as a starting point for a new program. You can strip the user interface portion and replace it with new code that calls the FTP internal functions directly. The new code would implement the recursive directory walk, starting at the root and transferring each file. The DOS findfirst and findnext calls let you walk directories easily enough. Each time you encounter a subdirectory you can push your current directory on the stack and enter the new directory. I think even a machine with a modest amount of RAM can handle anything that the DOS filesystem can throw at it - you can never really get more than 30 or so directories deep because of the limit on the length of DOS paths.
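
To sketch the idea (this is an illustrative shape only, not mTCP code; it uses Watcom's _dos_findfirst/_dos_findnext with a fixed-size stack of path strings):
Code:
/* Illustrative sketch only (Watcom C for DOS): an iterative walk with
 * _dos_findfirst/_dos_findnext and an explicit stack of directory
 * paths. Each directory's search runs to completion before the next
 * one starts, so nothing needs to be re-entrant. */
#include <stdio.h>
#include <string.h>
#include <dos.h>

#define MAX_DEPTH    32   /* DOS path length limits keep trees shallow */
#define MAX_PATH_LEN 80

static char stack[MAX_DEPTH][MAX_PATH_LEN];
static int  top = 0;

int main(void)
{
    char path[MAX_PATH_LEN], pattern[MAX_PATH_LEN + 4];
    struct find_t ft;
    unsigned rc;

    strcpy(stack[top++], "C:\\");            /* start at the root */

    while (top > 0) {
        strcpy(path, stack[--top]);          /* pop next directory */
        sprintf(pattern, "%s*.*", path);

        rc = _dos_findfirst(pattern, _A_NORMAL | _A_SUBDIR, &ft);
        while (rc == 0) {
            if (ft.attrib & _A_SUBDIR) {
                if (ft.name[0] != '.' && top < MAX_DEPTH &&
                    strlen(path) + strlen(ft.name) + 2 <= MAX_PATH_LEN) {
                    sprintf(stack[top++], "%s%s\\", path, ft.name);
                }
            } else {
                printf("%s%s\n", path, ft.name);  /* transfer file here */
            }
            rc = _dos_findnext(&ft);
        }
    }
    return 0;
}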

With this design you get rid of the need to generate a script. You can also detect and react to errors. And you will not run into any arbitrary limitations with MPUT. MPUT has to hold the list of files in memory; and there is a fixed amount of RAM set aside for doing that. If I had some time I would change what I did with MPUT so that it would not scan the directory first; it would just use findfirst and findnext as it needed to. (A similar fix can not be made for MGET ... that is a different problem.)
 
I highly appreciate your quick response. It might be best for your health NOT to look at the source.

The scripting support in the mTCP FTP client is minimal. It is not even really scripting support - "scripting" implies that it is programmable. What I have there now is basically just the ability to read commands from a file instead of the keyboard. It is good for repetitive tasks, but it is not robust.
I'd like to add that that particular feature is extremely valuable to me - it sure as hell beats writing an expect script. I suppose this is more a proof of concept than something to be used seriously (it does work, yes, but it cannot be relied on).

One serious drawback to this approach is that in case of an error, it will just keep plowing ahead. It can not detect an error at all. That's not great for a backup program.
Indeed; this is why I labeled it version 0.01, and why it should only be used in a reliable environment. Another reason for releasing it in the first place is to show that I have the basis for such a program coded, although it will most likely be completely rewritten.

I think that a better way to do this is to use the FTP client as a starting point for a new program. You can strip the user interface portion and replace it with new code that calls the FTP internal functions directly. The new code would implement the recursive directory walk, starting at the root and transferring each file.
(If you're short on time, these two are the important paragraphs) In my initial implementation of just the FTP portion, I created a state machine for the backup program based on your sample code, and used FTP primitives (USER, PASS, APPE, etc) directly rather than your FTP functions for "minimal overhead".

At least in your Netcat and FTP programs, at the beginning of EVERY loop iteration you send and receive new packets. One issue that I haven't been able to solve satisfactorily up to this point is that in some states of my backup program, it's not appropriate to send and receive new packets at the beginning of the loop (i.e. when changing directories, or when opening a data socket in response to a 100-series code). My state machine is a mess because state transitions occur in response to the control code the server sends on completion of the state in the previous loop iteration.

My state machine would be much more natural if packet sending and receiving were delegated to each case in the FSM switch statement, instead of having a "global" send and receive at the beginning of the main loop. However, I take it there is a good reason that you design your programs to retrieve packets on every loop iteration regardless of whether the next state (such as FTP local mkdir) needs them. What reason would that be? Should I just keep a "global" send and receive of packets at the beginning of my main program loop?


The DOS findfirst and findnext calls let you walk directories easily enough. Each time you encounter a subdirectory you can push your current directory on the stack and enter the new directory. I think even a machine with a modest amount of RAM can handle anything that the DOS filesystem can throw at it - you can never really get more than 30 or so directories deep because of the limit on the length of DOS paths.
As far as I can tell, this is the canonically correct way to do directory traversal... and it will be something I implement for v0.02 (I used a recursive routine which created all directories at once and stored them in a nice big malloc()'ed block). To keep it an iterative algorithm, however, I'd have to maintain my own stack, but I can decide whether to make it iterative or recursive later. I used opendir() because I wasn't sure whether findfirst could be called recursively (i.e. to traverse a subdirectory before the parent directory was completely traversed).

With this design you get rid of the need to generate a script. You can also detect and react to errors. And you will not run into any arbitrary limitations with MPUT. MPUT has to hold the list of files in memory; and there is a fixed amount of RAM set aside for doing that. If I had some time I would change what I did with MPUT so that it would not scan the directory first; it would just use findfirst and findnext as it needed to. (A similar fix can not be made for MGET ... that is a different problem.)
Now that I think of it, I am not sure whether the MPUT failure came from the server or the client, but your explanation makes far more sense. I'll take a look when I have some time.

I apologize in advance if looking at this code made your brain melt. It doesn't follow your preferred formatting style either :p (I'll use Astyle to fix that).
 
I might not be understanding your question, but it is always safe to make the calls to process incoming packets, drive outgoing ARP requests, and drive outgoing TCP/IP packets.

Packets arrive on the wire to the Ethernet device and generally cause a hardware interrupt. The hardware interrupt is to tell the packet driver to scoop the packet out of the device buffers and get it copied into memory. mTCP and other programs that use packet drivers provide a pool of buffers for the packet driver to use. This part of the code is entirely interrupt driven and you never have to worry about it unless the rest of your code is not processing the packets fast enough.

The three magic lines that you see sprinkled in mTCP event loops are:

PACKET_PROCESS_SINGLE;
Arp::driveArp( );
Tcp::drivePackets( );

The first line is a macro that looks for new packets in a ring buffer. When the packet driver gets a packet, it calls mTCP twice. The first time is to get a buffer to use and the second time is to give the buffer back to mTCP with the new packet data in it. If new buffers are found in that ring buffer they are processed. Processing involves looking at the packet type and calling through the different layers of the TCP/IP library until the packet processing is completed. This usually ends with new data being put in a socket receive buffer and the original buffer being returned. When your TCP/IP code does a recv call on a socket, it gets that data.

The second line is to send (or resend) any ARP packets that need to be sent out. ARP is responsible for handling its own timeouts so if you sent an ARP request you need to keep monitoring to see if you got a response or if you timed out. If you get a response it is handled by PACKET_PROCESS_SINGLE and all of that processing. But if you time out you need to send another ARP request.

Tcp::drivePackets works pretty much the same way - it is the timeout detector and re-transmit logic for TCP packets.


You can safely do this processing at any time with respect to your code flow/state machine. If any data comes in it will be sitting in a receive buffer somewhere, which you won't look at until you do a recv call when you are in the right state. You want to do this processing fairly often, because if you don't service the packets and return them, you will run out of space for the packet driver to use and it will have to start dropping newer packets.
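
Put together, a typical event loop looks something like this sketch (the three calls are the ones above; the socket name, buffer, and reply handler are hypothetical placeholders, and the surrounding mTCP setup code is omitted):
Code:
// Sketch only: the shape of an mTCP application main loop.
while ( !done ) {

  PACKET_PROCESS_SINGLE;   // process a newly arrived packet, if any
  Arp::driveArp( );        // resend ARP requests that have timed out
  Tcp::drivePackets( );    // TCP timeout detection and retransmits

  // Only now consult the state machine. recv( ) just copies data that
  // is already in the socket receive buffer, so driving the packets
  // above is harmless even in states (like a local mkdir) that are
  // not expecting any network data.
  int16_t rc = controlSocket->recv( respBuffer, RESP_BUF_LEN );
  if ( rc > 0 ) {
    handleReply( respBuffer, rc );   // hypothetical per-state dispatch
  }
}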


As for directory processing ...

It can be done with the POSIX-looking routines, but I'm sure that findfirst/findnext can be used too. I can pretty much guarantee that the POSIX routines are just wrappers around the findfirst/findnext routines to make them re-entrant. Looking at the Watcom source code would tell you how they did it.



Mike
 
Now that I actually made a serious attempt to code the logic for such a program:
https://github.com/cr1901/nwbackup

Right now, only the directory traversal code is done. But... it DOES work, and it is an iterative solution using a stack of directories instead of recursion. In a real backup application, the logic that processes each normal file would go approximately at the top of the loop (the printf("Current Dir+file: %s\\%s", path, currFile.name); call in TEST\DIRTEST.C), and there is logic for handling directories if necessary as well (including chdir, which is not included in TEST\DIRTEST.C).

This may have been the most difficult part... but I'm not sure as of now. We will see.
 
I haven't really looked at any of this so maybe it's a dumb question, but couldn't you just use DOS for traversing the directories, using FOR and a PD SWEEP-like utility? I do that daily with various procedures and programs, including backups.

I take it that once it's connected mTCP takes its parameters from the keyboard or a file but not the command line?

I.e. you can't just say SWEEP FOR %A in (*.*) do FTP PUT %A ?
 
I'm not completely understanding this either. Presumably the idea is to make something small and fast that works automatically. I use a batch file which is under 400 bytes including music and comments. I type "CDXBAK driveletter" and the specified drive is zipped and mFTPd to the appropriate directory on a machine called CDX. To get it all back I type "CDXGET". I'm not exactly sure how that could be made quicker, smaller, easier, or better in any way, but the idea of making it a no-brainer so a person doesn't have to be familiar with batch files would probably be much welcomed.
 
I'm not exactly sure how that could be made quicker, smaller, easier, or better in any way, but the idea of making it a no-brainer so a person doesn't have to be familiar with batch files would probably be much welcomed.
Well, part of it is that I'm too damn lazy to generate my own directory listings by hand (I have too many machines) :p, and I wasn't aware that batch was powerful enough to iterate over directories. Also, in at least the early DOSes (2.x), not all batch commands are recognized. DOS 2.x doesn't even recognize @echo off!

In line with my coding style (portability over speed), the code itself is written so that I can port it to other embedded targets.
 
I'm not exactly sure how that could be made quicker, smaller, easier, or better in any way
Error checking, auto-reconnect/repeat failed "transaction"? Possibly in the future differential backups? I mean, sure, in practice a batch file to a home server isn't going to fail (just as my generator program didn't fail), but... it feels like there could be a more elegant solution to this.
 
Ole,

Your method assumes that you have enough space to create a zip file for whatever it is you are backing up and it also requires you to do the file compression on the machine. That is great for network bandwidth, but if the machine is slow it makes more sense to just send the files uncompressed and let the faster/better machine you are backing up to do the compression.

That solution works for you. But a more generally applicable solution does not require you to have that much extra space laying around.

Another way to do this is to not do file-level backups, but to do drive- or partition-level backups instead. I recently hacked up the netcat program to read raw sectors from the drive using the BIOS and send the data to a Linux machine. That allowed me to do a full image backup of the drive. On the target machine I "imported" the raw image into VirtualBox and was able to mount the result and see the data.
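
The BIOS-level read at the heart of that hack looks roughly like the sketch below (this is not the actual netcat hack, just an illustration using Watcom's _bios_disk wrapper around INT 13h; the geometry values are placeholders, and the network send is left out):
Code:
/* Sketch only: read one raw sector from the first hard disk via the
 * BIOS (Watcom C, <bios.h>). A real imaging tool would loop over the
 * drive geometry and send each buffer over the network. */
#include <stdio.h>
#include <bios.h>

int main(void)
{
    static unsigned char buffer[512];
    struct diskinfo_t di;
    unsigned short status;

    di.drive    = 0x80;     /* first hard disk */
    di.head     = 0;
    di.track    = 0;
    di.sector   = 1;        /* BIOS sector numbers are 1-based */
    di.nsectors = 1;
    di.buffer   = (void __far *)buffer;

    status = _bios_disk( _DISK_READ, &di );
    if ( status >> 8 ) {    /* AH holds the BIOS error code */
        fprintf( stderr, "BIOS read failed, status %04X\n", status );
        return 1;
    }
    printf( "Read MBR sector OK; first byte = %02X\n", buffer[0] );
    return 0;
}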
 
That solution works for you. But a more generally applicable solution does not require you to have that much extra space laying around.
You're right. It does require a lot of space and wouldn't be practical on a 32 MB drive. I'm doing it on an 8 GB drive with 10 partitions, so there's a luxury there.

I guess the reason I didn't see this backup program as a solution right away is that I personally keep all my vintage programs and utilities on non-vintage media in multiple copies. All I could likely lose is a setup. I do occasionally back up my whole daily user DOS machine, but it is only the text files such as my writing, bat directory, and phone book that really matter a lot. My real vintage machines just need floppy copy.


cr1901 said:
Error checking, auto-reconnect/repeat failed "transaction"? Possibly in the future differential backups?

Error checking sounds very good. Differential backups would be a serious plus. I never did get rsync working in DOS.

Edit: Changed 32 GB to 32 MB. Oops!
 
I never did get rsync working in DOS.
Neither did Mike... I seem to recall him saying that the person who wrote the DOS port used so many Turbo C extensions that it wasn't worth porting.
 
Neither did Mike... I seem to recall him saying that the person who wrote the DOS port used so many Turbo C extensions that it wasn't worth porting.

Which Mike? Mike C? I don't remember trying to port rsync.

Also, I thought there was a port of rsync already using the WATTCP library.
 
Speaking of Mikes, looks like I'm being rudely ignored again ;-) guess I'll have to install mTCP one day and find out for myself why a batch file wouldn't work (probably should have done that in the first place). FWIW FIND should let you do differential/incremental backups.
 
Speaking of Mikes, looks like I'm being rudely ignored again ;-) guess I'll have to install mTCP one day and find out for myself why a batch file wouldn't work (probably should have done that in the first place). FWIW FIND should let you do differential/incremental backups.

Yea, I somehow forgot to quote you for the second part of my initial response. I wouldn't know how to chain commands together to create a backup system in batch alone. The only pipe that I'm really aware of that consistently works is "dir | more". The other issue is that I can't imagine having DOS call the FTP program for each file is efficient.
 
I'm sorta guessing it doesn't work (I don't have a vintage system or early DOS version near me), but newer versions of batch (perhaps just NT-based) have some logic for things like:

for /F %%a in ('dir /b *.txt') do something with %%a

That's how I've written some newer batch scripts, whether for running a group of patches in a directory, a simple ping-each-IP-in-a-range scanner, etc. Or you could echo it to a file that the FTP program then reads for the mget/mput.

But again, I'm thinking that's in newer DOS, not the original batch file language; I could be wrong, though, or it could still have some of that logic.
 
I guess now would be a time to put in a disclaimer... I'm not particularly fond of either batch or shell scripting (the Linux analogue of batch). I'd rather write solutions in assembly or (preferably) ANSI C, mainly because it's actually easier for me :p. In fact, the only scripting language I really like is Python... and if a Python 2.7 (or subset) port to DOS were feasible, I'd do it.
 
Yea, I somehow forgot to quote you for the second part of my initial response. I wouldn't know how to chain commands together to create a backup system in batch alone. The only pipe that I'm really aware of that consistently works is "dir | more". The other issue is that I can't imagine having DOS call the FTP program for each file is efficient.
It doesn't necessarily have to call FTP every time; it could maybe create more or less the same script file that you're building.

What does it actually look like?

And I'm not suggesting that there's anything wrong with your way, or that DOS would be more efficient; I'm just curious how yours works and why DOS can't do the same.

ISTR that there was indeed a Python for DOS, but that was a long time ago...
 
Right now, there is no effective difference between a batch script and my way... the original program I posted was a very inelegant way to accomplish the directory traversal using C functions. The program proper should be able to handle restarting transfers and handle error conditions. Differential backups and restoring individual files should be a bonus. A "better" program will require the directory traversal algorithm anyway.

Your method probably could create effectively the same script- I just don't know batch well enough to accomplish it (nor do I have the patience to debug such a script).
 
I guess now would be a time to put in a disclaimer... I'm not particularly fond of either batch or shell scripting (the Linux analogue of batch). I'd rather write solutions in assembly or (preferably) ANSI C, mainly because it's actually easier for me :p. In fact, the only scripting language I really like is Python... and if a Python 2.7 (or subset) port to DOS were feasible, I'd do it.

Good "disclaimer". :) As an amateur (and a rather inept one, at that) I am doomed to work with shell scripting, and indeed consider it the "Dos (or UNIX) way". That said, I totally agree that assembly or ANSI C is certainly superior and generally preferable - especially in the hands of someone, like yourself, who has that skill. My batch script, as Mike B. points out, is not suitable for many situations. It works for me because it's easy to do and the machine on which I use it is more than usually capable for a DOS box, but it's really a bit of a hack. However, as MikeS points out, it would be interesting if a more sophisticated batch script could do it in a more universal manner. If you do get your program up to snuff, I think it would be a good thing though.

BTW: The rsync for DOS which I have tried is version 2.2.5 and is a port by Chris Simmonds dated February 2003. From the README:
Code:
This is a port of the standard rsync program
(www.rsync.org) to the DOS operating system. It runs 
entirely in real mode, making it suitable for embedded 
and hand-held systems with only an 8086 processor and 640 
KB RAM.

At 286 KB it is rather large, though. Has anybody else gotten this to work?
 