FTP filenames and spaces

mbbrutman

One of the limitations of my FTP code is that it considers the space character to be a delimiter when parsing command line input. This is certainly true for DOS filenames, but a space might be a legal character on another OS like Unix.

I want to add the ability to handle filenames with spaces in them. There are two primary ways to do this:

  • Use quotes around strings with embedded spaces that are to be treated as a single parameter: This is natural, but the quote character is a legal DOS filename character. So a leading quote might not be a special character, but an actual part of a filename.
  • Use an 'escape' character like the backslash to escape special characters: I would only need two escape sequences to do this, a '\\' to mean a single backslash and a '\ ' to mean a space. But typing filenames with spaces this way is awkward, and because the backslash is the DOS path delimiter there might be cases where a \\ is required where none is required now.

Which method do you prefer? Have any other ideas?

I'm also looking to support filenames with characters that are not in the standard ASCII 32 to 126 range to make my European friends happier. I think I can do that easily by just being less restrictive while I accept keyboard input. (DOS should be able to handle filenames with the first character being 0xE5 safely. I'll only have a problem with 0xE5 if running on a DOS version earlier than 3.0.)
 
The single quote ( ' ) is a valid file name character, but not a double quote ( " ).

Windows 95 and newer versions use double quotes to handle file names with spaces in them at the DOS prompt. For example:

COPY "C:\directory1\file name with spaces.txt" "D:\directory name with spaces\copy of file.txt"

You now also put double quotes around file name placeholders when you expect to encounter file names with spaces in them:

FOR %A IN (*.WAV) DO NORMALIZE.EXE "%A"
 
You are correct. I just dug back into my DOS 5 manual to double check, and I found the rules there. Valid characters are A-Z, 0-9, _ ^ $ ~ ! # % & - { } ( ) @ ' `

Unfortunately, quotes are still valid on other filesystems, so the problem still exists. It looks like Windows can use quotes to get around this problem, but Windows would still have the same problem when dealing with a Unix system.

Using the 'escape char' method seems to be safe.
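
For illustration, here is a minimal Python sketch of the escape-character approach. It is not the code from the FTP client; the function name `split_backslash` is made up for this example, and it assumes only the two escape sequences described above ('\ ' for a space and '\\' for a backslash). Any other backslash is kept as an ordinary character, so plain DOS paths still work.

```python
def split_backslash(line):
    """Split a command line on spaces, honoring two escapes:
    backslash-space is a literal space, and a doubled backslash
    is a single literal backslash. (Hypothetical helper.)"""
    args = []
    current = []
    in_token = False
    i = 0
    while i < len(line):
        ch = line[i]
        # Recognize only the two escape sequences; any other backslash
        # falls through and is treated as a normal character.
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in (" ", "\\"):
            current.append(line[i + 1])
            in_token = True
            i += 2
            continue
        if ch == " ":
            # An unescaped space ends the current token, if there is one.
            if in_token:
                args.append("".join(current))
                current = []
                in_token = False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_token:
        args.append("".join(current))
    return args
```

With this sketch, the input `get my\ file.txt` splits into `get` and `my file.txt`. Note that `dir\sub` and `dir\\sub` both come out as `dir\sub`, which shows how the path delimiter and the escape character get tangled.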
 
Over the Web, space characters are represented by the hex code 20, which shows up as %20 in the URL. Many web browsers automatically replace any space in a URL with "%20". However, I don't know how UNIX systems would deal with this syntax.
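
As a small illustration of the %20 convention, Python's standard `urllib.parse` module can encode and decode such names. This is only a sketch of URL percent-encoding; it assumes the name is being placed in a URL, and the file name used here is made up. A name typed at an FTP client's command prompt is normally sent as-is, so a client that wanted to accept %20 would have to decode it itself.

```python
from urllib.parse import quote, unquote

# Example file name (made up for illustration).
name = "file name with spaces.txt"

# Percent-encode characters that are not safe in a URL path.
encoded = quote(name)

# Decoding reverses the transformation.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```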
 
I downloaded a file via ftp today--I had to use double quotes around the name because of embedded spaces. It worked just fine--so, at least there's precedent.
 
Yes, and I seem to recall some situations in which two double-quotes in a row ( "" ) are used to represent a double-quote within a double-quoted file name or string, so that may be the more intuitive and elegant solution, instead of using \ to represent a space and then \\ to represent a backslash.
 
Good luck with your implementation! As an example of a site which uses spaces, a fellow but unnamed forum member has a semi-private FTP server that is open on request. His server is filled with vintage PC software, both public domain and abandonware. Since much of that software would suit a computer running Mike Brutman's TCP/IP routines and applications, it was a bit problematic for me to fetch some files from the server. Then again, one could ask why vintage software needs fully qualified file names with spaces and such, but perhaps that makes it easier to know what is what. Obviously a generated 00INDEX would have worked on that particular server, but we can't rename files on foreign FTP servers to fit our clients' needs.
 
Essentially you only need two things to specify any string: a delimiter and a way of escaping that delimiter to represent a literal occurrence. Sure, full-on nested quoting with proper escaping (like you get in Unix shells) is great, but it's deceptively tricky to program correctly and takes a lot more code.

The double quote is probably the most obvious choice for the delimiter seeing as the single quote is reserved in DOS. It's also what most people are used to using. I'd imagine that alone would cover the vast majority of cases. I'd sway towards the double-double quote to represent the literal as vwestlife suggests. Not only does it spare you the clash with the DOS directory separator - it also means you only have one escape sequence to worry about.
 
I think you're already aware of this but due to other standards in file naming conventions via ftp you often end up with folks making a directory out of invalid characters for that OS (well, ok Windows is the example) which you can't delete because it's not a valid filename. Quite a PITA but I think that suggests that there may not be a 100% compliant method out there? (if I remember correctly examples would be creating a folder named con or com1).
 
Sorry guys, 8.3 filenames are what I confine myself to! Everybody needs a little CP/M in their current lives - so these spaces just aren't on, nor quotations! :p :shock:

Embedded spaces are perfectly legit in CP/M filenames, as are control characters and lowercase. One of the favorite ways to hide a file back in the old days was to create a name with trailing backspaces. You can't create them if you go through the CCP interface, but that doesn't mean that they aren't supported.
 
I remember coming across unprotected FTP servers where the "warez"-hackers took over and created multiple layers of directories named with extended-ASCII characters that were very difficult to figure out and type in (you couldn't just cut-and-paste from the directory listing to the command line). In one case they actually crashed the FTP server by filling it up with movie files until the hard drive ran out of space. Finally that led the server admin to realize his mistake and make the server limited-access instead of wide-open!
 
I had already implemented the backslash mechanism because I was familiar and comfortable with it as a C programmer. But yes, it is messy because the backslash character is a path delimiter for DOS.

It seems that everybody likes the method using quotes instead, so I just implemented that. It was a little bit more complicated than I thought it would be, so I scratched what I had and started again using a simple state machine implementation. The state machine is a wonderful tool - it forces you to think about all of the inputs in each state, and how they should be handled.
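
To make the idea concrete, here is a minimal Python sketch of a state-machine parser for double-quoted parameters, with two double quotes in a row standing for a literal quote as suggested earlier in the thread. This is not the code that went into the FTP client; the function name `split_quoted`, the four state names, and the choice to raise an error on an unterminated quote are all assumptions made for this example.

```python
def split_quoted(line):
    """Split a command line into parameters. Double quotes group text
    containing spaces; "" inside a quoted parameter is a literal quote.
    (Hypothetical helper, not the FTP client's actual parser.)"""
    BETWEEN, PLAIN, QUOTED, QUOTE_SEEN = range(4)
    state = BETWEEN
    args = []
    current = []
    for ch in line:
        if state == BETWEEN:          # skipping spaces between parameters
            if ch == " ":
                pass
            elif ch == '"':
                state = QUOTED
            else:
                current.append(ch)
                state = PLAIN
        elif state == PLAIN:          # inside an unquoted parameter
            if ch == " ":
                args.append("".join(current))
                current = []
                state = BETWEEN
            else:
                current.append(ch)    # a quote here is just a character
        elif state == QUOTED:         # inside a quoted parameter
            if ch == '"':
                state = QUOTE_SEEN
            else:
                current.append(ch)
        else:                         # QUOTE_SEEN: closing quote or first of ""
            if ch == '"':
                current.append('"')   # doubled quote means a literal quote
                state = QUOTED
            elif ch == " ":
                args.append("".join(current))
                current = []
                state = BETWEEN
            else:
                current.append(ch)    # text glued on after the closing quote
                state = PLAIN
    if state == QUOTED:
        raise ValueError("unterminated quoted parameter")
    if state in (PLAIN, QUOTE_SEEN):
        args.append("".join(current))
    return args
```

For example, `get "file name with spaces.txt" local.txt` splits into three parameters, and `get "say ""hi"".txt"` yields the file name `say "hi".txt`. Writing out every state and input pair like this is what surfaces the corner cases, such as an empty `""` parameter or a quote in the middle of an unquoted name.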

This code will be in the next version of FTP that I release. It will fix that little problem with the member who has an FTP site that has spaces embedded in the filenames. We know who he is. :) High bit character support will also be in there for my friends who can think bigger than standard 7 bit ASCII.
 
Would it be too brash to put in another request at this point? The support for an external script file would _really_ be handy for more serious use. It is a lot of work to type the user, pass, cd to dir, commands over and over for the same site, not to mention puts and gets when you get there. :) It is not so important for casual use or browsing, but for every day work I think the functionality of the program would be raised by an order of magnitude if this feature was implemented. I think you will actually find that regular DOS users run most of their tasks from scripts.

Again, sorry if this is a bad time - but I had to lobby. :)
 
I'm always taking requests, but I think this one is handled already!

FTP has always supported reading its command line input from a redirected file. If you do the same thing every time, just put all of the input that you would normally type into a file and redirect stdin from that file like this:

ftp 192.168.2.10 < script.txt

Don't forget that your script has to have your userid and password at the front, as these are the first two things you would normally type in.
 
It doesn't take a warez hacker. I download .RAR files with Cyrillic characters very often. It's a riot (not!) to find how many programs refuse to understand them. Mike, I hope your ftp program handles character values from 128-255.
 
Now I remember. Sorry, I forgot what the real problem was with that, and so didn't make myself clear. :) It dumps you back out so it doesn't work for automatic login which is what I think is really needed most of the time. Some sites have hard to remember logins and/or passwords. Others only let you log into the root directory and then you have to move from there.

E.g., to get to ftp.eunet.bg/pub/simtelnet/msdos/commprog/ you have to log into ftp.eunet.bg
 
It is also possible to put real spaces into DOS 8.3 file names, if you either hack the FAT or install OS/2. OS/2 had a wonderful habit of creating a hidden file named "EA DATA. SF" on any FAT drive -- including those two "hard" spaces, not Windows 95 long-filename pseudo-spaces.
 
Ah, that is STDIN ending, which means EOF.

I think that what you would like is the existing capability except instead of ending when STDIN ends, it is open ended and continues as an interactive session after stdin is exhausted.
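
The behavior described above could be sketched roughly as follows in Python. This is only an illustration of the idea, not a feature of the FTP client: the function name `command_source` is made up, and it assumes the script is opened as a separate file instead of being redirected into stdin, since a redirected stdin has no keyboard left to fall back to once it hits EOF.

```python
import io

def command_source(script, interactive):
    """Yield command lines from `script` until it is exhausted,
    then keep yielding lines from `interactive`.
    (Hypothetical helper for illustration.)"""
    for stream in (script, interactive):
        for line in stream:
            yield line.rstrip("\r\n")

# In-memory stand-ins for a script file and the keyboard.
script = io.StringIO("myuser\nmypass\ncd pub\n")
keyboard = io.StringIO("ls\nquit\n")

commands = list(command_source(script, keyboard))
print(commands)
```

In a real program the second stream would be `sys.stdin`, so the login and directory changes come from the script and the session then carries on interactively.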
 
I originally rejected ASCII 128 to 255 as valid characters. That is the other thing I am fixing - see a few posts up where I talk about allowing high bit ASCII in the filenames.
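
As a rough sketch of what the relaxed check might look like, here it is in Python. It is not the FTP client's actual code. The function name `is_accepted_key` is made up, and rejecting control codes and 127 (DEL) is an assumption; the thread only says that 32 to 126 was the original range and that 128 to 255 is now allowed too.

```python
def is_accepted_key(code):
    """Return True if a keyboard character code may appear in a filename:
    printable 7-bit ASCII (32-126) or a high-bit code (128-255).
    Control codes and 127 (DEL) are rejected, which is an assumption."""
    return 32 <= code <= 126 or 128 <= code <= 255
```

Under this rule 0xE5 is accepted like any other high-bit character, while control characters are still filtered out.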
 