Handshaking is there for a reason. It works, and you should use it. If every stage in the chain is basically blocking on I/O anyway, handshaking lets you read, write, and stream as fast as the slowest stage allows. As long as your UART buffer can absorb whatever arrives before the sender reacts to the handshake line (which matters most at very high baud rates), you should be fine.
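To make that concrete, here is a small C simulation of the idea, not a driver for any real UART. Everything in it is hypothetical and chosen for illustration: the 16-byte FIFO, the high/low water marks, the `Receiver` struct, and the `simulate_transfer` function. It models a sender that can offer one byte per tick and a slow consumer (think: disk writes) that drains one byte every few ticks. It also ignores the latency a real sender has in reacting to the handshake line, which is the reason the high-water mark sits below the FIFO size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SIZE  16   /* hypothetical receive buffer size */
#define HIGH_WATER 12   /* drop RTS at or above this fill level */
#define LOW_WATER   4   /* raise RTS again at or below this fill level */

typedef struct {
    unsigned char buf[FIFO_SIZE];
    size_t head, tail, count;
    bool rts;           /* true = receiver is ready for more data */
} Receiver;

static void receiver_init(Receiver *r)
{
    r->head = r->tail = r->count = 0;
    r->rts = true;
}

/* A byte arrives from the line. Returns false if it was lost to overrun. */
static bool receiver_push(Receiver *r, unsigned char b)
{
    if (r->count == FIFO_SIZE)
        return false;
    r->buf[r->head] = b;
    r->head = (r->head + 1) % FIFO_SIZE;
    r->count++;
    if (r->count >= HIGH_WATER)
        r->rts = false;             /* ask the sender to pause */
    return true;
}

/* The slow consumer takes one byte. Returns false if the FIFO is empty. */
static bool receiver_pop(Receiver *r, unsigned char *out)
{
    if (r->count == 0)
        return false;
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1) % FIFO_SIZE;
    r->count--;
    if (r->count <= LOW_WATER)
        r->rts = true;              /* room again: let the sender resume */
    return true;
}

/*
 * Send n bytes. The sender offers one byte per tick; the consumer drains
 * one byte every consumer_period ticks. With use_handshake set, the sender
 * only transmits while RTS is asserted. Returns the number of bytes lost;
 * the elapsed tick count is stored in *ticks_out if it is not NULL.
 */
size_t simulate_transfer(size_t n, unsigned consumer_period,
                         bool use_handshake, size_t *ticks_out)
{
    Receiver r;
    size_t sent = 0, consumed = 0, lost = 0, ticks = 0;
    unsigned char byte;

    assert(consumer_period > 0);
    receiver_init(&r);

    while (consumed + lost < n) {
        ticks++;
        if (sent < n && (!use_handshake || r.rts)) {
            if (!receiver_push(&r, (unsigned char)(sent & 0xFF)))
                lost++;
            sent++;
        }
        if (ticks % consumer_period == 0 && receiver_pop(&r, &byte))
            consumed++;
    }
    if (ticks_out != NULL)
        *ticks_out = ticks;
    return lost;
}
```

Calling `simulate_transfer(1000, 4, true, &ticks)` loses nothing, because the sender simply waits whenever the buffer fills up. The same call with `use_handshake` set to `false` drops bytes as soon as the FIFO overflows. In both cases the transfer finishes no faster than the slow consumer allows, which is the "as fast as practical" part.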
Are you writing this in assembly? Are you doing that on purpose? You could probably knock this out in Turbo Pascal or any of the C implementations quite quickly.
Again, with handshaking, "efficiency" is less of an issue. While the core loops in TP or C will be slower than hand-written assembly, the transfer time is going to be dominated by disk and serial I/O, so any inefficiency introduced by a higher-level language will be mostly moot. At a minimum, it's an easy way to prototype at a high level before you descend into the bowels of assembly.
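A rough back-of-the-envelope check of the "dominated by I/O" claim, as a sketch only. The 4 MHz clock, 9600 baud, and 10 bits per frame (8N1: start bit, 8 data bits, stop bit) are assumed example figures, not numbers from your setup, and `cycles_per_byte` is a hypothetical helper name.

```c
#include <assert.h>

/*
 * CPU clock cycles that elapse while one byte is on the wire.
 * bits_per_frame counts start, data, parity, and stop bits together
 * (10 for 8N1). The result is truncated to a whole number of cycles.
 */
unsigned long cycles_per_byte(unsigned long cpu_hz, unsigned long baud,
                              unsigned bits_per_frame)
{
    assert(baud > 0);
    return (unsigned long)(((unsigned long long)cpu_hz * bits_per_frame) / baud);
}
```

With those assumed figures, `cycles_per_byte(4000000UL, 9600UL, 10)` comes to 4166 cycles per byte. If a compiled inner loop needs even a few hundred cycles per byte, it still spends most of its time waiting on the line, and disk access only widens that gap.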
But, please, get hardware handshaking to work -- it will make your life much easier.