
Read files byte by byte on old Hard disks

Mills32

Experienced Member
Joined
Sep 25, 2018
Messages
149
Location
Spain
I want to improve a VGM player I made for PC/XT.
VGM files (most of the time bigger than 64KB) must be pre-processed so they play back even faster. To do that, I need to read the file byte by byte using fread(), process every byte, store it in RAM, and read the next byte. If I use a buffer in RAM to read 64KB chunks, the code becomes a bit complex, because I have to check whether the processed data reached a 64KB boundary and...

Imagine many VGMs will be around 128KB, so it will take the 8088 a few seconds to read that byte by byte.
If you use emulators, FPGAs or SD card adapters I guess there's no problem. But what about real old HDs? Could this damage them?
 
You're not reading "byte by byte" anyway, even if you only ask for a single byte using fread(). The system will read a complete sector (or even more) off the hard disk (typically around 512 bytes), store it in a buffer, and hand you single bytes from there until the buffer is exhausted, then read the next sector.
 
No matter what your code says, the operating system is still reading 1 or more disk sectors at a time, buffering them, and delivering them to your program as you specify.
 
Thanks! So MS-DOS already keeps another buffer for me, which saves me RAM. I didn't know that.
 
If you're using stdio.h, there is (depending on the implementation) almost always an API to control buffering:
setvbuf, setbuffer, ... Implementing your own buffer on top of this is likely to slow things down. If you want to do your own buffering, you need to drill down another level.
 
To read byte by byte, you can just use `fgetc`. Internally, the system will buffer and hand off 1 char (byte) at a time.

Doing it yourself is not difficult.

Code:
#define BUFLEN 1024   /* don't reuse BUFSIZ -- stdio.h already defines it */
char buf[BUFLEN];
FILE *infile = fopen("file.dat", "rb");
/* size 1, count BUFLEN: this way fread() returns a byte count, and a
   short final chunk still comes through instead of being reported as 0 */
size_t bytes_read = fread(buf, 1, BUFLEN, infile);
while (bytes_read > 0) {
    size_t p;
    for (p = 0; p < bytes_read; p++) {
        process_byte(buf[p]);
    }
    bytes_read = fread(buf, 1, BUFLEN, infile);
}
fclose(infile);

Caveat, I have not written C in, like, 20 years. So, there may be an entire salt mine lurking in this. But that's the gist of it.
 
I didn't really want to read byte by byte; I assumed that was slow and maybe dangerous for the hard drive. So I'm happy the system has a buffer, even if it is small on 8088-era hard disks.
 
There simply is no technical way, not even a low-level one, for the CPU to read a single byte from the hard disk (that's why it's called a "block device"). Whatever you do, even whatever language you use, you can only read or write larger units.
 
If I use a buffer in RAM to read 64KB chunks, the code becomes a bit complex, because I have to check whether the processed data reached a 64KB boundary and...

Never looked into the VGM format, but my instinct would be to check if I could unroll the 'processing' loop and make it handle 16(*n) bytes per iteration. Then increment your segment register by 1(*n), reset the offset to 0, and repeat. No boundary checking needed.
 
64K is a tough buffer size, because a 16-bit register holds values from 0 to 65535. If you ask malloc in C for 64K, it would probably see the size as 0 and not allocate anything.

What is the difference between just working with 32K? If you use fread, it is going to pull the next 32K from the file, and then you can process it in chunks.

You can always use an array of pointers and allocate 32K to each one, then use /32768 to decide which pointer to use and %32768 to decide the offset into that pointer.

I posted some code here somewhere that allows using EMS/XMS/conventional memory as a giant blob.
 
The problem is that, when processing VGM files, you might read a 0x61 value at the end of the first chunk (which means the 16-bit delay is at the start of the next chunk), or even worse, the last byte of the chunk is one of the two bytes that define the 16-bit delay... So the code was getting very complex.
 