
How many bytes in an Intel Hex file is "optimised" for CP/M?

cj7hawk

I don't mean based on the screen length, or anything like that - I mean what did most other software default to for these files? 8 data bytes? 16? 32? What was "normal" for most assemblers and the like of the time that put out HEX format files?
 
Why would it have anything to do with CP/M? I prefer my .hex files to have 16-byte records because then the addresses line up, and it's easier to see where you are in the file. And I actually set up a utility to do this at work in the past week or so. (not something that I used more than once, but still)

16-byte records also fit nicely in the "standard" 80-column screen line.
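
For what it's worth, the line-length arithmetic is easy to check. Here is a minimal sketch in Python (the function name is made up for illustration), assuming the plain 8-bit record layout: a colon, 2-digit byte count, 4-digit address, 2-digit record type, the data, and a 2-digit checksum.

```python
def hex_record_length(data_bytes: int) -> int:
    """Printable characters in one Intel HEX record carrying data_bytes bytes."""
    # ':' + count (2) + address (4) + type (2) + two hex digits per data byte + checksum (2)
    return 1 + 2 + 4 + 2 + 2 * data_bytes + 2


for n in (8, 16, 32):
    print(n, hex_record_length(n))
```

That works out to 27, 43 and 75 characters for 8, 16 and 32 data bytes, so 32-byte records would still fit in 80 columns, just with very little room to spare.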
 
On some of the software tools I used to use the 'record length' was variable...

Out of the 'fixed record' tools I have seen both 8 and 16 bytes - mainly 16 though.

Dave
 
I do have variable length output records, and I was originally thinking to set it to 32 to get as much as possible on the 80 column screen...

And then I figured I had no idea what was common, so it was easy enough to ask - :) And the replies and reasons are all very much appreciated...

I found an OS bug tonight while writing my Hex Output routines... Because I finally had enough data, I was writing extents and suddenly started getting weird errors... And tracked it down to my BDOS writing the extent back into the buffer, but fixed at $0080 - Which makes me wonder when a filename is matched and it writes it back to the DMA - Does CP/M default to the DMA at $0080 or whatever DMA is currently set?

I'm not sure about that one... But on the positive side, I've added HEX output to my assembler... Which I realize now is kind of important as when writing COM directly, or BIN directly, more than 1 org statement can really mess things up.

Anyway, I have to research what CP/M does while matching files again... I'm not sure whether it's an OS bug or an application bug I need to fix.
 
I did a sample of several non-CP/M (non-x86) architectures whose assemblers generated .hex files. All seem to default to a maximum record size of 16 bytes. (e.g. pic, avr, ISIS...)

So, perhaps not an official standard, but a convention.

As far as ORG statements and CP/M goes, I've got a few sample files that use several ORGs, don't see a problem there.
If you need a short sample of code that works and the hex file generated, just ask.
 
That does bring up the question - if the org starts on a non-boundary, e.g. F7F4, does it run a short record then go back to regular boundaries, or does it just continue on from there every 16 bytes?
 
It depends who programmed it I am afraid...

The HEX file was not really meant to be looked at by a human. It was designed for a computer to process and, therefore, layout was not (perhaps) considered a high-priority requirement.

The only thing would be to limit the line length so that any HEX reader could work by allocating a sufficiently large buffer to hold one line. Even then, there are ways around this limitation also.

Dave
 
CP/M (the BDOS) never assumes the user DMA buffer has a valid (cached) directory entry. That would be a serious mistake. When doing functions 17/18, the BDOS copies the directory entries from internal disk buffers to the user DMA buffer immediately before returning to the user. If the BDOS can ensure that the (internal) cached directory entry is still valid, it *might* use it to write back. But, also note that the user's FCB is NOT the directory entry either, and so almost-always the BDOS must read a fresh copy of the directory sector in-question, and splice the updated FCB values back into it. Any "BDOS" that does not take these precautions is risking corruption.
 
The HEX file is simply a series of records, where each record (line) declares a starting address and number of bytes. The fact that adjacent records are sequential and contiguous (and/or the same number of bytes) is purely coincidental. Many assemblers will create gaps in the HEX output for "DS" data, not to mention any new "ORG" statements, and thus most loaders must be capable of piecing-together the image in whatever form it is given. That's generally not a problem, unless the loader is making some special assumptions about the contents of the HEX file. Note that the "base address" of the HEX file need not be the address where data is being loaded - many loaders have the concept of a bias, either implicit or explicit, that is applied to the record addresses when loading. Consider LOAD.COM where the new .COM image is being assembled someplace other than 0100H - since that's where LOAD.COM is running.
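
To make the "series of independent records plus a bias" idea concrete, here is a rough sketch in Python. It is not how LOAD.COM or any particular loader is written; `parse_record`, `load` and `SAMPLE` are names invented for the example, and it assumes only type-00 data records and a zero-byte terminating record.

```python
def parse_record(line: str):
    """Return (address, record_type, data) for one Intel HEX line."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("byte count does not match record length")
    if (sum(raw) & 0xFF) != 0:
        raise ValueError("bad checksum")
    address = (raw[1] << 8) | raw[2]
    return address, raw[3], raw[4:4 + raw[0]]


def load(lines, bias=0, size=0x10000):
    """Place each record at (its own address + bias); gaps are left as zero."""
    image = bytearray(size)
    for line in lines:
        address, rtype, data = parse_record(line)
        if not data:            # zero-byte record: treat it as the terminator
            break
        if rtype != 0x00:       # assumption: ignore anything that is not plain data
            continue
        start = address + bias
        if start < 0 or start + len(data) > size:
            raise ValueError("record falls outside the image")
        image[start:start + len(data)] = data
    return image


# Two records with a gap between them, then a terminator.
SAMPLE = [":03010000C3000138", ":02011000AA55EE", ":00000001FF"]

# A bias of -0100H puts the byte for address 0100H at offset 0 of the image.
image = load(SAMPLE, bias=-0x100, size=0x100)
print(image[:3].hex(), image[0x10:0x12].hex())
```

Nothing in the loop cares whether one record follows on from the previous one; each record is placed purely by its own address.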
 
It's also worth noting that I've never seen anything that states categorically that all records in a HEX file are in ascending order (never jump backwards or overlap). I don't know what loaders might make that assumption, and it is generally the case that the records are in ascending order, but I'm not sure one can make the assumption for all HEX files out there. One possible problem for this would be if the loader were trying to zero the gaps (e.g. DS or ORG jumps), and if it were not "smart" enough about that.
 
I don't see you mention it here, but it is worth talking about the "terminating record" of HEX files. This record is generally identified by having zero bytes, although the rest of the details are not very consistent. M80 seems to use a "record type" of 01 while DRI tools use the record type 00. The address field is defined to be the "entry address" of the program, typically the address/symbol specified on the END statement - but note that loading into a .COM file cannot use any entry address other than 0100H. An "END" statement with no entry address typically results in address 0000 in the terminal HEX record. Some loaders may stop reading after seeing the terminal record, others may ignore it and keep going. You'd probably need to experiment with PIP's handling of HEX files to see what it expects. In general, the record needs to exist and be at the end of the file, but other details are less strict/obvious.
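
A tolerant reader could accept both styles of terminating record. The following is only a sketch in Python under that assumption (`terminator_entry` is a made-up name, and no checksum validation is done here): any zero-byte record of type 00 or 01 is treated as the terminator and its address field is returned as the entry address.

```python
def terminator_entry(line: str):
    """Return the entry address if line is a terminating record, else None."""
    raw = bytes.fromhex(line.strip()[1:])   # drop the leading ':'
    count, rtype = raw[0], raw[3]
    address = (raw[1] << 8) | raw[2]
    # Zero data bytes with type 01 or with type 00 are both accepted as "end".
    if count == 0 and rtype in (0x00, 0x01):
        return address
    return None


print(terminator_entry(":00000001FF"))        # type 01, address 0000
print(terminator_entry(":00010000FF"))        # type 00, address 0100H
print(terminator_entry(":03010000C3000138"))  # ordinary data record
```

Whether a given loader stops at this record or keeps reading is a separate question that the sketch does not try to answer.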
 
I can categorically state that HEX file addresses do not necessarily always increase as you proceed through the file.

In the compilers we use at work, you can define a block of data as 'empty' - but with address references (labels) within it - and then go back to it multiple times within the code (to various labels) and define the contents.

The HEX file then sets the address for the record 'backwards' and fills in the data and then sets the address for the next record 'forwards' to where it left off and carries on with the code generation.

Dave
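
If you want to know whether a particular file does this, a quick check is possible once the records are parsed. A small sketch in Python, assuming the records have already been reduced to (address, length) pairs in file order (the function name is invented for the example):

```python
def first_out_of_order(records):
    """Index of the first record that starts below the end of the one before it, else None."""
    end = 0
    for index, (address, length) in enumerate(records):
        if address < end:       # jumped backwards, or overlaps the previous record
            return index
        end = address + length
    return None


ascending = [(0x0100, 16), (0x0110, 16), (0x0200, 4)]
revisited = [(0x0100, 16), (0x0200, 16), (0x0120, 8), (0x0210, 4)]
print(first_out_of_order(ascending), first_out_of_order(revisited))
```

A loader that places every record by its own address does not need this check; it only matters for loaders that assume ascending order, for example ones that zero-fill gaps as they go.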
 
That would be a pretty dumb hex loader, but then I've written that kind of dumb loader before myself -

The generator in the assembler is a bit smarter. It has a small buffer where it assembles things, and while it assumes a 16 byte payload, if the address changes unexpectedly it rewrites the size field and flushes the short buffer to the disk and starts a new line.
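
As an illustration only (this is a Python sketch of the idea, not the assembler's actual code, and `HexWriter` is a made-up name), a generator of that kind might look like this, assuming 16-bit addresses and a type-01 end record:

```python
class HexWriter:
    """Collects bytes into records of up to `payload` bytes, flushing early on address jumps."""

    def __init__(self, payload=16):
        self.payload = payload
        self.lines = []
        self.start = 0
        self.buf = bytearray()

    def _flush(self):
        if not self.buf:
            return
        body = bytes([len(self.buf), self.start >> 8, self.start & 0xFF, 0x00]) + bytes(self.buf)
        checksum = (-sum(body)) & 0xFF          # two's complement of the byte sum
        self.lines.append(":" + body.hex().upper() + f"{checksum:02X}")
        self.buf.clear()

    def emit(self, address, value):
        # An unexpected address (new ORG, DS gap) ends the current record short.
        if self.buf and address != self.start + len(self.buf):
            self._flush()
        if not self.buf:
            self.start = address
        self.buf.append(value)
        if len(self.buf) == self.payload:
            self._flush()

    def close(self):
        self._flush()
        self.lines.append(":00000001FF")
        return self.lines


w = HexWriter()
for addr, val in [(0x0100, 0xC3), (0x0101, 0x00), (0x0102, 0x01), (0x0110, 0xAA), (0x0111, 0x55)]:
    w.emit(addr, val)
print("\n".join(w.close()))
```

Note that this version simply carries on in 16-byte steps from wherever the last ORG landed (F7F4, F804, ...) rather than realigning to a 16-byte boundary; either choice produces a valid file.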

And I use the 01 terminator... Mainly because that's what I found on wikipedia.

COM files don't like things jumping around when written directly, but the hex files don't care, so that's why I liked the idea of supporting Hex. Otherwise I was going to have to think about how to write files if I had a few bytes in lower memory and a few in upper memory, and god only knows how I'd then handle a few in the middle with a binary.
 
Is it only function 17/18? I was going to ask you about that when I got a break - Thanks for pre-empting it - :)

And yeah, I used the same code in everything. I even use close-file to open files. So I had to think carefully about how to code around the problem once I worked out where the corruption was coming from. And risking corruption is an understatement. It *was* corrupting, each time I wrote into a new extent... Whether it had any impact or not was more down to chance, but the corruption was without question... Had me scratching my head for a bit.
 
FWIW, the AVR assembler produces a final record of :00000001FF
I can check other non-x80 assemblers as well, if it matters.
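
That record checks out under the usual rule that all the bytes of a record, checksum included, sum to zero modulo 256. A quick Python sketch (the helper names are invented for the example):

```python
def record_checksum(body: bytes) -> int:
    """Checksum byte for count + address + type + data (two's complement of the sum)."""
    return (-sum(body)) & 0xFF


def checksum_ok(line: str) -> bool:
    """True if every byte after the ':' sums to zero modulo 256."""
    raw = bytes.fromhex(line.strip()[1:])
    return (sum(raw) & 0xFF) == 0


print(f"{record_checksum(bytes([0x00, 0x00, 0x00, 0x01])):02X}")
print(checksum_ok(":00000001FF"))
```

Here the only non-zero byte before the checksum is the record type 01, so the checksum has to be FF.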

That's the same end-record I use... It would be interesting to know, but it seems a fairly safe choice either way.

I didn't even know Gary Kildall was Intel's consultant on the Intel Hex format until I started looking it up.

I wonder how much history from this era will be completely lost in another 50 years...
 
Intel .hex format matters mostly to Intel-type processors, and it was copied widely because it's simple to implement.
But there's also the S-record format for Motorola processors: https://en.wikipedia.org/wiki/SREC_(file_format), which, I would argue, is more formal and flexible.

History is never lost--it's just forgotten. Some work I do involves recovering data from corporate archives. I often wonder if anyone will ever read it. The same sentiment must be present among librarians--lots of books but lots of books unread.
 