
Allocation Vector Table.

cj7hawk
Hi All,

Once again I must demonstrate my ignorance of the CP/M world, throw myself on the mercy of the experts' court, and ask for assistance with function 27.

With respect to the Allocation Vector table - I was wondering about this as I can't find a data structure for this table documented anywhere, and while it's typically small (around 32 bytes), it seems to have a maximum size of 8 KB, all of which needs to sit in addressable space, once for each drive.

I was wondering about the importance of this, as it seems there are alternative ways to locate free allocations and generate directory information without services outside of the BDOS needing to see this table or have it available. So I was curious about its importance to other applications (eg STAT), and also how CP/M handles larger drives, since there doesn't seem to be a returned value from the BDOS call that specifies the length of the table, just a pointer to where it starts. It also seems like the table can work with either one or two bits per block (in the two-bit case, what do the other three states represent?). And are there any data examples of allocation vectors online? Most of the records I can find talk about them, but don't explicitly lay down details such as whether an increase in block number corresponds to an increase in bit number (ie, bit 0 of the first byte = block 0?).
Also, how important is this function for CP/M programs in general?

Thanks for any pointers anyone can add -

Regards
David
 
This all depends on whether you're trying to replicate the software/program API. Programs may query the BDOS for the ALV for a given drive, and also the DPB, from which they can compute available space. The most common user of this is STAT.COM (or the CP/M 3 equivalents) that shows space available. There have been some applications that do this to determine whether they have enough space for whatever they are doing - but I don't think there were many. The data can be faked, too. In my CP/NET servers they just return a vector with all bits 0 and a DPB that "works" sufficiently to do the computations. Since these "drives" are host computer (Linux, Windows, etc.) filesystems with virtually infinite space (by CP/M standards), it seems fine to always return a DPB and ALV that say there is space available (mine always report 2M free by default). Now, if you need to provide an actual CP/M filesystem, you may have to be more precise.
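
(For anyone following along, the computation STAT-type programs perform is easy to sketch in C. This assumes you've already fetched the ALV address via BDOS function 27 and the DPB address via function 31; the struct follows the standard CP/M 2.2 DPB layout, and free_kb is just an illustrative name.)

Code:
/* Standard CP/M 2.2 disk parameter block layout. */
struct dpb {
    unsigned short spt;      /* 128-byte records per track        */
    unsigned char  bsh;      /* block shift: block = 128 << bsh   */
    unsigned char  blm;      /* block mask                        */
    unsigned char  exm;      /* extent mask                       */
    unsigned short dsm;      /* highest block number on the drive */
    unsigned short drm;      /* highest directory entry number    */
    unsigned char  al0, al1; /* directory allocation bits         */
    unsigned short cks;      /* checksum vector size              */
    unsigned short off;      /* reserved (system) tracks          */
};

/* Count free blocks: one bit per block, high-order bit first,
   block 0 in the top bit of alv[0].  Returns free space in KB. */
unsigned free_kb(const unsigned char *alv, const struct dpb *dpb)
{
    unsigned blocks = dpb->dsm + 1, nfree = 0, n;
    for (n = 0; n < blocks; n++)
        if (!(alv[n >> 3] & (0x80 >> (n & 7))))
            nfree++;
    /* block size is 128 << bsh bytes, i.e. 1 KB << (bsh - 3) */
    return nfree << (dpb->bsh - 3);
}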
 
The allocation vector table is pointed to by the DPH, but the location of the pointer differs by CP/M version. In CP/M 2.x, the pointer is located at offset 000Eh in the DPH; in CP/M 3, it's at offset 0010h; in CP/M-86 I think it's at offset 000Ch.
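
(In C terms, the 2.2 DPH looks like the sketch below; each field is a 16-bit address, and the field names are conventional rather than official.)

Code:
/* CP/M 2.2 Disk Parameter Header: the ALV pointer at offset 000Eh. */
struct dph {
    unsigned short xlt;        /* 00h: sector translate table      */
    unsigned short scratch[3]; /* 02h: BDOS scratch words          */
    unsigned short dirbuf;     /* 08h: 128-byte directory buffer   */
    unsigned short dpb;        /* 0Ah: disk parameter block        */
    unsigned short csv;        /* 0Ch: directory checksum vector   */
    unsigned short alv;        /* 0Eh: allocation vector           */
};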
It's a run of bytes, with each bit, starting from the high-order bit of the first byte, corresponding to an allocation unit (block).
So, for example, with an allocation block size of 2048 bytes, the allocation vector starts off with 80h, then 40h, then 20h... for allocation blocks 0, 1 and 2 respectively. At least the first bit is always set; this corresponds to the block containing the start of the disk directory.
As regards importance - well, sort of. It doesn't really come into play until you start writing to a disk. Then, this is how the BDOS keeps track of which blocks are in use and which are available for allocation.
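
(Put as code, testing or marking block n under that high-bit-first layout looks like this sketch; the names are illustrative.)

Code:
/* The bit for block n lives in byte n/8, high-order bit first. */
int block_in_use(const unsigned char *alv, unsigned n)
{
    return (alv[n >> 3] >> (7 - (n & 7))) & 1;
}

void mark_block(unsigned char *alv, unsigned n)
{
    alv[n >> 3] |= 0x80 >> (n & 7);
}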
 
A little more detail on how the CP/M BDOS handles the ALV. In CP/M 2.2 (and earlier), each bit in the ALV represents a filesystem block (as defined by the DPB). In CP/M 3, that has the potential to change based on various factors. See the CP/M 3 System Guide section on the BIOS functionality, specifically the ALV field of the DPH.

Also note that, unlike MS-DOS, the ALV is never stored on disk; it exists in memory only and is built from the directory when a drive is first accessed. It is a serious, even fatal, mistake for anything outside the BDOS to alter the ALV (i.e. it is strictly R/O to all programs).

And if you are going to "fake" the ALV for programs to use, you need to remember that the ALV and associated DPB need to be in sync. In the case where you are faking the ALVs for all drives, and the ALV is always full of "0", it may be possible to use the same ALV for all drives (it simply needs to be as large as the largest drive's DPB dictates). Basically, the total number of blocks on the drive (DPB.DSM+1) dictates the size of the ALV, depending on whether you are using 1-bit or 2-bit entries.
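
(That sizing rule as a sketch: 1-bit entries are the CP/M 2.2 case, 2-bit the CP/M 3 variant mentioned earlier; alv_bytes is an illustrative name.)

Code:
/* Bytes needed for a drive's ALV, given DPB.DSM. */
unsigned alv_bytes(unsigned dsm, int two_bit)
{
    unsigned blocks = dsm + 1;
    return two_bit ? (blocks + 3) / 4   /* 2 bits per block */
                   : (blocks + 7) / 8;  /* 1 bit per block  */
}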
 
OK, this helps. It sounds like I can treat this as a string of 0's for most applications, as long as I treat the directory allocations as de facto file tables before finding free allocations to write to, if I'm using an alternative file technology. I could probably even do this with an FDD, as long as the entire directory structure exists within the same track, without too much loss of speed - although it would involve seeking to that track prior to opening a new allocation for sequential writes, while maintaining general CP/M compatibility, though STAT isn't going to work correctly. That makes sense. I assume I can do much the same for the check vector in the DPH then also? (This seems already implied for fixed disks.)

Thanks
David
 
The check vector is another matter, although it is not accessible from programs (via the BDOS). If a program is accessing the BIOS, that's another issue. I can't think of a reason a program *needs* to access the check vector. If you're going to try to emulate the BIOS API, you've got more work ahead of you.

It may be reasonable to view what you're doing similarly to what CP/NET does - it provides a file API (well, BDOS) but not a BIOS (networked drives might not resemble a collection of tracks and sectors at all). Of course, the character I/O API of the BIOS is more widely used by programs. It really depends on the scope of programs you want to run. Do you have a list of target applications?
 
It only needs to work with applications that would typically run under CP/M, not with system utilities and the like (eg, the WordStars, etc.). The project is to build a computer that was planned in the 80s but never got built, and while I was originally intending to make the hardware first (I'm more of a hardware developer), I ended up with the opportunity to work on the OS first, so I began by writing an emulator and an assembler, then started trying to work out what CP/M compatibility would have looked like.

If I achieve reasonable BDOS compatibility, that will be OK. But I still want it to be possible for it to run native CP/M as an option. Except that my emulator is buggy, my assembler is buggy, and my CP/M rewrite is buggy, and while small test programs work just fine when called up at the CCP command line, bigger ones don't yet. So I am pretty sure I still have some major bugs hidden in my assembler, or they need more functional BDOS calls, or, more likely, my emulator still needs a lot of work... From time to time I test my code on an old Amstrad PCW, and Doug's emulator has been a lifesaver in terms of arbitrating the faults I find.

Still, I get the small wins - I mentioned that I only just got it loading .COMs from the CLI, I think last week, and CP/M apps that load from the disk seem to be able to use Serial Read OK. Well, the original disk I set up was 32K, which turned out a bit small, so I bumped it to 64K, then 240K, and now it's somewhere around 600K. I got lucky that my code for the directory worked well and coped with the 16-bit block addressing requirements, and managed to get the code that figured out the allocations to records working - and I was amazed when I made the switch to the double allocations in the extent and the code kept up with the constant disk changes without me needing to rewrite it a single time! I only changed the DPB.

I know some assembly programmers can do that nearly all the time, and a perfectly written BDOS call on the first try isn't that high a bar by older standards, but I am not one of the programmers who can do that, so I was happy about it. Now I'm at the point of figuring out how to make it write sectors, and that means crossing some bridges like the Allocation Vectors, which are currently represented by a big blank block of bytes in my BIOS.

I was going to give a cool and technical explanation as to *why* I needed to do that, but on examining my reasoning, I think I mostly just want to know what I can leave out at this stage of development without destroying future CP/M compatibility, so that I can focus on getting the rest of the problems out of my code. Also, I do need to go with RS232 network drives (yes, that would genuinely have been one of the elements of the computer they never built - it was on all their other models, even in 1982, though it was proprietary in nature; by 1985 they realized they needed to follow other standards, but never finished the intended machine, which would have been pretty amazing had they built it). I figured to cut down TCP/IP and run SLIP as the native L2 for comms out of the BIOS/BDOS with extensions, so I'm diverting from pure CP/M whichever way I go, since I'll end up writing BIOS extensions anyway.

And case insensitivity is already a requirement, which I've crossed off (my CCP/BDOS already allow mixed-case filenames, and will recognize Alpha, ALPHA and alpha as the same).

But as far as I can, I still want to run any native CP/M software without modification. So if it was written to run on CP/M, it should run on the OS I'm building.

Sorry for the long post - I don't even have a potato to offer, but the help and advice are really appreciated, and I guess if you have an idea of what I'm working on, it may help angle the advice better :)

The machine, BTW, was the Loki. They literally intended to steal all the best ideas going around in 1985 and build a CP/M compatible machine that had them all... including hardware-accelerated graphics and a Z280 processor. From what I can tell, it would have looked more like a PC than a Mac, and would have sold for just a few hundred dollars at a time when home computers with any kind of power still cost many thousands. It was always believed "impossible" to build at the time, but in my research I came across a few tricks that the original designers would have been aware of that greatly simplified the hardware, so I think it would have been quite achievable. And to that extent, it would have been a CP/M based games machine/home PC.

Either way, I'll share my code as I get it working, whether it's good or bad, since it will be a genuine CP/M version written expressly for the z80 (because the first models almost certainly would have come out with the z80H rather than a z280, and it was a modular concept). I'm gaining a lot of satisfaction slowly writing an OS for the first time, mostly on my train ride into work each day, and it's slowly helping me unstick my rusty programming skills.

David
 
After working through the algorithms I thought would replace this function, I realized I can't get around it with disks that work normally, so the only thing I can do is limit their size to keep the allocation table size down, minimize its use, and maybe cache it outside of the visible memory pages to use less memory.

Every time I thought I had a better algorithm, it turned out to be worse when I looked closely at it. Seems it's pretty difficult to improve on CP/M. Thanks again for the help - I have a better idea of how it has to go together now that I have to write it after all. At least for some of my disks.
 
One difference between CP/M 2.2 and 3 w.r.t. the ALV is that under v3 (banked systems at least) the ALV is not typically accessible by programs. There is a v3 "Get Disk Free Space" BDOS function which largely (entirely?) eliminates the need for programs to read the ALV. Unfortunately, 2.2 still uses the ALV to compute free space. But if you're not interested in things like STAT, then perhaps you won't need it. Also, the ALV is generally only accessed by programs immediately after the call - i.e. I don't think a program is expected to be able to use that same pointer hours later. For CP/NET, each "get ALV" call on a networked drive brings a new ALV into the same message buffer - so for a program to be compatible, it will need to consume the ALV immediately and forget the address. This may allow you to keep the ALVs in another bank and only copy one to a common buffer if/when a program asks for it. You still need space for the largest ALV, but only one.

You might even choose to not support that function.
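
(For reference, that CP/M 3 call is BDOS function 46, Get Disk Free Space: E carries the drive number, and the result comes back as a 3-byte count of free 128-byte records in the first bytes of the current DMA buffer. A sketch, assuming a Hi-Tech-C-style bdos() wrapper is available.)

Code:
extern int bdos(int func, int arg);   /* assumed BDOS entry wrapper */

/* Free space on drive 0..15, in KB, via CP/M 3 function 46. */
long free_kb_v3(int drive)
{
    static unsigned char dma[128];
    long records;

    bdos(26, (int)dma);   /* function 26: set DMA address     */
    bdos(46, drive);      /* function 46: get disk free space */
    records = (long)dma[0] | ((long)dma[1] << 8) | ((long)dma[2] << 16);
    return records / 8;   /* 8 x 128-byte records per KB      */
}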
 
I was just planning on having a small de facto ALV and leaving it blank, as all I really need is a routine to get the next free block, which I find by scanning the directory.

But every algorithm I examined was a bit inefficient and unwieldy. I could go the DOS route with a linked list, but then I lose CP/M compatibility. It's a mess, and it looks like Gary Kildall really did come up with the best idea given the disks as they were at the time - I did everything I could to avoid it, but in the end, it's simple and efficient.

So since I'm creating a vector table by default, I might as well use it as it was intended. Especially now that I have a better idea of how it works.

Now the challenge is writing efficient code to populate it. I figure I only need to populate it when I need some free blocks, so I'll cut corners by only keeping the logged drive's vector in memory, and having a way to force a rebuild when I go looking for a free allocation (see the sketch below).
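
(For anyone reading later: that rebuild is essentially what the BDOS itself does at drive login - clear the vector, set the directory blocks from DPB.AL0/AL1, then walk every non-deleted directory entry and mark each nonzero block it references. A sketch reusing the struct dpb from earlier in the thread; read_dir_entry() is a stand-in for however your BIOS fetches the 32-byte directory entries.)

Code:
extern int read_dir_entry(unsigned n, unsigned char *e); /* your BIOS */

/* Rebuild the ALV from the directory (CP/M 2.2 semantics). */
void build_alv(unsigned char *alv, const struct dpb *dpb)
{
    unsigned char entry[32];
    unsigned n, i, bits, blk;
    int big = dpb->dsm > 255;           /* 16-bit block numbers? */

    for (n = 0; n <= dpb->dsm / 8; n++) /* clear the vector      */
        alv[n] = 0;

    /* Directory blocks come from AL0/AL1, high-order bit first. */
    bits = ((unsigned)dpb->al0 << 8) | dpb->al1;
    for (n = 0; n < 16; n++)
        if (bits & (0x8000U >> n))
            alv[n >> 3] |= 0x80 >> (n & 7);

    /* Mark every block referenced by a live directory entry.    */
    for (n = 0; read_dir_entry(n, entry); n++) {
        if (entry[0] == 0xE5)           /* deleted entry         */
            continue;
        for (i = 16; i < 32; i += big ? 2 : 1) {
            blk = big ? (unsigned)entry[i] | ((unsigned)entry[i + 1] << 8)
                      : entry[i];
            if (blk && blk <= dpb->dsm)
                alv[blk >> 3] |= 0x80 >> (blk & 7);
        }
    }
}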

It annoys me, but I can't think of any other easy or simple way to do it. :( So I made an arbitrary decision to limit disks that need an ALV to 512 blocks max, which practically tops out at 1.44 MB with 4K allocations, and that's workable.

Though in a way, it's nice to learn something that gives me an insight into what CP/M had to deal with in its time -

Regards
David.
 
My approach was an ALV with 50% set and 50% unset. Makes the STAT output look more reasonable, but has otherwise no consequences.
 
Thanks Svenska - in the end I had to do the same task as the original OS, so I just took that vector and created a reusable table for all drives. It will be limited to a single drive at a time, with no more than 512 allocations, and the table will need to be rebuilt each time the drive is changed or another operation occurs. I'll probably need to put in some constraints later to prevent it from getting out of sync, but for the moment it's working, and it is generated when the BDOS call for the vector is requested or a write needs to occur. I was hoping to finish the Write Sequential call today, but I still need to figure out how to make it work properly; I want to reuse a lot of the Read Sequential code, then merge most of the two together. As an aside, I have the routine also return the next free block in DE, so I can reuse it to pick up the next block to drop into an allocation when writing, so I'm a little closer to finishing out those routines.
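
(For later readers: the next-free-block routine itself is tiny once the vector exists. A first-fit sketch follows; as I understand it, the real 2.2 BDOS instead searches outward from the file's last-used block, but any free block is valid.)

Code:
/* First free block, or 0 if the disk is full (block 0 always
   holds the start of the directory, so it can't be free). */
unsigned next_free_block(const unsigned char *alv, unsigned dsm)
{
    unsigned n;
    for (n = 1; n <= dsm; n++)
        if (!(alv[n >> 3] & (0x80 >> (n & 7))))
            return n;
    return 0;
}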

That brings me to some questions about write behaviour... But I'll open a new thread for that -

Thanks again -
David
 
Of note, for those who come to read this thread later: I failed. In the end, I couldn't find a good solution to keeping all of the Allocation Vector tables in memory, which means pre-provisioning a large table is a problematic idea at best.

An 8 MB disk's vector can be fitted into 64 bytes with 16K allocations, but the extents all grow bigger than 16K once an allocation size of 2048 bytes is reached, so it's problematic to keep a lot of disks in memory without breaking with the CP/M format. I think CP/M 3 can get around this problem with an upper counter and random disk I/O, but I wanted to begin with CP/M 2.2.

So in the end, I decided to go with 2 small drives (ROMDISK and permanent RAMDISK) plus either 1 larger drive (up to 2 MB) or 2 smaller drives (up to 1 MB each), give away 152 bytes of memory for AV space, and just leave it all in the BIOS space rather than paging it in and out as required.

This, in the end, seems the only valid way to address the issue of how big the AV space is, at least while disk routines might write to two or more drives simultaneously: reduce the total disk size, or increase the AV space. The remainder - and any fixed disks - I'll have to give a non-CP/M standard format.

Thanks again for all the input from the forum -

David
 
I suppose this is a dead thread (2023.10.08), but I can say a few words about allocation vectors in CP/M 2.2 which likely apply to earlier versions of CP/M as well.

CP/M handles disk data in "logical sectors" which are 128 bytes, or 1/8 KB. The logical sector number is stored in a 16-bit number, so CP/M can only address 8 MB total per drive. It is possible to have multiple logical drives of up to 8 MB each on one physical disk by using the starting track offset value at the start of each additional logical drive mapped onto the one physical drive. CP/M usually handles physical sectors larger than 128 bytes: typically 256 bytes for double-density sectors, but often larger, up to 1K sectors on floppy drives and possibly larger for hard drives. Larger physical sectors can put more data on a track of a floppy disk, because there is a gap between sectors and a smaller gap between the sector address and the sector data, and the address block and data block each have their own 16-bit CRC error check word appended at the end. Fewer sectors means less gap space per track.
One tradeoff with larger physical sectors is that the physical sector must be stored in memory so that CP/M can read and write logical sectors within the physical sectors. If I remember correctly, there is at least one deblocking buffer for each disk drive. Larger sectors would consume more space in high memory and make the Transient Program Area (TPA) a little bit smaller. This is not too big a deal, because CP/M programs were usually written to run in some smaller TPA, typically assuming available RAM might be as small as 48K.
Each directory entry is 32 bytes long, with 16 bytes for the disk allocation blocks belonging to that entry. Those 16 bytes can represent either 16 8-bit block numbers or 8 16-bit block numbers, so the structure of the directory entry would allow up to 64K allocation blocks per disk; CP/M never actually used that many. I thought that a logical extent of a file was that part of the file that could be reached from one directory entry; a file can use more than one directory entry, with the same file name and extension in each.
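
(The 32-byte entry being described, as a C struct; this is the standard CP/M 2.2 layout.)

Code:
/* CP/M 2.2 directory entry: 32 bytes, 4 per 128-byte record. */
struct cpm_dirent {
    unsigned char user;    /* user number, or 0xE5 if deleted   */
    unsigned char name[8]; /* file name, space padded           */
    unsigned char ext[3];  /* type; top bits carry R/O and SYS  */
    unsigned char ex;      /* extent number, low bits           */
    unsigned char s1;      /* reserved                          */
    unsigned char s2;      /* extent number, high bits          */
    unsigned char rc;      /* 128-byte records in this extent   */
    unsigned char al[16];  /* block numbers: 16 x 8-bit, or     */
                           /* 8 x 16-bit when DSM > 255         */
};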
CP/M allocates disk space in blocks, not logical sectors. The smallest allocation blocks are 1K, but blocks of 2K or 4K may also be used for larger disks. The tradeoff is that smaller block sizes use disk space more efficiently, while a smaller number of larger blocks takes less processing to allocate and free. With an 8 MB disk and 1K blocks, CP/M would have to handle 8192 blocks; with one bit in the allocation vector per block, the allocation vector would need 1 KB per disk. Using 4K bytes per block reduces the number of blocks to 2K, which reduces the size of the allocation vector to a much more manageable 256 bytes and lets it be handled with a one-byte offset into the vector.
Multiple allocation blocks are used for the directory. Each directory entry uses 32 bytes, so each KB of directory can hold 32 directory entries. It is a reasonable guess that the average file size is around 4 allocation blocks. With 2K blocks on the disk we would need 1K directory entries, and 1K directory entries would need 32 KB - that is, 8 blocks at 4K per block. Smaller disks would use smaller allocation blocks and would likely average more allocated blocks per file, so 8 blocks would likely be a good guess there too. If you want to be sure you won't run out of directory space before you run out of disk space, you could allocate more blocks to the directory, but allocating enough directory to guarantee full disk use without running out of directory entries could make the system pretty slow, since the system has to sift through the whole directory to look for file names. CP/M did not have subdirectories. It had user numbers, but those were a little bit of a hassle if you wanted to access data in one user number from a program in a different user number. User numbers could quickly filter out irrelevant directory entries without checking for a file name match.
 
Hi @ASittler; thanks for the input. FWIW, I did complete my version of CP/M and made it 2.2 compatible. In time, I'll add things like directories without breaking 2.2 compatibility - though they are only recognized by the OS and not by applications, which won't have any awareness of directory structures unless they are written to take advantage of the new OS hooks... but that's another story.

There are ways to break the model under which disks are represented to CP/M through the BIOS, which lends itself to connecting in other file systems, but anything that expects the allocation block lists in the extents will read them incorrectly. Fortunately, this mostly only creates issues with system applications; again, anything that uses the BDOS calls should work correctly.

I ended up with a very closely aligned CP/M 2.2 compatible OS (which I recently made public, along with tools and an emulator space that runs under Windows, and to an untested extent may run under Linux). As I expand it, I'll add other support for directories, which should still function under DR's version of CP/M, since I have been extending CP/M through applications and hardware architecture.

At some point I'll start writing up some modern graphical documentation to make it all easier to understand for anyone hitting it for the first time. Also, as a nice aside, I found that the BDOS routines make a really nice way to manage large memory spaces up to 256 MB, especially if something like preemptive multitasking is added later. It has the cool side effect that I can navigate to the M: drive, and if I do a DIR I can see what processes I have running, and things like STAT will show me which routines are taking up space in main memory.

Welcome to the forum :)

David.
 