Yeah, most of what I've been working on lately is self-indulgent cleanup and making the boot sequence look more official. There's not much exciting there in terms of compatibility and performance. The absolute latest change moves a few lines around so that, hopefully, if someone puts in the wrong port numbers, those get shown BEFORE the "can't find disc" message, which should make a screenshot more useful for debugging.
I'm sort of wanting to keep the ROM size within 4095 bytes (4k minus a one-byte checksum), so that it, a full-bells-and-whistles XT-IDE BIOS (10-12k), an HD floppy BIOS (8k) and a regular XT BIOS (8k) all fit in a 32k ROM. We're at about 4007 bytes now, and some of the remaining code-golfing options really come at the cost of legibility and/or performance (adding extra CALL/RET or JMP to combine multiple copies of similar code, or removing the shift optimizations in the CHS-to-LBA calculation). Expect future versions to get terser in their messages, as that's a cheap place to buy another 10 bytes at a time.
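For anyone who wants to check the arithmetic, here's a rough Python sketch of the size budget and the kind of shift trick mentioned above. It's not the actual build tooling or ROM code (that's all assembly). It assumes the usual PC option-ROM convention that all bytes of the image sum to 0 mod 256, 0xFF padding, and a 16-head, 63-sector geometry for the shift version. All the function names are made up for illustration.

```python
ROM_SLOT = 4096              # 4k slot inside the 32k ROM
CODE_LIMIT = ROM_SLOT - 1    # last byte is reserved for the checksum


def checksum_byte(body: bytes) -> int:
    """Byte that makes the whole image sum to 0 mod 256 (assumed convention)."""
    return (-sum(body)) & 0xFF


def pad_and_seal(code: bytes, slot: int = ROM_SLOT, fill: int = 0xFF) -> bytes:
    """Pad code up to slot-1 bytes, then append the checksum byte."""
    if len(code) > slot - 1:
        raise ValueError(f"code is {len(code)} bytes, limit is {slot - 1}")
    body = code + bytes([fill]) * (slot - 1 - len(code))
    return body + bytes([checksum_byte(body)])


def chs_to_lba_mul(c: int, h: int, s: int, heads: int, spt: int) -> int:
    """Textbook CHS -> LBA using two multiplies."""
    return (c * heads + h) * spt + (s - 1)


def chs_to_lba_shift(c: int, h: int, s: int) -> int:
    """Same result with shifts only, ASSUMING 16 heads and 63 sectors/track."""
    t = (c << 4) + h                 # c * 16 + h
    return (t << 6) - t + (s - 1)    # t * 63 == t * 64 - t


# Stand-in for the real binary: 55 AA signature, length byte 8 (8 * 512 = 4096),
# then zeros up to the 4007 bytes mentioned above.
code = bytes([0x55, 0xAA, 0x08]) + bytes(4004)
image = pad_and_seal(code)

print("headroom:", CODE_LIMIT - len(code), "bytes")
print("checksum ok:", sum(image) % 256 == 0)
print("total:", 4 + 12 + 8 + 8, "k")   # this ROM + XT-IDE + floppy + XT BIOS
```

The shift version is bigger than a plain MUL in ROM bytes but avoids the slow multiply on an 8088, which is the size-versus-speed trade-off in question.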
A fair bit of the size comes from repeating several functions in 8088 and V20 flavours. If it were actually built as two separate ROMs there would be significantly more headroom, but that would be sort of inconvenient for my use case, where I swap between the processors fairly regularly.
Benchmark-wise, this thing does seem to love low wait states. With the V40 CPU, you can configure them on the fly by writing a byte to port 0xFFF5, and switching from "maximum wait states" (3 on memory, 3 on I/O) to zero-wait seems to potentially double benchmark results. Of course, not everyone can exploit that.