It's because of the block-mode transfers: the microdrive in question supports 16 sectors per IRQ, meaning the DMA controller needs to be programmed only once per 8 KB (16 × 512-byte sectors).
To add, the XTIDE Universal BIOS is quite heavy for an 8088 (compared to, for example, the IBM/Xebec source). Because of the generalisations necessary, there are a lot of jumps involved in just checking the IDE status register within the read or write transfer loop. So block mode helps DMA so dramatically because it reduces not only the DMA programming but also the number of passes through the IDE wait function. For example, in memory-mapped mode the CF card transfer rate can be improved (on a 4.77 MHz XT) by about 20% by hard-coding the control register ports and ready checks in the ide_transfer module.
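To put rough numbers on that, here is a small back-of-envelope sketch in Python. It assumes one DMA setup and one pass through the wait function per IRQ, which is how I described it above; the function names are made up for illustration and are not anything from the XTIDE Universal BIOS source.

```python
SECTOR_BYTES = 512  # standard ATA sector size


def block_bytes(sectors_per_irq):
    """Bytes moved per IRQ, i.e. per DMA controller setup (assumed 1:1)."""
    return sectors_per_irq * SECTOR_BYTES


def setups_needed(total_sectors, sectors_per_irq):
    """Number of DMA setups / wait-function passes for one transfer.

    Assumption: exactly one setup and one status wait per IRQ.
    Ceiling division, because a partial final block still needs a setup.
    """
    return -(-total_sectors // sectors_per_irq)


if __name__ == "__main__":
    total = 128  # a 64 KB read, chosen only as an example
    print("bytes per block at 16 sectors/IRQ:", block_bytes(16))
    print("setups without block mode:", setups_needed(total, 1))
    print("setups with 16-sector blocks:", setups_needed(total, 16))
```

So for that example read, the per-block overhead (DMA programming plus the status check) is paid 8 times instead of 128, which is why a slow wait path hurts so much less with block mode on.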
Part of my plan for this card is to have an optional TSR to handle the Int13h read and write functions in a highly specific (and hence fast) way. This is really specific to CompactFlash media, since even with 16-sector block mode the DMA and polling overhead is almost insignificant with the microdrive (~2%).