Unless you're talking about adding fast I/O to an existing slow Z80 system (e.g. 4MHz), I think you should be considering the overall architecture, the devices you're trying to connect to, and your fundamental objectives.
I've gone in circles about this in my head for some time.
First assumptions: I am not a hardware guy. I'm a software guy. I struggle to get a header placed in things like KiCad.
Now I'll go through my base thought processes.
This first reared its head when I discovered the CH376, a nice little chip that lets you talk to DOS file systems on USB mass storage.
It is imperfect, but this chip "solves" a bunch of issues for a simple CPU in today's world: simple mass storage, a "free" file system, a "standard" file system, etc. It's also a USB host to which you can connect, well, "everything". But, as it's usually wired up, it's SPI.
The idea of bit-banging SPI on a 4MHz Z80 or a 2MHz 6502 for I/O is just, well, painful. It's effective, it'll work, and having used things like the Ataris with their 19.2K serial bus in the past, I know it's "usable". It's simply not desirable.
So, if you want more performance, you need a glue chip that runs the SPI "very fast" and hands the data to the CPU in 8-bit chunks, closer to bus speed. Naively, such a chip gives you "8x" performance for the I/O, since the CPU moves a whole byte per access instead of one bit. That's a nice benefit.
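To make the "painful" part concrete, here is a minimal Python model of bit-banging one SPI mode 0 byte (MSB first). It's a simulation, not Z80 code: SimulatedSpiDevice is a hypothetical stand-in for the slave (an 8-bit shift register), chip select is left out for brevity, and the count it returns is port accesses in this model, not CPU cycles.

```python
class SimulatedSpiDevice:
    """Hypothetical SPI slave for illustration: an 8-bit shift register, mode 0."""

    def __init__(self, reply):
        self.shift = reply & 0xFF      # byte the slave will send back
        self.received = 0              # byte the slave has clocked in
        self.miso = self.shift >> 7    # first bit is valid before the first clock
        self.mosi = 0
        self.sck = 0

    def set_mosi(self, bit):
        self.mosi = bit & 1

    def set_sck(self, level):
        if level and not self.sck:
            # Rising edge: the slave samples MOSI.
            self.received = ((self.received << 1) | self.mosi) & 0xFF
        elif not level and self.sck:
            # Falling edge: the slave shifts its next bit onto MISO.
            self.shift = (self.shift << 1) & 0xFF
            self.miso = self.shift >> 7
        self.sck = level

    def read_miso(self):
        return self.miso


def spi_transfer_byte(port, out_byte):
    """Bit-bang one full-duplex SPI mode 0 byte, MSB first.

    Returns (byte_read, port_accesses) so the cost per byte is visible.
    """
    in_byte = 0
    accesses = 0
    for bit in range(7, -1, -1):
        port.set_mosi((out_byte >> bit) & 1)   # put the next bit on MOSI
        port.set_sck(1)                        # rising edge: both sides sample
        in_byte = (in_byte << 1) | port.read_miso()
        port.set_sck(0)                        # falling edge: slave shifts
        accesses += 4
    return in_byte, accesses
```

Every byte costs an 8-iteration loop with several port accesses each (32 in this model), where a byte-wide port costs one read. On real hardware MOSI and SCK often share one output port, so some writes can be merged, but the per-bit loop remains, which is where the naive "8x" comes from.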
But now we have this extra chip (or logic). I'm sure there are folks here or around who could scribble something like this on a napkin. I can't. I can't even spell CPLD. (Software guy.)
What this told me was that if I needed an "I/O" chip just to access the REAL "I/O", then why not think a little bit bigger?
Next up, there's the "cost" situation. I'm not pinching pennies, but I'm as cost-sensitive as the next guy. But I'm also "value-oriented".
I remember seeing someone talking about hex digit display driver chips, something like that. And those chips cost, like, $7 or $10 or some such thing, just to render a hex digit in LEDs (and that didn't even include the LEDs!).
Meanwhile, on the other hand, you can get a Pi Zero for $5-10. Soooo... for $10 I can get a hex digit driver chip, or I can get a USB/Serial/Video/WiFi/Keyboard driver "chip". THAT, my friend, is "value".
That's more value than the CH376.
So, the next trick is how to drive the Pi.
I can't mate it to the bus (somebody may be able to, I can't). I don't need DMA, and the Pi doesn't have the pins for it anyway. What I need is essentially "8-bit SPI". FT1248 (from the FTDI folks) is essentially this, though I haven't looked at it in enough detail.
But I also want the Pi to be able to interrupt the host when it has data to push back. I don't want it to be a pure slave that's polled. The host will "drive" the conversation (i.e. the host drives the SPI master clock), but the Pi should be able to let the host know that data is ready to be read.
The beauty of something SPI-ish, with a handshake on top, is that it's robust in timing: neither side has to meet a deadline. As I understand it, the Pi is not a particularly good citizen for "real time". It likes to go off on pauses and such on a whim, so I'm told, so a handshake-driven protocol lets the Pi head out to lunch without breaking anything at the wrong time.
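Here is a minimal sketch of the kind of handshake meant here, modeled in Python: a generic 4-phase strobe/acknowledge exchange, not any particular standard's signalling. SlowPeripheral and its stalls list are hypothetical, standing in for a Pi that pauses at random; the host simply polls until each step is acknowledged, so a long pause only slows the transfer down instead of corrupting it.

```python
class SlowPeripheral:
    """Hypothetical model of the Pi side of a 4-phase strobe/ack handshake.

    `stalls` gives, per byte, how many host polls the Pi ignores the bus
    after acknowledging, mimicking the OS wandering off mid-transfer.
    """

    def __init__(self, stalls=()):
        self.stalls = list(stalls)
        self.wait = 0        # polls left before the Pi looks at the bus again
        self.strobe = 0      # driven by the host
        self.data = 0        # 8 data lines, driven by the host
        self.ack = 0         # driven by the Pi
        self.received = []

    def tick(self):
        """Advance the Pi by one host poll."""
        if self.wait > 0:
            self.wait -= 1                      # out to lunch: bus state frozen
        elif self.strobe and not self.ack:
            self.received.append(self.data)     # latch the byte
            self.ack = 1
            self.wait = self.stalls.pop(0) if self.stalls else 0
        elif not self.strobe and self.ack:
            self.ack = 0                        # release ACK: ready for the next byte


def host_send(pi, payload):
    """Host side: send bytes with no timing assumptions; returns polls spent."""
    polls = 0
    for byte in payload:
        pi.data = byte
        pi.strobe = 1                 # phase 1: data valid, raise STROBE
        while not pi.ack:             # phase 2: wait for the Pi to latch it
            pi.tick()
            polls += 1
        pi.strobe = 0                 # phase 3: drop STROBE
        while pi.ack:                 # phase 4: wait for the Pi to drop ACK
            pi.tick()
            polls += 1
    return polls
```

A 500-poll stall in the middle of a transfer just shows up as extra polls; the bytes still arrive intact. Reads from the Pi would work the same way with the Pi driving the data lines: for the interrupt idea above, the Pi could raise a (hypothetical) DATA_READY line wired to the host's interrupt input, and the host's handler would then strobe for each byte, so the host still drives the conversation.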
I'll consider the project a success when the host can set up two "async" channels, one reading a file and the other downloading the home page from Yahoo, and have two blocks of data delivered "simultaneously", one from each.
Hence my curiosity about 8-bit buses and protocols. I'm pretty sure this problem has been thought about before by folks smarter than me.
SCSI is one of those solutions. EPP, the Enhanced Parallel Port (IEEE 1284), is another. IEEE-488 GPIB is another. FT1248 (what a coincidental number, give or take a swapped digit...) is obviously a modern, up-to-date protocol. I don't need several devices or daisy chaining; I just need the one, so that simplifies things. I'm not going to have 12-foot cables, so I don't have any real electrical considerations to worry about.
IEEE 1284.4 adds a wire protocol on top of that, with logical channels and such to handle multiple data streams over the one connection.
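To show what "channels" buys you, here is a minimal Python sketch of multiplexing over one link. It is not the IEEE 1284.4 packet format: the frame layout (1-byte channel ID, 1-byte length, then the payload) and the channel numbers are hypothetical, chosen only to show two interleaved streams, like the file read and the web page download, being pulled apart again on the receiving side.

```python
def encode_frame(channel, payload):
    """Wrap one block of data for one logical channel (hypothetical format)."""
    if not 0 <= channel <= 0xFF:
        raise ValueError("channel id must fit in one byte")
    if len(payload) > 0xFF:
        raise ValueError("payload must be 255 bytes or fewer")
    return bytes([channel, len(payload)]) + bytes(payload)


def demux(stream):
    """Split a byte stream of frames into per-channel buffers.

    Returns (channels, consumed): a dict of channel id -> bytearray, and how
    many bytes of `stream` were used. A trailing partial frame is left alone.
    """
    channels = {}
    pos = 0
    while pos + 2 <= len(stream):
        channel, length = stream[pos], stream[pos + 1]
        end = pos + 2 + length
        if end > len(stream):
            break                      # frame not complete yet
        channels.setdefault(channel, bytearray()).extend(stream[pos + 2:end])
        pos = end
    return channels, pos


# Two streams interleaved on the same wire: a file read and a web page.
FILE_CH, NET_CH = 1, 2   # hypothetical channel numbers
wire = (encode_frame(FILE_CH, b"READ")
        + encode_frame(NET_CH, b"<html>")
        + encode_frame(FILE_CH, b"ME.TXT")
        + encode_frame(NET_CH, b"</html>"))
channels, consumed = demux(wire)
```

Because demux reports how many bytes it consumed, a caller can keep the unconsumed tail (a frame that hasn't fully arrived yet) and retry when more bytes come in. A real protocol would also need flow control and error handling per channel.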
The dark side, of course, is that the Pi is a behemoth (this is good and bad). Out of the box, you have to boot a Linux kernel, etc. etc., just to bring the device online. But short term, for bang for the buck, it can work out. I figure the Pi side can be written naively in, well, anything: C, Python, whatever. The Pi has the performance to be "faster than the host" even with sloppy code, which is great for prototyping. Long term, someone could make it bare metal and skip Linux entirely.
I have not looked at other controllers (like the Arduino you mentioned). The Pi is meant more as an example than a prescription.
But in the end, you get a CPU (Z80, 6502, 65816, etc.) that connects to an I/O processor with, ideally, little more than two 8-bit ports (and perhaps some level shifters), which opens up the world of modern peripherals: USB storage, networking, printers, etc. Which, arguably, is what the SCSI chip did back in the day: SCSI disk drives, SCSI scanners, SCSI plotters, SCSI network interfaces.
So, that's where my head is at. And why I'm curious how other people address it.