You have to deal with the fact that reading vs. writing the SRAM involves an entirely different set of data and address signals, along with the fact that you likely can't map the whole SRAM into I/O address space.
Yeah, it's definitely a hassle that you have to multiplex both the address and data lines of the mapping RAM depending on whether it's doing its normal job of address translation or being written. (Or read; reading could technically be optional, but being able to read back its state could save you from having to maintain a potentially very large state table in the driver. Maybe a clever fix there would be to have the driver use a page or two of the EMS RAM to mirror the state table?) My sketch of the design basically solves the I/O space problem with latches, i.e., instead of trying to give each of the 64 page locations covering the whole 1MB conventional memory space its own I/O address, you write the address of the page you want to update to a latch in one operation and write the page value you want to deposit there in a second operation.
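The two-step update protocol is easy to model in software. This is a minimal C sketch that assumes hypothetical port numbers `PORT_PAGE_SELECT` and `PORT_PAGE_VALUE` (the post doesn't assign any) and models only a single register set. `board_out()` is a stand-in for an OUT instruction reaching the board's decoder; a real driver would issue actual port writes instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical I/O ports; the design doesn't fix any addresses. */
#define PORT_PAGE_SELECT 0x208u  /* write: which 16K slot (0-63) to update */
#define PORT_PAGE_VALUE  0x20Au  /* write: EMS page number for that slot */

#define NUM_SLOTS 64u            /* 1MB / 16K */

/* Software model of the board: the page address latch plus one register set. */
struct ems_board {
    uint8_t  page_select_latch;  /* 6-bit slot index held between the two writes */
    uint16_t map[NUM_SLOTS];     /* slot -> EMS page number */
};

/* Stand-in for the CPU's OUT instruction hitting the board's port decoder. */
static void board_out(struct ems_board *b, uint16_t port, uint16_t value)
{
    switch (port) {
    case PORT_PAGE_SELECT:
        /* First write: only six bits matter, the same width as A14-A19. */
        b->page_select_latch = (uint8_t)(value & 0x3Fu);
        break;
    case PORT_PAGE_VALUE:
        /* Second write: the latched slot addresses the SRAM, the data bus supplies the value. */
        b->map[b->page_select_latch] = value & 0x7FFu;  /* 11-bit page number */
        break;
    default:
        break;  /* not one of our ports */
    }
}

/* Driver side: put EMS page `ems_page` into 16K slot `slot`. */
static void map_page(struct ems_board *b, unsigned slot, unsigned ems_page)
{
    assert(slot < NUM_SLOTS);
    assert(ems_page < 2048u);
    board_out(b, PORT_PAGE_SELECT, (uint16_t)slot);     /* operation 1: latch the slot */
    board_out(b, PORT_PAGE_VALUE, (uint16_t)ems_page);  /* operation 2: deposit the value */
}
```

For example, mapping EMS page 1234 at segment D000h is `map_page(&b, 0x34, 1234)`, since 0xD0000 >> 14 = 0x34. Only two I/O addresses are consumed no matter how many slots or register sets the hardware supports.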
If we imagine a version of this maxed out to the EMS 4.0 limits, i.e., able to map a 16K page *anywhere* in the bottom 1MB of address space, with 64 alternate register sets and access to 32MB of expanded RAM, this is roughly what I picture as the minimum number of moving parts. It comes to seven or eight discrete packages, assuming standard TTL packages for the buffers/latches and a GAL for the control logic. I would *guess* you could fit the whole thing in a mid-size CPLD. Here's the diagram:
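The bus widths in this design follow directly from those limits: 1MB / 16K gives 64 slots (6 bits), 32MB / 16K gives 2048 pages (11 bits), and 64 register sets add another 6 bits of SRAM address. This short C sketch just spells out that arithmetic; the `bits_for()` helper is illustrative, not part of the design.

```c
#include <assert.h>

/* Parameters of the "maxed-out EMS 4.0" board. */
#define PAGE_SIZE      (16UL * 1024)   /* one 16K EMS page */
#define CONV_SPACE     (1UL << 20)     /* 1MB real-mode address space */
#define EMS_RAM_SIZE   (32UL << 20)    /* 32MB of expanded RAM */
#define REGISTER_SETS  64UL            /* alternate register sets */

#define PAGE_SLOTS     (CONV_SPACE / PAGE_SIZE)    /* 64 possible page frame slots */
#define EMS_PAGES      (EMS_RAM_SIZE / PAGE_SIZE)  /* 2048 physical EMS pages */

/* Address bits needed to select one of n items (n must be a power of two). */
static unsigned bits_for(unsigned long n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned bits = 0;
    while ((1UL << bits) < n)
        bits++;
    return bits;
}
```

Evaluating `bits_for()` on each constant gives 6 bits for the slot (A14-A19), 11 bits of SRAM data for the page number, 12 bits of SRAM address (set plus slot), and a 25-bit EMS RAM address bus.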
This is a 20-minute diagram I just slapped together because I'm not near my KiCad model. To clarify a bit: the "page address latch" holds which of the 64 possible "page frame" slots in the 1MB conventional address space you want to update, and its output is multiplexed with the CPU's A14-A19 address lines. Both share a connection to the SRAM's address inputs, and which one is output-enabled switches whenever an IN or OUT is performed at the separate update register port address.

On the other side of the chip, the SRAM's data lines normally drive the top 11 address lines of the 25-bit (32MB) address space of the EMS RAM block; the CPU's A0-A13 supply the other 14 bits, the offset within the 16K page. During a register update, both the SRAM's data lines and the EMS RAM's address lines will be driven by the CPU data write buffer... which should be fine because, well, the CPU is in the middle of a port write, so it shouldn't matter if the EMS address lines are in flux during the operation. I think reads should be fine too. This schematic *is* assuming we can do 16-bit transfers; if we did 8 bits at a time, things get a little more complicated (the 11-bit page value no longer fits in a single write), but not too much.
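In its normal role, the mapping SRAM is just a table lookup followed by bit concatenation. The following minimal C model assumes the SRAM address is formed as {register set, A14-A19} with the set in the high bits (the diagram description doesn't say how the latches and address lines are ordered on the SRAM pins) and leaves out the decode that decides which slots are EMS-backed at all.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS    6u      /* A14-A19 select one of 64 16K slots */
#define SET_BITS     6u      /* 64 register sets */
#define OFFSET_BITS  14u     /* A0-A13: offset within a 16K page */
#define PAGE_MASK    0x7FFu  /* 11 SRAM data bits -> top 11 EMS address lines */

/* Mapping SRAM: 4K words, indexed by {register set, slot}. */
static uint16_t map_sram[1u << (SET_BITS + SLOT_BITS)];

/* Normal (non-update) cycle: 20-bit CPU address in, 25-bit EMS RAM address out. */
static uint32_t translate(unsigned active_set, uint32_t cpu_addr)
{
    assert(active_set < (1u << SET_BITS));
    assert(cpu_addr < (1ul << 20));
    unsigned slot      = (unsigned)(cpu_addr >> OFFSET_BITS);    /* A14-A19 */
    unsigned sram_addr = (active_set << SLOT_BITS) | slot;       /* SRAM address pins */
    uint32_t page      = map_sram[sram_addr] & PAGE_MASK;        /* SRAM data pins */
    uint32_t offset    = cpu_addr & ((1ul << OFFSET_BITS) - 1);  /* A0-A13 pass straight through */
    return (page << OFFSET_BITS) | offset;
}
```

Only the top 11 bits of the result come out of the SRAM; the low 14 bits are wired straight from the CPU, which is why an update cycle can drive the EMS address lines with garbage without harm.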
Note that the reason I showed *two* register set latches is that you might need to modify registers in a set other than the one you're currently running out of. I haven't dug into the EMS 4.0 standard in enough detail to know whether that's an absolute necessity, and it might be something you could paper over in the driver, but it seems reasonable.
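One way to picture the two set latches: one selects the set the translation path reads, the other selects the set that update writes land in. This C sketch uses hypothetical names `active_set` and `edit_set` and models only the latch roles, not the bus multiplexing.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SETS  64u
#define NUM_SLOTS 64u

struct map_ram {
    uint8_t  active_set;                 /* latch 1: set driving address translation */
    uint8_t  edit_set;                   /* latch 2: set that port writes are steered into */
    uint16_t entry[NUM_SETS][NUM_SLOTS];
};

/* Update path: SRAM address = {edit-set latch, page address latch}. */
static void write_entry(struct map_ram *m, unsigned slot, uint16_t page)
{
    assert(m->edit_set < NUM_SETS && slot < NUM_SLOTS);
    m->entry[m->edit_set][slot] = page & 0x7FFu;
}

/* Translation path: SRAM address = {active-set latch, CPU A14-A19}. */
static uint16_t lookup(const struct map_ram *m, unsigned slot)
{
    assert(m->active_set < NUM_SETS && slot < NUM_SLOTS);
    return m->entry[m->active_set][slot];
}
```

With separate latches, a driver could fill in an entire inactive set without disturbing the mappings the running program sees, then switch to it with a single latch write.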
We're only actually using 4K 16-bit words with this thing (6 bits of register set plus 6 bits of page slot gives 12 address bits), which might seem like a bit of a waste when the RAM chips I mentioned could hold 64Kx16, but, well, they're cheap. We're also only using 11 of the 16 data bits on the output side; I don't know if there are any useful tricks you could do with the other five bits, like implementing, I dunno, read/write protection flags or something? But the capability to play with such ideas is there.
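To make the spare-bits idea concrete, here is one hypothetical way to pack the 16-bit SRAM word. The flag names and bit positions are invented for illustration; the design only defines bits 0-10, and the control GAL would need those extra SRAM outputs wired into it before any flag could actually be enforced.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_WORDS_USED    (64u * 64u)  /* register sets x page slots = 4K words */

/* Hypothetical layout of one mapping-SRAM word. */
#define ENTRY_PAGE_MASK   0x07FFu      /* bits 0-10: EMS page number (the only defined field) */
#define ENTRY_FLAG_WP     (1u << 11)   /* hypothetical: write-protect this slot */
#define ENTRY_FLAG_VALID  (1u << 12)   /* hypothetical: slot is mapped at all */
/* bits 13-15: still unused */

static uint16_t make_entry(unsigned page, unsigned flags)
{
    assert(page <= ENTRY_PAGE_MASK);
    assert((flags & ENTRY_PAGE_MASK) == 0);  /* flags must stay out of the page field */
    return (uint16_t)(page | flags);
}

static unsigned entry_page(uint16_t e)            { return e & ENTRY_PAGE_MASK; }
static int      entry_write_protected(uint16_t e) { return (e & ENTRY_FLAG_WP) != 0; }
static int      entry_valid(uint16_t e)           { return (e & ENTRY_FLAG_VALID) != 0; }
```

Because the page field is masked separately, any flag scheme along these lines leaves the 11-bit translation path unchanged.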