The search space can be reduced considerably by using bit patterns commonly used for memory testing: 55h, CCh, 00h, 01h, FEh, both by themselves and rotated 0-15 times as appropriate (i.e. there's no point in rotating 00/FF, or CC more than 3 times, etc.).
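To make the suggestion concrete, here is a minimal sketch of generating the rotated patterns while dropping the redundant rotations. The function names and the 8-bit default width are my own assumptions for illustration; nothing here is taken from an existing exerciser.

```python
def rotl(value, count, width=8):
    """Rotate `value` left by `count` bits within a `width`-bit word."""
    mask = (1 << width) - 1
    value &= mask
    count %= width
    return ((value << count) | (value >> (width - count))) & mask


def rotated_patterns(seeds, width=8):
    """Return every distinct rotation of the seeds, in first-seen order."""
    seen = []
    for seed in seeds:
        for count in range(width):
            pattern = rotl(seed, count, width)
            if pattern not in seen:  # skips rotations that repeat (e.g. 00h)
                seen.append(pattern)
    return seen


# Seed patterns quoted in the post; the width is an assumption.
SEEDS = [0x55, 0xCC, 0x00, 0x01, 0xFE]
PATTERNS = rotated_patterns(SEEDS)
```

At 8 bits the five seeds collapse to 23 distinct values: 55h yields two, CCh four, 00h one, and 01h and FEh eight each.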
The existing 8080/Z80 exercisers use a "test vector" (instruction bytes, register values including flags, a memory operand) and two additional vectors: one which is explored combinatorially, and one where every bit is flipped once. So a trade-off between coverage and execution time is possible. On the other hand, I think that covering the complete (relevant) search space is necessary to some extent: the 8080 exerciser was still able to uncover bugs even after all contemporary test programs were happy (the aux carry flag is hard to get right).
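A rough sketch of how such vectors expand into test cases, with the whole test vector modelled as one integer. The names are hypothetical, and running the two vectors one after the other is my simplification; the real exercisers may combine them differently.

```python
def subsets(mask):
    """Yield every combination of the bits set in `mask`, starting with 0."""
    sub = 0
    while True:
        yield sub
        sub = (sub - mask) & mask  # standard trick: next sub-mask of `mask`
        if sub == 0:
            break


def single_bits(mask):
    """Yield each set bit of `mask` on its own, lowest first."""
    bit = 1
    while bit <= mask:
        if mask & bit:
            yield bit
        bit <<= 1


def generate_cases(base, inc_mask, shift_mask):
    """Expand a base vector: all combinations of inc_mask, then one flip per shift_mask bit."""
    for delta in subsets(inc_mask):
        yield base ^ delta
    for bit in single_bits(shift_mask):
        yield base ^ bit
```

With k bits in the combinatorial vector and m bits in the flip-once vector this yields 2^k + m cases, which is where the coverage versus run time trade-off comes from.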
If you did this, you could test every opcode, not just the ALU.
I do not aim for complete opcode coverage. Some instructions must work for the exerciser to run at all (e.g. JMP, Jxx), others depend on the environment (e.g. I/O, INT/IRET) or on behaviour beyond the application (e.g. TRAP flag). Also, the existing exerciser structure does not work well for most addressing modes.
Practically, you would either use a statistical testing method or break the problem down into smaller 'blocks' assuming that the blocks didn't interact too much (if at all).
I have looked at a few emulator designs, but am by no means smart. My approach is strongly table-driven, trying to unify as much of the instruction behaviour as possible. Other approaches treat each opcode separately, implementing similar behaviour in many places (either manually or through macros). Then, there are JIT approaches, which generate code at runtime, possibly fusing instruction groups. All of these cases will result in different degrees of interaction between blocks.
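To illustrate what "table-driven" can mean, here is a toy sketch (not my actual core) of the eight classic ALU operations sharing one code path. It relies on the x86 encoding where, for opcodes 00h-3Fh with the low three bits in the range 0-5, bits 5..3 select the operation. The table layout and function names are made up for the example, and only the 8-bit result and carry are modelled.

```python
# One entry per operation, indexed by bits 5..3 of the opcode.
ALU_TABLE = [
    ("ADD", lambda a, b, c: a + b),
    ("OR",  lambda a, b, c: a | b),
    ("ADC", lambda a, b, c: a + b + c),
    ("SBB", lambda a, b, c: a - b - c),
    ("AND", lambda a, b, c: a & b),
    ("SUB", lambda a, b, c: a - b),
    ("XOR", lambda a, b, c: a ^ b),
    ("CMP", lambda a, b, c: a - b),  # same arithmetic as SUB; result is not stored
]


def alu_op_index(opcode):
    """Extract the operation selector (bits 5..3) from an ALU opcode."""
    return (opcode >> 3) & 7


def alu8(opcode, a, b, carry_in=0):
    """Run one 8-bit ALU operation; returns (mnemonic, result, carry_out)."""
    name, func = ALU_TABLE[alu_op_index(opcode)]
    raw = func(a & 0xFF, b & 0xFF, carry_in & 1)
    result = raw & 0xFF
    carry = (raw >> 8) & 1  # bit 8 is the carry for adds and the borrow for subtracts
    return name, result, carry
```

A bug in the shared flag logic then shows up in all eight operations at once, whereas per-opcode implementations can each be wrong in their own way.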
It would be good for someone to come up with an x86 validation test I must admit...
I don't think that is feasible. The x86 space is far more varied than the 8080/Z80 space is, and behaviour varies a lot between vendors and implementations (e.g. CPUID or RDTSC). Specifically, I only care about a subset of 16-bit real mode, and I am trying to ignore implementation differences.
Is the goal of the 8086 exerciser to find bugs in emulation?
Partly. The main goal is aiding the implementation of an emulator. I wouldn't be surprised to uncover bugs in existing emulators, either. My main project requires an x86-compatible CPU core to run a few DOS applications in a restricted environment. I do not plan on being PC-compatible (not more than necessary) or to even support evil software tricks - there are other projects better suited (PCem comes to mind). Basically, some form of DOSBox tailored to my specific needs.
If so, then you'd need more than an ALU opcode tester. For example:
REP with segment overrides doesn't resume properly after an interrupt
POP CS is possible on 8086
Interrupt behavior after MOV SS vs. MOV any_segment_register
Behavior of 8D C2 ("LEA AX, DX") (See Raúl Gutiérrez Sanz's comments here:
http://www.os2museum.com/wp/undocumented-8086-opcodes/comment-page-1/#comments )
...etc.
I agree, but none of these behaviours are easily testable within the existing exerciser framework. Also, they test for very specific behaviour of the 8086, which is only useful if you want to accurately emulate that very specific processor including all of its quirks. Software written to run on x86 (rather than 8086 or "IBM 5150") should not rely on these behaviours at all. Also, I will exclude any attempt at testing timing behaviours or limitations with self-modifying code (outside of the exerciser itself, that is).
I suspect that you're not interested in a stress test, as you're working with an emulator.
No, although I will need to be able to run the exerciser on a real machine in order to get "known good" results. I won't be able to test on a true 8086 either - only a single 80186 and 80486SX each (possibly an 80486SL if I get the machine to work).
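The comparison against "known good" results could look roughly like this, assuming each test's resulting machine state is dumped as bytes and folded into a running CRC-32. That is one common way to do it; the state layout and the function names are placeholders of mine.

```python
import zlib


def run_checksum(states):
    """Fold a sequence of machine-state dumps (bytes) into one CRC-32."""
    crc = 0
    for state in states:
        crc = zlib.crc32(bytes(state), crc)  # continue the running CRC
    return crc


def matches_reference(states, known_good_crc):
    """True if the emulator's results fold to the CRC recorded on real hardware."""
    return run_checksum(states) == known_good_crc


# Hypothetical usage: the reference CRC comes from a run on the real machine.
reference = run_checksum([b"\x12\x34\x00\x02", b"\xff\xff\x00\x46"])
ok = matches_reference([b"\x12\x34\x00\x02", b"\xff\xff\x00\x46"], reference)
```

Only one 32-bit value per test group has to be recorded on the real machine, at the cost of not knowing which individual case failed.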
How about Sandsifter for exposing non-documented features? Run it on a "real" 8086, then run it on your emulator. Compare results.
The approach taken by Sandsifter relies on invalid instructions and page faults, neither of which exist on 8086/80186. Again, this is interesting and useful work if one wants to recreate a "true" CPU, but this is not my goal.
The more interesting question is "can one emulate an 8086 not only with respect to instructions, but also with respect to real-world timing?"
I don't see any reason why this shouldn't be possible. Technically, it should be feasible by now to simulate the whole PC platform, starting from the 14.3 MHz main crystal.
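For reference, the arithmetic behind "starting from the main crystal" as a small sketch. The dividers are those of the original IBM PC (crystal / 3 for the CPU clock, crystal / 12 for the timer); other machines, including anything with an 8 MHz 8086, are clocked differently. The constant and function names are illustrative only.

```python
CRYSTAL_HZ = 14_318_180        # nominal 14.31818 MHz main crystal
CPU_HZ = CRYSTAL_HZ / 3        # about 4.77 MHz CPU clock on the original PC
PIT_HZ = CRYSTAL_HZ / 12       # about 1.19 MHz timer input clock


def crystal_ticks(cpu_cycles):
    """Crystal ticks that elapse during the given number of CPU cycles."""
    return cpu_cycles * 3


def cycles_to_seconds(cycles, clock_hz=CPU_HZ):
    """Wall-clock time a cycle count corresponds to at the given clock."""
    return cycles / clock_hz
```

Counting everything in crystal ticks keeps the CPU and timer on one common integer time base.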
In other words, can your emulated 8086 run at precisely the same speed as an 8 MHz 8086?
Nope, not even trying. The emulator is far from done (hence this side-project), but I already know that the Trap Flag does not behave correctly. Also, any undefined or FPU instruction (including POP CS) will instantly kill the core, and additional inaccuracies will definitely appear as well.
My 8080 core was originally written in AVR assembly. That version fits into 3 KB of flash memory and should run about as fast as a 2 MHz 8080 (which I cannot verify). If I can squeeze an 8086/80186 core into the same size region, it could be used to run an original VGA BIOS...