Trixter
Veteran Member
I’ve got an interesting design problem, and was hoping those familiar with x86 assembler and PC architecture could help me work out a few things; I apologize for the lengthy background, but it’s necessary to understand the problem I’m wrestling with:
Background: One problem that has cropped up recently in this hobby is a means to properly identify and benchmark systems, not only “real” ones (i.e. true IBM PC/XT, AT, etc.) but, even more so, unmarked clones. There is also a need for something like this among emulator writers, so that they can attempt to get their code cycle-exact, and also for regular people who, for example, just want to play games at the right speed in DOSBox.
Everyone is familiar with the old Norton SI and Landmark CPU Speed benchmarks, but they are horribly misleading and their test suites are generally incomplete. Other benchmarks, such as C&T’s MIPS.COM, are much better, but they aren’t realtime (the test takes 30 seconds) and offer only three machine classes to compare against. So, I have volunteered to write a benchmark that would meet the above needs. The goals would be relatively simple:
- Take a performance measurement of a machine and store it locally in a tiny database that accompanies the program
- Allow comparison of the current machine’s metric to the database, and bring up close matches for comparison
- Perform the measurement/comparison continuously, so that running it inside an emulator would allow you to immediately see the results of tuning the emulator speed. (For example, this would allow people to “dial” the speed of the emulator to match a target machine.)
Now the problems:
I’m having trouble coming up with a decent metric and/or way of profiling a machine that not only works on ANY PC (i.e. even a PC/XT, where there is no RTC or RDTSC available, only the 8253) but also works as high up as, say, a Pentium at 166 MHz (but not much higher, as there is no target audience for this benchmark above that platform).
The basic idea I had was to run through every single 808x-compatible instruction (except POP CS, which would hang a 286 or later, and AAD/AAM with a custom divisor, because those hang the NEC V20/V30) and time each one; then perform some memory moves/fills in system RAM; then do the same to video adapter RAM; and then print out the closest matches in the database for all three measurements. Optionally, also output some sort of combined score (like a “fingerprint” for the machine) so that one generic clone can be compared to other generic clones and/or to known machine performance profiles.
I was planning to use the 8253 at full resolution to perform the timing, using Abrash’s Zen timer code, which I am very familiar with. The problem with this method, as far as I can tell, is that once I hit the 486 and later, L1 caching becomes a problem. Not because caching is an “unfair” speed boost (if anything, I definitely WANT caching to affect speed as a true test of how fast a system is), but because of how small the test suite is: it would fit entirely in cache and, coupled with pipelining on the Pentium and later, would execute faster than the 1.19 MHz 8253 would be able to detect! I.e. the entire test suite could execute in a single tick of the 8253.
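For reference, the part of the tick arithmetic that matters here is simple. This C sketch omits the actual hardware access (latching via port 43h and reading port 40h twice) and only shows the counter math; the function names are mine:

```c
#include <stdint.h>

/* PIT input clock on every PC-compatible: 1.193182 MHz,
   i.e. roughly 838 ns per tick. */
#define PIT_HZ 1193182UL

/* The 8253 counter counts DOWN and wraps at 16 bits, so the elapsed
   tick count between two latched readings (assuming at most one
   wraparound) falls out of unsigned subtraction. */
uint16_t elapsed_ticks(uint16_t start, uint16_t end)
{
    return (uint16_t)(start - end);
}

/* Convert ticks to microseconds (integer math, truncates). */
uint32_t ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000UL) / PIT_HZ);
}
```

The 838 ns granularity is exactly why a single pass that fits in cache becomes unmeasurable: anything under one tick reads as zero.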
Questions:
- Is this a reasonable fear, or am I overestimating how much pipelining and cache will speed things up? (Remember, there is no target audience for this benchmark beyond a Pentium)
- Should I look into some sort of alternate timing method, such as running the test suite multiple times in a certain time period? If so, what would a reasonable time boundary be? (No more than a full second, I hope... Remember, one of the primary goals of the benchmark is to run “realtime”, so that adjusting an emulator, or popping a “turbo” button on/off, would be immediately noticeable; I'm also worried about having interrupts turned off for a long period of time.)
- Has this problem already been solved by a benchmark utility I am not yet aware of?
- Is cutting the benchmark off at the Pentium reasonable, or will people be strangely compelled to benchmark their quad-core Xeon against an IBM PC/XT? (i.e. should I even worry about machines above the Pentium?)
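If I do go the “run it many times in a time period” route from the second question above, the arithmetic would look roughly like this (C sketch, no hardware access; the quarter-second window is just an example, not a decision):

```c
#include <stdint.h>

/* PIT input clock: 1.193182 MHz. */
#define PIT_HZ 1193182UL

/* If one pass of the suite is too fast to measure, time `runs` passes
   back to back and average.  Reporting nanoseconds per pass keeps
   sub-tick costs (one PIT tick is ~838 ns) visible. */
uint32_t ns_per_pass(uint32_t total_ticks, uint32_t runs)
{
    return (uint32_t)(((uint64_t)total_ticks * 1000000000ULL)
                      / ((uint64_t)PIT_HZ * runs));
}

/* How many PIT ticks fit in the chosen measurement window; e.g. a
   250 ms window would still feel "realtime" on screen. */
uint32_t window_ticks(uint32_t window_ms)
{
    return (uint32_t)(((uint64_t)PIT_HZ * window_ms) / 1000);
}
```

The measurement loop itself would just repeat the suite until `window_ticks()` worth of ticks has elapsed, count the passes, and hand both numbers to `ns_per_pass()`.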
Thanks for reading this far! Any and all thoughts regarding this are appreciated.