Looking for volunteers to help test a new benchmark

Full tool alpha is ready for testing!

Hey, I actually got close to finishing the full tool! Everything works (with one exception, see below) and you can pick it up and play with it here: ftp://ftp.oldskool.org/pub/TOPBENCH/topb_v03.zip
The full tool (requires 384K RAM) has the following benefits over the stub:

  • Menu-driven interface
  • Can view, add, edit, and delete systems
  • Can export the database to a .CSV file for use with spreadsheets
  • Compare any two systems to see how they differ in terms of speed
  • Continuously benchmark your machine or emulator (makes it easy to dial a specific speed in DOSBOX by hitting CTRL-F11/F12 while realtime benchmarking is running)

The "exception" is that there is no mean averaging during the realtime benchmarking part, so you might find the following behavior:
  • You add your system to the database, but the realtime benchmark part matches some other system that is very close to yours
  • During the realtime benchmarking part, the matched system jumps between two systems every refresh

The above problems are due to interrupt jitter, which is both intentional and anticipated -- I just need to bone up on statistics so I can perform some smoothing. (Anyone know a good online Statistics 101 course?)
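For readers wondering what "some smoothing" could look like, here is a minimal sketch in Python. It is not TOPBENCH's code (the post does not say which method was eventually used). It assumes each refresh produces one numeric timing, and it reports the median of the last few timings so that a single jittery reading cannot flip the matched system. The names `JitterSmoother` and `smooth` and the window size are hypothetical.

```python
from collections import deque
from statistics import median


class JitterSmoother:
    """Keeps the most recent `window` readings and reports their median."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest reading automatically
        self.samples = deque(maxlen=window)

    def add(self, reading):
        """Record one new reading and return the smoothed value."""
        self.samples.append(reading)
        return median(self.samples)


def smooth(readings, window=5):
    """Run a whole list of readings through a fresh smoother."""
    smoother = JitterSmoother(window)
    return [smoother.add(r) for r in readings]


if __name__ == "__main__":
    # Made-up timings: one refresh is disturbed by an interrupt.
    print(smooth([100, 100, 160, 100, 100], window=3))
```

A median is used here rather than a mean because one outlier shifts a mean but usually leaves a median untouched; a running mean would be the closer match to the "mean averaging" mentioned in the post.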
 
Continuously benchmark your machine or emulator (makes it easy to dial a specific speed in DOSBOX by hitting CTRL-F11/F12 while realtime benchmarking is running)

here are some results after trying to use this in DOSBox. Doubt they would be very relevant in any way but I had to try. :)

in DOSBox, I typically set the cycles to 278 when I want the speed of a PC/XT @4.77MHz. I don't remember where I got that number from - probably someone else's approximation, but it "feels" right for the majority of games, maybe even a tad on the slow side.
in TOPBENCH this setting gets a slightly higher score than most basic PC/XT systems: it falls somewhere between the IBM PC Convertible and the IBM PCjr with a NEC V20. Funnily enough, changing the machine type from svga_s3 to CGA actually yields faster results, enough to change the score from 5 to 6.

Gg7r3.png
 
in DOSBox, I typically set the cycles to 278 when I want the speed of a PC/XT @4.77MHz. I don't remember where I got that number from - probably someone else's approximation, but it "feels" right for the majority of games, maybe even a tad on the slow side.
in TOPBENCH this setting gets a slightly higher score than most basic PC/XT systems: it falls somewhere between the IBM PC Convertible and the IBM PCjr with a NEC V20.

DOSBox is definitely a "DOS gaming machine" emulator rather than a true x86 emu, because in DOSBox, every instruction runs internally in 1 cycle. If you set cycles=fixed 244 and then run MIPS.COM you can see that, while the average spread is 1.0x the speed of a PC, those numbers are way fast in some areas and slower in others. I was very cognizant of this when designing the metrics and overall design of TOPBENCH.

So, what does your experience tell you? If you believe my metrics and research, it should tell you that 278 is slightly faster than a real PC :) and that 200 is a more realistic value. Most people who run DOSBox run it too "hot", which isn't representative of how the real machines react. Does that mean you should run it at a slower speed just because it is more accurate? No, you should run it at the speed where the game you're trying to play is the most fun. TOPBENCH just helps you choose that spot, if you know what the game's intended requirements were.

I know the realtime benchmarking is jittery sometimes -- I'm still working on that and should have an update soon.

Funnily enough, changing the machine type from svga_s3 to CGA actually yields faster results, enough to change the score from 5 to 6.

That is both interesting and quite unexpected! I should look at the dosbox source sometime to see if I can figure out why.
 
Admittedly, 278 is only what seems to feel right for me in most games. But given DOSBox's idiosyncrasies, isn't it possible that typical game behavior (whatever that might be) mostly targets those areas in which DOSBox's numbers are lower? For example, at 200 cycles "MemTest" and "MemEA" are very close to typical IBM PC 5150 results, but "Opcodes" and "3DGames" are a good bit slower; ~278 cycles gives a better match for the latter two.

I have to stress the "most" in "most games" - I've encountered a few odd ducks that need a value of 100 or even less, or they're way too fast to be playable. Though it seems like DOSBox's timer emulation is getting a complete rewrite some time soon, so any weird stuff like that can be conveniently ignored for now...

That is both interesting and quite unexpected! I should look at the dosbox source sometime to see if I can figure out why.

Video is the only part of DOSBox I've ever really messed with internally (e.g. that composite CGA patch over at vogons a few months ago), but evidently not enough as I have no idea why that happens... interestingly, whatever the cause may be, during gameplay I don't really feel much of a speed difference between cga and svga_s3, if at all.
 
Admittedly, 278 is only what seems to feel right for me in most games. But given DOSBox's idiosyncrasies, isn't it possible that typical game behavior (whatever that might be) mostly targets those areas in which DOSBox's numbers are lower? For example, at 200 cycles "MemTest" and "MemEA" are very close to typical IBM PC 5150 results, but "Opcodes" and "3DGames" are a good bit slower; ~278 cycles gives a better match for the latter two.

But at 278, other areas are ~1.5x faster. So that's why you shouldn't really look at the individual metric timings and should instead concentrate on the Score, which takes everything (CPU instruction mixes, video memory write speed, memory read/write speed) into account. Any machine or emulator, no matter how wonky or lopsided, should "feel" more or less like a 5150 as long as the Score is 4. At 278 cycles, the Score is 6, which puts it in the 4.77MHz NEC V20 or 7.16MHz 8088 range. (Note that this is probably fine for 95% of the games out there, although I can think of a few where this would be too fast to be playable, like Pinball Construction Set.)

I have to stress the "most" in "most games" - I've encountered a few odd ducks that need a value of 100 or even less, or they're way too fast to be playable. Though it seems like DOSBox's timer emulation is getting a complete rewrite some time soon, so any weird stuff like that can be conveniently ignored for now...

Yeah, I noticed that at low cycle rates (i.e. under 500) the timer emulation has quite a bit of jitter. Granted, I'm accessing the 8253 at a low level and reprogramming it ~20 times a second, but it's still odd that it "wobbles" like that (and more odd that it stabilizes somewhat if you hold down a key like CTRL). The 0.74 release has a bug that actually locks up in some conditions if HLT is encountered, which I use, so I had to get a fixed version from SVN. So it's good that they'll be revisiting that at some point.

If you can remember any of the games that run too fast and need cycles=100, let me know, I'd like to try them on the real hardware to see what's going on.
 
That reminds me of the old-fashioned practice of using code loops instead of calling the system timer.
If I place those instructions there, I gain another whopping 3% performance because I don't need to access the slow system timer. ;)

Call me terrible, but I am a cycle counter myself. ;)
 
At 278 cycles, the Score is 6, which puts it in the 4.77MHz NEC V20 or 7.16MHz 8088 range. (Note that this is probably fine for 95% of the games out there, although I can think of a few where this would be too fast to be playable, like Pinball Construction Set.)

My old 8088 machine actually was one of them "turbo" clones... so 7.16MHz might just explain why this speed feels right for me in DOSBox, though I could swear I had to play most games with turbo off on that computer. :)

Yeah, I noticed that at low cycle rates (i.e. under 500) the timer emulation has quite a bit of jitter. Granted, I'm accessing the 8253 at a low level and reprogramming it ~20 times a second, but it's still odd that it "wobbles" like that (and more odd that it stabilizes somewhat if you hold down a key like CTRL). The 0.74 release has a bug that actually locks up in some conditions if HLT is encountered, which I use, so I had to get a fixed version from SVN. So it's good that they'll be revisiting that at some point.

If you can remember any of the games that run too fast and need cycles=100, let me know, I'd like to try them on the real hardware to see what's going on.

Willy the Worm is one such game. The speed isn't even constant (it increases with the number of objects on screen, oddly enough), so there's probably another quirk at work here, but setting cycles to something around 80-100 makes it at least consistently playable.
Ancient DOS Games did an episode on Impact! aka Blockbuster, which mentions that cycles=100 are needed for the EGA version (even less for CGA). Though it also mentions that the game refuses to run at all on DOSBox > 0.72, and this one wasn't fixed in SVN so far, so there ya go.

I remember the vogons thread about the guilty HLT (I post there as VileRancour btw, which I now realize isn't immediately apparent.... you're at least consistent with avatars on both sites). :)
 
Willy the Worm is one such game. The speed isn't even constant (it increases with the number of objects on screen, oddly enough), so there's probably another quirk at work here, but setting cycles to something around 80-100 makes it at least consistently playable.
Ancient DOS Games did an episode on Impact! aka Blockbuster, which mentions that cycles=100 are needed for the EGA version (even less for CGA). Though it also mentions that the game refuses to run at all on DOSBox > 0.72, and this one wasn't fixed in SVN so far, so there ya go.

I just ran both games in DOSBox 0.74 SVN at cycles=245, and also on my 5160. They played perfectly fine in both cases. I tried to start Impact! at cycles=100 and it locks up. I love what Kris is doing for the community, but I question his methods sometimes.

BTW, I fixed all of TOPBENCH's issues (it has a real distance function now) so if it was jittery or otherwise odd, it should be fine now. New version (as well as source for the curious) is at http://dosbenchmark.wordpress.com/downloads/
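The post does not say what the "real distance function" is, so the following is only a sketch of the general idea: treat each system as a vector of metric timings and pick the database entry with the smallest distance to the machine being tested. The metric names are the ones mentioned earlier in the thread; the system names, the timing values, and the choice of a relative Euclidean distance are assumptions made for illustration.

```python
import math


def distance(sample, entry):
    """Relative Euclidean distance between two sets of metric timings.

    Each difference is divided by the database entry's value so that a
    metric with large timings does not dominate the others.
    """
    return math.sqrt(
        sum(((sample[name] - entry[name]) / entry[name]) ** 2 for name in entry)
    )


def closest(sample, database):
    """Return the name of the database system nearest to `sample`."""
    return min(database, key=lambda name: distance(sample, database[name]))


# Hypothetical entries; the numbers are invented, not real TOPBENCH results.
DATABASE = {
    "System A": {"MemTest": 3000, "MemEA": 2000, "Opcodes": 1700, "3DGames": 1400},
    "System B": {"MemTest": 1500, "MemEA": 1100, "Opcodes": 900, "3DGames": 700},
}

if __name__ == "__main__":
    measured = {"MemTest": 3050, "MemEA": 1980, "Opcodes": 1720, "3DGames": 1390}
    print(closest(measured, DATABASE))
```

With a distance like this, two database entries that are nearly identical will still trade places if the measurements wobble, which is why matching and smoothing tend to be solved together.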

This version is a feature freeze. If I can't break it in the next week, I'll release it upon the VOGONs and see what they think.
 
Odd; I just tried a clean SVN build of DOSBox dated August 12, and I get the same issues with both games as in plain 0.74 (crazy speed-ups in Willy, Impact freezes on startup regardless of cycles).
This is going a bit off topic, though... DOSBox quirks aside, I believe your benchmark is a far better judge than my subjective take on speed. :) Tested the latest TOPBENCH update now, and indeed there's little to no "jitter" in the realtime portion.
 
The Realtime Benchmark is curious, because changes to your system's speed are relative to the systems in the database. There should be an easy option to show how your scores change as your speed changes. Additionally there should be a quick system benchmark for your own system which can display on the DOS screen without the delays of identification and floppy rundown.

Still looking for Tandy benchmarks. Especially Tandy 1000 RL @ 4.77MHz, Tandy SL benchmarks (gave invalid results when previously posted), and any Tandy 1000 TX/TL, fast and slow, CPU upgrades, built-in video or EGA/VGA.
 
The Realtime Benchmark is curious, because changes to your system's speed are relative to the systems in the database. There should be an easy option to show how your scores change as your speed changes. Additionally there should be a quick system benchmark for your own system which can display on the DOS screen without the delays of identification and floppy rundown.

The realtime screen can definitely be altered to show what the current system looks like, so I'll work on that tonight. I'm not sure what you mean about the quick system benchmark -- you mean something that just spits out quick numbers? The value-add for the tool is the database, so I'm not sure what just spitting out the numbers would do without recording them somewhere, but I can add it if that's what you meant.

Still looking for Tandy benchmarks. Especially Tandy 1000 RL @ 4.77MHz, Tandy SL benchmarks (gave invalid results when previously posted), and any Tandy 1000 TX/TL, fast and slow, CPU upgrades, built-in video or EGA/VGA.

I will do my TX and TL/2 as soon as I can make room to drag them out; I did my EX last night.

As a warning, I have some rules about adding systems to the "official" database:

- Only real systems (no emulator numbers)
- Only systems at their intended speed (meaning, if the EX can boot at 4.77MHz or 7.16MHz, and the 7.16MHz mode is default on bootup, only the 7.16MHz score will be included)
- If a system was sped up via hardware (NEC V20/V30, inboard 386, etc.) then the addition must be in the system name. For example, the database has AT&T PC 6300 and AT&T PC 6300 (NEC V30). The CPU and speed are also listed in other fields, but having it be part of the system name is helpful to distinguish different iterations of seemingly the same machine.

The database is meant for realistic or historically-relevant scenarios, hence the above rules. I certainly encourage people to run TOPBENCH in any situation they want to; I just won't include impractical results in subsequent database releases.
 
Essentially the idea is to have the ability to perform a quick metric based on the changes made to the system for the user's personal benefit. Thus if I were to disable the cache or turn off the Turbo of my 486, I could see the effects in realtime or very quickly.

I appreciate having some selectivity in the database, but if we include methods like processor upgrades to speed up the system, why not make a small exception for official methods to slow down the system as well? Slowdowns are very important to users, so if the system has an official method for slowing down the clock speed, that measurement should be equally valid.
 
Essentially the idea is to have the ability to perform a quick metric based on the changes made to the system for the user's personal benefit. Thus if I were to disable the cache or turn off the Turbo of my 486, I could see the effects in realtime or very quickly.

I added "-i" to quickly spit out the score, then exit. If you need to run programs or reboot to change those things, this should be what you were asking for. I also added more details on the realtime benchmark screen; hopefully that was what you wanted. (That screen is getting cluttered, so use -c if you have a vga or ega system to increase the # of lines.)

I appreciate having some selectivity in the database, but if we include methods like processor upgrades to speed up the system, why not make a small exception for official methods to slow down the system as well? Slowdowns are very important to users, so if the system has an official method for slowing down the clock speed, that measurement should be equally valid.

System slowdowns are really only useful for the person who owns the system. If I included every single "non-turbo" version of an XT clone system that has been submitted, there would be 25+ entries all for essentially the same class of machine (4.77MHz 8088 ). So I'm afraid that will stand for now. I certainly won't stop you from gathering and publishing your own results -- I just won't include "slowdown" numbers in the official release.
 
I'm a little curious: why do my and brutman's BIOS signatures on our Model 25s appear different? They are the same BIOS.

Because I cleaned one of them up and left the other alone :) It's just a copyright string and date info, and my routine isn't perfect, so it will often bring in ASCII 32-127 that isn't really part of the copyright string. Whenever I see "dirty" strings in the database, I clean them up in the editor, but I must have missed that one. It will be fine in the next release.

If you look at the BIOSCRC for both systems in the .ini file, they're identical. Also, the UID characters 4-7 are also the BIOS CRC, so you can see the UID is very similar for both entries as well.
 
Still looking for Tandy benchmarks. Especially Tandy 1000 RL @ 4.77MHz, Tandy SL benchmarks (gave invalid results when previously posted), and any Tandy 1000 TX/TL, fast and slow, CPU upgrades, built-in video or EGA/VGA.

So something appears odd with some Tandy machines and how I use Abrash's Zen Timer to time metric results -- I just now tested on a Tandy TL (which I only have access to for a few more hours, unfortunately) and all but one of the metric segments returned a time of "65535"!! So something is odd with some Tandys and their handling of 8253 port reads. I'll see if there is anything obvious in the code to fix. Until it's fixed, I can't really record those into the database, so I'll do my best to figure out what is going on.

For the curious, the Score for the Tandy 1000 TL was 14.
 
Figured it out -- the Tandy implementation of the 8253 (an 8254, although it shouldn't matter) doesn't like it if an interrupt occurs in between port reads. I modified the code so that interrupts are disabled when reading the 8253, et voila, we now have proper support of the Tandy TL (and probably other wonky machines). New executable and database on the downloads tab at dosbenchmark.wordpress.com.

I was worried that fixing this would screw up timings significantly and invalidate the entire database. It does change the numbers, but only by 32 cycles (about 24usec) and that kind of variance is acceptable to me since the actual metrics themselves run with interrupts deliberately enabled, and there is sometimes that variance just running each metric itself. With slow machines taking 3000usec to run some sections, 24 extra doesn't bother me.
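To put those figures in proportion, here is a small worked calculation in Python. It uses only the numbers quoted above (24 usec of extra overhead, 3000 usec for a slow section) plus one assumption: that the timer counts ticks of the standard PC 8253 input clock, 1,193,182 Hz, as Zen-timer-style code normally does. The function names are made up for the example.

```python
PIT_HZ = 1_193_182  # standard PC 8253/8254 input clock, in ticks per second


def ticks_to_usec(ticks):
    """Convert a count of 8253 ticks to microseconds (assumes PIT_HZ)."""
    return ticks * 1_000_000 / PIT_HZ


def overhead_percent(extra_usec, section_usec):
    """Extra time as a percentage of the section being measured."""
    return 100.0 * extra_usec / section_usec


if __name__ == "__main__":
    # 24 usec added to a 3000 usec section is under one percent.
    print(f"overhead on a 3000 usec section: {overhead_percent(24, 3000):.1f}%")
    # A full 16-bit count, such as the 65535 seen on the Tandy TL,
    # corresponds to roughly 55 milliseconds.
    print(f"65535 ticks = {ticks_to_usec(65535):.0f} usec")
```

The same overhead matters more on fast machines, where a section takes far less than 3000 usec, so the percentage is worth rechecking for the quickest entries in the database.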
 
Figured it out -- the Tandy implementation of the 8253 (an 8254, although it shouldn't matter) doesn't like it if an interrupt occurs in between port reads. I modified the code so that interrupts are disabled when reading the 8253, et voila, we now have proper support of the Tandy TL (and probably other wonky machines). New executable and database on the downloads tab at dosbenchmark.wordpress.com.

I was worried that fixing this would screw up timings significantly and invalidate the entire database. It does change the numbers, but only by 32 cycles (about 24usec) and that kind of variance is acceptable to me since the actual metrics themselves run with interrupts deliberately enabled, and there is sometimes that variance just running each metric itself. With slow machines taking 3000usec to run some sections, 24 extra doesn't bother me.

I hope this only occurs on the TL and SL computers, as it obviously does not on the SX, EX, HX, RL or RLX.

I cannot stress enough that these tests show increasingly the value of faster video. The TL is 10-15% slower in every benchmark compared to the IBM PC AT @ 8MHz, but the video benchmark shows that it is 30% faster than IBM's 8-bit EGA card, which supports roughly equivalent features.

Another telling comparison is between the 8-bit VGA of the IBM PS/2 Model 80 and the 16-bit VGA of the IBM PS/2 Model 35SX. Slower processor and ISA bus in the 35SX, yet its video absolutely smokes the Microchannel machine. Compare also the PS/2 Model 55SX and the PS/1 Model 2121.
 
A few comments on what is being tested: I designed the tests to be video-adapter-agnostic and processor-agnostic, which is why the video RAM write speed test doesn't require VGA unchained mode, and why the CPU tests don't use 386+ instructions and aren't intentionally arranged to be pipeline-friendly. It had to work on everything, so that when you compare one system to another, the playing field is completely level.

There are disadvantages to this, and knowing them may interest you and illustrate what TOPBENCH is good for and is not good for:

- Because no 386+ instructions are used, some processors are "penalized" more than they should be. For example, there isn't very much difference between a 386sx-16 and a 286-16, when in reality if you had a function to perform and could do it using 32-bit registers, it could greatly outperform the 286. (Amiga MOD player realtime mixing comes to mind.)

- Because no EGA/VGA special handling is done, those adapters are "penalized" more than they should be. Yes, very early adapters were very slow, but if you use specialized modes provided by those adapters, they can paint pixels sometimes 4x (VGA) or 8x (EGA) as fast as just moving memory to the card. One example is polyfilling, which in 256-color mode is 1 write = 1 pixel (chained) or 1 write = 4 pixels (unchained). TOPBENCH takes none of this into account because CGA, MDA, and MCGA adapters can't do that.

- Starting with the Pentium Pro, 16-bit operations were devalued by Intel and execute worse than their 32-bit equivalents, so later processors can actually start to report lower scores (there is a 1998-era K6 in the database that "outperforms" my Core i7). This is why TOPBENCH is not practical to run on anything higher than, say, a Pentium. (I designed it to run on any machine that should come out in the next 20 years, but future machines are certainly not the design targets.)
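The chained versus unchained polyfill point in the list above comes down to simple arithmetic, sketched here in Python. It assumes a 320x200 256-color screen and takes the "1 write = 1 pixel" and "1 write = 4 pixels" figures straight from the post; the function name is made up, and real fills also pay for setup and plane-mask changes that this ignores.

```python
def writes_to_fill(width, height, pixels_per_write):
    """Number of video memory writes needed to fill a solid rectangle."""
    total_pixels = width * height
    # Ceiling division: a partial group of pixels still costs one write.
    return -(-total_pixels // pixels_per_write)


if __name__ == "__main__":
    chained = writes_to_fill(320, 200, 1)    # one pixel per write
    unchained = writes_to_fill(320, 200, 4)  # four pixels per write
    print(chained, unchained, chained / unchained)
```

A benchmark that only measures plain memory moves to the card sees the first number for every adapter, which is the "penalty" described for EGA and VGA.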

I hope this only occurs on the TL and SL computers

It happened on a few other non-Tandy machines, so I'm glad I figured it out.
 
Using the latest version, the CPU detection still hangs against an IBM 486DLC-based accelerator board being used in a Tandy 1000 TL/2. Not knowing whether the issue is with the CPU itself, or some aspect of the system, I'd be curious to know if anyone has a similar CPU to test against. I noticed a few Cyrix variants in the database, but it doesn't look like anything has been gathered for systems using IBM's chip.
 