
Looking for volunteers to help test a new benchmark

Re: the MediaGX, it's worth noting that it spends a lot of time in System Management Mode emulating the hardware it claims to have, but doesn't actually have.

IIRC, the 386DX @ 40 MHz did turn out to be faster than the 486SX @ 25 MHz.

I'll try running it on various machines I've got tonight. The list:

5155 - bone stock 4.77 MHz revised Intel 8088, bone stock CGA, so the results won't be interesting
200LX 2 MiB - bone stock 7.91 MHz 80186, sounds like this won't work, but I'll try it anyway. It's worth noting that the 200LX has its own video routines at INT 5Fh.
ThinkPad 365XD - 120 MHz Pentium with no L2, Trident Cyber9320 PCI graphics.
Fujitsu P1620 - 1200 MHz Core 2 Duo (Merom), i945 graphics.
MacBookPro10,1 - 2300 MHz Core i7-3615QM, factory overclocked (very, very overclocked - faster than a GTX 660M) GeForce GT 650M graphics. I think DOS should boot using Boot Camp (because it's BIOS emulation), we'll see whether I can actually interact with it.
 
IIRC, the 386DX @ 40 MHz did turn out to be faster than the 486SX @ 25 MHz.



What video card would you be using in the 386/40?
 
Wacky numbers from a wacky system

Unfortunately I can't use these because they're too wacky. I would need a MediaGX system at my home to directly debug the code on, as I can't reproduce this behavior.

I know I must have missed it the first time around, but what did you deduce regarding the 386/40 vs. the 425sx/25?

The general consensus is that the 386DX-40 is faster than the 486sx-25 for most 16-bit and 32-bit code, with the 486 being faster for calc-heavy code and the 386 being faster for memory-access-heavy code. It also depends heavily on the 386 motherboard, as some boards could take L2 cache memory and many couldn't.

5155 - bone stock 4.77 MHz revised Intel 8088, bone stock CGA, so the results won't be interesting

There's already one in the database (mine :) so there's no need to submit that.

200LX 2 MiB - bone stock 7.91 MHz 80186, sounds like this won't work, but I'll try it anyway. It's worth noting that the 200LX has its own video routines at INT 5Fh.

Not only should it work, but I'm curious if my 80186 detection routines work. I've never tested them. See if TOPBENCH correctly detects an 80186.
 
I didn't do the test on a 386DX 40 or a 486SX 25, that's just my memory of what was posted in this thread, reading through it.

Anyway, it correctly detected the 200LX as an 80186, but at 6 MHz (or, on topbench -r, 6.652368 MHz), not 7.91. It also detected it as an "HP 95 LX " - wonder where you're getting that from, there is a way to detect what LX machine you're running on (it can't distinguish between a 100LX and a 200LX, but they're damn near the same machine, just different firmware revisions): http://www.delorie.com/djgpp/doc/rbinter/id/91/13.html
 
Unfortunately I can't use these because they're too wacky. I would need a MediaGX system at my home to directly debug the code on, as I can't reproduce this behavior.

I've asked my parents to see if they can find either my or my brother's old Presario 2200. If they can, I'll see if I can help in any way.
 
It also detected it as an "HP 95 LX " - wonder where you're getting that from, there is a way to detect what LX machine you're running on (it can't distinguish between a 100LX and a 200LX, but they're damn near the same machine, just different firmware revisions): http://www.delorie.com/djgpp/doc/rbinter/id/91/13.html

Code:
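  Regs.AX := $4DD4;   { HP palmtop query, INT 15h function 4DD4h }
  Intr ($15, Regs);   { call the BIOS; results come back in Regs }
  { $4850 is 'HP' in ASCII; only the signature is checked, not the model }
  If Regs.BX = $4850 Then EndString := 'HP 95 LX ';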

Looks like I'm not doing enough checking. I'll add more code if you're willing to re-test.
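
For anyone following along, here is a rough sketch of what the extra checking might look like. It is written in Python rather than the benchmark's Pascal so it can be run anywhere, and it only classifies register values that have already been read. The CX model codes in `ASSUMED_MODELS` are an assumption recalled from Ralf Brown's Interrupt List, not something taken from TOPBENCH, and `classify_hp_palmtop` is a made-up name.

```python
# Illustrative only: classify the values returned by INT 15h, AX=4DD4h.
HP_SIGNATURE = 0x4850  # 'H' (48h) followed by 'P' (50h), returned in BX

# ASSUMPTION: model codes as recalled from Ralf Brown's Interrupt List.
# Verify against the linked page before relying on them.
ASSUMED_MODELS = {
    0x0101: "HP 95LX",
    0x0102: "HP 100LX/200LX",
}


def classify_hp_palmtop(bx, cx):
    """Return a machine name, or None if BX lacks the 'HP' signature."""
    if bx != HP_SIGNATURE:
        return None
    # Report an unknown code instead of guessing a specific model.
    return ASSUMED_MODELS.get(cx, "HP palmtop (unknown model %04Xh)" % cx)


if __name__ == "__main__":
    print(classify_hp_palmtop(0x4850, 0x0102))
    print(classify_hp_palmtop(0x4850, 0x0999))
    print(classify_hp_palmtop(0x0000, 0x0102))
```

The point is simply that the signature in BX says "some HP palmtop" and a second register has to be examined before a specific model name is printed.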

I've asked my parents to see if they can find either my or my brother's old Presario 2200. If they can, I'll see if I can help in any way.


I appreciate that, but I'd need a system exhibiting that behavior in my house with my dev tools running on it to see if it can be fixed. I'm not sure it can since I'm not entirely sure what is going on -- the timer values are read, then read again, and compared; if they are showing very high values then the second value read is somehow behind the first value instead of ahead of it.
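
To illustrate why a second reading that lands "behind" the first produces a huge number, here is a minimal sketch. It assumes the timings come from the PC's 8253/8254 interval timer read as a 16-bit down-counter, which is a guess about how TOPBENCH works and not something confirmed here; the function names are hypothetical.

```python
PIT_HZ = 1_193_182   # input clock of the PC's 8253/8254 timer, in Hz
COUNTER_BITS = 16    # ASSUMPTION: the timer is read as a 16-bit down-counter


def elapsed_ticks(first, second):
    """Ticks between two reads of a down-counter, using unsigned wraparound."""
    return (first - second) % (1 << COUNTER_BITS)


def ticks_to_microseconds(ticks):
    """Convert timer ticks to microseconds."""
    return ticks * 1_000_000 / PIT_HZ


if __name__ == "__main__":
    # Normal case: the counter moved down by 1000 ticks between the reads.
    normal = elapsed_ticks(50_000, 49_000)
    # Anomalous case: the second read is 10 ticks behind the first.
    wacky = elapsed_ticks(50_000, 50_010)
    print(normal, ticks_to_microseconds(normal))
    print(wacky, ticks_to_microseconds(wacky))
```

With unsigned arithmetic, being just 10 ticks behind looks like almost a full trip around the counter, which is the kind of very high value described above.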

MediaGX systems can still be used with topbench if you compare Score -- that works on all systems because of the way it was designed. So that's accurate. You just can't look at or trust the microsecond timings.
 
I am willing to do further testing.

It's worth noting that tracing the results of a MOV AX,4DD4h, INT 15h, in debug, I'm getting 0201h in DX, on my version 1.02 A US BIOS - same as the German ROM in the link I gave. I'm suspicious that this is really a BIOS version word, and now I'm wondering what happens on a 100LX. I'm guessing you'll get results that completely overlap with 200LX versions, as the 100LX had its own BIOS version numbering. And, you'll also get 1000CXes and OmniGo 700LXes, which are based on the 100LX/200LX hardware. (I don't think you'll be able to tell the difference in direct hardware access from software, although the BIOSes did change apparently.)

Anyway, I e-mailed you the results from the four machines I've tested as well.
 
That detected it as a 100LX/200LX, so that appears to be good. Still detected as 6 MHz, and now I'm wondering how to verify the clock speed without a frequency counter of some sort. I don't have an AC adapter, but I don't believe it's throttling on battery at all.
 
Still detected as 6 MHz, and now I'm wondering how to verify the clock speed without a frequency counter of some sort.

The actual CPU in the 200LX is an HP Hornet, not an Intel 80186. The speed numbers I have were calibrated for an 80186 -- I have no speed numbers for that specific CPU. You should ignore the speed rating and just correct it before saving it to the database.
 
I should also mention that Topbench incorrectly reported the speed of the MediaGX. It recognized it as running at 100 MHz instead of 180. I amended the results.
 
The actual CPU in the 200LX is an HP Hornet, not an Intel 80186. The speed numbers I have were calibrated for an 80186 -- I have no speed numbers for that specific CPU. You should ignore the speed rating and just correct it before saving it to the database.

Some docs out there (I believe sourced from HP's developer documentation for the 100LX/200LX) claim it is actually an 80C186 CPU core, though. Intel made the thing, and it's got no HP copyrights.

This just gets more and more mystifying as I benchmark weird setups of it. I'm wondering if there are wait state issues or something going on with video memory access, because...

This is what I submitted to you:
Code:
[UID826E11ACAF]
MemoryTest=2182
OpcodeTest=731
VidramTest=460
MemEATest=1040
3DGameTest=720
Score=10
CPU=Intel 80186
CPUspeed=7.91 MHz
BIOSinfo=Copyright 1984,1985  Phoenix Software Associates Ltd (05/24/94, rev. 1)
BIOSdate=19940524
BIOSCRC16=826E
VideoSystem=CGA
VideoAdapter=CGA
Machine=HP 200LX
Description=
Submitter=Eric Rucker <bhtooefr@bhtooefr.org>

The actual detected speed was 6.652368 MHz. This was run in the default 80x25 text mode.

Now, here's where it gets downright STRANGE. I'm not correcting for CPU speeds in these. 64x18 zoomed text mode:
Code:
[UID826EA4793]
MemoryTest=2046
OpcodeTest=683
VidramTest=424
MemEATest=968
3DGameTest=661
Score=10
CPU=Intel 80186
CPUspeed=7.039064 MHz
BIOSinfo=Copyright 1984,1985  Phoenix Software Associates Ltd (05/24/94, rev. 1)
BIOSdate=19940524
BIOSCRC16=826E
VideoSystem=CGA
VideoAdapter=CGA
Machine=HP 200LX
Description=Text display zoomed to 64x18
Submitter=Eric Rucker <bhtooefr@bhtooefr.org>

And 40x16 zoomed text mode:
Code:
[UID826EA64E6]
MemoryTest=1853
OpcodeTest=636
VidramTest=394
MemEATest=908
3DGameTest=625
Score=11
CPU=Intel 80186
CPUspeed=7.55108 MHz
BIOSinfo=Copyright 1984,1985  Phoenix Software Associates Ltd (05/24/94, rev. 1)
BIOSdate=19940524
BIOSCRC16=826E
VideoSystem=CGA
VideoAdapter=CGA
Machine=HP 200LX
Description=Text display zoomed to 40x16
Submitter=Eric Rucker <bhtooefr@bhtooefr.org>

It seems like, the less text is on the screen, the faster things get. Mind you, the display controller has to write just as many pixels to the display - the lower modes render larger characters - but in a text mode, that probably wouldn't matter.
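
A quick way to put numbers on that, sketched in Python with `configparser`. Only the five timing fields from the three entries above are reproduced, and the sketch assumes lower timings mean faster (they are microsecond timings, and Score goes up as they go down). The helper names are made up for illustration.

```python
import configparser

# Timing fields copied from the three submissions above.
ENTRIES = """
[UID826E11ACAF]
MemoryTest=2182
OpcodeTest=731
VidramTest=460
MemEATest=1040
3DGameTest=720

[UID826EA4793]
MemoryTest=2046
OpcodeTest=683
VidramTest=424
MemEATest=968
3DGameTest=661

[UID826EA64E6]
MemoryTest=1853
OpcodeTest=636
VidramTest=394
MemEATest=908
3DGameTest=625
"""

TESTS = ["MemoryTest", "OpcodeTest", "VidramTest", "MemEATest", "3DGameTest"]


def load(text):
    """Parse DATABASE.INI-style text into {uid: {test: timing}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case such as "3DGameTest"
    parser.read_string(text)
    return {uid: {t: int(parser[uid][t]) for t in TESTS}
            for uid in parser.sections()}


def speedup(base, other):
    """Ratio base/other per test; above 1.0 means `other` took less time."""
    return {t: base[t] / other[t] for t in TESTS}


if __name__ == "__main__":
    results = load(ENTRIES)
    base = results["UID826E11ACAF"]  # default 80x25 text mode
    for uid in ("UID826EA4793", "UID826EA64E6"):
        ratios = speedup(base, results[uid])
        print(uid, {t: round(r, 3) for t, r in ratios.items()})
```

Every test comes out above 1.0 for both zoomed modes, so the effect is not limited to the video RAM test.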

Edit: Decided to try MODE BW40, too. The UI is incredibly broken in that mode, for what it's worth, but...
Code:
[UID826EB0396]
MemoryTest=1927
OpcodeTest=634
VidramTest=408
MemEATest=911
3DGameTest=625
Score=11
CPU=Intel 80186
CPUspeed=7.473171 MHz
BIOSinfo=Copyright 1984,1985  Phoenix Software Associates Ltd (05/24/94, rev. 1)
BIOSdate=19940524
BIOSCRC16=826E
VideoSystem=CGA
VideoAdapter=CGA
Machine=HP 200LX
Description=MODE BW40 
Submitter=Eric Rucker <bhtooefr@bhtooefr.org>

Suggestion... this is a gaming-centric benchmark, yes? Seems certain systems steal CPU memory bandwidth for video (not just the 200LX - the PCjr does too, after all), and the video mode that the system is in during the benchmark affects performance. Sounds like a case for ensuring the machine's in a graphical mode, although that would exclude MDA (and Hercules would have to be in a different mode from everything else).

Edit: And as a sanity check... the crystal is 15.836774 MHz, divided by two for use by the IC (7.918387 MHz, which actually rounds to 7.92, not 7.91). A bus access takes 4 cycles and transfers 2 bytes on an 80186, for 3,959,193.5 bytes per second.

80x25x2 bytes per character at 90 Hz (per this post) is 360,000 bytes per second, 64x18x2@90 Hz is 207,360 bytes per second, 40x25x2@90 Hz is 180,000 bytes per second, 40x16x2@90 Hz is 115,200 bytes per second.

Each byte the display fetches costs two CPU cycles (4 cycles per 2-byte access), which means that the correct speeds for each video mode would be 7.198387, 7.503667, 7.558387, and 7.687987 MHz. However, that's not counting any accesses it might do during blanking, or anything else taking memory bandwidth away. A CGA graphical mode would be 1,440,000 bytes per second if it stays at 90 Hz, I think, resulting in 5.038387 MHz, or 960,000 bytes per second at 60 Hz, resulting in 5.998387 MHz.
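
The sanity check can be written out as a small sketch so the assumptions are explicit: the display reads every character cell once per refresh, each 2-byte bus access costs 4 CPU cycles, and nothing else takes bandwidth. The refresh rate is left as a parameter because the 90 Hz figure is itself second-hand. Function names are made up for illustration.

```python
CRYSTAL_HZ = 15_836_774
CPU_HZ = CRYSTAL_HZ / 2      # 7,918,387 Hz after the divide-by-two
CYCLES_PER_BUS_ACCESS = 4    # 80186 bus cycle length
BYTES_PER_BUS_ACCESS = 2     # 16-bit bus


def display_bytes_per_second(cols, rows, refresh_hz, bytes_per_cell=2):
    """Bytes the display must fetch per second for a text mode."""
    return cols * rows * bytes_per_cell * refresh_hz


def effective_mhz(display_bps):
    """CPU clock left over after the display's bus accesses, in MHz."""
    stolen_cycles = display_bps / BYTES_PER_BUS_ACCESS * CYCLES_PER_BUS_ACCESS
    return (CPU_HZ - stolen_cycles) / 1e6


if __name__ == "__main__":
    text_80x25 = display_bytes_per_second(80, 25, 90)
    print(text_80x25, effective_mhz(text_80x25))
    # CGA graphics at the two byte rates quoted above.
    print(effective_mhz(1_440_000), effective_mhz(960_000))
```

For 80x25 at 90 Hz this gives 360,000 bytes per second and 7.198387 MHz; other modes or refresh rates can be checked by changing the arguments.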
 
Minor update for 2016, which includes 8 new systems including an HP 200LX palmtop, Apple MacBook Pro, and LOBO (the fastest 286 on record). Can grab the newest version and database from dosbenchmark.wordpress.com. As always, DATABASE.INI entries for any Pentium or lower systems not already in the database are welcome; just email them to me.

Suggestion... this is a gaming-centric benchmark, yes?

Yes and no. It was primarily engineered to definitively answer which systems could run real-mode code faster than others. But, because video card performance can drastically affect the speed of games and graphical applications, I do test the raw memory speed of the video adapter. The design was quite deliberately limited to operations that every single PC and clone in existence could perform: 16-bit code only, documented instructions only, no 8087 code, 4K of video RAM tested (the amount that MDA has, which also explains why I do the video RAM benchmark in text mode, as MDA cannot do graphics). This makes TOPBENCH 100% suitable for 286 and lower systems, and I truly believe it is the best synthetic benchmark for those systems and can defend it in that area.

TOPBENCH is not as suitable for 386+ systems. It is still a perfectly good benchmark if you will be running mostly 16-bit code on your system, but many vintage DOS gaming users enjoy running games like Doom, Quake, Duke Nukem 3D, and others that use 32-bit instructions, 32-bit protected mode, or both. I have given some thought to someday creating TOPBENCH32, where I would likely make the following changes:

  • Build the entire benchmark and all of its metrics to use 100% 32-bit protected mode
  • Perform video adapter ram benchmarking in common VGA graphics modes: VGA 640x480x16, 320x200x256 chained, 320x200x256 unchained (or whatever mode Doom uses, maybe it's 320x240 unchained), and 360x480x256 unchained. (all of these modes display on all VGA monitors)
  • Extend the RAM benchmarking to use more RAM, to better avoid L1 and L2 cache effects
  • Add all 80386 opcodes and 32-bit registers to the opcode exercise metric
  • Change the 3D metric into functional 3D code (i.e. it would generate a 3D scene, and maybe even render it in system RAM)

The above changes would allow for consistent, even testing of every 386+ system, it would be more "fair" to those systems, and it would be a relatively straightforward project (extend existing TOPBENCH, no need to completely redesign the entire benchmarking methodology). I could see myself doing this if/when I run out of other projects.

The design challenge I have is if I want to be "fair" to many more system configurations (like, a Pentium II with a PCI video card). For example, it would be nice to test the benefits of 486+ instructions, x87/math instructions, or Pentium+ instructions; the effects of Pentium pipelining vs. not having two pipes; and VESA graphics modes that use much more video RAM and possible bank switching. However, since not every system could be benchmarked in this way, how could I come up with a "score" that represents a bonus for enhanced speed due to more CPU features, or additional VESA video modes? That's the challenge. I think one viable way would be to build everything into a single metric that intentionally has several opportunities for speedup if additional CPU/math/graphics features are present, and then take advantage of those features. The score will reflect higher numbers if those features improved the running time of the metric, and what additional features were utilized can be reported to the user so that, for example, the difference between a 486dx-33 and a 386dx-33 is more easily understandable. But this is a massive pie-in-the-sky ambition, and is extremely unlikely to happen given that most DOS gamers are happy benchmarking their favorite protected-mode games anyway.
 
Just ran TopBench on my Panasonic CF-52.

[UIDDC8D14F488]
MemoryTest=1
OpcodeTest=1
VidramTest=75
MemEATest=0
3DGameTest=0
Score=637
CPU=Intel Core 2 DuoCPU T7100
CPUspeed=1795 MHz
BIOSinfo=unknown
BIOSdate=20070718
BIOSCRC16=DC8D
VideoSystem=VGA
VideoAdapter=VGA, VESA, 256kb Video Memory (BIOS)
Machine=Panasonic CF-52
Description=
Submitter=BrianS


I manually edited the CPU type; TOPBENCH stated it was a Pentium Pro. It must have looked over to my Pentium Pro... Which I will also try with TOPBENCH.

This CF-52 dual-boots DOS 7.1 (Win98SE/Bootgui= 0) and XP SP3. The hard part was getting the 512GByte SATA drive formatted as a bootable FAT-32 disk. DOS was easy after that. XP required a new install disk to be created with the SATA drivers folded into the CD. FDISK from FreeDOS was used to create the active DOS partition (100% of disk), then FORMAT from FreeDOS to format it, then FORMAT /S from Win98 to make it bootable. This machine is used mostly to run PharLap extended DOS programs, which can use up to 3.4GBytes RAM in it.
 
[UIDB2AAE8F83]
MemoryTest=0
OpcodeTest=0
VidramTest=93
MemEATest=0
3DGameTest=0
Score=523
CPU=Pentium-M
CPUspeed=1995 MHz
BIOSinfo=unknown
BIOSdate=20040317
BIOSCRC16=B2AA
VideoSystem=VGA
VideoAdapter=VGA, ATI Bios : 0.0, VESA, 256kb Video Memory (BIOS)
Machine=Panasonic CF-51
Description=
Submitter=BrianS

Ran the benchmark on the 2GHz Pentium-M CF-51. This computer uses IDE drives and supports USB when booted into DOS (7.1, Win98se Bootgui=0).
 
Once upon a time there was a processor named Pentium Pro, which was said to be very fast in 32-bit code, but not so good in 16-bit.
I look forward to TOPBENCH32 in order to verify that - if that's true, TOPBENCH32/TOPBENCH score ratio should be noticeably greater on PPro (and its descendants?) than on earlier CPUs.
 
We can test your theory by looking at existing collected data: There's currently only one Pentium Pro in the database, at 200 MHz, with a score of 250. Very close to that score is a Pentium 75 with a score of 223, so considering that the Pentium, running at less than half the clock speed of the PPro, gets nearly the same score, I'd say this data favors the conclusion. I have a PPro 166 and a Pentium II 233 that I haven't yet added to the database; if adding them drastically confirms or denies this conclusion, I'll post an update.
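
The comparison boils down to score per MHz, which can be sketched in a few lines. Dividing by clock speed is a crude normalization (it ignores cache, chipset, and video differences), so treat the result as a rough indicator only; `score_per_mhz` is a made-up helper name.

```python
def score_per_mhz(score, mhz):
    """TOPBENCH score divided by clock speed: a crude per-clock figure."""
    return score / mhz


if __name__ == "__main__":
    ppro_200 = score_per_mhz(250, 200)    # the one Pentium Pro in the database
    pentium_75 = score_per_mhz(223, 75)   # the Pentium 75 with a similar score
    print(round(ppro_200, 2), round(pentium_75, 2))
    print(round(pentium_75 / ppro_200, 2))  # per-clock advantage of the Pentium
```

On these two entries the Pentium gets well over twice the 16-bit score per clock of the Pentium Pro.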
 
Once upon a time there was a processor named Pentium Pro, which was said to be very fast in 32-bit code, but not so good in 16-bit.
I look forward to TOPBENCH32 in order to verify that - if that's true, TOPBENCH32/TOPBENCH score ratio should be noticeably greater on PPro (and its descendants?) than on earlier CPUs.

It is true. Some background info:
The problem with the Pentium Pro is in the register renaming scheme.
The CPU backend has far more registers than what the x86 exposes, in order to help with out-of-order execution.
The specific problem with the x86 instruction set is that it allows you to address 'partial' registers. E.g., you have the 32-bit eax register, but with ax you can address the bottom 16-bit word, with al the bottom byte, and with ah the high byte of ax.

The register renamer cannot handle these 'partial' registers, so what happens is that they each get aliased/renamed to a separate internal register.
So if you do something like this:
mov ax, FF
mov ah, A
mov al, B
mov [di], ax

Then what happens is this:
mov ax, FF -> ax gets aliased to register 0
mov ah, A -> ah gets aliased to register 1
mov al, B -> al gets aliased to register 2
mov [di], ax -> oops... we're trying to read ax again, flush the pipeline and combine the results of registers 0, 1 and 2 to find the actual value of ax

Now, this happens all the time in 16-bit code (especially since you never actually use the full 32-bit registers).
What Intel did to improve this in the Pentium II is the following:
They added an extra internal status bit for each register to indicate whether it is 0. This makes the recombining pretty trivial, since you can just discard all registers of value 0, and often enough you can just promote the non-zero register to the new value.

This will not solve the above example.
However, it does solve things like this:
xor eax, eax
mov al, F

On a Pentium Pro, this would still stall the pipeline on the next access of eax. On a Pentium II, the register aliased to al will be promoted to eax directly, without a stall, because the xor eax, eax had set the 'zero' state bit earlier.
So Pentium II does much better on 16-bit code in practice. However, if you deliberately create 'difficult' code for the register renamer, you can still stall it like crazy, and a regular Pentium will perform much better (it does not have register renaming, and does not suffer from stalls on partial register access).

However, this is the most common case, especially with compiler-generated code (basically just casting values to different sizes). And it is trivial to implement the 'xor eax, eax' workaround in the compiler (you'll see this a lot in compiled code. There was a specific movzx instruction added to x86 at an earlier stage, but the xor variation is more efficient on Pentium and newer CPUs).
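
A toy model of the behaviour described in this post, for anyone who wants to play with it. It is a deliberate simplification and not Intel's actual renamer: it only tracks the eax family, it treats "one partial write after a zeroing write" as the case the zero bit rescues, and all names in it are hypothetical.

```python
# Which architectural names each register fully contains.
COVERS = {
    "eax": {"eax", "ax", "ah", "al"},
    "ax": {"ax", "ah", "al"},
    "ah": {"ah"},
    "al": {"al"},
}


class ToyRenamer:
    def __init__(self, has_zero_bit):
        self.has_zero_bit = has_zero_bit  # False ~ Pentium Pro, True ~ Pentium II
        self.writes = []                  # (destination, was_zeroing), program order
        self.stalls = 0

    def write(self, dest, zeroing=False):
        self.writes.append((dest, zeroing))

    def read(self, src):
        """Return True if reading `src` stalls in this toy model."""
        partial_writes = 0
        base_is_zero = False
        for dest, zeroing in reversed(self.writes):
            if src in COVERS[dest]:       # this write defined all of src
                base_is_zero = zeroing
                break
            if dest in COVERS[src]:       # this write touched only part of src
                partial_writes += 1
        if partial_writes == 0:
            return False
        if self.has_zero_bit and base_is_zero and partial_writes == 1:
            return False                  # promote the single non-zero piece
        self.stalls += 1
        self.writes.append((src, False))  # merged value now sits in one register
        return True


def count_stalls(program, has_zero_bit):
    cpu = ToyRenamer(has_zero_bit)
    for op, reg, *flags in program:
        if op == "write":
            cpu.write(reg, zeroing="zero" in flags)
        else:
            cpu.read(reg)
    return cpu.stalls


# The two sequences from the post, reduced to reads and writes.
FIRST = [("write", "ax"), ("write", "ah"), ("write", "al"), ("read", "ax")]
SECOND = [("write", "eax", "zero"), ("write", "al"), ("read", "eax")]

if __name__ == "__main__":
    for name, program in (("first", FIRST), ("second", SECOND)):
        print(name, count_stalls(program, False), count_stalls(program, True))
```

In this model the first sequence stalls with or without the zero bit, while the second stalls only without it, which matches the Pentium Pro vs. Pentium II difference described above.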
 
It is true. Some background info:
The problem with the Pentium Pro is in the register renaming scheme.

What also hurt the Pentium Pro running 16 bit code was the segment register reloads. Being superscalar, it would speculatively and out-of-order execute instructions until it saw a segment register load. Then it stalled until the segment register was loaded. Not so bad running a flat 32 bit memory model where segment registers were rarely updated. Not so good in 16 bit code where you have to load the segment registers all the time. This was fixed in the Pentium II.
 