My response went AWOL or got deleted somehow... probably due to the PAINFUL forum speeds and recent increase in 500 errors, so let's try this again.
That's faster than "stock" as you're running the jrmem driver to enable the extra RAM, which is not as slow as the first 128K. For a true "stock" speed rating (and to simulate a 128K jr's speed), re-run the test without that driver loaded.
Check the CPU test -- 1120 is HALF what it should be, and the only excuse for that on a Jr is it running in the bottom 128K, so I kinda doubt that. It's actually more of a RAM and CPU test since it's just a repeat until 5 seconds expire or a key is pressed... The frame rates are about what I was expecting out of a Jr due to fewer video wait states, which is where the REAL bottleneck is.
If anything, your XT numbers seem a bit off to me -- is that a V20? You sure it's at 4.77mhz? Are you sure that's a REAL CGA card? Those numbers just look wrong. (well, the low sprite count numbers look right, the high sprite count numbers look WAY too high)
Ever run CGA_COMP on a Jr? (now who wrote that again?) -- specifically the video bandwidth tests? You might be in for a surprise as the block read/block write tests actually come in 50% FASTER than an XT with a CGA! (the interleaved read/write tests come in at about 2/3rds if CGA_COMP is operating out of the bottom 128K).
I don't have a Jr anymore to test on, but if you want FUN in that department, take my Tandy 1000HX as an example. According to CGA_COMP the results for a normal CGA 4.77mhz PC are:
Block Read: 246 KB/sec
Block Write: 298 KB/sec
Interlaced Read: 175 KB/sec
Interlaced Write: 170 KB/sec
My 1000HX in "Slow" - 4.77mhz:
Block Read: 483 KB/sec
Block Write: 692 KB/sec
Interlaced Read: 248 KB/sec
Interlaced Write: 267 KB/sec
Back when I had a Jr, running from the bottom memory it returned around 350ish for block read, 400ish for block write, and low 100's for the interlaced... forcing the code out of the bottom 128K, the numbers jumped up into Tandy 1000 territory.
That's down to the lack of extra wait states on video memory -- which is why the PCJr is actually FASTER at reading/writing to video memory (especially on rep operations small enough to fit into the BIU) than a stock CGA card... the bus isn't locked as often with wait states as it has dual ported RAM.
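To put the two CGA_COMP result sets quoted above side by side, here's a quick sketch that just divides the 1000HX figures by the stock CGA PC figures. The numbers are the ones from this post; the dictionary and function names are made up for the example.

```python
# CGA_COMP figures quoted above, in KB/sec.
cga_pc = {"Block Read": 246, "Block Write": 298,
          "Interlaced Read": 175, "Interlaced Write": 170}
hx_slow = {"Block Read": 483, "Block Write": 692,
           "Interlaced Read": 248, "Interlaced Write": 267}


def speedup(fast, base):
    """Return {test name: fast/base}, rounded to two decimals."""
    return {name: round(fast[name] / base[name], 2) for name in base}


ratios = speedup(hx_slow, cga_pc)
for name, ratio in ratios.items():
    print(f"{name:17s} {ratio:.2f}x")
```

So at the same 4.77mhz the HX comes out at roughly double on the block tests and about one and a half times on the interlaced ones.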
Gah, scary thought -- could you imagine a PC Jr with single ported/unbuffered RAM? Snow while code is running. (We'd be yelling "What is this, a ZX-80?")... even MORE wait states dragging code execution in that bottom RAM to a crawl?
In testing here between various machines, I noticed something odd... For 1000's I've got an SX and two HXs set up here... the SX has the stock AMD 8088-2, one HX has the stock Siemens 8088-2, and the other HX recently got a V20 (Wonder where that came from?)... I thought the Siemens was just another 1:1 8088 knockoff under license, but the numbers don't reflect that. I ended up pulling all the chips and trying them in both SX and HX machines to verify it wasn't mainboard differences skewing my results. The "CPU Count" test from this little video test of mine showed something... odd.
SX Stock AMD 8088-2 - 2046
SX Siemens 8088-2 - 2163
SX NEC V20 - 2304
HX Stock AMD 8088-2 - 2051
HX Siemens 8088-2 - 2172
HX NEC V20 - 2342
I pulled out Norton's SI, as well as MIPS, and got similar skews in the numbers. The Siemens appears to be marginally faster! (I chalk up the HX being faster to BIOS or other board differences). Not as much of a boost as the V20, but it's still interesting to see.
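For a feel of how big those skews are, this little sketch turns the "CPU Count" results above into percentage gains over the AMD 8088-2 on the same board. The counts are straight from the list; the names `counts` and `gain_percent` are just for illustration.

```python
# "CPU Count" results listed above, grouped by the board each chip was tested in.
counts = {
    "SX": {"AMD 8088-2": 2046, "Siemens 8088-2": 2163, "NEC V20": 2304},
    "HX": {"AMD 8088-2": 2051, "Siemens 8088-2": 2172, "NEC V20": 2342},
}


def gain_percent(board, chip, baseline="AMD 8088-2"):
    """Percentage gain of `chip` over `baseline` on the same board."""
    base = counts[board][baseline]
    return round(100.0 * (counts[board][chip] - base) / base, 1)


for board in counts:
    for chip in ("Siemens 8088-2", "NEC V20"):
        print(f"{board} {chip}: +{gain_percent(board, chip)}%")
```

That works out to a bit under 6% for the Siemens on either board, versus roughly 12.6% (SX) and 14.2% (HX) for the V20.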
Also shows why all the old games that ran "unthrottled" with no timer control at all are so... bad across even systems operating at 4.77mhz.
The SX also gave some odd speed results -- this is with the V20:
CPU: 2304
Test 1 - FPS:48.00
Test 3 - FPS:36.20
Test 5 - FPS:28.60
Test 7 - FPS:23.60
Back on the AMD 8088-2:
CPU: 2046
Test 1 - FPS:42.00
Test 3 - FPS:33.70
Test 5 - FPS:24.80
Test 7 - FPS:22.00
Eerily low compared to my HX, or the numbers Trixter reported from the XT... Kind of strange as I didn't think there was that much difference between the SX and HX.
But again, that's why I'm putting this test out there, so I have an idea how much I can actually put on-screen at once and keep the frame rate playable, instead of getting my heart set on doing things that just aren't feasible.
Oh, and I'm aware of that bit of video corruption up top -- originally it ran all 16 sprites from the start, and when I broke them into 4 pieces some of the initialization code ended up a bit off -- that's the 'erase the old location' code screwing up. Not a big deal given this is only a test and it has no impact on actual speed measurements.
Though I am a bit shocked at the amount of snow... I really shouldn't be as about 70% of each loop's clock cycles are spent blitting to the screen, but still...
Thankfully unlike Paku Paku where I needed 240 ticks/second for sound management, I can use my desired frame rate as my timer since I can make the sound however I want for the game instead of trying to mimic an existing game. (couldn't even put it into an interrupt, as having it fire during the blit to screen looked like arse).
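For anyone following along, reprogramming the timer for a given tick rate comes down to picking a divisor for the 8253's roughly 1.193182mhz input clock. A rough sketch of that arithmetic follows; the 240 is the Paku Paku rate mentioned above, while the 60 is only a placeholder frame rate, not a number I've settled on, and the function names are invented for the example.

```python
PIT_HZ = 1193182  # approximate 8253 timer input clock on PC-class machines, in Hz


def pit_divisor(rate_hz):
    """Nearest 16-bit reload value for the requested tick rate."""
    divisor = round(PIT_HZ / rate_hz)
    if not 1 <= divisor <= 65536:
        raise ValueError("rate out of range for a 16-bit divisor")
    return divisor


def actual_rate(divisor):
    """Tick rate the timer really produces for a given divisor."""
    return PIT_HZ / divisor


for rate in (240, 60):  # 60 is a hypothetical frame rate for illustration
    d = pit_divisor(rate)
    print(f"{rate} Hz -> divisor {d}, actual {actual_rate(d):.3f} Hz")
```

The stock divisor of 65536 is where the familiar 18.2 ticks/second comes from.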
I was looking at the hardware scrolling, but found issues between the video adapters with more than 16k of video RAM; it's also not entirely viable as I'm only going to be using 112px of width for the play area, with stats on the right much akin to Paku Paku -- or more specifically Silpheed... keeping that large a sidebar fixed with hardware scrolling just isn't an option... also all the calculations to subtract the offset from my sprites and to redraw the top end up being MORE work and slower than just drawing and erasing the 24 pixels... Looks like a great technique for games like River Raid, but not so great for what I'm doing. (especially since... well... I'm not going to give away the surprises just yet).
Though I really squeezed a lot of speed out of the stars by NOT tracking them via X/Y but by their memory offset, and only allowing stars on every other pixel so I don't even need shifts -- it's just a single attribute write. (well, really 3 bytes of write per frame...)
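The general idea of tracking a star by offset instead of X/Y looks something like the sketch below. This is NOT the actual game code: the 128-byte row stride, the 100 rows, the stars moving straight down and the wrap-around are all assumptions made for the example, and `step_star` is a made-up name.

```python
STRIDE = 128          # assumed bytes per back-buffer row
ROWS = 100            # assumed number of rows
SIZE = STRIDE * ROWS


def step_star(buffer, offset, color, rows_per_frame=1):
    """Erase the star at `offset`, move it down, redraw; return the new offset."""
    buffer[offset] = 0                   # erase old position
    offset += STRIDE * rows_per_frame    # moving down is just an add, no X/Y math
    if offset >= SIZE:
        offset -= SIZE                   # wrap from the bottom to the top, same column
    buffer[offset] = color               # draw at the new position
    return offset


buf = bytearray(SIZE)
star = 5 * STRIDE + 10                   # row 5, byte 10
buf[star] = 15
star = step_star(buf, star, 15)
print(star // STRIDE, star % STRIDE)
```

Since the column never changes, there is no multiply and no shift anywhere in the per-frame update.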
I'm also going to be using MUCH larger back-buffers this time out so I can have sprites smoothly enter/exit the screen and to simplify/speed up address calculations. That 112 isn't coincidence, as it gives me 8px per side to reach 128... room enough for my sprites, and making the back-buffer address calculation a simple:
Code:
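mov bx,x
mov ah,y
xor al,al            { ax = y*256 }
add bx,ax            { bx = x + y*256 }
mov ax,spriteShift   { read while DS still points at our data }
les di,backBuffer
lds si,spriteBuffer
shr bx,1             { bx = y*128 + x div 2, carry = low bit of x }
jnc @noShift
add si,ax            { odd x: offset SI by spriteShift }
@noShift:
add di,bx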
Which is a lot nicer than trying to calculate (x+y*160) and $FFFE; or more specifically:
Code:
mov bx,x
mov ah,y
xor al,al      { ax = y*256 }
shr ax,1       { ax = y*128 }
add bx,ax
shr ax,1
shr ax,1       { ax = y*32 }
add bx,ax      { bx = x + y*160 }
and bx,$FFFE
Which is pretty hefty.
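If you want to convince yourself the shift-and-add sequences really land on the same addresses as the plain formulas, here's a small model of both routines with 16-bit registers faked as masked integers. It's only a sanity-check sketch: the function names are invented, the 160x100 range is assumed, and the second routine includes the "and $FFFE" from the formula.

```python
MASK16 = 0xFFFF


def back_buffer_offset(x, y):
    """Model of the back-buffer routine: returns (offset, x_is_odd)."""
    ax = ((y & 0xFF) << 8) & MASK16   # mov ah,y / xor al,al -> y*256
    bx = (x + ax) & MASK16            # add bx,ax
    carry = bx & 1                    # bit that shr bx,1 pushes into carry
    return bx >> 1, bool(carry)


def screen_offset(x, y):
    """Model of the (x + y*160) and $FFFE routine."""
    ax = ((y & 0xFF) << 8) & MASK16   # y*256
    ax >>= 1                          # y*128
    bx = (x + ax) & MASK16
    ax >>= 2                          # y*32
    bx = (bx + ax) & MASK16           # x + y*128 + y*32 = x + y*160
    return bx & 0xFFFE


# Compare against the straightforward formulas over an assumed 160x100 range.
ok = all(
    back_buffer_offset(x, y) == (y * 128 + x // 2, x % 2 == 1)
    and screen_offset(x, y) == ((x + y * 160) & 0xFFFE)
    for y in range(100)
    for x in range(160)
)
print("all offsets match:", ok)
```

The first version gets the odd/even test for free out of the carry flag, where the second one spends three shifts and two adds just on the multiply.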