Trixter's latest magic... Holy how-in-the-hell!!!???

per · Apr 7, 2015

Impressive! I'd love to read some writeup of the techniques behind this.

I've been writing some Z80 sprite-drawing code recently, trying to get it as fast as possible. Right now the fastest I've gotten drawing with transparency is at 1/4th the speed of the same image being drawn by a set of unrolled ldir-s (no transparency). Nevertheless, for the target system this is still a bit limited!

Scali · Apr 7, 2015

per said:
I've been writing some Z80 sprite-drawing code recently, trying to get it as fast as possible. Right now the fastest I've gotten drawing with transparency is at 1/4th the speed of the same image being drawn by a set of unrolled ldir-s (no transparency). Nevertheless, for the target system this is still a bit limited!

I can give you a quick-and-dirty explanation of how my sprite routine works. I basically divided things up in all possible cases... based on 16-bit words. Something like:
- Starting on even or odd scanline (CGA uses separate bitplanes for even and odd scanlines)
- Starting on even or odd x-coordinate (there are 2 pixels packed in a byte in the mode I use)
- All pixels in word opaque
- All pixels in word transparent
- Some pixels in word opaque/transparent
- All pixels in byte opaque
- All pixels in byte transparent
- Some pixels in byte opaque/transparent

I then coded hand-optimized assembly 'templates' for each case. Then I derived some heuristics for when to select which variation for the fastest/smallest possible code (sometimes it is faster to process things per word, other times it is faster to process per byte. Not a problem you would have on Z80).
Then I made a 'compiler' for this: I load a bitmap, and have the compiler automatically generate the proper blocks of code for each case, inserting the proper pixel/masking data into the 'template', with a few peephole optimizations added (e.g., if you have multiple opaque bytes/words next to each other, it merges the pointer updates into a single instruction).
So basically the sprite code is 'perfect' hand-optimized code for drawing sprites with transparency.
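
To make the idea concrete, here is a minimal Python sketch of such a 'sprite compiler'. It is not Scali's actual tool: the function names (pack_row, compile_row, merge_pointer_updates) are hypothetical, it assumes colour 0 means transparent and that a row has an even number of 4-bit pixels, and it only handles the per-byte cases (no word cases, no even/odd scanline or x-coordinate variants). The emitted 8086-style mnemonics are plain strings for illustration.

```python
TRANSPARENT = 0  # assumption: colour index 0 marks a transparent pixel


def pack_row(pixels):
    """Pack 4-bit pixels, two per byte, into (value, mask) pairs.

    Mask bits that are set mark background bits that must be preserved.
    """
    packed = []
    for i in range(0, len(pixels), 2):
        hi, lo = pixels[i], pixels[i + 1]
        value = (hi << 4) | lo
        mask = (0xF0 if hi == TRANSPARENT else 0) | (0x0F if lo == TRANSPARENT else 0)
        packed.append((value, mask))
    return packed


def compile_row(pixels):
    """Pick a code 'template' per byte: transparent, opaque or mixed."""
    code = []
    for value, mask in pack_row(pixels):
        if mask == 0x00:      # all pixels opaque: plain store
            code.append(f"mov byte [di], 0x{value:02X}")
        elif mask != 0xFF:    # mixed: keep background bits, then merge sprite bits
            code.append(f"and byte [di], 0x{mask:02X}")
            code.append(f"or byte [di], 0x{value:02X}")
        # fully transparent bytes emit no store at all
        code.append("inc di")
    return code


def merge_pointer_updates(code):
    """Peephole pass: fold every 'inc di' into displacements plus one final add."""
    out, disp = [], 0
    for ins in code:
        if ins == "inc di":
            disp += 1
        else:
            out.append(ins.replace("[di]", f"[di+{disp}]") if disp else ins)
    if disp:
        out.append(f"add di, {disp}")
    return out


if __name__ == "__main__":
    row = [0, 0, 5, 5, 0, 7, 3, 3]
    for line in merge_pointer_updates(compile_row(row)):
        print(line)
```

The point of the sketch is that all the per-pixel decisions are made once, at 'compile' time, so the generated code contains no tests or branches for transparency when the sprite is drawn.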

MauriceH · Apr 7, 2015

@Scali

Found out the problem with running it on the XT.
I have a 5160 with a Hercules card AND a Hercules GB200 (CGA) card.
SW1 is on the MONO setting.
With "MODE co80" and "MODE MONO" I switch between the modes/cards.

So in CGA mode I got the message "Runtime error 201 at 0AEC:015F".

I took out the MONO card and set SW1 to 40x25, so only the Hercules GB200 CGA card was left.
The demo is up and running fine with that GB200 card.

I tried a Tulip CGA card (switchable between MDA and CGA with a switch), but that card had sync issues with the program 8088MPH.
I found the manual by chance through a Google search; it's a so-called "DGA" card with a Yamaha V6363 video processor.

Pics of that card can be found here on the forum:
http://www.vintage-computer.com/vcf...t-Videocard-with-Yamaha-V6363&highlight=v6363

My pics of running the program on the 5160 XT with Hercules GB200 CGA card

With both cards (MDA and CGA) in the XT: error code

The Tulip DGA card (Yamaha V6363): sync error. The scrolling text between the demo pics went OK.

njroadfan · Apr 7, 2015

The demo is meant to be run on an NTSC composite monitor, not an RGB monitor. It takes advantage of the NTSC color fringing artifacts to display all those colors.
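
As a rough illustration of where the artifact colors come from (a back-of-the-envelope sketch, not anything taken from the demo itself): the CGA dot clock in 640-wide mode is exactly four times the NTSC color subcarrier, so each group of four 1-bit pixels spans one subcarrier cycle and is decoded as a single color. The helper names below are hypothetical.

```python
from fractions import Fraction

SUBCARRIER_MHZ = Fraction(315, 88)         # NTSC color subcarrier, about 3.579545 MHz
PIXEL_CLOCK_640_MHZ = 4 * SUBCARRIER_MHZ   # 14.31818 MHz dot clock in 640-wide mode
PIXEL_CLOCK_320_MHZ = PIXEL_CLOCK_640_MHZ / 2


def pixels_per_color_cycle(pixel_clock_mhz):
    """How many pixels fit into one cycle of the color subcarrier."""
    return pixel_clock_mhz / SUBCARRIER_MHZ


def pattern_count(pixels_per_cycle, bits_per_pixel):
    """Number of distinct bit patterns within one subcarrier cycle."""
    return 2 ** (int(pixels_per_cycle) * bits_per_pixel)


def degrees_per_pixel(pixel_clock_mhz):
    """Subcarrier phase covered by a single pixel."""
    return 360 / pixels_per_color_cycle(pixel_clock_mhz)


if __name__ == "__main__":
    print(pixels_per_color_cycle(PIXEL_CLOCK_640_MHZ))                    # pixels per cycle, 640 mode
    print(pattern_count(pixels_per_color_cycle(PIXEL_CLOCK_640_MHZ), 1))  # patterns at 1 bit/pixel
    print(pattern_count(pixels_per_color_cycle(PIXEL_CLOCK_320_MHZ), 2))  # patterns at 2 bits/pixel
```

Both the 640-wide 1-bit mode and the 320-wide 2-bit mode pack four bits into each subcarrier cycle, which is why the basic composite tricks yield 16 colors at an effective 160 pixels across.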

Scali · Apr 7, 2015

njroadfan said:
The demo is meant to be run on an NTSC composite monitor, not an RGB monitor. It takes advantage of the NTSC color fringing artifacts to display all those colors.

More specifically, it takes advantage of the color fringing artifacts as generated by an 'old style' IBM CGA card.
'New style' IBM CGA cards have slightly different output, making the colours slightly different. This means that things will look 'wrong' in our demo.
All the clones we have tried so far are even more off colour-wise than the 'new style' IBM CGA cards.

MauriceH · Apr 7, 2015

@njroadfan

Ah, so that's how it works.
The wiki explains CGA/composite with pics:
http://en.wikipedia.org/wiki/Composite_artifact_colors

@Scali
Still a great effort, that demo.

per · Apr 7, 2015

Scali said:
I can give you a quick-and-dirty explanation of how my sprite routine works. I basically divided things up in all possible cases... based on 16-bit words. Something like:
- Starting on even or odd scanline (CGA uses separate bitplanes for even and odd scanlines)
- Starting on even or odd x-coordinate (there are 2 pixels packed in a byte in the mode I use)
- All pixels in word opaque
- All pixels in word transparent
- Some pixels in word opaque/transparent
- All pixels in byte opaque
- All pixels in byte transparent
- Some pixels in byte opaque/transparent

I then coded hand-optimized assembly for each case. Then I derived some heuristics for when to select which variation for the fastest/smallest possible code (sometimes it is faster to process things per word, other times it is faster to process per byte. Not a problem you would have on Z80).
Then I made a 'compiler' for this: I load a bitmap, and have the compiler automatically generate the proper blocks of code for each case, inserting the proper pixel/masking data, with a few peephole optimizations added (e.g., if you have multiple opaque bytes/words next to each other, it merges the pointer updates into a single instruction).
So basically the sprite code is 'perfect' hand-optimized code for drawing sprites with transparency.

Thanks a lot!

I didn't think much about instruction-encoded images, but it's clear it's faster than a universal draw-image routine for things with transparency.

The target system is blessed with one big bitplane for the entire display, but you have 4, 2 or 1 bit per pixel so X-position matters. All modes use the full 32K of video-RAM as bitmap, and that's precisely why speed is everything. There is a vertical scroll register and a proper 256-color palette, but otherwise the only other thing the graphics hardware provides is an extra wait-state when video-RAM is paged in.
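
To show why the X-position matters at 4, 2 or 1 bit per pixel, here is a small Python sketch of the addressing arithmetic for one linear bitmap. The names pixel_location and put_pixel, the width parameter, and the choice of putting the leftmost pixel in the high bits of each byte are all assumptions for illustration, not details of the actual target system.

```python
def pixel_location(x, y, bpp, width):
    """Return (byte_offset, shift, mask) for pixel (x, y).

    Assumes contiguous rows and the leftmost pixel in the high bits of a byte.
    """
    per_byte = 8 // bpp                 # 2, 4 or 8 pixels share one byte
    stride = width // per_byte          # bytes per scanline
    offset = y * stride + x // per_byte
    shift = (per_byte - 1 - x % per_byte) * bpp
    mask = ((1 << bpp) - 1) << shift
    return offset, shift, mask


def put_pixel(vram, x, y, colour, bpp, width):
    """Read-modify-write a single pixel without touching its neighbours."""
    offset, shift, mask = pixel_location(x, y, bpp, width)
    vram[offset] = (vram[offset] & ~mask & 0xFF) | ((colour << shift) & mask)


if __name__ == "__main__":
    vram = bytearray(4)
    put_pixel(vram, 5, 0, 3, bpp=2, width=16)
    print(vram.hex())
```

A sprite drawn at an x that is not a multiple of the pixels-per-byte count needs every source byte shifted across two destination bytes, which is the kind of case a pre-generated routine per alignment avoids.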

njroadfan · Apr 8, 2015

Does this demo have any hope of working on a Turbo XT with an 8 MHz V30? I'm curious whether a PC Transporter's internal CGA circuit will work. Chances are the full-length CGA clone card I got with a "CIC 8645BE" won't work.

Great Hierophant · Apr 8, 2015

Scali said:
Yes, to be exact, the capture on YouTube is also the one that was shown at Revision. We captured it on the spot from my PC/XT.
My configuration was like this:
- IBM PC/XT 5160 from 1987
- Old style IBM CGA card
- Serial card
- Floppy controller
- 5.25" 360k FD drive
- Harddisk controller
- Seagate ST225 HDD
- 640k of memory
- Sound Blaster Pro 2.0
- IBM PC DOS 3.30 (note that the demo does not work with 2.x versions of DOS)

The HDD and serial port were only for convenience during development, and were not actually used during the demo.
The Sound Blaster Pro 2.0 was used for the capture, because it has a PC speaker connection on its mixer. This allowed us to tap the signal from the motherboard, and pass it through the SB Pro mixer, then out to a 3.5 mm jack, so we could connect it to the capture device, and adjust the levels for recording.
The SB Pro itself was not actually used during the demo of course, and in fact, no SB software was installed on my machine whatsoever. Not even a SET BLASTER-statement in my autoexec.bat.

Scali said:
More specifically, it takes advantage of the color fringing artifacts as generated by an 'old style' IBM CGA card.
'New style' IBM CGA cards have slightly different output, making the colours slightly different. This means that things will look 'wrong' in our demo.
All the clones we have tried so far are even more off colour-wise than the 'new style' IBM CGA cards.

The old style IBM CGA card is a must here; the new style CGA made for a jumpy image on both a CGA and a composite monitor whenever the extreme color screens appeared. Anything else will likely be worse.

The regular PC music is output at a good volume, but the MOD music at the end really requires some type of amplifier. I am not a fan of the PC Speaker input on the Sound Blaster Pro and later Creative cards, but if that is what these guys used, I'm good with it for this.

chjmartin2 · Apr 8, 2015

Scali said:
That is the reason why the demo will probably crash on emulators.
But even if it doesn't crash, some effects will not look/sound right, because they rely on cycle-exactness of the CPU, the CRTC and video memory wait states.
And then there is probably no emulator out there that will correctly simulate the high-colour tweakmodes with NTSC artifacting.

This demo will also not work entirely correctly on most clones, because just having a 4.77 MHz 8088 and a CGA-compatible adapter is no guarantee for cycle-exactness with the real IBM PC/XT and CGA. We have also found that the artifact colours on clone CGA (ATi Small Wonder/Paradise PVC4) tend to be different from real CGA.

So no dice on my PVC4....

I am going to run it anyway. I could probably get a proper result by shifting the phase by 135 degrees. Does it time right on a 10 MHz machine?

Great Hierophant · Apr 8, 2015

I am very impressed that this demo uses just about every conventional method, and then some, to display color from a CGA card on an NTSC monitor: 320x200 color composite graphics, 640x200 color composite graphics, 160x100 color graphics, 40-column text and hacked 80-column text modes.

vwestlife · Apr 8, 2015

I tried it on my Tandy 1000SX with a NEC V20 running in 4.77 MHz mode (which it reported as being 8% too fast) and it mostly ran fine, except the first animation kept going much longer than it should have, causing the entire demo to take about 15 minutes to complete, instead of just under 8½ minutes. Also, the artifact colors were wrong, but that was expected.

Trixter · Apr 8, 2015

Hey guys, I completed my portion of the write-up: http://trixter.oldskool.org/2015/04/07/8088-mph-we-break-all-your-emulators/

Hopefully this answers questions that reenigne and Scali haven't tackled yet.

They may produce their own writeups and/or source too.

Great Hierophant · Apr 8, 2015

This demo won first place in the oldskool demo category at Revision 2015 against some tough competition. It also got the third highest number of positive votes of any demo shown at the party, so heartfelt congratulations are in order!

Trixter · Apr 8, 2015

Timo W. said:
Very, very stunning demo. Makes me wonder how games would have looked back then, had the programmers known such tricks.

You can get a few hints from people who did know tricks back then; try running Spy Hunter (uses vertical scrolling) or Super Zaxxon (uses diagonal scrolling). And of course, California Games uses custom timer interrupt programming to switch RGB palettes mid-screen a few times to simulate up to 7 colors in 320x200. My personal favorite part of the game that does that is the hackysack event, which arranges the graphics cleverly so the transition between red-cyan-white and red-green-yellow is hidden.

The only thing I don't quite get is the intro text. Why would anyone think an IBM PC from 1981 would crush a C64 in a demo compo? The C64 didn't even exist in 1981. It's later hardware and also made for games. Shouldn't it say: "C64 would crush IBM in a compo, right?"

That's because most demosceners think "286+EGA+sound device" when you mention "old PC demo", as that was the realistic birth of the PC demo scene. There has never been an 8088+CGA demo of this caliber and we wanted the audience to understand exactly what we were dealing with.

MauriceH said:
I have a 5160 with a Hercules card AND a Hercules GB200 (CGA) card.
SW1 is on the MONO setting.
With "MODE co80" and "MODE MONO" I switch between the modes/cards.
So in CGA mode I got the message "Runtime error 201 at 0AEC:015F".

I believe that address is in the video detection code. What's odd is that I have a CGA and an MDA card in my 5160, and the detection code works fine, so I'm afraid I don't know what to tell you, sorry.

vwestlife said:
I tried it on my Tandy 1000SX with a NEC V20 running in 4.77 MHz mode (which it reported as being 8% too fast) and it mostly ran fine, except the first animation kept going much longer than it should have, causing the entire demo to take about 15 minutes to complete, instead of just under 8½ minutes. Also, the artifact colors were wrong, but that was expected.

What was the "first animation" that ran too long? I'm curious.

Great Hierophant said:
This demo won first place in the oldskool demo category at Revision 2015 against some tough competition. It also got the third highest number of positive votes of any demo shown at the party, so heartfelt congratulations are in order!

Definitely a dream come true for us.

Q: "How can Trixter possibly create anything better than 8088 Domination?"
A: "Work with people who are better than he is!"

romanon · Apr 8, 2015

So I tried this, and here are the results.
I have an IBM PC 5160 with an IBM CGA card and 256 KB of RAM.
It didn't work on my IBM DOS 2.0; there was a runtime error...
I tried DOS 3.0. OK, it works, but the colors were different. I know, it is meant for an NTSC monitor, not for RGB.

romanon · Apr 8, 2015

The 2 scenes before this were not displayed; I don't know why.

After this scene I saw only this and nothing more, no exit.

Scali · Apr 8, 2015

romanon said:
The 2 scenes before this were not displayed; I don't know why.

I think it's because you only have 256k memory. Some parts need more, the largest part needs about 507k.
And indeed, we currently use some code that is not compatible with DOS 2.x. If we do a final version, we may address this problem. We mostly used some DOS 3.x functionality because it was easier to code, not because it would not be possible at all with DOS 2.x.
DOS 1.x will not work though, for the simple reason that it does not support 360k floppies. However, even THAT could be fixed in theory, if we were to make a version that runs off both sides of a 120k floppy and prompts the user to flip the disk at the appropriate time.

reenigne · Apr 8, 2015

romanon said:
It didn't work on my IBM DOS 2.0; there was a runtime error...

We had originally hoped to target DOS 2.0, but discovered quite late on (too late to fix) that the Turbo Pascal runtime that Trixter used for his parts has some DOS calls that were not supported in versions before 3.0. We might see if we can fix that up for a final version.

romanon said:
The 2 scenes before this were not displayed; I don't know why.

Probably due to lack of RAM - the 3D shapes and Kefrens bars require more than 256kB.

romanon said:
After this scene I saw only this and nothing more, no exit.

That's similar to what I see for the final part on DOSBox - what CPU are you using? If it's something other than an i8088 (especially if the prefetch queue works differently) that might account for that. Again, lack of RAM is the more likely explanation though.
