Trixter's latest magic... Holy how-in-the-hell!!!???

reenigne · Jun 14, 2015

uncutspline said:
That was very, very interesting to watch. Is it even possible to bitbang more out of that hardware? Your team warped the computer's physical barriers at PhD levels.

Thanks! More is possible, though it becomes increasingly difficult and I'm going to need to make some new tools to top it.

I have a question though. On old consoles, hackers often use unused I/O registers as scratchpad memory and sometimes squeeze out some extra cycles that way, or even safely run code from the SRAM of an SMS cartridge. Is there any benefit to, or possibility of, doing this on a legacy PC?

Not really... For 8088 MPH we targeted 640kB, which (misattributed quotes aside) is actually pretty spacious for a machine of this speed. We didn't even use it all (and in the effects where we were using most RAM, the limiting factor was startup time - we didn't want to leave people staring at that bouncing text for too long while we decompressed our big chunks of data). I think that kind of trick would be more at home on a machine with a kilobyte count in the single digits.

The other reason that kind of trick wouldn't be helpful is that (unlike a console) all the code is in RAM, so adding tens of bytes of code to store a byte or two in a scratch IO register is not a win. There are a few scratch IO registers here and there in the PC architecture, but nowhere near enough to make back what you'd spend in terms of code to access them.

And of course this program comes on a floppy disk rather than a cart, so there's no way to add SRAM to the system that way!

Scali · Jun 14, 2015

Well, if you were to replace all the RAM in your machine with SRAM, you could disable the DMA refresh entirely, which saves a few extra cycles.
Reenigne actually experimented with using 'self-refreshing code' earlier, see this blog: http://www.reenigne.org/blog/how-to-get-away-with-disabling-dram-refresh/
In a nutshell, refreshing memory is as simple as reading bytes. Because memory is addressed as a matrix of rows and columns rather than per byte, you do not need to access every single byte, merely one byte in every row: each access refreshes the entire row it lands in.
Since your code is also read from memory, you could create sequences of code that as a 'side-effect' also refresh all your DRAM, because the access pattern touches a byte in every row.
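
The row-coverage idea can be sketched as a toy model. The row count of 128 and the rule that the low address bits select the row are assumptions made for illustration, not details given in the thread, and the model ignores the deadline within which every row must be revisited. It only checks whether an access pattern touches every row at least once.

```python
# Toy model of "self-refreshing code": which DRAM rows does a run of
# instruction fetches touch?
# Assumptions (hypothetical, for illustration only):
#   - a bank has ROWS rows
#   - the row is selected by the low bits of the byte address
ROWS = 128

def row_of(address):
    """Row that a read of this byte address activates in the model."""
    return address % ROWS

def rows_refreshed(addresses):
    """Set of rows touched by a sequence of byte reads."""
    return {row_of(a) for a in addresses}

def refreshes_whole_bank(addresses):
    """True if the access pattern touches every row at least once."""
    return len(rows_refreshed(addresses)) == ROWS

if __name__ == "__main__":
    # Straight-line code spanning 128 consecutive bytes: every row is hit.
    straight_line = range(0x1000, 0x1000 + ROWS)
    # A tight 16-byte loop executed many times: only 16 rows are hit.
    tight_loop = [0x1000 + (i % 16) for i in range(1000)]
    print(refreshes_whole_bank(straight_line))
    print(refreshes_whole_bank(tight_loop))
```

In this model long unrolled code refreshes as a side effect, while a small loop leaves most rows untouched no matter how often it runs.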

Although Reenigne has experimented with this somewhat, in the end we chose not to use this trick in 8088 MPH, because we hadn't perfected the technique yet, and got inconsistent results when testing on our various machines, leading to stability/compatibility issues.
We do change the DMA refresh period, however, to synchronize it with either the graphics or the music and so avoid random timing glitches, which was the original aim of this technique anyway. The performance advantage wasn't that interesting.

reenigne · Jun 14, 2015

Scali said:
Although Reenigne has experimented with this somewhat, in the end we chose not to use this trick in 8088 MPH, because we hadn't perfected the technique yet, and got inconsistent results when testing on our various machines, leading to stability/compatibility issues.

It's not that it's inconsistent, it's that disabling the refresh DMA and relying on self-refreshing code would cause most of the RAM in the machine to decay. On the 5150/5160, a (DMA channel 0) refresh activates all DRAM banks but a read only activates the bank you're reading. It's most annoying, because a small change to the circuit would have allowed that to work. I should really update that blog post...

Scali said:
The performance advantage wasn't that interesting.

Every little helps... but yeah, the more significant advantage would have been to make cycle counting much easier.

Moondog · Jun 14, 2015

Hi! New to the site. That demo was awesome! It shows what can be done with good information and imagination.

Scali · Jun 14, 2015

reenigne said:
It's not that it's inconsistent

Well, what I seem to recall was that Trixter's system decayed quicker than yours, because of different types of memory modules used.
So that's what I meant. But I may have remembered wrong.

reenigne said:
On the 5150/5160, a (DMA channel 0) refresh activates all DRAM banks but a read only activates the bank you're reading. It's most annoying, because a small change to the circuit would have allowed that to work. I should really update that blog post...

Oh I see, that information is new to me. I didn't know there was a difference between DMA and CPU access.
So basically this means that self-refreshing code would have to touch a LOT more memory, because you have to access each of the banks? Which means it isn't a very efficient approach then, and may actually be worse than just using the DMA?

reenigne · Jun 14, 2015

Scali said:
Well, what I seem to recall was that Trixter's system decayed quicker than yours, because of different types of memory modules used.

That is also true - some DRAMs take longer to decay than others. Fortunately we didn't find any that decayed with a period 19 DRAM refresh.
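
For a sense of scale, the refresh timing can be worked out from the timer clock. This is a sketch under assumptions that are not stated in the thread: refresh requests come from a timer counting a 14.31818 MHz / 12 clock, the default period is 18, the DRAM has 128 rows to cycle through, and a CGA scanline lasts 912 / 12 = 76 timer ticks.

```python
# Refresh timing arithmetic (all constants are assumptions, see above).
PIT_HZ = 14_318_180 / 12      # timer input clock, about 1.193182 MHz
ROWS = 128                    # assumed number of rows to refresh
SCANLINE_TICKS = 912 // 12    # assumed CGA scanline length in timer ticks

def refresh_interval_us(period):
    """Time between two refresh cycles, in microseconds."""
    return period / PIT_HZ * 1e6

def full_sweep_ms(period, rows=ROWS):
    """Time to refresh every row once, in milliseconds."""
    return rows * period / PIT_HZ * 1e3

if __name__ == "__main__":
    for period in (18, 19):
        print(period,
              round(refresh_interval_us(period), 2),
              round(full_sweep_ms(period), 3),
              SCANLINE_TICKS % period == 0)
```

Under these assumptions a period of 19 divides the 76-tick scanline exactly (18 does not), so refresh cycles land at the same place on every scanline. The price is a full sweep of roughly 2.04 ms instead of roughly 1.93 ms, which is why slow-decaying DRAM matters here.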

Scali said:
So basically this means that self-refreshing code would have to touch a LOT more memory, because you have to access each of the banks? Which means it isn't a very efficient approach then, and may actually be worse than just using the DMA?

Exactly. And the amount of RAM you'd have to touch depends on the kind of DRAM chips in the machine. With 41256s you'd have to touch 3 addresses to do the same job as one DMA cycle, for 4164s it's 10 addresses and for 4116s it's 40! So it might be useful if your effect only needs 16kB and you don't mind losing the data in the other 624kB!
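
Those three figures are consistent with simple bank arithmetic. A sketch, assuming (this is not spelled out in the post) that one bank built from N-kbit chips holds N kB and that the target is the 640kB mentioned earlier in the thread:

```python
import math

TOTAL_KB = 640  # RAM size targeted by the demo, per the thread

# Assumption: a bank of 1-bit-wide chips holds as many kB as each chip has kbit.
BANK_KB = {"41256": 256, "4164": 64, "4116": 16}

def addresses_per_refresh(chip, total_kb=TOTAL_KB):
    """Banks that must each be read to match one all-bank DMA refresh."""
    return math.ceil(total_kb / BANK_KB[chip])

if __name__ == "__main__":
    for chip in BANK_KB:
        print(chip, addresses_per_refresh(chip))
```

With 4116s, touching a single bank keeps only 16kB alive, which leaves the other 640 - 16 = 624kB to decay.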

uncutspline · Jun 15, 2015

Thanks for the great thorough explanations! I have a new one.

https://en.wikipedia.org/wiki/SWAR

Can your creativity expand its potential? I haven't seen hobbyists make use of this in projects and was a bit curious why. Thanks again!

reenigne · Jun 18, 2015

uncutspline said:
Thanks for the great thorough explanations! I have a new one.

https://en.wikipedia.org/wiki/SWAR

Can your creativity expand its potential? I haven't seen hobbyists make use of this in projects and was a bit curious why. Thanks again!

There's a nice example of SWAR without SIMD registers in the Galaxy Player (glx212.zip) code if you disassemble it. It mixes two 8-bit samples at once with a single 16-bit ADD instruction. I'm sure there are others in tightly written assembly routines for various platforms, but they're pretty difficult to pull off.
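
A rough model of that kind of packed add, and of why it is hard to pull off. This is not the Galaxy Player code; the helper names are made up for illustration, and the 16-bit register is emulated by masking.

```python
# Two unsigned 8-bit samples packed into one 16-bit word, mixed with one add.

def pack2(lo, hi):
    """Pack two 8-bit values into a 16-bit word (lo in bits 0-7)."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

def unpack2(word):
    """Split a 16-bit word back into (lo, hi)."""
    return word & 0xFF, (word >> 8) & 0xFF

def add16(a, b):
    """Emulate a 16-bit ADD: the result wraps at 16 bits."""
    return (a + b) & 0xFFFF

if __name__ == "__main__":
    # Both lanes stay below 256, so the lanes do not interfere.
    print(unpack2(add16(pack2(100, 30), pack2(27, 90))))
    # The low lane overflows (200 + 100) and its carry corrupts the high lane.
    print(unpack2(add16(pack2(200, 1), pack2(100, 1))))
```

The second case is the difficulty: nothing stops a carry out of the low byte from leaking into the high byte, so the inputs have to be scaled in advance so that each lane's sum fits in 8 bits.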

It didn't make it into this demo, but I have some code which uses the value in a single register (DX) in two completely different ways - as a port address (the high 6 bits are ignored by the hardware) and as an audio pulse width (for which 6 significant bits are enough and the low 10 bits being fixed doesn't matter).

Scali · Jun 18, 2015

uncutspline said:
Thanks for the great thorough explanations! I have a new one.

https://en.wikipedia.org/wiki/SWAR

Can your creativity expand its potential? I haven't seen hobbyists make use of this in projects and was a bit curious why. Thanks again!

This is quite a common trick in the demoscene.
In fact, the Java software renderer I made back in the early 2000s is full of it, to speed up pixel lighting, blending and saturation (yup, even bilinear texture filtering):
http://www.pouet.net/prod.php?which=10808

Scali · Jun 18, 2015

reenigne said:
It didn't make it into this demo

Actually it did.
Remember the plasma effect? We optimized it to process two pixels at a time, because the sin-table only ran from 0..127, so we could do a packed add on a register without worrying about overflow:

Code:

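        lodsw                           ;ax=(first sin value, inc 1st idx)*2
        add     ax,[bp+si]              ;ax=(first sin + second sin)*2
        xlat                            ;convert low byte to AmirColor
        xchg    ah,al                   ;swap bytes so the second pixel is in al
        xlat                            ;convert the second pixel the same way
        xchg    ah,al                   ;swap back to restore pixel order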
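
The same sequence can be modelled outside assembly. This is a sketch, not the demo's code: the colour table is a stand-in (an identity table is used in the example), and the function and parameter names are hypothetical.

```python
# Model of the plasma inner step: one 16-bit add on two packed table values,
# then a per-byte table lookup (what XLAT does with AL).

def xlat(table, index):
    """Table lookup on an 8-bit index, like XLAT."""
    return table[index & 0xFF]

def plasma_pair(word_a, word_b, color_table):
    """Add two packed sine words and translate both bytes to colours."""
    ax = (word_a + word_b) & 0xFFFF        # lodsw / add ax,[bp+si]
    lo = xlat(color_table, ax & 0xFF)      # xlat on the low byte
    hi = xlat(color_table, ax >> 8)        # xchg / xlat / xchg on the high byte
    return (hi << 8) | lo

if __name__ == "__main__":
    identity = list(range(256))            # placeholder colour table
    # Worst case for the low byte: 127 + 127 = 254, still below 256.
    print(hex(plasma_pair(0x7F7F, 0x057F, identity)))
```

Because every table entry is at most 127, each byte's sum is at most 254, so the low byte can never carry into the high byte and the two pixels stay independent.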

I suppose the sprite-compiler could also be seen as SWAR: it performs 8-bit or 16-bit wide operations on the pixels. Each pixel is only 4 bits wide, so in some cases it processes 4 pixels in parallel in a single register (performing mask-blit operations of the sprite to the background image so that transparency is handled correctly).
The polygon routine does something similar to draw each delta-span on top of the previous frame.
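
A mask-blit on four packed 4-bit pixels might look like the following. This is a sketch of the general AND-mask/OR-data technique, not the sprite compiler's actual output; the pixel order (first pixel in the top nibble) and the helper names are assumptions.

```python
# Mask-blit of four 4-bit pixels held in one 16-bit word.
TRANSPARENT = None  # marker for "keep the background here"

def pack4(pixels):
    """Sprite data word: colour nibble where opaque, 0 where transparent."""
    word = 0
    for p in pixels:
        word = (word << 4) | (0 if p is TRANSPARENT else p & 0xF)
    return word

def mask4(pixels):
    """Mask word: 0xF where transparent (keep background), 0 where opaque."""
    word = 0
    for p in pixels:
        word = (word << 4) | (0xF if p is TRANSPARENT else 0x0)
    return word

def blit4(background, pixels):
    """One AND and one OR handle all four pixels at once."""
    return (background & mask4(pixels)) | pack4(pixels)

if __name__ == "__main__":
    # Opaque pixels at both ends, transparent in the middle.
    print(hex(blit4(0x1234, [0xA, TRANSPARENT, TRANSPARENT, 0xB])))
```

Since the mask and data words are known when the sprite is compiled, both could be baked into the generated code as constants, leaving only the AND and the OR to run per word.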

Gabucino · Jun 20, 2015

Caluser2000 said:
I've got the same monitor matched up to an EGA card on my XT clone desktop.

Hey there! I've also matched it to an EGA, but I'd rather have a full-fledged CGA setup than a 640x350-less EGA

On the other hand, it works perfectly with a Commodore 128 plus CP/M.

njroadfan · Jun 25, 2015

I eagerly await the final release version. I want to see how well it runs on an Applied Engineering PC Transporter card. :D It has composite output for its onboard CGA, and I'm curious if the PC speaker output (which might actually be through the Apple II bus) will work with the audio. Speaking of audio, any chance the song over the ending credits is available as a MOD? It's catchy.

Trixter · Jun 25, 2015

njroadfan said:
Speaking of audio, any chance the song over the ending credits is available as a MOD? It's catchy.

With most demos that abuse systems like MPH does, the song format is custom to the demo. However, you're in luck; we designed the chiptune player to accept mod files so that it made the musician's life easier. Here's the file we used for the demo: ftp://ftp.oldskool.org/pub/misc/temp/test1.mod

BTW, don't hold your breath for it working on the Transporter...

njroadfan · Jun 25, 2015

It might. I double-checked, and the PC Transporter has direct PC speaker outputs as well, although AE doesn't instruct IIgs users to connect them. Timing will be an issue though, as it's an 8 MHz NEC V30-based "PC".

Trixter · Jun 25, 2015

The timing is what I was referring to

Scali · Jun 25, 2015

Trixter said:
With most demos that abuse systems like MPH does, the song format is custom to the demo. However, you're in luck; we designed the chiptune player to accept mod files so that it made the musician's life easier. Here's the file we used for the demo: ftp://ftp.oldskool.org/pub/misc/temp/test1.mod

I'd like to point out that it's not a 'standard' MOD, as in, it is not fully ProTracker-compatible.
The custom playback routine supports a wider range of notes than ProTracker. So, if your player sounds a bit weird, skips notes or whatever, try another one. I've tried it in Cubic Player, and that worked fine.

Scali · Jun 25, 2015

njroadfan said:
I eagerly await the final release version.

Yes, perhaps I can give a small progress report on that.
Initially we just wanted to fix a few minor glitches that bothered us (but might not even be noticed by others until pointed out).
Then we ran into the problem with the HD6845 chips on some CGA cards, so we wanted to make a workaround for that as well.
And then we were alerted that DOS 2.x does not run the demo, so we worked around that...

All in all, the final version is becoming a lot more than a quick fix.
We are currently working on an alternative set of graphics resources to make the demo look correct on new-style CGA as well. New-style CGA has slightly different artifact colours, which threw off the 256/1024-colour parts as well as the shading on the polygons. So some of the graphics basically had to be redone completely, and some of the code had to be adapted to support multiple sets of colours.

This is something I personally love, since my 5160 has a new-style CGA card. I have not been able to fully enjoy the demo on my own machine at home.
Also, together with the HD6845-fix, the demo should be fully compatible with ALL IBM PCs with 8088 and CGA. I do not know what the ratio is between old-style and new-style CGA cards, but I'm quite sure this will improve compatibility a lot.

dr.zeissler · Jun 25, 2015

Scali said:
...Also, together with the HD6845-fix, the demo should be fully compatible with ALL IBM PCs with 8088 and CGA...

Thx for the upgrade, this means:

- EuroPC 8088 CGA(RGB) => YES
- EuroPC VC20 CGA(RGB) => NO
- Tandy 1000RL/HD => NO (because TGA and 8086)

Is that right?

Scali · Jun 25, 2015

dr.zeissler said:
Thx for the upgrade, this means:

- EuroPC 8088 CGA(RGB) => YES
- EuroPC VC20 CGA(RGB) => NO
- Tandy 1000RL/HD => NO (because TGA and 8086)

Is that right?

Well, that depends.
TGA seems to be compatible enough with CGA to run the demo: https://youtu.be/rAUM89Xu7jo
The Kefrens bars part does not work though, because the CPU speed is wrong. The end music also plays at the wrong speed.

I don't think the EuroPC will run the demo correctly, because the chipset and the CGA clone are not fully compatible with IBM. I believe it uses the same Faraday FE2010 chipset and Paradise PVC4 as my PC10-III uses. It locks up on the end part, even when I use a real IBM CGA card.

1ST1 · Jun 25, 2015

I am already looking forward to testing the final version on my Olivetti M24, M19 and ETV260 ...

The M24 might be too fast as it has an 8086 at 8 MHz, but let's see what happens...
