MagiDuck, a DOS / CGA text mode game project

mangis · May 21, 2014

Hi!

I thought I'd share this little project I've been working on in QuickBasic 7.1 PDS and Assembly language.

It's a simple platform game running in 40x25 CGA text mode, tweaked to show 50 rows of half-height glyphs. Combined with ASCII character 222 (the right half block), this gives the game an effective resolution of 80x50 pixels. I chose this resolution to avoid CGA snow completely. It works in DosBox-X, anyway.
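To make the pixel packing concrete, here is a minimal C sketch of how two pixels can share one character cell when every cell holds character 222. It assumes blinking is disabled so the high nibble can carry any of 16 colours; that setting and the names `put_pixel`, `CELLS_X` and `CELLS_Y` are illustrative and not taken from the game's source.

```c
#include <assert.h>
#include <stdint.h>

enum { CELLS_X = 40, CELLS_Y = 50 };   /* 40x50 character cells */

/* attr[] holds one attribute byte per cell (the character is always 222).
   Character 222 fills the right half of the cell with the foreground colour,
   so: high nibble (background) = left pixel, low nibble = right pixel.
   Assumption: blink is off, so all 16 background colours are usable. */
void put_pixel(uint8_t attr[CELLS_X * CELLS_Y], int x, int y, uint8_t colour)
{
    assert(x >= 0 && x < CELLS_X * 2 && y >= 0 && y < CELLS_Y && colour < 16);
    uint8_t *cell = &attr[y * CELLS_X + x / 2];
    if (x & 1)
        *cell = (uint8_t)((*cell & 0xF0) | colour);          /* right half */
    else
        *cell = (uint8_t)((*cell & 0x0F) | (colour << 4));   /* left half  */
}
```

Plotting colour 4 at x = 0 and colour 14 at x = 1 on the same row leaves a single attribute byte of 4Eh in the first cell.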

The engine supports "smooth" scrolling using two 4K off-screen buffers: one for drawing changed tile areas at the current scroll offset, and another that receives a blitted copy of the first, with sprites drawn over it every frame. The sprite animation routine supports multi-part sprites.

Currently most of the graphics routines are made in Assembly and I'm getting framerates of 19-22 in DosBox @ 270 cycles. I'm fairly pleased with that, considering my humble programming experience.

My goal is to have this running on a 4.77 MHz 8088 / 256K RAM system with framerates above 15, and to have a decent game to play too. By showing some stuff, I was hoping to get some insights here into whether this is actually possible. This forum has been a big help already though, especially Trixter's and deathshadow's stuff.

I'll make little updates along the way to this thread and my blog: http://bluepandion.tumblr.com/

And here's my Assembly scrolling routine to show some actual code too, sorry if this is too horrible to look at:

Code:

;---------------------------------------------------------------------------
;Textmode Tile buffer scroll routine for 40x50 mode
;
;Version 7.2
;
;
;---------------------------------------------------------------------------

	push bp				
	mov bp, sp				;Get stack pointer

	push ds
	push si
	push es
	push di
	
;----------------------
; Parameter stack offsets
; Order is inverted from qbasic CALL ABSOLUTE parameter order

;Saved below BP by the pushes above:
;-02 ds
;-04 si
;-06 es
;-08 di

;00 bp
;02 Qbasic return offset
;04 Qbasic return segment

;06 Tile buffer offset
;08 Tile buffer segment
;10 Screen Buffer offset
;12 Screen Buffer segment
;14 Tile Buffer scroll offset

;-----------------------------------------------------------------
	
	mov ax, [bp + 14]		;AX = Scroll offset
	mov bx, 4096			;
	sub bx, ax				;BX = 4096 - Scroll offset
	
;---------------------------------------------------------------
							;BLIT Tile buffer to screen buffer
							
;Screen buffer is written linearly from 0 to 4095

;But SI (Read Offset) first goes from 4096-offset to 4096 ...

	mov ds, [bp + 8]		;Change read offset to Tile buffer
	mov si, [bp + 6]
		
	mov es, [bp + 12]		;Change write offset to Screen buffer
	mov di, [bp + 10]
	
	add si, bx
	
	mov cx, ax				;Loop counter, AX = Scroll offset
	shr cx, 1				;CX / 2
	rep	movsw				;Blit

;... And then from 0 to 4096-offset

	mov ds, [bp + 8]		;Change read offset to Tile buffer
	mov si, [bp + 6]
	
	mov cx, bx				;Loop counter, BX = 4096 - scroll offset
	shr cx, 1				;CX / 2
	rep	movsw				;Blit

;---------------------------------------------------------------
exit:
	pop di
	pop es
	pop si
	pop ds
	
	pop bp					;Return stack pointer
	
	retf 10					;Five word parameters = 10 bytes
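For anyone who finds the two REP MOVSW passes hard to follow, the same wrapped copy can be modelled in C as below. This is only a sketch of the logic, not a replacement for the assembly; `blit_wrapped` and `BUF_SIZE` are made-up names, and the scroll offset is assumed to be even because the routine copies whole words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 4096 };   /* size of the circular tile buffer in bytes */

/* Copy a circular tile buffer to a linear screen buffer in two chunks.
   The destination is written from 0 upwards; the source is read from
   (BUF_SIZE - scroll) to the end, then from 0 to (BUF_SIZE - scroll). */
void blit_wrapped(uint8_t *dst, const uint8_t *src, unsigned scroll)
{
    assert(scroll <= BUF_SIZE && (scroll & 1) == 0);
    memcpy(dst, src + (BUF_SIZE - scroll), scroll);    /* first pass  */
    memcpy(dst + scroll, src, BUF_SIZE - scroll);      /* second pass */
}
```

Every destination byte `i` ends up holding source byte `(i + BUF_SIZE - scroll) mod BUF_SIZE`, which is the wrap-around the comments in the routine describe.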

I'll post some more routines later if anyone's interested, maybe it's better not to bloat this post with too much stuff.
Cheers!

reenigne · May 22, 2014

Looks awesome so far!

If I understand correctly, for each frame you need to do:
* Modify changed tiles in first buffer
* Copy first buffer to second buffer
* Draw sprites on second buffer
* Copy second buffer to screen

Have you thought of doing hardware scrolling instead, by modifying the CGA start address registers? That gives you the ability to put your framebuffer anywhere in CGA RAM with a resolution of one character (i.e. two pixels). It's a bit more fiddly but a lot faster - I think you'd be able to get up to 60fps on a 4.77MHz 8088 with that method, and have the backgrounds as complicated as you like without any slowdown.
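As a rough illustration of what "modifying the CGA start address registers" involves, the C sketch below computes the two byte values for the 6845 start address registers (R12 high, R13 low) from a byte offset into video memory. The helper name and struct are hypothetical; on real hardware the register number would be written to port 3D4h and the value to port 3D5h, which portable C cannot show.

```c
#include <assert.h>
#include <stdint.h>

enum { CRTC_START_HI = 12, CRTC_START_LO = 13 };   /* 6845 register numbers */

typedef struct { uint8_t hi, lo; } crtc_start;

/* byte_offset is relative to B800:0000 and must be even,
   because one character cell occupies two bytes (character + attribute). */
crtc_start crtc_start_for(unsigned byte_offset)
{
    assert(byte_offset < 16384u && (byte_offset & 1u) == 0);
    unsigned cell = byte_offset / 2;          /* the CRTC counts cells, not bytes */
    crtc_start s = { (uint8_t)(cell >> 8), (uint8_t)(cell & 0xFF) };
    return s;
}
```

A page starting 4096 bytes into video memory would give a start address of 0800h, so R12 = 08h and R13 = 00h. Moving the start by one cell shifts the picture by one character, which is the two-pixel granularity mentioned above.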

mangis · May 22, 2014

reenigne said:
Looks awesome so far!

If I understand correctly, for each frame you need to do:
* Modify changed tiles in first buffer
* Copy first buffer to second buffer
* Draw sprites on second buffer
* Copy second buffer to screen

Have you thought of doing hardware scrolling instead, by modifying the CGA start address registers? That gives you the ability to put your framebuffer anywhere in CGA RAM with a resolution of one character (i.e. two pixels). It's a bit more fiddly but a lot faster - I think you'd be able to get up to 60fps on a 4.77MHz 8088 with that method, and have the backgrounds as complicated as you like without any slowdown.

Thanks!

Yes, that's exactly how the buffering works currently.
I actually have a version of the engine with CGA hardware scrolling too. It was very much faster indeed, but it had some issues too:
- Just drawing the changed tile areas, there was flicker every time at the part of the screen that changed. Vertical scrolling is easy, but horizontal scrolling always ends up drawing on visible areas of the screen.
- When drawing sprites, things get more complicated. You need to copy sprite backgrounds to memory from video memory, which is considerably slower than RAM.
- Clearing previous sprites and drawing new ones will, again, cause more flicker...

After some fighting, I made a sort of dirty-rectangle replacement system. It divided the screen into 64 zones, 5x6 pixels each. The sprite and tile drawing routines marked those zones for copying to the screen from a 4K buffer. So updating the screen had these stages:
* Modify changed tiles in first buffer. Flag changed zones.
* Copy all flagged zones to second buffer.
* Draw sprites to second buffer. Flag zones that the sprites occupy. (Use flag value 2 so these zones will be cleared twice, in case the sprites move)
* Copy all flagged zones from second buffer to screen.
* Decrease all zone flags if > 0.
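The flagging scheme in the stages above can be sketched in C roughly as follows. The grid layout is an assumption: "5x6" is read here as 5 cells wide by 6 rows tall, which tiles a 40x48 area as an 8x8 grid of 64 zones. All names are hypothetical and the actual copying is left out.

```c
#include <assert.h>
#include <stdint.h>

enum { ZONES_X = 8, ZONES_Y = 8, ZONE_W = 5, ZONE_H = 6 };   /* assumed layout */

typedef struct { uint8_t flag[ZONES_X * ZONES_Y]; } zone_map;

/* Flag every zone touched by a w x h rectangle at (x, y), in cell units.
   Tiles would pass value 1; sprites pass 2 so the zone is refreshed twice. */
void flag_zones(zone_map *z, int x, int y, int w, int h, uint8_t value)
{
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    assert(x + w <= ZONES_X * ZONE_W && y + h <= ZONES_Y * ZONE_H);
    for (int zy = y / ZONE_H; zy <= (y + h - 1) / ZONE_H; zy++)
        for (int zx = x / ZONE_W; zx <= (x + w - 1) / ZONE_W; zx++)
            if (z->flag[zy * ZONES_X + zx] < value)
                z->flag[zy * ZONES_X + zx] = value;
}

/* End of frame: count the zones that needed copying, then decrease each flag. */
int age_zones(zone_map *z)
{
    int copied = 0;
    for (int i = 0; i < ZONES_X * ZONES_Y; i++)
        if (z->flag[i] > 0) { copied++; z->flag[i]--; }
    return copied;
}
```

A 2x2 sprite straddling a zone corner flags four zones with value 2, so those four zones are copied on this frame and the next even if the sprite has moved away.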

This did work, but it ended up halving the frame rate, and the scrolling just looked really jittery because the CGA scroll offset ended up lagging just a bit every time... Maybe it was just too complicated compared to a 2K REP MOVSW. Much of the logic was inside Qbasic, which may have slowed it even further.

I actually tried to study how Commander Keen 4 CGA version does its scrolling by dropping DosBox cycles to 50 while playing... Keen seems to copy the whole 16K of a screen from a buffer every frame, you can see the screen appear line by line when scrolling. The game runs really well at 300 cycles too. That's why I'm thinking block copying might be the best way to go when having to deal with sprites too.

reenigne · May 22, 2014

Yeah, it's definitely trickier to get flicker-free updates without a second buffer - you have to pay attention to where the raster beam is and write your drawing code so that it either always stays ahead of the beam or always stays behind it (but still finishes before the raster beam starts the next frame). That probably means doing all the screen updates in a single top-to-bottom pass.

But even if you don't use CGA hardware scrolling directly, you might still be able to speed up your code considerably by using some variations on that technique. Instead of modifying tiles in the first buffer and then blitting to the second buffer, copy from the first buffer to the second buffer with an offset start address (i.e. pretend you're "hardware scrolling" the first buffer) so that you only need to update the edges. You'll probably need to do the copy in two chunks so you can use a circular buffer. The second buffer could be in CGA memory (there's enough space for 4 pages in this mode) so that after you've blitted the background to it and drawn the sprites on top you can just change the start address to flip between the pages.

Krille · May 23, 2014

mangis said:
I thought I'd share this little project I've been working on in QuickBasic 7.1 PDS and Assembly language.

I agree with reenigne, this looks awesome! I hope you don't mind a few suggestions for improvement?

Code:

; Order is inverted from qbasic CALL ABSOLUTE parameter order

Are you calling the ASM procedures using CALL ABSOLUTE? This can be a bit kludgy because it requires you to store the machine code somewhere (DATA statements are often used) and then read it in and you also need to do DEF SEG before CALL ABSOLUTE. All this overhead can be avoided. Let me know if you need help with this.

This code;

Code:

	mov ds, [bp + 8]		;Change read offset to Tile buffer
	mov si, [bp + 6]

can be replaced with this (shorter and more efficient);

Code:

	lds si, [bp + 6]

Likewise, this;

Code:

	mov es, [bp + 12]		;Change write offset to Screen buffer
	mov di, [bp + 10]

is more efficiently done like this;

Code:

	les di, [bp + 10]

and so on.

Shorter instructions are almost always faster on 8088 so instead of this (2 bytes);

Code:

	mov cx, ax				;Loop counter, AX = Scroll offset

you can do it like this (1 byte);

Code:

	xchg cx, ax

Of course this is assuming you don't need to preserve AX (which you don't in this particular case).

Finally, if I remember correctly, you don't need to preserve ES but I'm not 100% sure on this (maybe someone else can confirm).

Trixter · May 23, 2014

mangis said:
I actually tried to study how Commander Keen 4 CGA version does its scrolling by dropping DosBox cycles to 50 while playing... Keen seems to copy the whole 16K of a screen from a buffer every frame, you can see the screen appear line by line when scrolling. The game runs really well at 300 cycles too. That's why I'm thinking block copying might be the best way to go when having to deal with sprites too.

Good catch on the great engine in Keen 4. It is copying entire 16K to the screen, but the engine achieves its (relatively) high speed because it is doing almost nothing in between copies. The engine maintains a larger buffer for the playfield that is larger than the screen, and it copies only the visible 16K portion on every update. When the playfield is shown scrolling to the right, what is actually happening is that the visible 16k "window" is scrolling to the left. Only the left edge of the playfield is drawn... then the sprites are animated by replacing the background behind them and redrawing them, then the visible 16K portion is copied, and the cycle repeats.

In fact, this is what Andrew was referring to:

reenigne said:
Instead of modifying tiles in the first buffer and then blitting to the second buffer, copy from the first buffer to the second buffer with an offset start address (i.e. pretend you're "hardware scrolling" the first buffer) so that you only need to update the edges.

In other words, work "smarter" not "harder"

Only draw/redraw exactly what you have to.

mangis · May 24, 2014

reenigne said:
But even if you don't use CGA hardware scrolling directly, you might still be able to speed up your code considerably by using some variations on that technique. Instead of modifying tiles in the first buffer and then blitting to the second buffer, copy from the first buffer to the second buffer with an offset start address (i.e. pretend you're "hardware scrolling" the first buffer) so that you only need to update the edges. You'll probably need to do the copy in two chunks so you can use a circular buffer.

My first buffer already is circular / wrapping. The first routine I posted copies the buffer in two chunks just as you explained, if I understood correctly. All the blue areas you see in the game are actually tiles just like everything else, so there is no limit in complexity currently. The solid blues and simple colours are just easier on the eyes IMHO.

The tile routine also only draws as much as is needed, in most cases either a 40x2 or a 2x48 byte area. To accommodate strange (less than tile width/height) drawing sizes, the routine is a bit complicated though:

Code:

;============================================================================
;
; Tile area drawing routine     v. 7.02
;
; 40x50 mode drawing. 2 Pixels per byte.
;
; Draws an area from Tile Map to Tile Buffer, using Tile Bank graphics.
;
;============================================================================

; Parameter stack offsets
; Order is inverted from qbasic CALL ABSOLUTE parameter order

;00 bp
;02 Qbasic return offset
;04 Qbasic return segment

;06 tileBank offset
;08 tileMap offset
;10 tileBuffer offset
;12 tileBuffer Segment
;14 tile read y
;16 tile read x
;18 area Height
;20 area Width
;22 Write area offset
;24 Tilemap read offset

;============================================================================

push bp
mov bp,sp

;---------------------------------------------------------------------------

begin:

mov es, [bp + 12]               ;ES = Tilebuffer seg
mov di, [bp + 10]               ;DI = Tilebuffer ofs
add di, [bp + 22]               ;DI + Write area offset
inc di                          ;DI + 1, for attribute cell

add [bp + 10], 4096             ;[BP + 10] = Tilebuffer Wraparound

mov ds, [bp + 12]               ;DS = Tilebuffer seg
mov si, [bp + 08]               ;SI = Tilemap ofs
add si, [bp + 24]               ;SI + Tilemap Read offset

mov bh, [bp + 18]               ;BH = Height

mov dl, [bp + 16]               ;DL = Tile read X
mov dh, [bp + 14]               ;DH = Tile read Y


;============================================================================

mov [bp + 14], si               ;[BP + 14] = SI


loopy:  ;....................................................................

mov cx, [bp + 20]               ;CX = Width

mov w[bp + 24], 0               ;[BP + 24] = Tile map read carriage return value.

CMP di, [bp + 10]               
JL nowrap1                      ;IF DI >= (Tilebuffer ofs + 4096) THEN
    sub di, 4096                    ;DI - 4096
nowrap1:                        ;END IF


    mov ax, 0
    mov al, ds:[si]
                                ;Get Tile Bank Pointer
    mov si, ax                      
    shl si, 5                       ;SI = Tile index * 32
    mov ah, 0                       ;Tile pixel read offset =
    mov al, dl                      ; Tileread X
    shl dh, 2                       ; + (
    add al, dh                      ;    Tileread Y * 4 )
    add si, ax                      ;SI + Tile pixel read offset
    add si, [bp + 06]               ;   + Tile Bank offset
    shr dh, 2                       ;Tileread Y / 4

loopx:  ;....................................................................

CMP dl, 4                       
JNZ noXmapInc                   ;IF Tileread X > 3 THEN
    mov dl, 0                       ;Tileread X = 0
    inc w[bp + 24]                  ;[BP + 24] (Carriage return value)
    
    mov si, [bp + 14]               ;SI = Tilemap read offset [BP + 14]
    inc si                          ;SI + 1
    
    xor ax, ax                      ;AX = 0
    mov al, ds:[si]                 ;AL = Tile Index
    
    mov [bp + 14], si               ;Store SI back to [BP + 14]
    
                                    ;Get Tile Bank Pointer
    mov si, ax                      
    shl si, 5                       ;SI = Tile index * 32
    mov ah, 0                       ;Tile pixel read offset =
    mov al, dl                      ; Tileread X
    shl dh, 2                       ; + (
    add al, dh                      ;    Tileread Y * 4 )
    add si, ax                      ;SI + Tile pixel read offset
    add si, [bp + 06]               ;   + Tile Bank offset
    shr dh, 2                       ;Tileread Y / 4
noXmapInc:                      ;END IF

movsb
inc di
inc dl                          ;Tileread X + 1

CMP di, [bp + 10]               
JL nowrap                       ;IF DI >= (Tilebuffer ofs + 4096) THEN
    sub di, 4096                    ;DI - 4096
nowrap:                         ;END IF

LOOP loopx ;^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

mov dl, [bp + 16]               ;Tileread X = [BP + 16] (Starting value)

add di, 80                      ;DI + 80
mov ax, [bp + 20]               ;AX = Width
shl ax, 1                       ;AX * 2
sub di, ax                      ;DI - Width * 2

mov si, [bp + 14]               ;SI = Tilemap read offset
sub si, [bp + 24]               ;SI - Tilemap Carriage return value
mov [bp + 14], si

inc dh                          ;Tileread Y + 1
CMP dh, 8
JNZ noYmapInc                   ;IF Tileread Y > 7 THEN
    mov dh, 0                       ;Tileread Y = 0
        
    mov si, [bp + 14]               ;SI = Tilemap read offset [BP + 14]
    add si, 20                      ;SI + 20
        
    xor ax, ax                      ;AX = 0
    mov al, ds:[si]                 ;AL = Tile Index
    
    mov [bp + 14], si               ;Store SI back to [BP + 14]
    
noYmapInc:                      ;END IF

dec bh                          ;Height - 1
CMP bh, 0
JZ exit                         ;IF Height = 0 THEN EXIT
JMP loopy  ;^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

;============================================================================
exit:
pop bp
retf 20
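The "Get Tile Bank Pointer" arithmetic in the routine boils down to one address calculation, sketched here in C. It assumes what the shifts imply: tiles are 4 bytes wide and 8 rows tall, so 32 bytes each. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one tile byte inside the tile bank:
   bank offset + tile index * 32 + row * 4 + column.
   Assumes 4-byte-wide, 8-row tiles, as implied by the shifts by 5 and by 2. */
unsigned tile_byte_offset(unsigned bank_ofs, uint8_t tile, unsigned x, unsigned y)
{
    assert(x < 4 && y < 8);
    return bank_ofs + (unsigned)tile * 32u + y * 4u + x;
}
```

For tile 3, column 2, row 5 with the bank at offset 0, this gives 96 + 20 + 2 = 118.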

reenigne said:
The second buffer could be in CGA memory (there's enough space for 4 pages in this mode) so that after you've blitted the background to it and drawn the sprites on top you can just change the start address to flip between the pages.

I tried your idea last night and it worked. I removed the second buffer and replaced it with video memory addresses. This was a really simple fix to make and the game runs about 3 fps faster now and 4K less memory used!

Here's a video comparing it to the old routine:

This certainly got me thinking if there's still more to be done with video memory...

Krille said:
I hope you don't mind a few suggestions for improvement?

Are you calling the ASM procedures using CALL ABSOLUTE? This can be a bit kludgy because it requires you to store the machine code somewhere (DATA statements are often used) and then read it in and you also need to do DEF SEG before CALL ABSOLUTE. All this overhead can be avoided. Let me know if you need help with this.

Thanks, I certainly don't mind improvements and these were great tips! I'll update all my routines to use these and will study some more on the subject.

Do you mean avoiding CALL ABSOLUTE by making a quick library with LINK? Wow, I forgot all about that... I was just so happy to get anything in Assembly working at all. But yeah, this would be a great improvement and I have some tutorials in store for that so I might get it to work. At the moment, the Assembly routines are loaded from .COM-files into strings.

I'm using A86 as my assembler. I'm a bit worried about using strange stuff like .PROC because they've caused me errors in many compilers I've tried. But this should be worth the trouble.

Trixter said:
It is copying entire 16K to the screen, but the engine achieves its (relatively) high speed because it is doing almost nothing in between copies. The engine maintains a larger buffer for the playfield that is larger than the screen, and it copies only the visible 16K portion on every update. When the playfield is shown scrolling to the right, what is actually happening is that the visible 16k "window" is scrolling to the left. Only the left edge of the playfield is drawn... then the sprites are animated by replacing the background behind them and redrawing them, then the visible 16K portion is copied, and the cycle repeats.

In fact, this is what Andrew was referring to:

In other words, work "smarter" not "harder" Only draw/redraw exactly what you have to.

Ah, I see. I'm surprised to hear it has a buffer larger than the screen. That would mean it can only copy 80-byte lines with REP and needs an additional loop for rows. Maybe that doesn't slow it down all that much then.
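The "REP per line plus a loop for rows" idea can be modelled in C like this. It is a simplified sketch: the destination is treated as linear memory, which ignores the interlaced layout of real CGA graphics memory, and all names and parameters are hypothetical rather than anything known about Keen's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a screen-sized window out of a wider playfield buffer.
   Because the playfield pitch differs from the screen pitch, each row
   needs its own block copy (one REP MOVSW per row on an 8088). */
void copy_window(uint8_t *screen, int screen_pitch, int rows,
                 const uint8_t *playfield, int field_pitch, int x, int y)
{
    assert(screen_pitch > 0 && rows > 0 && x >= 0 && y >= 0);
    assert(x + screen_pitch <= field_pitch);
    const uint8_t *src = playfield + (size_t)y * (size_t)field_pitch + (size_t)x;
    for (int r = 0; r < rows; r++) {
        memcpy(screen, src, (size_t)screen_pitch);
        screen += screen_pitch;
        src += field_pitch;
    }
}
```

The per-row overhead is a pointer adjustment and a loop test, which is small next to the copy itself when each row is 80 bytes.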

Can I ask how you know so much about Keen 4 code, since the source hasn't been released yet? I'm just really glad you explained this, because I've been trying to understand the engine from brief and confusing explanations in "Masters of Doom" and some Wikipedia articles.

I think my engine works in a similar way since only one pixel wide tile areas are usually drawn at the screen edges when scrolling. Copying the scroll buffer to video memory also clears old sprites at the same cost. But yeah, I'm sure there's still room for improvement. Currently the biggest CPU-hogs seem to be object behavior and sprite drawing, instead of scrolling.

offensive_Jerk · May 24, 2014

That game looks pretty cool. Seems to have a lot of potential.

Great Hierophant · May 24, 2014

I have nothing to add on the programming front, but your graphics tiles are very impressive looking. Good luck with your game, I hope to play it someday on real CGA.

offensive_Jerk · May 24, 2014

Will this run on an 8088?

Trixter · May 24, 2014

mangis said:
Can I ask how you know so much about Keen 4 code, since the source hasn't been released yet?

John Romero confirmed my suspicions. But even if he hadn't, all you have to do to work it out is watch Keen 4 run on the original hardware and it becomes obvious after a minute of thinking hard about the problem.

Not to blow your mind further, but there's actually four larger-than-screen buffers, one for each pixel offset in a byte, due to the pixel packing of 4-color cga graphics. He updates to and copies from the appropriate buffer depending on where the "window" is positioned.

mangis said:
I think my engine works in a similar way since only one pixel wide tile areas are usually drawn at the screen edges when scrolling. Copying the scroll buffer to video memory also clears old sprites at the same cost. But yeah, I'm sure there's still room for improvement. Currently the biggest CPU-hogs seem to be object behavior and sprite drawing, instead of scrolling.

Still, you have the advantage of having extra video memory; 320x200x4 doesn't. You can tell CGA where to start drawing 40x25 at any word position in video memory, so it would be really easy for you to do the same thing -- slide a "hardware window" around your playfield and only update what is necessary. Assuming your tile/sprite draw/erase code isn't super-slow, you'll get 60fps that way.

Krille · May 24, 2014

mangis said:
Do you mean avoiding CALL ABSOLUTE by making a quick library with LINK?

I've never actually done a quick library but you probably will want to do that to be able to use the assembly code in the QBX environment. But yes, what I referred to is essentially the same thing, I was just talking about making the finished executable;

1.) Assemble your ASM source file to an object file. I use MASM 6.11d but you might be able to use A86 (I've never used A86 so I can't help with that). If you use MASM then do "ml /c" to prevent it from trying to call LINK.EXE. You can use this as a template for doing it in MASM;

Code:

.MODEL MEDIUM, BASIC

.CODE

SomeProc1	PROC
	; Code goes here
SomeProc1	ENDP

SomeProc2	PROC
	; Code goes here
SomeProc2	ENDP

		END

2.) Add DECLARE FUNCTION/SUB statements to your BASIC source file for each ASM procedure you're going to use. See "DECLARE Statement (Non-BASIC Procedures)" in the QBX help for more details. You can then call the procedures just like anything else in BASIC (with CALL, CALLS or just the name).

3.) Compile the BASIC source to an object file. This is done manually with BC.EXE (see BC /? and the QBX help "BC Command Line" for details).

4.) Link all the object files together into an executable file. This is where it gets tricky because for some reason which I can't recall now (it's been years since I did this), the linker included with QB7.1 doesn't work and you need a newer version (the linker included with VB for DOS 1.0 does work though). Alternatively, you might be able to use this version (I haven't tried this though): http://download.microsoft.com/download/vc15/Update/1/WIN98/EN-US/Lnk563.exe

That's all there is to it really. The hardest part is figuring out the correct options for BC and LINK but the help is pretty good in this regard. As an additional bonus you will be able to trim off a lot of the bloat that is included by default when compiling from within the QBX IDE. See PACKING.LST in the \BC7\SRC folder for more info on this.

Now some more advice regarding the assembly code;

Code:

    shl si, 5                       ;SI = Tile index * 32
    shl dh, 2                       ; + (
    shr dh, 2                       ;Tileread Y / 4

These instructions require a 186 or higher and are not valid for an 8088. I guess you've forgotten to add an assembler directive to tell A86 which CPU the code is for? In MASM the target CPU is 8088/8086 unless you tell it otherwise.

Code:

CMP di, [bp + 10]
JL nowrap                       ;IF DI >= (Tilebuffer ofs + 4096) THEN
    sub di, 4096                    ;DI - 4096
nowrap:                         ;END IF

It's very important to know which conditional jumps (jcc:s) are signed and which are unsigned because they are not interchangeable. In the above code you are using a signed jcc (JL) after comparing DI with an offset (and the offset in this context is basically just an unsigned variable). This is a very common mistake and thus a common reason for bugs. So remember that jcc:s with L or G (Less or Greater) in the mnemonic are signed and B or A (Below or Above) are unsigned.
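A small C model shows why the distinction matters for offsets. The two helpers below mimic a JL-style (signed) and a JB-style (unsigned) comparison of 16-bit values; they are illustrative stand-ins, not anything from the routine itself, and the signed cast assumes the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* JL-style: both 16-bit values are interpreted as signed. */
int less_signed(uint16_t a, uint16_t b)    { return (int16_t)a < (int16_t)b; }

/* JB-style: both 16-bit values are interpreted as unsigned. */
int below_unsigned(uint16_t a, uint16_t b) { return a < b; }

/* An offset just under 8000h compared with a limit just above it:
   unsigned says "below", signed says "not less", because 8010h is
   negative when read as a signed word. */
void signed_vs_unsigned_demo(void)
{
    assert(below_unsigned(0x7FF0, 0x8010));
    assert(!less_signed(0x7FF0, 0x8010));
}
```

As long as the buffer and its wrap limit both sit below 8000h the two jumps behave the same, which is how this kind of bug can stay hidden until the buffer moves.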

Trixter · May 25, 2014

Krille said:
Now some more advice regarding the assembly code;

Code:

    shl si, 5                       ;SI = Tile index * 32
    shl dh, 2                       ; + (
    shr dh, 2                       ;Tileread Y / 4

These instructions require a 186 or higher and are not valid for an 8088. I guess you've forgotten to add an assembler directive to tell A86 which CPU the code is for?

a86 intercepts this and recodes as necessary for the architecture you specify. So you can state SHR AX,5 in the code and you'll get SHR AX,1 five times in the output.

Unfortunately it doesn't go very far; for example, it would be great to have PUSHA/POPA/ENTER/LEAVE translated to their 808x equivalents, but a86 won't do that.

deathshadow · May 25, 2014

Cool to see I'm not the only one playing with 40x50 and/or 80x50 graphics -- I'm currently experimenting with re-writing my pac-man ripoff to 80x50 for MDA cards, so I'm in a similar sad state of affairs.

You seem to be keeping your buffer and your output in the same format -- That's something that bit me on frame rate early on with my own stuff. If you kept your sprites and backgrounds as just the color/pixel data, it can actually speed things up enough that the penalty of skipping bytes when writing to the display is a non-issue.

Assuming ds:si is your backbuffer and es:di is $B800:0000 video buffer:

Code:

@loop:
movsb
inc di
loop @loop

Can actually be faster overall despite the slower byte-skipping copy, because your back-buffer operations don't have extra unchanging bytes in them, and you aren't writing as much to video memory.
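In C terms, the byte-skipping copy looks something like the sketch below. It assumes the character bytes in video memory were filled once (for example with character 222) and never change, so only the attribute bytes at odd addresses are rewritten each frame. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an attribute-only back buffer into text-mode video memory.
   In CGA text modes the even byte of each cell is the character and the
   odd byte is the attribute, so only every second byte is written. */
void expand_attributes(uint8_t *video, const uint8_t *attrs, unsigned cells)
{
    assert(video != 0 && attrs != 0);
    for (unsigned i = 0; i < cells; i++)
        video[i * 2 + 1] = attrs[i];   /* character bytes are left untouched */
}
```

The back buffer is half the size of the screen image, so tile and sprite drawing into it touches half as many bytes.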

You are using a back-buffer instead of writing sprites and tiles directly to the display, right?

Krille · May 25, 2014

Trixter said:
a86 intercepts this and recodes as necessary for the architecture you specify. So you can state SHR AX,5 in the code and you'll get SHR AX,1 five times in the output.

Aha, that explains it.

deathshadow · May 25, 2014

You know, looking at that shift by five... have you considered using xchg instead (not to mention the 1-byte LODSB)? It would be WAY faster on a real 8088.

Code:

xor  ah, ah ; 1 byte less than mov, saves ~5 clocks?
lodsb       ; 1 byte opcode, so faster
xchg ah, al ; 1 byte opcode, so faster
shr  ax, 1  ; two less shifts 8088
shr  ax, 1  ; saving 4 bytes and ~20 clocks?
shr  ax, 1   
mov  si, ax

XOR instead of MOV, since it's one byte and one clock less than the original mov ax, 0 (mov reg16, imm16)
LODSB as it's a 1 byte opcode, and it's not like you're preserving SI
XCHG AH, AL is a quick equivalent of shl ax, 8
SHR instead of SHL, three shifts instead of five, so 4 less bytes to fetch saving 16 clocks, in addition to the 8 clocks of shift execution time.
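If the equivalence isn't obvious, this C sketch checks that swapping the bytes and shifting right three times gives the same result as shifting left five times, for every possible tile index. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward version: tile index * 32 via five left shifts. */
uint16_t tile_times_32_shl(uint8_t tile)
{
    return (uint16_t)((uint16_t)tile << 5);
}

/* Byte-swap version: AH = 0, AL = tile, XCHG AH, AL, then three SHR AX, 1. */
uint16_t tile_times_32_xchg(uint8_t tile)
{
    uint16_t ax = tile;                          /* AH = 0, AL = tile       */
    ax = (uint16_t)((ax << 8) | (ax >> 8));      /* XCHG AH, AL: tile * 256 */
    return (uint16_t)(ax >> 3);                  /* / 8 gives tile * 32     */
}

/* Returns 1 if both versions agree for all 256 tile indices. */
int all_tiles_agree(void)
{
    assert(tile_times_32_xchg(1) == 32);
    for (unsigned t = 0; t < 256; t++)
        if (tile_times_32_shl((uint8_t)t) != tile_times_32_xchg((uint8_t)t))
            return 0;
    return 1;
}
```

The trick only works because AH is known to be zero before the swap; with anything left in AH, the old high byte would end up in the low bits of the result.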

Just a thought...

mangis · May 25, 2014

offensive_Jerk said:
That game looks pretty cool. Seems to have a lot of potential.
Will this run on an 8088?

Cheers! I don't know yet, probably not since just a couple of the routines had so many improvement suggestions here. I don't own an 8088 machine so I can't really test it myself, but will do my best to make it work.

Great Hierophant said:
I have nothing to add on the programming front, but your graphics tiles are very impressive looking. Good luck with your game, I hope to play it someday on real CGA.

Thanks! I'm hoping to get back to doing graphics and content as soon as possible.

Trixter said:
John Romero confirmed my suspicions. Not to blow your mind further, but there's actually four larger-than-screen buffers, one for each pixel offset in a byte, due to the pixel packing of 4-color cga graphics. He updates to and copies from the appropriate buffer depending on where the "window" is positioned.

That really is mindblowing! Holy crap... What a complicated but brilliant way to do pixel perfect scrolling. It would've been funny to visit QuakeCon and ask about this from Carmack on his keynote, but maybe it's better this way...

Trixter said:
Still, you have the advantage of having extra video memory; 320x200x4 doesn't. You can tell CGA where to start drawing 40x25 at any word position in video memory, so it would be really easy for you to do the same thing -- slide a "hardware window" around your playfield and only update what is necessary. Assuming your tile/sprite draw/erase code isn't super-slow, you'll get 60fps that way.

Judging from my earlier tests, the routines do seem too slow for this. But this gave me an idea. Maybe I could have two "hardware windows". Both would have an 8K area for themselves and they could be offset in tandem with scrolling.

The windows would be video pages at the same time so one could be drawn, while the other was visible. This would mean drawing twice the amount of changing tile area per window, but it shouldn't make much difference. I think I'll try this but it's gonna take a few days.

Krille said:
I've never actually done a quick library but you probably will want to do that to be able to use the assembly code in the QBX environment. But yes, what I referred to is essentially the same thing, I was just talking about making the finished executable;

Thank you so much for the examples and corrections! I gotta get this stuff working, this would make everything so much easier.

deathshadow said:
Cool to see I'm not the only one playing with 40x50 and/or 80x50 graphics -- I'm currently experimenting with re-writing my pac-man ripoff to 80x50 for MDA cards, so I'm in a similar sad state of affairs.

You seem to be keeping your buffer and your output in the same format -- That's something that bit me on frame rate early on with my own stuff. If you kept your sprites and backgrounds as just the color/pixel data, it can actually speed things up enough that the penalty of skipping bytes when writing to the display is a non-issue.

You are using a back-buffer instead of writing sprites and tiles directly to the display, right?

Hahah, yeah I guess I can feel the pain too

But that sounds awesome! Paku Paku is such a great game and your source code was a lot of fun to look through. Seems like I owe you a postcard though.

Will it be a scrolling version or are you going to crunch the graphics to that tiny resolution?

Good to know about the buffers. I'll try attribute-only buffers if my hardware scrolling experiment fails. Yeah, tiles are drawn in a buffer, but after reenigne's suggestion the sprites are now drawn to video memory.

I'll add your shift-improvement to my code as soon as possible. Looks great, but seems like I have a lot to fix so this might take a while...

VileR · May 26, 2014

Looking great so far! Old PCs need more love in the gaming department - it seems like even now, there's new stuff all the time on the C64, Atari, etc. front, so things like this are always cool to see.
(And of course, CGA trickery = I'm all over it.)

deathshadow · May 26, 2014

mangis said:
Will it be a scrolling version or are you going to crunch the graphics to that tiny resolution?

I'm actually squeezing it into 80x50. The map gets some minor changes (minus two 'tiles' height, plus four tiles width) to keep the pellet count close to the original. I'm also trying to squeeze it down to fit into 48k or less so a 64k 5150 could theoretically run it on DOS 2.0

The MDA never got a lot of lovin' from game developers, so I thought it might be nice to give them something with a bit more meat to it; it can quite easily do 80x50, and that's spitting distance from the 128x48 so many Tandy games did using their semigraphics. If Cornsoft could do it with Scarfman...

Even better though, the MDA does support two intensities, and that is making it look way better than I was expecting.

Screencaps
menu : http://www.cutcodedown.com/images/monoPaku.png
playfield : http://www.cutcodedown.com/images/monoReady.png

Still very rough around the edges, and on hold as I finish another project (my JavaScript library) -- but it is coming sometime this year. My test platform is tons of fun since I don't have a MDA equipped 5150, but I do have a Hercules in-color card that miracle of miracles will co-exist with my Tandy 1000SX's internal cga, so I've got tandy CGA and hercules color on a EGA monitor working side-by-side. Great since allegedly the only CGA that will co-exist with an in-color is an actual Hercules CGA -- put that to the lie.

Great Hierophant · May 27, 2014

deathshadow said:
I'm actually squeezing it into 80x50. The map gets some minor changes (minus two 'tiles' height, plus four tiles width) to keep the pellet count close to the original. I'm also trying to squeeze it down to fit into 48k or less so a 64k 5150 could theoretically run it on DOS 2.0

The MDA never got a lot of lovin' from game developers, so I thought it might be nice to give them something with a bit more meat to it; it can quite easily do 80x50, and that's spitting distance from the 128x48 so many Tandy games did using their semigraphics. If Cornsoft could do it with Scarfman...

Even better though, the MDA does support two intensities, and that is making it look way better than I was expecting.

Screencaps
menu : http://www.cutcodedown.com/images/monoPaku.png
playfield : http://www.cutcodedown.com/images/monoReady.png

Still very rough around the edges, and on hold as I finish another project (my JavaScript library) -- but it is coming sometime this year. My test platform is tons of fun since I don't have a MDA equipped 5150, but I do have a Hercules in-color card that miracle of miracles will co-exist with my Tandy 1000SX's internal cga, so I've got tandy CGA and hercules color on a EGA monitor working side-by-side. Great since allegedly the only CGA that will co-exist with an in-color is an actual Hercules CGA -- put that to the lie.

Not to rain on the OP's parade, but the 5151 brings an extra "ghost" to the table. Look forward to trying that too.
