
Came across some of my old Hercules code

deathshadow

Veteran Member
Joined
Jan 4, 2011
Messages
1,357
Thought I'd share it... some of it takes a rather oddball approach to handling things. It's all from an old menu system I wrote back in the late '80s for a friend -- I don't have all the code, but was thinking I might resurrect it for nostalgia's sake, especially since there aren't a whole lot of Hercules-specific programs out there.

In any case, here's some of the fun stuff I found -- the odd part is that for old TASM and TP code, I'm a bit surprised there are few if any optimizations I'd make after all this time. First up is the code to set the Herc video mode manually, since there's no BIOS routine for it (and I don't like relying on TSRs).

Code:
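MODEL TINY

DATASEG

; CRTC values for registers 11 down to 0 (reversed because the loop counts down)
grModeData     DB  00h,00h,03h,02h,57h,57h,02h,5bh,07h,2Eh,2Dh,35h

CODESEG
	STARTUPCODE
	mov  dx,03BFh           ; configuration switch
	mov  al,03h             ; allow graphics mode and the second page
	out  dx,al
	mov  dx,03B8h           ; mode control
	mov  al,02h             ; graphics mode, video off while the CRTC is set up
	out  dx,al
	mov  dx,03B4h           ; CRTC index port
	mov  si,OFFSET grModeData
	mov  ah,11              ; highest CRTC register, counting down
@crtcLoop:
	mov  al,ah
	out  dx,al              ; select register AH
	inc  dx                 ; 03B5h = CRTC data port
	lodsb
	out  dx,al              ; write its value
	dec  dx                 ; back to the index port
	dec  ah
	jns  @crtcLoop          ; loop until AH drops below zero
	mov  ax,0B000h
	mov  es,ax
	xor  di,di
	mov  cx,4000h           ; 4000h words = 32K, all of page 0
	xor  ax,ax
	rep  stosw              ; clear the page
	mov  dx,3B8h
	mov  al,0Ah             ; graphics mode, video on
	out  dx,al
	mov  ax,4C00h           ; exit to DOS
	int  21h

END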

I remember back in the day I made the mode set a standalone .com file (68 bytes apparently) because memory in DOS was at such a premium that every byte counted. Since the video mode set doesn't need to sit in RAM all the time, there's no reason to let it. More unusual is that I apparently was loading the CRTC from the top down so as to avoid doing a CMP ah,12 or using CX. I'm not sure if that's faster... or less code... Seems to work on real hardware and in DOSBox though -- gonna have to play with that.
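
To see why the table is stored backwards, here's a minimal Python sketch that models the top-down loop. It's only a model, not hardware code: `program_crtc_top_down` is a made-up helper name, and the port writes are simply recorded in a dict keyed by CRTC register number.

```python
# Model of the top-down CRTC load: the index counts 11..0 while LODSB walks
# the table forwards, so the table has to be stored in reverse order.
GR_MODE_DATA = [0x00, 0x00, 0x03, 0x02, 0x57, 0x57,
                0x02, 0x5B, 0x07, 0x2E, 0x2D, 0x35]


def program_crtc_top_down(table):
    """Return {register: value} as written by the dec ah / jns loop."""
    written = {}
    si = 0                        # LODSB source index
    ah = 11                       # mov ah,11
    while True:
        written[ah] = table[si]   # index to 3B4h, data to 3B5h
        si += 1                   # lodsb advances SI
        ah -= 1                   # dec ah
        if ah < 0:                # jns falls through once AH goes negative
            break
    return written


crtc = program_crtc_top_down(GR_MODE_DATA)
in_order = [crtc[reg] for reg in range(12)]
print(" ".join(f"{value:02X}" for value in in_order))
```

Read back in register order, the values come out as the table would look if it were stored low to high.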

It's funny, because since finding this I've been looking at other people's routines for offset calculations -- I've seen people use lookup tables (way too big IMHO at 696 bytes, i.e. 348 rows x 2 bytes) and then there's the "Shift left 13" approach... Mine, well...

Code:
{
	CalcOffset
	
	INPUT
		AX = y
		CX = x
	OUTPUT
		CL = x & $07
		DI = Offset
		ES = B000
	CORRUPTS
		AX,BX,CX,ES,DI
}
procedure calcOffset; assembler;
asm
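	mov  di,cx      { DI = x }
{$IFOPT G+}
	shr  di,3       { DI = x div 8, the byte column }
	ror  ax,2       { AX = (y and 3) in bits 15..14, y div 4 in bits 13..0 }
{$ELSE}
	shr  di,1
	shr  di,1
	shr  di,1       { DI = x div 8, the byte column }
	ror  ax,1
	ror  ax,1       { AX = (y and 3) in bits 15..14, y div 4 in bits 13..0 }
{$ENDIF}
	mov  bx,ax      { keep a copy of the rotated y }
	and  ax,$C000   { isolate the two bank bits }
	shr  ax,1       { AX = (y and 3) shl 13 }
	add  di,ax
	mov  ax,bx
	and  ax,$3FFF   { AX = row = y div 4 }
	shl  ax,1
	add  di,ax      { + row * 2 }
{$IFOPT G+}
	shl  ax,2
{$ELSE}
	shl  ax,1
	shl  ax,1
{$ENDIF}
	add  di,ax      { + row * 8 }
	shl  ax,1
	add  di,ax      { + row * 16 }
{$IFOPT G+}
	shl  ax,2
{$ELSE}
	shl  ax,1
	shl  ax,1
{$ENDIF}
	add  di,ax      { + row * 64, for a total of row * 90 }
	mov  ax,$B000
	mov  es,ax
	and  cl,$07     { CL = bit position within the byte }
end;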

The normal approach breaks down to:
((y&3)<<13)+(y>>2)*90+(x>>3)

I do the X shift first in DI, then I get a little funky. Instead of (y&3)<<13 I apparently chose to rotate y the opposite direction two bits, save a copy, mask it with $C000, then shift it right one more time and add it to DI. To do the *90 I got VERY convoluted, doing a total of 6 shifts and adding the intermediate results together several times... 2+8+16+64 = 90. I've got to look deeper at that, as all that code may not actually be faster than a MUL (particularly on a 286, where it's only 21 clocks!).
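
A quick way to check that the rotate trick and the shift-add chain really match the normal formula is to model both in Python and compare them across the whole 720x348 screen. This is a sketch that assumes the registers behave as plain 16-bit values; `offset_reference`, `offset_rotate` and `ror16` are illustrative names, not part of the original code.

```python
def offset_reference(x, y):
    """The 'normal' Hercules offset formula."""
    return ((y & 3) << 13) + (y >> 2) * 90 + (x >> 3)


def ror16(value, count):
    """Rotate a 16-bit value right by count bits."""
    value &= 0xFFFF
    return ((value >> count) | (value << (16 - count))) & 0xFFFF


def offset_rotate(x, y):
    """Model of calcOffset: rotate y, split bank and row, then shift-add."""
    di = x >> 3                       # byte column
    ax = ror16(y, 2)                  # bank bits land in 15..14, row in 13..0
    di += (ax & 0xC000) >> 1          # (y and 3) shl 13
    ax &= 0x3FFF                      # row = y div 4
    for count in (1, 2, 1, 2):        # running multiples: 2, 8, 16, 64
        ax = (ax << count) & 0xFFFF
        di = (di + ax) & 0xFFFF
    return di


mismatches = [(x, y)
              for y in range(348)
              for x in range(720)
              if offset_reference(x, y) != offset_rotate(x, y)]
print(len(mismatches), hex(offset_rotate(719, 347)))
```

The last pixel on the screen lands below 8000h, so everything stays inside the first 32K page.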

The plot routine ended up pretty simple too:
Code:
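procedure plot(x,y:word; c:byte); assembler;
asm
	mov  ax,y
	mov  cx,x
	call calcOffset     { ES:DI = byte address, CL = x and 7 }
	mov  al,c
	or   al,al          { c = 0 clears the pixel, anything else sets it }
	jz   @andVal
	mov  al,$80
	shr  al,cl          { single set bit at the pixel position }
	or   es:[di],al
	jmp  @done
@andVal:
	mov  al,$7F
	ror  al,cl          { single clear bit at the pixel position }
	and  es:[di],al
@done:
end;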

Seems pretty fast overall -- it's kinda strange to look at code I wrote some two decades ago though and see a lot of the tricks I've flat out forgotten in this day and age.
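
For anyone following along, the two masks in the plot routine can be modelled in a few lines of Python. This is just a sketch of the bit logic, assuming a byte of video memory is passed in and the new byte is returned; `plot_byte` and `ror8` are hypothetical helper names.

```python
def ror8(value, count):
    """Rotate an 8-bit value right by count bits."""
    count &= 7
    value &= 0xFF
    return ((value >> count) | (value << (8 - count))) & 0xFF


def plot_byte(old_byte, x, c):
    """Return the video byte after plotting pixel x with value c."""
    cl = x & 7                              # bit position from calcOffset
    if c:
        return old_byte | (0x80 >> cl)      # mov al,$80 / shr al,cl / or
    return old_byte & ror8(0x7F, cl)        # mov al,$7F / ror al,cl / and


print(f"{plot_byte(0x00, 3, 1):08b}", f"{plot_byte(0xFF, 3, 0):08b}")
```

The rotate is what keeps the AND mask at a single zero bit: a plain shift of $7F would pull extra zeros in from the left.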
 

deathshadow

Just played with that nested shifting vs. MUL -- on the 8088/8086 it's definitely faster to use the shifts... on the 286 and higher, not so much, so I've changed that calc routine to:

Code:
procedure calcOffset; assembler;
asm
	mov  di,cx
{$IFOPT G+}
	shr  di,3
	ror  ax,2
{$ELSE}
	shr  di,1
	shr  di,1
	shr  di,1
	ror  ax,1
	ror  ax,1
{$ENDIF}
	mov  bx,ax
	and  bx,$C000
	shr  bx,1
	add  di,bx
	and  ax,$3FFF
{$IFOPT G+}
	mov  bx,$005A   { 90 bytes per scanline }
	mul  bx         { DX:AX = row * 90; note that this clobbers DX }
{$ELSE}
	shl  ax,1
	add  di,ax
	shl  ax,1
	shl  ax,1
	add  di,ax
	shl  ax,1
	add  di,ax
	shl  ax,1
	shl  ax,1
{$ENDIF}
	add  di,ax
	mov  ax,$B000
	mov  es,ax
	and  cl,$07
end;

Much better on 286/higher, around 20% faster overall.
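
One thing worth checking when swapping the shift chain for MUL is that both produce the same 16-bit result and that the high word of the product is never needed. Here's a rough Python model of just the multiply step, assuming rows 0..86 (348 scanlines div 4); `times90_shift_add` and `times90_mul` are illustrative names.

```python
def times90_shift_add(row):
    """Model of the 8088 path: accumulate row*2 + row*8 + row*16 + row*64."""
    ax = row & 0xFFFF
    di = 0
    for count in (1, 2, 1, 2):          # shl by 1, 2, 1, 2 bits in turn
        ax = (ax << count) & 0xFFFF
        di = (di + ax) & 0xFFFF
    return di


def times90_mul(row):
    """Model of the 286 path: MUL BX with BX = 005Ah, returning (AX, DX)."""
    product = (row & 0xFFFF) * 0x005A
    return product & 0xFFFF, (product >> 16) & 0xFFFF


rows = range(348 // 4)                  # y div 4 for every scanline
agree = all(times90_shift_add(r) == times90_mul(r)[0] for r in rows)
dx_always_zero = all(times90_mul(r)[1] == 0 for r in rows)
print(agree, dx_always_zero)
```

So for valid rows the only side effect of the MUL version is that DX ends up zeroed, which matters only if a caller expects DX to survive.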
 

Krille

Veteran Member
Joined
Aug 14, 2010
Messages
981
Location
Sweden
This is what I do for fun on a Friday night! :)

Code:
MODEL TINY

DATASEG

grModeData     DB  00h,00h,03h,02h,57h,57h,02h,5bh,07h,2Eh,2Dh,35h

CODESEG
	STARTUPCODE
	mov  dx,03BFh
	mov  al,03h
	out  dx,al
	mov  dl,0B8h
	dec  ax
	out  dx,al
	mov  dl,0B4h
	mov  cx,12
	mov  si,OFFSET grModeData
@crtcLoop:
	mov  al,cl
	dec  ax
	out  dx,al
	inc  dx
	lodsb
	out  dx,al
	dec  dx
	loop @crtcLoop
	mov  dl,0B8h
	mov  di,cx
	mov  ax,cx
	mov  ch,0B0h
	mov  es,cx
	mov  ch,40h
	rep  stosw
	mov  al,0Ah
	out  dx,al
	mov  ax,4C00h
	int  21h

END

I did the same with the loop counter in AH but the size was the same and this variant just feels faster. Not that it really matters...
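
The byte savings in this version come from only touching the 8-bit half of a register when the other half already holds the right value. A small Python sketch of those steps, assuming 16-bit registers modelled as integers (`set_low`, `set_high` and `trace_byte_register_tricks` are made-up names for illustration):

```python
def set_low(reg, value):
    """Model writing the low byte of a 16-bit register (e.g. mov dl,imm8)."""
    return (reg & 0xFF00) | (value & 0xFF)


def set_high(reg, value):
    """Model writing the high byte of a 16-bit register (e.g. mov ch,imm8)."""
    return ((value & 0xFF) << 8) | (reg & 0x00FF)


def trace_byte_register_tricks(initial_ah=0x00):
    """Return the register values each trick produces."""
    result = {}
    dx = 0x03BF                              # mov dx,03BFh
    ax = set_low(initial_ah << 8, 0x03)      # mov al,03h
    dx = set_low(dx, 0xB8)                   # mov dl,0B8h, DH stays 03h
    result["mode_port"] = dx
    ax = (ax - 1) & 0xFFFF                   # dec ax: AL 03h -> 02h, no borrow
    result["mode_value"] = ax & 0xFF
    dx = set_low(dx, 0xB4)                   # mov dl,0B4h
    result["crtc_port"] = dx
    cx = 0                                   # LOOP exits with CX = 0
    cx = set_high(cx, 0xB0)                  # mov ch,0B0h
    result["segment"] = cx
    cx = set_high(cx, 0x40)                  # mov ch,40h
    result["word_count"] = cx
    return result


print({name: hex(value) for name, value in trace_byte_register_tricks().items()})
```

The `dec ax` step gives AL = 02h whatever AH held on entry, since AL = 03h never borrows from the high byte.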
 

deathshadow

Krille said:
I did the same with the loop counter in AH but the size was the same and this variant just feels faster. Not that it really matters...

Inside the loop I just really hate the idea of decrementing the same number twice -- in THEORY your loop should be one clock per iteration slower; LOOP is 17 clocks when taken, JNS is only 16, and it's the same number of register DECs... BUT -- using CX and copying to AX does open the door to using XLAT instead of LODSB -- which shaves four clocks off the inside of the loop (in theory; again, the BIU is one hell of a game changer). I also think moving the LODSB (or XLAT) right after the OUT might make it more likely that the BIU gets filled.

I do like the part where you set just DL, though -- that shaves a number of bytes off it... but I'd move that last DL change to after the REP STOSW, given the sheer number of clocks free during that to fill the BIU. Both MOVs would end up in it, and the LOOP would be fetched during the two MOV reg,imm8 (since those are 4 clocks a pop, same as a byte fetch).

Hmm... yeah, something like:
Code:
MODEL TINY

DATASEG

grModeData  DB  35h,2Dh,2Eh,07h,5Bh,02h,57h,57h,02h,03h,00h,00h

CODESEG
	STARTUPCODE
	mov  dx,03BFh
	mov  al,03h
	out  dx,al
	mov  dl,0B8h            ; DX = 03B8h, DH is already 03h
	dec  ax                 ; AL = 02h
	out  dx,al
	mov  dl,0B4h            ; DX = 03B4h
	mov  cx,000Ch           ; 12 CRTC registers
	mov  bx,OFFSET grModeData
@crtcLoop:
	mov  ax,cx
	dec  ax                 ; AL = register index, 11 down to 0
	out  dx,al
	xlat                    ; AL = grModeData[AL]
	inc  dx
	out  dx,al
	dec  dx
	loop @crtcLoop
	mov  ax,cx              ; CX = 0 after LOOP, so AX = 0
	mov  di,cx              ; DI = 0
	mov  ch,0B0h            ; CX = B000h
	mov  es,cx
	mov  ch,40h             ; CX = 4000h words
	rep  stosw
	mov  dl,0B8h
	mov  al,0Ah
	out  dx,al
	mov  ax,4C00h
	int  21h
END

I like that. Especially since it puts the CRTC data back in order from low to high.
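
A small Python model of the XLAT loop shows why the table can go back to natural order: the register index in AL doubles as the table offset, so nothing depends on the direction the loop counts. This is a sketch only, and `program_crtc_xlat` is a hypothetical name.

```python
GR_MODE_DATA = [0x35, 0x2D, 0x2E, 0x07, 0x5B, 0x02,
                0x57, 0x57, 0x02, 0x03, 0x00, 0x00]


def program_crtc_xlat(table):
    """Return the (register, value) pairs in the order the loop writes them."""
    writes = []
    cx = 12                          # mov cx,000Ch
    while cx:
        al = (cx - 1) & 0xFF         # mov ax,cx / dec ax
        index = al                   # out dx,al selects the register
        al = table[al]               # xlat: AL = [BX + AL]
        writes.append((index, al))   # second out dx,al writes the value
        cx -= 1                      # loop
    return writes


writes = program_crtc_xlat(GR_MODE_DATA)
print(writes[0], writes[-1])
```

Register 11 is still written first and register 0 last, but each one gets the value stored at its own index.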
 