deathshadow
Veteran Member
- Joined
- Jan 4, 2011
- Messages
- 1,378
Thought I'd share some old Hercules graphics code I dug up... some of it has a rather oddball approach to handling things. It's all from an old menu system I wrote back in the late '80s for a friend -- I don't have all the code, but was thinking I might resurrect it for nostalgia's sake, especially since there aren't a whole lot of 'Hercules specific' programs out there.
In any case, here's some of the fun stuff I found -- odd part is, for old TASM and TP code, I'm a bit surprised there are few if any optimizations I'd make after all this time. First up is the code to set the Herc video mode manually, since there's no BIOS routine for it (and I don't like relying on TSRs).
Code:
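MODEL TINY
DATASEG
; 6845 CRTC values for 720x348 graphics, stored for registers 11 down to 0
grModeData DB 00h,00h,03h,02h,57h,57h,02h,5bh,07h,2Eh,2Dh,35h
CODESEG
STARTUPCODE
    mov dx,03BFh            ; configuration switch
    mov al,03h              ; allow graphics mode and the second page
    out dx,al
    mov dx,03B8h            ; mode control port
    mov al,02h              ; graphics mode, video off during setup
    out dx,al
    mov dx,03B4h            ; CRTC index port (data port is 03B5h)
    mov si,OFFSET grModeData
    mov ah,11               ; start at register 11 and count down
@crtcLoop:
    mov al,ah
    out dx,al               ; select CRTC register
    inc dx
    lodsb                   ; next table byte (assumes DF is clear)
    out dx,al               ; write its value
    dec dx
    dec ah
    jns @crtcLoop           ; loop until AH drops below zero
    mov ax,0B000h
    mov es,ax
    xor di,di
    mov cx,4000h            ; 4000h words = 32K, all of page 0
    xor ax,ax
    rep stosw               ; clear the page
    mov dx,3B8h
    mov al,0Ah              ; graphics mode, video on
    out dx,al
    mov ax,4C00h            ; exit to DOS
    int 21h
END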
I remember back in the day I made the mode set a standalone .com file (68 bytes apparently) because memory in DOS was at such a premium that every byte counted. Since the mode-set code didn't need to stay resident, there was no reason to let it sit in RAM all the time. More unusual is that I apparently was loading the CRTC from the top down (register 11 first, register 0 last) so the loop can end on JNS and avoid doing a CMP ah,12 or using CX. I'm not sure if that's faster... or less code... Seems to work on real hardware and in DOSBox though -- gonna have to play with that.
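For anyone who finds the countdown easier to follow outside of assembly, here's a rough C model of that loop. It's only a sketch: `regs` is a made-up array standing in for the 6845's registers (no real port I/O happens), and the names `gr_mode_data` and `load_crtc_top_down` are mine, not part of the original program.

```c
#include <stdint.h>

/* Same bytes as grModeData: values for CRTC registers 11 down to 0. */
static const uint8_t gr_mode_data[12] = {
    0x00, 0x00, 0x03, 0x02, 0x57, 0x57,
    0x02, 0x5B, 0x07, 0x2E, 0x2D, 0x35
};

/* Hypothetical stand-in for the CRTC: regs[i] receives the value that
   the assembly would send to register i through ports 3B4h/3B5h. */
void load_crtc_top_down(uint8_t regs[12])
{
    const uint8_t *si = gr_mode_data;  /* plays the role of DS:SI */
    int8_t ah = 11;                    /* register index, counting down */

    do {
        regs[ah] = *si++;              /* out index, lodsb, out data */
        ah--;                          /* dec ah: goes negative after 0 */
    } while (ah >= 0);                 /* jns @crtcLoop */
}
```

Because the table is stored in reverse, the first byte lands in register 11 and the last (35h) in register 0, and the sign of the counter is the only loop test needed.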
Funny thing is, since I found this I've been looking at other people's routines for offset calculations -- I've seen people use lookup tables (way too big IMHO at 696 bytes, one word for each of the 348 scanlines) and then there's the "Shift left 13" approach... Mine, well...
Code:
{
  CalcOffset
  INPUT
    AX = y
    CX = x
  OUTPUT
    CL = x & $07
    DI = Offset
    ES = B000
  CORRUPTS
    AX,BX,CX,ES,DI
}
procedure calcOffset; assembler;
asm
  mov di,cx
{$IFOPT G+}
  shr di,3       { di = x shr 3 }
  ror ax,2       { low two bits of y move to bits 15..14 }
{$ELSE}
  shr di,1
  shr di,1
  shr di,1
  ror ax,1
  ror ax,1
{$ENDIF}
  mov bx,ax      { keep the rotated y }
  and ax,$C000
  shr ax,1       { ax = (y and 3) shl 13 }
  add di,ax
  mov ax,bx
  and ax,$3FFF   { ax = y shr 2 }
  shl ax,1
  add di,ax { 2 }
{$IFOPT G+}
  shl ax,2
{$ELSE}
  shl ax,1
  shl ax,1
{$ENDIF}
  add di,ax { 8 }
  shl ax,1
  add di,ax { 16 }
{$IFOPT G+}
  shl ax,2
{$ELSE}
  shl ax,1
  shl ax,1
{$ENDIF}
  add di,ax { 64 }
  mov ax,$B000
  mov es,ax
  and cl,$07     { bit position within the byte }
end;
The normal approach breaks down to:
((y&3)<<13)+(y>>2)*90+(x>>3)
I do the X shift first in DI, then I get a little funky. Instead of (y&3)<<13 I apparently chose to rotate Y the opposite direction two bits, save a copy, AND it with 0xC000, then shift it right one more time and add it to DI. The saved copy ANDed with 0x3FFF gives y>>2. To do the *90 I got VERY convoluted, doing a total of 6 shifts and adding the intermediate results to DI as I go: 2+8+16+64 = 90. I've got to look deeper at that, as all that code may not actually be faster than a MUL (particularly on a 286, where it's only 21 clocks!).
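To check that the rotate-and-add version really matches the normal formula, here's a small C sketch of both. The function names are made up for illustration, and the second one just mirrors the assembly step by step under the assumption of 16-bit registers; it says nothing about which is faster on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook form: bank select + 90 bytes per row + byte column. */
uint16_t herc_offset_ref(uint16_t x, uint16_t y)
{
    assert(x < 720 && y < 348);
    return (uint16_t)(((y & 3u) << 13) + (y >> 2) * 90u + (x >> 3));
}

/* Mirrors calcOffset: rotate y right by 2, then build *90 from shifts. */
uint16_t herc_offset_shifts(uint16_t x, uint16_t y)
{
    assert(x < 720 && y < 348);
    uint16_t di = (uint16_t)(x >> 3);                /* shr di,3   */
    uint16_t ax = (uint16_t)((y >> 2) | (y << 14));  /* ror ax,2   */
    uint16_t bx = ax;                                /* mov bx,ax  */

    di = (uint16_t)(di + ((ax & 0xC000u) >> 1));     /* (y & 3) << 13 */
    ax = (uint16_t)(bx & 0x3FFFu);                   /* y >> 2        */
    ax = (uint16_t)(ax << 1); di = (uint16_t)(di + ax);  /*  2 */
    ax = (uint16_t)(ax << 2); di = (uint16_t)(di + ax);  /*  8 */
    ax = (uint16_t)(ax << 1); di = (uint16_t)(di + ax);  /* 16 */
    ax = (uint16_t)(ax << 2); di = (uint16_t)(di + ax);  /* 64 */
    return di;                                       /* 2+8+16+64 = 90 */
}
```

Looping both functions over every x in 0..719 and y in 0..347 and comparing the results is a quick way to confirm they agree across the whole screen.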
The plot routine ended up pretty simple too:
Code:
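procedure plot(x,y:word; c:byte); assembler;
asm
  mov ax,y
  mov cx,x
  call calcOffset   { ES:DI = byte, CL = bit position }
  mov al,c
  or al,al
  jz @andVal        { c = 0 clears the pixel }
  mov al,$80
  shr al,cl         { single set bit at the pixel }
  or es:[di],al
  jmp @done
@andVal:
  mov al,$7F
  ror al,cl         { single clear bit at the pixel }
  and es:[di],al
@done:
end;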
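The two mask cases can be modelled in a few lines of C. This is a sketch only: `ror8` and `plot_byte` are hypothetical helper names, and instead of touching video memory the function just returns what the framebuffer byte would become.

```c
#include <stdint.h>

/* Rotate an 8-bit value right by n (taken mod 8), like ROR AL,CL. */
static uint8_t ror8(uint8_t v, unsigned n)
{
    n &= 7u;
    return (uint8_t)((v >> n) | (v << ((8u - n) & 7u)));
}

/* New value of a framebuffer byte after plotting at bit position
   `bit` (x & 7, where 0 is the leftmost pixel) with colour c. */
uint8_t plot_byte(uint8_t old, unsigned bit, uint8_t c)
{
    bit &= 7u;
    if (c != 0)
        return (uint8_t)(old | (0x80u >> bit));  /* or es:[di],al  */
    return (uint8_t)(old & ror8(0x7F, bit));     /* and es:[di],al */
}
```

Rotating $7F gives the exact complement of $80 shifted by the same count, which is why the clear path needs no NOT instruction.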
Seems pretty fast overall -- it's kinda strange to look at code I wrote some two decades ago though and see a lot of the tricks I've flat out forgotten in this day and age.