Improving CGA Graphics performance

Khatoblepas

New Member
Joined
Sep 3, 2025
Messages
6
I'm currently writing a game in Turbo C using CGA graphics, and I've run into a really annoying problem with writing to the screen. I'm using a double buffer that I write to and then copy it to the screen to update it, but at my current skill level, I can't figure out how to only update the bits of the screen I've changed. I can do it in C, but then it's super slow and causes there to be giant holes in the screen where it isn't updated, and copying the entire buffer multiple times a second can't be right.

I'd like to implement some kind of dirty rectangle system, but I need some help modifying the asm that pushes the data to video ram.


Here's the current buffer copying code:
Code:
public        _cgaCopyBuffer

_cgaCopyBuffer proc far
ARG buffer: dword

    push     bp
    mov     bp,sp
    push     ds

    lds     si,[buffer]

    mov     ax,0b800h
    mov     es,ax
    xor     di,di

    mov     cx,2000h
    rep movsw

    pop        ds
    pop        bp
    
    ret
_cgaCopyBuffer endp

I'd like for it to only copy a set rectangle, so probably:
- set starting position to ((y/2)*80 + x/4)
- mov width/4 bytes
- skip ahead 80 - width/4 bytes.
- repeat for 1/2 height.
- skip ahead 0x2000 bytes from the starting position.
- repeat the above for the other 1/2 height.
If it has to round up so it draws one extra row or byte it doesn't matter.
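
For reference, here is a minimal C sketch of the offset arithmetic the steps above rely on, assuming the 320x200 four-colour mode (4 pixels per byte, 80 bytes per scanline, odd scanlines stored 0x2000 bytes after the even ones). The function name cga_offset and the two macros are made up for illustration and are not part of the game's code.

```c
#include <assert.h>

#define CGA_BYTES_PER_ROW 80      /* 320 pixels at 4 pixels per byte */
#define CGA_ODD_BANK      0x2000  /* odd scanlines start 8192 bytes in */

/* Byte offset, within the B800h segment, of the byte holding pixel (x, y).
   Hypothetical helper: assumes 320x200 four-colour CGA. */
unsigned cga_offset(unsigned x, unsigned y)
{
    assert(x < 320 && y < 200);
    return (y >> 1) * CGA_BYTES_PER_ROW   /* row within its bank */
         + (y & 1) * CGA_ODD_BANK         /* even bank or odd bank */
         + (x >> 2);                      /* 4 pixels per byte */
}
```

So row 1 sits at offset 8192 and row 2 at offset 80, which is why a rectangle has to be copied as two separate runs of rows rather than one.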

But thinking about how to juggle this in assembler gives me a headache. I'm not sure I have enough registers to hold everything in. Is this even the right direction to go in? All I want is to be able to draw to the screen fast so I can work on an animation system. If anyone knows how I should go about writing it in assembler I'd be eternally grateful. Does anyone have any other advice on CGA programming? I've got sprites I can draw at an arbitrary place on the screen, a tile engine, it's just very very slow at the moment.
 
Dirty rectangles are definitely a good strategy. There are plenty of registers for the routine you want to write. As you're optimising for speed, you'll want to write the inner loop first. That's easy enough, it's just "rep movsw". Then you need to move to the next row. If your source buffer is in the same format as VRAM then this will look like "add si,bx", "add di,bx" where the value in bx is the amount to skip ahead (i.e. 80 - width_in_bytes). Next you'll need to reset cx for the next row: "mov cx,dx" where dx is width_in_words. And then you'll need a loop for vertical - you can do that with "dec ax", "jnz yloop".

So that's your inner loop - then you just need to add code to initialize the correct values for es, ds, si, di, dx, bx and ax. And repeat the whole thing for the odd scanlines.

That should give you a pretty fast rectangle copy routine. Not the fastest possible (at least on 8088) - for that you need to unroll all the loops (if you do that you can use a table to decide where to jump into the unrolled code). That would free up the ax, cx and dx registers, which would allow you to write the scanlines in y order instead of in VRAM order (different di increment on the even scanlines than on the odd ones) and also keep your source buffer in y order, which would simplify the code that writes to that.
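
The loop structure described above can be sketched in C before committing it to assembler. This is only an illustration under some assumptions: the back buffer uses the same interleaved layout as VRAM, the rectangle starts on an even row and has an even height, and x and width are already expressed in bytes. The name copy_rect and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define ROW_BYTES 80       /* bytes per CGA scanline */
#define BANK_SIZE 0x2000   /* distance between the even and odd banks */

/* Copy a rectangle from src to dst, both laid out like CGA VRAM.
   Assumes y and height are even; x_byte and width_bytes are in bytes. */
void copy_rect(unsigned char *dst, const unsigned char *src,
               unsigned x_byte, unsigned y,
               unsigned width_bytes, unsigned height)
{
    unsigned bank, row;

    assert((y & 1) == 0 && (height & 1) == 0);
    assert(x_byte + width_bytes <= ROW_BYTES && y + height <= 200);

    for (bank = 0; bank < 2; bank++) {          /* even rows, then odd rows */
        /* offset of the rectangle's first row inside this bank */
        unsigned off = bank * BANK_SIZE + (y / 2) * ROW_BYTES + x_byte;

        for (row = 0; row < height / 2; row++) {        /* dec ax / jnz yloop */
            memcpy(dst + off, src + off, width_bytes);  /* rep movsw */
            off += ROW_BYTES;   /* width copied plus (80 - width) skipped */
        }
    }
}
```

In the assembler version the "off += ROW_BYTES" step turns into the two adds of the skip value, because rep movsw has already advanced si and di by the width.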
 
Here's what I got so far, it only updates half the rows, but I need to know if I'm doing it right. Is this the right direction to go in?

Code:
.data

scrOffset dw     0, 8192, 80, 8272, 160, 8352, 240, 8432, 320, 8512, 400, 8592, 480, 8672, 560, 8752, 640, 8832, 720, 8912, 800, 8992, 880, 9072, 960, 9152, 1040, 9232, 1120, 9312, 1200, 9392, 1280, 9472, 1360, 9552, 1440, 9632, 1520, 9712, 1600, 9792, 1680
          dw    9872, 1760, 9952, 1840, 10032, 1920, 10112, 2000, 10192, 2080, 10272, 2160, 10352, 2240, 10432, 2320, 10512, 2400, 10592, 2480, 10672, 2560, 10752, 2640, 10832, 2720, 10912, 2800, 10992, 2880, 11072, 2960, 11152, 3040, 11232, 3120, 11312, 3200
          dw     11392, 3280, 11472, 3360, 11552, 3440, 11632, 3520, 11712, 3600, 11792, 3680, 11872, 3760, 11952, 3840, 12032, 3920, 12112, 4000, 12192, 4080, 12272, 4160, 12352, 4240, 12432, 4320, 12512, 4400, 12592, 4480, 12672, 4560, 12752, 4640, 12832, 4720
          dw     12912, 4800, 12992, 4880, 13072, 4960, 13152, 5040, 13232, 5120, 13312, 5200, 13392, 5280, 13472, 5360, 13552, 5440, 13632, 5520, 13712, 5600, 13792, 5680, 13872, 5760, 13952, 5840, 14032, 5920, 14112, 6000, 14192, 6080, 14272, 6160, 14352, 6240
          dw    14432, 6320, 14512, 6400, 14592, 6480, 14672, 6560, 14752, 6640, 14832, 6720, 14912, 6800, 14992, 6880, 15072, 6960, 15152, 7040, 15232, 7120, 15312, 7200, 15392, 7280, 15472, 7360, 15552, 7440, 15632, 7520, 15712, 7600, 15792, 7680, 15872, 7760
          dw    15952, 7840, 16032, 7920, 16112


.code


public        _cgaCopyBufferRect


_cgaCopyBufferRect proc far
ARG buffer: dword,  x: word,y: word, w:word, h:word
LOCAL w_wordlength:word, halfheight:word, skipbytes:word
    push     bp
    mov     bp,sp
    push     ds
    
    
    lds     si,[buffer]
    
    mov      ax,w
    shr        ax,5        ; convert from pixels to word space. w/4 = pixels/2 = words.
    mov     [w_wordlength], ax ; store w_wordlength for later.
    
    mov     ax,h
    shr        ax,1 ; half height for even/odd rows.
    mov     [halfheight], ax
    
    ;work out skipped bytes.
    mov ax, 80
    mov bx, [w]
    shr bx, 4
    sub ax, bx
    mov [skipbytes], ax
    
    ;set starting position.
    mov     ax,0b800h ; 0,0 in VRAM.
    mov     bx, [y]
    shl        bx, 1 ;word index for scrOffset.
    add     ax, [scrOffset+bx]
    add     si, ax ;add offset to source too.
    
    mov     es,ax
    ;xor     di,di
    mov     di, si ;copy to destination start too.
    
    mov     cx,[w_wordlength]
    ; loop for half height.
    mov ax, [halfheight]
    
    @@yloop:
    rep movsw
    add si, [skipbytes] ;skip ahead to the next line on the source.
    add di, [skipbytes] ;skip on destination too. 
    dec ax
    jnz @@yloop ;if we still have rows to go, loop.

    pop        ds
    pop        bp
    
    ret
_cgaCopyBufferRect endp
 
You have some bugs... you're not reloading cx in the loop ("rep movsw" will leave 0 in cx). And I don't think you want to add your offset to your es value. There may be others - I didn't try running it either.
 
I tried to fix the bugs, it just hangs the engine. I'm afraid I'm incredibly bad at assembler. I don't know what could be going wrong. I set the destination at 0b800h, then I add the offset into it, since I don't want to write pixels to 0,0. I set the source the same way, and add in the offset. Is movsw expecting something different, like for the offset to be in words? Is my math wrong?

Code:
_cgaCopyBufferRect proc far
ARG buffer: dword,  x: word,y: word, w:word, h:word
LOCAL w_wordlength:word, halfheight:word, skipbytes:word
    push     bp
    mov     bp,sp
    push     ds
    
    
    lds     si,[buffer]
    
    mov      ax,w
    shr        ax,5        ; convert from pixels to word space. w/4 = pixels/2 = words.
    mov     [w_wordlength], ax ; store w_wordlength for later.
    
    mov     ax,h
    shr        ax,1 ; half height for even/odd rows.
    mov     [halfheight], ax
    
    ;work out skipped bytes.
    mov ax, 80
    mov bx, [w]
    shr bx, 4
    sub ax, bx
    mov dx, ax
    
    ;set starting position.
    mov     ax,0b800h ; 0,0 in VRAM.
    mov     es,ax
    mov     bx, [y]
    shl        bx, 1 ;word index for scrOffset.
    mov     ax, [scrOffset+bx]
    add     si, ax ;add offset to source.
    
    xor     di,di
    add     di, ax ;set offset to destination.
    
    ; loop for half height.
    mov ax, [halfheight]
    
    @@yloop:
    mov     cx,[w_wordlength]
    rep movsw
    add si, dx ;skip ahead to the next line on the source.
    add di, dx ;skip on destination too. 
    dec ax
    jnz @@yloop ;if we still have rows to go, loop.

    pop        ds
    pop        bp
    
    ret
_cgaCopyBufferRect endp
 
You need to do some reading up about how segments in 16-bit x86 work. The physical 20-bit address is generated by shifting the segment left by 4 and adding the offset. Video RAM is at physical address 0xb8000 so the usual way of accessing it is via segment 0xb800 and offsets 0x0000-0x3fff. I.e. the far pointer es:di for the top left of the screen is 0xb800:0x0000, not 0xb800:0xb800 as you're setting it to.
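
As a quick illustration of that address arithmetic, here is a small C sketch; phys_addr is a made-up helper name, and the wrap at 1 MB reflects the 20 address lines of the 8086/8088.

```c
#include <assert.h>

/* 20-bit physical address of a real-mode segment:offset pair.
   Hypothetical helper for illustration only. */
unsigned long phys_addr(unsigned seg, unsigned off)
{
    assert(seg <= 0xFFFFu && off <= 0xFFFFu);
    /* segment shifted left by 4, plus offset, wrapped to 20 bits */
    return (((unsigned long)seg << 4) + off) & 0xFFFFFUL;
}
```

With this, phys_addr(0xB800, 0x0000) gives 0xB8000, the top left of the screen, while phys_addr(0xB800, 0xB800) gives 0xC3800, which is well past the 16 KB of CGA video RAM.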

You also need to be aware of what your segment registers are, and which are used to access what variables. The arguments and local variables are referenced via offset from bp which means that they implicitly use the stack segment ss: unless overridden. So those are fine. But the table lookup [scrOffset+bx] will use an implicit ds: segment. And you've reloaded ds with the "lds si,[buffer]" line. So unless you know that the segment of the buffer is the same as the .data segment then you're going to be loading your offset from the wrong place.

Neither of those should hang your engine, though. You might have to step through in a debugger to see why that's happening. One thing that might: you're using instructions like "shr bx,4" which don't exist in 8088/8086 so if you're targeting those older CPUs (which you probably should if you're targeting CGA) make sure your assembler is set up appropriately for that so that it doesn't generate 186+ instructions.
 
I'll throw in my $.02 - it sounds like you are fairly proficient in C, not so much in assembly, but are struggling with some algorithmic issues as well. You mention you did a version of this rectangle copy code in C but left holes in the screen image. Switching to assembly won't fix any logic problems.

I would suggest you get a version of your code working in C, even if it is slow, to ensure you have fleshed out the logistics and understand fully the memory mapping of the CGA buffer. Look for any obvious improvements you can make to your C routine to increase performance. In doing so, you should find a way to time your code to ensure you are actually making improvements and not focussing on something irrelevant.

I will point to one of the best resources for aspiring (and seasoned) programmers when it comes to graphics programming: Michael Abrash's Graphics Programming Black Book (https://archive.org/details/michaelabrashsgr00abra)
You should read this. No, really, you need to read this. Come back when you have.

Once you have digested the above, start the iterative process of converting your *working* C code to assembly. I always have the C compiler spit out the assembly source of the compiled C code. It makes a great starting point in conversion. If it feels overwhelming, break out the inner loops of your C code into separate routines that you can convert in manageable chunks. If something stops working, you always have the original code to restart from. Starting with assembly code generated from the compiler will give you a chance to understand what working assembly code looks like, even if it isn't the most optimal. The idea is to start small in scope and comprehension before expanding out to more comprehensive implementations.

Most programmers don't want to invest the time it takes to be good at low level graphics programming. It's hard and there are no shortcuts.

Good luck
 
Is movsw expecting something different, like for the offset to be in words?

Nope, addresses are always byte offsets from the start of the relevant segment. movsw will update source and destination addresses by 2 (since it's copying two bytes)

mov ax,w
shr ax,5 ; convert from pixels to word space. w/4 = pixels/2 = words.

shr ax,5 divides AX by 32; you mean to divide it by 8 (a shift right by 3). As reengine said, shr ax,5 isn't compatible with the 8086/8088, but if you have a V20 or something you're good.
(V20 is cool)

Also worth noting, though, that assuming your CPU is older than an 80386, even if "shr ax,5" is supported, it will be slower than repeating "shr ax,1" five times. (Well, on a 286 it actually breaks even; on a NEC V20 it's slower.) The 8086/8088 do provide "shr reg,cl", but it is SLOW unless you really need a variable number of bit shifts.

Also note that you're rounding w_wordlength down here: so if you're copying a box 15 pixels wide (w=15), you get w_wordlength=1, when actually you need to copy two words to copy the whole row of the box.

;work out skipped bytes.
mov ax, 80
mov bx, [w]
shr bx, 4
As with w_wordlength the same notes apply here (bx being, I guess, a byte offset corresponding to the width of your box):
Shifting right by 4 is equivalent to dividing by 16, you want to be dividing by 4 (shift right by 2)
And you're rounding down in the process.

;set starting position.
mov ax,0b800h ; 0,0 in VRAM.
mov es,ax
mov bx, [y]
shl bx, 1 ;word index for scrOffset.
mov ax, [scrOffset+bx]
add si, ax ;add offset to source.

xor di,di
add di, ax ;set offset to destination.

This looks good, destination segment in video RAM with a 16 bit row offset (drawn from a table) applied to both si and di...

; loop for half height.
mov ax, [halfheight]

@@yloop:
mov cx,[w_wordlength]
rep movsw
add si, dx ;skip ahead to the next line on the source.
add di, dx ;skip on destination too.
dec ax
jnz @@yloop ;if we still have rows to go, loop.

pop ds
pop bp

ret
_cgaCopyBufferRect endp

I did notice that x isn't being used... I'd expect maybe a bit of

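mov bx, x
shr bx,1 ; x/8 = index of the word that contains x (8 pixels per word)
shr bx,1
shr bx,1
shl bx,1 ; back to a byte offset, now word-aligned
add ax,bx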


Before you add AX to SI and DI...

Also regarding the calculation of w_wordlength, maybe something like this:
mov ax,x ; Adjust width to include the whole area actually being copied
and ax,7 ; (X & 7 is the number of pixels from the last memory word boundary to X)
add ax,w ; Add the width of the box
add ax, 7 ; Make the "division" round up: If we extend even one pixel beyond a word boundary, we take another word.
shr ax,1 ; convert from pixels to word space: pixels/4 = bytes, bytes/2 = words.
shr ax,1
shr ax,1
mov [w_wordlength], ax ; store w_wordlength for later.
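
The same rounding can be checked in C before trusting it in assembler. This is a sketch only: words_per_row is a hypothetical name, and it assumes the copy starts on a word boundary with 8 pixels per 16-bit word.

```c
#include <assert.h>

/* Words per row needed to cover pixels x .. x+w-1 when the copy starts
   at the word boundary at or before x (8 pixels per 16-bit word). */
unsigned words_per_row(unsigned x, unsigned w)
{
    assert(w > 0 && x + w <= 320);
    return ((x & 7) + w + 7) >> 3;   /* pixels past the boundary, rounded up */
}
```

For the 15-pixel box mentioned earlier, words_per_row(0, 15) is 2 rather than the 1 that a plain shift produces, and a 2-pixel box starting at x=7 also needs 2 words because it straddles a word boundary.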


Expanding this to copy both the even and the odd lines is a bit tricky if you allow odd heights. You could support that with this:
  1. Change the calculation of halfheight to be ((h+1)/2) - rounding up (so for a 5 row rectangle, halfheight is 3)
  2. Loop through the lines starting with the first row of the rectangle (basically the code you've got)
  3. When resetting AX for the second loop, use halfheight-(h&1) (Branchless logic!. So for a 5 row rectangle, you get 2 rows in the second loop)
  4. Loop through the lines starting with the second row of the rectangle (basically a repeat of the row-copying code you've got but with the appropriate adjustments when doing the lookups in your offset tables...)
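
The row counts from steps 1 and 3 can be sketched in C like this; split_rows and its parameter names are made up for illustration.

```c
#include <assert.h>

/* Row counts for the two loops of a rectangle h rows tall:
   *first  = rows in the bank that holds the rectangle's first row,
   *second = rows in the other bank. */
void split_rows(unsigned h, unsigned *first, unsigned *second)
{
    unsigned half;

    assert(first != 0 && second != 0);
    half = (h + 1) >> 1;          /* step 1: halfheight, rounded up */
    *first  = half;               /* step 2: first loop */
    *second = half - (h & 1);     /* step 3: one row fewer when h is odd */
}
```

One thing to watch for: with h=1 the second count is 0, and a "dec ax" / "jnz" loop entered with ax=0 wraps around and runs 65536 times, so the second loop needs a check for zero before it starts.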
I'm not sure why your code would be hanging, as long as (as reengine mentioned) you're using a CPU capable of those V20/80186 instructions like "shr register,4". But also it's kind of not worth using shifts other than (1) on an 8086, V20, or even 80286, not if speed is your priority.

This corner of programming (performant CGA on an XT-class machine) has been an interest of mine recently, old Toshiba laptop I upgraded to a V30 (V30 is cool). I wouldn't worry that you're hitting some snags, you're doing good work here.

(EDIT) - one other thing I'd look out for- you're calling this from C, right? What compiler? What memory model? What's the C function prototype for this function look like...? But also I'm wondering if you're restoring all the registers you need to. Right now I'm working with Borland Turbo C v2. I forget which registers you need to restore and which you don't... I think di and si need to be restored if you're using register variables or something...

Also I'm wondering about using LOCAL along with push bp inside the procedure. Wouldn't you need to adjust the stack pointer within your code to "push" the space for the locals to prevent a subsequent push from overlapping your locals space? I haven't used the LOCAL and ARGS macros before... And I really should, they seem very useful. :)
 
I did it!!! I learnt a lot about assembler programming in the process. The reason for the hang is that using lds destroys all the local variables, so I dropped them entirely and just worked using registers. With some fiddling (and double, and triple checking my math), I managed to create a function that does what I want! I'll attach the code in case there's something I've missed, and an image of it working. I update only the area of the square, and on the right is the entire back buffer. I'm sure it could be sped up but it WORKS, and that's a big step.

I have read the black book before, it's a really good resource. It was one of the first things I read when I started programming!

Code:
.data

scrOffset dw 0,8192,80,8272,160,8352,240,8432,320,8512,400,8592,480,8672,560,8752
          dw 640,8832,720,8912,800,8992,880,9072,960,9152,1040,9232,1120,9312,1200,9392
          dw 1280,9472,1360,9552,1440,9632,1520,9712,1600,9792,1680,9872,1760,9952,1840,10032
          dw 1920,10112,2000,10192,2080,10272,2160,10352,2240,10432,2320,10512,2400,10592,2480,10672
          dw 2560,10752,2640,10832,2720,10912,2800,10992,2880,11072,2960,11152,3040,11232,3120,11312
          dw 3200,11392,3280,11472,3360,11552,3440,11632,3520,11712,3600,11792,3680,11872,3760,11952
          dw 3840,12032,3920,12112,4000,12192,4080,12272,4160,12352,4240,12432,4320,12512,4400,12592
          dw 4480,12672,4560,12752,4640,12832,4720,12912,4800,12992,4880,13072,4960,13152,5040,13232
          dw 5120,13312,5200,13392,5280,13472,5360,13552,5440,13632,5520,13712,5600,13792,5680,13872
          dw 5760,13952,5840,14032,5920,14112,6000,14192,6080,14272,6160,14352,6240,14432,6320,14512
          dw 6400,14592,6480,14672,6560,14752,6640,14832,6720,14912,6800,14992,6880,15072,6960,15152
          dw 7040,15232,7120,15312,7200,15392,7280,15472,7360,15552,7440,15632,7520,15712,7600,15792
          dw 7680,15872,7760,15952,7840,16032,7920,16112


.code

public        _cgaCopyBuffer

_cgaCopyBuffer proc far
ARG buffer: dword

    push     bp
    mov     bp,sp
    push     ds

    lds     si,[buffer]

    mov     ax,0b800h
    mov     es,ax
    xor     di,di

    mov     cx,2000h
    rep movsw

    pop        ds
    pop        bp
    
    ret
_cgaCopyBuffer endp

public        _cgaCopyBufferRect



_cgaCopyBufferRect proc far
ARG buffer: dword,  x: word,y: word, w:word, h:word


    push     bp
    mov     bp,sp
    push     ds

    ;work out skipped bytes.
    mov ax, 80
    mov bx, [w]
    inc bx ;round up
    mov cl, 2
    shr bx, cl
    sub ax, bx
    mov dx, ax  ;store skipped bytes in dx.
    
    
    ;set starting position.
    mov     ax,0b800h ; 0,0 in VRAM.
    mov     es,ax    ;set segment to 0b800h
    
    
    mov     bx, [y]
    shl        bx, 1 ;word index for scrOffset.
    mov     ax, [scrOffset+bx] 
    mov     bx, [x]
    mov        cl, 2
    shr        bx, cl ;convert pixel space to byte space.
    add     ax, bx
    
    lds     si,[buffer] ;load buffer here to not overwrite si?
    
    add     si, ax ;add offset to source.
    mov     di, ax ;add offset to destination
    
    mov      ax,[w]
    inc     ax     ;round up to avoid being caught short.
    mov     cl, 3
    shr        ax,cl        ; convert from pixels to word space. w/4 = bytes, bytes/2 = words.
    mov     bx, ax ; store w_wordlength for later.
    
    mov     ax,[h]
    shr        ax,1 ; half height for even/odd rows.    
    push     ax    ; store height for later.
    
    
    @@yloop:
    mov     cx,bx
    rep movsw
    add si, dx ;skip ahead to the next line on the source.
    add di, dx ;skip on destination too. 
    dec ax
    jnz @@yloop ;if we still have rows to go, loop.    
    
    push bx ;store wordlength again.
    
    mov     bx, [y]
    add     bx, 1 ;offset by 1.
    shl        bx, 1 ;word index for scrOffset.
    mov     ax, [scrOffset+bx] ;get yOffset.
    mov     bx, [x]
    mov        cl, 2
    shr        bx, cl ;convert pixel space to byte space.
    add     ax, bx ;;add x offset.
    
    lds     si,[buffer] ;load buffer here to not overwrite si?
    
    add     si, ax ;add offset to source.
    mov     di, ax ;add offset to destination
    
    pop bx ;restore wordlength
    pop ax ;restore height.
    
    @@yloop2:
    mov     cx,bx
    rep movsw
    add si, dx ;skip ahead to the next line on the source.
    add di, dx ;skip on destination too. 
    dec ax
    jnz @@yloop2 ;if we still have rows to go, loop.    
    
    
    pop        ds
    pop        bp
    
    retf
_cgaCopyBufferRect endp

end
 

Attachments

  • updateRect.PNG
One of the reasons I suggested you look at the assembly code generated by the C compiler is to understand what you need to save/restore for interfacing assembly to C code. The LDS instruction won't destroy your local variables. They are referenced off the stack (using SS). However, you usually have to save and restore SI and DI in your code, as higher level C routines assume those are preserved across function calls, just like DS. I'm pretty sure Turbo C makes this assumption. Also, your code makes a big assumption that the segment of your buffer is also the default DATA segment. It will work for the small model if your buffer exists inside the default DATA segment, but not much else. Look where you reference scrOffset.
 
Okay, I've taken some of the advice in the thread and swapped shr reg,cl with shr reg, 1 multiple times, and pushed and popped di and si.

Code:
_cgaCopyBufferRect proc far
ARG buffer: dword,  x: word,y: word, w:word, h:word


    push     bp
    mov     bp,sp
    push     ds
    push     di
    push     si

    ;work out skipped bytes.
    mov ax, 80
    mov bx, [w]
    inc bx ;round up
    shr bx, 1
    shr bx, 1
    sub ax, bx
    mov dx, ax  ;store skipped bytes in dx.
    
    
    ;set starting position.
    mov     ax,0b800h ; 0,0 in VRAM.
    mov     es,ax    ;set segment to 0b800h
    
    
    mov     bx, [y]
    shl        bx, 1 ;word index for scrOffset.
    mov     ax, [scrOffset+bx] 
    mov     bx, [x]
    
    shr        bx, 1 ;convert pixel space to byte space.
    shr        bx, 1 ;convert pixel space to byte space.
    add     ax, bx
    
    lds     si,[buffer] ;load buffer here to not overwrite si?
    
    add     si, ax ;add offset to source.
    mov     di, ax ;add offset to destination
    
    mov      ax,[w]
    inc     ax     ;round up to avoid being caught short.
    shr        ax,1        ; convert from pixels to word space. w/4 = bytes, bytes/2 = words.
    shr        ax,1        
    shr        ax,1        
    mov     bx, ax ; store w_wordlength for later.
    
    mov     ax,[h]
    shr        ax,1 ; half height for even/odd rows.    
    push     ax    ; store height for later.
    
    
    @@yloop:
    mov     cx,bx
    rep movsw
    add si, dx ;skip ahead to the next line on the source.
    add di, dx ;skip on destination too. 
    dec ax
    jnz @@yloop ;if we still have rows to go, loop.    
    
    push bx ;store wordlength again.
    
    mov     bx, [y]
    add     bx, 1 ;offset by 1.
    shl        bx, 1 ;word index for scrOffset.
    mov     ax, [scrOffset+bx] ;get yOffset.
    mov     bx, [x]
    shr        bx, 1 ;convert pixel space to byte space.
    shr        bx, 1 
    add     ax, bx ;;add x offset.
    
    lds     si,[buffer] ;load buffer here to not overwrite si?
    
    add     si, ax ;add offset to source.
    mov     di, ax ;add offset to destination
    
    pop bx ;restore wordlength
    pop ax ;restore height.
    
    @@yloop2:
    mov     cx,bx
    rep movsw
    add si, dx ;skip ahead to the next line on the source.
    add di, dx ;skip on destination too. 
    dec ax
    jnz @@yloop2 ;if we still have rows to go, loop.    
    
    pop     si
    pop     di
    pop        ds
    pop        bp
    
    retf
_cgaCopyBufferRect endp

This is like, my first assembler program from scratch so I'm struggling! I wouldn't know how to ensure that scrOffset and buffer's segments are respected, that's a little beyond my pay grade at the moment.

I'm using Turbo C++ 4.5, large memory model because I have a lot of data and I wasn't sure how to deal with it and large let me not have to.
 
Well, the good news is you're getting the introduction to the segmented x86 architecture nobody looks forward to. And you've done a pretty respectable job. The cheap, quick and dirty fix to making scrOffset available to your routine is to utilize the segment you are assured is accessible: the code segment. You would need to move the array into the code segment and add a segment override prefix to access scrOffset, but otherwise you should be good to go. Not necessarily the recommended use of segments, but being low level graphics programmers gives us carte blanche to break all the rules. Wait until you come across self modifying code and on-the-fly compiling BitBLTs.
 
Like this?
Code:
mov     ax, CS:[scrOffset+bx]

It feels a little dirty to put variable declaration in the code segment, but it assembles and runs like normal.

Honestly I wish I could come across some of the weirder stuff like self modifying code but there's a big gap in documentation between "here is drawing a pixel to the screen" and "here is how to compile a sprite so it runs its own assembly code to draw itself".
 
Ah, I didn't get why the lds was a problem - like why would it cause a problem for the local variables...? I totally forgot about the table lookups, of course the data segment would need to be correct when those happen...
 
That should do it, yeah. You lose a couple cycles due to the segment override, but it gets the job done. I'm still a bit weak on some of the practical details of dealing with 8086 segmentation, so this is good info for me too, personally.

As for the discomfort of putting a variable in the code segment... Well it's not really *variable*, is it? It's a set of constants that are mostly used internally by your graphics library. If you had a function that computed those values, that would be in the graphics lib's code segment... But you have a table instead because that's faster... So the table isn't executable code but it's still a part of your graphics lib's implementation.

Self-modifying code is kind of simple in principle - figure out the sequence of instructions you want to run and put them in memory. I think it's often classified as a bad practice (too hardware-specific, too hard to maintain, and more modern OSes frequently disallow it to mitigate things like buffer overrun exploits) - but of course sometimes it's just an efficient way to do things.
 