For VGA you just need to wait for 2 hsyncs per iteration, so EGA and VGA each need their own loop. I'm using that technique for a copper bar effect in mode 0x0D somewhere, changing the background color every 2 scanlines.
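Roughly, the VGA variant of that loop could look like the sketch below. This is not my actual code, just a minimal illustration under a few assumptions: the background is color 0 and is changed through the VGA DAC (ports 0x3C8/0x3C9), hsync is detected by polling bit 0 of Input Status #1 (port 0x3DA), and port access goes through a hypothetical `PortIo` callback struct so the logic can be exercised without real hardware. `FakeVga`, `fake_in8` and `fake_out8` are made-up test doubles, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INPUT_STATUS_1 0x3DA /* bit 0 = 1 while display is blanked */
#define DAC_WRITE_INDEX 0x3C8
#define DAC_DATA 0x3C9
#define MAX_LOG 16

/* Hypothetical port I/O abstraction; on DOS these would wrap inp()/outp(). */
typedef struct {
    uint8_t (*in8)(void *ctx, uint16_t port);
    void (*out8)(void *ctx, uint16_t port, uint8_t value);
    void *ctx;
} PortIo;

/* Wait for the start of the next blanking interval. Note that bit 0 is
   also set during vertical blanking, so this is only valid mid-frame. */
void wait_hblank(const PortIo *io) {
    while (io->in8(io->ctx, INPUT_STATUS_1) & 0x01) { /* leave current blank */ }
    while (!(io->in8(io->ctx, INPUT_STATUS_1) & 0x01)) { /* wait for next one */ }
}

/* Assumption: background = DAC entry 0, 6-bit components (0..63). */
void set_background(const PortIo *io, uint8_t r, uint8_t g, uint8_t b) {
    io->out8(io->ctx, DAC_WRITE_INDEX, 0);
    io->out8(io->ctx, DAC_DATA, r);
    io->out8(io->ctx, DAC_DATA, g);
    io->out8(io->ctx, DAC_DATA, b);
}

/* One color change per iteration, after hsyncs_per_step hsyncs
   (2 for the VGA loop described above). */
void copper_bar(const PortIo *io, const uint8_t (*rgb)[3], size_t steps,
                unsigned hsyncs_per_step) {
    assert(io && rgb);
    for (size_t i = 0; i < steps; i++) {
        for (unsigned n = 0; n < hsyncs_per_step; n++)
            wait_hblank(io);
        set_background(io, rgb[i][0], rgb[i][1], rgb[i][2]);
    }
}

/* Made-up simulator: 4 status reads "active", then 4 reads "blanked". */
typedef struct {
    unsigned reads, prev_blank, hblanks, dac_bytes, colors;
    unsigned hblank_at_color[MAX_LOG]; /* blank count at each color change */
} FakeVga;

uint8_t fake_in8(void *ctx, uint16_t port) {
    FakeVga *v = ctx;
    if (port != INPUT_STATUS_1) return 0;
    unsigned blank = (v->reads / 4u) & 1u;
    if (blank && !v->prev_blank) v->hblanks++; /* count blank starts */
    v->prev_blank = blank;
    v->reads++;
    return (uint8_t)blank;
}

void fake_out8(void *ctx, uint16_t port, uint8_t value) {
    FakeVga *v = ctx;
    (void)value;
    /* Every third data byte completes one RGB triple. */
    if (port == DAC_DATA && ++v->dac_bytes % 3u == 0 && v->colors < MAX_LOG)
        v->hblank_at_color[v->colors++] = v->hblanks;
}
```

On real hardware you would fill `PortIo` with thin wrappers around your compiler's port I/O functions and call `copper_bar(&io, colors, count, 2)` for VGA. The hsync count is a parameter only to make the "2 per iteration" part explicit; the EGA loop would also need a different way to set the color, since the EGA has no DAC.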
I always thought that the offset register and PEL only took effect on...