• Please review our updated Terms and Rules here

Square Raycast


Experienced Member
Oct 9, 2016
Seattle, USA
Magenta's Maze is a tile-based 3D game but the first version supported general 3D objects and used C's qsort() to depth sort objects by their first vertex value. Although this was flexible, it was unnecessary (again: tile-based) and slow! In redesigning it for assembler, I did away with the generalized 3D. One can still paint 3D objects but the caller must decide the order in which to paint them, rather than the engine accepting an arbitrary amount of objects, polygons, and vertices then figuring out how to display them.

Obviously, I could still use a depth sort but the best way to optimize something is to optimize it out, e.g. just don't do it at all. How then do I propose to know the order in which to draw the objects? A ray-casting algorithm is what I chose and yet had a hard time finding one simplified enough for tiles. A lot of simple 3D games are so-called "2.5D" like Wolfenstein 3D or Doom. These games excel at rendering walls and sprites rather than true 3D objects. What they do is cast a ray by drawing an imaginary line for every vertical strip of the screen. What I wanted to do is something similar except render everything in the path of the ray and constrain it to a box, e.g. a square, rather than an arc or circle.

View attachment 38963
(Click image to see animation; sadly I had to remove half the frames otherwise the image got converted to a JPG)

What I came up with therefore I have been calling a "square ray-cast" but that may not make sense at all, in which case consider it a code name for the method. I'm quite rusty (or rather new and squeaky) when it comes to math so figuring this out took a bit of my noodle and I'm struggling just as much in figuring out how to explain it!

The first step is to take the viewing direction and start at that rotation multiplied by the maximum viewing distance. For my game, I am supporting up to 4 tiles so this is 4. Now either the X or Y is going to be maximized but not necessarily both; determining which is a matter of max(abs(sin(dir)), abs(cos(dir)) which is why I had asked those questions recently about performing sign-related functions in assembler without branching. The maximized axis is what I call the extremity and it is stationary since I move along the opposite axis.

Here is pseudo-code for the first step in the algorithm (determining the starting position at a giving "D" distance). This becomes the main code of the outer loop which goes from D to 0:

if (abs(cos(dir)) < abs(sin(dir)) then
  X = sin(dir) * D
  Y = D * -sgn(cos(dir))
  MoveX = sgn(cos(dir))
  MoveY = 0
  X = D * sgn(sin(dir))
  Y = -cos(dir) * D
  MoveX = 0
  MoveY = sgn(sin(dir))

Now there is enough data to run the inner loop which adds MoveX to X, MoveY to Y, and then renders that particular tile. The complication here is when X or Y overflow their bounds, e.g. less than -D or greater than +D. This is where I constrain them to a square. The trick is that the nature of the axes switch places so that movement takes place along the original extremity and the original movement axis is stationary. Further, I must determine whether the movement continues in the same direction or opposite (reversing the sign of the Move variable).

if (X to Y turn) then
  if (clockwise) then 
    MoveY = MoveX
    MoveY = -MoveX
  end if
else (Y to X turn) then
  if (clockwise) then
    MoveX = -MoveY
    MoveX = MoveY
end if

Clockwise is essentially "going in the same direction as" but it changes and I realize I didn't get to explaining that yet! So out from the starting point in the outer loop are two lines drawn clockwise (in the same direction) and counter-clockwise (opposite). If you're looking at 0 degrees then these sides or "wings" go right (clockwise) and left (counter-clockwise). This means for every D in the outer loop there are two inner loops, one for each side.

Ah, and one more detail: each side is done D + 1 times. Thus when D = 4, I iterate 5 times for each side.

Finally, the tile at the original position in the outer loop is plotted and D is decremented towards zero. The zero position (player) is always implied as last.

With a maximum "D" distance of 4, this algorithm chooses exactly 32 tiles to render not including the implied zero tile. Having a fixed number like this lends itself to caching so I further extended this algorithm to be a cache rather than a real-time method: RAYCACHE! It writes out 192 bytes for 32 tiles in 6 byte records: X and Y in single bytes as well as X and Z in words for the 3D translation of the tile. You might be curious why 192 ... well, that leaves 64 to round out to 256 (a maximum unsigned byte value) which happens to be the width of my tile map.

Yes, I'll detail this in my next post which will explain how the tile map itself is integrated with this cache ...