Conditional Instruction Execution: Predication. How Useful?

mmruzek · Sep 6, 2022

Hi, I am in the process of building and improving a homebrew computer design that uses programmed ROMS for the Controller and ALU. The computer is called LALU, for "Lookup Arithmetic Logic Unit". The computer architecture has an 8 bit data bus, and a 16 bit address bus. The basic architecture is shown in this diagram.

The Instructions sent to the Instruction Register are a single byte. The top nibble of the byte contains the instruction, of which there are 16. The instruction for performing an ALU operation uses the lower 4 bits of the Instruction Byte to select which ALU operation to perform. The ALU operations contains simple things like addition of Registers A and B, Logic like an AND of Registers A and B, etc. The ALU also can be instructed to generate a flag (1 or 0), which appears in the LSB of the LALU output.

Here's the neat thing, I don't need to register the flag into C and then into A to be able to use it. So, for example, with my Conditional Jump Instructions I can specify a Jump and again use the lower 4 bits to tell the LALU what flag to generate, and therefore make the jump depend on that Flag. (For example, IF A>B the Flag = 1, THEN Jump). The flags I have right now are A=0, A=B, A>B, A<B, Overflow for A+B.

Well, it ocurred to me today that I can actually make any instruction conditional on the state of the flags. For example I have an instruction MAB. This instruction Moves A to B. (Contents of Register A to Register B). It would be a simple matter to reprogram my Lookup Controller to make this Instruction Conditional on the state of any flag that is available. Again by specifying the lower 4 bits to setup the LALU to generate the relevant flag.

Doing some reading it looks like this idea has around for a long time, but has not been widely used. So basically I am putting this topic out there to see if anyone has thoughts on the use of assembly language instructions which are conditional on the state of a flag. Thanks! Michael

daver2 · Sep 6, 2022

Checkout the Apollo Guidance Computer CCS (Count, Compare and Skip) instruction for details of a 4-way flag branch. https://www.ibiblio.org/apollo/assembly_language_manual.html.

Instruction n+0: CCS Instruction.
Instruction n+1: This instruction is executed if the CCS result returned >+0 or a positive overflow.
Instruction n+2: This instruction is executed if the CCS result returned = +0.
Instruction n+3: This instruction is executed if the CCS result returned < -0 or a negative overflow.
Instruction n+4: This instruction is executed if the CCS result returned = -0.

Note that the AGC is a 1's complement machine, so +0 and -0 are not the same bit pattern!

I have come across a few old computer architectures using exactly this mechanism.

If you look at the source code for Command Module or Lunar Module AGC you will observe that the CCS instruction can be used with fewer branches filled in. This looks strange - but if it is known that the operand will only be > +0 or +0 then a 2-way branch can be used and the remaining 2 instructions used after the CCS for further parts of the code. However, this must have made software maintenance and reuse a nightmare! Just think if the result of the CCS returned < -0 or -0 by accident...

An example can be found in the source code here (Command Module code for Apollo 11 that never actually flew - bugs): https://www.ibiblio.org/apollo/listings/Comanche051/P11.agc.html:

Long form:

Short form:

Dave

mbbrutman · Sep 6, 2022

I think one of the proposed VLIW architectures used predicated instructions.

The general advantage is that branching is expensive in a pipelined architecture, especially in a speculative pipelined architecture. (Not a concern for VLIW, but keep reading ...) Being able to just simply "skip" the next instruction is a nice optimization that gets around the overhead of branch, even when that branch is tiny. Think about that case where you just need to increment a pointer or do a quick computation that can be done in one instruction. Of course the downside is the extra complexity.

Looking at it from a different perspective, the original Thinking Machines CM-2 had a pretty neat approach to parallel execution that looks a lot like instruction predicates, but in parallel. It was a SIMD machine - "Single Instruction Multiple Data" meaning that every CPU did the same instruction at the same time, but on different pieces of data. A CPU could choose to "sit out" an instruction ... which is basically the predicate idea.

-Mike

stepleton · Sep 6, 2022

mmruzek said:
this idea has around for a long time, but has not been widely used.

It depends on what you mean by "widely", I suppose. ARM32 uses conditional instructions, and that's been an architecture that occasionally found an application in the real world

I've never written ARM32 assembly, but everyone I've spoken to about doing it has told me that it was enjoyable.

64-bit ARM did away with conditional instructions for reasons discussed in this Stack Overflow post.

Chuck(G) · Sep 6, 2022

For condition codes embedded in instructions, it's been done. I don't recall if it was the Bellmac or PA-Risc, but one of those had a field in every instruction reserved for condition codes. Speculative execution has been around for a very long time. Branch prediction probably originated on the IBM Stretch. Personally, I'm getting to be a fan of RSIC-V (no condition codes). I mean to delve into it when I get the time.

Chuck(G) · Sep 6, 2022

"Note that the AGC is a 1's complement machine, so +0 and -0 are not the same bit pattern!"

I programmed for years on a ones complement system. -0 is rarely a problem and you can do some creative bit twiddling on a ones complement machine that you can't on a two's.

bakemono · Sep 7, 2022

I put conditional move instructions into my RISC FPGA CPU design. In my case there are four conditions, the same ones used for branching, which are equal/unequal/carry/no-carry. So that's a 2-bit field in the opcodes that use it. I figured this would be useful for things like clipping values to a certain range:

Code:

if x<-32768 then x=-32768
if x>32767 then x=32767
etc.

A compare-and-branch could be replaced by a compare-and-conditional-move. That was the plan anyway. Looking at the source for the one program I have written using this instruction set, I have not actually used any conditional moves yet...

Chuck(G) · Sep 7, 2022

I always admired Seymour Cray for the ability of his to fit the bulk of his instruction set into 15 bit and 30 bit instructions--and those were three-address variety. I long suspected that he had a deeper understanding of instruction sets. The best IBM could do with the S/360 was 16 bit two-address instructions.

Beyond simple min and max operations, what additional use will compare-and-move instruction have?

mmruzek · Feb 25, 2023

Thanks everyone for those great replies! I especially appreciate the links to other information, and the historical context.

This same topic of Predication came up at another forum, and I pointed to the discussion above, here at the VCF.

AnyCPU - View topic - Deferred calculation of processor flags

BTW: You might enjoy checking out the ANYCPU.ORG forum. It's a very low key place where people discuss building their own CPU's and experimenting with computer architecture, alot of it using real hardware. There is an absolute wealth of information in the posts over there.

Chuck(G) · Feb 25, 2023

Branch prediction goes all the way back to IBM 7030 STRETCH, which I believe assumed branch-taken; the CDC 6600 is the other way--assumes branch not taken.

I was (and still am) a fan of the Cray approach--forget about processor flags and just set a register flag every time something is stored into the register. That way, you can compute the condition way ahead of the branch because there's no need to get the condition setting operation even close to the branch. Cray's flags were "zero" and "negative" for each register. You can cover a lot of ground with those.

It's really useful when you have multiple functional units. You can move the compare operation way upstream of the branch.

You might want to look at the Intel i860 or MIPS design with a "free" instruction slot after the branch that's always executed. That gives you some time to do some useful work should the branch involve a delay.

On the STAR, we added a couple of instructions that were the close relative of an IBM BXLE instruction, but instead of branching, merely set a register to 0 or -1. That is, increment/decrement and compare, but store the result of the compare as 0 or -1 instead of branching. Turned out to be very useful.

Some food for thought.

Conditional Instruction Execution: Predication. How Useful?

mmruzek

Experienced Member

daver2

10k Member

mbbrutman

Associate Cat Herder

stepleton

Veteran Member

Chuck(G)

25k Member

Chuck(G)

25k Member

bakemono

Experienced Member

Chuck(G)

25k Member

mmruzek

Experienced Member

Chuck(G)

25k Member

VCF West	Aug 01 - 02 2025,	CHM, Mountain View, CA
VCF Midwest	Sep 13 - 14 2025,	Schaumburg, IL
VCF Montreal	Jan 24 - 25, 2026,	RMC Saint Jean, Montreal, Canada
VCF SoCal	Feb 14 - 15, 2026,	Hotel Fera, Orange CA
VCF Southwest	May 29 - 31, 2026,	Westin Dallas Fort Worth Airport
VCF Southeast	June, 2026	Atlanta, GA