
Converting disassembled code to something that can be fed to a C compiler?

I know very little about assembly but I always assumed C could do anything assembly could even if less efficiently.
You can express every conceivable algorithm in both assembly and C (because Turing-completeness and all that), but that does not mean you can do it equally efficiently for a given CPU.

For example, many assembly functions return two results: an actual result and a flag (like carry or zero) telling you whether the function call succeeded in the first place. In C, a function returns only one value; returning two means returning a struct (or passing a pointer for the second value), which typically compiles to more instructions and uses more memory.
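
To make the "result plus flag" case concrete, here is a minimal sketch of how it usually ends up looking in C. The names `div_result` and `checked_div` are made up for illustration; the `ok` member stands in for what an assembly routine would leave in the carry or zero flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the value plus a success flag, standing in
   for "result in a register, status in the carry flag". */
struct div_result {
    uint16_t quotient;
    bool ok;            /* false plays the role of "carry set" = failure */
};

/* Hypothetical helper: divide, reporting divide-by-zero through the flag. */
static struct div_result checked_div(uint16_t num, uint16_t den)
{
    struct div_result r = { 0, false };
    if (den != 0) {
        r.quotient = (uint16_t)(num / den);
        r.ok = true;
    }
    return r;
}
```

A caller would test `r.ok` before using `r.quotient`. Whether the struct travels in registers or through memory depends on the target's calling convention, not on the C source.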

Another area is rotation instructions. C does not have operators for rotations, so you have to express those as "(A << N) | (A >> (M - N))" (with M being the width in bits) or similar and hope that the compiler recognizes the pattern. On top of that, overflow on signed integers is undefined behavior in C, right-shifting a negative value is implementation-defined, and flags are not exposed at all, so a lot of assembly-level trickery like "rotate the top bit into the sign bit" for math just does not translate well.
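
As a sketch of the rotation idiom, assuming a 32-bit unsigned operand (the function name `rotl32` is just illustrative): using an unsigned type avoids the signed-shift problems, and masking the counts avoids shifting by the full width, which is undefined in C.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. Masking keeps both shift
   counts in the range 0..31, so n == 0 never turns into a shift
   by 32 (which would be undefined behavior). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

Many current compilers turn this pattern into a single rotate instruction where the CPU has one, but that is an optimization, not something the language guarantees.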

And finally, C compilers do not generate all instructions a CPU can run. Older CPUs especially have many special instructions that need a complex setup to be used efficiently, and compilers are not smart enough to recognize when they apply. For common cases (like memset or memcpy), the C library may contain hand-written assembly for optimization, but general code will not use those instructions. That is why C compilers support inline assembly or similar means.

Well-optimized, tricky assembly code is very hard to translate into reasonably efficient C.

How do you mean "because GOTO can be anywhere"? Goto in C can be anywhere too.
GOTO in C cannot go just anywhere; it needs to stay within the same stack frame. Computed GOTO (i.e. "goto *ptr") is a compiler extension and is also only legal within the same stack frame. If you want to go further, you have to use setjmp/longjmp, which come with a ton of caveats.
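
For what "going further" looks like, here is a minimal sketch with the standard setjmp/longjmp. All names (`recover_point`, `parse_digit`, `digit_or_minus_one`) are hypothetical. The jump leaves the callee's stack frame and lands back in a caller that is still active, which is the only direction longjmp allows.

```c
#include <setjmp.h>

static jmp_buf recover_point;   /* hypothetical name for the jump target */

/* Hypothetical callee: bails out across stack frames on bad input. */
static void parse_digit(char c, int *out)
{
    if (c < '0' || c > '9')
        longjmp(recover_point, 1);   /* does not return to this function */
    *out = c - '0';
}

/* Returns the digit value, or -1 if longjmp unwound back here. */
static int digit_or_minus_one(char c)
{
    int value = 0;
    if (setjmp(recover_point) != 0)
        return -1;                   /* reached via longjmp */
    parse_digit(c, &value);
    return value;
}
```

One of the caveats: non-volatile locals of the setjmp caller that were modified after the setjmp call have indeterminate values once longjmp returns there, which is why the error path above does not read `value`.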
 
For example, many assembly functions return two results: an actual result and a flag (like carry or zero) telling you whether the function call succeeded in the first place. In C, a function returns only one value; returning two means returning a struct (or passing a pointer for the second value), which typically compiles to more instructions and uses more memory.
If you compile things in a way where the C compiler doesn't have to export any externally callable function the compiler can take advantage of that combined type of response.

Also: I would say that it's sad if we in 2025 still don't have compilers where you can declare flags as return codes for functions.
 
GOTO in C cannot go just anywhere; it needs to stay within the same stack frame. Computed GOTO (i.e. "goto *ptr") is a compiler extension and is also only legal within the same stack frame. If you want to go further, you have to use setjmp/longjmp, which come with a ton of caveats.

That is exactly my point.
 
it needs to stay within the same stack frame
And pure assembly code will not necessarily even be using stack frames aside from call/return addresses. The main point is that a C compiler translates code in certain mechanical ways that can be automatically identified and reverse-engineered. Pure assembly code has no obligation to follow any of those patterns.

At least we have Ghidra for free to give a good try at it.
 
If you compile things in a way where the C compiler doesn't have to export any externally callable function the compiler can take advantage of that combined type of response.
Yes, and very few compilers take advantage of that. But you are free to implement it.

Also: I would say that it's sad if we in 2025 still don't have compilers where you can declare flags as return codes for functions.
Make that a language feature and you just broke the language for MIPS or RISC-V. Very smart idea in 2025.
 
Yes, and very few compilers take advantage of that. But you are free to implement it.


Make that a language feature and you just broke the language for MIPS or RISC-V. Very smart idea in 2025.
It would of course be a feature that is used differently (or not at all) on different architectures and different OSes.

Sure, you always need the stack as a fallback for calling parameters, and a single register as a fallback for return parameters, but in general it seems inefficient to not use what the processor can do.

Also, it would not need to actually change the syntax of the language at all. Just treat everything that can be stored in registers and flags as a struct you return, and the compiler can choose to put all that in registers + flags if it finds that suitable.
 
Interesting. Can you be more specific? I know very little about assembly but I always assumed C could do anything assembly could even if less efficiently.
Certainly with something like Z80 assembly, it's possible to convert it into C by hook or by crook. The existence of Z80 emulators serves as an existence proof; at minimum, you could replace each disassembled line by an invocation of the emulation code for that instruction. Converting to readable idiomatic C is harder, I grant you.
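
A rough sketch of that "one emulator call per disassembled line" approach, assuming a heavily reduced CPU model (the struct and function names are invented here, and only the carry and zero flags are modelled; a real Z80 ADD also updates several other flags):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced CPU state: accumulator plus two flags. */
struct cpu {
    uint8_t a;
    bool carry;
    bool zero;
};

/* LD A,n : load an immediate value; leaves the flags alone. */
static void ld_a_n(struct cpu *c, uint8_t n)
{
    c->a = n;
}

/* ADD A,n : 8-bit add, modelling only the carry and zero flags. */
static void add_a_n(struct cpu *c, uint8_t n)
{
    unsigned sum = (unsigned)c->a + n;
    c->carry = sum > 0xFFu;
    c->a = (uint8_t)sum;
    c->zero = (c->a == 0);
}

/* Line-by-line translation of the two-instruction listing
       LD  A,0F0h
       ADD A,20h
   Each disassembled line becomes exactly one call. */
static void translated_fragment(struct cpu *c)
{
    ld_a_n(c, 0xF0);
    add_a_n(c, 0x20);
}
```

The result compiles anywhere, but it reads like the assembly it came from, which is the gap between "convertible to C" and "readable idiomatic C".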
 