MSDOS loading .com / .exe process

anormal · Mar 26, 2018

Hi!

I've been looking for more information about this. After some googling (which only turned up modern MS-DOS/Windows stuff), I tried
to read the DOSBox source, but it's not very detailed as a reference.

What i am looking for is the loading process of .com and .exe files before execution.

I already know .com files are loaded at 100h; the data at 00h..FFh (the PSP) is set up by MS-DOS, which then runs the program by jumping to 100h.
For .exe files it is a bit more complicated, because of relocations and the possibility of the file being very large. I don't have much idea about this.

I read my copy of The Undocumented PC, 2nd ed., but didn't find the information there.

For now I am thinking of looking at Ralf Brown's Interrupt List, or a very old program I remember called HelpPC with lots of
technical info. Um... I think I also have a bunch of MS technical memos I downloaded from somewhere.

What I want to do is this: I have a snapshot of the RAM of a typical x86 MS-DOS 3.30 system saved to disk (using Hampa's PCE emulator with the monitor
command "save filename bin 0 fffff"; other ways are possible, such as running a small script inside DEBUG to save the entire first MB of RAM to a file). Then I want to load a .com or .exe file into this RAM and do some tracing in the binary.

If anyone could point me to any book or url or whatever explaining how a com/exe file is loaded in memory prior to running it...

Maybe this is basic stuff for many of you but i am bit rusty in old stuff by now.

Thanks

per · Mar 26, 2018

If you can get hold of a MS-DOS technical reference, then that contains most of the info you need to know.

reenigne · Mar 26, 2018

I've written an (extremely simple) 8088/8086 emulator which handles .exe files - see https://github.com/reenigne/reenigne/tree/master/8088/86sim/86sim.cpp lines 562-596. I think http://www.delorie.com/djgpp/doc/exe/ and http://www.fileformat.info/format/exe/corion-mz.htm were probably the main references I looked at to implement that code. If you have any more specific questions that those don't answer I can probably help.

anormal · Mar 26, 2018

Thanks for links!
@reenigne: I don't know if you remember when we talked about my interest in binary indexing of executables. I finally decided to follow your tip and try a partial tracing/emulation approach. Let's see if I can code some stuff.
thanks

reenigne · Mar 26, 2018

Thinking about it some more, though, it's possible that MS-DOS maintains some internal data structures which are affected by information about the currently running process (particularly the size). It might get confused if it thinks it's running one program and then actually ends up running a different one. If you get weird errors or unusual behaviour making DOS calls (especially ones about allocating memory), you might do better to have a stub program which calls DOS to do the actual loading (interrupt 0x21 with AH=0x4b).

anormal · Mar 26, 2018

My idea is load the 1st MB to ram already saved from a msdos 3.30 prompt.
So no boot/msdos initialization, just load executable and trace.

But I plan to abort/skip tracing if certain conditions are met, such as interrupt calls, far jumps, keyboard input, dumb in/out simulation, etc...

I'll start trying to trace for example only the first 1000 opcodes or something like that. As i am interested in exe stubs, and starting code first.
If i can get this stable i could try tracing a bit more

I am stubborn here with my coding experiments in Delphi/Pascal, so I'll code a speed-focused basic 8086 emu first. Nothing advanced such as exact timing (though I'll try to be as precise as I can in timing), and I'll also be aware of the prefetch queue.

I haven't written a single line of code yet, just the basic program structure, exe checking, and so on, but I'm looking for a way to do this as fast as possible (I plan to analyze thousands of executables).

For me, the most interesting thing is to find non-standard execution paths. I am most interested in non-compiled stuff, as I think standard compiled code, such as
C, Pascal, or other old MS-DOS compiled languages, is fairly standard.
So I'll trace asm-coded stuff first: protected exes, demos/intros, cracks, etc...
Many games need precise timing, but that would require me to build a much more complex and robust emu.

Thanks

reenigne · Mar 26, 2018

anormal said:
Thanks for links!
@reenigne: I don't know if you remember when we talked about my interest in binary indexing of executables. I finally decided to follow your tip and try a partial tracing/emulation approach. Let's see if I can code some stuff.
thanks

Oh, are you the same person that I had an email conversation with starting on the 6th of November last year?

You'll be glad to hear that that conversation inspired me to make some good progress with my cycle-exact emulator. I am generating some 500,000 or so testcases, batching them together, running them on my XT, comparing the results with the emulator and generating traces when they don't match. By doing that, I've now tuned my emulator to the point where it is cycle-exact as long as you only access memory with no wait states and ports with one wait state. However, I haven't got the timing of other wait states right yet - in particular, DRAM refresh DMAs cause longer waits which currently get the CPU into a state I'm not handling properly. Fixing this requires some pretty invasive changes because certain things are currently not done in the correct order, and it's very difficult to make changes without affecting the timings. But I am actively working on it!

reenigne · Mar 26, 2018

anormal said:
I'm looking for a way to do this as fast as possible (I plan to analyze thousands of executables).

Whenever you're optimizing something for speed, you need to be aware of what's taking the most time and concentrate on that. If you're running your emulator on a lot of small executables, the thing that takes the most time might be starting up the emulator (even if you're not doing an emulated boot, just starting up the process and setting up that 1MB RAM image might be the limiting factor). That's what I found when running testcases for gcc-ia16, hence the simple emulator I linked to above is optimized for this scenario - it's simple and therefore small so it starts quickly. A more complicated emulator might be able to run more instructions per second once started, but the time for a full run would be dominated by the time it takes to start the emulator.

Another possibility might be to have an emulator as a background process or daemon and just have a small stub program to send testcases to it via an inter-process communication mechanism like named pipes. Then you can have a big, fast emulator but the part which is started for each testcase is very small.

A year ago I started an emulator (closely related to 86sim that I mentioned above) that seems very similar to your project - the idea of that one was to use emulation to find the bits of code that actually run first, and then disassemble/decompile those. The results can then be used to find more code paths and so on until the entire program is reverse-engineered (other disassemblers/decompilers work with the code statically rather than trying to actually run it dynamically). I implemented just enough DOS/BIOS calls to be able to run Ultima VI (albeit not playably as there is no keyboard or mouse input) but I didn't get very far with the actual decompilation. Still, it might be a useful starting point or reference for your project.

alecv · Mar 26, 2018

http://vetusware.com/download/MS DOS 6.00_Sources included_/?id=4093

anormal · Mar 26, 2018

Thanks for the information. Yes, it was me, but I was disconnected from this stuff for some months.
Nice that you've advanced your cycle-exact emu so much, as this is simply a thing we need.
In fact I was reading your GitHub this last weekend, as I thought I could find a loader there. I was right, but I wasn't able to find it among all your repos xD

As you said i thought setting, booting, and executing msdos could be a time consuming process, so the idea of simple loading a entire RAM state, load myself the executables and do some tracing looking for interesting codepaths. Also i got huge collection of old and interesting "non-standard" executables i collected for years, so i already got a lot of exes to explore. Let's see.. as i said, i am the typical geek who starts projects but never finish it as another interesting ideas come in and i jump to the new one...

I'll check your repo these days.

Of course if i finally do something with this idea and can log interesting codepaths i'll share my tests with you for testing possible edge cases.

@alecv: thanks for the link, i forgot i already have the msdos sources.

Xacalite · Mar 26, 2018

If you need DOS sources, note that MS-DOS sources aren't the only choice - there's also FreeDOS.
Also, weren't OpenDOS/DR-DOS sources released at some point as well?

reenigne · Mar 26, 2018

anormal said:
load myself the executables and do some tracing looking for interesting codepaths.

I'm curious about what kind of code paths you're looking for that might be interesting. In email you mentioned determining what CPU (and other hardware) a program targets (or can make use of). Are there other things that you think you might find? If so, how would you recognize them in the emulator trace?

anormal said:
Also i got huge collection of old and interesting "non-standard" executables i collected for years, so i already got a lot of exes to explore.

What makes an executable "non-standard" and interesting in your view? You mentioned looking for things that were written in assembly rather than C or Pascal but I'm not sure how you could tell - some compilers can generate pretty sophisticated code that would be difficult to distinguish (especially automatically). And many (perhaps even most) non-trivial programs contain both code written in a high-level language and some bits of assembly for low-level stuff and hardware access.

I'm also curious if something like the DOS version of Digger Remastered would be considered "interesting". It's compiled using Borland C++ 4.52 but is also linked with assembly routines assembled with the unregistered version of A86 (and which therefore may contain hidden watermarking in the opcode selection to say so). It also uses startup/runtime libraries from Turbo C 2.01 (since they were smaller than the BC++4.52 ones). I'm not sure how any of these things could be determined by running or examining the binary, though - at least without deep knowledge of those 3 tools.

anormal said:
Let's see.. as i said, i am the typical geek who starts projects but never finish it as another interesting ideas come in and i jump to the new one...

You're definitely in good company there as you can probably tell from my github repo!

anormal · Mar 27, 2018

Thanks for your comments reenigne.

Well, it started with my interest in indexing binaries. The original idea was binary indexing using an inverted trigram index. My love for MS-DOS and the possibility of using already preserved huge collections of DOS binaries helped. Then I thought that a fast binary index could be used on any other vintage (or modern) platform. As I am interested in old MS-DOS software, games, etc., I wanted to make use of software repositories. For example in gaming, you could do searches looking for specific code sequences (including wildcards of course), so you can search for Gravis Ultrasound initializations, Mode-X setups, Joystick testing, floating point stuff, jumping from real mode to protected mode, etc, etc.. This could be of use to someone doing some digging in specific areas. On the C64 it could be used to index all known C64 software and do fast searches across the whole repository.
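To make the inverted trigram index idea above concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the function names, the tiny in-memory corpus, and the byte patterns are hypothetical, and wildcards are not handled. Every 3-byte sequence maps to the set of files containing it; a query intersects the sets for its trigrams and then verifies the survivors with a real substring search.

```python
from collections import defaultdict

def build_trigram_index(files):
    """Map every 3-byte sequence to the set of file names that contain it."""
    index = defaultdict(set)
    for name, data in files.items():
        for i in range(len(data) - 2):
            index[bytes(data[i:i + 3])].add(name)
    return index

def search(index, files, pattern):
    """Return the sorted names of files containing `pattern` (3+ bytes)."""
    if len(pattern) < 3:
        raise ValueError("pattern must be at least 3 bytes long")
    candidates = None
    for i in range(len(pattern) - 2):
        hits = index.get(bytes(pattern[i:i + 3]), set())
        candidates = set(hits) if candidates is None else candidates & hits
        if not candidates:
            return []
    # Trigram hits only prove each trigram occurs somewhere in the file,
    # so confirm the full pattern before reporting a match.
    return sorted(n for n in candidates if pattern in files[n])

# Hypothetical two-file corpus (not real programs).
files = {
    "a.com": bytes.fromhex("b81300cd10c3"),      # mov ax,0013h / int 10h / ret
    "b.com": bytes.fromhex("b409ba0001cd21c3"),  # mov ah,09h / mov dx,0100h / int 21h / ret
}
index = build_trigram_index(files)
matches = search(index, files, bytes.fromhex("b81300cd10"))
```

The verification pass matters because two files can share all of a pattern's trigrams without containing the pattern itself.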

As you told me months ago, and as I was already aware, many executables are compressed, protected with anti-debugging tricks, or just plain ciphered under layers of XORs and the like :D

So indexing that executable would be a waste of time. Then the ball started rolling xd So... could I organize exes and try to write some kind of tracer or generic unpacker, so I could index the unpacked binary (or at least a representation of it)?

If you look to, for example the wonderful DIE (detect it easy) it contains lots of search strings to identify compiler versions, packers, etc, etc...

Well, the idea is start doing some work on this and see if i can go further.

I called "non standard" codepaths to try to differentiate common code, compiled with compilers (yes i know many contain specific asm parts), but the general code is "standard" as far i see it. Then there is pretty obfuscated code out there, or just very time critical code, or code accessing MS-DOS through far calls/jmps (not using int 21), etc, etc... So I thought this could be interesting to explore and build some kind of index.
And it's useful for example to you in the cycle exact emulation to found easily testing cases (for example, look for MS-DOS binaries accessing particular ports, or accessing the floppy controller, PIT, etc). Or for people reverse-engineering old software applications and games (see for example the many interesting things on the os2museum site).

Not useful for pretty anyone i suppose, but it's interesting for me, I am finishing developing a fast btree for resolving the hashing chains in the trigrams, etc... So it let me learn new stuff, new algorithms, improve regexp searches, etc. I know this is a complex idea, but as i said i use it to improve my coding skills.

Thanks for your time !

reenigne · Mar 27, 2018

anormal said:
you could do searches looking for specific code sequences (including wildcards of course), so you can search for Gravis Ultrasound initializations, Mode-X setups, Joystick testing, floating point stuff, jumping from real mode to protected mode, etc, etc..

Interesting stuff. One hurdle you might face is that there are a lot of different code sequences out there for doing something like initializing Mode-X - the important part for detecting them isn't so much finding a particular sequence of instructions (even with wildcards) but a particular set of register writes. The actual routine that you hit for that might be a very boring CRT implementation of outportb(), for example.

anormal said:
very time critical code

One thing that might be interesting to look for there would be heavily unrolled loops: code sequences that (excluding immediate offsets/values) are extremely repetitive. These might be generated at run-time instead of appearing in the static binary, so doing de-packing in a generic way would be important. From the actually-executed part of the in-memory image you could apply a simple compression algorithm to the opcode and mod r/m bytes to look for highly redundant code. It would be interesting to know how common such techniques were and what software used them.
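A rough way to try the compression idea above is sketched below in Python. It assumes a decoder has already reduced the executed code to its opcode and mod r/m bytes (that decoding step is not shown), and it uses zlib simply as a convenient stand-in for "a simple compression algorithm". The function name and the sample byte strings are hypothetical.

```python
import zlib

def redundancy_ratio(opcode_bytes: bytes) -> float:
    """Compressed size divided by original size; lower means more repetitive."""
    if not opcode_bytes:
        return 1.0
    return len(zlib.compress(opcode_bytes, 9)) / len(opcode_bytes)

# Hypothetical inputs: 200 copies of the same opcode + mod r/m pair (as an
# unrolled loop would produce once immediates are stripped), versus 256
# bytes that never repeat.
unrolled = bytes([0x88, 0x45]) * 200
varied = bytes(range(256))

unrolled_ratio = redundancy_ratio(unrolled)
varied_ratio = redundancy_ratio(varied)
```

A threshold on this ratio would have to be tuned on real traces; short sequences compress poorly regardless of content because of the fixed zlib overhead.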

Another interesting technique to find out about is self-modifying code. This could be detected by keeping track of a "last modified time" for each byte in RAM and then examining that value for the bytes that end up being executed. Most of the time that value will be in the "code was loaded and/or unpacked" period of time but more recent values would indicate self-modifying code. It would be interesting to draw some graphs of modification-time vs execution-time for different pieces of software and look for graphs with particularly unusual patterns.
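The "last modified time" bookkeeping described above could look roughly like the following Python sketch. The class and method names are hypothetical, the tick counter is just a count of memory accesses rather than real CPU time, and deciding when the "loaded and/or unpacked" period ends (here an explicit `mark_loaded` call) is an assumption the emulator author would have to settle.

```python
class TrackedMemory:
    """RAM wrapper that remembers when each byte was last written."""

    def __init__(self, size=0x100000):
        self.ram = bytearray(size)
        self.last_write = {}      # address -> tick of the most recent write
        self.tick = 0
        self.load_done_tick = 0   # writes up to this tick count as "loading"
        self.smc_hits = []        # (execute_tick, address, write_tick)

    def write(self, addr, value):
        self.tick += 1
        self.ram[addr] = value & 0xFF
        self.last_write[addr] = self.tick

    def mark_loaded(self):
        """Call once the program image has been placed in memory."""
        self.load_done_tick = self.tick

    def fetch(self, addr):
        """Instruction fetch: record bytes that were written after loading."""
        self.tick += 1
        written = self.last_write.get(addr, 0)
        if written > self.load_done_tick:
            self.smc_hits.append((self.tick, addr, written))
        return self.ram[addr]

mem = TrackedMemory()
mem.write(0x1000, 0x90)   # loader places NOP
mem.write(0x1001, 0x90)   # loader places NOP
mem.mark_loaded()
mem.fetch(0x1000)         # executing untouched code: no hit
mem.write(0x1001, 0xC3)   # program patches its own next byte to RET
mem.fetch(0x1001)         # executing the patched byte: recorded
```

The `(execute_tick, write_tick)` pairs in `smc_hits` are the raw data for the modification-time versus execution-time graphs mentioned above.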

anormal said:
And it's useful for example to you in the cycle exact emulation to found easily testing cases

Possibly, though for that project I'm interested more in what the hardware actually does in any given circumstance than in which set of circumstances occur in the extant software - I'm hoping to emulate a lot of behaviours that nobody ever used in real programs. That way, it'll be useful for creating new software rather than just running existing software.

anormal said:
Not useful for pretty anyone i suppose, but it's interesting for me, I am finishing developing a fast btree for resolving the hashing chains in the trigrams, etc... So it let me learn new stuff, new algorithms, improve regexp searches, etc. I know this is a complex idea, but as i said i use it to improve my coding skills.

Cool stuff - I look forward to seeing the results!

anormal · Mar 27, 2018

I've already thought about many of the ideas you mentioned, such as calculating execution graphs.
But please, my skills are limited xD. I'll try my best, but I can't promise even a prototype within months.
I coded a fast x32 6502 emu many years ago and have read a lot of emulator source code, but that's all.
I also remember writing a tutorial on using a virtual CPU to protect exes for the good old Fravia site :D
As you can see, I've always loved the low-level stuff.

I've also thought about not implementing the search terms myself, as I am not proficient in this kind of low-level coding, but instead building a tool with which other people could build libraries of search patterns and share them in some way (as sites share useful regexps today).

Also, my idea is not only to index binaries as-is, but also to try to build some kind of index with meta-instructions, instead of something like
MOV AX,301
MOV BX,200
MOV CX,1
MOV DX,80
INT 13h

(don't execute that code, it will overwrite a sector on your hard disk)

This could be coded in a myriad of other ways
MOV AX,XXXX
MOV BX,200
XOR AX,BX (so ax now is 301)
MOV CL,AL (so cl is 1 sector)
MOV DX,80
(get int 13 address from ints vector doing a pushf and then a callfar)

etc...
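One way to picture the "meta-instruction" idea is to propagate known constants through a short instruction sequence and compare the register state at the point of the call, rather than the instruction bytes. The Python sketch below is only a toy under assumptions: instructions are hand-written tuples instead of decoded opcodes, only MOV and XOR on AX..DX and their 8-bit halves are understood, and the starting value 0101h stands in for the unspecified XXXX above.

```python
# Each 16-bit register is tracked as two 8-bit halves so that a write to
# CL alone still leaves a known CL even when CH is unknown.
HALVES = {"ax": ("al", "ah"), "bx": ("bl", "bh"),
          "cx": ("cl", "ch"), "dx": ("dl", "dh")}

def get(state, reg):
    """Return the known value of a register, or None if unknown."""
    if reg in HALVES:
        lo, hi = (state.get(h) for h in HALVES[reg])
        return None if lo is None or hi is None else lo | (hi << 8)
    return state.get(reg)

def put(state, reg, value):
    """Store a value (or None for unknown) into a register."""
    if reg in HALVES:
        lo, hi = HALVES[reg]
        state[lo] = None if value is None else value & 0xFF
        state[hi] = None if value is None else (value >> 8) & 0xFF
    else:
        state[reg] = None if value is None else value & 0xFF

def summarize(instructions):
    """Run (op, dst, src) tuples and return the resulting register state."""
    state = {}
    for op, dst, src in instructions:
        val = src if isinstance(src, int) else get(state, src)
        if op == "mov":
            put(state, dst, val)
        elif op == "xor":
            cur = get(state, dst)
            put(state, dst, None if cur is None or val is None else cur ^ val)
        else:
            put(state, dst, None)   # unknown operation: forget the destination
    return state

direct = [("mov", "ax", 0x301), ("mov", "bx", 0x200),
          ("mov", "cx", 1), ("mov", "dx", 0x80)]
obfuscated = [("mov", "ax", 0x101), ("mov", "bx", 0x200),
              ("xor", "ax", "bx"), ("mov", "cl", "al"),
              ("mov", "dx", 0x80)]

def signature(state):
    """The registers compared here: AX, BX, CL and DX."""
    return (get(state, "ax"), get(state, "bx"), state.get("cl"), get(state, "dx"))

same = signature(summarize(direct)) == signature(summarize(obfuscated))
```

Both sequences reduce to the same AX, BX, CL and DX values, so an index keyed on such a signature would treat them as one pattern. Real code would of course need flags, memory operands, and many more instructions.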

If you look at old MS-DOS stuff, cracks, viruses (I still marvel at the incredible work done in mutation engines), you can see tons of very crazy stuff.

An intermediate representation could index this in another way. But this is out of scope right now (and a decompiler is something
I've thought about, but it's totally beyond my skills).

Well... let's see, because i am also a lot in emulation stuff and pretty much jump between ideas. Procrastination at my best!

anormal · Mar 27, 2018

For reference, I asked in another forum and got a pretty nice and extensive answer. I'll post it here for anyone interested in the future.

Credits to NewRisingSun

Uh, let's see...
Open the file and read the first two bytes. If they're "MZ" or "ZM", load it as an .EXE file, otherwise if it's larger than 65,280 bytes, exit with an error, otherwise load as a .COM file.
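The file-type decision just described is small enough to sketch directly. This Python fragment is an illustration only; the function name is hypothetical, and the 65,280-byte limit is the one quoted above.

```python
def classify_executable(data: bytes) -> str:
    """Return "exe" or "com" following the rule described above."""
    if data[:2] in (b"MZ", b"ZM"):
        return "exe"
    if len(data) > 65280:
        raise ValueError("too large to load as a .COM file")
    return "com"

kind = classify_executable(b"\xB4\x4C\xCD\x21")   # mov ah,4Ch / int 21h
```

Note that the decision is made on the file contents, not on the file name extension.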

.COM Files:
Allocate a new memory block for the child process' environment strings. Fill it with a copy of the parent process's environment block and the full file specification of the child process' file.
Allocate a new memory block (let's call it PSPSeg) filling up the entirety of remaining conventional RAM.
Fill the first 256 bytes of PSPSeg with the Program Segment Prefix (PSP), including the initial Int20 instruction, the size of this memory block in paragraphs, saved Int22, Int23 and Int24 vectors, the segment of the parent process' PSP, the command line, and the first two command line options parsed into two File Control Blocks (FCBs).
Load the entirety of the .COM file into the segment starting 256 bytes after PSPSeg (let's call it LoadSeg).
Set AL to 00 if the first FCB in the PSP has a valid drive letter, otherwise FF; same with AH and the second FCB
Set DS, ES, SS to PSPSeg
Set SP to zero, then push the value 0, so the program can exit via a simple RET to PSP offset 0 which holds the INT 20 instruction.
Jump to PSP:0100.
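For loading into a saved 1 MB RAM image, the .COM steps above can be approximated as in the Python sketch below. It is deliberately partial and rests on assumptions: the caller picks `psp_seg` itself (no DOS memory allocation or environment block), the PSP is filled only with the INT 20h instruction, a top-of-memory word (A000h, assuming 640 KB), and the command tail, and AX is returned as 0 as if both FCBs had valid drives. The function name and the returned dictionary layout are hypothetical.

```python
def load_com(ram: bytearray, psp_seg: int, image: bytes, cmd_tail: bytes = b""):
    """Place a .COM image at psp_seg:0100 and return initial registers."""
    if len(image) > 65280:
        raise ValueError("too large to load as a .COM file")
    if len(cmd_tail) > 126:
        raise ValueError("command tail too long")
    base = psp_seg << 4
    if base + 0x10000 > len(ram):
        raise ValueError("segment does not fit in the RAM image")

    ram[base:base + 256] = bytes(256)                      # clear the PSP
    ram[base:base + 2] = b"\xCD\x20"                       # INT 20h at PSP:0000
    ram[base + 2:base + 4] = (0xA000).to_bytes(2, "little")  # assumed 640 KB top
    ram[base + 0x80] = len(cmd_tail)                       # command tail length
    ram[base + 0x81:base + 0x81 + len(cmd_tail)] = cmd_tail
    ram[base + 0x81 + len(cmd_tail)] = 0x0D                # terminating CR

    ram[base + 0x100:base + 0x100 + len(image)] = image    # the program itself

    # SP = 0 followed by pushing a zero word leaves SP = FFFEh with 0000h
    # on the stack, so a near RET lands on the INT 20h at PSP:0000.
    ram[base + 0xFFFE:base + 0x10000] = b"\x00\x00"
    return {"cs": psp_seg, "ip": 0x100, "ds": psp_seg, "es": psp_seg,
            "ss": psp_seg, "sp": 0xFFFE, "ax": 0}

ram = bytearray(0x100000)            # stand-in for the saved snapshot
regs = load_com(ram, 0x1000, b"\xC3")   # a one-byte program: RET
```

As noted earlier in the thread, DOS's own memory bookkeeping inside the snapshot is not updated by this, so DOS calls that depend on it may misbehave.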

.EXE Files:
Load the initial 28 bytes of the .EXE header into an internal buffer (all the fields until the start of the relocation table).
Allocate a new memory block for the child process' environment strings. Fill it with a copy of the parent's environment block and the full file specification of the child process file.
Determine the "load image size" by multiplying the .EXE header's "pages in file" field with 512, subtracting the header size.
Determine the memory block size: it's at least "load image size" plus "extra paragraphs needed SHL 4" and at most "load image size" plus "extra paragraphs wanted SHL 4" (the latter usually 65535 indicating as much as possible) plus 256 bytes for the Program Segment Prefix.
Allocate a new memory block (let's call it PSPSeg) with the just-determined memory block size.
Fill the first 256 bytes of PSPSeg with the Program Segment Prefix (PSP), including the initial Int20 instruction, the size of this memory block in paragraphs, saved Int22, Int23 and Int24 vectors, the segment of the parent process' PSP, the command line, and the first two command line options parsed into two File Control Blocks (FCBs).
Load the "load image" (the part of the .EXE file following the .EXE header, "load image size" long, which is followed by overlay data) into the segment starting 256 bytes after PSPSeg (let's call it LoadSeg). Exception: If "extra paragraphs wanted" is zero, LoadSeg is set so that the load image is loaded to the very end of the memory block instead. This is called "loading high" (not to be confused with the "LOADHIGH" command which loads a file into Upper Memory Blocks).
Using the number of relocation table entries and the start offset of the relocation table from the .EXE header, load the relocation table into memory, somewhere after the load image.
For each relocation table entry, add LoadSeg to the word at the address specified by the table entry, which is a Segment:Offset pair (and thus four bytes long) relative to LoadSeg.
Set AL to 00 if the first FCB in the PSP has a valid drive letter, otherwise FF; same with AH and the second FCB
Set DS, ES to PSPSeg.
Set SS:SP to what is specified in the .EXE header (SS relative to LoadSeg).
Jump to CS:IP specified in the .EXE header (CS relative to LoadSeg).
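The core of the .EXE steps (header parsing, image placement, relocation fix-ups, initial registers) can be sketched in Python as follows. This is a simplified illustration under assumptions: the caller chooses `psp_seg`, the PSP and environment are not built, memory sizing and the "load high" exception are ignored, and the load image size is computed exactly as described above (pages times 512 minus header size). The function name and the tiny hand-built test file are hypothetical.

```python
import struct

def load_exe(ram: bytearray, psp_seg: int, data: bytes):
    """Load an MZ image 256 bytes after psp_seg, relocate it, return registers."""
    if data[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    # The first 28 bytes of the header are fourteen little-endian words.
    (_sig, _last_page, pages, nreloc, hdr_paras, _min_alloc, _max_alloc,
     ss, sp, _checksum, ip, cs, reloc_off, _overlay) = struct.unpack_from("<14H", data, 0)

    hdr_size = hdr_paras * 16
    image_size = pages * 512 - hdr_size
    image = data[hdr_size:hdr_size + image_size]

    load_seg = psp_seg + 0x10            # 256 bytes (16 paragraphs) past the PSP
    base = load_seg << 4
    ram[base:base + len(image)] = image

    # Each relocation entry is offset:segment, relative to load_seg; the word
    # it points at gets load_seg added to it.
    for i in range(nreloc):
        off, seg = struct.unpack_from("<HH", data, reloc_off + 4 * i)
        addr = base + (seg << 4) + off
        word = struct.unpack_from("<H", ram, addr)[0]
        struct.pack_into("<H", ram, addr, (word + load_seg) & 0xFFFF)

    return {"cs": (cs + load_seg) & 0xFFFF, "ip": ip,
            "ss": (ss + load_seg) & 0xFFFF, "sp": sp,
            "ds": psp_seg, "es": psp_seg}

# Hypothetical minimal file: 32-byte header, one relocation, 4-byte image.
header = struct.pack("<14H", 0x5A4D, 36, 1, 1, 2, 0, 0xFFFF,
                     0, 0x100, 0, 0, 0, 0x1C, 0)
reloc = struct.pack("<HH", 1, 0)           # fix up the word at 0000:0001
code = b"\xB8\x00\x00\xC3"                 # mov ax,<segment> / ret
ram = bytearray(0x100000)
regs = load_exe(ram, 0x1000, header + reloc + code)
```

After loading at PSP segment 1000h, the relocated word holds 1010h, the segment the image actually landed in.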

Chuck(G) · Mar 27, 2018

I assume that you've looked at the MSDOS 6.0 source that is (still should be) available on the web.

One thing about .EXE files (not the COM ones masquerading as such, but real "MZ" ones) is added payload at the end of the file. I used this extensively in my products back in the day. So there's code/data there that you don't initially see when the program is loaded. In the old BBS days, where it paid to keep executables small, I used the DIET utility that LZH-compressed an executable.

VileR · Mar 27, 2018

reenigne said:
Another interesting technique to find out about is self-modifying code. This could be detected by keeping track of a "last modified time" for each byte in RAM and then examining that value for the bytes that end up being executed. Most of the time that value will be in the "code was loaded and/or unpacked" period of time but more recent values would indicate self-modifying code. It would be interesting to draw some graphs of modification-time vs execution-time for different pieces of software and look for graphs with particularly unusual patterns.

That's something I was also thinking about, because I've had a few cases where I wanted to know what memory is changing during execution and when, or whether some seemingly unused code / free space within a segment was safe to repurpose for a patch, etc. But in the end I figured that what would really help me there is a memory visualizer. One of these days I plan to try implementing something more or less like this: https://www.youtube.com/watch?v=b-7BTnGlEDc, but for DOSBox, covering the first megabyte of RAM, with a text-based display showing you an annotated segment axis (and perhaps even letting you "zoom" into a specific range of addresses for better resolution).

Trixter · Mar 27, 2018

anormal said:
I called "non standard" codepaths to try to differentiate common code, compiled with compilers (yes i know many contain specific asm parts), but the general code is "standard" as far i see it.

This sounds like a combination of Hex-Rays' FLIRT signature library coupled with their professional Decompiler (not IDA -- the Decompiler is a separate product that analyzes asm snippets to turn them into generalized C code).

anormal · Mar 27, 2018

Yes trixter it sounds like that, it's not my idea to measure my little project with something developed for, how many years? I remember start using version 3 or so. Until that I used of course Sourcer.

Last year I tried to make some kind of auto-commenter for IDA for DOS binaries. My plan was to merge resources such as HelpPC, Ralf Brown's Interrupt List, etc. into some kind of database and then use that information with some Python to comment as much of the IDA disassembly as possible. But it got stuck at the merging step, hehe. Maybe I'll try again sometime in the future.

My plan with this is not doing a complete emu (it's totally out of my abilities), just a very partial one, as, apart from trigram indexing the binary, I wanted to get the initial code path until possible (wait for keyboard, etc). I think someone modified a dosbox for auto running games, take snapshots, force keyboard inputs and doing more snapshots? I remember something like that. These could be use for image scraping Dos games collections.

So, my idea was not to implement a full exe loader (imitating the original DOS code), just enough to run xxx opcodes.

About RAM, my idea when loading the RAM snapshot is to mark the RAM locations already in use once the exe is loaded. Any further RAM modification then indicates newly used memory, and it could also reveal modifications in critical RAM areas, such as DOS zones, the interrupt vector table, etc.
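A very simple version of this idea is to keep the loaded snapshot as a baseline and later report which bytes differ, grouped by memory region. The Python sketch below is an illustration under assumptions: the region table lists only a few well-known real-mode areas, the names are hypothetical, and a byte-by-byte comparison is used for clarity rather than speed.

```python
# A few well-known real-mode areas; everything else is reported as "other".
REGIONS = [
    (0x00000, 0x003FF, "interrupt vector table"),
    (0x00400, 0x004FF, "BIOS data area"),
    (0xA0000, 0xBFFFF, "video memory"),
]

def classify(addr):
    for lo, hi, name in REGIONS:
        if lo <= addr <= hi:
            return name
    return "other"

def diff_snapshot(baseline, current):
    """Group the addresses whose contents changed by memory region."""
    changes = {}
    for addr, (before, after) in enumerate(zip(baseline, current)):
        if before != after:
            changes.setdefault(classify(addr), []).append(addr)
    return changes

baseline = bytearray(0x100000)        # stand-in for the saved snapshot
current = bytearray(baseline)
current[0x21 * 4] = 0x34              # low byte of the INT 21h vector
current[0x10100] = 0xC3               # a byte in ordinary program memory
changes = diff_snapshot(baseline, current)
```

A write-hook in the emulator would catch the same changes as they happen, at the cost of a check on every memory write.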

Of course I imagine this is much easier to put into words than into real code, and I suspect it has so many edge cases and quirks that it is much more work than I can imagine.

Anyway, thanks everyone for comments and tips.
