8086 emulator tech discussion

Mike Chambers · May 12, 2011

i'm actually putting together a CPU emulator programming tutorial to share what i've learned with other programmers who are interested. so far the table of contents looks like this... any suggestions on additions?

Contents
1. Introduction to CPU emulation
- What is CPU emulation?
- Emulation vs. virtualization.
- Goals of this tutorial.
- What you need to know before going further.

2. The basics
- How do CPUs work?
- Emulation basics.
- Meet our example CPU, the Intel 8086.

3. Intel 8086 technical overview
- The internal registers.
- The flags register.
- Addressing modes.
- Memory segmentation.
- Program execution flow.
- The 8086 instruction set overview.

4. Implementation of an 8086 emulator
- Emulator engine overview.
- Emulating the registers.
- Emulating the memory.
- Parsing the bytecode and software flow.
- Addressing mode byte decoding.
- Emulating the various instructions.

5. Emulating basic PC support hardware
- Simple text-mode video card emulation.
- Keyboard input.
- Floppy disk emulation.
- Hardware timer interrupt.

6. Appendix
- Full 8086 instruction set.
- Opcode hex table.

Chuck(G) · May 12, 2011

So, is your next step going to be 80186 or V20 emulators?

Mike Chambers · May 12, 2011

Chuck(G) said:
So, is your next step going to be 80186 or V20 emulators?

it already supports all the 80186 instructions other than the string-port ones. the next step is to smooth out the many rough edges and try to turn this into a presentable application! i'm just finalizing getting the C code caught up to where the BASIC version is, and then i'll go from there.

for some reason i'm having a difficult time getting the PC speaker emulation to sound decent.

kiyotewolf · May 16, 2011

I want a copy of the booklet tutorial thingy when you're done Mike.

~Kiyote!

Mike Chambers · May 19, 2011

will do!

Mike Chambers · Jun 12, 2011

i'm working on adding proper ATA controller support, but i can't seem to find any good detailed info on the contents that the IDENTIFY command is supposed to return. does anybody know about this? btw, i'm using aitotat's universal IDE BIOS.

Chuck(G) · Jun 12, 2011

Try this, Mike:

ftp://ftp.seagate.com/pub/acrobat/reference/111-1c.pdf
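
For orientation, here is a rough sketch of how an emulator might fill the 256-word block returned by IDENTIFY. The word offsets are the commonly documented ATA ones as best I know them, so check them against the reference above before relying on them. The function names, the strings, and the geometry are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ATA strings hold two characters per 16-bit word, first character in the
   high byte, padded with spaces. */
static void put_ata_string(uint16_t *dst, int nwords, const char *s)
{
    size_t len = strlen(s);
    assert(dst != NULL && nwords > 0);
    for (int i = 0; i < nwords; i++) {
        size_t k = (size_t)i * 2;
        uint8_t hi = (k < len) ? (uint8_t)s[k] : (uint8_t)' ';
        uint8_t lo = (k + 1 < len) ? (uint8_t)s[k + 1] : (uint8_t)' ';
        dst[i] = (uint16_t)((hi << 8) | lo);
    }
}

/* Fill the IDENTIFY block for a hypothetical fixed disk with CHS geometry. */
static void build_identify(uint16_t id[256], uint16_t cyls, uint16_t heads,
                           uint16_t spt)
{
    uint32_t total = (uint32_t)cyls * heads * spt;

    memset(id, 0, 256 * sizeof(uint16_t));
    id[0] = 0x0040;                      /* general config: fixed device */
    id[1] = cyls;                        /* logical cylinders */
    id[3] = heads;                       /* logical heads */
    id[6] = spt;                         /* sectors per track */
    put_ata_string(&id[10], 10, "FAKE0001");              /* serial number */
    put_ata_string(&id[23], 4, "1.0");                    /* firmware rev */
    put_ata_string(&id[27], 20, "Example emulated disk"); /* model */
    id[49] = 0x0200;                     /* capabilities: LBA supported */
    id[60] = (uint16_t)(total & 0xFFFFu); /* total sectors, low word */
    id[61] = (uint16_t)(total >> 16);     /* total sectors, high word */
}
```

The controller would then hand these 256 words to the guest through the data port, one 16-bit read at a time.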

Mike Chambers · Jun 13, 2011

perfect! thanks.

Mike Chambers · Jun 13, 2011

if i could borrow your knowledge again, chuck (or anybody else) - an unrelated question. i'm trying to implement A20 support for himem. it does enable the A20 line, but then it seemingly locks up. really, it's just executing erroneous code somewhere it shouldn't be.

is there anything i need to do other than add 0x100000 to the mem address for read/writes when A20 is toggled on via port 60h (bit 1)? this is what i am doing.

i know it's only really found on 286+ machines, but as long as the port 60h toggle is implemented it should work just fine on an 8086, no? one thought was that the himem driver might be trying to run 286 code, but the emulator never reports any illegal opcodes being performed.

i don't need to implement protected mode support if i only want to be able to access up to 64 KB of RAM past 1 MB, right?

reenigne · Jun 13, 2011

There's a difference between enabling the A20 line, and pulling it high - it sounds like you're doing the latter but you should be doing the former. The reason it would have no effect on a real 8086 is that there is no A20 line - the physical chip only has pins for A0-A19. With the 286+ there is a physical A20 line but when the A20 line is disabled it isn't connected to anything, so that reading from or writing to address 0xFFFF:0xFFFF accesses the physical memory at address 0x0FFEF, as it does on the 8086. When the A20 line is enabled and >1Mb of memory is installed, address 0xFFFF:0xFFFF accesses the physical memory at address 0x10FFEF. So that 0x100000 is only added if both A20 is enabled and the address wraps around.
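
In emulator terms, the rule above might look something like the following sketch. The names (ram, a20_enabled, phys_addr, read8, write8) are invented for illustration and are not taken from any particular emulator. The point is that 0x100000 is never added explicitly: the segment:offset sum already carries into bit 20, and the A20 gate only decides whether that bit is kept or masked off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_SIZE 0x110000u   /* 1 MB plus the 64 KB reachable through A20 */

static uint8_t ram[RAM_SIZE];
static bool a20_enabled = false;   /* would be driven by the port 60h bit */

/* Turn segment:offset into a physical address, honouring the A20 gate. */
static uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;   /* 0x00000..0x10FFEF */
    if (!a20_enabled)
        linear &= 0xFFFFFu;   /* bit 20 dropped, so the address wraps */
    return linear;
}

static uint8_t read8(uint16_t seg, uint16_t off)
{
    uint32_t addr = phys_addr(seg, off);
    assert(addr < RAM_SIZE);
    return ram[addr];
}

static void write8(uint16_t seg, uint16_t off, uint8_t value)
{
    uint32_t addr = phys_addr(seg, off);
    assert(addr < RAM_SIZE);
    ram[addr] = value;
}
```

With the gate off, FFFF:FFFF lands at 0x0FFEF. With it on, the same pair lands at 0x10FFEF, while ordinary addresses below 1 MB are unaffected either way.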

Mike Chambers · Jun 13, 2011

reenigne said:
There's a difference between enabling the A20 line, and pulling it high - it sounds like you're doing the latter but you should be doing the former. The reason it would have no effect on a real 8086 is that there is no A20 line - the physical chip only has pins for A0-A19. With the 286+ there is a physical A20 line but when the A20 line is disabled it isn't connected to anything, so that reading from or writing to address 0xFFFF:0xFFFF accesses the physical memory at address 0x0FFEF, as it does on the 8086. When the A20 line is enabled and >1Mb of memory is installed, address 0xFFFF:0xFFFF accesses the physical memory at address 0x10FFEF. So that 0x100000 is only added if both A20 is enabled and the address wraps around.

right, no A20 line on the 8086 - however, if you were to put a line from the keyboard controller to the memory bus you should still be able to access the first 64 KB above 1 MB regardless of whether the CPU realizes it or not. but yeah, himem isn't designed to work that way.

it was about 3 AM when i had this whole idea, i needed sleep and i didn't think it through.

Mike Chambers · Jun 13, 2011

i finally started putting together an actual website for fake86. it also has a (currently empty) forum, and i have 8 screenshots up. i will probably get a download up tonight for the source and binaries of the latest version. i want to tweak a few things first though. also, it now handles disk images via the command line similar to QEMU. no more config files.

http://fake86.rubbermallet.org/

reenigne · Jun 14, 2011

Mike Chambers said:
right, no A20 line on the 8086 - however, if you were to put a line from the keyboard controller to the memory bus you should still be able to access the first 64 KB above 1 MB regardless of whether the CPU realizes it or not. but yeah, himem isn't designed to work that way.

Well, you'd be able to access a whole other megabyte, but the trouble is that it would be empty - all your code would still be in the first megabyte. So it would crash immediately after you toggled the A20 line to high just running a bunch of "ADD [BX+SI],AL".

What you need is a way to access the >1Mb memory (for putting some data and/or code there) whilst still allowing the CPU to access the first megabyte for running the code that populates the >1Mb memory. So what you can do (on an 8088/8086) is make some addresses map to the first megabyte and some addresses map to different parts of the larger physical memory depending on some IO port bits. Then you've just reinvented EMS!
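
As a rough illustration of that idea, an emulator could bank-switch a small window of the first megabyte. Everything here is an assumption made for the sketch: the page frame at 0xD0000, four 16 KB windows, 64 pages of banked memory, and the function names. A real EMS board and driver would define their own ports and layout.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE    0x4000u    /* 16 KB per window */
#define FRAME_BASE   0xD0000u   /* hypothetical page frame location */
#define NUM_WINDOWS  4u
#define NUM_PAGES    64u        /* 1 MB of banked memory; arbitrary size */

static uint8_t conventional[0x100000];
static uint8_t banked[NUM_PAGES * PAGE_SIZE];
static uint8_t window_page[NUM_WINDOWS];   /* set through I/O port writes */

/* Hypothetical I/O handler: select which banked page a window shows. */
static void ems_port_write(unsigned window, uint8_t page)
{
    assert(window < NUM_WINDOWS && page < NUM_PAGES);
    window_page[window] = page;
}

/* Map a 20-bit CPU address to the byte that backs it. */
static uint8_t *resolve(uint32_t addr)
{
    addr &= 0xFFFFFu;
    if (addr >= FRAME_BASE && addr < FRAME_BASE + NUM_WINDOWS * PAGE_SIZE) {
        uint32_t rel = addr - FRAME_BASE;
        uint32_t window = rel / PAGE_SIZE;
        uint32_t page = window_page[window];
        return &banked[page * PAGE_SIZE + rel % PAGE_SIZE];
    }
    return &conventional[addr];   /* everything else is ordinary memory */
}

static uint8_t mem_read(uint32_t addr)
{
    return *resolve(addr);
}

static void mem_write(uint32_t addr, uint8_t value)
{
    *resolve(addr) = value;
}
```

Code running from conventional memory can then copy data into a window, switch the page, and fill the next one, without the CPU ever seeing an address above 1 MB.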

Mike Chambers · Jun 14, 2011

reenigne said:
Well, you'd be able to access a whole other megabyte, but the trouble is that it would be empty - all your code would still be in the first megabyte. So it would crash immediately after you toggled the A20 line to high just running a bunch of "ADD [BX+SI],AL".

What you need is a way to access the >1Mb memory (for putting some data and/or code there) whilst still allowing the CPU to access the first megabyte for running the code that populates the >1Mb memory. So what you can do (on an 8088/8086) is make some addresses map to the first megabyte and some addresses map to different parts of the larger physical memory depending on some IO port bits. Then you've just reinvented EMS!

you're right, it wouldn't be able to run any proper code. whoops. down the road, i will look into coming up with an EMS "hardware" interface scheme for fake86, and write a DOS driver for it.

Chuck(G) · Jun 14, 2011

Take a look at the 8086/8088 data sheets. The CPU status indicates which segment register is being used to access memory. In fact, the 8088 can access 4MB of memory if you're willing to submit to a Harvard architecture.

But you could just as easily gate only the ES accesses so that they addressed the second meg of memory. It does work; I've tried it.

pearce_jj · Jun 14, 2011

Been following this thread for a while... I've never heard of the 4MB address space thing before - is this the limit for the total EMS or something else?

Chuck(G) · Jun 14, 2011

Something else. With every memory access, the 8088/8086 outputs the ID of the segment register associated with that access. Most systems, such as the PC and XT, ignore this status. But if you decode the status, you can have a separate 1MB address space for each of the segments addressed using CS, DS, ES and SS.
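
For anyone wanting to try this in an emulator, a minimal sketch follows. The two-bit codes are the S4:S3 status encoding as I recall it from the 8086 data sheet, so verify them there. The names (seg_status, extended_addr, mem, read8) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* S4:S3 segment status codes (assumed from the data sheet; verify). */
enum seg_status {
    SEG_ES = 0,   /* alternate data */
    SEG_SS = 1,   /* stack */
    SEG_CS = 2,   /* code, or no segment */
    SEG_DS = 3    /* data */
};

static uint8_t mem[4u << 20];   /* four separate 1 MB spaces, 4 MB total */

/* Use the segment ID as two extra address bits above the usual 20. */
static uint32_t extended_addr(enum seg_status seg, uint16_t segval,
                              uint16_t off)
{
    uint32_t addr20 = (((uint32_t)segval << 4) + off) & 0xFFFFFu;
    return ((uint32_t)seg << 20) | addr20;
}

static uint8_t read8(enum seg_status seg, uint16_t segval, uint16_t off)
{
    uint32_t addr = extended_addr(seg, segval, off);
    assert(addr < sizeof mem);
    return mem[addr];
}
```

The same segment:offset pair then reaches a different megabyte depending on which segment register the access went through. Gating only ES accesses into a second megabyte would be the same idea with a single extra address bit.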

pearce_jj · Jun 14, 2011

Interesting indeed. I wonder why this was never explored - I guess simply that the 286 was already in production when the XT shipped so there was no point.

eeguru · Jun 14, 2011

a) "640K is enough for everyone"

b) Pure Harvard architectures are very limiting without immediately addressable static data

Chuck(G) · Jun 14, 2011

eeguru said:
a) "640K is enough for everyone"

Hmmm, the first time I heard the quote it was "65K". Still, it's probably a valid observation.

And I experienced a similar quote before BillG's quip. I was asked to participate in a review of a new supercomputer architecture in the early 70s that was being pitched to the government (they didn't bite). As this was nothing but a huge number-crunching box, memory protection was very simplistic. When the system switched into executive mode, you were given a fixed 16 KW of code space to stick all of your privileged OS code. When I objected to the inflexibility, the designer responded that if I couldn't fit an executive into 16 KW, I should be selling shoes.

BillG's observation of 640K/64K/65K (however it's reported) is still perfectly valid if you ask "64K what?" I could be very happy with 64K words of a terabit each, or 64K petabyte-sized pages...

b) Pure Harvard architectures are very limiting without immediately addressable static data

Pure Harvard architectures are extremely rare if you argue that immediate-operand instructions violate the model. Otherwise, they're more common than you'd think. Consider the small 6- and 8-pin PICs. If you need a lookup table, you do it with RETLW instructions.

But that's not what I was talking about. The x86 family still allows for segment override prefixes and they're used quite a bit.
