
80386 D-bit

kerravon

PM32 on the 80386 allows you to set the D-bit on a segment descriptor to say it is 16-bit.

There are some opcodes whose behavior changed between RM16 and PM32. I can probably find an example if required.

If you set the D-bit to 16-bit, do you get the RM16 behavior of those opcodes, or the new PM32 behavior?

Thanks.
 
Wouldn't it be PM16 behavior since the descriptor of 16-bits would only apply in code running in 286 protected mode? Real mode and V86 mode don't have descriptors.
 
Ok, yes, I should have said PM16, not RM16.

Here is an example I was given of the same opcodes being used for quite different things:

Code:
;32-bit code
00000028 89FB    mov ebx,edi
0000002A 8B4720  mov eax,[edi+32]

;16-bit code
00000028 89FB    mov bx,di
0000002A 8B4720  mov ax,[bx+0x20]


With the D-bit set to 16-bit on an 80386, you would expect the "mov bx,di" interpretation to take effect, as that is exactly what the bit is supposed to do - override the default operand size.

But will it ensure that an opcode that (in this case) references edi, instead references bx?
 
Yes, the 16-bit addressing modes are different from those in 32-bit mode. Intel originally chose to only allow certain combinations of base+index, so that all of them could be encoded in 3 bits - and this was apparently more important than allowing any register to be used. So base must be BX or BP (or null), and index must be SI/DI/null.

In 32-bit mode, the short encoding is used to select any single 32-bit register(*), and more complex modes require another 'SIB' byte (scale, index, base).

There are two prefixes that override 16-/32-bit mode: one for operand size (66H) and another for addressing (67H).

So in 16-bit code, it will be interpreted like this:

Code:
8B 47 20        mov ax,[bx+32]
66 8B 47 20     mov eax,[bx+32]
67 8B 47 20     mov ax,[edi+32]
66 67 8B 47 20  mov eax,[edi+32]

And in 32-bit code, the effect of the prefixes is simply reversed.

(*) except ESP, though it can be used as a base with no index when the SIB byte is present. The assembler should take care of encoding details like that.
 
I am getting back to this in another context now.

So in order to get this interpretation:

8B4720 mov ax,[bx+0x20]

it IS INDEED sufficient to set the D-bit in PM32 which will give me double the number of selectors.

The new context I spoke of is - I would like to use CM16 in x64 long mode.

I'm not familiar with that, but I wish to address the maximum amount of memory for my 16-bit code.

And critically - I want that interpretation:

8B4720 mov ax,[bx+0x20]

So, while in long mode, what do I need to do in order to set up a suitable environment, which may or may not be called CM16?

Note that I am NOT talking about legacy mode and I am NOT talking about v8086.

Thanks. Paul.


Edit: I am not (anymore) talking about PM32 either. I MAY be talking about CM32 though, if that is what gives me the most selectors suitable for running the above instruction.
 
Could you describe what you are planning in more detail?

As a starting point (ie for the purpose of this conversation), I wish to run *certain* OS/2 1.0 apps.

I already have a mini OS/2 1.0 clone, which you can find here:


(and surrounds)

I wish to run PDOS-generic 16-bit directly on an x64 UEFI-only computer. No Windows. No Linux. No other flavor of PDOS.

I have a pseudobios designed to be the glue between my totally portable - quibbling aside - PDOS-generic operating system, and whatever environment I have been given.

So I now wish to write that pseudobios for this particular OS matched to this particular hardware.

I have code to switch from 64-bit mode to CM32 already:




although it hasn't yet been fleshed out.

And now I'm wanting to do CM16:


assuming that is technically possible.

Currently I don't know how many selectors I get in long mode either. I was told that a fairly minor change to the above code should get me what I want, ie:

You can append any global descriptors you want to the existing descriptors the same way CM32 code descriptor is added (allocate more memory, copy existing descriptors in, create new descriptors, reload GDT).

Note that the OS/2 1.0 programs I wish to run use just doscalls.dll. There are no interrupts involved. So I "just" need to convert those DLL calls into callbacks via the pseudobios back to UEFI - switching between 64-bit long mode and CM16.

I don't wish to confuse the issue, so I should probably stop here.

However, I also have a PDOS/86 at pdos.org. All of my MSDOS apps are accessed via C wrappers like PosOpenFile. If I detect an 8086, I could do an actual INT 21h. But if I detect that I am on something better (I will have a mechanism to do that), then I can do the same thing that OS/2 1.0 apps would do - a callback.

Thus "properly-written" (TM) apps (which I am just defining now) would work on a modern x64 with 512 MiB of memory, or whatever CM16 can give me when combining GDT+LDT, with a code and data selector for every 64 KiB chunk of memory starting at 0. So I can edit large files using microemacs. And microemacs would potentially still be a valid Euro MSDOS 4.0 (which handles NE) program.

I am predominantly interested in getting huge memory model programs to work. I can potentially get some other memory model programs working too, perhaps even on non-Euro MSDOS (as well as CM16). Huge memory model on non-Euro MSDOS would require some extensions to the MSDOS executable format, as I suggested here:


I'm open to suggestions.
 
BTW, that "mechanism" I spoke of was invented while attempting to port PDOS to the Atari and demonstrated on MVS, with this code:


#if defined(NEED_VSE) || defined(NEED_MVS) || defined(NEED_BIGFOOT)
    os.Xservice = service_call;

#if defined(NEED_BIGFOOT)
    if (memcmp(entry_point + 12, "\x50\x47\x43\x58", 4) == 0)
    {
        *(void **)(entry_point + 20) = &os;
        ascii_flag = 1;
    }
#else
    if (memcmp(entry_point + 4, "PGCX", 4) == 0)
    {
        *(void **)(entry_point + 12) = &os;
    }
#endif

#endif




CSECT
DS 0H
USING *,R15
ENTRY @@CRT0
@@CRT0 DS 0H
#if BIGFOOT
BALR R15,R0
BCTR R15,0
BCTR R15,0
NOPR 0
#endif
B SKIPHDR
DC C'PGCX' # PDOS-generic (or compatible) extension
DC F'4' # length of header data
ENTRY @@PGPARM
* This will be zapped by z/PDOS-generic if running under it
@@PGPARM DC F'0'
*
SKIPHDR DS 0H
#if MVS
STM R14,R12,12(R13)
LR R11,R1
LR R6,R13
L R1,=V(@@PGPARM)
L R1,0(,R1)
LTR R1,R1
BZ NOTPDOS
LA R13,80(,R13)
LR R12,R15
DROP R15
USING @@CRT0,R12
B BYPASS1
 
Yes, long mode is fully compatible with PM16 - the only thing that isn't supported anymore is v86, but as you said, you aren't going to use that.

It's as simple as adding 16-bit code and data segments to the LDT (or GDT, if you really want to). OS/2 style call gates should work too, however I suggest implementing the int21h interface as well. That would allow you to run the same binary both under real mode DOS and PDOS, as long as it's "well behaved". Not many existing DOS programs are, of course, but your own would work in both.

---

I've played around with this idea a bit under Linux, and made a minimal proof-of-concept "DOS compatibility layer". A converter program takes in a specially arranged .EXE (small model, DS must be same as initial SS, no relocations), and produces a .COM file that is only about 800 bytes larger, still runs under DOS, but is at the same time also an ELF binary compatible with 32- and 64-bit Linux.

The tacked-on code is so short since it only supports translating AH=3Fh (read), 40h (write) and 4Ch (exit) to Linux syscalls, but with standard handles redirected by the shell that's already enough to do things. The converter itself was written according to these restrictions, so that I had a demo program to use it on :)

---

it IS INDEED sufficient to set the D-bit in PM32 which will give me double the number of selectors.

Not quite sure what you mean here, but regular descriptor table entries are always 8 bytes long, only the system descriptors take up two of these slots in long mode.
 
Thanks. Sounds like it will work. Note:

My programs can run under MSDOS without implementing int 21h in CM16 so long as the app detects its environment and chooses whether to do an actual interrupt or do a callback to a pseudobios-provided int86 function.

And the other point.

How many selectors are there in PM16, PM32, CM16 and CM32?

Whichever one has the most is the one that I want, as I can address the most memory in a simple, flat setup.

Because even the 32-bit ones are suitable for 16-bit code. I just need to set the D-bit appropriately and I get 16-bit behavior. Right?
 
One of us must be misunderstanding something here...

There's always a maximum of 8192 entries per descriptor table. In long mode, descriptors for LDT, TSS and call gates take up two consecutive entries, but code/data segments don't. The setting of the 'L' and 'D' bits does not affect how many selectors are available.

A code segment with the 'L' bit set will execute in 64-bit mode. The 'D' bit must be clear in that case since that combination is "reserved for future use" (see AMD64 System Programming Guide). When 'L' is clear and 'D' set, it is a 32-bit segment, and 16-bit when both bits are clear.

The default address and operand size is always determined by the code segment. For data segments, the only effect of the 'D' bit is the upper limit for expand-down segments and whether stack operations use SP or ESP.

And I don't get why the program should have to detect its environment and use a call to some "pseudobios". Software interrupts will work in any mode, for example 64-bit Linux continues to provide both INT 80h and the SYSCALL instruction as ways to call into the kernel. And as mentioned before, it even provides enough functionality to emulate INT 21h entirely in userspace, by installing a custom SIGSEGV handler with separate stack. Doing it in the kernel would obviously be more efficient, and something any DOS compatible OS should do, IMO.
 

Thanks for all that info.

I believe PM16 on the 80386 at least has half as many selectors available. So 4096 per table I assume.

I think I have enough info to begin now.
 
Looking at the documentation for both the 286 and 386 indicates there should be a maximum of 8192 selectors available in 16-bit protected mode.
 
Is that a combined GDT+LDT?

It's 8192 for each table. The selector is ANDed with FFF8h to get the offset of the descriptor, while the lowest 3 bits indicate which table to use and what privilege level is requested. RTFM!

That would allow mapping almost a full gigabyte of linear address space to 16-bit programs, using the GDT and LDT combined with paging. Each process would have a separate page map like it does in mainstream OSes, so there's not even a need to reload GDTR/LDTR - they'll always be at the same linear address and can have the limit set to maximum, the pages that don't contain valid entries for the current process would simply be marked as not present.

---

You could load unmodified MZ executables by first marking unknown-type segments as invalid, then using the trap handler to change them into code or data depending on whether the instruction is a control transfer or not. Limit can be determined by the distance to the next higher paragraph number, or to the end of initially allocated memory if it's the highest one.

Extending the format with a table to specify the type and exact limit for each segment seems like a good idea too, but I really don't like adding too much else, especially things specific to one compiler and runtime library. Passing parameters like AHINCR/AHSHIFT should be done in the PSP segment. If they need to be 'zapped' in anywhere, then the program's init code can take care of it.

For memory models other than large/huge, one useful feature would be the ability to specify an expand-down stack segment with a different selector and limit but the same base address as the (full 64K) data segment. Then near pointers in DS would be able to refer to data on the stack with the same offset as in SS, but any access using SS would be confined to be inside the actual stack area at the top of the combined data/stack segment.

DOS always pushes a zero onto the initial stack for historical reasons. Since almost no MZ EXE program depends on it being zero, some other value could be used as a minimally intrusive way to inform the program that it's running in a PM16 environment. Could contain some bitflags, or a pointer to a structure with more information. Or maybe replace the INT 20h at PSP:0000 with the ASCII characters 'PM'?
 

Thanks for the correction.


I traced the source of my misunderstanding of PM16 and have posted this to alt.os.development:



On 2021-07-14 wolfgang kern wrote:



>> I assume I still have 16384 selectors, so I will still be limited
>> to a total of 1 GiB.


> No you can have maximal 8191 selectors


I believe you failed to account for the fact that I can
use the LDT in this scheme too.

So I have a total of approx 16384 selectors available
under PM16.


...
 
My programs can run under MSDOS without implementing int 21h in CM16 so long as the app detects its environment and chooses whether to do an actual interrupt or do a callback to a pseudobios-provided int86 function.

I made a mistake here. It's not a pseudobios-provided int86 function. It's an OS-provided function. With the OS being PDOS/86.

I believe the OS should have the right to tell the app to not insist on doing a real interrupt. Similar to how calling doscalls.dll in OS/2 1.0 gives the OS the flexibility to choose whether real interrupts are used or not.

Yes - I agree there are alternatives. I just don't like them.
 
The default address and operand size is always determined by the code segment. For data segments, the only effect of the 'D' bit is the upper limit for expand-down segments and whether stack operations use SP or ESP.

Assuming that the D bit is set for a stack segment (I assume UEFI has that already), does this mean that in my 16-bit code, I can do the normal:

Code:
mov bp, sp
mov ax, [bp + 6]

And BP is actually EBP, so I don't need to change the UEFI-provided stack, so long as I am lucky enough to get a high value in the lower 16 bits there?

Or even if I'm not lucky, and the lower 16 bits are small, I still get a very large stack, as it handles the wrap on the 64k boundary?

Thanks.
 
I don't get exactly what you're trying to do here, but this approach seems completely unworkable and based on several misunderstandings of the x86 architecture. When I said "stack operations" what I meant was push, pop and other operations that implicitly use SP/ESP.

Also, UEFI assumes a flat address space, there is no "stack segment" provided by it. You'll have to set up your own environment for the 16-bit code, with separate segments defined in the GDT and/or LDT. That code can call into your kernel via interrupt (or some other mechanism, but I wouldn't recommend it), which can then invoke the UEFI services or do whatever else you want.
 