80x86 16-bit Compiling How-to
=============================
by Alexei A. Frounze
July the 4th, 2004

Table of Contents
=================
* Introduction
* Reviewing Memory Addressing in Real Mode of 80x86 CPU
* From 8080/8085 to 8086
* 16 or 20 Bits? Meet the Segment:Offset Pair!
* More Than 1 MB?
* Which Segment Register?
* Memory Models Employed by Realmode Compilers
* NASM, assembler
* Compiling with Open Watcom C/C++
* Important details on Open Watcom C/C++ compiler
* Calling Conventions and Register Conventions
* Work In Progress

Introduction
============
The need for making 16-bit code is primarily due to the following
facts:

* An 80x86 CPU starts up in the real mode, employing its 16-bit
addressing scheme
* An 80x86 PC BIOS (which is what the CPU starts executing after
reset/power on) is mostly 16-bit and cannot be easily used in
32-bit protected mode of the 80386+ CPU
* To load the OS kernel from a disk (floppy or hard) it's natural to
use the BIOS, when no other I/O drivers are available
* To change screen modes, perform power management, etc., it's also
natural to use the BIOS functionality (for the same reason as
above)

So, one would want 16-bit real mode code running on the 80x86 PC to
take advantage of the BIOS and/or to prepare the switch to the
32-bit protected mode of the CPU, as in bootloaders or OS
loaders. For some purposes, pure 16-bit real mode code is enough as
well. And you can compile your own ROM BIOS for an embedded x86-based
system!

Reviewing Memory Addressing in Real Mode of 80x86 CPU
=====================================================
Let's review realmode 80x86 memory addressing.

From 8080/8085 to 8086
======================
The Intel 8086 CPU was derived from the Intel 8080/8085 CPU and
inherited its 16-bit ideas. Although it is 16-bit and somewhat
compatible with the 8080/8085, the 8086 CPU has an enhanced memory
addressing mechanism that isn't limited to 16 address lines: the
8086 has a 20-line-wide address bus. So, unlike the 8080/8085 (which
could address up to 2^16 = 65536 bytes of memory, i.e. 64 KB), the
8086 can address up to 2^20 = 1048576 bytes of memory, i.e. 1 MB.

Now, let's see how Intel implemented memory addressing...

An 8080/8085 accesses its 64 KB of memory using direct and indirect
forms of address specification in the CPU instructions.

For example:

Instruction: LDA 2050H
Action:
Load A (8-bit accumulator register) with byte from memory location 2050H.

Instruction: LHLD 0A00H
Action:
Load HL (16-bit register) with word from memory location 0A00H (byte
at 0A00H would go to L (least significant half of HL) and byte at
0A01H would go to H (most significant half of HL)).

Instruction: MOV A, M
Action:
Load A (8-bit accumulator register) with byte from memory location
specified in the 16-bit register HL (M designates accessing memory
indirectly through the HL register).

Instruction: LDAX B
Action:
Load A (8-bit accumulator register) with byte from memory location
specified in the 16-bit register BC.

Hence, it's very simple with 8080/8085. Either the 16-bit address is
a constant value encoded in the CPU instruction and the memory
location is accessed directly by using the encoded address (this is
direct addressing) or the 16-bit address is contained in a 16-bit
register of the CPU (BC or HL in our examples) and this address is
read from the register before accessing a memory location by this
address (this is indirect addressing).

Now, the 8086 can do the same thing...

Instruction: MOV AL, [2050H]
Action:
Load AL (least significant half of 16-bit accumulator register AX)
with byte from memory location 2050H.

Instruction: MOV BX, [0A00H]
Action:
Load BX (16-bit register) with word from memory location 0A00H (byte
at 0A00H would go to BL (least significant half of BX) and byte at
0A01H would go to BH (most significant half of BX)).

Instruction: MOV AL, [BX]
Action:
Load AL (least significant half of 16-bit accumulator register AX)
with byte from memory location specified in the 16-bit register BX.

Instruction: LODSB
Action:
Load AL (least significant half of 16-bit accumulator register AX)
with byte from memory location specified in the 16-bit register SI,
then advance SI by 1 (or move it back by 1, depending on the
direction flag).

Same thing.
Almost...

16 or 20 Bits? Meet the Segment:Offset Pair!
============================================
Do you remember that the 8086 was said to have a 20-bit-wide address bus?

You surely do, don't you?

Then how come the four 8086 instructions above specify only 16 bits
of the address?

Where's the leftover, the 4 other bits to make it 20-bit? :)

The fun part is that there's one special register involved, the DS
register (data segment register). The DS register is also a 16-bit
register.

The value of the DS register is combined with the 16-bit address
specified in the instruction. The combination is a bit tricky: the
DS value is shifted left by 4 binary positions (or, equivalently,
multiplied by 16) and then added to the 16-bit address specified in
the instruction.

Example:

BX=341BH
DS=123AH
MOV AL, [BX]

would load the AL register with a byte from memory location 123AH *
16 + 341BH = 123A0H + 341BH = 157BBH. 157BBH is the physical 20-bit
address that is placed on the address bus so that the memory value at
this address can be transferred to the CPU (or in the other
direction, from the CPU to memory, with e.g. MOV [BX], AL).

Really simple.

An address of the form 123AH:341BH is referred to as a logical
address.

The part before the colon is referred to as the segment part of the
address (or, for short, just the segment). The part after the colon
is referred to as the offset part of the address (for short, just the
offset, or sometimes the displacement).

segment:offset pair is a logical address
segment * 16 + offset = physical address

So, with a constant value of segment (say, constant DS; other segment
registers can be used as well) but with different values of offset,
we can address up to 2^16 = 65536 bytes = 64 KB of memory starting at
the physical address equal to segment * 16. This 64 KB region of
memory is also referred to as a segment. Right, the same word is often
used to refer to different things, and smart guys are known to do it
all the time. :) This is important to remember if you're new to this
addressing stuff and its terminology. Hopefully, you'll be able to
deduce from context what "segment" stands for.

By changing the segment value (say, the DS value) and the offset value,
we can generate all the physical addresses from 0 up to 2^20 - 1, but
this is not the upper bound. Technically, if we take segment=0FFFFH
and offset=0FFFFH, we end up with physical address 0FFFF0H + 0FFFFH =
10FFEFH, which needs 21 bits to be represented. The 8086 CPU has only
20 address lines, so such an address loses its most significant bit
and wraps around past zero: in this example, the 8086 CPU would
access the byte at physical address 0FFEFH instead of 10FFEFH.
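The translation rule and the 20-bit wraparound can be modeled in a few
lines of C. This is only a sketch of what the 8086 places on its
address bus; the function name phys_8086 is made up for this
illustration, and the 20-bit mask stands in for the missing 21st
address line.

```c
#include <stdint.h>

/* Hypothetical helper: physical address generated by an 8086 for
   segment:offset. The shift by 4 is the "multiply by 16"; the mask
   keeps only the 20 bits carried by address lines A0..A19, which
   reproduces the wraparound past 0FFFFFH. */
static uint32_t phys_8086(uint16_t segment, uint16_t offset)
{
    uint32_t sum = ((uint32_t)segment << 4) + offset; /* up to 10FFEFH (21 bits) */
    return sum & 0xFFFFFu;                            /* drop bit 20 */
}
```

With this sketch, phys_8086(0x123A, 0x341B) gives 0x157BB, and
phys_8086(0xFFFF, 0xFFFF) gives 0x0FFEF rather than 0x10FFEF.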

It is important to mention that many different logical addresses can
map to the same physical address. This is a consequence of the way
the segment:offset pair is transformed into the final, physical,
address: increasing the segment by 1 while decreasing the offset by
16 yields the same sum.

Just an example:

123AH * 16 + 341BH = 123A0H + 341BH = 157BBH
1239H * 16 + 342BH = 12390H + 342BH = 157BBH
143AH * 16 + 141BH = 143A0H + 141BH = 157BBH
...
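Because the mapping is many-to-one, a common trick is to normalize a
logical address so that its offset always lies in the range 0..0FH.
The sketch below (the type and function names are illustrative only)
normalizes a pair and checks that the three pairs from the example
above all refer to the same byte. It ignores the 20-bit wraparound, so
it assumes segment * 16 + offset stays below 100000H.

```c
#include <stdint.h>

/* Illustrative segment:offset pair, not taken from any library. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} logical_addr;

/* Physical address without wraparound (assumes the sum fits in 20 bits). */
static uint32_t to_physical(logical_addr a)
{
    return ((uint32_t)a.seg << 4) + a.off;
}

/* Normalized form: offset in 0..0FH, the rest moved into the segment. */
static logical_addr normalize(logical_addr a)
{
    uint32_t phys = to_physical(a);
    logical_addr n;
    n.seg = (uint16_t)(phys >> 4);
    n.off = (uint16_t)(phys & 0xFu);
    return n;
}

/* Two logical addresses refer to the same byte if their physical
   addresses are equal. */
static int same_location(logical_addr a, logical_addr b)
{
    return to_physical(a) == to_physical(b);
}
```

All three example pairs normalize to 157BH:000BH.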

More Than 1 MB?
===============
With the introduction of the Intel 80286 CPU, the number of address
lines was extended to 24, so on the 80286 you can access memory above
the 1 MB mark by using the segment:offset pair. Only FFF0H = 65520
bytes (almost 64 KB) above 1 MB can be accessed this way (physical
addresses 100000H through 10FFEFH). But this is only possible if the
A20 address line is enabled (the 8086 had only the A0 through A19
lines). For compatibility with 8086 PCs, PC engineers added a
programmable hardware mechanism to 80286+ based PCs to enable and
disable the A20 address line, so that the address wraparound remains
possible just like on the 8086. When A20 is disabled, both the 10FFEFH
and 0FFEFH physical addresses generated by an 80286+ CPU appear to the
memory as physical address 0FFEFH, i.e. address bit 20 (A20) is
always 0.
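The effect of the A20 gate can be modeled by making the mask depend on
whether A20 is enabled. This is a sketch of the address seen by memory
on an 80286+ PC in real mode, not of how the gate is actually toggled;
the function name and the a20_enabled flag are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a real-mode physical address on an 80286+ PC.
   With A20 enabled, bit 20 of the sum survives, so FFFF:0010 through
   FFFF:FFFF reach 100000H..10FFEFH. With A20 disabled, bit 20 is
   forced to 0 and the address wraps around just like on the 8086. */
static uint32_t phys_286_realmode(uint16_t seg, uint16_t off, bool a20_enabled)
{
    uint32_t sum = ((uint32_t)seg << 4) + off;   /* at most 10FFEFH */
    return a20_enabled ? sum : (sum & ~(UINT32_C(1) << 20));
}
```

The amount of memory reachable above 1 MB follows directly:
10FFEFH - 100000H + 1 = FFF0H bytes.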

We won't discuss the details of A20 enabling and disabling here
because it's off-topic.

For now, let's just mention that in the protected mode of the Intel
80286+ and 80386+ CPUs, it's possible to access much more memory
than 1 MB. The 80286 can access up to 16 MB of memory, and the 80386
and 80486 can access up to 4 GB. Later CPUs can access even more (for
example, the Pentium Pro introduced 36-bit physical addressing).
That's it about protected mode for now.

Which Segment Register?
=======================
OK. Let's get back to the segment registers... In fact, the 8086 CPU
always uses some segment register to read code/data from memory or
write data to memory.

The instructions executed by the 8086 CPU are sequentially read from
memory using the CS:IP pair of CPU registers (CS is the Code Segment
register, IP is the Instruction Pointer register). As each
instruction is executed, IP is advanced by the length of that
instruction so that the next instruction can be fetched and executed.
IP can also be changed by the near jump, call and return
instructions, i.e. control is transferred within the 64 KB segment
starting at the physical address equal to CS * 16. The far jump, call
and return instructions modify both IP and CS and make it possible to
transfer control to any part of a program anywhere in the 1 MB of
addressable memory. Interrupt and return-from-interrupt instructions
always modify CS and IP, similarly to far call and return
instructions.

The 8086 CPU stack is organized with the SS:SP pair of registers (SS
is the Stack Segment register, SP is the Stack Pointer register). SP
decrements by 2 before a 16-bit word is stored on the stack, and
conversely increments by 2 after a 16-bit word is removed from the
stack. All interrupt, call and return instructions change SP but do
not affect SS.
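A minimal C model of PUSH and POP of a 16-bit word, following the rule
just described: SP is decremented by 2 before the store and
incremented by 2 after the load. The stack segment is modeled as a
64 KB byte array, and all names here are hypothetical. Because SP is a
16-bit register, it wraps around within the segment: pushing with
SP=0 stores the word at offset 0FFFEH.

```c
#include <stdint.h>

/* Hypothetical model of one 64 KB stack segment plus the SP register. */
typedef struct {
    uint8_t  mem[65536];   /* bytes at SS:0000 .. SS:FFFF */
    uint16_t sp;
} stack16;

/* PUSH: decrement SP by 2 first, then store the word (low byte first). */
static void push16(stack16 *s, uint16_t value)
{
    s->sp = (uint16_t)(s->sp - 2);                          /* wraps in the segment */
    s->mem[s->sp] = (uint8_t)(value & 0xFF);                /* low byte at SS:SP */
    s->mem[(uint16_t)(s->sp + 1)] = (uint8_t)(value >> 8);  /* high byte */
}

/* POP: load the word at SS:SP, then increment SP by 2. */
static uint16_t pop16(stack16 *s)
{
    uint16_t value = (uint16_t)(s->mem[s->sp] |
                                (s->mem[(uint16_t)(s->sp + 1)] << 8));
    s->sp = (uint16_t)(s->sp + 2);
    return value;
}
```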

Leaving aside instruction fetch (with CS:IP) and stack manipulation
(with SS:SP), the interesting thing is how the 8086 CPU transfers
data between itself and memory using direct and indirect addressing
with registers other than IP and SP. It might look a bit complicated,
but here's how it works...

The 8086 CPU registers are:

* AH/AL = AX
* BH/BL = BX
* CH/CL = CX
* DH/DL = DX
* FLAGS
* DI
* SI
* BP
* SP
* IP
* ES
* DS
* SS
* CS

For completeness, here is a description of the 8086 CPU registers:

Register: AX
Description:
16-bit Accumulator register, least and most significant halves (AL
and AH respectively) are separately accessible. Most suited
for/dedicated to the ALU operations and I/O.

Register: BX
Description:
16-bit Base register, least and most significant halves (BL and BH
respectively) are separately accessible. Can be used as indirect
address register when accessing memory.

Register: CX
Description:
16-bit Counter register, least and most significant halves (CL and CH
respectively) are separately accessible. Can be used to organize
loops and repeat string instructions.

Register: DX
Description:
16-bit Data register, least and most significant halves (DL and DH
respectively) are separately accessible. Used in some special ALU and
I/O operations.

Register: FLAGS
Description: 16-bit Flags register. Contains control/status flags.

Register: IP
Description:
16-bit Instruction Pointer register. Points to an instruction to be
executed.

Register: SP
Description:
16-bit Stack Pointer register. Points to the last 16-bit word pushed
to the stack.

Register: BP
Description:
16-bit Base Pointer register. Can be used as indirect address
register when accessing memory (handy for stack memory accesses).

Register: SI
Description:
16-bit Source Index register. Can be used as indirect address
register when accessing memory (used by string instructions).

Register: DI
Description:
16-bit Destination Index register. Can be used as indirect address
register when accessing memory (used by string instructions).

Register: CS
Description:
16-bit Code Segment register. Selects the 64 KB region of memory,
from which instructions are fetched and executed by the CPU.

Register: SS
Description:
16-bit Stack Segment register. Selects the 64 KB region of memory,
where the CPU stack is located.

Register: DS
Description:
16-bit Data Segment register. Selects the 64 KB region of memory,
with which most of memory reads and writes are done.

Register: ES
Description:
16-bit Extra data Segment register. Selects an additional 64 KB
region (additional to one selected by DS) of memory, with which more
memory reads and writes can be done. Used by string instructions that
work with DI.

Now, having introduced all of the 8086 CPU registers, let's see how
we can use them for indirect memory addressing. What if I want to use,
say, register SI to indirectly address memory? Which segment register
will be used by default in this case? The table below lists all of
the memory addressing modes and the default data segment register
used in each of them.

Addressing Mode: Direct/Displacement
Address Operand Format: [displacement/offset/label/whatever you call it]
Default Segment Register: DS

Addressing Mode: Indirect
Address Operand Format: [BX]
Default Segment Register: DS

Addressing Mode: Indirect
Address Operand Format: [BP]
Default Segment Register: SS

Addressing Mode: Indirect
Address Operand Format: [SI]
Default Segment Register: DS

Addressing Mode: Indirect
Address Operand Format: [DI]
Default Segment Register: DS (ES for string instructions)

Addressing Mode: Indirect+Displacement
Address Operand Format: [BX+displacement]
Default Segment Register: DS

Addressing Mode: Indirect+Displacement
Address Operand Format: [BP+displacement]
Default Segment Register: SS

Addressing Mode: Indirect+Displacement
Address Operand Format: [SI+displacement]
Default Segment Register: DS

Addressing Mode: Indirect+Displacement
Address Operand Format: [DI+displacement]
Default Segment Register: DS

Addressing Mode: Double Indirect+Displacement
Address Operand Format: [BX][SI]+displacement
Default Segment Register: DS

Addressing Mode: Double Indirect+Displacement
Address Operand Format: [BX][DI]+displacement
Default Segment Register: DS

Addressing Mode: Double Indirect+Displacement
Address Operand Format: [BP][SI]+displacement
Default Segment Register: SS

Addressing Mode: Double Indirect+Displacement
Address Operand Format: [BP][DI]+displacement
Default Segment Register: SS

Notes:

* displacement is a constant 8/16-bit value.
* [reg] means that a memory location is being indirectly accessed
through the register reg. The memory address (offset) is contained
in the register reg.
* [reg+displacement] means that a memory location is being indirectly
accessed through the register reg. The memory address (offset) is
the sum of the register reg value and the displacement value.
* [reg1][reg2]+displacement means that a memory location is being
indirectly accessed through the two registers reg1 and reg2. The
memory address (offset) is the sum of the values of the registers
reg1 and reg2 and the displacement value. That is, all three values
are added together to form the offset.

To summarize:

* Wherever the BP register is used as an indirect address register,
SS is used as the default segment register to make up the physical
address
* Wherever the DI register is used by a string instruction, it's used
together with the ES segment register
* In all other cases, DS is used as default segment register for
accessing data
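The table and the summary above can be condensed into a small
function that, given the registers used by a memory operand, computes
the 16-bit offset and picks the default segment register. This is an
illustrative sketch (the enum, struct and function names are invented
here); it covers the non-string addressing modes only, so DI always
defaults to DS, and it models an explicit segment override prefix as
an optional argument.

```c
#include <stdint.h>

typedef enum { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_NONE } seg_reg;
typedef enum { BASE_NONE, BASE_BX, BASE_BP } base_reg;
typedef enum { INDEX_NONE, INDEX_SI, INDEX_DI } index_reg;

/* Only the registers that can take part in an address. */
typedef struct {
    uint16_t bx, bp, si, di;
} regs16;

/* Offset = base + index + displacement, truncated to 16 bits. */
static uint16_t effective_offset(const regs16 *r, base_reg b, index_reg i,
                                 uint16_t disp)
{
    uint16_t off = disp;
    if (b == BASE_BX)  off = (uint16_t)(off + r->bx);
    if (b == BASE_BP)  off = (uint16_t)(off + r->bp);
    if (i == INDEX_SI) off = (uint16_t)(off + r->si);
    if (i == INDEX_DI) off = (uint16_t)(off + r->di);
    return off;
}

/* Default segment: SS whenever BP takes part, DS otherwise.
   A segment override prefix (e.g. CS:) replaces the default. */
static seg_reg segment_for(base_reg b, seg_reg override)
{
    if (override != SEG_NONE)
        return override;
    return (b == BASE_BP) ? SEG_SS : SEG_DS;
}
```

For example, MOV AL, CS:MyTable[BP][SI] corresponds to
segment_for(BASE_BP, SEG_CS), which yields CS instead of the default SS.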

If you need to override the use of the default segment register, you
can explicitly specify the segment register to use, like so:

MOV AL, CS:MyTable[BP][SI] or
MOV AL, [CS:BP+SI+MyTable]

whichever format is supported by your assembler.

The prefix, consisting of segment name and colon, overrides the
default segment register to the one specified before the colon.

Memory Models Employed by Realmode Compilers
============================================
The following sections summarize the most common memory models
employed by 16-bit realmode 80x86 compilers.

Near pointers (in real mode) are 16-bit pointers, consisting only of
a 16-bit offset. The default segment register (CS for code, DS/SS for
data/stack) is assumed to be constant. Near pointers are small and
fast, and need less code to handle.

Far pointers (in real mode) are 32-bit pointers, consisting of both
16-bit parts, segment and offset. Far pointer increment/decrement
usually doesn't affect the segment part of the far pointer. Far
pointers are larger and slower, and need more code to handle.

It is problematic to access objects or arrays bigger than 64 KB with
either near or far pointers in HLL (C/C++) compilers, because this
requires a manual implementation of far pointer arithmetic.
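To see why objects larger than 64 KB are awkward, compare how a plain
far pointer and a normalized ("huge"-style) pointer advance. The sketch
models a far pointer as a struct: incrementing it only touches the
offset, so stepping past offset 0FFFFH silently wraps to the start of
the same segment, while the normalized variant carries into the
segment. The names and the struct layout are assumptions for
illustration, not any compiler's actual representation.

```c
#include <stdint.h>

/* Illustrative far pointer: 16-bit segment plus 16-bit offset. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} farptr;

/* Physical address on an 8086 (20-bit wraparound). */
static uint32_t farptr_phys(farptr p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Plain far pointer arithmetic: only the offset changes, so the
   pointer wraps around inside its 64 KB segment. */
static farptr far_add(farptr p, uint16_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Normalized arithmetic: work on the physical address and rebuild the
   pair with an offset of 0..0FH, so the segment absorbs the carry. */
static farptr huge_add(farptr p, uint32_t n)
{
    uint32_t phys = (farptr_phys(p) + n) & 0xFFFFFu;
    farptr r;
    r.seg = (uint16_t)(phys >> 4);
    r.off = (uint16_t)(phys & 0xFu);
    return r;
}
```

Starting from 1000H:0FFFFH, far_add(p, 1) lands on physical address
10000H (back at the start of the segment), while huge_add(p, 1)
correctly reaches 20000H.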

Tiny Memory Model
=================
< 64 KB code segment size, near pointer type
< 64 KB data segment size, near pointer type

Use the tiny model for small size applications.

All four segment registers (CS, DS, ES, SS) are set to the same
address, so you have a total of 64 KB for all of your code, data, and
stack. Near pointers are always used.

Tiny model programs can be compiled to .COM format.

SS=ES=DS=CS, always

Small Memory Model
==================
< 64 KB code segment size, near pointer type
< 64 KB data segment size, near pointer type

Use the small model for average size applications.

The code and data segments are different and don't overlap, so you
have 64 KB of code and 64 KB of data and stack. Near pointers are
always used.

SS=DS, usually

Medium Memory Model
===================
< 1 MB code segment size, far pointer type
< 64 KB data segment size, near pointer type

The medium model is best for large programs that don't keep much data
in memory.

Far pointers are used for code but not for data. As a result, data
plus stack are limited to 64 KB, but code can occupy up to 1 MB.

SS=DS, usually

Compact Memory Model
====================
< 64 KB code segment size, near pointer type
< 1 MB data segment size, far pointer type

Use Compact model if your code is small but you need to address a lot
of data.

The opposite of the medium model is true for the compact model: far
pointers are used for data but not for code; code is then limited to
64 KB, while data has a 1 MB range.

All functions are near by default and all data pointers are far by
default.

SS!=DS, usually

Large Memory Model
==================
< 1 MB code segment size, far pointer type
< 1 MB data segment size, far pointer type

Use the large model only for very large applications.

Far pointers are used for both code and data, giving both a 1 MB
range. All functions and data pointers are far by default.

SS!=DS, usually

Huge Memory Model
=================
< 1 MB code segment size, far pointer type
< 1 MB data segment size, far pointer type

Use the huge model for very large applications only. Far pointers are
used for both code and data. Turbo C++ normally limits the size of
all static data to 64 KB; the huge memory model sets aside that
limit, allowing static data to occupy more than 64 KB.

The huge model allows multiple data segments (each 64 KB in size),
up to 1 MB for code, and 64 KB for stack. All functions and data
pointers are assumed to be far.

SS!=DS, usually

NASM, assembler
===============
Useful options and commands:

* -f obj
Generates an Intel/OMF .OBJ object output file (compatible with
Borland/Turbo C/C++/Pascal compilers) from the specified file.
* -g -F borland
Generates Borland debug information in the .OBJ file (useful for TD
only).
* -D<macro>[=<value>]
Predefines a macro.
* -U<macro>
Undefines a macro.

Compiling with Open Watcom C/C++
================================

Important details on Open Watcom C/C++ compiler
===============================================
Code is placed in the _TEXT segment, with class CODE and the USE16
attribute.

By default, the data group DGROUP consists of the CONST, CONST2,
_DATA and _BSS segments. The compiler places certain types of data in
each segment.

The CONST segment (of class DATA) contains constant literals that
appear in your source code.

Example:

char* birds[3] = {"robin", "finch", "wren"};
printf ("Hello world\n");

In the above example, the strings "Hello world\n", "robin", "finch",
etc. appear in the CONST segment.

The CONST2 segment (of class DATA) contains initialized read-only data.

The _DATA segment (of class DATA) contains initialized writable data.

Example:

const int cvar = 1;
int var = 2;
int table[5] = {1, 2, 3, 4, 5};
char* birds[3] = {"robin", "finch", "wren"};

In the above example, the constant variable "cvar" is placed in the
CONST2 segment, "var", "table" and "birds" are placed in the _DATA
segment. Finally, the strings "robin", "finch", "wren" are placed in
the CONST segment.

The _BSS segment (of class BSS) contains uninitialized data such as
scalars, structures or arrays.

Example:

int var1;
int array1[400];

For the tiny model (.COM format), the _TEXT, _DATA, CONST, CONST2 and
_BSS segments are grouped together into the DGROUP group. The _TEXT
segment must have an "ORG 100h" or equivalent ("RESB 100h" if NASM is
used) directive so that a .COM file can be produced. The _TEXT segment
must be the first in the DGROUP group.

For the small model (.EXE format), only the _DATA, CONST, CONST2 and
_BSS segments are grouped together into the DGROUP group. The .EXE
stack segment (named _STACK, with the STACK attribute and class
STACK) is either small and unused (the application instead
initializes SS:SP to point to the end of the data segment, DGROUP, so
that SS=DS=DGROUP) or big enough to be usable (and then also grouped
into DGROUP, so that SS=DS=DGROUP).

Some arithmetic operations (such as long multiplication and division)
are implemented as library routines that must additionally be linked
with your program.
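Why such routines are needed can be illustrated in portable C: the
8086 MUL instruction multiplies two 16-bit values into a 32-bit result
in DX:AX, so a 32-bit by 32-bit multiplication (keeping the low 32
bits, as C's unsigned long multiplication does) has to be assembled
from 16-bit partial products. This is a sketch of the decomposition
only, not the actual Watcom runtime routine; both function names are
hypothetical.

```c
#include <stdint.h>

/* Stand-in for the 8086 MUL r/m16 instruction: 16 x 16 -> 32 bits. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* Hypothetical helper: low 32 bits of a 32 x 32-bit product, built
   from three 16-bit multiplications. With a = ah:al and b = bh:bl,
   a * b mod 2^32 = al*bl + ((al*bh + ah*bl) << 16); the ah*bh term
   only affects bits 32 and up, so it is dropped. */
static uint32_t mul32_from16(uint32_t a, uint32_t b)
{
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
    uint16_t bl = (uint16_t)b, bh = (uint16_t)(b >> 16);
    uint32_t low   = mul16x16(al, bl);
    uint32_t cross = mul16x16(al, bh) + mul16x16(ah, bl); /* wraps mod 2^32 */
    return low + (cross << 16);
}
```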

By default, the compiler uses register-based argument passing
(unlike Turbo C++). This register convention isn't covered here, but
it can presumably be deduced from the generated code and from the
assembler source code of the Watcom standard C library.

By default, the compiler appends an underscore character to function
and variable names when compiling C/C++ code to an object file, e.g.
"void MyFunction()" would have the name "MyFunction_" in the object
file. Therefore, any external assembly functions must be written with
this in mind. An assembler function must have a name with a trailing
underscore to be accessible from C/C++, e.g. the asm name "MyAsmFxn_"
will be visible to C/C++ code as, say, "extern void MyAsmFxn()". And
of course, if MyAsmFxn() needs to call MyFunction(), it must
"call MyFunction_", because in the object files the C/C++ names have
the trailing underscore.

Note: the additional underscore character in function/variable names
appears at different positions in Borland/Turbo C/C++ and Open Watcom
C/C++. By default, Borland/Turbo adds a leading underscore, while
Watcom adds a trailing underscore.

It is, however, possible to generate code with stack-based argument
passing and to link Watcom-compiled code with code whose functions
have a leading underscore in the object files. For this, there's a
special reserved keyword, cdecl (it may also be spelled _cdecl or
__cdecl). Functions declared as, say, int cdecl fxn (int x); are
compiled for stack-based argument passing, and the additional
underscore in the name appears in front of the C name, e.g. _fxn.
This (cdecl) calling and naming convention is the same as the one
used by the Turbo C++ compiler.
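The two function naming rules can be summarized in a tiny sketch that
produces the object-file symbol for a C function name. It covers only
the two cases described above (the Watcom default with a trailing
underscore, and cdecl with a leading one); the enum and function
names are made up here, and real compilers support more conventions
and options than this.

```c
#include <stdio.h>

typedef enum { CONV_WATCOM_DEFAULT, CONV_CDECL } name_conv;

/* Writes the object-file symbol for C function `name` into buf.
   Returns the length of the symbol, as reported by snprintf. */
static int object_symbol(char *buf, size_t size, const char *name,
                         name_conv conv)
{
    if (conv == CONV_CDECL)
        return snprintf(buf, size, "_%s", name);  /* leading underscore */
    return snprintf(buf, size, "%s_", name);      /* trailing underscore */
}
```

So "MyFunction" becomes "MyFunction_" under the Watcom default, and a
cdecl "fxn" becomes "_fxn", which is the name an assembly module has
to use.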

Calling Conventions and Register Conventions
============================================
When calling a function that uses the cdecl convention, the following
is pushed onto the stack, in the specified order:

* function arguments from last to first (notice the reverse order!),
* the return address.

The called function never removes its arguments from the stack when
returning to the caller. The caller pushes the arguments onto the
stack and removes them after the call.

8-bit arguments are extended to 16 bits when pushed onto the stack.
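To make the layout concrete, here is a small simulation of what the
stack holds right after a near call such as fxn(1, 2) under this
convention, and where a function that starts with the usual
PUSH BP / MOV BP, SP prologue finds its arguments. The prologue is a
common pattern rather than something the convention requires. The
model assumes a near call (a 2-byte return address) and 16-bit int
arguments; a far call also pushes CS, which moves every argument 2
bytes further. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: a 64 KB stack segment viewed as 16-bit words. */
typedef struct {
    uint16_t words[32768];   /* SS:0000..SS:FFFF, word k at offset 2*k */
    uint16_t sp;             /* kept even in this model */
} stack_model;

static void push(stack_model *s, uint16_t v)
{
    s->sp = (uint16_t)(s->sp - 2);
    s->words[s->sp / 2] = v;
}

static uint16_t word_at(const stack_model *s, uint16_t offset)
{
    return s->words[offset / 2];
}

/* Caller: push arguments last to first, then CALL pushes the return IP.
   Callee prologue: PUSH BP; MOV BP, SP. Returns the resulting BP. */
static uint16_t simulate_near_call(stack_model *s, uint16_t a, uint16_t b,
                                   uint16_t ret_ip, uint16_t caller_bp)
{
    push(s, b);          /* last argument is pushed first */
    push(s, a);          /* first argument ends up at the lowest address */
    push(s, ret_ip);     /* CALL pushes the return address */
    push(s, caller_bp);  /* callee: PUSH BP */
    return s->sp;        /* callee: MOV BP, SP */
}
```

Inside the function, [BP+2] is the return address, [BP+4] the first
argument and [BP+6] the second. After the function returns, the
caller discards the two argument words itself, e.g. with ADD SP, 4.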

Function return values are placed into:
* AL (8-bit value), or
* AX (16-bit value), or
* both DX and AX (32-bit value: the most significant half goes to DX,
the least significant half goes to AX). Pointers in real mode can
be 16-bit (near) or 32-bit (far). The segment part of a far pointer
goes to DX, while the offset part goes to AX.
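As an illustration of this register assignment, the sketch below
splits a 32-bit return value into the DX:AX pair and reassembles it on
the caller's side, with a far pointer handled the same way (segment in
DX, offset in AX). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of the DX:AX register pair. */
typedef struct {
    uint16_t dx;  /* most significant half, or segment */
    uint16_t ax;  /* least significant half, or offset */
} dx_ax;

/* Callee side: a 32-bit value is returned as DX:AX. */
static dx_ax return_long(uint32_t value)
{
    dx_ax r;
    r.dx = (uint16_t)(value >> 16);
    r.ax = (uint16_t)(value & 0xFFFFu);
    return r;
}

/* Callee side: a far pointer is returned with segment in DX, offset in AX. */
static dx_ax return_far_pointer(uint16_t segment, uint16_t offset)
{
    dx_ax r;
    r.dx = segment;
    r.ax = offset;
    return r;
}

/* Caller side: rebuild the 32-bit value from DX:AX. */
static uint32_t caller_reads_long(dx_ax r)
{
    return ((uint32_t)r.dx << 16) | r.ax;
}
```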

A function must preserve the values of:
* DS, SS, BP, SI, DI (remember this when writing functions in
assembler).

* The direction flag (DF, in the FLAGS register) must also be
preserved as 0 (cleared).

* The ES register is not guaranteed to be equal to DS. Set ES to the
value of DS if needed.

The reserved interrupt keyword provides additional entry and exit
code for void(void) functions to make them directly usable as
interrupt service routines. Their addresses can be stored directly
in the interrupt vector table. Remember that the compiler is 16-bit:
the entry and exit code of interrupt functions won't save/restore
the 80386+ 32-bit registers entirely, only their 16-bit halves (AX of
EAX, etc.). The floating point unit state isn't saved/restored
either.

Structure passing and returning by value isn't covered here.
Neither is floating point data. If you need this information, you
may create a C function that makes use of structures or floating
point types and generate assembly source code from it. You may find
most of the answers to your questions by inspecting the generated
assembly source code. I don't usually pass structures by value; for
most things a pointer to a structure is enough. And I don't consider
floating point support in Turbo C++ serious or really helpful for
the kind of stuff OS developers do in the first place.

Work In Progress
================
The work on this document is in progress. Meanwhile, try learning
things from the compiler documentation and the source code provided
here (already available).

If you want to contact me regarding this doc or anything else, please
post a message on Usenet:

news:alt.os.development

Alexei A. Frounze

From: <https://alexfru.narod.ru/os/c16/c16.html>