HTML "Adding Application Support for a New Architecture in Plan 9"
TL
Adding Application Support for a New Architecture in Plan 9
AU
Bob Flandrena
SH
Introduction
LP
Plan 9 has five classes of architecture-dependent software:
headers, kernels, compilers and loaders, the
CW libc
system library, and a few application programs.  In general,
architecture-dependent programs
consist of a portable part shared by all architectures and a
processor-specific portion for each supported architecture.
The portable code is often compiled and stored in a library
associated with
each architecture.  A program is built by
compiling the architecture-specific code and loading it with the
library.  Support for a new architecture is provided
by building a compiler for the architecture, using it to
compile the portable code into libraries,
writing the architecture-specific code, and
then loading that code with
the libraries.
LP
This document describes the organization of the architecture-dependent
code and headers on Plan 9.
The first section briefly discusses the layout of
the headers and the source code for the kernels, compilers, loaders, and the
system library,
CW libc .
The second section provides a detailed
discussion of the structure of
CW libmach ,
a library containing almost
all architecture-dependent code
used by application programs.
The final section describes the steps required to add
application program support for a new architecture.
SH
Directory Structure
PP
Architecture-dependent information for the new processor
is stored in the directory tree rooted at \f(CW/\fP\fIm\fP
where
I m
is the name of the new architecture (e.g.,
CW mips ).
The new directory should be initialized with several important
subdirectories, notably
CW bin ,
CW include ,
and
CW lib .
The directory tree of an existing architecture
serves as a good model for the new tree.
The architecture-dependent
CW mkfile
must be stored in the newly created root directory
for the architecture.  It is easiest to copy the
mkfile for an existing architecture and modify
it for the new architecture.  When the mkfile
is correct, change the
CW OS
and
CW CPUS
variables in
CW /sys/src/mkfile.proto
to reflect the addition of the new architecture.
SH
Headers
LP
Architecture-dependent headers are stored in directory
CW /\fIm\fP/include
where
I m
is the name of the architecture (e.g.,
CW mips ).
Two header files are required:
CW u.h
and
CW ureg.h .
The first defines fundamental data types,
bit settings for the floating point
status and control registers, and
CW va_list
processing, which depends on the stack
model for the architecture.  This file
is best built by copying and modifying the
CW u.h
file from an architecture
with a similar stack model.
The
CW ureg.h
file
contains a structure describing the layout
of the saved register set for
the architecture; it is defined by the kernel.
LP
Header file
CW /sys/include/a.out.h
contains the definitions of the magic
numbers used to identify executables for
each architecture.  When support for a new
architecture is added, the magic number
for the architecture must be added to this file.
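LP
As an illustration of what that addition looks like, the magic numbers in classic versions of
CW a.out.h
are generated by a macro from a small per-architecture index.
The sketch below assumes that form of the macro; compare it with the header in your own tree, which may differ.
The index 40 and the name
CW Z_MAGIC
are hypothetical placeholders for the new architecture.

```c
/* Assumed form of the macro in classic /sys/include/a.out.h;
 * check the header in your own tree before relying on it. */
#define _MAGIC(b)	((((4*(b))+0)*(b))+7)

#define I_MAGIC	_MAGIC(11)	/* intel 386 */
#define V_MAGIC	_MAGIC(16)	/* mips 3000, big-endian */

/* HYPOTHETICAL entry for the new architecture: choose an index
 * not already used by any other architecture in the header. */
#define Z_MAGIC	_MAGIC(40)
```

The resulting value is what appears, big-endian, in the first four bytes of a normal executable for the architecture.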
LP
The header format of a bootable executable is defined by
each manufacturer.  Header file
CW /sys/include/bootexec.h
contains structures describing the headers currently
supported.  If the new architecture uses a common header
such as COFF,
the header format is probably already defined,
but if the bootable header format is non-standard,
a structure defining the format must be added to this file.
LP
SH
Kernel
LP
Although the kernel depends critically on the properties of the underlying
hardware, most of the
higher-level kernel functions, including process
management, paging, pseudo-devices, and some
networking code, are independent of processor
architecture.  The portable kernel code
is divided into two parts: that implementing kernel
functions and that devoted to the boot process.
Code in the first class is stored in directory
CW /sys/src/9/port
and the portable boot code is stored in
CW /sys/src/9/boot .
Architecture-dependent kernel code is stored in the
subdirectories of
CW /sys/src/9
named for each architecture.
LP
The relationship between the kernel code and the boot code
is convoluted and subtle.  The portable boot code
is compiled into a library for each architecture.  An architecture-specific
main program is loaded with the appropriate library and the resulting
executable is compiled into the kernel where it is executed as
a user process during the final stages of kernel initialization.  The boot process
performs authentication, attaches the name space root to the appropriate
file system and starts the
CW init
process.
LP
The organization of the portable kernel source code differs from that
of most other architecture-specific code.
Instead of storing the portable code in a library
and loading it with the architecture-specific
code, the portable code is compiled directly into
the directory containing the architecture-specific code
and linked with the object files built from the source in that directory.
LP
SH
Compilers and Loaders
LP
The compiler source code conforms to the usual
organization: portable code is compiled into a library
for each architecture
and the architecture-dependent code is loaded with
that library.
The common compiler code is stored in
CW /sys/src/cmd/cc .
The
CW mkfile
in this directory compiles the portable source and
archives the objects in a library for each architecture.
The architecture-specific compiler source
is stored in a subdirectory of
CW /sys/src/cmd
with the same name as the compiler (e.g.,
CW /sys/src/cmd/vc ).
LP
There is no portable code shared by the loaders.
Each directory of loader source
code is self-contained, except for
a header file and an instruction name table
included from the
directory of the associated
compiler.
LP
SH
Libraries
LP
Most C library modules are
portable; the source code is stored in
directories
CW /sys/src/libc/port
and
CW /sys/src/libc/9sys .
Architecture-dependent library code
is stored in the subdirectory of
CW /sys/src/libc
named the same as the target processor.
Non-portable functions not only
implement architecture-dependent operations
but also supply assembly language implementations
of functions where speed is critical.
Directory
CW /sys/src/libc/9syscall
is unusual because it
contains architecture-dependent information
for all architectures.
It holds only a header file defining
the names and numbers of system calls
and a
CW mkfile .
The
CW mkfile
executes an
CW rc
script that parses the header file, constructs
assembler language functions implementing the system
call for each architecture, assembles the code,
and archives the object files in
CW libc .
The assembler language syntax and the system interface
differ for each architecture.
The
CW rc
script in this
CW mkfile
must be modified to support a new architecture.
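LP
The real generator is an
CW rc
script, but the transformation it performs can be sketched in C: read each
CW #define
line of the header, derive the function name, and emit an assembler stub that loads the call number and traps.
In the sketch below the instruction syntax, the register, and the lower-casing rule for names are all assumptions made for illustration; a real stub must follow the new architecture's assembler and system-call conventions.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Emit an assembler stub for one system call into buf.
 * The instruction syntax is HYPOTHETICAL: a real stub follows the
 * new architecture's assembler and trap conventions.
 */
static int
emitstub(char *buf, size_t n, const char *name, int num)
{
	return snprintf(buf, n,
		"TEXT %s(SB), 1, $0\n"
		"\tMOVW\t$%d, R1\n"	/* load the system call number */
		"\tSYSCALL\n"		/* trap into the kernel */
		"\tRET\n",
		name, num);
}

/*
 * Parse one "#define NAME number" line and emit its stub.
 * Returns the stub length, or -1 if the line is not a definition.
 */
int
parsedefine(const char *line, char *buf, size_t n)
{
	char name[64], lower[64];
	int num;
	size_t i;

	if(strncmp(line, "#define", 7) != 0)
		return -1;
	if(sscanf(line, "#define %63s %d", name, &num) != 2)
		return -1;
	/* Assumed naming rule: the C function is the lower-case macro name. */
	for(i = 0; name[i] != '\0'; i++)
		lower[i] = (char)tolower((unsigned char)name[i]);
	lower[i] = '\0';
	return emitstub(buf, n, lower, num);
}
```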
LP
SH
Applications
LP
Application programs process two forms of architecture-dependent
information: executable images and intermediate object files.
Almost all processing is on executable files.
System library
CW libmach
provides functions that convert
architecture-specific data
to a portable format so application programs
can process this data independent of its
underlying representation.
Further, when a new architecture is implemented
almost all code changes
are confined to the library;
most affected application programs need only be reloaded.
The source code for the library is stored in
CW /sys/src/libmach .
LP
An application program running on one type of
processor must be able to interpret
architecture-dependent information for all
supported processors.
For example, a debugger must be able to debug
the executables of
all architectures, not just the
architecture on which it is executing, since
CW /proc
may be imported from a different machine.
LP
A small part of the application library
provides functions to
extract symbol references from object files.
The remainder provides the following processing
of executable files or memory images:
IP \(bu
Header interpretation.
IP \(bu
Symbol table interpretation.
IP \(bu
Execution context interpretation, such as stack traces
and stack frame location.
IP \(bu
Instruction interpretation including disassembly and
instruction size and follow-set calculations.
IP \(bu
Exception and floating point number interpretation.
IP \(bu
Architecture-independent read and write access through a
relocation map.
LP
Header file
CW /sys/include/mach.h
defines the interfaces to the
application library.  Manual pages
I mach (2),
I symbol (2),
and
I object (2)
describe the details of the
library functions.
LP
Two data structures, called
CW Mach
and
CW Machdata ,
contain architecture-dependent parameters and
a jump table of functions.
Global variables
CW mach
and
CW machdata
point to the
CW Mach
and
CW Machdata
data structures associated with the target architecture.
An application determines the target architecture of
a file or executable image, sets the global pointers
to the data structures associated with that architecture,
and subsequently performs all references indirectly through the
pointers.
As a result, direct references to the tables for each
architecture are avoided and the application code intrinsically
supports all architectures (though only one at a time).
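LP
The indirection can be seen in miniature below.
The
CW Mach
structure here is a heavily simplified, hypothetical stand-in for the one in
CW mach.h ,
and the field values are examples only.
The point is that the application function
CW addrsize
never names a particular architecture's table; it works for whichever table the global
CW mach
pointer currently selects.

```c
/* HYPOTHETICAL, heavily simplified stand-in for the Mach structure
 * in mach.h; the real structure has many more fields. */
typedef struct Mach Mach;
struct Mach {
	const char	*name;
	int		pgsize;		/* page size in bytes (example value) */
	int		szaddr;		/* size of an address in bytes (example value) */
};

static Mach mmips = { "mips", 4096, 4 };
static Mach mnewarch = { "newarch", 8192, 8 };	/* hypothetical new architecture */

Mach *mach;	/* set once the target architecture is known */

/* Application code never refers to mmips or mnewarch directly;
 * every reference goes through the global pointer. */
int
addrsize(void)
{
	return mach->szaddr;
}
```

Switching targets is a single pointer assignment, which is why the application supports all architectures but only one at a time.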
LP
Object file processing is handled similarly: architecture-dependent
functions identify and
decode the intermediate files for the processor.
The application indirectly
invokes a classification function to identify
the architecture of the object code and to select the
appropriate decoding function.  Subsequent calls
then use that function to decode each record.  Again,
the layer of indirection allows the application code
to support all architectures without modification.
LP
Splitting the architecture-dependent information
between the
CW Mach
and
CW Machdata
data structures
allows applications to choose
an appropriate level of service.  Even though an application
does not directly reference the architecture-specific data structures,
it must load the
architecture-dependent tables and code
for all architectures it supports.  The size of this data
can be substantial and many applications do not require
the full range of architecture-dependent functionality.
For example, the
CW size
command does not require the disassemblers for every architecture;
it only needs to decode the header.
The
CW Mach
data structure contains a few architecture-specific parameters
and a description of the processor register set.
The size of the structure
varies with the size of the register
set but is generally small.
The
CW Machdata
data structure contains
a jump table of architecture-dependent functions;
the amount of code and data referenced by this table
is usually large.
SH
Libmach Source Code Organization
LP
The
CW libmach
library provides four classes of functionality:
LP
IP "Header and Symbol Table Decoding\ -\ "
Files
CW executable.c
and
CW sym.c
contain code to interpret the header and
symbol tables of
an executable file or executing image.
Function
CW crackhdr
decodes the header,
reformats the
information into an
CW Fhdr
data structure, and points
global variable
CW mach
to the
CW Mach
data structure of the target architecture.
The symbol table processing
uses the data in the
CW Fhdr
structure to decode the symbol table.
A variety of symbol table access functions then support
queries on the reformatted table.
IP "Debugger Support\ -\ "
Files named
CW \fIm\fP.c ,
where
I m
is the code letter assigned to the architecture,
contain the initialized
CW Mach
data structure and the definition of the register
set for each architecture.
Architecture-specific debugger support functions and
an initialized
CW Machdata
structure are stored in
files named
CW \fIm\fPdb.c .
Files
CW machdata.c
and
CW setmach.c
contain debugger support functions shared
by multiple architectures.
IP "Architecture-Independent Access\ -\ "
Files
CW map.c ,
CW access.c ,
and
CW swap.c
provide accesses through a relocation map
to data in an executable file or executing image.
Byte-swapping is performed as needed.  Global variables
CW mach
and
CW machdata
must point to the
CW Mach
and
CW Machdata
data structures of the target architecture.
IP "Object File Interpretation\ -\ "
These files contain functions to identify the
target architecture of an
intermediate object file
and extract references to symbols.  File
CW obj.c
contains code common to all architectures;
file
CW \fIm\fPobj.c
contains the architecture-specific source code
for the machine with code character
I m .
LP
The
CW Machdata
data structure is primarily a jump
table of architecture-dependent debugger support
functions. Functions select the
CW Machdata
structure for a target architecture based
on the value of the
CW type
code in the
CW Fhdr
structure or the name of the architecture.
The jump table provides functions to swap bytes, interpret
machine instructions,
perform stack
traces, find stack frames, format floating point
numbers, and decode machine exceptions.  Some functions, such as
machine exception decoding, are idiosyncratic and must be
supplied for each architecture.  Others depend
on the compiler run-time model and several
architectures may share code common to a model.  For
example, many architectures share the code to
process the fixed-frame stack model implemented by
several of the compilers.
Finally, some
functions, such as byte-swapping, provide a general capability and
the jump table need only select an implementation appropriate
to the architecture.
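LP
Byte-swapping is the simplest example of a general capability selected through the jump table.
The following is a minimal sketch of what helpers in the style of
CW beswab ,
CW leswab ,
CW beswal ,
and
CW leswal
do: each reinterprets the raw bytes of its argument, as read from the file or image, in the target's byte order and returns the value in host order, so the same code works on any host.
It uses
CW <stdint.h>
types, so the signatures differ from the library's, and the
CW Swapper
structure is a hypothetical two-slot subset of
CW Machdata .

```c
#include <stdint.h>
#include <string.h>

/* 16-bit value stored big-endian -> host order */
uint16_t
beswab(uint16_t s)
{
	unsigned char p[2];

	memcpy(p, &s, sizeof p);	/* raw bytes as read from the image */
	return (uint16_t)((p[0]<<8) | p[1]);
}

/* 16-bit value stored little-endian -> host order */
uint16_t
leswab(uint16_t s)
{
	unsigned char p[2];

	memcpy(p, &s, sizeof p);
	return (uint16_t)((p[1]<<8) | p[0]);
}

/* 32-bit value stored big-endian -> host order */
uint32_t
beswal(uint32_t l)
{
	unsigned char p[4];

	memcpy(p, &l, sizeof p);
	return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16) |
		((uint32_t)p[2]<<8) | p[3];
}

/* 32-bit value stored little-endian -> host order */
uint32_t
leswal(uint32_t l)
{
	unsigned char p[4];

	memcpy(p, &l, sizeof p);
	return ((uint32_t)p[3]<<24) | ((uint32_t)p[2]<<16) |
		((uint32_t)p[1]<<8) | p[0];
}

/* HYPOTHETICAL subset of Machdata: the jump table only has to
 * pick one implementation per architecture. */
typedef struct {
	uint16_t	(*swab)(uint16_t);
	uint32_t	(*swal)(uint32_t);
} Swapper;

const Swapper bigendian = { beswab, beswal };
const Swapper littleendian = { leswab, leswal };
```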
LP
SH
Adding Application Support for a New Architecture
LP
This section describes the
steps required to add application-level
support for a new architecture.
We assume
the kernel, compilers, loaders and system libraries
for the new architecture are already in place.  This
implies that a code character has been assigned and
that the architecture-specific headers have been
updated.
With the exception of two programs,
application-level changes are confined to header
files and the source code in
CW /sys/src/libmach .
LP
IP 1.
Begin by updating the application library
header file in
CW /sys/include/mach.h .
Add the following symbolic codes to the
CW enum
statement near the beginning of the file:
RS
IP \(bu
The processor type code, e.g.,
CW MSPARC .
IP \(bu
The type of the executable.  There are usually
two codes needed: one for a bootable
executable (i.e., a kernel) and one for an
application executable.
IP \(bu
The disassembler type code.  Add one entry for
each supported disassembler for the architecture.
IP \(bu
A symbolic code for the object file type.
RE
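LP
A hypothetical excerpt modelled on that
CW enum
is sketched below.
Only
CW MSPARC
is named in the text; every other identifier is illustrative.
In this sketch each group of codes restarts at zero and the new codes are appended to the end of each group, an arrangement assumed here so that the values of existing codes do not change.

```c
/* HYPOTHETICAL excerpt modelled on the enum in mach.h. */
enum {
	/* processor types */
	MMIPS,
	MSPARC,
	MNEW,			/* new: processor type */

	/* executable types */
	FNONE = 0,
	FMIPS,
	FNEW,			/* new: application executable */
	FNEWB,			/* new: bootable executable (kernel) */

	/* disassembler types */
	ANONE = 0,
	AMIPS,
	ANEW,			/* new: Plan 9 syntax */
	ANEWNATIVE,		/* new: manufacturer's syntax */

	/* object file types */
	ObjMips = 0,
	ObjNew,			/* new: object file type */
	Maxobjtype
};
```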
LP
IP 2.
In a file named
CW /sys/src/libmach/\fIm\fP.c
(where
I m
is the identifier character assigned to the architecture),
initialize
CW Reglist
and
CW Mach
data structures with values defining
the register set and various system parameters.
The source file for a similar architecture
can serve as a template.
Most of the fields of the
CW Mach
data structure are obvious
but a few require further explanation.
RS
IP "\f(CWkbase\fP\ -\ "
This field
contains the address of the kernel
CW ublock .
The debuggers
assume the first entry of the kernel
CW ublock
points to the
CW Proc
structure for a kernel thread.
IP "\f(CWktmask\fP\ -\ "
This field
is a bit mask used to calculate the kernel text address from
the kernel
CW ublock
address.
The first page of the
kernel text segment is calculated by
ANDing
the negation of this mask with
CW kbase .
IP "\f(CWkspoff\fP\ -\ "
This field
contains the byte offset in the
CW Proc
data structure to the saved kernel
stack pointer for a suspended kernel thread.  This
is the offset to the
CW sched.sp
field of a
CW Proc
table entry.
IP "\f(CWkpcoff\fP\ -\ "
This field contains the byte offset into the
CW Proc
data structure
of
the program counter of a suspended kernel thread.
This is the offset to
field
CW sched.pc
in that structure.
IP "\f(CWkspdelta\fP and \f(CWkpcdelta\fP\ -\ "
These fields
contain corrections to be added to
the stack pointer and program counter, respectively,
to properly locate the stack and next
instruction of a kernel thread.  These
values bias the saved registers retrieved
from the
CW Label
structure named
CW sched
in the
CW Proc
data structure.
Most architectures require no bias
and these fields contain zeros.
IP "\f(CWscalloff\fP\ -\ "
This field
contains the byte offset of the
CW scallnr
field in the
CW ublock
data structure associated with a process.
The
CW scallnr
field contains the number of the
last system call executed by the process.
The location of the field varies depending on
the size of the floating point register set
which precedes it in the
CW ublock .
RE
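LP
Several of these parameters are simple arithmetic on kernel data structures.
The sketch below shows how
CW ktmask ,
CW kspoff ,
CW kpcoff ,
and
CW kpcdelta
are used, assuming heavily simplified, hypothetical
CW Label
and
CW Proc
layouts; the real values must be taken from the kernel sources for the architecture.
The address values in the usage are examples only.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL, heavily simplified kernel structures; the real
 * layouts come from the kernel sources for the architecture. */
typedef struct Label {
	uint32_t	sp;
	uint32_t	pc;
} Label;

typedef struct Proc {
	int	state;
	char	*text;
	Label	sched;		/* saved context of a suspended kernel thread */
} Proc;

/* kspoff and kpcoff: byte offsets of sched.sp and sched.pc in Proc. */
enum {
	kspoff = offsetof(Proc, sched.sp),
	kpcoff = offsetof(Proc, sched.pc),
};

/* First page of the kernel text segment: kbase ANDed with ~ktmask. */
uint32_t
kerneltext(uint32_t kbase, uint32_t ktmask)
{
	return kbase & ~ktmask;
}

/* The saved pc biased by kpcdelta (zero on most architectures). */
uint32_t
kthreadpc(const Label *l, int32_t kpcdelta)
{
	return l->pc + (uint32_t)kpcdelta;
}
```

For example, with a hypothetical
CW kbase
of 0x80123000 and a
CW ktmask
of 0x0FFFFFFF, the kernel text begins at 0x80000000.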
LP
IP 3.
Add an entry to the initialization of the
CW ExecTable
data structure at the beginning of file
CW /sys/src/libmach/executable.c .
Most architectures
require two entries: one for
a normal executable and
one for a bootable
image.  Each table entry contains:
RS
IP \(bu
Magic Number\ \-\
The big-endian magic number assigned to the architecture in
CW /sys/include/a.out.h .
IP \(bu
Name\ \-\
A string describing the executable.
IP \(bu
Executable type code\ \-\
The executable code assigned in
CW /sys/include/mach.h .
IP \(bu
\f(CWMach\fP pointer\ \-\
The address of the initialized
CW Mach
data structure constructed in Step 2.
You must also add the name of this table to the
list of
CW Mach
table definitions immediately preceding the
CW ExecTable
initialization.
IP \(bu
Header size\ \-\
The number of bytes in the executable file header.
The size of a normal executable header is always
CW sizeof(Exec) .
The size of a bootable header is
determined by the size of the structure
for the architecture defined in
CW /sys/include/bootexec.h .
IP \(bu
Byte-swapping function\ \-\
The address of
CW beswal
or
CW leswal
for big-endian and little-endian
architectures, respectively.
IP \(bu
Decoder function\ \-\
The address of a function to decode the header.
Function
CW adotout
decodes the common header shared by all normal
(i.e., non-bootable) executable files.
The header format of bootable
executable files is defined by the manufacturer and
a custom function is almost always
required to decode it.
Header file
CW /sys/include/bootexec.h
contains data structures defining the bootable
headers for all architectures.  If the new architecture
uses an existing format, the appropriate
decoding function should already be in
CW executable.c .
If the header format is unique, then
a new function must be added to this file.
Usually the decoding function for an existing
architecture can be adapted with minor modifications.
RE
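LP
The table entry and its use can be sketched as follows.
The structure is a hypothetical simplification whose field order follows the list above, not necessarily the real source, and
CW findexec
is an illustrative name.
Because the magic number is stored big-endian, the lookup assembles it byte by byte, which gives the same result on hosts of either byte order.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Mach Mach;	/* opaque in this sketch */
typedef struct Fhdr Fhdr;	/* decoded header; opaque in this sketch */

/* HYPOTHETICAL simplification of an ExecTable entry. */
typedef struct ExecTable {
	uint32_t	magic;		/* big-endian magic from a.out.h */
	const char	*name;		/* description of the executable */
	int		type;		/* executable type code from mach.h */
	Mach		*mach;		/* initialized Mach structure (step 2) */
	size_t		hsize;		/* number of bytes in the header */
	uint32_t	(*swal)(uint32_t);	/* beswal or leswal */
	int		(*hparse)(int fd, Fhdr *fp, struct ExecTable *ep);	/* decoder */
} ExecTable;

/* Return the entry whose magic matches the first four bytes of a file. */
const ExecTable *
findexec(const ExecTable *tab, size_t n, const unsigned char hdr[4])
{
	uint32_t magic;
	size_t i;

	/* The magic is stored big-endian, so assemble it byte by byte. */
	magic = ((uint32_t)hdr[0]<<24) | ((uint32_t)hdr[1]<<16) |
		((uint32_t)hdr[2]<<8) | hdr[3];
	for(i = 0; i < n; i++)
		if(tab[i].magic == magic)
			return &tab[i];
	return NULL;
}
```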
LP
IP 4.
Write an object file parser and
store it in file
CW /sys/src/libmach/\fIm\fPobj.c
where
I m
is the identifier character assigned to the architecture.
Two functions are required: a predicate to identify an
object file for the architecture and a function to extract
symbol references from the object code.
The object code format is obscure, but
it is often possible to adapt the
code of an existing architecture
with minor modifications.
When these
functions are in hand, insert their addresses
in the jump table at the beginning of file
CW /sys/src/libmach/obj.c .
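LP
The pairing of a predicate and a reader in a jump table can be sketched as below.
Everything in it is hypothetical: the
CW Objfns
structure, the opcode value
CW NEW_ANAME ,
and the record layout (an opcode byte followed by a NUL-terminated name) are stand-ins, since the real object format is specific to each compiler.
The classifier asks each entry in turn whether it recognizes the file and returns the first one that does.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL jump table entry: recognizer plus symbol reader. */
typedef struct Objfns {
	const char	*name;
	int	(*is)(const unsigned char *buf, size_t n);
	int	(*read)(const unsigned char *buf, size_t n);	/* returns symbols found */
} Objfns;

enum { NEW_ANAME = 0x42 };	/* hypothetical opcode introducing a symbol name */

static int
isnewobj(const unsigned char *buf, size_t n)
{
	/* Assume the file opens with a name record. */
	return n > 0 && buf[0] == NEW_ANAME;
}

static int
readnewobj(const unsigned char *buf, size_t n)
{
	size_t i = 0;
	int nsym = 0;
	const unsigned char *end;

	/* Each record: NEW_ANAME followed by a NUL-terminated name. */
	while(i + 1 < n && buf[i] == NEW_ANAME){
		end = memchr(buf + i + 1, '\0', n - (i + 1));
		if(end == NULL)
			break;		/* truncated record */
		nsym++;
		i = (size_t)(end - buf) + 1;
	}
	return nsym;
}

static const Objfns objtab[] = {
	{ "newarch", isnewobj, readnewobj },
	/* entries for the existing architectures would follow */
};

/* Return the entry that claims the file, or NULL. */
const Objfns *
classify(const unsigned char *buf, size_t n)
{
	size_t i;

	for(i = 0; i < sizeof objtab / sizeof objtab[0]; i++)
		if(objtab[i].is(buf, n))
			return &objtab[i];
	return NULL;
}
```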
LP
IP 5.
Implement the required debugger support functions and
initialize the parameters and jump table of the
CW Machdata
data structure for the architecture.
This code is conventionally stored in
a file named
CW /sys/src/libmach/\fIm\fPdb.c
where
I m
is the identifier character assigned to the architecture.
The fields of the
CW Machdata
structure are:
RS
IP "\f(CWbpinst\fP and \f(CWbpsize\fP\ -\ "
These fields
contain the breakpoint instruction and the size
of the instruction, respectively.
IP "\f(CWswab\fP\ -\ "
This field
contains the address of a function to
byte-swap a 16-bit value.  Choose
CW leswab
or
CW beswab
for little-endian or big-endian architectures, respectively.
IP "\f(CWswal\fP\ -\ "
This field
contains the address of a function to
byte-swap a 32-bit value.  Choose
CW leswal
or
CW beswal
for little-endian or big-endian architectures, respectively.
IP "\f(CWctrace\fP\ -\ "
This field
contains the address of a function to perform a
C-language stack trace.  Two general trace functions,
CW risctrace
and
CW cisctrace ,
traverse fixed-frame and relative-frame stacks,
respectively.  If the compiler for the
new architecture conforms to one of
these models, select the appropriate function.  If the
stack model is unique,
supply a custom stack trace function.
IP "\f(CWfindframe\fP\ -\ "
This field
contains the address of a function to locate the stack
frame associated with a text address.
Generic functions
CW riscframe
and
CW ciscframe
process fixed-frame and relative-frame stack
models.
IP "\f(CWufixup\fP\ -\ "
This field
contains the address of a function to adjust
the base address of the register save area.
Currently, only the
68020 requires this bias
to offset over the active
exception frame.
IP "\f(CWexcep\fP\ -\ "
This field
contains the address of a function to produce a
text
string describing the
current exception.
Each architecture stores exception
information uniquely, so this code must always be supplied.
IP "\f(CWbpfix\fP\ -\ "
This field
contains the address of a function to adjust an
address prior to laying down a breakpoint.
IP "\f(CWsftos\fP\ -\ "
This field
contains the address of a function to convert a single
precision floating point value
to a string.  Choose
CW leieeesftos
for little-endian
or
CW beieeesftos
for big-endian architectures.
IP "\f(CWdftos\fP\ -\ "
This field
contains the address of a function to convert a double
precision floating point value
to a string.  Choose
CW leieeedftos
for little-endian
or
CW beieeedftos
for big-endian architectures.
IP "\f(CWfoll\fP, \f(CWdas\fP, \f(CWhexinst\fP, and \f(CWinstsize\fP\ -\ "
These fields point to functions that interpret machine
instructions.
They rely on disassembly of the instruction
and are unique to each architecture.
CW Foll
calculates the follow set of an instruction.
CW Das
disassembles a machine instruction to assembly language.
CW Hexinst
formats a machine instruction as a text
string of
hexadecimal digits.
CW Instsize
calculates the size, in bytes, of an instruction.
Once the disassembler is written, the other functions
can usually be implemented as trivial extensions of it.
LP
It is possible to provide support for a new architecture
incrementally by filling the jump table entries
of the
CW Machdata
structure as code is written.  In general, if
a jump table entry contains a zero, application
programs requiring that function will issue an
error message instead of attempting to
call the function.  For example,
the
CW foll ,
CW das ,
CW hexinst ,
and
CW instsize
jump table slots can be zeroed until a
disassembler is written.
Other capabilities, such as
stack trace or variable inspection,
can be supplied and will be available to
the debuggers, but attempts to use the
disassembler will result in an error message.
RE
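LP
The incremental approach relies on callers testing a slot before calling through it.
The sketch below shows that check for the disassembler slot only; the one-slot
CW Machdata ,
the signature of
CW das ,
the wrapper
CW disassemble ,
and the placeholder
CW dasnew
are all hypothetical.

```c
#include <stdio.h>

/* HYPOTHETICAL one-slot subset of Machdata. */
typedef struct Machdata {
	int	(*das)(unsigned long pc, char *buf, int n);
} Machdata;

Machdata newmachdata = { 0 };	/* disassembler not written yet */
Machdata *machdata = &newmachdata;

static const char nodas[] = "no disassembler for this architecture";

/* Application-side wrapper: report an error for an empty slot
 * instead of calling through a null pointer. */
int
disassemble(unsigned long pc, char *buf, int n)
{
	if(machdata->das == 0){
		snprintf(buf, n, "%s", nodas);
		return -1;
	}
	return machdata->das(pc, buf, n);
}

/* Placeholder disassembler, installed once one exists. */
int
dasnew(unsigned long pc, char *buf, int n)
{
	(void)pc;
	snprintf(buf, n, "NOOP");	/* placeholder output */
	return 0;
}
```

Filling in the slot later, for example with
CW "newmachdata.das = dasnew;" ,
enables disassembly without changing any application code.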
IP 6.
Update the table named
CW machines
near the beginning of
CW /sys/src/libmach/setmach.c .
This table binds the
file type code and machine name to the
CW Mach
and
CW Machdata
structures of an architecture.
The names of the initialized
CW Mach
and
CW Machdata
structures built in steps 2 and 5
must be added to the list of
structure definitions immediately
preceding the table initialization.
If both Plan 9 and
native disassembly are supported, add
an entry for each disassembler to the table.  The
entry for the default disassembler (usually
Plan 9) must be first.
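LP
Why the default disassembler must come first can be seen from a lookup that returns the first matching name.
The table layout, the codes
CW FNEW ,
CW ANEW ,
and
CW ANEWNATIVE ,
and the function
CW selectbyname
below are hypothetical simplifications, not the contents of
CW setmach.c .

```c
#include <stddef.h>
#include <string.h>

typedef struct Mach { const char *name; } Mach;		/* simplified */
typedef struct Machdata { int native; } Machdata;	/* simplified */

/* HYPOTHETICAL machines-table entry binding a name and file type
 * code to the Mach and Machdata structures of an architecture. */
typedef struct Machtab {
	const char	*name;
	int		type;		/* executable type code */
	int		asstype;	/* disassembler type code */
	Mach		*mach;
	Machdata	*machdata;
} Machtab;

enum { FNEW = 1, ANEW = 1, ANEWNATIVE = 2 };	/* hypothetical codes */

static Mach mnew = { "newarch" };
static Machdata newdata = { 0 };		/* Plan 9 disassembler */
static Machdata newdatanative = { 1 };	/* manufacturer's syntax */

static Machtab machines[] = {
	/* the default (Plan 9) disassembler entry comes first */
	{ "newarch", FNEW, ANEW, &mnew, &newdata },
	{ "newarch", FNEW, ANEWNATIVE, &mnew, &newdatanative },
};

Mach *mach;
Machdata *machdata;

/* Select by name; the first match wins, so the default entry is used. */
int
selectbyname(const char *name)
{
	size_t i;

	for(i = 0; i < sizeof machines / sizeof machines[0]; i++)
		if(strcmp(machines[i].name, name) == 0){
			mach = machines[i].mach;
			machdata = machines[i].machdata;
			return 1;
		}
	return 0;
}
```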
IP 7.
Add an entry describing the architecture to
the table named
CW trans
near the end of
CW /sys/src/cmd/prof.c .
RE
IP 8.
Add an entry describing the architecture to
the table named
CW objtype
near the start of
CW /sys/src/cmd/pcc.c .
RE
IP 9.
Recompile and install
all application programs that include header file
CW mach.h
and load with
CW libmach.a .