Intel Corporation makes no warranty for the use of its products and
assumes no responsibility for any errors which may appear in this document
nor does it make a commitment to update the information contained herein.
Intel retains the right to make changes to these specifications at any
time, without notice.
Contact your local sales office to obtain the latest specifications before
placing your order.
The following are trademarks of Intel Corporation and may only be used to
identify Intel Products:
Above, BITBUS, COMMputer, CREDIT, Data Pipeline, FASTPATH, Genius, i,
ICE, iCEL, iCS, iDBP, iDIS, I²ICE, iLBX, im, iMDDX, iMMX, Inboard, Insite,
Intel, intel, intelBOS, Intelevision, inteligent Identifier, inteligent
Programming, Intellec, Intellink, iOSP, iPDS, iPSC, iRMX, iSBC, iSBX, iSDM,
iSXM, KEPROM, Library Manager, MAP-NET, MCS, Megachassis, MICROMAINFRAME,
MULTIBUS, MULTICHANNEL, MULTIMODULE, MultiSERVER, ONCE, OpenNET, OTP,
PC-BUBBLE, Plug-A-Bubble, PROMPT, Promware, QUEST, QueX, Quick-Pulse
Programming, Ripplemode, RMX/80, RUPI, Seamless, SLD, UPI, and VLSiCEL, and
the combination of ICE, iCS, iRMX, iSBC, iSBX, MCS, or UPI and a numerical
suffix, 4-SITE.
MDS is an ordering code only and is not used as a product name or
trademark. MDS(R) is a registered trademark of Mohawk Data Sciences
Corporation.
*MULTIBUS is a patented Intel bus.
Additional copies of this manual or other Intel literature may be obtained
from:
Intel Corporation
Literature Distribution
Mail Stop SC6-59
3065 Bowers Avenue
Santa Clara, CA 95051
An Introduction to the 80287
This supplement describes the 80287 Numeric Processor Extension (NPX) for
the 80286 microprocessor. Below is a brief overview of 80286 concepts, along
with some of the nomenclature used throughout this and other Intel
publications.
The 80286 Microsystem
The 80286 is a new VLSI microprocessor system with exceptional capabilities
for supporting large-system applications. Based on a new-generation CPU (the
Intel 80286), this powerful microsystem is designed to support multiuser
reprogrammable and real-time multitasking applications. Its dedicated system
support circuits simplify system hardware; sophisticated hardware and
software tools reduce both the time and the cost of product development.
The 80286 is a virtual-memory microprocessor with on-chip memory management
and protection. The 80286 microsystem offers a total-solution approach,
enabling you to develop high-speed, interactive, multiuser, multitasking,
and multiprocessor systems more rapidly and at higher performance than ever
before.
• Reliability and system up-time are becoming increasingly important in
  all applications. Information must be protected from misuse or
  accidental loss. The 80286 includes a sophisticated and flexible
  four-level protection mechanism that isolates layers of operating
  system programs from application programs to maintain a high degree of
  system integrity.
• The 80286 provides 16 megabytes of physical address space to support
  today's application requirements. This large physical memory enables
  the 80286 to keep many large programs and data structures
  simultaneously in memory for high-speed access.
• For applications with dynamically changing memory requirements, such
  as multiuser business systems, the 80286 CPU provides on-chip memory
  management and virtual memory support. On an 80286-based system, each
  user can have up to a gigabyte (2^(30) bytes) of virtual-address space.
  This large address space virtually eliminates restrictions on the
  number or size of programs that may be part of the system.
• Large multiuser or real-time multitasking systems are easily supported
  by the 80286. High-performance features, such as a very high-speed
  task switch, fast interrupt-response time, inter-task protection, and a
  quick and direct operating system interface, make the 80286 highly
  suited to multiuser/multitasking applications.
• The 80286 has two operating modes: Real-Address mode and
  Protected-Address mode. In Real-Address mode, the 80286 is fully
  compatible with the 8086, 8088, 80186, and 80188 microprocessors; all
  of the extensive libraries of 8086 and 8088 software execute four to
  six times faster on the 80286, without any modification.
• In Protected-Address mode, the advanced memory management and
  protection features of the 80286 become available, without any
  reduction in performance. Upgrading 8086 and 8088 application programs
  to use these new memory management and protection features usually
  requires only reassembly or recompilation (some programs may require
  minor modification). This compatibility between 80286 and 8086
  processor families reduces both the time and the cost of
  software development.
The Organization of This Manual
This manual describes the 80287 Numeric Processor Extension (NPX) for the
80286 microprocessor. The material in this manual is presented from the
perspective of software designers, both at an applications and at a systems
software level.
• Chapter One, "Overview of Numeric Processing," gives an overview of
  the 80287 NPX and reviews the concepts of numeric computation using the
  80287.
• Chapter Two, "Programming Numeric Applications," provides detailed
  information for software designers generating applications for systems
  containing an 80286 CPU with an 80287 NPX. The 80286/80287 instruction
  set mnemonics are explained in detail, along with a description of
  programming facilities for these systems. A comparative 80287
  programming example is given.
• Chapter Three, "System-Level Numeric Programming," provides
  information of interest to systems software writers, including details
  of the 80287 architecture and operational characteristics.
• Chapter Four, "Numeric Programming Examples," provides several
  detailed programming examples for the 80287, including conditional
  branching, the conversion between floating-point values and their ASCII
  representations, and the calculation of several trigonometric
  functions. These examples illustrate assembly-language programming on
  the 80287 NPX.
• Appendix A, "Machine Instruction Encoding and Decoding," gives
  reference information on the encoding of NPX instructions.
• Appendix B, "Compatibility between the 80287 NPX and the 8087,"
  describes the differences between the 80287 and the 8087.
• Appendix C, "Implementing the IEEE P754 Standard," gives details of
  the IEEE P754 Standard.
• The Glossary defines 80287 and floating-point terminology. Refer to
  it as needed.
Related Publications
To best use the material in this manual, readers should be familiar with
the operation and architecture of 80286 systems. The following manuals
contain information related to the content of this supplement and of
interest to programmers of 80287 systems:
• Introduction to the 80286, order number 210308
• ASM286 Assembly Language Reference Manual, order number 121924
• 80286 Operating System Writer's Guide, order number 121960
• 80286 Hardware Reference Manual, order number 210760
• Microprocessor and Peripheral Handbook, order number 210844
• PL/M-286 User's Guide, order number 121945
• 80287 Support Library Reference Manual, order number 122129
• 8086 Software Toolbox Manual, order number 122203 (includes
  information about 80287 Emulator Software)
Notational Conventions
This manual uses special notation to represent sub- and superscript
characters. Subscript characters are surrounded by {curly brackets}, for
example 10{2} = 10 base 2. Superscript characters are preceded by a caret
and enclosed within (parentheses), for example 10^(3) = 10 to the third
power.
Chapter 1 Overview of Numeric Processing
Introduction to the 80287 Numeric Processor Extension
Performance
Ease of Use
Applications
Upgradability
Programming Interface
Hardware Interface
80287 Numeric Processor Architecture
The NPX Register Stack
The NPX Status Word
Control Word
The NPX Tag Word
The NPX Instruction and Data Pointers
Computation Fundamentals
Number System
Data Types and Formats
Binary Integers
Decimal Integers
Real Numbers
Rounding Control
Precision Control
Infinity Control
Special Computational Situations
Special Numeric Values
Nonnormal Real Numbers
Denormals and Gradual Underflow
Unnormals: Descendents of Denormal Operands
Zeros and Pseudo Zeros
Infinity
NaN (Not a Number)
Indefinite
Encoding of Data Types
Numeric Exceptions
Invalid Operation
Zero Divisor
Denormalized Operand
Numeric Overflow and Underflow
Inexact Result
Handling Numeric Errors
Automatic Exception Handling
Software Exception Handling
Chapter 2 Programming Numeric Applications
The 80287 NPX Instruction Set
Compatibility with the 8087 NPX
Numeric Operands
Data Transfer Instructions
Arithmetic Instructions
Comparison Instructions
Transcendental Instructions
Constant Instructions
Processor Control Instructions
Instruction Set Reference Information
Instruction Execution Time
Bus Transfers
Instruction Length
Programming Facilities
High-Level Languages
PL/M-286
ASM286
Defining Data
Records and Structures
Addressing Modes
Comparative Programming Example
80287 Emulation
Concurrent Processing with the 80287
Managing Concurrency
Instruction Synchronization
Data Synchronization
Error Synchronization
Incorrect Error Synchronization
Proper Error Synchronization
Chapter 3 System-Level Numeric Programming
80287 Architecture
Processor Extension Data Channel
Real-Address Mode and Protected Virtual-Address Mode
Dedicated and Reserved I/O Locations
Processor Initialization and Control
System Initialization
Recognizing the 80287 NPX
Configuring the Numerics Environment
Initializing the 80287
80287 Emulation
Handling Numeric Processing Exceptions
Simultaneous Exception Response
Exception Recovery Examples
Chapter 4 Numeric Programming Examples
Conditional Branching Examples
Exception Handling Examples
Floating-Point to ASCII Conversion Examples
Function Partitioning
Exception Considerations
Special Instructions
Description of Operation
Scaling the Value
Inaccuracy in Scaling
Avoiding Underflow and Overflow
Final Adjustments
Output Format
Trigonometric Calculation Examples
FPTAN and FPREM
Cosine Uses Sine Code
Appendix A Machine Instruction Encoding and Decoding
Appendix B Compatibility Between the 80287 NPX and the 8087
Appendix C Implementing The IEEE P754 Standard
Options Implemented in the 80287
Areas of the Standard Implemented in Software
Additional Software to Meet the Standard
Glossary of 80287 and Floating-Point Terminology
Index
Figures
1-1 Evolution and Performance of Numeric Processors
1-2 80287 NPX Block Diagram
1-3 80287 Register Set
1-4 80287 Status Word
1-5 80287 Control Word Format
1-6 80287 Tag Word Format
1-7 80287 Instruction and Data Pointer Image in Memory
1-8 80287 Number System
1-9 Data Formats
1-10 Projective versus Affine Closure
1-11 Arithmetic Example Using Infinity
2-1 FSAVE/FRSTOR Memory Layout
2-2 FSTENV/FLDENV Memory Layout
2-3 Sample 80287 Constants
2-4 Status Word RECORD Definition
2-5 Structure Definition
2-6 Sample PL/M-286 Program
2-7 Sample ASM286 Program
2-8 Instructions and Register Stack
2-9 Synchronizing References to Shared Data
2-10 Documenting Data Synchronization
2-11 Nonconcurrent FIST Instruction Code Macro
2-12 Error Synchronization Examples
Tables
1-1 Numeric Processing Speed Comparisons
1-2 Numeric Data Types
1-3 Principal NPX Instructions
1-4 Interpreting the NPX Condition Codes
1-5 Real Number Notation
1-6 Rounding Modes
1-7 Denormalization Process
1-8 Exceptions Due to Denormal Operands
1-9 Unnormal Operands and Results
1-10 Zero Operands and Results
1-11 Masked Overflow Response with Directed Rounding
1-12 Infinity Operands and Results
1-13 Binary Integer Encodings
1-14 Packed Decimal Encodings
1-15 Real and Long Real Encodings
1-16 Temporary Real Encodings
1-17 Exception Conditions and Masked Responses
2-1 Data Transfer Instructions
2-2 Arithmetic Instructions
2-3 Basic Arithmetic Instructions and Operands
2-4 Condition Code Interpretation after FPREM
2-5 Comparison Instructions
2-6 Condition Code Interpretation after FCOM
2-7 Condition Code Interpretation after FTST
2-8 FXAM Condition Code Settings
2-9 Transcendental Instructions
2-10 Constant Instructions
2-11 Processor Control Instructions
2-12 Key to Operand Types
2-13 Execution Penalties
2-14 Instruction Set Reference Data
2-15 PL/M-286 Built-In Procedures
2-16 80287 Storage Allocation Directives
2-17 Addressing Mode Examples
3-1 NPX Processor State Following Initialization
3-2 Precedence of NPX Exceptions
Chapter 1 Overview of Numeric Processing
The 80287 NPX is a high-performance numerics processing element that
extends the 80286 architecture by adding significant numeric capabilities
and direct support for floating-point, extended-integer, and BCD data types.
The 80286 CPU with 80287 NPX easily supports powerful and accurate numeric
applications through its implementation of the proposed IEEE 754 Standard
for Binary Floating-Point Arithmetic.
Introduction to the 80287 Numeric Processor Extension
The 80287 Numeric Processor Extension (NPX) is highly compatible with its
predecessor, the Intel 8087 NPX.
The 8087 NPX was designed for use in 8086-family systems. The 8086 was the
first microprocessor family to partition the processing unit to permit
high-performance numeric capabilities. The 8087 NPX for this processor
family implemented a complete numeric processing environment in compliance
with the proposed IEEE 754 Floating-Point Standard.
With the 80287 Numeric Processor Extension, high-speed numeric computations
have been extended to 80286 high-performance multi-tasking and multi-user
systems. Multiple tasks using the numeric processor extension are afforded
the full protection of the 80286 memory management and protection features.
Figure 1-1 illustrates the relative performance of 8-MHz 8086/8087 and
80286/80287 systems in executing numerics-oriented applications.
Figure 1-1. Evolution and Performance of Numeric Processors
Performance
Table 1-1 compares the execution times of several 80287 instructions with
the equivalent operations executed in software on an 8-MHz 80286. The
software equivalents are highly optimized assembly-language procedures from
the 80287 emulator. As indicated in the table, the 80287 NPX provides about
50 to 100 times the performance of software numeric routines on the 80286
CPU. An 8-MHz 80287 multiplies 32-bit and 64-bit real numbers in about 11.9
and 16.9 microseconds, respectively. Of course, the actual performance of
the NPX in a given system depends on the characteristics of the individual
application.
Although the performance figures shown in table 1-1 refer to operations on
real (floating-point) numbers, the 80287 also manipulates fixed-point binary
and decimal integers of up to 64 bits or 18 digits, respectively. The 80287
can improve the speed of multiple-precision software algorithms for integer
operations by 10 to 100 times.
Because the 80287 NPX is an extension of the 80286 CPU, no software
overhead is incurred in setting up the NPX for computation. The 80287 and
80286 processors coordinate their activities in a manner transparent to
software. Moreover, built-in coordination facilities allow the 80286 CPU to
proceed with other instructions while the 80287 NPX is simultaneously
executing numeric instructions. Programs can exploit this concurrency of
execution to further increase system performance and throughput.
Table 1-1. Numeric Processing Speed Comparisons
Floating-Point Instruction      Approximate Performance Ratios: 8 MHz 80287
                                to 8 MHz Protected Mode iAPX using E80287
Ease of Use
The 80287 NPX offers more than raw execution speed for
computation-intensive tasks. The 80287 brings the functionality and power
of accurate numeric computation into the hands of the general user.
Like the 8087 NPX that preceded it, the 80287 is explicitly designed to
deliver stable, accurate results when programmed using straightforward
"pencil and paper" algorithms. The IEEE 754 standard specifically addresses
this issue, recognizing the fundamental importance of making numeric
computations both easy and safe to use.
For example, most computers can overflow when two single-precision
floating-point numbers are multiplied together and then divided by a third,
even if the final result is a perfectly valid 32-bit number. The 80287
delivers the correctly rounded result. Other typical examples of
undesirable machine behavior in straightforward calculations occur when
solving for the roots of a quadratic equation:
    (-b ± √(b^(2) - 4ac)) / 2a
or when computing a financial rate of return, which involves the expression:
(1+i)^(n). On most machines, straightforward algorithms will not deliver
consistently correct results (and will not indicate when they are
incorrect). To obtain correct results on traditional machines under all
conditions usually requires sophisticated numerical techniques that are
foreign to most programmers. General application programmers using
straightforward algorithms will produce much more reliable programs using
the 80287. This simple fact greatly reduces the software investment required
to develop safe, accurate computation-based products.
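The multiply-then-divide case above can be imitated in a few lines. The following is only a sketch, not 80287 code: Python's 64-bit float stands in for the NPX's wider internal format, and the helper names (to_single, mul_div_narrow, mul_div_wide) and the operand values are invented for this illustration.

```python
import math
import struct


def to_single(x):
    """Round x to a 32-bit real, returning an infinity if it does not fit."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def mul_div_narrow(a, b, c):
    """Compute (a * b) / c, forcing the intermediate product into 32 bits."""
    product = to_single(a * b)
    return to_single(product / c)


def mul_div_wide(a, b, c):
    """Compute (a * b) / c with a wider intermediate, rounding only at the end."""
    return to_single((a * b) / c)


# Hypothetical operands: each one, and the final quotient, fits in 32 bits,
# but the intermediate product (about 6e50) does not.
a = to_single(3.0e30)
b = to_single(2.0e20)
c = to_single(1.0e25)

narrow = mul_div_narrow(a, b, c)   # intermediate overflows, result is infinity
wide = mul_div_wide(a, b, c)       # finite result close to 6e25
```

The only difference between the two functions is where rounding to the narrow format happens, which is the point the paragraph makes about holding intermediate results in a wider format.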
Beyond traditional numerics support for scientific applications, the 80287
has built-in facilities for commercial computing. It can process decimal
numbers of up to 18 digits without round-off errors, performing exact
arithmetic on integers as large as 2^(64) or 10^(18). Exact arithmetic is
vital in accounting applications where rounding errors may introduce
monetary losses that cannot be reconciled.
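To make the 18-digit decimal type concrete, the sketch below packs an integer into a 10-byte packed decimal image and unpacks it again. It assumes the customary layout for this type (two decimal digits per byte, least significant byte first, sign in the top bit of the tenth byte); the function names are invented for the illustration and this is a model in Python, not NPX code.

```python
def encode_packed_decimal(value):
    """Encode an integer of up to 18 decimal digits as a 10-byte packed decimal."""
    magnitude = abs(value)
    if magnitude >= 10**18:
        raise ValueError("packed decimal holds at most 18 digits")
    digits = f"{magnitude:018d}"             # most significant digit first
    image = bytearray(10)
    for i in range(9):                       # byte 0 carries the two lowest digits
        low = int(digits[17 - 2 * i])
        high = int(digits[16 - 2 * i])
        image[i] = (high << 4) | low
    image[9] = 0x80 if value < 0 else 0x00   # assumed: sign in top bit of byte 9
    return bytes(image)


def decode_packed_decimal(image):
    """Recover the integer from a 10-byte packed decimal image."""
    magnitude = 0
    for i in reversed(range(9)):
        magnitude = magnitude * 100 + (image[i] >> 4) * 10 + (image[i] & 0x0F)
    return -magnitude if image[9] & 0x80 else magnitude


largest = 999_999_999_999_999_999            # 18 nines
round_trip = decode_packed_decimal(encode_packed_decimal(largest))
```

Every 18-digit value survives the round trip unchanged, whereas the same value forced through a 64-bit binary real is rounded to the neighbouring 10^(18).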
The NPX contains a number of optional facilities that can be invoked by
sophisticated users. These advanced features include two models of infinity,
directed rounding, gradual underflow, and either automatic or programmed
exception-handling facilities.
These automatic exception-handling facilities permit a high degree of
flexibility in numeric processing software, without burdening the
programmer. While performing numeric calculations, the NPX automatically
detects exception conditions that can potentially damage a calculation. By
default, on-chip exception handlers may be invoked to field these exceptions
so that a reasonable result is produced, and execution may proceed without
program interruption. Alternatively, the NPX can signal the CPU, invoking a
software exception handler whenever various types of exceptions are
detected.
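The two styles of exception handling can be pictured with a small model. This is a sketch in Python rather than NPX code: the function npx_divide, the NumericError class, and the zero_divide_masked flag are all invented names, and the masked response shown (a correctly signed infinity for a zero divisor) is an assumption as far as this section is concerned.

```python
import math


class NumericError(Exception):
    """Stands in for the signal that invokes a software exception handler."""


def npx_divide(dividend, divisor, zero_divide_masked=True):
    """Model a divide with the zero-divide exception masked or unmasked."""
    if divisor == 0.0:
        if dividend == 0.0:
            raise ValueError("0/0 is a different exception, not modelled here")
        if zero_divide_masked:
            # Default on-chip handling: deliver a result and keep going.
            sign = math.copysign(1.0, dividend) * math.copysign(1.0, divisor)
            return math.copysign(math.inf, sign)
        # Unmasked: hand control to software instead of producing a result.
        raise NumericError("zero divisor")
    return dividend / divisor


masked_result = npx_divide(1.0, 0.0)         # execution simply continues

try:
    npx_divide(1.0, 0.0, zero_divide_masked=False)
    handler_invoked = False
except NumericError:
    handler_invoked = True                   # the software handler takes over
```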
Applications
The NPX's versatility and performance make it appropriate to a broad array
of numeric applications. In general, applications that exhibit any of the
following characteristics can benefit by implementing numeric processing on
the 80287:
• Numeric data vary over a wide range of values, or include nonintegral
  values.
• Algorithms produce very large or very small intermediate results.
• Computations must be very precise; i.e., a large number of significant
  digits must be maintained.
• Performance requirements exceed the capacity of traditional
  microprocessors.
• Consistently safe, reliable results must be delivered using a
  programming staff that is not expert in numerical techniques.
Note also that the 80287 can reduce software development costs and improve
the performance of systems that use not only real numbers, but operate on
multiprecision binary or decimal integer values as well.
A few examples, which show how the 80287 might be used in specific numerics
applications, are described below. In many cases, these types of systems
have been implemented in the past with minicomputers. The advent of the
80287 brings the size and cost savings of microprocessor technology to these
applications for the first time.
• Business data processing: The NPX's ability to accept decimal operands
  and produce exact decimal results of up to 18 digits greatly simplifies
  accounting programming. Financial calculations that use power functions
  can take advantage of the 80287's exponentiation and logarithmic
  instructions.
• Process control: The 80287 solves dynamic range problems
  automatically, and its extended precision allows control functions to
  be fine-tuned for more accurate and efficient performance. Control
  algorithms implemented with the NPX also contribute to improved
  reliability and safety, while the 80287's speed can be exploited in
  real-time operations.
• Computer numerical control (CNC): The 80287 can move and position
  machine tool heads with accuracy in real-time. Axis positioning also
  benefits from the hardware trigonometric support provided by the 80287.
• Robotics: Coupling small size and modest power requirements with
  powerful computational abilities, the NPX is ideal for on-board
  six-axis positioning.
• Navigation: Very small, lightweight, and accurate inertial guidance
  systems can be implemented with the 80287. Its built-in trigonometric
  functions can speed and simplify the calculation of position from
• Graphics terminals: The 80287 can be used in graphics terminals to
  locally perform many functions that normally demand the attention of a
  main computer; these include rotation, scaling, and interpolation. By
  also using an 82720 Graphics Display Controller to perform high speed
  data transfers, very powerful and highly self-sufficient terminals can
  be built from a relatively small number of 80286 family parts.
• Data acquisition: The 80287 can be used to scan, scale, and reduce
  large quantities of data as it is collected, thereby lowering storage
  requirements and time required to process the data for analysis.
The preceding examples are oriented toward traditional numerics
applications. There are, in addition, many other types of systems that do
not appear to the end user as computational, but can employ the 80287 to
advantage. Indeed, the 80287 presents the imaginative system designer with
an opportunity similar to that created by the introduction of the
microprocessor itself. Many applications can be viewed as numerically-based
if sufficient computational power is available to support this view. This
is analogous to the thousands of successful products that have been built
around "buried" microprocessors, even though the products themselves bear
little resemblance to computers.
Upgradability
The architecture of the 80286 CPU is specifically adapted to allow easy
upgradability to use an 80287, simply by plugging in the 80287 NPX. For this
reason, designers of 80286 systems may wish to incorporate the 80287 NPX
into their designs in order to offer two levels of price and performance at
little additional cost.
Two features of the 80286 CPU make the design and support of upgradable
80286 systems particularly simple:
• The 80286 can be programmed to recognize the presence of an 80287 NPX;
  that is, software can recognize whether it is running on an 80286 or an
  80287 system.
• After determining whether the 80287 NPX is available, the 80286 CPU
  can be instructed to let the NPX execute all numeric instructions. If
  an 80287 NPX is not available, the 80286 CPU can emulate all 80287
  numeric instructions in software. This emulation is completely
  transparent to the application software; the same object code may be
  used by both 80286 and 80287 systems. No relinking or recompiling of
  application software is necessary; the same code will simply execute
  faster on the 80287 than on the 80286 system.
To facilitate this design of upgradable 80286 systems, Intel provides a
software emulator for the 80287 that provides the functional equivalent of
the 80287 hardware, implemented in software on the 80286. Except for
timing, the operation of this 80287 emulator (E80287) is the same as for
the 80287 NPX hardware. When the emulator is combined as part of the
systems software, the 80286 system with 80287 emulation and the 80286 with
80287 hardware are virtually indistinguishable to an application program.
This capability makes it easy for software developers to maintain a single
set of programs for both systems. System manufacturers can offer the NPX
as a simple plug-in performance option without necessitating any changes
in the user's software.
Programming Interface
The 80286/80287 pair is programmed as a single processor; all of the 80287
registers appear to a programmer as extensions of the basic 80286 register
set. The 80286 has a class of instructions known as ESCAPE (ESC)
instructions, all having a common format. These ESC instructions are the
numeric instructions for the 80287 NPX; they are simply encoded into the
instruction stream along with other 80286 instructions.
All of the CPU memory-addressing modes may be used in programming the NPX,
allowing convenient access to record structures, numeric arrays, and other
memory-based data structures. All of the memory management and protection
features of the CPU are extended to the NPX as well.
Numeric processing in the 80287 centers around the NPX register stack.
Programmers can treat these eight 80-bit registers as either a fixed
register set, with instructions operating on explicitly-designated
registers, or a classical stack, with instructions operating on the top one
or two stack elements.
Internally, the 80287 holds all numbers in a uniform 80-bit temporary-real
format. Operands that may be represented in memory as 16-, 32-, or 64-bit
integers, 32-, 64-, or 80-bit floating-point numbers, or 18-digit packed BCD
numbers, are automatically converted into temporary-real format as they are
loaded into the NPX registers. Computation results are subsequently
converted back into one of these destination data formats when they are
stored into memory from the NPX registers.
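As an illustration of the conversion into temporary-real format, the sketch below builds the 10-byte image of a value. It is a Python model, not NPX code, and it assumes the customary temporary-real layout (a 64-bit significand with an explicit leading integer bit, a 15-bit exponent biased by 16383, and a sign bit); the function name is invented, and zeros, infinities, and NaNs are deliberately left out.

```python
import math


def to_temporary_real(x):
    """Return the 10-byte image (lowest byte first) of a finite, nonzero x."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("sketch covers finite, nonzero values only")
    mantissa, exponent = math.frexp(abs(x))   # abs(x) == mantissa * 2**exponent
    significand = int(mantissa * 2**64)       # 64 bits; top bit is the integer bit
    biased = exponent - 1 + 16383             # assumed bias of the 15-bit exponent
    sign_and_exponent = (0x8000 if x < 0 else 0) | biased
    return (significand.to_bytes(8, "little")
            + sign_and_exponent.to_bytes(2, "little"))


one = to_temporary_real(1.0)
minus_two_and_a_half = to_temporary_real(-2.5)
```

Because the 64-bit significand is wider than that of either the 32-bit or the 64-bit real format, loading those types loses no information; rounding only comes into play when a result is stored back into a narrower format.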
Table 1-2 lists each of the seven data types supported by the 80287,
showing the data format for each type. All operands are stored in memory
with the least significant digits starting at the initial (lowest) memory
address. Numeric instructions access and store memory operands using only
this initial address. For maximum system performance, all operands should
start at even memory addresses.
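The byte ordering and alignment advice can be checked with a short model. This is a Python sketch with invented helper names (integer_image, is_even_address); it only illustrates the "least significant digits at the lowest address" rule for the binary integer types.

```python
def integer_image(value, bits):
    """Memory image of a signed binary integer, lowest address first."""
    return value.to_bytes(bits // 8, "little", signed=True)


def is_even_address(address):
    """True when an operand placed at this address starts on an even byte."""
    return address % 2 == 0


word = integer_image(0x1234, 16)        # low byte 34H comes first in memory
negative = integer_image(-2, 32)        # two's complement, low byte first
```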
Table 1-3 lists the 80287 instructions by class. No special programming
tools are necessary to use the 80287, because all of the NPX instructions
and data types are directly supported by the ASM286 Assembler and Intel's
appropriate high-level languages.
Software routines for the 80287 may be written in ASM286 Assembler or any
of the following higher-level languages:
PL/M-286
PASCAL-286
FORTRAN-286
C-286
In addition, all of the development tools supporting the 8086 and 8087 can
also be used to develop software for the 80286 and 80287 operating in
Real-Address mode.
All of these high-level languages provide programmers with access to the
computational power and speed of the 80287 without requiring an
understanding of the architecture of the 80286 and 80287 chips. Such
architectural considerations as concurrency and data synchronization are
handled automatically by these high-level languages. For the ASM286
programmer, specific rules for handling these issues are discussed in a
later section of this supplement.
Table 1-2. Numeric Data Types

                        Significant
                        Digits
Data Type        Bits   (Decimal)    Approximate Range (Decimal)
Word integer      16     4           -32,768 ≤ X ≤ +32,767
Short integer     32     9           -2*10^(9) ≤ X ≤ +2*10^(9)
Long integer      64    18           -9*10^(18) ≤ X ≤ +9*10^(18)
Packed decimal    80    18           -99...99 ≤ X ≤ +99...99 (18 digits)
Short real        32    6-7          8.43*10^(-37) ≤ |X| ≤ 3.37*10^(38)
Long real         64    15-16        4.19*10^(-307) ≤ |X| ≤ 1.67*10^(308)
Temporary real    80    19           3.4*10^(-4932) ≤ |X| ≤ 1.2*10^(4932)
Table 1-3. Principal NPX Instructions

Class               Instruction Types
Data Transfer       Load (all data types), Store (all data types), Exchange
Processor Control   Load Control Word, Store Control Word, Store Status Word,
                    Load Environment, Store Environment, Save, Restore, Clear
                    Exceptions, Initialize, Set Protected Mode
Hardware Interface
As an extension of the 80286 processor, the 80287 is wired very much in
parallel with the 80286 CPU. Four special status signals, PEREQ, PEACK,
BUSY, and ERROR, permit the two processors to coordinate their
activities. The 80287 NPX also monitors the 80286 S1, S0,
COD/INTA, READY, HLDA, and CLK pins to track the execution of
ESC instructions (numeric instructions) by the 80286.
As shown in figure 1-2, the 80287 NPX is divided internally into two
processing elements: the Bus Interface Unit (BIU) and the Numeric Execution
Unit (NEU). The two units operate independently of one another: the BIU
receives and decodes instructions, requests operand transfers with memory,
and executes processor control instructions, whereas the NEU processes
individual numeric instructions.
The BIU handles all of the status and signal lines between the 80287 and
the 80286. The NEU executes all instructions that involve the register
stack. These instructions include arithmetic, logical, transcendental,
constant, and data transfer instructions. The data path in the NEU is 84
bits wide (68 fraction bits, 15 exponent bits, and a sign bit), allowing
internal operand transfers to be performed at very high speeds.
The 80287 executes a single numeric instruction at a time. Before executing
most ESC instructions, the 80286 tests the BUSY pin and waits until the
80287 indicates that it is not busy before initiating the command. Once the
command is initiated, the 80286 continues program execution while the 80287
executes the numeric instruction. The 8087 required a WAIT instruction to
test the BUSY signal before each ESC opcode; in 80287 programs, these WAIT
instructions are permissible but not necessary.
In all cases, a WAIT or ESC instruction should be inserted after any 80287
store to memory (except FSTSW or FSTCW) or load from memory (except FLDENV,
FLDCW, or FRSTOR) before the 80286 reads or changes the memory value.
When needed, all data transfers between memory and the 80287 NPX are
performed by the 80286 CPU, using its Processor Extension Data Channel.
Numeric data transfers performed by the 80286 use the same timing as any
other bus cycle, and all such transfers come under the supervision of the
80286 memory management and protection mechanisms. The 80286 Processor
Extension Data Channel and the hardware interface between the 80286 and
80287 processors are described in Chapter Six of the 80286 Hardware
Reference Manual.
From the programmer's perspective, the 80287 can be considered just an
extension of the 80286 processor. All interaction between the 80286 and the
80287 processors on the hardware level is handled automatically by the 80286
and is transparent to the software.
To communicate with the 80287, the 80286 uses the reserved I/O port
addresses 00F8H, 00FAH, and 00FCH (I/O ports numbered 00F8H through 00FFH
are reserved for the 80286/80287 interface). These I/O operations are
performed automatically by the 80286 and are distinct from I/O operations
that result from program I/O instructions. I/O operations resulting from
the execution of ESC instructions are completely transparent to software.
Any program may execute ESCAPE (numeric) instructions, without regard to its
current I/O Privilege Level (IOPL).
To guarantee correct operation of the 80287, programs must not perform any
explicit I/O operations to any of the eight ports reserved for the 80287.
The IOPL of the 80286 can be used to protect the integrity of 80287
computations in multiuser reprogrammable applications, preventing any
accidental or other tampering with the 80287 (see Chapter Eight of the 80286
Operating System Writer's Guide).
80287 Numeric Processor Architecture
To the programmer, the 80287 NPX appears as a set of additional registers
complementing those of the 80286. These additional registers consist of
• Eight individually-addressable 80-bit numeric registers, organized as
  a register stack
• Three 16-bit registers containing:
    an NPX status word
    an NPX control word
    a tag word
• Four 16-bit registers containing the NPX instruction and data pointers
All of the NPX numeric instructions focus on the contents of these NPX
registers.
The NPX Register Stack
The 80287 register stack is shown in figure 1-3. Each of the eight numeric
registers in the 80287's register stack is 80 bits wide and is divided into
fields corresponding to the NPX's temporary-real data type.
Numeric instructions address the data registers relative to the register on
the top of the stack. At any point in time, this top-of-stack register is
indicated by the ST (Stack Top) field in the NPX status word. Load or push
operations decrement ST by one and load a value into the new top register.
A store-and-pop operation stores the value from the current ST register and
then increments ST by one. Like 80286 stacks in memory, the 80287 register
stack grows down toward lower-addressed registers.
Many numeric instructions have several addressing modes that permit the
programmer to implicitly operate on the top of the stack, or to explicitly
operate on specific registers relative to the ST. The ASM286 Assembler
supports these register addressing modes, using the expression ST(0), or
simply ST, to represent the current Stack Top and ST(i) to specify the ith
register from ST in the stack (0 ≤ i ≤ 7). For example, if ST contains 011B
(register 3 is the top of the stack), the following statement would add the
contents of two registers in the stack (registers 3 and 5):
FADD ST, ST(2)
The stack organization and top-relative addressing of the numeric registers
simplify subroutine programming by allowing routines to pass parameters on
the register stack. By using the stack to pass parameters rather than using
"dedicated" registers, calling routines gain more flexibility in how they
use the stack. As long as the stack is not full, each routine simply loads
the parameters onto the stack before calling a particular subroutine to
perform a numeric calculation. The subroutine then addresses its parameters
as ST, ST(1), etc., even though ST may, for example, refer to physical
register 3 in one invocation and physical register 5 in another.
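The top-relative addressing described above can be modelled in a few lines. The following Python sketch is illustrative only and is not 80287 code: the class name RegisterStack and its methods are hypothetical, and the model ignores tag checking, stack overflow, and the temporary-real format (register contents are ordinary Python numbers).

```python
class RegisterStack:
    """Toy model of the eight-register NPX stack (hypothetical helper)."""

    def __init__(self, top=0):
        self.regs = [None] * 8   # physical registers 0-7
        self.top = top & 7       # models the ST field of the status word

    def physical(self, i=0):
        """Physical register number addressed by ST(i)."""
        return (self.top + i) & 7

    def push(self, value):
        """Load/push: decrement ST, then write the new top register."""
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def pop(self):
        """Store-and-pop: read the top register, then increment ST."""
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) & 7
        return value

    def fadd(self, dst, src):
        """Model of FADD ST(dst), ST(src): the sum replaces ST(dst)."""
        self.regs[self.physical(dst)] += self.regs[self.physical(src)]


stack = RegisterStack(top=3)              # ST = 011B, as in the example above
stack.regs[3], stack.regs[5] = 1.5, 2.25
stack.fadd(0, 2)                          # FADD ST, ST(2)
print(stack.physical(0), stack.physical(2), stack.regs[3])
```

With ST = 3, the operands ST and ST(2) resolve to physical registers 3 and 5. A subroutine written against ST(i) therefore needs no knowledge of the physical register numbers.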
 15                   0
+---------------------+
|  CONTROL REGISTER   |
+---------------------+
|   STATUS REGISTER   |
+---------------------+
|      TAG WORD       |
+---------------------+
|                     |
+-INSTRUCTION POINTER-+
|                     |
+---------------------+
|                     |
+--- DATA POINTER ----+
|                     |
+---------------------+
The NPX Status Word
The 16-bit status word shown in figure 1-4 reflects the overall state of
the 80287. This status word may be stored into memory using the
FSTSW/FNSTSW, FSTENV/FNSTENV, and FSAVE/FNSAVE instructions, and can be
transferred into the 80286 AX register with the FSTSW AX/FNSTSW AX
instructions, allowing the NPX status to be inspected by the CPU.
The Busy bit (bit 15) and the BUSY pin indicate whether the 80287's
execution unit is idle (B = 0) or is executing a numeric instruction or
signalling an exception (B = 1). (The instructions FNSTSW, FNSTSW AX,
FNSTENV, and FNSAVE do not set the Busy bit themselves, nor do they require
the Busy bit to be clear in order to execute.)
The four NPX condition code bits (C{0}-C{3}) are similar to the flags in a
CPU: the 80287 updates these bits to reflect the outcome of arithmetic
operations. The effect of these instructions on the condition code bits is
summarized in table 1-4. These condition code bits are used principally for
conditional branching. The FSTSW AX instruction stores the NPX status word
directly into the CPU AX register, allowing these condition codes to be
inspected efficiently by 80286 code.
Bits 11-13 of the status word point to the 80287 register that is the
current Stack Top (ST). The significance of the stack top has been described
in the section on the Register Stack.
Figure 1-4 shows the six error flags in bits 0-5 of the status word. Bit 7
is the error summary status (ES) bit. ES is set if any unmasked exception
bits are set, and is cleared otherwise. If this bit is set, the ERROR
signal is asserted. Bits 0-5 indicate whether the NPX has detected one of
six possible exception conditions since these status bits were last cleared
or reset.
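As an illustration of the layout just described, the following Python sketch splits a status word value (for example, one obtained with FSTSW AX) into its fields. It is a sketch under stated assumptions: the function name is hypothetical, the flag mnemonics IE, DE, ZE, OE, UE, and PE for bits 0-5 are the conventional ones and are not taken from this section, and the bit positions assume the usual layout in which C{0}-C{2} occupy bits 8-10, ST occupies bits 11-13, and C{3} occupies bit 14.

```python
def decode_status_word(sw):
    """Split a 16-bit NPX status word into its fields (hypothetical helper)."""
    flag_names = ("IE", "DE", "ZE", "OE", "UE", "PE")   # bits 0-5, names assumed
    return {
        "busy": (sw >> 15) & 1,          # B, bit 15
        "c3": (sw >> 14) & 1,
        "top": (sw >> 11) & 7,           # ST field
        "c2": (sw >> 10) & 1,
        "c1": (sw >> 9) & 1,
        "c0": (sw >> 8) & 1,
        "es": (sw >> 7) & 1,             # error summary status
        "exceptions": [name for bit, name in enumerate(flag_names)
                       if sw & (1 << bit)],
    }


fields = decode_status_word(0x1984)
print(fields)
```

For the value 1984H the sketch reports ST = 3, C{0} = 1, ES = 1, and the zero-divide flag set.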
Table 1-4. Interpreting the NPX Condition Codes

Instruction
Type            C{3}  C{2}  C{1}  C{0}  Interpretation
--------------------------------------------------------------------------
Compare, Test    0     0     X     0    ST > Source or 0 (FTST)
                 0     0     X     1    ST < Source or 0 (FTST)
                 1     0     X     0    ST = Source or 0 (FTST)
                 1     1     X     1    ST is not comparable

Remainder       Q{1}   0    Q{0}  Q{2}  Complete reduction with three
                                        low bits of quotient in C{0},
                                        C{3}, and C{1}
                 U     1     U     U    Incomplete Reduction
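The rows of table 1-4 can be applied mechanically to a stored status word. The Python sketch below is one possible way to do so and is not part of the 80287 instruction set: the function names are hypothetical, and the bit positions assume C{0}, C{1}, C{2}, and C{3} in bits 8, 9, 10, and 14 of the status word.

```python
def interpret_compare(sw):
    """Interpret C3, C2, C0 after a compare or test, following table 1-4."""
    c3 = (sw >> 14) & 1
    c2 = (sw >> 10) & 1
    c0 = (sw >> 8) & 1
    outcomes = {
        (0, 0, 0): "ST > source",
        (0, 0, 1): "ST < source",
        (1, 0, 0): "ST = source",
        (1, 1, 1): "not comparable",
    }
    return outcomes.get((c3, c2, c0), "combination not listed in table 1-4")


def remainder_quotient_bits(sw):
    """Return the three low quotient bits after a remainder operation,
    or None if C2 = 1 (incomplete reduction)."""
    if (sw >> 10) & 1:
        return None
    q2 = (sw >> 8) & 1     # C0
    q1 = (sw >> 14) & 1    # C3
    q0 = (sw >> 9) & 1     # C1
    return (q2 << 2) | (q1 << 1) | q0


print(interpret_compare(0x0100), remainder_quotient_bits(0x4300))
```

C{1} is ignored for compares because the table marks it X.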
Control Word
The NPX provides the programmer with several processing options, which are
selected by loading a word from memory into the control word. Figure 1-5
shows the format and encoding of the fields in the control word.
The low-order byte of this control word configures the 80287 error and
exception masking. Bits 0-5 of the control word contain individual masks for
each of the six exception conditions recognized by the 80287. The high-order
byte of the control word configures the 80287 processing options, including
- Precision control
- Rounding control
- Infinity control
The Precision control bits (bits 8-9) can be used to set the 80287 internal
operating precision at less than the default precision (64-bit significand).
These control bits can be used to provide compatibility with the
earlier-generation arithmetic processors having less precision than the
80287, as required by the IEEE 754 standard. Setting a lower precision,
however, will not affect the execution time of numeric calculations.
The rounding control bits (bits 10-11) provide for directed rounding and
true chop as well as the unbiased round-to-nearest-even mode specified in
the IEEE 754 standard.
The infinity control bit (bit 12) determines the manner in which the 80287
treats the special values of infinity. Either affine closure (where positive
infinity is distinct from negative infinity) or projective closure (infinity
is treated as a single unsigned quantity) may be specified. These two
alternative views of infinity are discussed in the section on Computation
Fundamentals.
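The field positions given above can be combined into a control word value as sketched below in Python. This is an illustration only. The function and table names are hypothetical; the two-bit encodings for the precision and infinity fields come from the usual control word layout (figure 1-5 is not reproduced in this section) and should be treated as assumptions, as should the convention that a mask bit of 1 masks the exception. Bits 6-7 and 13-15 are left at zero here.

```python
PRECISION = {24: 0b00, 53: 0b10, 64: 0b11}                 # PC, bits 8-9 (assumed)
ROUNDING = {"nearest": 0b00, "down": 0b01, "up": 0b10, "chop": 0b11}  # RC, bits 10-11
INFINITY = {"projective": 0, "affine": 1}                  # IC, bit 12 (assumed)


def make_control_word(masks=0x3F, precision=64, rounding="nearest",
                      infinity="projective"):
    """Assemble a control word from its fields (hypothetical helper)."""
    return ((masks & 0x3F)
            | (PRECISION[precision] << 8)
            | (ROUNDING[rounding] << 10)
            | (INFINITY[infinity] << 12))


def rounding_field(cw):
    """Extract the RC field (bits 10-11) from a control word."""
    return (cw >> 10) & 0b11


print(hex(make_control_word()))
print(hex(make_control_word(precision=53, rounding="chop", infinity="affine")))
```

A value built this way would be placed in memory and loaded with the control word load instruction; the sketch only shows how the fields pack together.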
The NPX Tag Word
The tag word indicates the contents of each register in the register stack,
as shown in figure 1-6. The tag word is used by the NPX itself in order to
track its numeric registers and optimize performance. Programmers may use
this tag information to interpret the contents of the numeric registers.
The tag values are stored in the tag word corresponding to the physical
registers 0-7. Programmers must use the current Stack Top (ST) pointer
stored in the NPX status word to associate these tag values with the
relative stack registers ST(0) through ST(7).
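The association between physical tags and stack-relative registers can be sketched as follows in Python. The sketch assumes details that figure 1-6 would supply and that are not stated in this section: two tag bits per register, with the tag for physical register 0 in bits 0-1, and the encodings 00 = valid, 01 = zero, 10 = special, 11 = empty. The function name is hypothetical, and the ST field is assumed to occupy bits 11-13 of the status word.

```python
TAG_MEANING = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def tags_by_stack_position(tag_word, status_word):
    """List (name, physical register, tag meaning) for ST(0) through ST(7)."""
    top = (status_word >> 11) & 7
    result = []
    for i in range(8):
        phys = (top + i) & 7                     # physical register for ST(i)
        tag = (tag_word >> (2 * phys)) & 0b11    # its two tag bits
        result.append((f"ST({i})", phys, TAG_MEANING[tag]))
    return result


for entry in tags_by_stack_position(0xF73F, 0x1800):
    print(entry)
```

In this example the stack top is physical register 3, so the tag for ST(0) is taken from bits 6-7 of the tag word.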
The NPX Instruction and Data Pointers
The NPX instruction and data pointers provide support for programmed
exception-handlers. Whenever the 80287 executes a math instruction, the NPX
internally saves the instruction address, the operand address (if present),
and the instruction opcode. The 80287 FSTENV and FSAVE instructions store
this data into memory, allowing exception handlers to determine the precise
nature of any numeric exceptions that may be encountered.
When stored in memory, the instruction and data pointers appear in one of
two formats, depending on the operating mode of the 80287. Figure 1-7 shows
these pointers as they are stored following an FSTENV instruction. In
Real-Address mode, these values are the 20-bit physical address and 11-bit
opcode formatted like the 8087. In Protected mode, these values are the
32-bit virtual addresses used by the program that executed the ESC
instruction.
The instruction address saved in the 80287 will point to any prefixes that
preceded the instruction. This is different from the 8087, for which the
instruction address pointed only to the ESC instruction opcode.
Figure 1-7. 80287 Instruction and Data Pointer Image in Memory

                                                   MEMORY OFFSET
                +-------------------------------+
REAL MODE       |         CONTROL WORD          |  +0
                +-------------------------------+
                |          STATUS WORD          |  +2
                +-------------------------------+
                |           TAG WORD            |  +4
                +-------------------------------+
                |   INSTRUCTION POINTER(15-0)   |  +6
                +-----------+-+-----------------+
                |INSTRUCTION| |   INSTRUCTION   |
                |  POINTER  |0|     OPCODE      |  +8
                |  (19-16)  | |     (10-0)      |
                +-----------+-+-----------------+
                |      DATA POINTER(15-0)       |  +10
                +-----------+-------------------+
                |   DATA    |                   |
                |  POINTER  |         0         |  +12
                |  (19-16)  |                   |
                +-----------+-------------------+
                 15       12 11                0

                                                   MEMORY OFFSET
                +-------------------------------+
PROTECTED MODE  |         CONTROL WORD          |  +0
                +-------------------------------+
                |          STATUS WORD          |  +2
                +-------------------------------+
                |           TAG WORD            |  +4
                +-------------------------------+
                |           IP OFFSET           |  +6
                +-------------------------------+
                |          CS SELECTOR          |  +8
                +-------------------------------+
                |      DATA OPERAND OFFSET      |  +10
                +-------------------------------+
                |     DATA OPERAND SELECTOR     |  +12
                +-------------------------------+
                 15                            0
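An exception handler that examines the image of figure 1-7 must unpack the words differently in the two modes. The Python sketch below shows one way the 14-byte image could be decoded once it has been copied out of memory. It is illustrative only: the function name and the dictionary keys are hypothetical, and the words are assumed to be stored low byte first, as is usual for the 80286.

```python
import struct


def parse_environment(image, protected_mode):
    """Decode a 14-byte FSTENV image laid out as in figure 1-7."""
    cw, sw, tw, w6, w8, w10, w12 = struct.unpack("<7H", image)
    env = {"control": cw, "status": sw, "tag": tw}
    if protected_mode:
        env["instruction"] = (w8, w6)      # (CS selector, IP offset)
        env["operand"] = (w12, w10)        # (operand selector, operand offset)
    else:
        env["instruction"] = ((w8 >> 12) << 16) | w6   # 20-bit physical address
        env["opcode"] = w8 & 0x07FF                    # 11-bit opcode
        env["operand"] = ((w12 >> 12) << 16) | w10     # 20-bit physical address
    return env


sample = struct.pack("<7H", 0x033F, 0x1800, 0xFFFF,
                     0x2345, 0x11C1, 0x6789, 0xA000)
print(parse_environment(sample, protected_mode=False))
```

In Real-Address mode the upper four bits of the words at offsets +8 and +12 supply address bits 19-16; in Protected mode the same words are selectors and are returned unchanged.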
Computation Fundamentals
This section covers 80287 programming concepts that are common to all
applications. It describes the 80287's internal number system and the
various types of numbers that can be employed in NPX programs. The most
commonly used options for rounding, precision, and infinity (selected by
fields in the control word) are described, with exhaustive coverage of less
frequently used facilities deferred to later sections. Exception conditions
that may arise during execution of NPX instructions are also described along
with the options that are available for responding to these exceptions.
Number System
The system of real numbers that people use for pencil and paper
calculations is conceptually infinite and continuous. There is no upper or
lower limit to the magnitude of the numbers one can employ in a calculation,
or to the precision (number of significant digits) that the numbers can
represent. When considering any real number, there is always an infinity of
numbers both larger and smaller. There is also an infinity of numbers
between (i.e., with more significant digits than) any two real numbers. For
example, between 2.5 and 2.6 are 2.51, 2.5897, 2.500001, etc.
While ideally it would be desirable for a computer to be able to operate on
the entire real number system, in practice this is not possible. Computers,
no matter how large, ultimately have fixed-size registers and memories that
limit the system of numbers that can be accommodated. These limitations
determine both the range and the precision of numbers. The result is a set
of numbers that is finite and discrete, rather than infinite and continuous.
This set is a subset of the real numbers that is designed to form a
useful approximation of the real number system.
Figure 1-8 superimposes the basic 80287 real number system on a real number
line (decimal numbers are shown for clarity, although the 80287 actually
represents numbers in binary). The dots indicate the subset of real numbers
the 80287 can represent as data and final results of calculations. The
80287's range is approximately ±4.19*10^(-307) to ±1.67*10^(308).
Applications that are required to deal with data and final results outside
this range are rare. For reference, the range of the IBM 370 is about
±0.54*10^(-78) to ±0.72*10^(76).
The finite spacing in figure 1-8 illustrates that the NPX can represent a
great many, but not all, of the real numbers in its range. There is always a
gap between two adjacent 80287 numbers, and it is possible for the result of
a calculation to fall in this space. When this occurs, the NPX rounds the
true result to a number that it can represent. Thus, a real number that
requires more digits than the 80287 can accommodate (e.g., a 20-digit
number) is represented with some loss of accuracy. Notice also that the
80287's representable numbers are not distributed evenly along the real
number line. In fact, an equal number of representable numbers exists
between successive powers of 2 (i.e., as many representable numbers exist
between 2 and 4 as between 65,536 and 131,072). Therefore, the gaps between
representable numbers are larger as the numbers increase in magnitude. All
integers in the range ±2^(64) (approximately ±1.8*10^(19)), however, are
exactly representable.
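The uneven spacing of representable numbers can be observed on any machine whose floating-point type matches the long real format. The Python sketch below assumes that the host's float is such a type (53-bit significand) and that Python 3.9 or later is available for math.ulp; it illustrates the property and does not measure the 80287 itself. The helper name bits is hypothetical.

```python
import math
import struct


def bits(x):
    """Bit pattern of a long real value as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# Adjacent representable values have adjacent bit patterns, so the
# difference in bit patterns counts the values in each interval.
count_2_to_4 = bits(4.0) - bits(2.0)
count_64k_to_128k = bits(131072.0) - bits(65536.0)
print(count_2_to_4 == count_64k_to_128k)

# The gap between neighbours grows with magnitude.
print(math.ulp(2.0), math.ulp(65536.0))
```

Both intervals contain the same number of representable values, while the gap between neighbours near 65,536 is 2^(15) times the gap near 2.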
In its internal operations, the 80287 actually employs a number system that
is a substantial superset of that shown in figure 1-8. The internal format
(called temporary real) extends the 80287's range to about ±3.4*10^(-4932)
to ±1.2*10^(4932), and its precision to about 19 (equivalent decimal)
digits. This format is designed to provide extra range and precision for
constants and intermediate results, and is not normally intended for data
or final results.
From a practical standpoint, the 80287's set of real numbers is
sufficiently large and dense so as not to limit the vast majority of
microprocessor applications. Compared to most computers, including
mainframes, the NPX provides a very good approximation of the real number
system. It is important to remember, however, that it is not an exact
representation, and that arithmetic on real numbers is inherently
approximate.
Conversely, and equally important, the 80287 does perform exact arithmetic
on integer operands. That is, an operation on two integers returns an exact
integral result, provided that the true result is an integer and is in
range. For example, 4 ÷ 2 yields an exact integer, 1 ÷ 3 does not, and
2^(40) * 2^(30) + 1 does not, because the result requires greater than 64
bits of precision.
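Whether an integer result can be held exactly depends on how many significant bits it needs once trailing zero bits are absorbed by the exponent. The Python sketch below checks this for a 64-bit significand. The function name is hypothetical and the exponent range is ignored, so this is only a model of the precision limit described above.

```python
def fits_in_significand(n, width=64):
    """True if integer n needs no more than `width` significant bits."""
    n = abs(n)
    if n == 0:
        return True
    while n % 2 == 0:        # trailing zero bits are carried by the exponent
        n //= 2
    return n.bit_length() <= width


print(fits_in_significand(2**40 * 2**30))        # a single significant bit
print(fits_in_significand(2**40 * 2**30 + 1))    # needs 71 significant bits
```

The product 2^(40) * 2^(30) alone is exact; adding 1 spreads the significant bits over 71 positions, which exceeds the 64 available.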
Data Types and Formats
The 80287 recognizes seven numeric data types, divided into three classes:
binary integers, packed decimal integers, and binary reals. A later section
describes how these formats are stored in memory (the sign is always located
in the highest-addressed byte). Figure 1-9 summarizes the format of each
data type. In the figure, the most significant digits of all numbers (and
fields within numbers) are the leftmost digits. Table 1-5 provides the
range and number of significant (decimal) digits that each format can
accommodate.
NOTES:
S = Sign bit (0 = positive, 1 = negative)
dn = Decimal digit (two per byte)
X = Bits have no significance;
80287 ignores when loading, zeros when storing.
Exponent Bias (normal values):
Short Real: 127 (7FH)
Long Real: 1023 (3FFH)
Temporary Real: 16383 (3FFFH)
Binary Integers
The three binary integer formats are identical except for length, which
governs the range that can be accommodated in each format. The leftmost bit
is interpreted as the number's sign: 0 = positive and 1 = negative. Negative
numbers are represented in standard two's complement notation (the binary
integers are the only 80287 format to use two's complement). The quantity
zero is represented with a positive sign (all bits are 0). The 80287 word
integer format is identical to the 16-bit signed integer data type of the
80286.
Decimal Integers
Decimal integers are stored in packed decimal notation, with two decimal
digits "packed" into each byte, except the leftmost byte, which carries the
sign bit (0 = positive, 1 = negative). Negative numbers are not stored in
two's complement form and are distinguished from positive numbers only by
the sign bit. The most significant digit of the number is the leftmost
digit. All digits must be in the range 0H-9H.
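The packed decimal layout can be illustrated with a short Python sketch that builds and reads a ten-byte image. Several details are assumptions rather than statements from this section: the image is taken to hold 18 digits in nine bytes with the least significant pair at the lowest address, the sign is placed in the top bit of the tenth (highest-addressed) byte, and the remaining bits of that byte are written as zeros. The function names are hypothetical.

```python
def encode_packed_decimal(value):
    """Return a 10-byte packed decimal image, lowest-addressed byte first."""
    if abs(value) >= 10**18:
        raise ValueError("packed decimal holds at most 18 digits")
    digits = f"{abs(value):018d}"             # most significant digit first
    image = bytearray(10)
    for i in range(9):
        high = int(digits[16 - 2 * i])        # upper digit of this byte
        low = int(digits[17 - 2 * i])         # lower digit of this byte
        image[i] = (high << 4) | low
    image[9] = 0x80 if value < 0 else 0x00    # sign bit only
    return bytes(image)


def decode_packed_decimal(image):
    """Inverse of encode_packed_decimal; rejects digits outside 0H-9H."""
    value = 0
    for byte in reversed(image[:9]):
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("digit outside the range 0H-9H")
        value = value * 100 + high * 10 + low
    return -value if image[9] & 0x80 else value


print(encode_packed_decimal(-1234).hex())
```

Note that -1234 and +1234 differ only in the sign bit of the last byte; the digit bytes are identical, in contrast to two's complement.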
Real Numbers
The 80287 stores real numbers in a three-field binary format that resembles
scientific, or exponential, notation. The number's significant digits are
held in the significand field, the exponent field locates the binary point
within the significant digits (and therefore determines the number's
magnitude), and the sign field indicates whether the number is positive or
negative. (The exponent and significand are analogous to the terms
"characteristic" and "mantissa" used to describe floating point numbers on
some computers.) Negative numbers differ from positive numbers only in the
sign bits of their significands.
Table 1-5 shows how the real number 178.125 (decimal) is stored in the
80287 short real format. The table lists a progression of equivalent
notations that express the same value to show how a number can be converted
from one form to another. The ASM286 and PL/M-286 language translators
perform a similar process when they encounter programmer-defined real number
constants. Note that not every decimal fraction has an exact binary
equivalent. The decimal number 1/10, for example, cannot be expressed
exactly in binary (just as the number 1/3 cannot be expressed exactly in
decimal). When a translator encounters such a value, it produces a rounded
binary approximation of the decimal value.
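The conversion of 178.125 can be checked on any host that supports the short real (32-bit) format. The Python sketch below uses the standard struct module to obtain the stored image and then separates the three fields. The function name is hypothetical, and the host's 32-bit floating-point format is assumed to match the short real format described here.

```python
import struct


def short_real_fields(x):
    """Return (raw image, sign, biased exponent, fraction) for a short real."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    biased_exponent = (raw >> 23) & 0xFF
    fraction = raw & 0x7FFFFF            # 23 stored fraction bits
    return raw, sign, biased_exponent, fraction


raw, sign, exponent, fraction = short_real_fields(178.125)
print(f"{raw:08X} sign={sign} true exponent={exponent - 127} "
      f"fraction={fraction:023b}")
```

In binary, 178.125 is 10110010.001B, or 1.0110010001B * 2^(7). The stored exponent is therefore 7 + 127 = 134, and the fraction field holds the bits to the right of the binary point, with the leading integer bit not stored.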
The NPX usually carries the digits of the significand in normalized form.
This means that, except for the value zero, the significand is an integer
and a fraction as follows:
1{^}fff...ff
where ^ indicates an assumed binary point. The number of fraction bits
varies according to the real format: 23 for short, 52 for long, and 63 for
temporary real. By normalizing real numbers so that their integer bit is
always a 1, the 80287 eliminates leading zeros in small values (|X| < 1).
This technique maximizes the number of significant digits that can be
accommodated in a significand of a given width. Note that, in the short and
long real formats, the integer bit is implicit and is not actually stored;
the integer bit is physically present in the temporary real format only.
If one were to examine only the significand with its assumed binary point,
all normalized real numbers would have values between 1 and 2. The exponent
field locates the actual binary point in the significant digits. Just as in
decimal scientific notation, a positive exponent has the effect of moving
the binary point to the right, and a negative exponent effectively moves the
binary point to the left, inserting leading zeros as necessary. An unbiased
exponent of zero indicates that the position of the assumed binary point is
also the position of the actual binary point. The exponent field, then,
determines a real number's magnitude.
In order to simplify comparing real numbers (e.g., for sorting), the 80287
stores exponents in a biased form. This means that a constant is added to
the true exponent described above. The value of this bias is different for
each real format (see figure 1-9). It has been chosen so as to force the
biased exponent to be a positive value. This allows two real numbers
(of the same format and sign) to be compared as if they are unsigned binary
integers. That is, when comparing them bitwise from left to right
(beginning with the leftmost exponent bit), the first bit position that
differs orders the numbers; there is no need to proceed further with the
comparison. A number's true exponent can be determined simply by subtracting
the bias value of its format.
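The effect of biasing can be demonstrated by sorting a few values once by numeric value and once by their raw bit patterns treated as unsigned integers. The Python sketch below assumes the host's 32-bit floating-point format matches the short real format; the helper name is hypothetical.

```python
import struct


def short_real_bits(x):
    """Bit pattern of a short real, read as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


values = [178.125, 0.5, 65536.0, 1.5, 1.0]
by_value = sorted(values)
by_bits = sorted(values, key=short_real_bits)
print(by_value == by_bits)
```

For positive values the two orderings agree. For two negative values the unsigned comparison orders them by magnitude, which is why the text restricts the comparison to numbers of the same sign.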
The short and long real formats exist in memory only. If a number in one of
these formats is loaded into an 80287 register, it is automatically
converted to temporary real, the format used for all internal operations.
Likewise, data in registers can be converted to short or long real for
storage in memory. The temporary real format may be used in memory also,
typically to store intermediate results that cannot be held in registers.
Most applications should use the long real form to store real number data
and results; it provides sufficient range and precision to return correct
results with a minimum of programmer attention. The short real format is
appropriate for applications that are constrained by memory, but it should
be recognized that this format provides a smaller margin of safety. It is
also useful for debugging algorithms, because roundoff problems will
manifest themselves more quickly in this format. The temporary real format
should normally be reserved for holding intermediate results, loop
accumulations, and constants. Its extra length is designed to shield final
results from the effects of rounding and overflow/underflow in intermediate
calculations. However, the range and precision of the long real form are
adequate for most microcomputer applications.
Rounding Control
Internally, the 80287 employs three extra bits (guard, round, and sticky
bits) that enable it to represent the infinitely precise true result of a
computation; these bits are not accessible to programmers. Whenever the
destination can represent the infinitely precise true result, the 80287
delivers it. Rounding occurs in arithmetic and store operations when the
format of the destination cannot exactly represent the infinitely precise
true result. For example, a real number may be rounded if it is stored in a
shorter real format, or in an integer format. Or, the infinitely precise
true result may be rounded when it is returned to a register.
The NPX has four rounding modes, selectable by the RC field in the control
word (see figure 1-5). Given a true result b that cannot be represented by
the target data type, the 80287 determines the two representable numbers a
and c that most closely bracket b in value (a < b < c). The processor then
rounds (changes) b to a or to c according to the mode selected by the RC
field as shown in table 1-6. Rounding introduces an error in a result that is
less than one unit in the last place to which the result is rounded. "Round
to nearest" is the default mode and is suitable for most applications; it
provides the most accurate and statistically unbiased estimate of the true
result. The chop mode is provided for integer arithmetic applications.
"Round up" and "round down" are termed directed rounding and can be used to
implement interval arithmetic. Interval arithmetic generates a certifiable
result independent of the occurrence of rounding and other errors. The upper
and lower bounds of an interval may be computed by executing an algorithm
twice, rounding up in one pass and down in the other.
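The four rounding actions can be sketched for the simple case of an integer destination, where the bracketing values a and c are the integers on either side of the true result b. The Python sketch below is a model only: the function name is hypothetical, exact rational arithmetic stands in for the infinitely precise true result, and the RC encodings are those listed in table 1-6.

```python
import math
from fractions import Fraction


def round_to_integer(b, rc):
    """Round the exact value b to an integer according to the RC field."""
    b = Fraction(b)
    a, c = math.floor(b), math.ceil(b)       # a <= b <= c
    if a == c:
        return a                              # b is already representable
    if rc == 0b00:                            # round to nearest, ties to even
        if b - a != c - b:
            return a if b - a < c - b else c
        return a if a % 2 == 0 else c
    if rc == 0b01:                            # round down (toward -infinity)
        return a
    if rc == 0b10:                            # round up (toward +infinity)
        return c
    if rc == 0b11:                            # chop (toward 0)
        return a if abs(a) < abs(c) else c
    raise ValueError("RC is a two-bit field")


for rc in (0b00, 0b01, 0b10, 0b11):
    print(rc, round_to_integer(Fraction(5, 2), rc),
          round_to_integer(Fraction(-5, 2), rc))
```

Running the directed modes 01 and 10 on the same value yields the lower and upper bounds that interval arithmetic relies on.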
Table 1-6. Rounding Modes

+----------+--------------------------+-----------------------------------------+
| RC Field | Rounding Mode            | Rounding Action                         |
+----------+--------------------------+-----------------------------------------+
|    00    | Round to nearest         | Closer to b of a or c; if equally       |
|          |                          | close, select even number (the one      |
|          |                          | whose least significant bit is zero).   |
|          |                          |                                         |
|    01    | Round down (toward -∞)   | a                                       |
|          |                          |                                         |
|    10    | Round up (toward +∞)     | c                                       |
|          |                          |                                         |
|    11    | Chop (toward 0)          | Smaller in magnitude of a or c          |
+----------+--------------------------+-----------------------------------------+

NOTE
a < b < c; a and c are representable, b is not.
Precision Control
The 80287 allows results to be calculated with 64, 53, or 24 bits of
precision in the significand as selected by the precision control (PC) field
of the control word. The default setting, and the one that is best suited
for most applications, is the full 64 bits of significance provided by the
temporary-real format. The other settings are required by the proposed IEEE
standard, and are provided to obtain compatibility with the specifications
of certain existing programming languages. Specifying less precision
nullifies the advantages of the temporary real format's extended fraction
length, and does not increase execution speed. When reduced precision is
specified, the rounding of the fractional value clears the unused bits on
the right to zeros.
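The effect of the three precision settings can be modelled with exact rational arithmetic. The Python sketch below rounds a value to a chosen number of significand bits using round-to-nearest-even; it is a model of the idea and does not reproduce 80287 internals. The function name is hypothetical and the exponent range is treated as unlimited.

```python
from fractions import Fraction


def round_significand(x, precision):
    """Round x to `precision` significand bits (nearest, ties to even)."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    m = abs(x)
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1                                   # now 2**e <= m < 2**(e + 1)
    unit = Fraction(2) ** (e - precision + 1)    # weight of the last kept bit
    return sign * round(m / unit) * unit         # Fraction rounds ties to even


tenth = Fraction(1, 10)
for precision in (24, 53, 64):
    error = abs(round_significand(tenth, precision) - tenth)
    print(precision, float(error))
```

Bits below the last kept position contribute nothing to the result, which corresponds to the unused low-order bits being cleared to zeros. The printed errors shrink as the precision setting increases.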
Infinity Control
The 80287's system of real numbers may be closed by either of two models of
infinity. These two means of closing the number system, projective and
affine closure, are illustrated schematically in figure 1-10. The setting
of the IC field in the control word selects one model or the other. The
default means of closure is projective, and this is recommended for most
computations. When projective closure is selected, the NPX treats the
special values +∞ and -∞ as a single unsigned infinity (similar to its
treatment of signed zeros). In the affine mode the NPX respects the signs
of +∞ and -∞.
While affine mode may provide more information than projective, there are
occasions when the sign may in fact represent misinformation. For example,
consider an algorithm that yields an intermediate result x of +0 and -0 (the
same numeric value) in different executions. If 1/x were then computed in
affine mode, two entirely different values (+∞ and -∞) would result from
numerically identical values of x. Projective mode, on the other hand,
provides less information but never returns misinformation. In general,
then, projective mode should be used globally, with affine mode reserved
for local computations where the programmer can take advantage of the sign
and knows for certain that the nature of the computations will not produce a
misleading result.
Special Computational Situations
Besides being able to represent positive and negative numbers, the 80287
data formats may be used to describe other entities. These special values
provide extra flexibility, but most users will not need to understand them
in order to use the 80287 successfully. This section describes the special
values that may occur in certain cases and the significance of each. The
80287 exceptions are also described, for writers of exception handlers and
for those interested in probing the limits of computation using the 80287.
The material presented in this section is mainly of interest to programmers
concerned with writing exception handlers. For many readers, this section
can be browsed lightly.
Special Numeric Values
The 80287 data formats encompass encodings for a variety of special values
in addition to the typical real or integer data values that result from
normal calculations. These special values have significance and can express
relevant information about the computations or operations that produced
them. The various types of special values are
- Nonnormal real numbers, including
    denormals
    unnormals
- Zeros and pseudo zeros
- Positive and negative infinity
- NaN (Not-a-Number)
- Indefinite
The following description explains the origins and significance of each of
these special values. Tables 1-12 through 1-15 at the end of this
section show how each of these special values is encoded for each of the
numeric data types.
Nonnormal Real Numbers
As described previously, the 80287 generally stores nonzero real numbers in
normalized floating-point form; that is, the integer (leading) bit of the
significand is always a 1. This bit is explicitly stored in the temporary
real format, and is implicitly assumed to be a one (1{^}) in the short- and
long-real formats. Since leading zeros are eliminated, normalized storage
allows the maximum number of significant digits to be held in a significand
of a given width.
When a floating-point numeric value becomes very close to zero, normalized
storage cannot be used to express the value accurately. To accommodate these
instances, the 80287 can store and operate on reals that are not normalized,
i.e., whose significands contain one or more leading zeros. Nonnormals
typically arise when the result of a calculation yields a value that is too
small to be represented in normal form.
Nonnormal values can exist in one of two forms:
- The floating-point exponent may be stored at its most negative value
  (a Denormal),
- The integer bit (and perhaps other leading bits) of the significand
  may be zero (an Unnormal).
The leading zeros of nonnormals permit smaller numbers to be represented,
at the cost of some lost precision (the number of significant bits is
reduced by the leading zeros). In typical algorithms, extremely small values
are most likely to be generated as intermediate, rather than final results.
By using the NPX's temporary real format for holding intermediates, values as
small as ±3.4*10^(-4932) can be represented; this makes the occurrence of
nonnormal numbers a rare phenomenon in 80287 applications. Nevertheless, the
NPX can load, store, and operate on nonnormalized real numbers when they do
occur.
Denormals and Gradual Underflow
A denormal is the result of the NPX's response to an underflow exception
when that exception has been masked by the programmer (see the 80287 control
word, figure 1-5). Underflow occurs when the absolute value of a real
number becomes too small to be represented in the destination format, that
is, when the exponent of the true result is too negative to be represented
in the destination format. For example, a true exponent of -130 will cause
underflow if the destination is short real, because -126 is the smallest
exponent this format can accommodate. No underflow would occur if the
destination were long real or temporary real, since these formats can handle
exponents down to -1022 and -16,382, respectively.
Most computers underflow "abruptly:" they simply return a zero result,
which is likely to produce an unacceptable final result if computation
continues. The 80287, on the other hand, underflows "gradually" when the
underflow exception is masked. Gradual underflow is accomplished by
denormalizing the result until it is just within the exponent range of the
destination format. Denormalizing means incrementing the true result's
exponent and inserting a corresponding leading zero in the significand,
shifting the rest of the significand one place to the right. Denormal
values may occur in any of the short-real, long-real, or temporary-real
formats. Table 1-7 illustrates how a result might be denormalized to fit a
short-real destination.
The intent of the 80287's masked response to underflow is to allow
computation to continue without program intervention, while introducing an
error that carries about the same risk of contaminating the final result as
roundoff error. Roundoff (precision) errors occur frequently in real number
calculations; sometimes they spoil the result of computation, but often they
do not. Recognizing that roundoff errors are often nonfatal, computation
usually proceeds, and the programmer inspects the final results to see if
these errors have had a significant effect. The 80287's masked underflow
response allows programmers to treat underflows in a similar manner; the
computation continues and the programmer can examine the final result to
determine if an underflow has had important consequences. (If the underflow
has had a significant effect, an invalid operation will probably be
signalled later in the computation.)
Denormalization produces a denormal or a zero. Denormals are readily
identified by their exponents, which are always the minimum for their
formats; in biased form, this is always the bit string: 00...00. This same
exponent value is also assigned to the zeros, but a denormal has a nonzero
significand. A denormal in a register is tagged special. Tables 1-14 and
1-15 later in this chapter show how denormal values are encoded in each of
the real data formats.
The denormalization process may cause the loss of low-order significand
bits as they are shifted off the right. In a severe case, all the
significand bits of the true result are shifted out and replaced by the
leading zeros. In this case, the result of denormalization is a true zero,
and if the value is in a register, it is tagged as such. However, this is a
comparatively rare occurrence and, in any case, is no worse than "abrupt"
underflow.
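Gradual underflow into the short real format can be observed on a host whose 32-bit floating-point format matches it. The Python sketch below stores several small powers of two as short reals and prints the resulting exponent and fraction fields. It assumes the host converts with round-to-nearest and produces denormals rather than flushing to zero; the function name is hypothetical.

```python
import struct


def short_real_image(x):
    """Return (biased exponent, fraction) of x stored as a short real."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return (raw >> 23) & 0xFF, raw & 0x7FFFFF


for true_exponent in (-126, -130, -149, -151):
    exponent, fraction = short_real_image(2.0 ** true_exponent)
    print(true_exponent, exponent, f"{fraction:023b}")
```

The value 2^(-126) is the smallest that is still stored in normalized form (biased exponent 1). The value 2^(-130) is stored with a biased exponent of 0 and leading zeros in the fraction, which is the denormal case; 2^(-149) keeps only the last fraction bit; and 2^(-151) loses every significant bit and is stored as a true zero.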
Denormals are rarely encountered in most applications. Typical debugged
algorithms generate extremely small results during the evaluation of
intermediate subexpressions; the final result is usually of an appropriate
magnitude for its short or long real destination. If intermediate results
are held in temporary real, as is recommended, the great range of this
format makes underflow very unlikely. Denormals are likely to arise only
when an application generates a great many intermediates, so many that they
cannot be held on the register stack or in temporary real memory variables.
If storage limitations force the use of short or long reals for
intermediates, and small values are produced, underflow may occur, and, if
masked, may generate denormals.
Accessing a denormal may produce an exception as shown in table 1-8. (The
denormalized exception signals that a denormal has been fetched.) Denormals
may have reduced significance due to lost low-order bits, and an option of
the proposed IEEE standard precludes operations on nonnormalized operands.
This option may be implemented in the form of an exception handler that
responds to unmasked denormalized exceptions. Most users will mask this
exception so that computation may proceed; any loss of accuracy will be
analyzed by the user when the final result is delivered.
As table 1-8 shows, the division and remainder operations do not accept
denormal divisors and raise the invalid operation exception. Recall also
that the transcendental instructions require normalized operands and do not
check for exceptions. In all other cases, the NPX converts denormals to
unnormals, and the rules governing unnormal arithmetic then apply
(unnormals are described in the following section).

Operation                      Exception  Masked Response
-----------------------------  ---------  ----------------------------
FLD (short/long real)          D          Load as equivalent unnormal

Arithmetic (except following)  D          Convert (in a work area)
                                          denormal to equivalent
                                          unnormal and proceed

Compare and test               D          Convert (in a work area)
                                          denormal to equivalent
                                          unnormal and proceed

Division or FPREM with         I          Return real indefinite
denormal divisor
Unnormals: Descendants of Denormal Operands
An unnormal is the result of a computation using denormal operands and is
therefore the descendant of the 80287's masked underflow response. An
unnormal may exist only in the temporary real format; it may have any
exponent that a normal value may have (that is, in biased form any nonzero
value), but it is distinguished from a normal by the integer bit of its
significand, which is always 0. An unnormal in a register is tagged valid.
Unnormals are distinct from denormals, which have an exponent of 00...00 in
biased form.
Unnormals allow arithmetic to continue following an underflow while still
retaining their identity as numbers that may have reduced significance. That
is, unnormal operands generate unnormal results, so long as their
unnormality has a significant effect on the result. Unnormals are thus
prevented from "masquerading" as normals, numbers that have full
significance. On the other hand, if an unnormal has an insignificant effect
on a calculation with a normal, the result will be normal. For example,
adding a small unnormal to a large normal yields a normal result. The
converse situation yields an unnormal.
Table 1-9 shows how the instruction set deals with unnormal operands. Note
that the unnormal may be the original operand or a temporary created by the
80287 from a denormal.
Table 1-9. Unnormal Operands and Results

Operation                            Result
-----------------------------------  ----------------------------------------
Addition/subtraction                 Normalization of operand with larger
                                     absolute value determines normalization
                                     of result.
Multiplication                       If either operand is unnormal, result
                                     is unnormal.
Division (unnormal dividend only)    Result is unnormal.
FPREM (unnormal dividend only)       Result is normalized.
Division/FPREM (unnormal divisor)    Signal invalid operation.
Compare/FTST                         Normalize as much as possible before
                                     making comparison.
FRNDINT                              Normalize as much as possible before
                                     rounding.
FSQRT                                Signal invalid operation.
FST, FSTP (short/long real           If value is above destination's underflow
destination)                         boundary, then signal invalid operation;
                                     else signal underflow.
FSTP (temporary real destination)    Store as usual.
FIST, FISTP, FBSTP                   Signal invalid operation.
FLD                                  Load as usual.
FXCH                                 Exchange as usual.
Transcendental instructions          Undefined; operands must be normal and
                                     are not checked.
Zeros and Pseudo Zeros
The value zero in the real and decimal integer formats may be signed either
positive or negative, although the sign of a binary integer zero is always
positive. For computational purposes, the value of zero always behaves
identically, regardless of sign, and typically the fact that a zero may be
signed is transparent to the programmer. If necessary, the FXAM instruction
may be used to determine a zero's sign.
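The transparency of a zero's sign can be illustrated outside the NPX. The following Python sketch is illustrative only; it assumes IEEE double-precision floats, and math.copysign stands in for the sign report that FXAM provides on the 80287. The helper name zero_sign is hypothetical.

```python
import math

def zero_sign(z):
    """Report the sign of a zero, which an equality test cannot reveal."""
    return "-" if math.copysign(1.0, z) < 0 else "+"

pos, neg = 0.0, -0.0

print(pos == neg)                       # the two zeros compare equal
print(zero_sign(pos), zero_sign(neg))   # the sign is still recoverable
print(5.0 + pos == 5.0 + neg)           # and does not affect ordinary sums
```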
The zeros discussed above are called true zeros; if one of them is loaded
or generated in a register, the register is tagged zero. Table 1-10 lists
the results of instructions executed with zero operands and also shows how a
true zero may be created from nonzero operands.
Only the temporary real format may contain a special class of values called
pseudo zeros. A pseudo zero is an unnormal whose significand is all zeros,
but whose (biased) exponent is nonzero (true zeros have a zero exponent).
Neither is a pseudo zero's exponent all ones, since this encoding is
reserved for infinities and NaNs. A pseudo zero result will be produced if
two unnormals, containing a total of more than 64 leading zero bits in their
significands, are multiplied together. This is a remote possibility in most
applications, but it can happen.
Pseudo zero operands behave like unnormals, except in the following cases
where they produce the same results as true zeros:
- Compare and test instructions
- FRNDINT (round to integer)
- Division, where the dividend is either a true zero or a pseudo zero
  (the divisor is a pseudo zero)
In addition and subtraction of a pseudo zero and a true zero or another
pseudo zero, the pseudo zero(s) behave like unnormals, except for the
determination of the result's sign. The sign is determined as shown in table
1-10 for two true zero operands.
Infinity
The real formats support signed representations of infinities. These values
are encoded with a biased exponent of all ones and a significand of
1.00...00; if the infinity is in a register, it is tagged special. The
significand distinguishes infinities from NaNs, including real indefinite.
A programmer may code an infinity, or it may be created by the NPX as its
masked response to an overflow or a zero divide exception. Note that when
rounding is up or down, the masked response may create the largest valid
value representable in the destination rather than infinity. See table 1-11
for details. As operands, infinities behave somewhat differently depending
on how the infinity control field in the control word is set (see table
1-12). When the projective model of infinity is selected, the infinities
behave as a single unsigned representation; because of this, infinity
cannot be compared with any value except infinity. In affine mode, the signs
of the infinities are observed, and comparisons are possible.
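The difference between the two infinity models in comparisons can be sketched as follows. This Python fragment is illustrative only: the function name compare and its affine parameter are hypothetical, NaN operands are not handled, and the sketch models only the comparison outcome described above, not the invalid operation exception that accompanies a "not comparable" result.

```python
import math

def compare(a, b, affine):
    """Return '<', '=', '>' or '?' (not comparable) for two non-NaN values."""
    if not affine:
        a_inf, b_inf = math.isinf(a), math.isinf(b)
        if a_inf and b_inf:
            return "="      # projective: one unsigned infinity, equal to itself
        if a_inf or b_inf:
            return "?"      # projective: infinity versus any other value
    if a < b:
        return "<"
    return "=" if a == b else ">"

print(compare(-math.inf, math.inf, affine=False))   # signs are ignored
print(compare(-math.inf, math.inf, affine=True))    # signs are observed
print(compare(math.inf, 1.0, affine=False))         # not comparable
```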
Table 1-10. Zero Operands and Results

Operation/Operands               Result
-------------------------------  --------------------
  -0                             -0
  +X                             +0
  -X                             -0
FBSTP
  +0                             +0
  -0                             -0
FIST, FISTP
  +0                             +0
  -0                             +0
  +X                             +0
  -X                             +0
Addition
  +0 plus +0                     +0
  -0 plus -0                     -0
  +0 plus -0, -0 plus +0         *0
  -X plus +X, +X plus -X         *0
  ±0 plus ±X, ±X plus ±0         ±X
Subtraction
  +0 minus -0                    +0
  -0 minus +0                    -0
  +0 minus +0, -0 minus -0       *0
  +X minus +X, -X minus -X       *0
  ±0 minus ±X, ±X minus ±0       ±X
Multiplication
  +0 * +0, -0 * -0               +0
  +0 * -0, -0 * +0               -0
  +0 * +X, +X * +0               +0
  +0 * -X, -X * +0               -0
  -0 * +X, +X * -0               -0
  -0 * -X, -X * -0               +0
  +X * +Y, -X * -Y               +0, underflow
  +X * -Y, -X * +Y               -0, underflow
FPREM
  ±0 rem ±0                      Invalid operation
  ±X rem ±0                      Invalid operation
  +0 rem +X, +0 rem -X           +0
  -0 rem +X, -0 rem -X           -0
  +X rem +Y, +X rem -Y           +0
  -X rem -Y, -X rem +Y           -0
FSQRT
  -0                             -0
  +0                             +0
Compare
  ±0: +X                         A < B
  ±0: ±0                         A = B
  ±0: -X                         A > B
FTST
  ±0                             Zero
FCHS
  +0                             -0
  -0                             +0
FABS
  ±0                             +0
F2XM1
  +0                             +0
  -0                             -0
FRNDINT
  +0                             +0
  -0                             -0
FXTRACT
  +0                             Both +0
  -0                             Both -0
NaN (Not a Number)
A NaN (Not a Number) is a member of a class of special values that exist in
the real formats only. A NaN has an exponent of 11..11B, may have either
sign, and may have any significand except 1.00..00B, which is assigned to
the infinities. A NaN in a register is tagged special.
The 80287 will generate the special NaN, real indefinite, as its masked
response to an invalid operation exception. This NaN is signed negative; its
significand is encoded 1.100..00. All other NaNs represent
programmer-created values.
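The distinctions drawn in this chapter among temporary real encodings can be summarized as a decision procedure. The following Python sketch is illustrative only: the function name and the choice to pass the sign, 15-bit biased exponent, and 64-bit significand as separate integers are assumptions, and the sketch follows the descriptions in the text rather than modeling the NPX tag word.

```python
INTEGER_BIT = 1 << 63          # explicit integer bit of the 64-bit significand
EXPONENT_ALL_ONES = 0x7FFF     # 15-bit biased exponent, all ones

def classify_temp_real(sign, exponent, significand):
    """Name the class of a temporary real value from its three fields."""
    if exponent == EXPONENT_ALL_ONES:
        if significand == INTEGER_BIT:
            return "infinity"
        if sign == 1 and significand == (INTEGER_BIT | (1 << 62)):
            return "real indefinite"
        return "NaN"
    if exponent == 0:
        return "true zero" if significand == 0 else "denormal"
    if significand & INTEGER_BIT:
        return "normal"
    # Nonzero, non-maximum exponent with the integer bit clear.
    return "pseudo zero" if significand == 0 else "unnormal"

print(classify_temp_real(0, 0x3FFF, 1 << 63))   # the value 1.0
print(classify_temp_real(0, 0x3FFF, 1 << 62))   # integer bit clear
print(classify_temp_real(1, 0x7FFF, 0xC << 60)) # negative, 1.100..00
```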
Whenever the NPX uses an operand that is a NaN, it signals an invalid
operation exception in its status word. If this exception is masked in the
80287 control word, the 80287's masked exception response is to return the
NaN as the operation result. If both operands of an instruction are NaNs,
the result is the NaN with the larger absolute value. In this way, a NaN
that enters a computation propagates through the computation and will
eventually be delivered as the final result. Note, however, that the
transcendental instructions do not check their operands, and a NaN will
produce an undefined result.
By unmasking the invalid operation exception, the programmer can use NaNs
to trap to the exception handler. The generality of this approach and the
large number of NaN values that are available provide the sophisticated
programmer with a tool that can be applied to a variety of special
situations.
For example, a compiler could use NaNs as references to uninitialized
(real) array elements. The compiler could preinitialize each array element
with a NaN whose significand contained the index (relative position) of the
element. If an application program attempted to access an element that it
had not initialized, it would use the NaN placed there by the compiler. If
the invalid operation exception were unmasked, an interrupt would occur, and
the exception handler would be invoked. The exception handler could
determine which element had been accessed, since the operand address field
of the exception pointers would point to the NaN, and the NaN would contain
the index number of the array element.
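The array-index scheme can be sketched in the long real format. The following Python fragment is illustrative only and does not run on the NPX: it assumes IEEE double-precision floats, the names make_index_nan and index_of are hypothetical, and it sets the most significant fraction bit so that current processors treat the value as a quiet NaN and leave its remaining bits undisturbed (an assumption about modern hardware, not an 80287 requirement). It shows only the encoding and decoding of the index, not the trap itself.

```python
import math
import struct

EXPONENT_ALL_ONES = 0x7FF << 52   # long real exponent field, all ones
QUIET_BIT = 1 << 51               # top fraction bit (assumed for modern CPUs)

def make_index_nan(index):
    """Build a long real NaN whose low significand bits hold an array index."""
    if not 0 <= index < QUIET_BIT:
        raise ValueError("index does not fit in the significand")
    bits = EXPONENT_ALL_ONES | QUIET_BIT | index
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def index_of(nan):
    """Recover the index stored in a NaN created by make_index_nan."""
    bits = struct.unpack("<Q", struct.pack("<d", nan))[0]
    return bits & (QUIET_BIT - 1)

# Preinitialize a four-element array, as the hypothetical compiler would.
array = [make_index_nan(i) for i in range(4)]
print(math.isnan(array[2]), index_of(array[2]))
```

An exception handler in this scheme would apply the equivalent of index_of to the operand addressed by the exception pointers.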
NaNs could also be used to speed up debugging. In its early testing phase,
a program often contains multiple errors. An exception handler could be
written to save diagnostic information in memory whenever it was invoked.
After storing the diagnostic data, it could supply a NaN as the result of
the erroneous instruction, and that NaN could point to its associated
diagnostic area in memory. The program would then continue, creating a
different NaN for each error. When the program ended, the NaN results could
be used to access the diagnostic data saved at the time the errors
occurred. Many errors could thus be diagnosed and corrected in one test run.
Table 1-11. Masked Overflow Response with Directed Rounding

True Result
Normalization  Sign  Rounding Mode  Result Delivered
-------------  ----  -------------  --------------------------------------
Normal         +     Up             +∞
Normal         +     Down           Largest finite positive number
Normal         -     Up             Largest finite negative number
Normal         -     Down           -∞
Unnormal       +     Up             +∞
Unnormal       +     Down           Largest exponent, result's significand
Unnormal       -     Up             Largest exponent, result's significand
Unnormal       -     Down           -∞
Table 1-12. Infinity Operands and Results

Key to symbols used in this table

Operation               Projective Result               Affine Result
----------------------  ------------------------------  -----------------
Addition
  +∞ plus +∞            Invalid operation               +∞
  -∞ plus -∞            Invalid operation               -∞
  +∞ plus -∞            Invalid operation               Invalid operation
  -∞ plus +∞            Invalid operation               Invalid operation
  ±∞ plus ±X            *∞                              *∞
  ±X plus ±∞            *∞                              *∞
Subtraction
  +∞ minus -∞           Invalid operation               +∞
  -∞ minus +∞           Invalid operation               -∞
  +∞ minus +∞           Invalid operation               Invalid operation
  -∞ minus -∞           Invalid operation               Invalid operation
  ±∞ minus ±X           *∞                              *∞
  ±X minus ±∞           ∓∞                              ∓∞
FSQRT
  -∞                    Invalid operation               Invalid operation
  +∞                    Invalid operation               +∞
FPREM
  ±∞ rem ±∞             Invalid operation               Invalid operation
  ±∞ rem ±X             Invalid operation               Invalid operation
  ±Y rem ±∞             *Y                              *Y
  ±0 rem ±∞             *0                              *0
FRNDINT
  ±∞                    *∞                              *∞
FSCALE
  ±∞ scaled by ±∞       Invalid operation               Invalid operation
  ±∞ scaled by ±X       *∞                              *∞
  ±0 scaled by ±∞       *0                              *0
  ±Y scaled by ∞        Invalid operation               Invalid operation
FXTRACT
  ±∞                    Invalid operation               Invalid operation
Compare
  ±∞: ±∞                A = B                           -∞ < +∞
  ±∞: ±Y                A ? B (and) invalid operation   -∞ < Y < +∞
  ±∞: ±0                A ? B (and) invalid operation   -∞ < 0 < +∞
FTST
  ±∞                    A ? B (and) invalid operation   *∞
Indefinite
For every 80287 numeric data type, one unique encoding is reserved for
representing the special value indefinite. The 80287 produces this encoding
as its response to a masked invalid-operation exception. In the case of
reals, the indefinite value can be stored and loaded like any NaN, and it
always retains its special identity; programmers are advised not to use this
encoding for any other purpose. Packed decimal indefinite may be stored by
the NPX in an FBSTP instruction; attempting to use this encoding in an FBLD
instruction, however, will have an undefined result. In the binary
integers, the same encoding may represent either indefinite or the largest
negative number supported by the format (-2^(15), -2^(31), or -2^(63)). The
80287 will store this encoding as its masked response to an invalid
operation, or when the value in a source register represents or rounds to
the largest negative integer representable by the destination. In situations
where its origin may be ambiguous, the invalid operation exception flag can
be examined to see if the value was produced by an exception response. When
this encoding is loaded, or used by an integer arithmetic or compare
operation, it is always interpreted as a negative number; thus indefinite
cannot be loaded from a packed decimal or binary integer.
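The ambiguity of the binary integer encoding can be shown directly. The following Python sketch is illustrative only; the function name integer_indefinite is hypothetical. It builds the encoding with only the sign bit set for each integer width and shows that, read back as a two's complement integer, it is the largest negative number of the format.

```python
import struct

def integer_indefinite(width):
    """Encoding shared by integer indefinite and the most negative integer."""
    return 1 << (width - 1)     # sign bit set, all other bits clear

for width, fmt in ((16, "<h"), (32, "<i"), (64, "<q")):
    raw = integer_indefinite(width).to_bytes(width // 8, "little")
    value = struct.unpack(fmt, raw)[0]
    print(width, raw.hex(), value == -2 ** (width - 1))
```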
Encoding of Data Types
Tables 1-13 through 1-16 show how each of the special values just
described is encoded for each of the numeric data types. In these tables,
the least-significant bits are shown to the right and are stored in the
lowest memory addresses. The sign bit is always the left-most bit of the
highest-addressed byte.
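The storage order described here can be checked with a long real value. The following Python sketch is illustrative only; it assumes the IEEE double-precision layout and uses the struct module's explicit little-endian packing to stand in for memory as the 80286 would see it.

```python
import struct

raw = struct.pack("<d", -1.0)   # long real, least-significant byte first
print(raw.hex())                # bytes in order of increasing address

sign_bit = raw[-1] >> 7         # left-most bit of the highest-addressed byte
print(sign_bit)
```

For -1.0 the low-order significand bytes at the lowest addresses are all zero, and the sign bit appears in the last (highest-addressed) byte.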
Numeric Exceptions
Whenever the 80287 NPX attempts a numeric operation with invalid operands
or produces a result that cannot be represented, the 80287 recognizes a
numeric exception condition. Altogether, the 80287 checks for the following
six classes of exceptions while executing numeric instructions:
Invalid Operation
The 80287 reports an invalid operation if any of the following occurs:
- An attempt to load a register that is not empty (stack overflow).
- An attempt to pop an operand from an empty register (stack underflow).
- An operand is a NaN.
- The operands cause the operation to be indeterminate (square root of a
  negative number, 0/0).
An invalid operation generally indicates a program error.
Zero Divisor
If an instruction attempts to divide a finite nonzero operand by zero, the
80287 will report a zero divide exception.
Denormalized Operand
If an instruction attempts to operate on a denormal, the NPX reports the
denormalized operand exception. This exception allows users to implement in
software an option of the proposed IEEE standard specifying that operands
must be prenormalized before they are used.
Numeric Overflow and Underflow
If the exponent of a numeric result is too large for the destination real
format, the 80287 signals a numeric overflow. Conversely, if the exponent of
a result is too small to be represented in the destination format, a numeric
underflow is signaled. If either of these exceptions occurs, the result of
the operation is outside the range of the destination real format.
Typical algorithms are most likely to produce extremely large and small
numbers in the calculation of intermediate, rather than final, results.
Because of the great range of the temporary real format (recommended as the
destination format for intermediates), overflow and underflow are
relatively rare events in most 80287 applications.
Inexact Result
If the result of an operation is not exactly representable in the
destination format, the 80287 rounds the number and reports the precision
exception. For example, the fraction 1/3 cannot be precisely represented in
binary form. This exception occurs frequently and indicates that some
(generally acceptable) accuracy has been lost; it is provided for
applications that need to perform exact arithmetic only.
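The size of the rounding error for 1/3 can be computed exactly. The following Python sketch is illustrative only; it assumes IEEE double-precision floats (the long real format) with round-to-nearest, and uses the standard fractions module to compare the stored value with the true quotient.

```python
from fractions import Fraction

third = 1.0 / 3.0                           # rounded to the long real format
error = Fraction(1, 3) - Fraction(third)    # exact difference from 1/3

print(error != 0)                           # the stored result is inexact
print(error == Fraction(1, 3 * 2 ** 54))    # and the error is very small
print(Fraction(0.5) == Fraction(1, 2))      # 1/2, by contrast, is exact
```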
Handling Numeric Errors
When numeric errors occur, the NPX takes one of two possible courses of
action:
- The NPX can itself handle the error, producing the most reasonable
  result and allowing numeric program execution to continue undisturbed.
- A software exception handler can be invoked by the CPU to handle the
  error.
Each of the six exception conditions described above has a corresponding
flag bit in the 80287 status word and a mask bit in the 80287 control word.
If an exception is masked (the corresponding mask bit in the control
word = 1), the 80287 takes an appropriate default action and continues with
the computation. If the exception is unmasked (mask = 0), the 80287 asserts
the ERROR output to the 80286 to signal the exception and invoke a
software exception handler.
The NPX reports an exception by setting the corresponding flag in the NPX
status word to 1. The NPX then checks the corresponding exception mask in
the control word to determine if it should "field" the exception (mask = 1),
or if it should signal the exception to the CPU to invoke a software
exception handler (mask = 0).
If the mask is set, the exception is said to be masked (from user
software), and the NPX executes its on-chip masked response for that
exception. If the mask is not set (mask = 0), the exception is unmasked,
and the NPX performs its unmasked response. The masked response always
produces a standard result, then proceeds with the instruction. The unmasked
response always traps to a software exception handler, allowing the CPU to
recognize and take action on the exception. Table 1-17 gives a complete
description of all exception conditions and the NPX's masked response.
Note that when exceptions are masked, the NPX may detect multiple
exceptions in a single instruction, because it continues executing the
instruction after performing its masked response. For example, the 80287
could detect a denormalized operand, perform its masked response to this
exception, and then detect an underflow.
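The interaction of flags and masks can be modeled in a few lines. The following Python sketch is illustrative only: the class NpxModel and its method names are hypothetical, and the bit positions (invalid operation in bit 0 through precision in bit 5) are taken from the standard status and control word layouts rather than from this section. It does not model the masked responses themselves.

```python
# Bit positions shared by the exception flags (status word) and the
# exception masks (control word); assumed layout, bits 0 through 5.
INVALID, DENORMAL, ZERODIVIDE, OVERFLOW, UNDERFLOW, PRECISION = (
    1 << n for n in range(6)
)

class NpxModel:
    """Toy model of exception reporting: sticky flags plus mask lookup."""

    def __init__(self, control_word=0x3F):
        self.control = control_word   # 0x3F: all six exceptions masked
        self.status = 0               # no exception flags set

    def report(self, exception):
        """Set the flag, then choose the response from the mask bit."""
        self.status |= exception
        return "masked" if self.control & exception else "unmasked"

# Mask everything except invalid operation, then report three exceptions.
npx = NpxModel(control_word=0x3F & ~INVALID)
print(npx.report(DENORMAL), npx.report(UNDERFLOW), npx.report(INVALID))
print(hex(npx.status))
```

The first two reports correspond to the example above, in which a single instruction raises a denormalized operand exception and then an underflow; both flags remain set in the status word.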
Table 1-17. Exception Conditions and Masked Responses

Condition                             Masked Response
------------------------------------  ------------------------------------

Invalid Operation

Source register is tagged empty       Return real indefinite.
(usually due to stack underflow).

Destination register is not tagged    Return real indefinite
empty (usually due to stack           (overwrite destination value).
overflow).

One or both operands is a NaN.        Return NaN with larger absolute
                                      value (ignore signs).

(Compare and test operations only):   Set condition codes "not
one or both operands is a NaN.        comparable."

(Addition operations only): closure   Return real indefinite.
is affine and operands are
opposite-signed infinities; or
closure is projective and both
operands are ∞ (signs immaterial).

(Subtraction operations only):        Return real indefinite.
closure is affine and operands are
like-signed infinities; or closure
is projective and both operands are
∞ (signs immaterial).

(Multiplication operations only):     Return real indefinite.
∞ * 0; or 0 * ∞.

(Division operations only):           Return real indefinite.
∞ ÷ ∞; or 0 ÷ 0; or 0 ÷ pseudo
zero; or divisor is denormal
or unnormal.

(FPREM instruction only): modulus     Return real indefinite, set
(divisor) is unnormal or denormal;    condition code = "complete
or dividend is ∞.                     remainder."

(FSQRT instruction only): operand     Return real indefinite.
is nonzero and negative; or operand
is denormal or unnormal; or closure
is affine and operand is -∞; or
closure is projective and operand
is ∞.

(Compare operations only): closure    Set condition code = "not
is projective and ∞ is being          comparable."
compared with 0, a normal or ∞.

(FTST instruction only): closure is   Set condition code = "not
projective and operand is ∞.          comparable."

(FIST, FISTP instructions only):      Store integer indefinite.
source register is empty, a NaN,
denormal, unnormal, ∞, or exceeds
representable range of destination.

(FBSTP instruction only): source      Store packed decimal
register is empty, a NaN, denormal,   indefinite.
unnormal, ∞, or exceeds 18 decimal
digits.

(FST, FSTP instructions only):        Store real indefinite.
destination is short or long real
and source register is an unnormal
with exponent in range.

(FXCH instruction only): one or       Change empty register(s) to
both registers is tagged empty.       real indefinite and then
                                      perform exchange.

Denormalized Operand

(FLD instruction only): source        No special action; load as usual.
operand is denormal.

(Arithmetic operations only): one     Convert (in a work area) the
or both operands is denormal.         operand to the equivalent unnormal
                                      and proceed.

(Compare and test operations only):   Convert (in a work area) any
one or both operands is denormal      denormal to the equivalent
or unnormal (other than pseudo        unnormal; normalize as much as
zero).                                possible, and proceed with
                                      operation.

Zero Divide

(Division operations only):           Return ∞ signed with "exclusive or"
divisor = 0.                          of operand signs.

Overflow

(Arithmetic operations only):         Return properly signed ∞ and signal
rounding is nearest or chop, and      precision exception.
exponent of true result > 16,383.

(FST, FSTP instructions only):        Return properly signed ∞ and signal
rounding is nearest or chop, and      precision exception.
exponent of true result > +127
(short real destination) or > +1023
(long real destination).

Underflow

(Arithmetic operations only):         Denormalize until exponent rises to
exponent of true result < -16,382     -16,382 (true), round significand
(true).                               to 64 bits. If denormalized rounded
                                      significand = 0, then return true
                                      0; else, return denormal (tag =
                                      special, biased exponent = 0).

(FST, FSTP instructions only):        Denormalize until exponent rises to
destination is short real and         -126 (true), round significand to
exponent of true result < -126        24 bits, store true 0 if
(true).                               denormalized rounded significand = 0;
                                      else, store denormal (biased
                                      exponent = 0).

(FST, FSTP instructions only):        Denormalize until exponent rises to
destination is long real and          -1022 (true), round significand to
exponent of true result < -1022       53 bits, store true 0 if rounded
(true).                               denormalized significand = 0; else,
                                      store denormal (biased exponent = 0).

Precision

True rounding error occurs.           No special action.

Masked response to overflow           No special action.
exception earlier in instruction.
Automatic Exception Handling
As described in the previous section, when the 80287 NPX encounters an
exception condition whose corresponding mask bit in the NPX control word is
set, the NPX automatically performs an internal fix-up (masked-exception)
response. The 80287 NPX has a default fix-up activity for every possible
exception condition it may encounter. These masked-exception responses are
designed to be safe and are generally acceptable for most numeric
applications.
As an example of how even severe exceptions can be handled safely and
automatically using the NPX's default exception responses, consider a
calculation of the parallel resistance of several values using only the
standard formula (figure 1-11). If R1 becomes zero, the circuit
resistance becomes zero. With the divide-by-zero and precision exceptions
masked, the 80287 NPX will produce the correct result.
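The parallel resistance example can be traced step by step. The following Python sketch is illustrative only: it assumes the standard formula is R = 1 / (1/R1 + 1/R2 + ...), since figure 1-11 is not reproduced here, and the helper masked_divide is a hypothetical stand-in for the NPX's masked zero divide response, because Python itself raises an error on division by zero.

```python
import math

def masked_divide(a, b):
    """Divide, returning a signed infinity for a nonzero value over zero."""
    if b == 0.0 and a != 0.0 and not math.isnan(a):
        # Masked zero divide: infinity signed with the XOR of operand signs.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b

def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    total = sum(masked_divide(1.0, r) for r in resistances)
    return masked_divide(1.0, total)

print(parallel(0.0, 100.0, 220.0))   # 1/0 gives infinity; 1/infinity gives 0
```

The infinite conductance produced by the zero resistance dominates the sum, and its reciprocal is zero, which is the physically correct circuit resistance.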
By masking or unmasking specific numeric exceptions in the NPX control
word, NPX programmers can delegate responsibility for most exceptions to the
NPX, reserving the most severe exceptions for programmed exception handlers.
Exception-handling software is often difficult to write, and the NPX's
masked responses have been tailored to deliver the most reasonable result
for each condition. For the majority of applications, programmers will find
that masking all exceptions other than Invalid Operation will yield
satisfactory results with the least programming effort. An Invalid
Operation exception normally indicates a fatal error in a program that must
be corrected; this exception should not normally be masked.
The exception flags in the NPX status word provide a cumulative record of
exceptions that have occurred since these flags were last cleared. Once set,
these flags can be cleared only by executing the FCLEX (clear exceptions)
instruction, by reinitializing the NPX, or by overwriting the flags with an
FRSTOR or FLDENV instruction. This allows a programmer to mask all
exceptions (except invalid operation), run a calculation, and then inspect
the status word to see if any exceptions were detected at any point in the
calculation.
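The mask-then-inspect technique has a close analogue in Python's standard decimal module, whose arithmetic context also keeps sticky exception flags and per-exception trap enables. The following sketch is illustrative only and is an analogy rather than a model of the NPX: a disabled trap plays the role of a masked exception, and clear_flags plays the role of FCLEX.

```python
import decimal

with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = False   # "mask" zero divide
    ctx.traps[decimal.Inexact] = False          # "mask" precision
    ctx.clear_flags()                           # comparable to FCLEX

    q = decimal.Decimal(1) / decimal.Decimal(0) # default response: Infinity
    t = decimal.Decimal(1) / decimal.Decimal(3) # rounded (inexact) result

    # Inspect the cumulative record after the calculation has finished.
    zero_divide_seen = bool(ctx.flags[decimal.DivisionByZero])
    inexact_seen = bool(ctx.flags[decimal.Inexact])
    overflow_seen = bool(ctx.flags[decimal.Overflow])

print(q, zero_divide_seen, inexact_seen, overflow_seen)
```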
Software Exception Handling
If the NPX encounters an unmasked exception condition, it signals the
exception to the 80286 CPU using the ERROR status line between the two
processors.
The next time the 80286 CPU encounters a WAIT or ESC instruction in its
instruction stream, the 80286 will detect the active condition of the
ERROR status line and automatically trap to an exception response
routine using interrupt #16, the Processor Extension Error exception.
This exception response routine is typically a part of the systems
software. Typical exception responses may include:
- Incrementing an exception counter for later display or printing
- Printing or displaying diagnostic information (e.g., the 80287
  environment and registers)
- Aborting further execution
- Using the exception pointers to build an instruction that will run
  without exception and executing it
Application programmers on 80286 systems having systems software support
for the 80287 NPX should consult their references for the appropriate system
response to NPX exceptions. For systems programmers, specific details on
writing software exception handlers are included in the section
"System-Level Numeric Programming" later in this manual.
The 80287 NPX differs from the 8087 NPX in the manner in which numeric
exceptions are signalled to the CPU; the 8087 requires an interrupt
controller (8259A) to interrupt the CPU, while the 80287 does not.
Programmers upgrading 8087 software to operate on an 80287 should be aware
of these differences and any implications they might have on numeric
exception-handling software. Appendix B explains the differences between
the 80287 and the 8087 NPX in greater detail.
Programmers developing applications for the 80287 have a wide range of
instructions and programming alternatives from which to choose.
The following sections describe the 80287 instruction set in detail, and
follow up with a discussion of several of the programming facilities that
are available to programmers of the 80287.
The 80287 NPX Instruction Set
This section describes the operation of all 80287 instructions. Within this
section, the instructions are divided into six functional classes:
- Data Transfer instructions
- Arithmetic instructions
- Comparison instructions
- Transcendental instructions
- Constant instructions
- Processor Control instructions
At the end of this section, each of the instructions is described in terms
of its execution speed, bus transfers, and exceptions, as well as a coding
example for each combination of operands accepted by the instruction. For
easy reference, this information is concentrated into a table, organized
alphabetically by instruction mnemonic.
Throughout this section, the instruction set is described as it appears to
the ASM286 programmer who is coding a program. Appendix A covers the actual
machine instruction encodings, which are principally of use to those reading
unformatted memory dumps, monitoring instruction fetches on the bus, or
writing exception handlers.
Compatibility with the 8087 NPX
The instruction set for the 80287 NPX is largely the same as that for the
8087 NPX used with 8086 and 8088 systems. Most object programs generated for
the 8087 will execute without change on the 80287. Several instructions are
new to the 80287, and several 8087 instructions perform no useful function
on the 80287. Appendix B at the back of this manual gives details of these
instruction set differences and of the differences in the ASM86 and ASM286
assemblers.
Numeric Operands
The typical NPX instruction accepts one or two operands as inputs, operates
on these, and produces a result as an output. Operands are most often (the
contents of) registers or memory locations. The operands of some instructions
are predefined; for example, FSQRT always takes the square root of the
number in the top stack element. Others allow, or require, the programmer to
explicitly code the operand(s) along with the instruction mnemonic. Still
others accept one explicit operand and one implicit operand, which is
usually the top stack element.
Whether supplied by the programmer or utilized automatically, the two basic
types of operands are sources and destinations. A source operand simply
supplies one of the inputs to an instruction; it is not altered by the
instruction. Even when an instruction converts the source operand from one
format to another (e.g., real to integer), the conversion is actually
performed in an internal work area to avoid altering the source operand. A
destination operand may also provide an input to an instruction. It is
distinguished from a source operand, however, because its content may be
altered when it receives the result produced by the operation; that is, the
destination is replaced by the result.
Many instructions allow their operands to be coded in more than one way.
For example, FADD (add real) may be written without operands, with only a
source or with a destination and a source. The instruction descriptions in
this section employ the simple convention of separating alternative operand
forms with slashes; the slashes, however, are not coded. Consecutive
slashes indicate an option of no explicit operands. The operands for FADD
are thus described as
//source/destination, source
This means that FADD may be written in any of three ways:
FADD
FADD source
FADD destination, source
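The slash convention can be expanded mechanically. The following Python sketch is only an illustration of how to read the notation; the function name expand_operand_forms is hypothetical and is not part of ASM286 or any Intel tool.

```python
def expand_operand_forms(mnemonic, notation):
    """Expand slash-separated operand notation into the codable forms.

    An empty alternative (produced by consecutive slashes) stands for
    the form with no explicit operands.
    """
    forms = []
    for alternative in notation.split("/"):
        line = f"{mnemonic} {alternative.strip()}".strip()
        if line not in forms:      # "//" yields two empty alternatives
            forms.append(line)
    return forms


for form in expand_operand_forms("FADD", "//source/destination, source"):
    print(form)
```

Applied to the FADD notation above, the sketch lists the same three forms shown in the text.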
When reading this section, it is important to bear in mind that memory
operands may be coded with any of the CPU's memory addressing modes. To
review these modes (direct, register indirect, based, indexed, and based
indexed), refer to the 80286 Programmer's Reference Manual. Table 2-17 later
in this chapter also provides several addressing mode examples.
Data Transfer Instructions
These instructions (summarized in table 2-1) move operands among elements
of the register stack, and between the stack top and memory. Any of the
seven data types can be converted to temporary real and loaded (pushed) onto
the stack in a single operation; they can be stored to memory in the same
manner. The data transfer instructions automatically update the 80287 tag
word to reflect the register contents following the instruction.
FLD source
FLD (load real) loads (pushes) the source operand onto the top of the
register stack. This is done by decrementing the stack pointer by one and
then copying the content of the source to the new stack top. The source may
be a register on the stack (ST(i)) or any of the real data types in memory.
Short and long real source operands are converted to temporary real
automatically. Coding FLD ST(0) duplicates the stack top.
FST destination
FST (store real) transfers the stack top to the destination, which may be
another register on the stack or a short or long real memory operand. If the
destination is short or long real, the significand is rounded to the width
of the destination according to the RC field of the control word, and the
exponent is converted to the width and bias of the destination format.
If, however, the stack top is tagged special (it contains ∞, a NaN, or a
denormal), then the stack top's significand is not rounded but is chopped (on
the right) to fit the destination. The exponent is not converted either; it
is likewise chopped on the right and transferred "as is." This preserves the
value's identification as ∞ or a NaN (exponent all ones) or a denormal
(exponent all zeros) so that it can be properly loaded and tagged later in
the program if desired.
Table 2-1. Data Transfer Instructions
Real Transfers
FLD      Load real
FST      Store real
FSTP     Store real and pop
FXCH     Exchange registers
Integer Transfers
FILD     Integer load
FIST     Integer store
FISTP    Integer store and pop
Packed Decimal Transfers
FBLD     Packed decimal (BCD) load
FBSTP    Packed decimal (BCD) store and pop
FSTP destination
FSTP (store real and pop) operates identically to FST except that the stack
is popped following the transfer. This is done by tagging the top stack
element empty and then incrementing ST. FSTP permits storing to a temporary
real memory variable, whereas FST does not. Coding FSTP ST(0) is equivalent
to popping the stack with no data transfer.
FXCH //destination
FXCH (exchange registers) swaps the contents of the destination and the
stack top registers. If the destination is not coded explicitly, ST(1) is
used. Many 80287 instructions operate only on the stack top; FXCH provides a
simple means of effectively using these instructions on lower stack
elements. For example, the following sequence takes the square root of the
third register from the top:
FXCH ST(3)
FSQRT
FXCH ST(3)
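The push, pop, and exchange behavior described above can be pictured with a small Python model. This is a sketch under simplifying assumptions: register contents are ordinary Python floats, an empty tag is modelled as None, no exceptions (such as stack overflow) are modelled, and the class and method names are hypothetical.

```python
import math


class RegisterStack:
    """Toy model of the eight-register NPX stack."""

    def __init__(self):
        self.regs = [None] * 8     # None models a register tagged empty
        self.top = 0               # physical register currently named ST

    def _phys(self, i):
        return (self.top + i) & 7  # ST(i) is relative to the current top

    def st(self, i=0):
        return self.regs[self._phys(i)]

    def fld(self, value):
        """Push: decrement the stack pointer, then copy to the new top."""
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def fstp(self):
        """Return the stack top, tag it empty, then increment the pointer."""
        value = self.st()
        self.regs[self.top] = None
        self.top = (self.top + 1) & 7
        return value

    def fxch(self, i=1):
        """Swap ST with ST(i); ST(1) is used when no operand is coded."""
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]

    def fsqrt(self):
        self.regs[self.top] = math.sqrt(self.st())


stack = RegisterStack()
for value in (16.0, 3.0, 2.0, 1.0):    # 16.0 ends up in ST(3)
    stack.fld(value)
stack.fxch(3)                           # FXCH ST(3)
stack.fsqrt()                           # FSQRT
stack.fxch(3)                           # FXCH ST(3)
print([stack.st(i) for i in range(4)])
```

In the model, only ST(3) changes (16.0 becomes 4.0); the other registers are back in their original positions after the second exchange.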
FILD source
FILD (integer load) converts the source memory operand from its binary
integer format (word, short, or long) to temporary real and loads (pushes)
the result onto the stack. The (new) stack top is tagged zero if all bits in
the source were zero, and is tagged valid otherwise.
FIST destination
FIST (integer store) rounds the content of the stack top to an integer
according to the RC field of the control word and transfers the result to
the destination. The destination may define a word or short integer
variable. Negative zero is stored in the same encoding as positive zero:
0000...00.
FISTP destination
FISTP (integer store and pop) operates like FIST and also pops the stack
following the transfer. The destination may be any of the binary integer
data types.
FBLD source
FBLD (packed decimal (BCD) load) converts the content of the source operand
from packed decimal to temporary real and loads (pushes) the result onto the
stack. The sign of the source is preserved, including the case where the
value is negative zero. FBLD is an exact operation; the source is loaded
with no rounding error.
The packed decimal digits of the source are assumed to be in the range
0-9H. The instruction does not check for invalid digits (A-FH) and the
result of attempting to load an invalid encoding is undefined.
FBSTP destination
FBSTP (packed decimal (BCD) store and pop) converts the content of the
stack top to a packed decimal integer, stores the result at the destination
in memory, and pops the stack. FBSTP produces a rounded integer from a
nonintegral value by adding 0.5 to the value and then chopping. Users who
are concerned about rounding may precede FBSTP with FRNDINT.
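The following Python sketch shows the kind of encoding that FBLD reads and FBSTP writes. It assumes the packed decimal layout given in the data formats description elsewhere in this manual (ten bytes, 18 digits stored two per byte with the least significant byte first, and the sign in the top bit of the last byte); the function names are hypothetical, and rounding of nonintegral values is not modelled.

```python
def pack_bcd(negative, magnitude):
    """Encode a sign and an 18-digit magnitude as 10 packed decimal bytes."""
    if not 0 <= magnitude < 10**18:
        raise ValueError("packed decimal holds at most 18 digits")
    out = bytearray(10)
    for i in range(9):                     # two digits per byte, low byte first
        magnitude, pair = divmod(magnitude, 100)
        out[i] = ((pair // 10) << 4) | (pair % 10)
    out[9] = 0x80 if negative else 0x00    # sign is the top bit of byte 9
    return bytes(out)


def unpack_bcd(data):
    """Decode 10 packed decimal bytes into (negative, magnitude)."""
    magnitude = 0
    for byte in reversed(data[:9]):
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            # The NPX does not check; its result would be undefined.
            raise ValueError("invalid digit (A-FH)")
        magnitude = magnitude * 100 + high * 10 + low
    return bool(data[9] & 0x80), magnitude
```

Returning the sign separately from the magnitude lets the model keep negative zero distinct from positive zero, as FBLD does.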
Arithmetic Instructions
The 80287's arithmetic instruction set (table 2-2) provides a wealth of
variations on the basic add, subtract, multiply, and divide operations, and
a number of other useful functions. These range from a simple absolute value
to a square root instruction that executes faster than ordinary division;
80287 programmers no longer need to spend valuable time eliminating square
roots from algorithms because they run too slowly. Other arithmetic
instructions perform exact modulo division, round real numbers to integers,
and scale values by powers of two.
The 80287's basic arithmetic instructions (addition, subtraction,
multiplication, and division) are designed to encourage the development of
very efficient algorithms. In particular, they allow the programmer to
minimize memory references and to make optimum use of the NPX register
stack.
Table 2-3 summarizes the available operation/operand forms that are
provided for basic arithmetic. In addition to the four normal operations,
two "reversed" instructions make subtraction and division "symmetrical" like
addition and multiplication. The variety of instruction and operand forms
gives the programmer unusual flexibility:
• Operands may be located in registers or memory.
• Results may be deposited in a choice of registers.
• Operands may be a variety of NPX data types: temporary real, long
  real, short real, short integer or word integer, with automatic
  conversion to temporary real performed by the 80287.
Five basic instruction forms may be used across all six operations, as
shown in table 2-3. The classical stack form may be used to make the 80287
operate like a classical stack machine. No operands are coded in this form,
only the instruction mnemonic. The NPX picks the source operand from the
stack top and the destination from the next stack element. It then pops the
stack, performs the operation, and returns the result to the new stack top,
effectively replacing the operands by the result.
The register form is a generalization of the classical stack form; the
programmer specifies the stack top as one operand and any register on the
stack as the other operand. Coding the stack top as the destination provides
a convenient way to access a constant, held elsewhere in the stack, from
the stack top. The converse coding (ST is the source operand) allows, for
example, adding the top into a register used as an accumulator.
Often the operand in the stack top is needed for one operation but then is
of no further use in the computation. The register pop form can be used to
pick up the stack top as the source operand, and then discard it by
popping the stack. Coding operands of ST(1), ST with a register pop
mnemonic is equivalent to a classical stack operation: the top is popped
and the result is left at the new top.
The two memory forms increase the flexibility of the 80287's arithmetic
instructions. They permit a real number or a binary integer in memory to
be used directly as a source operand. This is a very useful facility in
situations where operands are not used frequently enough to justify
holding them in registers. Note that any memory addressing mode may be
used to define these operands, so they may be elements in arrays,
structures, or other data organizations, as well as simple scalars.
The six basic operations are discussed further in the paragraphs following
table 2-3, and descriptions of the remaining seven arithmetic operations
follow.
Table 2-2. Arithmetic Instructions
Addition
FADD Add real
FADDP Add real and pop
FIADD Integer add
Subtraction
FSUB Subtract real
FSUBP Subtract real and pop
FISUB Integer subtract
FSUBR Subtract real reversed
FSUBRP Subtract real reversed and pop
FISUBR Integer subtract reversed
Multiplication
FMUL Multiply real
FMULP Multiply real and pop
FIMUL Integer multiply
Division
FDIV Divide real
FDIVP Divide real and pop
FIDIV Integer divide
FDIVR Divide real reversed
FDIVRP Divide real reversed and pop
FIDIVR Integer divide reversed
Other Operations
FSQRT Square root
FSCALE Scale
FPREM Partial remainder
FRNDINT Round to integer
FXTRACT Extract exponent and significand
FABS Absolute value
FCHS Change sign
Table 2-3. Basic Arithmetic Instruction and Operands
The addition instructions (add real, add real and pop, integer add) add
the source and destination operands and return the sum to the destination.
The operand at the stack top may be doubled by coding:
FADD ST,ST(0)
NORMAL SUBTRACTION
FSUB //source/destination,source
FSUBP destination,source
FISUB source
The normal subtraction instructions (subtract real, subtract real and pop,
integer subtract) subtract the source operand from the destination and
return the difference to the destination.
The reversed subtraction instructions (subtract real reversed, subtract
real reversed and pop, integer subtract reversed) subtract the destination
from the source and return the difference to the destination.
The multiplication instructions (multiply real, multiply real and pop,
integer multiply) multiply the source and destination operands and return
the product to the destination. Coding FMUL ST,ST(0) squares the content
of the stack top.
NORMAL DIVISION
FDIV //source/destination,source
FDIVP destination,source
FIDIV source
The normal division instructions (divide real, divide real and pop,
integer divide) divide the destination by the source and return the
quotient to the destination.
The reversed division instructions (divide real reversed, divide real
reversed and pop, integer divide reversed) divide the source operand by
the destination and return the quotient to the destination.
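The difference between the normal and reversed operations is easiest to see in the classical stack form, where ST is the source and ST(1) is the destination. The Python sketch below models only that form, using a list whose last element stands for ST; the function name classical is hypothetical, and exceptions such as division by zero are not modelled.

```python
def classical(op, stack):
    """Apply a classical-stack-form instruction to a Python list.

    stack[-1] models ST (the source) and stack[-2] models ST(1) (the
    destination). Both are popped and the result becomes the new top.
    """
    source = stack.pop()
    destination = stack.pop()
    operations = {
        "FADD": lambda: destination + source,
        "FSUB": lambda: destination - source,
        "FSUBR": lambda: source - destination,
        "FMUL": lambda: destination * source,
        "FDIV": lambda: destination / source,
        "FDIVR": lambda: source / destination,
    }
    stack.append(operations[op]())
    return stack


# (1 + 2) * (7 - 5) evaluated as on a classical stack machine
stack = [1.0, 2.0]
classical("FADD", stack)
stack += [7.0, 5.0]
classical("FSUB", stack)
classical("FMUL", stack)
print(stack)
```

With 10.0 loaded first and 4.0 loaded second, FSUB leaves 6.0 and FSUBR leaves -6.0, which is why the reversed forms remove the need to exchange operands before subtracting or dividing.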
FSQRT
FSQRT (square root) replaces the content of the top stack element with its
square root. (Note: The square root of -0 is defined to be -0.)
FSCALE
FSCALE (scale) interprets the value contained in ST(1) as an integer and
adds this value to the exponent of the number in ST. This is equivalent to
ST ← ST * 2^(ST(1))
Thus FSCALE provides rapid multiplication or division by integral powers of
2. It is particularly useful for scaling the elements of a vector.
Note that FSCALE assumes the scale factor in ST(1) is an integral value in
the range -2^(15) ≤ x < 2^(15). If the value is not integral, but is
in-range and is greater in magnitude than 1, FSCALE uses the nearest integer
smaller in magnitude; i.e., it chops the value toward 0. If the value is out
of range, or 0 < |x| < 1, the instruction will produce an undefined result
and will not signal an exception. The recommended practice is to load the
scale factor from a word integer to ensure correct operation.
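A minimal Python model of the FSCALE rules above follows. It is a sketch, not a description of the hardware: it works in double precision rather than temporary real, and it raises an error for the operand values the text calls undefined, whereas the NPX signals nothing. The function name fscale is hypothetical.

```python
import math


def fscale(st, st1):
    """Return st * 2**chop(st1), following the operand rules in the text."""
    if not -2**15 <= st1 < 2**15 or 0 < abs(st1) < 1:
        # On the NPX this case yields an undefined result with no exception.
        raise ValueError("scale factor out of range or 0 < |x| < 1")
    return math.ldexp(st, math.trunc(st1))   # trunc chops toward 0


print(fscale(3.0, 4.0))     # 3 * 2^4
print(fscale(3.0, -1.5))    # scale factor chops to -1
```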
FPREM
FPREM (partial remainder) performs modulo division of the top stack
element by the next stack element, i.e., ST(1) is the modulus. FPREM
produces an exact result; the precision exception does not occur. The sign
of the remainder is the same as the sign of the original dividend.
FPREM operates by performing successive scaled subtractions; obtaining the
exact remainder when the operands differ greatly in magnitude can consume
large amounts of execution time. Because the 80287 can only be preempted
between instructions, the remainder function could seriously increase
interrupt latency in these cases. Accordingly, the instruction is designed
to be executed iteratively in a software-controlled loop.
FPREM can reduce a magnitude difference of up to 2^(64) in one execution. If
FPREM produces a remainder that is less than the modulus, the function is
complete and bit C2 of the status word condition code is cleared. If the
function is incomplete, C2 is set to 1; the result in ST is then called
the partial remainder. Software can inspect C2 by storing the status word
following execution of FPREM and re-execute the instruction (using the
partial remainder in ST as the dividend), until C2 is cleared.
Alternatively, a program can determine when the function is complete by
comparing ST to ST(1). If ST > ST(1), then FPREM must be executed again; if
ST = ST(1), then the remainder is 0; if ST < ST(1), then the remainder is
ST. A higher priority interrupting routine that needs the 80287 can force a
context switch between the instructions in the remainder loop.
An important use for FPREM is to reduce arguments (operands) of periodic
transcendental functions to the range permitted by these instructions. For
example, the FPTAN (tangent) instruction requires its argument to be less
than π/4. Using π/4 as a modulus, FPREM will reduce an argument so that it
is in range of FPTAN. Because FPREM produces an exact result, the argument
reduction does not introduce roundoff error into the calculation, even if
several iterations are required to bring the argument into range. (The
rounding of π does not create the effect of a rounded argument, but of a
rounded period.)
FPREM also provides the least-significant three bits of the quotient
that it generates (in C3, C1, C0). This is also important for
transcendental argument reduction, because it locates the original angle in
the correct one of eight π/4 segments of the unit circle (see table 2-4).
If the quotient is less than 4, then C0 will be the value of C3 before
FPREM was executed. If the quotient is less than 2, then C3 will be the
value of C1 before FPREM was executed.
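The Python sketch below illustrates how the three quotient bits locate an angle in one of the eight π/4 segments. It assumes the usual assignment of quotient bits to condition code bits (C0 holds bit 2, C3 holds bit 1, and C1 holds bit 0 of the quotient); table 2-4 is the authority for the actual encoding. The model handles non-negative angles only, works in double precision, and uses hypothetical function names.

```python
import math


def reduce_octant(theta):
    """Reduce a non-negative angle modulo pi/4.

    Returns (remainder, c3, c1, c0), with the low three quotient bits
    placed in the condition code bits as assumed in the lead-in.
    """
    modulus = math.pi / 4
    remainder = math.fmod(theta, modulus)
    quotient = int(theta / modulus)
    c0 = (quotient >> 2) & 1
    c3 = (quotient >> 1) & 1
    c1 = quotient & 1
    return remainder, c3, c1, c0


def octant(c3, c1, c0):
    """Rebuild the segment number (0-7) from the condition code bits."""
    return (c0 << 2) | (c3 << 1) | c1


remainder, c3, c1, c0 = reduce_octant(2.0)
print(octant(c3, c1, c0), remainder)
```

An angle of 2.0 radians falls in segment 2, and the remainder is the angle's offset within that segment, which is in range for FPTAN.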
FRNDINT
FRNDINT (round to integer) rounds the top stack element to an integer. For
example, assume that ST contains the 80287 real number encoding of the
decimal value 155.625. FRNDINT will change the value to 155 if the RC field
of the control word is set to down or chop, or to 156 if it is set to up or
nearest.
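The four rounding modes can be compared with a short Python model. This is a sketch in double precision; the mode names used as dictionary keys are labels chosen for the example rather than the RC field encodings, and it relies on Python's round(), which rounds halfway cases to the even integer.

```python
import math

ROUNDERS = {
    "nearest": round,       # halfway cases go to the even integer
    "down": math.floor,     # toward negative infinity
    "up": math.ceil,        # toward positive infinity
    "chop": math.trunc,     # toward zero
}


def frndint(value, rc="nearest"):
    """Round value to an integral float under the named rounding mode."""
    return float(ROUNDERS[rc](value))


for mode in ("nearest", "down", "up", "chop"):
    print(mode, frndint(155.625, mode), frndint(-155.625, mode))
```

The negative operand shows why down and chop are distinct modes even though they agree for 155.625.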
FXTRACT
FXTRACT (extract exponent and significand) "decomposes" the number in the
stack top into two numbers that represent the actual value of the operand's
exponent and significand fields. The "exponent" replaces the original
operand on the stack and the "significand" is pushed onto the stack.
Following execution of FXTRACT, ST (the new stack top) contains the value of
the original significand expressed as a real number: its sign is the same as
the operand's, its exponent is 0 true (16,383 or 3FFFH biased), and its
significand is identical to the original operand's. ST(1) contains the
value of the original operand's true (unbiased) exponent expressed as a real
number. If the original operand is zero, FXTRACT produces zeros in ST and
ST(1) and both are signed as the original operand.
To clarify the operation of FXTRACT, assume ST contains a number whose
true exponent is +4 (i.e., its exponent field contains 4003H). After
executing FXTRACT, ST(1) will contain the real number +4.0; its sign will be
positive, its exponent field will contain 4001H (+2 true) and its
significand field will contain 1.00...00B. In other words, the value in
ST(1) will be 1.0 * 2^(2) = 4. If ST contains an operand whose true exponent
is -7 (i.e., its exponent field contains 3FF8H), then FXTRACT will return an
"exponent" of -7.0; after the instruction executes, ST(1)'s sign and
exponent fields will contain C001H (negative sign, true exponent of 2), and
its significand will be 1.1100...00B. In other words, the value in ST(1)
will be -1.11B * 2^(2) = -7.0. In both cases, following FXTRACT, ST's sign
and significand fields will be the same as the original operand's, and its
exponent field will contain 3FFFH (0 true).
FXTRACT is useful in conjunction with FBSTP for converting numbers in 80287
temporary real format to decimal representations (e.g., for printing or
displaying). It can also be useful for debugging, because it allows the
exponent and significand parts of a real number to be examined separately.
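The decomposition can be reproduced in Python with math.frexp, which returns a fraction in the range 0.5 to 1 rather than 1 to 2, so the sketch rescales it. It models normal double-precision operands and zero only; the function name fxtract and the return order (the value left in ST first, then the value left in ST(1)) are choices made for this example.

```python
import math


def fxtract(value):
    """Return (significand, exponent) as FXTRACT leaves them in ST and ST(1)."""
    if value == 0:
        return value, value                 # zeros, signed as the operand
    fraction, exponent = math.frexp(value)  # |fraction| in [0.5, 1)
    return fraction * 2.0, float(exponent - 1)


print(fxtract(24.0))        # true exponent +4
print(fxtract(-2.0 ** -7))  # true exponent -7
```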
FABS
FABS (absolute value) changes the top stack element to its absolute value
by making its sign positive.
FCHS
FCHS (change sign) complements (reverses) the sign of the top stack
element.
Table 2-4. Condition Code Interpretation after FPREM
Comparison Instructions
Each of these instructions (table 2-5) analyzes the top stack element,
often in relationship to another operand, and reports the result in the
status word condition code. The basic operations are compare, test (compare
with zero), and examine (report tag, sign, and normalization). Special
forms of the compare operation are provided to optimize algorithms by
allowing direct comparisons with binary integers and real numbers in memory,
as well as popping the stack after a comparison.
The FSTSW (store status word) instruction may be used following a
comparison to transfer the condition code to memory for inspection.
Note that instructions other than those in the comparison group may update
the condition code. To ensure that the status word is not altered
inadvertently, store it immediately following a comparison operation.
FCOM //source
FCOM (compare real) compares the stack top to the source operand. The
source operand may be a register on the stack, or a short or long real
memory operand. If an operand is not coded, ST is compared to ST(1).
Positive and negative forms of zero compare identically as if they were
unsigned. Following the instruction, the condition codes reflect the order
of the operands as shown in table 2-6.
NaNs and ∞ (projective) cannot be compared and return C3 = C0 = 1 as shown
in the table.
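The following Python sketch models the condition code that a comparison leaves in the status word. The unordered encoding (C3 = C0 = 1) comes from the text above; the encodings for the ordered cases and the bit positions of C0 (bit 8) and C3 (bit 14) are the customary ones and should be checked against table 2-6 and the status word layout. Only NaNs are treated as unordered here; projective infinity is not modelled, and the names are hypothetical.

```python
import math

C0 = 1 << 8      # assumed status word position of C0
C3 = 1 << 14     # assumed status word position of C3


def fcom(st, source):
    """Return the C3/C0 bits that comparing ST with the source would set."""
    if math.isnan(st) or math.isnan(source):
        return C3 | C0      # operands cannot be compared
    if st > source:
        return 0
    if st < source:
        return C0
    return C3               # equal; +0 and -0 compare identically


print(hex(fcom(1.0, 2.0)), hex(fcom(0.0, -0.0)), hex(fcom(math.nan, 1.0)))
```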
FCOMP //source
FCOMP (compare real and pop) operates like FCOM, and in addition pops the
stack.
FCOMPP
FCOMPP (compare real and pop twice) operates like FCOM and additionally
pops the stack twice, discarding both operands. The comparison is of the
stack top to ST(1); no operands may be explicitly coded.
FICOM source
FICOM (integer compare) converts the source operand, which may reference a
word or short binary integer variable, to temporary real and compares the
stack top to it.
FICOMP source
FICOMP (integer compare and pop) operates identically to FICOM and
additionally discards the value in ST by popping the stack.
FTST
FTST (test) tests the top stack element by comparing it to zero. The result
is posted to the condition codes as shown in table 2-7.
FXAM
FXAM (examine) reports the content of the top stack element as
positive/negative and NaN/unnormal/denormal/normal/zero, or empty.
Table 2-8 lists and interprets all the condition code values that FXAM
generates. Although four different encodings may be returned for an empty
register, bits C3 and C0 of the condition code are both 1 in all encodings.
Bits C2 and C1 should be ignored when examining for empty.
Table 2-5. Comparison Instructions
FCOM Compare real
FCOMP Compare real and pop
FCOMPP Compare real and pop twice
FICOM Integer compare
FICOMP Integer compare and pop
FTST Test
FXAM Examine
Table 2-6. Condition Code Interpretation after FCOM
Transcendental Instructions
The instructions in this group (table 2-9) perform the time-consuming core
calculations for all common trigonometric, inverse trigonometric,
hyperbolic, inverse hyperbolic, logarithmic, and exponential functions.
Prologue and epilogue software may be used to reduce arguments to the range
accepted by the instructions and to adjust the result to correspond to the
original arguments if necessary. The transcendentals operate on the top one
or two stack elements, and they return their results to the stack, also.
---------------------------------------------------------------------------
NOTE
The transcendental instructions assume that their operands are valid and
in-range. The instruction descriptions in this section provide the
allowed operand range of each instruction.
---------------------------------------------------------------------------
All operands to a transcendental must be normalized; denormals, unnormals,
infinities, and NaNs are considered invalid. (Zero operands are accepted by
some functions and are considered out-of-range by others). If a
transcendental operand is invalid or out-of-range, the instruction will
produce an undefined result without signalling an exception. It is the
programmer's responsibility to ensure that operands are valid and in-range
before executing a transcendental. For periodic functions, FPREM may be
used to bring a valid operand into range.
FPTAN
0 ≤ ST(0) ≤ π/4
FPTAN (partial tangent) computes the function Y/X = TAN(θ). θ is taken
from the top stack element; it must lie in the range 0 ≤ θ ≤ π/4. The result
of the operation is a ratio; Y replaces θ in the stack and X is pushed,
becoming the new stack top.
The ratio result of FPTAN and the ratio argument of FPATAN are designed to
optimize the calculation of the other trigonometric functions, including
SIN, COS, ARCSIN, and ARCCOS. These can be derived from TAN and ARCTAN via
standard trigonometric identities.
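As an illustration of why a ratio result is convenient, the Python sketch below derives SIN and COS from a modelled FPTAN result. It is a sketch under assumptions: the model returns the pair (TAN(θ), 1.0), which is only one of many pairs with the correct ratio, and it raises an error for out-of-range operands where the NPX would silently return an undefined result. The function names are hypothetical.

```python
import math


def fptan(theta):
    """Model FPTAN: return (Y, X) such that Y/X = TAN(theta)."""
    if not 0 <= theta <= math.pi / 4:
        raise ValueError("operand out of range for FPTAN")
    return math.tan(theta), 1.0    # any pair with this ratio would do


def sin_cos(theta):
    """Derive (SIN, COS) from the ratio, for 0 <= theta <= pi/4."""
    y, x = fptan(theta)
    hypotenuse = math.hypot(x, y)  # sqrt(X^2 + Y^2)
    return y / hypotenuse, x / hypotenuse


print(sin_cos(math.pi / 6))
```

Because only the ratio Y/X matters, the identities SIN = Y/sqrt(X^2 + Y^2) and COS = X/sqrt(X^2 + Y^2) hold for whatever pair the instruction returns.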
FPATAN
0 ≤ ST(1) < ST(0) < ∞
FPATAN (partial arctangent) computes the function θ = ARCTAN(Y/X). X is
taken from the top stack element and Y from ST(1). Y and X must observe the
inequality 0 ≤ Y < X < ∞. The instruction pops the stack and returns θ to
the (new) stack top, overwriting the Y operand.
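The operand restriction means that software must arrange the arguments before the instruction is used. The Python sketch below shows one possible prologue and epilogue for arguments in the first quadrant, using the identity ARCTAN(Y/X) = π/2 - ARCTAN(X/Y) when Y exceeds X. It is an illustration only; the function names are hypothetical, and the model raises an error where the NPX would return an undefined result.

```python
import math


def fpatan(y, x):
    """Model FPATAN: ARCTAN(y/x), valid only for 0 <= y < x < infinity."""
    if not 0 <= y < x < math.inf:
        raise ValueError("operands out of range for FPATAN")
    return math.atan(y / x)


def arctan_first_quadrant(y, x):
    """ARCTAN(y/x) for y >= 0 and x >= 0 (not both zero)."""
    if y < x:
        return fpatan(y, x)
    if y == x:
        return math.pi / 4
    return math.pi / 2 - fpatan(x, y)   # swap operands to get back in range


print(arctan_first_quadrant(2.0, 1.0))
```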
F2XM1
0 ≤ ST(0) ≤ 0.5
F2XM1 (2 to the X minus 1) calculates the function Y = 2^(X) - 1. X is taken
from the stack top and must be in the range 0 ≤ X ≤ 0.5. The result Y
replaces X at the stack top.
This instruction is designed to produce a very accurate result even when X
is close to 0. To obtain Y = 2^(X), add 1 to the result delivered by F2XM1.
The following formulas show how values other than 2 may be raised to a
power of X:
10^(X) = 2^(X * LOG{2}10)
e^(X) = 2^(X * LOG{2}e)
Y^(X) = 2^(X * LOG{2}Y)
As shown in the next section, the 80287 has built-in instructions for
loading the constants LOG{2}10 and LOG{2}e, and the FYL2X instruction may be
used to calculate X * LOG{2}Y.
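Raising a value other than 2 to a power rests on the identity Y^(X) = 2^(X * LOG{2}Y). The Python sketch below shows one way the pieces might fit together: the exponent is split into an integer part (applied by scaling, as FSCALE would) and a fraction that is brought into the 0 to 0.5 range accepted by F2XM1. The splitting strategy and the names f2xm1, pow2, and power are assumptions of this example, and the arithmetic is double precision.

```python
import math


def f2xm1(x):
    """Model F2XM1: 2^x - 1 for 0 <= x <= 0.5, accurate near x = 0."""
    if not 0 <= x <= 0.5:
        raise ValueError("operand out of range for F2XM1")
    return math.expm1(x * math.log(2.0))


def pow2(x):
    """Compute 2^x using only in-range F2XM1 calls plus scaling."""
    n = math.floor(x)
    fraction = x - n                  # 0 <= fraction <= 1
    result = 1.0
    if fraction > 0.5:                # split off sqrt(2) to stay in range
        result = math.sqrt(2.0)
        fraction -= 0.5
    result *= 1.0 + f2xm1(fraction)
    return math.ldexp(result, n)      # multiply by 2^n, as FSCALE would


def power(y, x):
    """Compute y^x for y > 0 as 2^(x * log2(y))."""
    return pow2(x * math.log2(y))


print(power(10.0, 2.0), power(3.0, 2.0))
```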
FYL2X
0 < ST(0) < ∞
-∞ < ST(1) < ∞
FYL2X (Y log base 2 of X) calculates the function Z = Y * LOG{2}X. X is
taken from the stack top and Y from ST(1). The operands must be in the
ranges 0 < X < ∞ and -∞ < Y < +∞. The instruction pops the stack and returns
Z at the (new) stack top, replacing the Y operand.
This function optimizes the calculations of log to any base other than two,
because a multiplication is always required:
LOG{n}X = LOG{n}2 * LOG{2}X
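Logarithms to other bases follow by supplying LOG{n}2 as the Y operand. The Python sketch below models this in double precision; the constants FLDLG2 and FLDLN2 stand for the values that the instructions of the same names push onto the stack, and the function names are hypothetical.

```python
import math

FLDLG2 = math.log10(2.0)    # value pushed by FLDLG2: log base 10 of 2
FLDLN2 = math.log(2.0)      # value pushed by FLDLN2: log base e of 2


def fyl2x(y, x):
    """Model FYL2X: y * log2(x) for 0 < x < infinity."""
    if not 0 < x < math.inf:
        raise ValueError("operand out of range for FYL2X")
    return y * math.log2(x)


def common_log(x):
    return fyl2x(FLDLG2, x)     # log10(x) = log10(2) * log2(x)


def natural_log(x):
    return fyl2x(FLDLN2, x)     # ln(x) = ln(2) * log2(x)


print(common_log(1000.0), natural_log(math.e))
```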
FYL2XP1
0 ≤ |ST(0)| < (1 - (√2/2))
-∞ < ST(1) < ∞
FYL2XP1 (Y log base 2 of (X + 1)) calculates the function
Z = Y * LOG{2}(X+1). X is taken from the stack top and must be in the range
0 ≤ |X| < (1 - (√2/2)). Y is taken from ST(1) and must be in the range
-∞ < Y < ∞. FYL2XP1 pops the stack and returns Z at the (new) stack top,
replacing Y.
The instruction provides improved accuracy over FYL2X when computing the
log of a number very close to 1, for example 1 + ε where ε << 1. Providing ε
rather than 1 + ε as the input to the function allows more significant
digits to be retained.
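The loss of significance that FYL2XP1 avoids can be demonstrated in Python. The sketch uses double precision (53-bit significand) rather than temporary real, so the threshold at which 1 + ε collapses to 1 differs from the NPX's, but the effect is the same. The function names are hypothetical, and math.log1p stands in for the instruction's internal computation.

```python
import math

LIMIT = 1 - math.sqrt(2) / 2      # FYL2XP1 requires |x| below this bound


def fyl2x(y, x):
    """Model FYL2X: y * log2(x)."""
    return y * math.log2(x)


def fyl2xp1(y, x):
    """Model FYL2XP1: y * log2(x + 1) without forming x + 1."""
    if not abs(x) < LIMIT:
        raise ValueError("operand out of range for FYL2XP1")
    return y * math.log1p(x) / math.log(2.0)


epsilon = 1e-18
print(fyl2x(1.0, 1.0 + epsilon))   # 1 + epsilon has already rounded to 1
print(fyl2xp1(1.0, epsilon))       # epsilon's digits are retained
```

Forming 1 + ε first discards ε entirely, so the first call returns zero, while the second returns a result close to ε * LOG{2}e.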
Table 2-9. Transcendental Instructions
FPTAN Partial tangent
FPATAN Partial arctangent
F2XM1 2^(X) - 1
FYL2X Y * log{2}X
FYL2XP1 Y * log{2}(X + 1)
Constant Instructions
Each of these instructions (table 2-10) loads (pushes) a commonly-used
constant onto the stack. The values have full temporary real precision (64
bits) and are accurate to approximately 19 decimal digits. Because a
temporary real constant occupies 10 memory bytes, the constant
instructions, which are only two bytes long, save storage and improve
execution speed, in addition to simplifying programming.
FLDZ
FLDZ (load zero) loads (pushes) +0.0 onto the stack.
FLD1
FLD1 (load one) loads (pushes) +1.0 onto the stack.
FLDPI
FLDPI (load π) loads (pushes) π onto the stack.
FLDL2T
FLDL2T (load log base 2 of 10) loads (pushes) the value LOG{2}10 onto the
stack.
FLDL2E
FLDL2E (load log base 2 of e) loads (pushes) the value LOG{2}e onto the
stack.
FLDLG2
FLDLG2 (load log base 10 of 2) loads (pushes) the value LOG{10}2 onto the
stack.
FLDLN2
FLDLN2 (load log base e of 2) loads (pushes) the value LOG{e}2 onto the
stack.
Processor Control Instructions
The processor control instructions shown in table 2-11 are not typically
used in calculations; they provide control over the 80287 NPX for
system-level activities. These activities include initialization, exception
handling, and task switching.
As shown in table 2-11, many of the NPX processor control instructions have
two forms of assembler mnemonic:
• A wait form, where the mnemonic is prefixed only with an F, such as
  FSTSW. This form checks for unmasked numeric errors.
• A no-wait form, where the mnemonic is prefixed with an FN, such as
  FNSTSW. This form ignores unmasked numeric errors.
When the control instruction is coded using the no-wait form of the
mnemonic, the ASM286 assembler does not precede the ESC instruction with a
wait instruction, and the CPU does not test the ERROR status line from the
NPX before executing the processor control instruction.
Only the processor control class of instructions has this alternate
no-wait form. All numeric instructions are automatically synchronized by the
80286, with the CPU testing the BUSY status line and only executing the
numeric instruction when this line is inactive. Because of this automatic
synchronization by the 80286, numeric instructions for the 80287 need not be
preceded by a CPU wait instruction in order to execute correctly.
It should also be noted that the 8087 instructions FENI and FDISI perform
no function in the 80287. If these opcodes are detected in an 80286/80287
instruction stream, the 80287 will perform no specific operation and no
internal states will be affected. For programmers interested in porting
numeric software from 8087 environments to the 80286, however, it should be
noted that program sections containing these exception-handling instructions
are not likely to be completely portable to the 80287. Appendix B contains
a more complete description of the differences between the 80287 and the
8087 NPX.
Table 2-11. Processor Control Instructions
FINIT/FNINIT Initialize processor
FSETPM Set Protected Mode
FLDCW Load control word
FSTCW/FNSTCW Store control word
FSTSW/FNSTSW Store status word
FSTSW AX/FNSTSW AX Store status word to AX
FCLEX/FNCLEX Clear exceptions
FSTENV/FNSTENV Store Environment
FLDENV Load environment
FSAVE/FNSAVE Save state
FRSTOR Restore state
FINCSTP Increment stack pointer
FDECSTP Decrement stack pointer
FFREE Free register
FNOP No operation
FWAIT CPU Wait
FINIT/FNINIT
FINIT/FNINIT (initialize processor) sets the 80287 NPX into a known state,
unaffected by any previous activity. The no-wait form of this instruction
will cause the 80287 to abort any previous numeric operations currently
executing in the NEU. This instruction performs the functional equivalent
of a hardware RESET, with one exception; FINIT/FNINIT does not affect the
current 80287 operating mode (either Real-Address mode or Protected mode).
FINIT checks for unmasked numeric exceptions, FNINIT does not.
Note that if FNINIT is executed while a previous 80287 memory-referencing
instruction is running, 80287 bus cycles in progress will be aborted. This
instruction may be necessary to clear the 80287 if a Processor Extension
Segment Overrun Exception (Interrupt 9) is detected by the CPU.
FSETPM
FSETPM (set Protected mode) sets the operating mode of the 80287 to
Protected Virtual-Address mode. When the 80287 is first initialized
following hardware RESET, it operates in Real-Address mode, just as does the
80286 CPU. Once the 80287 NPX has been set into Protected mode, only a
hardware RESET can return the NPX to operation in Real-Address mode.
When the 80287 operates in Protected mode, the NPX exception pointers are
represented differently than they are in Real-Address mode (see the FSAVE
and FSTENV instructions that follow). This distinction is evident primarily
to writers of numeric exception handlers, however. For general application
programmers, the operating mode of the 80287 need not be a concern.
FLDCW source
FLDCW (load control word) replaces the current processor control word with
the word defined by the source operand. This instruction is typically used
to establish or change the 80287's mode of operation. Note that if an
exception bit in the status word is set, loading a new control word that
unmasks that exception and clears the interrupt enable mask will generate an
immediate interrupt request before the next instruction is executed. When
changing modes, the recommended procedure is to first clear any exceptions
and then load the new control word.
FSTCW/FNSTCW destination
FSTCW/FNSTCW (store control word) writes the current processor control word
to the memory location defined by the destination. FSTCW checks for unmasked
numeric exceptions, FNSTCW does not.
FSTSW/FNSTSW destination
FSTSW/FNSTSW (store status word) writes the current value of the 80287
status word to the destination operand in memory. The instruction is used to
• Implement conditional branching following a comparison or FPREM
  instruction (FSTSW)
• Poll the 80287 to determine if it is busy (FNSTSW)
• Invoke exception handlers in environments that do not use interrupts
  (FSTSW).
FSTSW checks for unmasked numeric exceptions; FNSTSW does not.
FSTSW AX/FNSTSW AX
FSTSW AX/FNSTSW AX (store status word to AX) is a special 80287 instruction
that writes the current value of the 80287 status word directly into the
80286 AX register. This instruction optimizes conditional branching in
numeric programs, where the 80286 CPU must test the condition of various NPX
status bits. The waited form checks for unmasked numeric exceptions; the
non-waited form does not.
When this instruction is executed, the 80286 AX register is updated with
the NPX status word before the CPU executes any further instructions. In
this way, the 80286 can immediately test the NPX status word without any
WAIT or other synchronization instructions required.
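The test that the CPU performs on AX after a comparison can be modeled as follows. This is a minimal Python sketch, not 80286 code. It assumes the status-word layout defined elsewhere in this manual (C0 in bit 8, C3 in bit 14) and the compare encoding in which only C3 and C0 are significant; compare_result is a hypothetical helper name.

```python
# Condition-code bit positions in the status word (assumed layout).
C0 = 1 << 8
C3 = 1 << 14

def compare_result(status_word):
    """Interpret C3 and C0 as left by a compare or test instruction."""
    c3 = bool(status_word & C3)
    c0 = bool(status_word & C0)
    outcomes = {
        (False, False): "ST > source",
        (False, True): "ST < source",
        (True, False): "ST = source",
        (True, True): "not comparable",
    }
    return outcomes[(c3, c0)]
```

A program would apply this kind of decoding to the value that FSTSW AX leaves in AX, then branch on the outcome.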
FCLEX/FNCLEX
FCLEX/FNCLEX (clear exceptions) clears all exception flags, the error
status flag and the busy flag in the status word. As a consequence, the
80287's ERROR line goes inactive. FCLEX checks for unmasked numeric
exceptions; FNCLEX does not.
FSAVE/FNSAVE destination
FSAVE/FNSAVE (save state) writes the full 80287 state (environment plus
register stack) to the memory location defined by the destination operand.
Figure 2-1 shows the layout of the 94-byte save area; typically the
instruction will be coded to save this image on the CPU stack. FNSAVE
delays its execution until all NPX activity completes normally. Thus, the
save image reflects the state of the NPX following the completion of any
running instruction. After writing the state image to memory, FSAVE/FNSAVE
initializes the 80287 as if FINIT/FNINIT had been executed.
FSAVE/FNSAVE is useful whenever a program wants to save the current state
of the NPX and initialize it for a new routine. Three examples are:
   • An operating system needs to perform a context switch (suspend the
     task that had been running and give control to a new task).
   • An exception handler needs to use the 80287.
   • An application task wants to pass a "clean" 80287 to a subroutine.
FSAVE checks for unmasked numeric errors before executing; FNSAVE does not.
An FWAIT should be executed before CPU interrupts are enabled or any
subsequent 80287 instruction is executed. Other CPU instructions may be
executed between the FNSAVE/FSAVE and the FWAIT.
Figure 2-1 NOTES:
   a = INSTRUCTION POINTER
   b = OPERAND POINTER
   S = Sign
   Bit 0 of each field is the rightmost, least significant bit of the
   corresponding register field.
   Bit 63 of the significand is the integer bit (the assumed binary point
   is immediately to its right).
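The register images in the save area use the temporary real format described by the notes above (sign bit, exponent, and a 64-bit significand whose bit 63 is an explicit integer bit). The following Python sketch decodes one 10-byte register image under stated assumptions: little-endian byte order, a 15-bit exponent biased by 16383, and no handling of infinities or NaNs. The name decode_temp_real is hypothetical, and the result is limited to the 53-bit precision of a Python float.

```python
import math

def decode_temp_real(image):
    """Convert a 10-byte little-endian temporary-real image to a float."""
    if len(image) != 10:
        raise ValueError("a temporary real occupies 10 bytes")
    significand = int.from_bytes(image[0:8], "little")  # bit 63 = integer bit
    sign_exp = int.from_bytes(image[8:10], "little")
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity and NaN are not handled in this sketch")
    if exponent == 0:
        exponent = 1  # zero and denormals use the minimum exponent
    # The binary point lies to the right of bit 63, so scale by 2**-63.
    return sign * math.ldexp(significand, exponent - 16383 - 63)
```

For instance, an image with exponent field 3FFFH and only bit 63 of the significand set decodes to 1.0.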
FRSTOR source
FRSTOR (restore state) reloads the 80287 from the 94-byte memory area
defined by the source operand. This information should have been written by
a previous FSAVE/FNSAVE instruction and not altered by any other
instruction. An FWAIT is not required after FRSTOR. FRSTOR will
automatically wait and check for interrupts until all data transfers are
completed before continuing to the next instruction.
Note that the 80287 "reacts" to its new state at the conclusion of the
FRSTOR; it will, for example, generate an exception request if the exception
and mask bits in the memory image so indicate when the next WAIT or
error-checking-ESC instruction is executed.
FSTENV/FNSTENV destination
FSTENV/FNSTENV (store environment) writes the 80287's basic
status (control, status, and tag words, and exception pointers) to the
memory location defined by the destination operand. Typically, the
environment is saved on the CPU stack. FSTENV/FNSTENV is often used by
exception handlers because it provides access to the exception pointers that
identify the offending instruction and operand. After saving the
environment, FSTENV/FNSTENV sets all exception masks in the processor.
FSTENV checks for pending errors before executing; FNSTENV does not.
Figure 2-2 shows the format of the environment data in memory. FNSTENV does
not store the environment until all NPX activity has completed. Thus, the
data saved by the instruction reflects the 80287 after any previously
decoded instruction has been executed. After writing the environment image
to memory, FNSTENV/FSTENV masks all exceptions but, unlike FSAVE/FNSAVE,
does not otherwise reinitialize the 80287.
FSTENV/FNSTENV must be allowed to complete before any other 80287
instruction is decoded. When FSTENV is coded, an explicit FWAIT, or
assembler-generated WAIT, should precede any subsequent 80287 instruction.
Figure 2-2. FSTENV/FLDENV Memory Layout

            REAL MODE                             PROTECTED MODE
 15                            0  MEMORY    15                        0  MEMORY
+-------------------------------+ OFFSET   +---------------------------+ OFFSET
|         CONTROL WORD          | +0       |       CONTROL WORD        | +0
+-------------------------------+          +---------------------------+
|          STATUS WORD          | +2       |        STATUS WORD        | +2
+-------------------------------+          +---------------------------+
|           TAG WORD            | +4       |         TAG WORD          | +4
+-------------------------------+          +---------------------------+
|   INSTRUCTION POINTER(15-0)   | +6       |         IP OFFSET         | +6
+-----------+-+-----------------+          +---------------------------+
|INSTRUCTION| |   INSTRUCTION   |          |                           |
|  POINTER  |0|     OPCODE      | +8       |        CS SELECTOR        | +8
|  (19-16)  | |     (10-0)      |          |                           |
+-----------+-+-----------------+          +---------------------------+
|      DATA POINTER(15-0)       | +10      |    DATA OPERAND OFFSET    | +10
+-----------+-------------------+          +---------------------------+
|   DATA    |                   |          |                           |
|  POINTER  |         0         | +12      |   DATA OPERAND SELECTOR   | +12
|  (19-16)  |                   |          |                           |
+-----------+-------------------+          +---------------------------+
 15       12 11                0
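An exception handler that has stored the environment can pick the fields apart as in the following Python sketch. It assumes the 14-byte image is available as a byte string in little-endian order and that, in Real-Address mode, the upper four pointer bits occupy bits 15-12 of the words at offsets +8 and +12, as the bit labels under the figure indicate. parse_environment and its field names are hypothetical.

```python
import struct

def parse_environment(image, protected_mode=False):
    """Split a 14-byte FSTENV image into named fields (see Figure 2-2)."""
    if len(image) != 14:
        raise ValueError("the environment image is 14 bytes long")
    cw, sw, tw, w6, w8, w10, w12 = struct.unpack("<7H", image)
    env = {"control": cw, "status": sw, "tag": tw}
    if protected_mode:
        env["ip_offset"] = w6
        env["cs_selector"] = w8
        env["operand_offset"] = w10
        env["operand_selector"] = w12
    else:
        # 20-bit physical addresses: low 16 bits plus bits 15-12 of the
        # following word; the opcode occupies bits 10-0 of the word at +8.
        env["instruction_pointer"] = w6 | ((w8 >> 12) << 16)
        env["opcode"] = w8 & 0x07FF
        env["data_pointer"] = w10 | ((w12 >> 12) << 16)
    return env
```

The same image is interpreted differently in the two modes, which is why a handler must know the mode in which the 80287 was operating when the environment was stored.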
FLDENV source
FLDENV (load environment) reloads the environment from the memory area
defined by the source operand. This data should have been written by a
previous FSTENV/FNSTENV instruction. CPU instructions (that do not reference
the environment image) may immediately follow FLDENV. An FWAIT is not
required after FLDENV. FLDENV will automatically wait for all data
transfers to complete before executing the next instruction.
Note that loading an environment image that contains an unmasked exception
will cause a numeric exception when the next WAIT or error-checking-ESC
instruction is executed.
FINCSTP
FINCSTP (increment stack pointer) adds 1 to the stack top pointer (ST) in
the status word. It does not alter tags or register contents, nor does it
transfer data. It is not equivalent to popping the stack, because it does
not set the tag of the previous stack top to empty. Incrementing the stack
pointer when ST = 7 produces ST = 0.
FDECSTP
FDECSTP (decrement stack pointer) subtracts 1 from ST, the stack top
pointer in the status word. No tags or registers are altered, nor is any
data transferred. Executing FDECSTP when ST = 0 produces ST = 7.
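The wraparound of ST, and the difference between FINCSTP and a true pop, can be modeled as below. This Python sketch treats ST as a 3-bit field and the tag word as a list of eight tag values indexed by physical register; the tag code for "empty" (binary 11) and the function names are assumptions made for illustration.

```python
EMPTY = 0b11  # tag code for an empty register (assumed encoding)

def fincstp(st, tags):
    """FINCSTP: advance ST modulo 8; the tags are returned unchanged."""
    return (st + 1) & 7, list(tags)

def fdecstp(st, tags):
    """FDECSTP: back ST up modulo 8; the tags are returned unchanged."""
    return (st - 1) & 7, list(tags)

def pop(st, tags):
    """A true pop: tag the old stack top empty, then advance ST."""
    new_tags = list(tags)
    new_tags[st] = EMPTY
    return (st + 1) & 7, new_tags
```

Only pop changes a tag, so a register skipped over by FINCSTP still appears valid to later instructions.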
FFREE destination
FFREE (free register) changes the destination register's tag to empty; the
content of the register is unaffected.
FNOP
FNOP (no operation) stores the stack top to the stack top (FST ST,ST(0))
and thus effectively performs no operation.
FWAIT (CPU Instruction)
FWAIT is not actually an 80287 instruction, but an alternate mnemonic for
the CPU WAIT instruction. The FWAIT or WAIT mnemonic should be coded
whenever the programmer wants to synchronize the CPU to the NPX, that is, to
suspend further instruction decoding until the NPX has completed the current
instruction. FWAIT will check for unmasked numeric exceptions.
---------------------------------------------------------------------------
NOTE
A CPU instruction should not attempt to access the memory operand of an
80287 instruction until that instruction has completed. For example, the
following coding shows how FWAIT can be used to force the CPU instruction
to wait for the 80287:
      FIST   VALUE
      FWAIT              ; Wait for FIST to complete
      MOV    AX,VALUE
---------------------------------------------------------------------------
More information on when to code an FWAIT instruction is given in a
following section of this chapter, "Concurrent Processing with the 80287."
Instruction Set Reference Information
Table 2-14 later in this chapter lists the operating characteristics of all
the 80287 instructions. There is one table entry for each instruction
mnemonic; the entries are in alphabetical order for quick lookup. Each entry
provides the general operand forms accepted by the instruction as well as a
list of all exceptions that may be detected during the operation.
One entry exists for each combination of operand types that can be coded
with the mnemonic. Table 2-12 explains the operand identifiers allowed in
table 2-14. Following this entry are columns that provide execution time in
clocks, the number of bus transfers run during the operation, the length of
the instruction in bytes, and an ASM286 coding sample.
Instruction Execution Time
The execution of an 80287 instruction involves three principal activities,
each of which may contribute to the overall execution time of the
instruction:
   • 80286 CPU overhead involved in handling the ESC instruction opcode
     and setting up the 80287 NPX
   • Instruction execution by the 80287 NPX
   • Operand transfers between the 80287 NPX and memory or a CPU register
The timing of these various activities is affected by the individual clock
frequencies of the 80286 CPU and the 80287 NPX. In addition, slow memories
requiring the insertion of wait states in bus cycles, and bus contention due
to other processors in the system, may lengthen operand transfer times.
In calculating an overall execution time for an individual numeric
instruction, analysts must take each of these activities into account. In
most cases, it can be assumed that the numeric instructions have already
been prefetched by the 80286 and are awaiting execution.
   • The CPU overhead in handling the ESC instruction opcode takes only a
     single CPU bus cycle before the 80287 begins its execution of the
     numeric instruction. The timing of this bus cycle is determined by the
     CPU clock. Additional CPU activity is required to set up the 80287's
     instruction and data pointer registers, but this activity occurs after
     the 80287 has begun executing its instruction, and so this parallel
     activity does not affect total execution time.
   • The duration of individual numeric instructions executing on the 80287
     varies for each instruction. Table 2-14 quotes a typical execution
     clock count and a range for each 80287 instruction. Dividing the
     figures in the table by 10 (for a 10-MHz 80287 NPX clock) produces an
     execution time in microseconds. The typical case is an estimate for
     operand values that normally characterize most applications. The range
     encompasses best- and worst-case operand values that may be found in
     extreme circumstances.
   • The operand transfer time required to transfer operands between the
     80287 and memory or a CPU register depends on the number of words to be
     transferred, the frequency of the CPU clock controlling bus timing, the
     number of wait states added to accommodate slower memories, and
     whether operands are based at even or odd memory addresses. Some
     (small) additional number of bus cycles may also be lost due to the
     asynchronous nature of the PEREQ/PEACK handshaking between the 80286
     and 80287, and this interaction varies with relative frequencies of
     the CPU and NPX clocks.
The execution clock counts for the NPX execution of instructions shown in
table 2-14 assume that no exceptions are detected during execution. Invalid
operation, denormalized operand (unmasked), and zero divide exceptions
usually decrease execution time from the typical figure, but execution
still falls within the indicated range. The precision exception has no
effect on execution time. Unmasked overflow and underflow, and masked
denormalized exceptions impose additional execution penalties as shown in
table 2-13. Absolute worst-case execution times are therefore the high
range figure plus the largest penalty that may be encountered.
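The arithmetic described above can be sketched in Python as follows. The function names are hypothetical, the clock frequency is a parameter, and the penalty argument stands in for a value from table 2-13, whose figures are not reproduced here.

```python
def execution_time_us(clocks, npx_mhz=10.0):
    """Convert an NPX clock count to microseconds at the given NPX clock."""
    return clocks / npx_mhz

def worst_case_clocks(range_high, penalty=0):
    """High end of the quoted range plus the largest applicable penalty."""
    return range_high + penalty
```

For the register form of FDIV in table 2-14 (198 clocks typical, range 193-203), execution_time_us(198) gives 19.8 microseconds at 10 MHz; CPU overhead and operand transfer time are not included in this figure.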
Identifier        Explanation
ST                Stack top; the register currently at the top of the stack.
ST(i)             A register in the stack i (0 ≤ i ≤ 7) stack elements from the
                  top. ST(1) is the next-on-stack register, ST(2) is below
                  ST(1), etc.
Short-real        A short real (32 bits) number in memory.
Long-real         A long real (64 bits) number in memory.
Temp-real         A temporary real (80 bits) number in memory.
Packed-decimal    A packed decimal integer (18 digits, 10 bytes) in memory.
Word-integer      A word binary integer (16 bits) in memory.
Short-integer     A short binary integer (32 bits) in memory.
Long-integer      A long binary integer (64 bits) in memory.
nn-bytes          A memory area nn bytes long.
Bus Transfers
NPX instructions that reference memory require bus cycles to transfer
operands between the NPX and memory. The actual number of transfers depends
on the length of the operand and the alignment of the operand in memory.
In table 2-14, the first figure gives execution clocks for even-addressed
operands, while the second gives the clock count for odd-addressed operands.
For operands aligned at word boundaries, that is, based at even memory
addresses, each word to be transferred requires one bus cycle between the
80286 data channel and memory, and one bus cycle to the NPX. For operands
based at odd memory addresses, each word transfer requires two bus cycles
to transfer individual bytes between the 80286 data channel and memory, and
one bus cycle to the NPX.
---------------------------------------------------------------------------
NOTE
For best performance, operands for the 80287 should be aligned along word
boundaries; that is, based at even memory addresses. Operands based at odd
memory addresses are transferred to memory essentially byte-at-a-time and
may take half again as long to transfer as word-aligned operands.
---------------------------------------------------------------------------
Additional transfer time is required if slow memories are being used,
requiring the insertion of wait states into the CPU bus cycle. In
multiprocessor environments, the bus may not be available immediately; this
overhead can also increase effective transfer time.
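The bus-cycle counts given above for even and odd operand addresses can be tallied with a short Python sketch. It models only the cycle counts stated in the text (one or two memory cycles plus one NPX cycle per word) and ignores wait states and bus contention; bus_cycles is a hypothetical name.

```python
def bus_cycles(words, odd_address=False):
    """Bus cycles needed to move an operand of the given length in words."""
    memory_cycles_per_word = 2 if odd_address else 1  # byte-at-a-time if odd
    npx_cycles_per_word = 1
    return words * (memory_cycles_per_word + npx_cycles_per_word)
```

A long real operand (four words) needs 8 cycles when word-aligned and 12 when based at an odd address, which is the "half again as long" figure quoted in the note.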
Instruction Length
80287 instructions that do not reference memory are two bytes long. Memory
reference instructions vary between two and four bytes. The third and fourth
bytes are for the 8- or 16-bit displacement values used in conjunction with
the standard 80286 memory-addressing modes.
Note that the lengths quoted in table 2-14 for the processor control
instructions (FNINIT, FNSTCW, FNSTSW, FNSTSW AX, FNCLEX, FNSTENV, and
FNSAVE) do not include the one-byte CPU wait instruction inserted by the
ASM286 assembler if the control instruction is coded using the wait form of
the mnemonic (e.g. FINIT, FSTCW, FSTSW, FSTSW AX, FCLEX, FSTENV, and
FSAVE). Wait and no-wait forms of the processor control instructions have
been described in the preceding section titled "Processor Control
Instructions."
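The length rules above reduce to a small calculation, sketched here in Python. The function name and parameters are hypothetical; wait_form applies only to the processor control instructions listed above, for which the assembler inserts a one-byte CPU wait instruction.

```python
def instruction_length(displacement_bytes=0, wait_form=False):
    """Bytes occupied by an 80287 instruction as assembled."""
    if displacement_bytes not in (0, 1, 2):
        raise ValueError("displacement is 0, 1, or 2 bytes")
    length = 2 + displacement_bytes   # opcode bytes plus any displacement
    if wait_form:
        length += 1                   # assembler-generated CPU wait
    return length
```

Under this model FNINIT occupies two bytes, FINIT occupies three, and a memory reference with a 16-bit displacement occupies four.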
Table 2-14. Instruction Set Reference Data
---------------------------------------------------------------------------
                        Execution Clocks   Operand Word  Code
Operands                Typical  Range     Transfers     Bytes  Coding Example
---------------------------------------------------------------------------
FABS            FABS (no operands)
                Absolute value                        Exceptions: I
(no operands)           14       10-17     0             2      FABS
---------------------------------------------------------------------------
FADD            FADD //source/destination,source
                Add real                              Exceptions: I,D,O
ST(i),ST                90       75-105    0             2      FADDP ST(
---------------------------------------------------------------------------
FBLD            FBLD source
                Packed decimal (BCD) load             Exceptions: I
packed-decimal          300      290-310   5             2-4    FBLD YTD_
---------------------------------------------------------------------------
FBSTP           FBSTP destination
                Packed decimal (BCD) store and pop    Exceptions: I
(no operands)           5        2-8       0             2      FNCLEX
---------------------------------------------------------------------------
FCOM            FCOM //source
                Compare real                          Exceptions: I, D
//ST(i)                 45       40-50     0             2      FCOM ST(1
short-real              65       60-70     2             2-4    FCOM [BP]
long-real               70       65-75     4             2-4    FCOM WAVE
---------------------------------------------------------------------------
FCOMP           FCOMP //source
                Compare real and pop                  Exceptions: I, D
//ST(i)                 47       42-52     0             2      FCOMP ST(
short-real              68       63-73     2             2-4    FCOMP [BP
long-real               72       67-77     4             2-4    FCOMP DEN
---------------------------------------------------------------------------
FCOMPP          FCOMPP (no operands)
                Compare real and pop twice            Exception
(no operands)           50       45-55     0             2      FCOMPP
---------------------------------------------------------------------------
FDECSTP         FDECSTP (no operands)
                Decrement stack pointer               Exceptions: None
(no operands)           9        6-12      0             2      FDECSTP
---------------------------------------------------------------------------
FDIV            FDIV //source/destination,source
                Divide real                           Exceptions: I, D,
//ST(i),ST              198      193-203   0             2      FDIV
short-real              220      215-225   2             2-4    FDIV DIST
long-real               225      220-230   4             2-4    FDIV ARC
---------------------------------------------------------------------------
FDIVP           FDIVP destination, source
                Divide real and pop                   Exceptions: I, D,
ST(i),ST                202      197-207   0             2      FDIVP ST(
---------------------------------------------------------------------------
FDIVR           FDIVR //source/destination, source
                Divide real reversed                  Exceptions: I, D,
//ST,ST(i)/ST(i),ST     199      194-204   0             2      FDIVR ST(
short-real              221      216-226   2             2-4    FDIVR [BX
long-real               226      221-231   4             2-4    FDIVR REC
---------------------------------------------------------------------------
FDIVRP          FDIVRP destination, source
                Divide real reversed and pop          Exceptions: I, D,
ST(i)                   11       9-16      0             2      FFREE ST(
---------------------------------------------------------------------------
FIADD           FIADD source
                Integer add                           Exceptions: I, D,
word-integer            120      102-137   1             2-4    FIADD DIS
short-integer           125      108-143   2             2-4    FIADD PUL
---------------------------------------------------------------------------
FICOM           FICOM source
                Integer compare                       Exceptions: I, D
word-integer            80       72-86     1             2-4    FICOM TOO
short-integer           85       78-91     2             2-4    FICOM [BP
---------------------------------------------------------------------------
FICOMP          FICOMP source
                Integer compare and pop               Exceptions: I, D
word-integer            82       74-88     1             2-4    FICOMP [B
short-integer           87       80-93     2             2-4    FICOMP N_
---------------------------------------------------------------------------
FIDIV           FIDIV source
                Integer divide                        Exceptions: I, D,
(no operands)           9        6-12      0             2      FINCSTP
---------------------------------------------------------------------------
FINIT/FNINIT    FINIT/FNINIT (no operands)
                Initialize processor                  Exceptions: None
(no operands)           5        2-8       0             2      FINIT
---------------------------------------------------------------------------
FIST            FIST destination
                Integer store                         Exceptions: I, P
word-integer            86       80-90     1             2-4    FIST OBS.
short-integer           88       82-92     2             2-4    FIST [BP;
---------------------------------------------------------------------------
FISTP           FISTP destination
                Integer store and pop                 Exceptions: I, P
2-bytes                 10       7-14      1             2-4    FLDCW CON
---------------------------------------------------------------------------
FLDENV          FLDENV source
                Load environment                      Exceptions: None
14-bytes                40       35-45     7             2-4    FLDENV [B
---------------------------------------------------------------------------
FLDLG2          FLDLG2 (no operands)
                Load log{10}2                         Exceptions: I
(no operands)           21       18-24     0             2      FLDLG2
---------------------------------------------------------------------------
FLDLN2          FLDLN2 (no operands)
                Load log{e}2                          Exceptions: I
(no operands)           20       17-23     0             2      FLDLN2
---------------------------------------------------------------------------
FLDL2E          FLDL2E (no operands)
                Load log{2}e                          Exceptions: I
(no operands)           18       15-21     0             2      FLDL2E
---------------------------------------------------------------------------
FLDL2T          FLDL2T (no operands)
                Load log{2}10                         Exceptions: I
(no operands)           19       16-22     0             2      FLDL2T
---------------------------------------------------------------------------
FLDPI           FLDPI (no operands)
                Load π                                Exceptions: I
(no operands)           19       16-22     0             2      FLDPI
---------------------------------------------------------------------------
FLDZ            FLDZ (no operands)
                Load +0.0                             Exceptions: I
(no operands)           14       11-17     0             2      FLDZ
---------------------------------------------------------------------------
FLD1            FLD1 (no operands)
                Load +1.0                             Exceptions: I
(no operands)           18       15-21     0             2      FLD1
---------------------------------------------------------------------------
FMUL            FMUL //source/destination,source
                Multiply real                         Exceptions: I, D,
ST(i),ST                100      94-108    0             2      FMULP ST(
ST(i),ST                142      134-148   0             2      FMULP ST(
---------------------------------------------------------------------------
FNOP            FNOP (no operands)
                No operation                          Exceptions: None
(no operands)           13       10-16     0             2      FNOP
---------------------------------------------------------------------------
FPATAN          FPATAN (no operands)
                Partial arctangent                    Exceptions: U, P
(no operands)           650      250-800   0             2      FPATAN
---------------------------------------------------------------------------
FPREM           FPREM (no operands)
                Partial remainder                     Exceptions: I, D,
(no operands)           125      15-190    0             2      FPREM
---------------------------------------------------------------------------
FPTAN           FPTAN (no operands)
                Partial tangent                       Exceptions: I, P
(no operands)           450      30-540    0             2      FPTAN
---------------------------------------------------------------------------
FRNDINT         FRNDINT (no operands)
                Round to integer                      Exceptions: I, P
(no operands)           45       16-50     0             2      FRNDINT
---------------------------------------------------------------------------
FRSTOR          FRSTOR source
                Restore saved state                   Exceptions: None
94-bytes                ()                 47            2-4    FRSTOR [B
---------------------------------------------------------------------------
FSAVE/FNSAVE    FSAVE/FNSAVE destination
                Save state                            Exceptions: None
94-bytes                ()                 47            2-4    FSAVE [BP
---------------------------------------------------------------------------
FSCALE          FSCALE (no operands)
                Scale                                 Exceptions: I, O,
(no operands)           35       32-38     0             2      FSCALE
---------------------------------------------------------------------------
FSETPM          FSETPM (no operands)
                Set protected mode                    Exceptions: None
(no operands)                    2-8       0             2      FSETPM
---------------------------------------------------------------------------
FSQRT           FSQRT (no operands)
                Square root                           Exceptions: I, D,
(no operands)           183      80-186    0             2      FSQRT
---------------------------------------------------------------------------
FST             FST destination
                Store real                            Exceptions: I, O,
2-bytes                 15       12-18     1             2-4    FSTCW SAV
---------------------------------------------------------------------------
FSTENV/         FSTENV destination
FNSTENV         Store environment                     Exceptions: None
14-bytes                45       40-50     7             2-4    FSTENV [B
---------------------------------------------------------------------------
FSTP            FSTP destination
                Store real and pop                    Exceptions: I, O,
ST(i)                   20       17-24     0             2      FSTP ST(2
short-real              89       86-92     2             2-4    FSTP [BX]
long-real               102      98-106    4             2-4    FSTP TOTA
temp-real               55       52-58     5             2-4    FSTP REG_
---------------------------------------------------------------------------
FSTSW/          FSTSW destination
FNSTSW          Store status word                     Exceptions: None
2-bytes                 15       12-18     1             2-4    FSTSW SAV
---------------------------------------------------------------------------
FSTSW AX/       FSTSW AX
FNSTSW AX       Store status word to AX               Exceptions: None
AX                               10-16     1             2      FSTSW AX
---------------------------------------------------------------------------
FSUB            FSUB //source/destination,source
                Subtract real                         Exceptions: I, D,
//ST,ST(i)/ST(i),ST     85       70-100    0             2      FSUB ST,S
short-real              105      90-120    2             2-4    FSUB BASE
long-real               110      95-125    4             2-4    FSUB COOR
---------------------------------------------------------------------------
FSUBP           FSUBP destination, source
                Subtract real and pop                 Exceptions: I, D,
ST(i),ST                90       75-105    0             2      FSUBP ST(
---------------------------------------------------------------------------
FSUBR           FSUBR //source/destination, source
                Subtract real reversed                Exceptions: I, D,
//ST,ST(i)/ST(i),ST     87       70-100    0             2      FSUBR ST,
short-real              105      90-120    2             2-4    FSUBR VEC
long-real               110      95-125    4             2-4    FSUBR [BX
---------------------------------------------------------------------------
FSUBRP          FSUBRP destination, source
                Subtract real reversed and pop        Exceptions: I, D,
ST(i),ST                90       75-105    0             2      FSUBRP ST
---------------------------------------------------------------------------
FTST            FTST (no operands)
                Test stack top against +0.0           Exceptions: I, D
(no operands)           42       38-48     0             2      FTST
---------------------------------------------------------------------------
FWAIT           FWAIT (no operands)
(CPU)           Wait while 80287 is busy              Exceptions: None
(no operands)           3+5n     3+5n      0             1      FWAIT
---------------------------------------------------------------------------
FXAM            FXAM (no operands)
                Examine stack top                     Exceptions: None
(no operands)           17       12-23     0             2      FXAM
---------------------------------------------------------------------------
FXCH            FXCH //destination
                Exchange registers                    Exception
//ST(i)                 12       10-15     0             2      FXCH ST(2
---------------------------------------------------------------------------
FXTRACT         FXTRACT (no operands)
                Extract exponent and significand      Exception
(no operands)           50       27-55     0             2      FXTRACT
---------------------------------------------------------------------------
FYL2X           FYL2X (no operands)
                Y * log{2}X                           Exceptions: P (op
(no operands)           950      900-1100  0             2      FYL2X
---------------------------------------------------------------------------
FYL2XP1         FYL2XP1 (no operands)
                Y * log{2}(X + 1)                     Exceptions: P (op
(no operands)           850      700-1000  0             2      FYL2XP1
---------------------------------------------------------------------------
F2XM1           F2XM1 (no operands)
                2^X - 1                               Exceptions: U, P (o
(no operands)           500      310-630   0             2      F2XM1
---------------------------------------------------------------------------
Programming Facilities
As described previously, the 80287 NPX is programmed simply as an extension
of the 80286 CPU. This section describes how programmers in ASM286 and in a
variety of higher-level languages can work with the 80287.
The level of detail in this section is intended to give programmers a basic
understanding of the software tools that can be used with the 80287, but
this information does not document the full capabilities of these
facilities. For a complete list of documentation on all the languages
available for 80286 systems, readers should consult Intel's Literature
Guide.
High-Level Languages
For programmers using high-level languages, the programming and operation
of the NPX is handled automatically by the compiler. A variety of Intel
high-level languages are available that automatically make use of the 80287
NPX when appropriate. These languages include
PL/M-286
FORTRAN-286
PASCAL-286
C-286
Each of these high-level languages has special numeric libraries allowing
programs to take advantage of the capabilities of the 80287 NPX. No special
programming conventions are necessary to make use of the 80287 NPX when
programming numeric applications in any of these languages.
Programmers in PL/M-286 and ASM286 can also make use of many of these
library routines by using routines contained in the 80287 Support Library,
described in the 80287 Support Library Reference Manual, Order Number
122129. These library routines provide many of the functions provided by
higher-level languages, including exception handlers,
ASCII-to-floating-point conversions, and a more complete set of
transcendental functions than that provided by the 80287 instruction set.
PL/M-286
Programmers in PL/M-286 can access a very useful subset of the 80287's
numeric capabilities. The PL/M-286 REAL data type corresponds to the NPX's
short real (32-bit) format. This data type provides a range of about
8.43*10^(-37) ≤ ABS(X) ≤ 3.38*10^(38), with about seven significant
decimal digits. This representation is adequate for the data manipulated by
many microcomputer applications.
The utility of the REAL data type is extended by the PL/M-286 compiler's
practice of holding intermediate results in the 80287's temporary real
format. This means that the full range and precision of the processor are
utilized for intermediate results. Underflow, overflow, and rounding errors
are most likely to occur during intermediate computations rather than during
calculation of an expression's final result. Holding intermediate results in
temporary real format greatly reduces the likelihood of overflow and
underflow and eliminates roundoff as a serious source of error until the
final assignment of the result is performed.
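The benefit of wide intermediates can be illustrated with a Python sketch. Python has no 80-bit type, so a 64-bit float stands in here for the temporary real format; that substitution, and all of the names below, are assumptions made for illustration only. The first routine rounds to short real after every addition; the second rounds once, on the final assignment.

```python
import struct

def to_short_real(x):
    """Round a Python float to the nearest 32-bit short real."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_narrow(values):
    """Accumulate with every intermediate rounded to short real."""
    total = 0.0
    for v in values:
        total = to_short_real(total + to_short_real(v))
    return total

def sum_wide(values):
    """Accumulate in the wider format; round only the final result."""
    total = 0.0
    for v in values:
        total += to_short_real(v)
    return to_short_real(total)

values = [0.1] * 10000
narrow_error = abs(sum_narrow(values) - 1000.0)
wide_error = abs(sum_wide(values) - 1000.0)
```

With these inputs the accumulated rounding error of the narrow sum is larger than that of the wide sum, which is the effect the compiler's use of temporary real intermediates is meant to achieve.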
The compiler generates 80287 code to evaluate expressions that contain REAL
data types, whether variables or constants or both. This means that
addition, subtraction, multiplication, division, comparison, and assignment
of REALs will be performed by the NPX. INTEGER expressions, on the other
hand, are evaluated on the CPU.
Five built-in procedures (table 2-15) give the PL/M-286 programmer access
to 80287 functions manipulated by the processor control instructions. Prior
to any arithmetic operations, a typical PL/M-286 program will set up the NPX
after power up using the INIT$REAL$MATH$UNIT procedure and then issue
SET$REAL$MODE to configure the NPX. SET$REAL$MODE loads the 80287 control
word, and its 16-bit parameter has the format shown in figure 1-5. The
recommended value of this parameter is 033EH (projective closure, round to
nearest, 64-bit precision, all exceptions masked except invalid operation).
Other settings may be used at the programmer's discretion.
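The recommended value can be checked against its description with a short Python sketch. The field positions used here (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11, infinity control in bit 12) are taken as assumptions from the control-word format in figure 1-5, which is not reproduced in this section; decode_control_word is a hypothetical name.

```python
MASK_NAMES = ["invalid operation", "denormalized operand", "zero divide",
              "overflow", "underflow", "precision"]

def decode_control_word(cw):
    """Break an 80287 control word into its documented fields."""
    masked = [n for bit, n in enumerate(MASK_NAMES) if cw & (1 << bit)]
    unmasked = [n for n in MASK_NAMES if n not in masked]
    precision = {0b00: 24, 0b10: 53, 0b11: 64}.get((cw >> 8) & 3)
    rounding = ["nearest", "down", "up", "chop"][(cw >> 10) & 3]
    infinity = "affine" if cw & 0x1000 else "projective"
    return {"masked": masked, "unmasked": unmasked,
            "precision_bits": precision, "rounding": rounding,
            "infinity": infinity}

recommended = decode_control_word(0x033E)
```

Decoding 033EH this way yields projective closure, round to nearest, 64-bit precision, and invalid operation as the only unmasked exception, in agreement with the description above.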
If any exceptions are unmasked, an exception handler must be provided in
the form of an interrupt procedure that is designated to be invoked by CPU
interrupt pointer (vector) number 16. The exception handler can use the
GET$REAL$ERROR procedure to obtain the low-order byte of the 80287 status
word and to then clear the exception flags. The byte returned by
GET$REAL$ERROR contains the exception flags; these can be examined to
determine the source of the exception.
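An exception handler might examine the returned byte as in the following Python sketch. The order of the flags (invalid operation in bit 0 through precision in bit 5) is assumed from the status-word format given elsewhere in this manual, and pending_exceptions is a hypothetical name.

```python
EXCEPTION_FLAGS = ["invalid operation", "denormalized operand",
                   "zero divide", "overflow", "underflow", "precision"]

def pending_exceptions(error_byte):
    """Name the exception flags set in the low byte of the status word."""
    return [name for bit, name in enumerate(EXCEPTION_FLAGS)
            if error_byte & (1 << bit)]
```

A byte of 05H, for example, reports both an invalid operation and a zero divide; the handler can then dispatch on each name in turn.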
The SAVE$REAL$STATUS and RESTORE$REAL$STATUS procedures are provided for
multi-tasking environments where a running task that uses the 80287 may be
preempted by another task that also uses the 80287. It is the responsibility
of the preempting task to issue SAVE$REAL$STATUS before it executes any
statements that affect the 80287; these include the INIT$REAL$MATH$UNIT and
SET$REAL$MODE procedures as well as arithmetic expressions.
SAVE$REAL$STATUS saves the 80287 state (registers, status, and control
words, etc.) on the CPU's stack. RESTORE$REAL$STATUS reloads the state
information; the preempting task must invoke this procedure before
terminating in order to restore the 80287 to its state at the time the
running task was preempted. This enables the preempted task to resume
execution from the point of its preemption.
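In assembly language, a preempting task could follow the same discipline with FSAVE and FRSTOR. The fragment below is a sketch under stated assumptions: it assumes the 94-byte state record size, that BP is not altered between the save and the restore, and that the CPU stack is balanced again when FRSTOR is reached. Whether the PL/M-286 compiler generates exactly this sequence is not stated here.

```asm
; In the preempting task, before its first NPX instruction:
        SUB     SP, 94          ; reserve 94 bytes on the CPU stack
        MOV     BP, SP          ; BP addresses the save area (SS-relative)
        FSAVE   [BP]            ; save the full NPX state
        FWAIT                   ; make sure the save is complete

        ; ... the task's own NPX setup and arithmetic go here ...

; Before the preempting task terminates:
        FRSTOR  [BP]            ; reload the preempted task's NPX state
        ADD     SP, 94          ; release the save area
```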
Table 2-15. PL/M-286 Built-In Procedures
                       80287
Procedure              Instruction        Description
INIT$REAL$MATH$UNIT    FINIT              Initialize processor.
SET$REAL$MODE          FLDCW              Set exception masks and rounding,
                                          precision, and infinity controls.
GET$REAL$ERROR         FNSTSW & FNCLEX    Store, then clear, exception
                                          flags.
ASM286
The ASM286 assembly language provides programmers with complete access to
all of the facilities of the 80286 and 80287 processors.
The programmer's view of the 80286/80287 hardware is a single machine with
these resources:
• 160 instructions
• 12 data types
• 8 general registers
• 4 segment registers
• 8 floating-point registers, organized as a stack
Defining Data
The ASM286 directives shown in table 2-16 allocate storage for 80287
variables and constants. As with other storage allocation directives, the
assembler associates a type with any variable defined with these directives.
The type value is equal to the length of the storage unit in bytes (10 for
DT, 8 for DQ, etc.). The assembler checks the type of any variable coded in
an instruction to be certain that it is compatible with the instruction. For
example, the coding FIADD ALPHA will be flagged as an error if ALPHA's type
is not 2 or 4, because integer addition is only available for word and
short integer data types. The operand's type also tells the assembler which
machine instruction to produce; although to the programmer there is only an
FIADD instruction, a different machine instruction is required for each
operand type.
On occasion it is desirable to use an instruction with an operand that has
no declared type. For example, if register BX points to a short integer
variable, a programmer may want to code FIADD [BX]. This can be done by
informing the assembler of the operand's type in the instruction, coding
FIADD DWORD PTR [BX]. The corresponding overrides for the other storage
allocations are WORD PTR, QWORD PTR, and TBYTE PTR.
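The overrides can be sketched as follows. Each line is an independent alternative, not a sequence; in every case BX is assumed to hold the offset of a variable of the stated type.

```asm
        FIADD   WORD PTR [BX]     ; word integer (2 bytes)
        FIADD   DWORD PTR [BX]    ; short integer (4 bytes)
        FADD    DWORD PTR [BX]    ; short real (4 bytes)
        FILD    QWORD PTR [BX]    ; long integer (8 bytes)
        FLD     QWORD PTR [BX]    ; long real (8 bytes)
        FLD     TBYTE PTR [BX]    ; temporary real (10 bytes)
        FBLD    TBYTE PTR [BX]    ; packed decimal (10 bytes)
```

The override gives only the operand's length; the mnemonic itself (FIADD versus FADD, FILD versus FLD versus FBLD) selects whether that storage is treated as an integer, a real, or a packed decimal value.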
The assembler does not, however, check the types of operands used in
processor control instructions. Coding FRSTOR [BP] implies that the
programmer has set up register BP to point to the stack location where the
processor's 94-byte state record has been previously saved.
The initial values for 80287 constants may be coded in several different
ways. Binary integer constants may be specified as bit strings, decimal
integers, octal integers, or hexadecimal strings. Packed decimal values are
normally written as decimal integers, although the assembler will accept
and convert other representations of integers. Real values may be written as
ordinary decimal real numbers (decimal point required), as decimal numbers
in scientific notation, or as hexadecimal strings. Using hexadecimal strings
is primarily intended for defining special values such as infinities, NaNs,
and nonnormalized numbers. Most programmers will find that ordinary decimal
and scientific decimal provide the simplest way to initialize 80287
constants. Figure 2-3 compares several ways of setting the various 80287
data types to the same initial value.
Note that preceding 80287 variables and constants with the ASM286 EVEN
directive ensures that the operands will be word-aligned in memory. This
will produce the best system performance. All 80287 data types occupy
integral numbers of words so that no storage is "wasted" if blocks of
variables are defined together and preceded by a single EVEN declarative.
Table 2-16. 80287 Storage Allocation Directives
Directive    Interpretation       Data Types
DW           Define Word          Word integer
DD           Define Doubleword    Short integer, short real
DQ           Define Quadword      Long integer, long real
DT           Define Tenbyte       Packed decimal, temporary real
Records and Structures
The ASM286 RECORD and STRUC (structure) declaratives can be very useful in
NPX programming. The record facility can be used to define the bit fields of
the control, status, and tag words. Figure 2-4 shows one definition of the
status word and how it might be used in a routine that polls the 80287 until
it has completed an instruction.
Because STRUCtures allow different but related data types to be grouped
together, they often provide a natural way to represent "real world" data
organizations. The fact that the structure template may be "moved" about in
memory adds to its flexibility. Figure 2-5 shows a simple structure that
might be used to represent data consisting of a series of test score
samples. A structure could also be used to define the organization of the
information stored and loaded by the FSTENV and FLDENV instructions.
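A structure for the FSTENV/FLDENV environment might be sketched as below. The template and field names are hypothetical, and the field order is assumed from the Protected mode layout of figure 1-7 (seven words: control, status, and tag words followed by the instruction and data pointers).

```asm
; Hypothetical template for the 14-byte environment
; (Protected mode layout assumed from figure 1-7).
ENV_287         STRUC
ENV_CONTROL     DW  ?       ; control word
ENV_STATUS      DW  ?       ; status word
ENV_TAG         DW  ?       ; tag word
ENV_IP_OFFSET   DW  ?       ; instruction pointer offset
ENV_CS_SELECTOR DW  ?       ; instruction pointer selector
ENV_OP_OFFSET   DW  ?       ; data (operand) pointer offset
ENV_OP_SELECTOR DW  ?       ; data (operand) pointer selector
ENV_287         ENDS

; In a data segment:
                EVEN
SAVED_ENV       ENV_287 <>

; In a code segment:
                FSTENV  SAVED_ENV               ; store the environment
                FWAIT                           ; wait for the store to finish
                MOV     AX, SAVED_ENV.ENV_STATUS ; examine saved status word
                FLDENV  SAVED_ENV               ; reload the environment
```

In Real-Address mode the pointer fields hold 20-bit physical addresses instead of selector and offset pairs, so the last four field names would not apply as written.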
Figure 2-3. Sample 80287 Constants
; THE FOLLOWING ALL ALLOCATE THE CONSTANT: -126
; NOTE TWO'S COMPLEMENT STORAGE OF NEGATIVE BINARY INTEGERS.
;
                EVEN                        ; FORCE WORD ALIGNMENT
WORD_INTEGER    DW  1111111110000010B       ; BIT STRING
SHORT_INTEGER   DD  0FFFFFF82H              ; HEX STRING MUST START
                                            ; WITH DIGIT
LONG_INTEGER    DQ  -126                    ; ORDINARY DECIMAL
SHORT_REAL      DD  -126.0                  ; NOTE PRESENCE OF '.'
LONG_REAL       DQ  -1.26E2                 ; "SCIENTIFIC"
PACKED_DECIMAL  DT  -126                    ; ORDINARY DECIMAL INTEGER
; IN THE FOLLOWING, SIGN AND EXPONENT IS 'C005'
; SIGNIFICAND IS 'FC00...00', 'R' INFORMS ASSEMBLER THAT
; THE STRING REPRESENTS A REAL DATA TYPE.
;
TEMP_REAL       DT  0C005FC00000000000000R  ; HEX STRING
Figure 2-4. Status Word RECORD Definition
; RESERVE SPACE FOR STATUS WORD
STATUS_WORD   DW  ?
; LAY OUT STATUS WORD FIELDS
STATUS        RECORD
&             BUSY:        1,
&             COND_CODE3:  1,
&             STACK_TOP:   3,
&             COND_CODE2:  1,
&             COND_CODE1:  1,
&             COND_CODE0:  1,
&             INT_REQ:     1,
&             RESERVED:    1,
&             P_FLAG:      1,
&             U_FLAG:      1,
&             O_FLAG:      1,
&             Z_FLAG:      1,
&             D_FLAG:      1,
&             I_FLAG:      1
; POLL STATUS WORD UNTIL 80287 IS NOT BUSY
POLL:         FNSTSW  STATUS_WORD
              TEST    STATUS_WORD, MASK BUSY
              JNZ     POLL
Figure 2-5. Structure Definition
SAMPLE STRUC
N_OBS DD ? ; SHORT INTEGER
MEAN DQ ? ; LONG REAL
MODE DW ? ; WORD INTEGER
STD_DEV DQ ? ; LONG REAL
; ARRAY OF OBSERVATIONS -- WORD INTEGER
TEST_SCORES DW 1000 DUP (?)
SAMPLE ENDS
Addressing Modes
80287 memory data can be accessed with any of the CPU's five memory
addressing modes. This means that 80287 data types can be incorporated in
data aggregates ranging from simple to complex according to the needs of the
application. The addressing modes, and the ASM286 notation used to specify
them in instructions, make the accessing of structures, arrays, arrays of
structures, and other organizations direct and straightforward. Table 2-17
gives several examples of 80287 instructions coded with operands that
illustrate different addressing modes.
Table 2-17. Addressing Mode Examples
Coding                        Interpretation
FIADD ALPHA                   ALPHA is a simple scalar (mode is direct).
FDIVR ALPHA.BETA              BETA is a field in a structure that is
                              "overlaid" on ALPHA (mode is direct).
FMUL  QWORD PTR [BX]          BX contains the address of a long real
                              variable (mode is register indirect).
FSUB  ALPHA [SI]              ALPHA is an array and SI contains the
                              offset of an array element from the start of
                              the array (mode is indexed).
FILD  [BP].BETA               BP contains the address of a structure on
                              the CPU stack and BETA is a field in the
                              structure (mode is based).
FBLD  TBYTE PTR [BX] [DI]     BX contains the address of a packed
                              decimal array and DI contains the offset of
                              an array element (mode is based indexed).
Comparative Programming Example
Figures 2-6 and 2-7 show the PL/M-286 and ASM286 code for a simple 80287
program, called ARRSUM. The program references an array (X$ARRAY), which
contains 0-100 short real values; the integer variable N$OF$X indicates the
number of array elements the program is to consider. ARRSUM steps through
X$ARRAY accumulating three sums:
• SUM$X, the sum of the array values
• SUM$INDEXES, the sum of each array value times its index, where the
  index of the first element is 1, the second is 2, etc.
• SUM$SQUARES, the sum of each array element squared
(A true program, of course, would go beyond these steps to store and use
the results of these calculations.) The control word is set with the
recommended values: projective closure, round to nearest, 64-bit precision,
interrupts enabled, and all exceptions masked except invalid operation. It is
assumed that an exception handler has been written to field the invalid
operation, if it occurs, and that it is invoked by interrupt pointer 16.
Either version of the program will run on an actual or an emulated 80287
without altering the code shown.
The PL/M-286 version of ARRSUM (figure 2-6) is very straightforward and
illustrates how easily the 80287 can be used in this language. After
declaring variables, the program calls built-in procedures to initialize the
processor (or its emulator) and to load the control word. The program
clears the sum variables and then steps through X$ARRAY with a DO-loop. The
loop control takes into account PL/M-286's practice of considering the index
of the first element of an array to be 0. In the computation of SUM$INDEXES,
the built-in procedure FLOAT converts I+1 from integer to real because the
language does not support "mixed mode" arithmetic. One of the strengths of
the NPX, of course, is that it does support arithmetic on mixed data types
(because all values are converted internally to the 80-bit temporary real
format).
The ASM286 version (figure 2-7) defines the external procedure INIT287,
which makes the different initialization requirements of the processor and
its emulator transparent to the source code. After defining the data and
setting up the segment registers and stack pointer, the program calls
INIT287 and loads the control word. The computation begins with the next
three instructions, which clear three registers by loading (pushing) zeros
onto the stack. As shown in figure 2-8, these registers remain at the
bottom of the stack throughout the computation while temporary values are
pushed on and popped off the stack above them.
The program uses the CPU LOOP instruction to control its iteration through
X_ARRAY; register CX, which LOOP automatically decrements, is loaded with
N_OF_X, the number of array elements to be summed. Register SI is used to
select (index) the array elements. The program steps through X_ARRAY from
back to front, so SI is initialized to point at the element just beyond the
first element to be processed. The ASM286 TYPE operator is used to determine
the number of bytes in each array element. This permits changing X_ARRAY to
a long real array by simply changing its definition (DD to DQ) and
reassembling.
Figure 2-8 shows the effect of the instructions in the program loop on the
NPX register stack. The figure assumes that the program is in its first
iteration, that N_OF_X is 20, and that X_ARRAY(19) (the 20th element)
contains the value 2.5. When the loop terminates, the three sums are left
as the top stack elements so that the program ends by simply popping them
into memory variables.
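The loop described above can be sketched as follows. This is a reconstruction from the prose, not a copy of the listing: the MOV AX, TYPE X_ARRAY step and the FIMUL/FADDP pair that accumulates SUM_INDEXES are assumptions inferred from the description of the three sums, and the sketch assumes that N_OF_X is between 1 and 100 and that the three running sums have already been cleared with three FLDZ instructions.

```asm
        MOV     CX, N_OF_X          ; loop count
        MOV     AX, TYPE X_ARRAY    ; bytes per element (assumed step)
        IMUL    CX                  ; AX = N_OF_X * element size
        MOV     SI, AX              ; SI = offset just past last element
SUM_NEXT:
        SUB     SI, TYPE X_ARRAY    ; back up one element
        FLD     X_ARRAY[SI]         ; push the element: ST = x
        FADD    ST(3), ST           ; SUM_X = SUM_X + x
        FLD     ST                  ; duplicate x on the stack
        FMUL    ST, ST              ; ST = x * x
        FADDP   ST(2), ST           ; add into SUM_SQUARES, discard square
        FIMUL   N_OF_X              ; ST = x * index (assumed instruction)
        FADDP   ST(2), ST           ; add into SUM_INDEXES, discard product
        DEC     N_OF_X              ; index of the next (earlier) element
        LOOP    SUM_NEXT
```

With this ordering, the stack holds SUM_SQUARES, SUM_INDEXES, and SUM_X from the top down when the loop ends, which matches the order in which the program pops the results. The FADDP that follows FIMUL N_OF_X also ensures that the NPX has finished reading N_OF_X before the CPU decrements it.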
Figure 2-6. Sample PL/M-286 Program
PL/M-286 COMPILER ARRAYSUM
SERIES-III PL/M-286 V 1.0 COMPILATION OF MODULE ARRAYSUM
OBJECT MODULE PLACED IN :F6:D.OBJ
COMPILER INVOKED BY PLM286.86 :F6:D.SRC XREF
/******************************************
* *
* ARRAYSUM MOD *
* *
******************************************/
MODULE INFORMATION
CODE AREA SIZE = 0077H 119D
CONSTANT AREA SIZE = 0004H 4D
VARIABLE AREA SIZE = 01A0H 416D
MAXIMUM STACK SIZE = 0002H 2D
33 LINES READ
0 PROGRAM WARNINGS
0 PROGRAM ERRORS
DICTIONARY SUMMARY
96KB MEMORY AVAILABLE
3KB MEMORY USED (3%)
0KB DISK SPACE USED
END OF PL/M-286 COMPILATION
Figure 2-7. Sample ASM286 Program
iAPX286 MACRO ASSEMBLER EXAMPLE_ASM286_PROGRAM
SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE EXAMPLE_ASM286_PRO
OBJECT MODULE PLACED IN :F6:287EXP.OBJ
ASSEMBLER INVOKED BY ASM286.86 :F6:287EXP.SRC XREF
LOC OBJ LINE SOURCE
1 name example_ASM286_program
2 ; Define initialization routine
3 extrn init287:far
4
5 ; Allocate space for data
---- 6 data segment rw public
0000 3E03 7 control_287 dw 033eh
0002 ???? 8 n_of_x dw ?
0004 (100 9 x_array dd 100 dup (?)
????????
)
0194 ???????? 10 sum_squares dd ?
0198 ???????? 11 sum_indexes dd ?
019C ???????? 12 sum_x dd ?
---- 13 data ends
14
15 ; Allocate CPU stack space
---- 16 stack stackseg 400
17
18 ; Begin code
---- 19 code segment er public
20 assume ds: data, ss: stack, es: n
0000 21 start:
0000 B8---- R 22 mov ax,data
0003 8ED8 23 mov ds,ax
0005 B8---- R 24 mov ax,stack
0008 8ED0 25 mov ss,ax
000A BCFEFF R 26 mov sp,stackstart stack
27
28 ; Assume x_array and n_of_x are initialized
29 ; this program zeroes n_of_x
30
31 ; Prepare the 80287 or its emulator
000D 9A0000---- E 32 call init287
0012 D92E0000 R 33 fldcw control_287
34
35 ; Clear three registers to hold running sums
0016 D9EE 36 fldz
0018 D9EE 37 fldz
001A D9EE 38 fldz
39
40 ; Setup CX as loop counter and
41 ; SI as index to x_array
001C 8B0E0200 R 42 mov cx,n_of_x
0020 F7E9 43 imul cx
0022 8BF0 44 mov si,ax
45
46 ; SI now contains index of last element + 1
47 ; Loop thru x_array, accumulating sums
0024 48 sum_next:
0024 83EE04 49 sub si,type x_array ;backup on
0027 D9840400 R 50 fld x_array[si] ;push it o
002B DCC3 51 fadd st(3),st ;add into
002D D9C0 52 fld st ;duplicate
002F DCC8 53 fmul st,st ;square it
0031 DEC2 54 faddp st(2),st ;add into
55 ; and dis
0033 FF0E0200 R 56 dec n_of_x ;reduce in
0037 E2EB 57 loop sum_next ;c
58
59 ; Pop running sums into memory
0039 60 pop_results:
0039 D91E9401 R 61 fstp sum_squares
003D D91E9801 R 62 fstp sum_indexes
0041 D91E9C01 R 63 fstp sum_x
0045 9B 64 fwait
65
66 ;
67 ; Etc.
68 ;
---- 69 code ends
70 end start
iAPX286 MACRO ASSEMBLER EXAMPLE_ASM286_PROGRAM
XREF SYMBOL TABLE LISTING
NAME TYPE VALUE ATTRIBUTES, XREFS
CODE SEGMENT SIZE=0046H ER PUBLIC 19# 69
CONTROL_287 V WORD 0000H DATA 7# 33
DATA SEGMENT SIZE=01A0H RW PUBLIC 6# 13 20 22
INIT287 L FAR 0000H EXTR 3# 32
N_OF_X V WORD 0002H DATA 8# 42 56
POP_RESULTS L NEAR 0039H CODE 60#
STACK STACK SIZE=0190H RW PUBLIC 16# 20 24 26
START L NEAR 0000H CODE 21# 70
SUM_INDEXES V DWORD 0198H DATA 11# 62
SUM_NEXT L NEAR 0024H CODE 48# 57
SUM_SQUARES V DWORD 0194H DATA 10# 61
SUM_X V DWORD 019CH DATA 12# 63
X_ARRAY V DWORD 0004H (100) DATA 9# 49 50
80287 Emulation
The programming of applications to execute on both 80286 and 80287 is made
much easier by the existence of an 80287 emulator for 80286 systems. The
Intel E80287 emulator offers a complete software counterpart to the 80287
hardware; NPX instructions can be simply emulated in software rather than
being executed in hardware. With software emulation, the distinction
between 80286 and 80287 systems is reduced to a simple performance
differential (see Table 1-2 for a performance comparison between an actual
80287 and an emulated 80287). Identical numeric programs will simply
execute more slowly on 80286 systems using software emulation of NPX
instructions than on systems executing NPX instructions directly on an 80287.
When incorporated into the systems software, the emulation of NPX
instructions on the 80286 systems is completely transparent to the
programmer. Applications software needs no special libraries, linking, or
other activity to allow it to run on an 80286 with 80287 emulation.
To the applications programmer, the development of programs for 80286
systems is the same whether the 80287 NPX hardware is available or not. The
full 80287 instruction set is available for use, with NPX instructions being
either emulated or executed directly. Applications programmers need not be
concerned with the hardware configuration of the computer systems on which
their applications will eventually run.
For systems programmers, details relating to 80287 emulators are described
in a later section of this supplement. An E80287 software emulator for 80286
systems is contained in the iMDX 364 8086 Software Toolbox, available from
Intel and described in the 8086 Software Toolbox Manual.
Concurrent Processing with the 80287
Because the 80286 CPU and the 80287 NPX have separate execution units, it
is possible for the NPX to execute numeric instructions in parallel with
instructions executed by the CPU. This simultaneous execution of different
instructions is called concurrency.
No special programming techniques are required to gain the advantages of
concurrent execution; numeric instructions for the NPX are simply placed in
line with the instructions for the CPU. CPU and numeric instructions are
initiated in the same order as they are encountered by the CPU in its
instruction stream. However, because numeric operations performed by the NPX
generally require more time than operations performed by the CPU, the CPU
can often execute several of its instructions before the NPX completes a
numeric instruction previously initiated.
This concurrency offers obvious advantages in terms of execution
performance, but concurrency also imposes several rules that must be
observed in order to assure proper synchronization of the 80286 CPU and
80287 NPX.
All Intel high-level languages automatically provide for and manage
concurrency in the NPX. Assembly-language programmers, however, must
understand and manage some areas of concurrency in exchange for the
flexibility and performance of programming in assembly language. This
section is for the assembly-language programmer or well-informed
high-level-language programmer.
Managing Concurrency
Concurrent execution of the host and 80287 is easy to establish and
maintain. The activities of numeric programs can be split into two major
areas: program control and arithmetic. The program control part performs
activities such as deciding what functions to perform, calculating
addresses of numeric operands, and loop control. The arithmetic part simply
adds, subtracts, multiplies, and performs other operations on the numeric
operands. The NPX and host are designed to handle these two parts
separately and efficiently.
Managing concurrency is necessary because both the arithmetic and control
areas must converge to a well-defined state before starting another numeric
operation. A well-defined state means all previous arithmetic and control
operations are complete and valid.
Normally, the host waits for the 80287 to finish the current numeric
operation before starting another. This waiting is called synchronization.
Managing concurrent execution of the 80287 involves three types of
synchronization:
1. Instruction synchronization
2. Data synchronization
3. Error synchronization
For programmers in higher-level languages, all three types of
synchronization are automatically provided by the appropriate compiler. For
assembly-language programmers, instruction synchronization is guaranteed by
the NPX interface, but data and error synchronization are the
responsibility of the assembly-language programmer.
Instruction Synchronization
Instruction synchronization is required because the 80287 can perform only
one numeric operation at a time. Before any numeric operation is started,
the 80287 must have completed all activity from its previous instruction.
Instruction synchronization is guaranteed for most ESC instructions because
the 80286 automatically checks the BUSY status line from the 80287
before commencing execution of most ESC instructions. No explicit WAIT
instructions are necessary to ensure proper instruction synchronization.
Data Synchronization
Data synchronization addresses the issue of both the CPU and the NPX
referencing the same memory values within a given block of code.
Synchronization ensures that these two processors access the memory operands
in the proper sequence, just as they would be accessed by a single
processor with no concurrency. Data synchronization is not a concern when
the CPU and NPX are using different memory operands during the course of one
numeric instruction.
The two cases where data synchronization might be a concern are
1. The 80286 CPU reads or alters a memory operand first, then invokes
the 80287 to load or alter the same operand.
2. The 80287 is invoked to load or alter a memory operand, after which
the 80286 CPU reads or alters the same location.
Due to the instruction synchronization of the NPX interface, data
synchronization is automatically provided for the first case: the 80286 will
always complete its operation before invoking the 80287.
For the second case, data synchronization is not always automatic. In
general, there is no guarantee that the 80287 will have finished its
processing and accessed the memory operand before the 80286 accesses the
same location.
Figure 2-9 shows examples of the two possible cases of the CPU and NPX
sharing a memory value. In the examples of the first case, the CPU will
finish with the operand before the 80287 can reference it. The NPX interface
guarantees this. In the examples of the second case, the CPU must wait for
the 80287 to finish with the memory operand before proceeding to reuse it.
The FWAIT instructions shown in these examples are required in order to
ensure this data synchronization.
Automatic data synchronization is provided, however, for several NPX
control instructions: the FSTSW/FNSTSW, FSTCW/FNSTCW, FLDCW, FRSTOR, and
FLDENV instructions are all guaranteed to finish their execution before the
CPU can read or alter the referenced memory locations.
The 80287 provides data synchronization for these instructions by making a
request on the Processor Extension Data Channel before the CPU executes its
next instruction. Since the NPX data transfers occur before the CPU regains
control of the local bus, the CPU cannot change a memory value before the
NPX has had a chance to reference it. In the case of the FSTSW AX
instruction, the 80286 AX register is explicitly updated before the CPU
continues execution of the next instruction.
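One common use of this guarantee is branching on the result of a numeric comparison. The fragment below is a sketch: X and LIMIT are hypothetical short real variables, the branch targets are hypothetical labels, and the condition code positions (C0 in bit 8, C2 in bit 10, C3 in bit 14 of the status word) are assumed from the status word layout.

```asm
        FLD     X               ; ST = X
        FCOMP   LIMIT           ; compare ST with LIMIT, then pop
        FSTSW   AX              ; AX = status word; no FWAIT is needed
        SAHF                    ; C0 -> CF, C2 -> PF, C3 -> ZF
        JP      UNORDERED       ; C2 = 1: operands could not be compared
        JB      X_BELOW_LIMIT   ; C0 = 1: X < LIMIT
        JE      X_EQUALS_LIMIT  ; C3 = 1: X = LIMIT
                                ; otherwise X > LIMIT
```

Because AX is updated before the CPU continues, SAHF can follow FSTSW AX immediately; storing the status word to memory with a nonguaranteed instruction would instead call for an explicit FWAIT before the CPU reads it.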
For the numeric instructions not listed above, the assembly-language
programmer must remain aware of synchronization and recognize cases
requiring explicit data synchronization. Data synchronization can be
provided either by programming an explicit FWAIT instruction, or by
initiating a subsequent numeric instruction before accessing the operands or
results of a previous instruction. After the subsequent numeric instruction
has started execution, all memory references in earlier numeric
instructions are complete. Reaching the next host instruction after the
synchronizing numeric instruction indicates that previous numeric operands
in memory are available.
The data-synchronization function of any FWAIT or numeric instruction must
be well-documented, as shown in figure 2-10. Otherwise, a change to the
program at a later time may remove the synchronizing numeric instruction and
cause program failure.
High-level languages automatically establish data synchronization and
manage it, but there may be applications where a high-level language may not
be appropriate.
For assembly-language programmers, automatic data synchronization can be
obtained using the assembler, although concurrency of execution is lost as a
result. To perform automatic data synchronization, the assembler can be
changed to always place a WAIT instruction after the ESCAPE instruction.
Figure 2-11 shows an example of how to change the ASM286 Code Macro for the
FIST instruction to automatically place a WAIT instruction after the ESCAPE
instruction. This Code Macro is included in the ASM286 source module. The
price paid for this automatic data synchronization is the lack of any
possible concurrency between the CPU and NPX.
Figure 2-9. Synchronizing References to Shared Data
Case 1:                 Case 2:
MOV   I, 1              FILD  I
FILD  I                 FWAIT
                        MOV   I, 5
MOV   AX, I             FISTP I
FISTP I                 FWAIT
                        MOV   AX, I
Figure 2-10. Documenting Data Synchronization
FISTP I
FMUL ; I is updated before FMUL is executed
MOV AX, I ; I is now safe to use
Figure 2-11. Nonconcurrent FIST Instruction Code Macro
;
; This is an ASM286 code macro to redefine the FIST
; instruction to prevent any concurrency
; while the instruction runs. A wait
; instruction is placed immediately after the
; escape to ensure the store is done
; before the program may continue.
CodeMacro FIST memop: Mw
RfixM 111B, memop
ModRM 010B, memop
RWfix
EndM
Error Synchronization
Almost any numeric instruction can, under the wrong circumstances, produce
a numeric error. Concurrent execution of the CPU and NPX requires
synchronization for these errors just as it does for data references and
numeric instructions. In fact, the synchronization required for data and
instructions automatically provides error synchronization.
However, incorrect data or instruction synchronization may not be
discovered until a numeric error occurs. A further complication is that a
programmer may not expect his numeric program to cause numeric errors, but
in some systems, they may regularly happen. To better understand these
points, let's look at what can happen when the NPX detects an error.
The NPX can perform one of two things when a numeric exception occurs:
• The NPX can provide a default fix-up for selected numeric errors.
Programs can mask individual error types to indicate that the NPX
should generate a safe, reasonable result whenever that error occurs.
The default error fix-up activity is treated by the NPX as part of the
instruction causing the error; no external indication of the error is
given. When errors are detected, a flag is set in the numeric status
register, but no information regarding where or when is available. If
the NPX performs its default action for all errors, then error
synchronization is never exercised. This is no reason to ignore error
synchronization, however.
• As an alternative to the NPX default fix-up of numeric errors, the
80286 CPU can be notified whenever an exception occurs. The CPU can
then implement any sort of recovery procedures desired, for any numeric
error detectable by the NPX. When a numeric error is unmasked and the
error occurs, the NPX stops further execution of the numeric
instruction and signals this event to the CPU. On the next occurrence
of an ESC or WAIT instruction, the CPU traps to a software exception
handler. Some ESC instructions do not check for errors. These are the
nonwaited forms FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX.
When the NPX signals an unmasked exception condition, it is requesting
help. The fact that the error was unmasked indicates that further numeric
program execution under the arithmetic and programming rules of the NPX is
unreasonable.
If concurrent execution is allowed, the state of the CPU when it recognizes
the exception is undefined. The CPU may have changed many of its internal
registers and be executing a totally different program by the time the
exception occurs. To handle this situation, the NPX has special registers
updated at the start of each numeric instruction to describe the state of
the numeric program when the failed instruction was attempted.
Error synchronization ensures that the NPX is in a well-defined state after
an unmasked numeric error occurs. Without a well-defined state, it would be
impossible for exception recovery routines to figure out why the numeric
error occurred, or to recover successfully from the error.
Incorrect Error Synchronization
An example of how some instructions written without error synchronization
will work initially, but fail when moved into a new environment is shown in
figure 2-12.
In figure 2-12, three instructions are shown to load an integer, calculate
its square root, then increment the integer. The NPX interface and
synchronous execution of the NPX emulator will allow this program to execute
correctly when no errors occur on the FILD instruction.
This situation changes if the 80287 numeric register stack is extended to
memory. To extend the NPX stack to memory, the invalid error is unmasked. A
push to a full register or pop from an empty register will cause an invalid
error. The recovery routine for the error must recognize this situation,
fix up the stack, then perform the original operation.
The recovery routine will not work correctly in the first example shown in
the figure. The problem is that the value of COUNT is incremented before the
NPX can signal the exception to the CPU. Because COUNT is incremented before
the exception handler is invoked, the recovery routine will load an
incorrect value of COUNT, causing the program to fail or behave unreliably.
Figure 2-12. Error Synchronization Examples
INCORRECT ERROR SYNCHRONIZATION
FILD  COUNT   ; NPX instruction
INC   COUNT   ; CPU instruction alters operand
FSQRT         ; subsequent NPX instruction -- error from
              ; previous NPX instruction detected here
PROPER ERROR SYNCHRONIZATION
FILD  COUNT   ; NPX instruction
FSQRT         ; subsequent NPX instruction -- error from
              ; previous NPX instruction detected here
INC   COUNT   ; CPU instruction alters operand
Proper Error Synchronization
Error Synchronization relies on the WAIT instructions required by
instruction and data synchronization and the BUSY and ERROR signals of
the 80287. When an unmasked error occurs in the 80287, it asserts the
ERROR signal, signalling to the CPU that a numeric error has occurred.
The next time the CPU encounters an error-checking ESC or WAIT instruction,
the CPU acknowledges the ERROR signal by trapping automatically to
Interrupt #16, the Processor Extension Error vector. If the following ESC
or WAIT instruction is properly placed, the CPU will not yet have disturbed
any information vital to recovery from the error.
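When the CPU instruction cannot conveniently be moved after the next numeric instruction, an explicit FWAIT can serve as the error-checking point instead. The fragment below is a sketch of that alternative; the placement of FWAIT is an illustration, and it gives up the concurrency that the reordered sequence keeps.

```asm
        FILD    COUNT       ; NPX instruction
        FWAIT               ; an unmasked error from FILD traps here,
                            ; while COUNT still holds its original value
        INC     COUNT       ; CPU instruction alters operand
        FSQRT               ; subsequent NPX instruction
```

Because the trap to interrupt 16 occurs at the FWAIT, a recovery routine that reloads COUNT sees the same value that the failed FILD attempted to load.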
System programming for 80287 systems requires a more detailed understanding
of the 80287 NPX than does application programming. Such things as
emulation, initialization, exception handling, and data and error
synchronization are all the responsibility of the systems programmer. These
topics are covered in detail in the sections that follow.
80287 Architecture
On a software level, the 80287 NPX appears as an extension of the 80286
CPU. On the hardware level, however, the mechanisms by which the 80286 and
80287 interact are a bit more complex. This section describes how the 80287
NPX and 80286 CPU interact and points out features of this interaction that
are of interest to systems programmers.
Processor Extension Data Channel
All transfers of operands between the 80287 and system memory are performed
by the 80286's internal Processor Extension Data Channel. This independent,
DMA-like data channel permits all operand transfers of the 80287 to come
under the supervision of the 80286 memory-management and protection
mechanisms. The operation of this data channel is completely transparent to
software.
Because the 80286 actually performs all transfers between the 80287 and
memory, no additional bus drivers, controllers, or other components are
necessary to interface the 80287 NPX to the local bus. Any memory accessible
to the 80286 CPU is accessible by the 80287. The Processor Extension Data
Channel is described in more detail in Chapter Six of the 80286 Hardware
Reference Manual.
Real-Address Mode and Protected Virtual-Address Mode
Like the 80286 CPU, the 80287 NPX can operate in both Real-Address mode and
in Protected mode. Following a hardware RESET, the 80287 is initially
activated in Real-Address mode. A single, privileged instruction (FSETPM) is
necessary to set the 80287 into Protected mode.
As an extension to the 80286 CPU, the 80287 can access any memory location
accessible by the task currently executing on the 80286. When operating in
Protected mode, all memory references by the 80287 are automatically
verified by the 80286's memory management and protection mechanisms as for
any other memory references by the currently-executing task. Protection
violations associated with NPX instructions automatically cause the 80286 to
trap to an appropriate exception handler.
To the programmer, these two 80287 operating modes differ only in the
manner in which the NPX instruction and data pointers are represented in
memory following an FSAVE or FSTENV instruction. When the 80287 operates in
Protected mode, its NPX instruction and data pointers are each represented
in memory as a 16-bit segment selector and a 16-bit offset. When the 80287
operates in Real-Address mode, these same instruction and data pointers are
represented simply as the 20-bit physical addresses of the operands in
question (see figure 1-7 in Chapter One).
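As an illustration only, the Real-Address mode form of these pointers can be modeled in a few lines of Python. Python is used here purely as a modeling notation; the function name physical_address is hypothetical and does not come from any Intel tool.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a real-mode segment:offset pair.

    Real-Address mode forms the address as segment * 16 + offset.  Only the
    low 20 bits are kept here, since the text above states that the saved
    pointers are 20-bit physical addresses.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, an operand at 1234H:0010H would be recorded as 12350H under this model.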
Dedicated and Reserved I/O Locations
The 80287 NPX does not require that any memory addresses be set aside for
special purposes. The 80287 does make use of I/O port addresses in the range
00F8H through 00FFH, although these I/O operations are completely
transparent to the 80286 software. 80286 programs must not reference these
reserved I/O addresses directly.
To prevent any accidental misuse or other tampering with numeric
instructions in the 80287, the 80286's I/O Privilege Level (IOPL) should be
used in multiuser reprogrammable environments to restrict application
program access to the I/O address space and so guarantee the integrity of
80287 computations. Chapter Eight of the 80286 Operating System Writer's
Guide contains more details regarding the use of the I/O Privilege Level.
Processor Initialization and Control
One of the principal responsibilities of systems software is the
initialization, monitoring, and control of the hardware and software
resources of the system, including the 80287 NPX. In this section, issues
related to system initialization and control are described, including
recognition of the NPX, emulation of the 80287 NPX in software if the
hardware is not available, and the handling of exceptions that may occur
during the execution of the 80287.
System Initialization
During initialization of an 80286 system, systems software must
• Recognize the presence or absence of the NPX
• Set flags in the 80286 MSW to reflect the state of the numeric
  environment
If an 80287 NPX is present in the system, the NPX must be
• Initialized
• Switched into Protected mode (if desired)
All of these activities can be quickly and easily performed as part of the
overall system initialization.
Recognizing the 80287 NPX
Figure 3-1 shows an example of a recognition routine that determines
whether an NPX is present. As written, the routine reports an 8087 or
80287 without distinguishing between them, and it can be executed on any
80286 or 8086 hardware configuration that has an NPX socket.
The example guards against the possibility of accidentally reading an
expected value from a floating data bus when no NPX is present. Data read
from a floating bus is undefined. By expecting to read a specific bit
pattern from the NPX, the routine protects itself from the indeterminate
state of the bus. The example also avoids depending on any values in
reserved bits, thereby maintaining compatibility with future numerics
coprocessors.
Figure 3-1. Software Routine to Recognize the 80287
; The following algorithm detects the presence of the 8087 as well as the
; 80287 in a system. This will make it easier for ISVs to port their 8086-8
; software to 286-287 systems.
;
cc_cr   equ     0DH             ; carriage return
cc_lf   equ     0AH             ; line feed
        assume  cs:code, ds:data
;
code    segment public
start:
        mov     ax,data         ; set data segment
        mov     ds,ax
;
; Test if 8087 is present in PC or PC/XT, or 80287 is in PC/AT
;
        fninit                  ; initialize coprocessor
        xor     ah,ah           ; zero ah register and memory byte
        mov     byte ptr control+1,ah
        fnstcw  control         ; store coprocessor's control word
                                ; in memory
        mov     ah,byte ptr control+1
        cmp     ah,03h          ; upper byte of control word will be
                                ; 03 if 8087 or 80287 coprocessor
                                ; is present
        jne     no_coproc
;
coproc:
        mov     ah,09h          ; print string-coprocessor present
        mov     dx,offset msg_yes
        int     21h
        jmp     done
;
no_coproc:
        mov     ah,09h          ; print string-coprocessor not
                                ; present
        mov     dx,offset msg_no
        int     21h
;
done:
        mov     ah,4CH          ; terminate program
        int     21h
code    ends
data    segment public
control dw      00
msg_yes db      cc_cr,cc_lf
        db      'System has an 8087 or 80287',cc_cr,cc_lf,'$'
msg_no  db      cc_cr,cc_lf
        db      'System does not have an 8087 or 80287',cc_cr,cc_lf,'$'
data    ends
        end     start           ; start is the entry point
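The reason the routine compares against 03H can be checked with a short Python model (a sketch for illustration, not part of the routine). It packs the control-word fields that table 3-1 lists for an initialized NPX into the upper byte of the control word. The function names are hypothetical, and the model assumes that reserved bits 15-13 read as zero.

```python
def control_word_high_byte(infinity: int, rounding: int, precision: int) -> int:
    """Pack IC (bit 12), RC (bits 11-10), and PC (bits 9-8) into the upper
    byte of the control word.  Reserved bits 15-13 are assumed to be 0."""
    return ((infinity & 1) << 4) | ((rounding & 3) << 2) | (precision & 3)


def npx_present(stored_high_byte: int) -> bool:
    """Mirror the CMP AH,03H test from figure 3-1."""
    return stored_high_byte == 0x03


# Values after FNINIT, per table 3-1: IC = 0, RC = 00, PC = 11.
AFTER_FNINIT = control_word_high_byte(0, 0b00, 0b11)
```

Because the routine zeroes the memory byte before FNSTCW, a system with no NPX leaves 00H in place, which the model reports as absent.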
Configuring the Numerics Environment
Once the 80286 CPU has determined the presence or absence of the 80287 NPX,
the 80286 must set either the MP or the EM bit in its own machine status
word accordingly. The initialization routine can either
• Set the MP bit in the 80286 MSW to allow numeric instructions to be
  executed directly by the 80287 NPX component
• Set the EM bit in the 80286 MSW to permit software emulation of the
  80287 numeric instructions
The Math Present (MP) flag of the 80286 machine status word indicates to
the CPU whether an 80287 NPX is physically available in the system. The MP
flag controls the function of the WAIT instruction. When executing a WAIT
instruction, the 80286 tests the Task Switched (TS) bit only if MP is set;
if it finds TS set under these conditions, the CPU traps to exception #7.
The Emulation Mode (EM) bit of the 80286 machine status word indicates to
the CPU whether NPX functions are to be emulated. If the CPU finds EM set
when it executes an ESC instruction, program control is automatically
trapped to exception #7, giving the exception handler the opportunity to
emulate the functions of an 80287. The 80286 EM flag can be changed only by
using the LMSW (load machine status word) instruction (legal only at
privilege level 0) and examined with the aid of the SMSW (store machine
status word) instruction (legal at any privilege level).
The EM bit also controls the function of the WAIT instruction. If the CPU
finds EM set while executing a WAIT, the CPU does not check the ERROR
pin for an error indication.
For correct 80286 operation, the EM bit must never be set concurrently with
MP. The EM and MP bits of the 80286 are described in more detail in the
80286 Operating System Writer's Guide. More information on software
emulation for the 80287 NPX is given in the "80287 Emulation" section
later in this chapter.
In any case, if ESC instructions are to be executed, either the MP or EM
bit must be set, but not both.
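The effect of these bits can be summarized in a small Python model, offered as a sketch rather than a definitive statement of CPU behavior. The bit positions are those of the 80286 machine status word; the function names are hypothetical. The TS test on ESC instructions is standard 80286 behavior that this passage does not itself describe, so it is labeled as an addition in the code.

```python
MP = 0x0002  # Math Present, MSW bit 1
EM = 0x0004  # Emulation Mode, MSW bit 2
TS = 0x0008  # Task Switched, MSW bit 3


def initial_msw_bits(npx_found: bool) -> int:
    """Choose MP when an NPX was recognized, EM otherwise (never both)."""
    return MP if npx_found else EM


def traps_to_exception_7(msw: int, instruction: str) -> bool:
    """Report whether an ESC or WAIT instruction traps to exception #7."""
    if instruction == "ESC":
        # EM set: trap so that the handler can emulate the instruction.
        # TS set: also traps (80286 behavior not covered in this passage).
        return bool(msw & (EM | TS))
    if instruction == "WAIT":
        # WAIT examines TS only when MP is set.
        return bool(msw & MP) and bool(msw & TS)
    raise ValueError("instruction must be 'ESC' or 'WAIT'")
```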
Initializing the 80287
Initializing the 80287 NPX simply means placing the NPX in a known state
unaffected by any activity performed earlier. The example software routine
to recognize the 80287 (figure 3-1) performed this initialization using a
single FNINIT instruction. This instruction causes the NPX to be
initialized in the same way as that caused by the hardware RESET signal to
the 80287. All the error masks are set, all registers are tagged empty, the
stack top (ST) field is set to zero, and default rounding, precision, and
infinity controls are set. Table 3-1 shows the state of the 80287 NPX
following initialization.
Following a hardware RESET signal, such as after initial power-up, the
80287 is initialized in Real-Address mode. Once the 80287 has been switched
to Protected mode (using the FSETPM instruction), only another hardware
RESET can switch the 80287 back to Real-Address mode. The FNINIT instruction
does not switch the operating state of the 80287.
80287 Emulation
If it is determined that no 80287 NPX is available in the system, systems
software may decide to emulate ESC instructions in software. This emulation
is easily supported by the 80286 hardware, because the 80286 can be
configured to trap to a software emulation routine whenever it encounters
an ESC instruction in its instruction stream.
As described previously, whenever the 80286 CPU encounters an ESC
instruction, and its MP and EM status bits are set appropriately (MP = 0,
EM = 1), the 80286 will automatically trap to interrupt #7, the Processor
Extension Not Available exception. The return link stored on the stack
points to the first byte of the ESC instruction, including the prefix
byte(s), if any. The exception handler can use this return link to examine
the ESC instruction and proceed to emulate the numeric instruction in
software.
The emulator must step the return pointer so that, upon return from the
exception handler, execution can resume at the first instruction following
the ESC instruction.
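To step the return pointer, an emulator has to know how many bytes the ESC instruction occupies. The following Python sketch shows one way to compute that length for 16-bit addressing forms. It is a model under stated assumptions: only segment-override prefixes are recognized, the buffer is assumed to hold the whole instruction, and the function name is hypothetical rather than taken from the E80287 emulator.

```python
SEGMENT_PREFIXES = {0x26, 0x2E, 0x36, 0x3E}  # ES:, CS:, SS:, DS: overrides


def esc_instruction_length(code: bytes) -> int:
    """Return the byte length of the ESC instruction at the start of code."""
    i = 0
    while i < len(code) and code[i] in SEGMENT_PREFIXES:
        i += 1  # skip prefix bytes that precede the opcode
    if i + 1 >= len(code) or not 0xD8 <= code[i] <= 0xDF:
        raise ValueError("not an ESC instruction")
    modrm = code[i + 1]
    mod, rm = modrm >> 6, modrm & 0x07
    if mod == 0b01:
        disp = 1  # 8-bit displacement
    elif mod == 0b10 or (mod == 0b00 and rm == 0b110):
        disp = 2  # 16-bit displacement or direct address
    else:
        disp = 0  # register form, or memory form with no displacement
    return i + 2 + disp
```

Adding the returned length to the offset in the return link gives the address of the instruction that follows the ESC instruction.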
To an application program, execution on an 80286 system with 80287
emulation is almost indistinguishable from execution on an 80287 system,
except for the difference in execution speeds.
There are several important considerations when using emulation on an 80286
system:
• When operating in Protected mode, numeric applications using the
  emulator must be executed in execute-readable code segments. Numeric
  software cannot be emulated if it is executed in execute-only code
  segments, because the emulator must be able to examine the particular
  numeric instruction that caused the Emulation trap.
• Only privileged tasks can place the 80286 in Emulation mode. The
  instructions necessary to do so are privileged and are not typically
  accessible to an application.
An emulator package (E80287) that runs on 80286 systems is available from
Intel in the 8086 Software Toolbox, Order Number 122203. This emulation
package operates in both Real and Protected mode, providing a complete
functional equivalent for the 80287 emulated in software.
When using the E80287 emulator, writers of numeric exception handlers
should be aware of one slight difference between the emulated 80287 and the
80287 hardware:
• On the 80287 hardware, exception handlers are invoked by the 80286 at
  the first WAIT or ESC instruction following the instruction causing the
  exception. The return link, stored on the 80286 stack, points to this
  second WAIT or ESC instruction where execution will resume following a
  return from the exception handler.
• Using the E80287 emulator, numeric exception handlers are invoked from
  within the emulator itself. The return link stored on the stack when
  the exception handler is invoked will therefore point back to the
  E80287 emulator, rather than to the program code actually being
  executed (emulated). An IRET return from the exception handler returns
  to the emulator, which then returns immediately to the emulated
  program. This added layer of indirection should not cause confusion,
  however, because the instruction causing the exception can always be
  identified from the 80287's instruction and data pointers.
Table 3-1. NPX Processor State Following Initialization
Field                      Value     Interpretation
Control Word
  Infinity Control         0         Projective
  Rounding Control         00        Round to nearest
  Precision Control        11        64 bits
  Interrupt-Enable Mask    1         Interrupts disabled
  Exception Masks          111111    All exceptions masked
Status Word
  Busy                     0         Not busy
  Condition Code           ????      (Indeterminate)
  Stack Top                000       Empty stack
  Interrupt Request        0         No interrupt
  Exception Flags          000000    No exceptions
Tag Word
  Tags                     11        Empty
Registers                  N.C.      Not changed
Exception Pointers
  Instruction Code         N.C.      Not changed
  Instruction Address      N.C.      Not changed
  Operand Address          N.C.      Not changed
Handling Numeric Processing Exceptions
Once the 80287 has been initialized and normal execution of applications
has commenced, the 80287 NPX may occasionally require attention in
order to recover from numeric processing errors. This section provides
details for writing software exception handlers for numeric exceptions.
Numeric processing exceptions have already been introduced in previous
sections of this manual.
As discussed previously, the 80287 NPX can take one of two actions when it
recognizes a numeric exception:
• If the exception is masked, the NPX will automatically perform its own
  masked exception response, correcting the exception condition according
  to fixed rules, and then continuing with its instruction execution.
• If the exception is unmasked, the NPX signals the exception to the
  80286 CPU using the ERROR status line between the two processors.
  Each time the 80286 encounters an ESC or WAIT instruction in its
  instruction stream, the CPU checks the condition of this ERROR
  status line. If ERROR is active, the CPU automatically traps to
  Interrupt vector #16, the Processor Extension Error trap.
Interrupt vector #16 typically points to a software exception handler,
which may or may not be a part of systems software. This exception handler
takes the form of an 80286 interrupt procedure.
When handling numeric errors, the CPU has two responsibilities:
• The CPU must not disturb the numeric context when an error is
  detected.
• The CPU must clear the error and attempt recovery from the error.
Although the manner in which programmers may treat these responsibilities
varies from one implementation to the next, most exception handlers will
include these basic steps:
• Store the NPX environment (control, status, and tag words, operand and
  instruction pointers) as it existed at the time of the exception.
• Clear the exception bits in the status word.
• Enable interrupts on the CPU.
• Identify the exception by examining the status and control words in
  the saved environment.
• Take some system-dependent action to rectify the exception.
• Return to the interrupted program and resume normal execution.
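The identification step in the list above can be sketched in Python as follows. This is a model for illustration, not handler code: it treats the low six bits of the status word as exception flags and the low six bits of the control word as the corresponding masks. The function names are hypothetical.

```python
EXCEPTION_NAMES = (
    "invalid operation",     # bit 0
    "denormalized operand",  # bit 1
    "zero divide",           # bit 2
    "overflow",              # bit 3
    "underflow",             # bit 4
    "precision",             # bit 5
)


def unmasked_exceptions(status_word: int, control_word: int) -> list:
    """List the exceptions that are flagged in the status word and whose
    mask bit in the control word is clear."""
    pending = status_word & ~control_word & 0x3F
    return [name for bit, name in enumerate(EXCEPTION_NAMES)
            if pending & (1 << bit)]


def clear_exception_flags(status_word: int) -> int:
    """Clear the low byte of the status word image, as the skeleton
    handlers in figures 4-3 through 4-5 do before reloading it."""
    return status_word & 0xFF00
```

With a saved status word of 0004H and a control word of 037BH (zero divide unmasked), the model reports a single pending exception, zero divide.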
It should be noted that the NPX exception pointers contained in the stored
NPX environment will take different forms, depending on whether the NPX is
operating in Real-Address mode or in Protected mode. The earlier discussion
of Real versus Protected mode details how this information is presented in
each of the two operating modes.
Simultaneous Exception Response
In cases where multiple exceptions arise simultaneously, the 80287 signals
one exception according to the precedence sequence shown in table 3-2. This
means, for example, that zero divided by zero will result in an invalid
operation, and not a zero divide exception.
Exception Recovery Examples
Recovery routines for NPX exceptions can take a variety of forms. They can
change the arithmetic and programming rules of the NPX. These changes may
redefine the default fix-up for an error, change the appearance of the NPX
to the programmer, or change how arithmetic is defined on the NPX.
A change to an error response might be to automatically normalize all
denormals loaded from memory. A change in appearance might be extending the
register stack into memory to provide an "infinite" number of numeric
registers. The arithmetic of the NPX can be changed to automatically extend
the precision and range of variables when exceeded. All these functions can
be implemented on the NPX via numeric errors and associated recovery
routines in a manner transparent to the application programmer.
Some other possible system-dependent actions, mentioned previously, may
include:
• Incrementing an exception counter for later display or printing
• Printing or displaying diagnostic information (e.g., the 80287
  environment and registers)
• Aborting further execution
• Storing a diagnostic value (a NaN) in the result and continuing with
  the computation
Notice that an exception may or may not constitute an error, depending on
the implementation. Once the exception handler corrects the error condition
causing the exception, the floating-point instruction that caused the
exception can be restarted, if appropriate. This cannot be accomplished
using the IRET instruction, however, because the trap occurs at the ESC or
WAIT instruction following the offending ESC instruction. The exception
handler must obtain from the NPX the address of the offending instruction in
the task that initiated it, make a copy of it, execute the copy in the
context of the offending task, and then return via IRET to the current CPU
instruction stream.
In order to correct the condition causing the numeric exception, exception
handlers must recognize the precise state of the NPX at the time the
exception handler was invoked, and be able to reconstruct the state of the
NPX when the exception initially occurred. To reconstruct the state of the
NPX, programmers must understand when, during the execution of an NPX
instruction, exceptions are actually recognized.
Invalid operation, zero divide, and denormalized exceptions are detected
before an operation begins, whereas overflow, underflow, and precision
exceptions are not raised until a true result has been computed. When a
"before" exception is detected, the NPX register stack and memory have
not yet been updated, and appear as if the offending instruction had not
been executed.
When an "after" exception is detected, the register stack and memory appear
as if the instruction has run to completion; i.e., they may be updated.
(However, in a store or store-and-pop operation, unmasked over/underflow is
handled like a before exception; memory is not updated and the stack is not
popped.) The programming examples contained in Chapter Four include an
outline of several exception handlers to process numeric exceptions for the
80287.
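The rule for what a handler can expect to find may be restated as a small Python predicate. This is only a restatement of the preceding paragraphs for unmasked exceptions; the names are hypothetical.

```python
BEFORE = {"invalid operation", "zero divide", "denormalized operand"}
AFTER = {"overflow", "underflow", "precision"}


def state_appears_updated(exception: str, is_store: bool = False) -> bool:
    """Return True if the register stack and memory look as though the
    faulting instruction ran to completion (unmasked exceptions only)."""
    if exception in BEFORE:
        return False  # detected before the operation begins
    if exception in AFTER:
        # Unmasked over/underflow on a store or store-and-pop is handled
        # like a "before" exception: no memory update, no pop.
        if is_store and exception in {"overflow", "underflow"}:
            return False
        return True
    raise ValueError("unknown exception name")
```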
The following sections contain examples of numeric programs for the 80287
NPX written in ASM286. These examples are intended to illustrate some of the
techniques for programming the 80287 computing system for numeric
applications.
Conditional Branching Examples
As discussed in Chapter Two, several numeric instructions post their
results to the condition code bits of the 80287 status word. Although there
are many ways to implement conditional branching following a comparison, the
basic approach is as follows:
• Execute the comparison.
• Store the status word. (80287 allows storing status directly into AX
  register.)
• Inspect the condition code bits.
• Jump on the result.
Figure 4-1 is a code fragment that illustrates how two memory-resident long
real numbers might be compared (similar code could be used with the FTST
instruction). The numbers are called A and B, and the comparison is A to B.
The comparison itself requires loading A onto the top of the 80287 register
stack and then comparing it to B, while popping the stack with the same
instruction. The status word is then written into the 80286 AX register.
A and B have four possible orderings, and bits C3, C2, and C0 of the
condition code indicate which ordering holds. These bits are positioned in
the upper byte of the NPX status word so as to correspond to the CPU's zero,
parity, and carry flags (ZF, PF, and CF), when the byte is written into the
flags. The code fragment sets ZF, PF, and CF of the CPU status word to the
values of C3, C2, and C0 of the NPX status word, and then uses the CPU
conditional jump instructions to test the flags. The resulting code is
extremely compact, requiring only seven instructions.
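The mapping of condition code bits onto CPU flags can be modeled in Python as a check on the reasoning above. The sketch assumes the status word layout used throughout this manual (C0 in bit 8, C2 in bit 10, C3 in bit 14) and tests the flags in the same order as the conditional jumps; the function names are hypothetical.

```python
def flags_from_status(status_word: int) -> dict:
    """Model FSTSW AX followed by SAHF: the upper byte of the status word
    lands in the flags, so C0 -> CF, C2 -> PF, C3 -> ZF."""
    ah = (status_word >> 8) & 0xFF
    return {"CF": ah & 1, "PF": (ah >> 2) & 1, "ZF": (ah >> 6) & 1}


def ordering(status_word: int) -> str:
    """Classify the result of comparing A (in ST) with B."""
    flags = flags_from_status(status_word)
    if flags["PF"]:      # JP: C2 set, operands unordered
        return "unordered"
    if flags["CF"]:      # JB: C0 set, A less than B
        return "A < B"
    if flags["ZF"]:      # JE: C3 set, A equal to B
        return "A = B"
    return "A > B"       # C3 = C0 = 0
```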
The FXAM instruction updates all four condition code bits. Figure 4-2 shows
how a jump table can be used to determine the characteristics of the value
examined. The jump table (FXAM_TBL) is initialized to contain the 16-bit
displacement of 16 labels, one for each possible condition code setting.
Note that four of the table entries contain the same value, because four
condition code settings correspond to "empty."
The program fragment performs the FXAM and stores the status word. It then
manipulates the condition code bits to finally produce a number in register
BX that equals the condition code times 2. This involves zeroing the unused
bits in the byte that contains the code, shifting C3 to the right so that
it is adjacent to C2, and then shifting the code to multiply it by 2. The
resulting value is used as an index that selects one of the displacements
from FXAM_TBL (the multiplication of the condition code is required because
of the 2-byte length of each value in FXAM_TBL). The unconditional JMP
instruction effectively vectors through the jump table to the labelled
routine that contains code (not shown in the example) to process each
possible result of the FXAM instruction.
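The index arithmetic described above can be verified with a short Python model. It reproduces the same masks and shifts on the upper byte of the status word and uses the label order of FXAM_TBL from figure 4-2; the function names are hypothetical.

```python
FXAM_TBL = [
    "POS_UNNORM", "POS_NAN", "NEG_UNNORM", "NEG_NAN",
    "POS_NORM", "POS_INFINITY", "NEG_NORM", "NEG_INFINITY",
    "POS_ZERO", "EMPTY", "NEG_ZERO", "EMPTY",
    "POS_DENORM", "EMPTY", "NEG_DENORM", "EMPTY",
]


def fxam_table_offset(status_word: int) -> int:
    """Return the byte offset into FXAM_TBL (condition code times 2)."""
    ah = (status_word >> 8) & 0xFF
    low = (ah & 0b00000111) << 1   # C2-C0, shifted left one place
    c3 = (ah & 0b01000000) >> 2    # C3 moved next to C2
    return low | c3


def fxam_class(status_word: int) -> str:
    """Return the label selected for a given status word."""
    return FXAM_TBL[fxam_table_offset(status_word) // 2]
```

The division by 2 in fxam_class exists only because the Python list is indexed by entry, whereas the assembly table is indexed by byte offset.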
Figure 4-1. Conditional Branching for Compares
.
.
.
A DQ ?
B DQ ?
.
.
.
FLD A ; LOAD A ONTO TOP OF 287 STACK
FCOMP B ; COMPARE A:B, POP A
FSTSW AX ; STORE RESULT TO CPU AX REGISTER
;
; CPU AX REGISTER CONTAINS CONDITION CODES (RESULTS OF
; COMPARE)
; LOAD CONDITION CODES INTO CPU FLAGS
SAHF
;
; USE CONDITIONAL JUMPS TO DETERMINE ORDERING OF A TO B
;
JP A_B_UNORDERED ; TEST C2 (PF)
JB A_LESS ; TEST C0 (CF)
JE A_EQUAL ; TEST C3 (ZF)
A_GREATER: ; C0 (CF) = 0, C3 (ZF) = 0
.
.
A_EQUAL: ; C0 (CF) = 0, C3 (ZF) = 1
.
.
A_LESS: ; C0 (CF) = 1, C3 (ZF) = 0
.
.
A_B_UNORDERED: ; C2 (PF) = 1
.
.
Figure 4-2. Conditional Branching for FXAM
; JUMP TABLE FOR EXAMINE ROUTINE
;
FXAM_TBL DW POS_UNNORM, POS_NAN, NEG_UNNORM, NEG_NAN,
& POS_NORM, POS_INFINITY, NEG_NORM,
& NEG_INFINITY, POS_ZERO, EMPTY, NEG_ZERO,
& EMPTY, POS_DENORM, EMPTY, NEG_DENORM, EMPTY
.
.
; EXAMINE ST AND STORE RESULT (CONDITION CODES)
FXAM
FSTSW AX
;
; CALCULATE OFFSET INTO JUMP TABLE
MOV BH,0 ; CLEAR UPPER HALF OF BX,
MOV BL,AH ; LOAD CONDITION CODE INTO BL
AND BL,00000111B ; CLEAR ALL BITS EXCEPT C2-C0
AND AH,01000000B ; CLEAR ALL BITS EXCEPT C3
SHR AH,2 ; SHIFT C3 TWO PLACES RIGHT
SAL BX,1 ; SHIFT C2-C0 1 PLACE LEFT (MULTIPLY
; BY 2)
OR BL,AH ; DROP C3 BACK IN ADJACENT TO C2
; (000XXXX0)
;
; JUMP TO THE ROUTINE `ADDRESSED' BY CONDITION CODE
JMP FXAM_TBL[BX]
;
; HERE ARE THE JUMP TARGETS, ONE TO HANDLE
; EACH POSSIBLE RESULT OF FXAM
POS_UNNORM:
.
POS_NAN:
.
NEG_UNNORM:
.
NEG_NAN:
.
POS_NORM:
.
POS_INFINITY:
.
NEG_NORM:
.
NEG_INFINITY:
.
POS_ZERO:
.
EMPTY:
.
NEG_ZERO:
.
POS_DENORM:
.
NEG_DENORM:
Exception Handling Examples
There are many approaches to writing exception handlers. One useful
technique is to consider the exception handler procedure as consisting of
"prologue," "body," and "epilogue" sections of code. (For compatibility with
the 80287 emulators, this procedure should be invoked by interrupt pointer
(vector) number 16.)
At the beginning of the prologue, CPU interrupts have been disabled. The
prologue performs all functions that must be protected from possible
interruption by higher-priority sources. Typically, this will involve saving
CPU registers and transferring diagnostic information from the 80287 to
memory. When the critical processing has been completed, the prologue may
enable CPU interrupts to allow higher-priority interrupt handlers to preempt
the exception handler.
The exception handler body examines the diagnostic information and makes a
response that is necessarily application-dependent. This response may range
from halting execution, to displaying a message, to attempting to repair the
problem and proceed with normal execution.
The epilogue essentially reverses the actions of the prologue, restoring
the CPU and the NPX so that normal execution can be resumed. The epilogue
must not load an unmasked exception flag into the 80287 or another exception
will be requested immediately.
Figures 4-3, 4-4 and 4-5 show the ASM286 coding of three skeleton
exception handlers. They show how prologues and epilogues can be written for
various situations, but provide comments indicating only where the
application-dependent exception handling body should be placed.
Figures 4-3 and 4-4 are very similar; their only substantial difference is
their choice of instructions to save and restore the 80287. The tradeoff
here is between the increased diagnostic information provided by FNSAVE and
the faster execution of FNSTENV. For applications that are sensitive to
interrupt latency or that do not need to examine register contents, FNSTENV
reduces the duration of the "critical region," during which the CPU will
not recognize another interrupt request (unless it is a nonmaskable
interrupt).
After the exception handler body, the epilogues prepare the CPU and the NPX
to resume execution from the point of interruption (i.e., the instruction
following the one that generated the unmasked exception). Notice that the
exception flags in the memory image that is loaded into the 80287 are
cleared to zero prior to reloading (in fact, in these examples, the entire
low-order byte of the status word image is cleared).
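The effect of the epilogue on the saved image can be modeled in Python. The sketch assumes the layout at the start of the image (control word at offset 0, status word at offset 2, tag word at offset 4, each a little-endian 16-bit word) and treats the image as a byte array; the helper names are hypothetical.

```python
import struct

STATE_IMAGE_SIZE = 94  # bytes reserved by SUB SP,94 for the FNSAVE image


def read_control_status_tag(image: bytearray) -> tuple:
    """Return (control word, status word, tag word) from the image."""
    return struct.unpack_from("<3H", image, 0)


def clear_status_low_byte(image: bytearray) -> None:
    """Zero the low byte of the status word image, which holds the
    exception flags; this corresponds to MOV BYTE PTR [BP-92],0H."""
    image[2] = 0
```

A handler modeled this way would call clear_status_low_byte on the image before the point that corresponds to FRSTOR or FLDENV.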
The examples in figures 4-3 and 4-4 assume that the exception handler
itself will not cause an unmasked exception. Where this is a possibility,
the general approach shown in figure 4-5 can be employed. The basic
technique is to save the full 80287 state and then to load a new control
word in the prologue. Note that considerable care should be taken when
designing an exception handler of this type to prevent the handler from
being reentered endlessly.
Figure 4-3. Full-State Exception Handler
SAVE_ALL PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR 80287 STATE IMAGE
PUSH BP
MOV BP,SP
SUB SP,94
; SAVE FULL 80287 STATE, WAIT FOR COMPLETION, ENABLE CPU INTERRUPTS
FNSAVE [BP-94]
FWAIT
STI
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD, RESTORE MODIFIED STATE IMAGE
MOV BYTE PTR [BP-92], 0H
FRSTOR [BP-94]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
MOV SP,BP
.
.
POP BP
;
; RETURN TO INTERRUPTED CALCULATION
IRET
SAVE_ALL ENDP
Figure 4-4. Reduced-Latency Exception Handler
SAVE_ENVIRONMENT PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR 80287 ENVIRONMENT
PUSH BP
.
MOV BP,SP
SUB SP,14
; SAVE ENVIRONMENT, WAIT FOR COMPLETION, ENABLE CPU INTERRUPTS
FNSTENV [BP-14]
FWAIT
STI
;
; APPLICATION EXCEPTION-HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD, RESTORE MODIFIED
; ENVIRONMENT IMAGE
MOV BYTE PTR [BP-12], 0H
FLDENV [BP-14]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
MOV SP,BP
POP BP
;
; RETURN TO INTERRUPTED CALCULATION
IRET
SAVE_ENVIRONMENT ENDP
Figure 4-5. Reentrant Exception Handler
.
.
.
LOCAL_CONTROL DW ? ; ASSUME INITIALIZED
.
.
.
REENTRANT PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR
; 80287 STATE IMAGE
PUSH BP
.
.
.
MOV BP,SP
SUB SP,94
; SAVE STATE, LOAD NEW CONTROL WORD, WAIT FOR COMPLETION, ENABLE CPU
; INTERRUPTS
FNSAVE [BP-94]
FLDCW LOCAL_CONTROL
STI
.
.
.
; APPLICATION EXCEPTION HANDLING CODE GOES HERE.
; AN UNMASKED EXCEPTION GENERATED HERE WILL CAUSE THE EXCEPTION
; HANDLER TO BE REENTERED.
; IF LOCAL STORAGE IS NEEDED, IT MUST BE ALLOCATED ON THE CPU STACK.
.
.
.
; CLEAR EXCEPTION FLAGS IN STATUS WORD, RESTORE MODIFIED STATE IMAGE
MOV BYTE PTR [BP-92], 0H
FRSTOR [BP-94]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
MOV SP,BP
.
.
.
POP BP
; RETURN TO POINT OF INTERRUPTION
IRET
REENTRANT ENDP
Floating-Point to ASCII Conversion Examples
Numeric programs must typically format their results at some point for
presentation and inspection by the program user. In many cases, numeric
results are formatted as ASCII strings for printing or display. This example
shows how floating-point values can be converted to decimal ASCII character
strings. The function shown in figure 4-6 can be invoked from PL/M-286,
Pascal-286, FORTRAN-286, or ASM286 routines.
Shortness, speed, and accuracy were chosen rather than providing the
maximum number of significant digits possible. An attempt is made to keep
integers in their own domain to avoid unnecessary conversion errors.
Using the extended precision real number format, this routine achieves a
worst case accuracy of three units in the 16th decimal position for a
noninteger value or integers greater than 10^(18). This is double precision
accuracy. With values having decimal exponents less than 100 in magnitude,
the accuracy is one unit in the 17th decimal position.
Higher precision can be achieved with greater care in programming, larger
program size, and lower performance.
Figure 4-6. Floating-Point to ASCII Conversion Routine
iAPX286 MACRO ASSEMBLER 80287 Floating-Point to 18-Digit ASCII Conversion
SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE FLOATING_TO_ASCII
OBJECT MODULE PLACED IN :F3:FPASC.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:FPASC.AP2
LOC OBJ LINE SOURCE
1 +1 $title("80287 Floating-Point to 18-Digit ASCII
2
3 name floating_to_ascii
4
5 public floating_to_ascii
6 extrn get_power_10:near,tos_s
7 ;
8 ; This subroutine will convert the float
9 ; top of the 80287 stack to an ASCII str
10 ; scaling value (in binary). The maximu
11 ; formed is controlled by a parameter wh
12 ; denormal values, and pseudo zeroes wil
13 ; A returned value will indicate how man
14 ; precision were lost in an unnormal or
15 ; (in terms of binary power) of a pseudo
16 ; Integers less than 10**18 in magnitude
17 ; destination ASCII string field is wide
18 ; digits. Otherwise the value is conver
19 ;
20 ; The status of the conversion is iden
21 ; it can be:
22 ;
23 ; 0 conversion complete, s
24 ; 1 invalid arguments
25 ; 2 exact integer conversi
26 ; 3 indefinite
27 ; 4 + NAN (Not A Number)
28 ; 5 - NAN
29 ; 6 + Infinity
30 ; 7 - Infinity
31 ; 8 pseudo zero found, str
32 ;
33 ; The PLM/286 calling convention is
34 ;
35 ;floating_to_ascii:
36 ; procedure (number,denormal ptr, string
37 ; power_ptr) word external.
38 ; declare (denormal_ptr, string ptr, pow
39 ; declare field_size word, string_size b
40 ; declare number real;
41 ; declare denormal integer based denorma
42 ; declare power integer based power_ptr,
43 ; and floating_to_ascii,
44 ;
45 ; The floating point value is expected
46 ; stack. This subroutine expects 3 free
47 ; will pop the passed value off when don
48 ; will have a leading character either '
49 ; of the value. The ASCII decimal digit
50 ; The numeric value of the ASCII string
51 ; If the given number was zero, the ASCI
52 ; and a single zero character. The valu
53 ; length of the ASCII string including t
54 ; always hold the sign. It is possible
55 ; field_size. This occurs for zeroes or
56 ; will return a special return code. Th
57 ; the power of the two originally associ
58 ; ten and ASCII string will be as if the
59 ;
60 ; The subroutine is accurate up to a m
61 ; integers. Integer values will have a
62 ; with them. For non-integers, the resu
63 ; decimal digits of the 16th decimal pla
64 ; exponentiate instruction is also used
65 ; range acceptable for the BCD data type
66 ; on entry to the subroutine is used for
67 ;
68 ; The following registers are not tran
69 ;
70 ; ax bx cx dx si di flags
71 ;
72 +1 $eject
73 ;
74 ; Define the stack layout
75 ;
0000[] 76 bp_save equ word ptr [bp]
0002[] 77 es_save equ bp_save + size bp_save
0004[] 78 return_ptr equ es_save + size es_save
0006[] 79 power_ptr equ return_ptr + size return_pt
0008[] 80 field_size equ power_ptr + size power_ptr
000A[] 81 size_ptr equ field_size + size field_siz
000C[] 82 string_ptr equ size_ptr + size size_ptr
000E[] 83 denormal_ptr equ string_ptr + size strin
84
85 parms_size equ size power_ptr + size field
000A 86 & size string_ptr + size deno
87
88 ; Define constants used
89
0012 90 BCD_DIGITS equ 18 ; Number of dig
0002 91 WORD_SIZE equ 2
000A 92 BCD_SIZE equ 10
0001 93 MINUS equ 1 ; Define return
0004 94 NAN equ 4 ; The exact val
0006 95 INFINITY equ 6 ; important. T
0003 96 INDEFINITE equ 3 ; the possible
0008 97 PSUEDO_ZERO equ 8 ; the same nume
-0002 98 INVALID equ -2 ; the program.
-0004 99 ZERO equ -4
-0006 100 DENORMAL equ -6
-0008 101 UNNORMAL equ -8
0000 102 NORMAL equ 0
0002 103 EXACT equ 2
104 ;
105 ; Define layout of temporary storage
106 ;
-0002[] 107 status equ word ptr [bp-WORD_SIZE]
-0004[] 108 power_two equ status - WORD_SIZE
-0006[] 109 power_ten equ power_two - WORD_SIZE
-0010[] 110 bcd_value equ tbyte ptr power_ten - BCD_S
-0010[] 111 bcd_byte equ byte ptr bcd_value
-0010[] 112 fraction equ bcd_value
113
114 local_size equ size status + size power_two
0010 115 & + size bcd_value
116
---- 117 stack stackseg (local_size+6) ; Allocate
118 +1 $eject
---- 119 code segment er public
120 extrn power_table:qword
121 ;
122 ; Constants used by this function
123 ;
124 even ; Optimize
0000 0A00 125 const10 dw 10 ; Adjustmen
126 ;
127 ; Convert the C3,C2,C1,C0 encoding from
128 ; flags and values.
129 ;
0002 F8 130 status_table db UNNORMAL, NAN, UNNORMAL + M
0003 04
0004 F9
0005 05
0006 00 131 & NORMAL, INFINITY, NORMAL +
0007 06
0008 01
0009 07
000A FC 132 & ZERO, INVALID, ZERO + MINUS
000B FE
000C FD
000D FE
000E FA 133 & DENORMAL, INVALID, DENORMAL
000F FE
0010 FB
0011 FE
134
0012 135 floating_to_ascii proc
136
0012 E80000 137 call tos_status ; look
0015 8BD8 138 mov bx,ax ; Get d
0017 2E8A870200 139 mov al,status_table[bx]
001C 3CFE 140 cmp al,INVALID ; Look
001E 752B 141 jne not_empty
142 ;
143 ; ST(0) is empty! Return the status val
144 ;
0020 C20A00 145 ret parms_size
146 ;
147 ; Remove infinity from stack and exit
148 ;
0023 149 found_infinity:
150
0023 DDD8 151 fstp st(0) ; OR to
0025 EB02 152 jmp short exit_proc
153 ;
154 ; String space is too small! Return in
155 ;
0027 156 small_string:
157
0027 B0FE 158 mov al,INVALID
159
0029 160 exit_proc:
161
0029 C9 162 leave ; Resto
002A 07 163 pop es
002B C20A00 164 ret parms_size
165 ;
166 ; ST(0) is NAN or indefinite. Store th
167 ; at the fraction field to separate indef
168 ;
002E 169 NAN_or_indefinite:
170
002E DB7EF0 171 fstp fraction ; Remov
0031 A801 172 test al,MINUS ; Look
0033 9B 173 fwait ; Insur
0034 74F3 174 jz exit_proc ; Can't
175
0036 BB00C0 176 mov bx,0C000H ; Match
0039 2B5EF6 177 sub bx,word ptr fraction+6 ; Compa
003C 0B5EF4 178 or bx,word ptr fraction+4 ; Bits
003F 0B5EF2 179 or bx,word ptr fraction+2 ; Bits
0042 0B5EF0 180 or bx,word ptr fraction ; Bits
0045 75E2 181 jnz exit_proc
182
0047 B003 183 mov al,INDEFINITE ; Set r
0049 EBDE 184 jmp exit_proc
185 ;
186 ; Allocate stack space for local variab
187 ; addressibility.
188 ;
004B 189 not_empty:
190
004B 06 191 push es ; Save
004C C8100000 192 enter local_size,0 ; Forma
193
0050 8B4E08 194 mov cx,field_size ; Check
0053 83F902 195 cmp cx,2
0056 7CCF 196 jl small_string
197
0058 49 198 dec cx ; Adjus
0059 83F912 199 cmp cx,BCD_DIGITS ; See i
005C 7603 200 jbe size_ok
201
005E B91200 202 mov cx,BCD_DIGITS ; Else
203
0061 204 size_ok:
205
0061 3C06 206 cmp al,INFINITY ; Look
0063 7DBE 207 jge found_infinity ; Retur
208
0065 3C04 209 cmp al,NAN ; Look
0067 7DC5 210 jge NAN_or_indefinite
211 ;
212 ; Set default return values and check t
213 ;
0069 D9E1 214 fabs ; Use p
215 ; sign
006B 8BD0 216 mov dx,ax ; Save
006D 33C0 217 xor ax,ax ; Form
006F 8B7E0E 218 mov di,denormal_ptr ; Zero
0072 8905 219 mov word ptr [di],ax
0074 8B5E06 220 mov bx,power_ptr ; Zero
0077 8907 221 mov word ptr [bx],ax
0079 80FAFC 222 cmp dl,ZERO ; Test
007C 732B 223 jae real_zero ; Skip
224
007E 80FAFA 225 cmp dl,DENORMAL ; Look
0081 732C 226 jae found_denormal ; Handl
227
0083 D9F4 228 fxtract ; Separ
0085 80FAF8 229 cmp dl,UNNORMAL ; Test
0088 7240 230 jb normal_value
231
008A 80EAF8 232 sub dl,UNNORMAL-NORMAL ; Retur
233 ;
234 ; Normalize the fraction, adjust the po
235 ; the denormal count value
236 ;
237 ; Assert 0 <= ST(0) < 1.0
238 ;
008D D9E8 239 fld1 ; Load
240
008F 241 normalize_fraction:
242
008F DCC1 243 fadd st(1),st ; Set i
0091 DEE9 244 fsub ; Form
0093 D9F4 245 fxtract ; Power
246 ; of de
0095 D9C9 247 fxch ; Put d
0097 DF15 248 fist word ptr [di] ; Put n
0099 DEC2 249 faddp st(2),st ; Form
250 ; OK to
009B F71D 251 neg word ptr [di] ; Form
009D 752B 252 jnz not_psuedo_zero
253 ;
254 ; A psuedo zero will appear as an unnor
255 ; to normalize it, the resultant fraction
256 ; an fxtract on zero will yield a zero ex
257 ;
009F D9C9 258 fxch ; Put power
00A1 DF1D 259 fistp word ptr [di] ; Set denor
260 ; Word ptr
261 ; integer,
00A3 80EAF8 262 sub dl,NORMAL-PSUEDO_ZERO ; Set ret
00A6 E9A400 263 jmp convert_integer ; Put zero
264 ;
265 ; The number is a real zero, set the re
266 ; conversion to BCD.
267 ;
00A9 268 real_zero:
269
00A9 80EAFC 270 sub dl,ZERO-NORMAL ; Conve
00AC E99E00 271 jmp convert_integer ; Treat
272 ;
273 ; The number is a denormal. FXTRACT wi
274 ; case. To correctly separate the expone
275 ; constant to the exponent to guarantee t
276 ;
00AF 277 found_denormal:
278
00AF D9E8 279 fld1 ; Prepa
00B1 D9C9 280 fxch
00B3 D9F8 281 fprem ; Force
282 ; exten
00B5 D9F4 283 fxtract ; This
284 ;
285 ; The power of the original denormal val
286 ; Check if the fraction value is an unnor
287 ;
00B7 D9E5 288 fxam ; See i
00B9 9BDFE0 289 fstsw ax ; Save
00BC D9C9 290 fxch ; Put e
00BE D9CA 291 fxch st(2) ; Put 1
00C0 80EAFA 292 sub dl,DENORMAL-NORMAL ; Retur
00C3 A90044 293 test ax,4400H ; See i
00C6 74C7 294 jz normalize_fraction ; Jump
295
00C8 DDD8 296 fstp st(0) ; Remov
297 ;
298 ; Calculate the decimal magnitude assoc
299 ; within one order. This error will alwa
300 ; rounding and lost precision. As a resu
301 ; to consider the LOG10 of the fraction v
302 ; Since the fraction will always be 1 <=
303 ; the basic accuracy of the function. To
304 ; simply multiply the power of two by LOG
305 ; an integer.
306 ;
00CA 307 normal_value:
00CA 308 not_psuedo_zero:
309
00CA DB7EF0 310 fstp fraction ; Save
00CD DF56FC 311 fist power_two ; Save
00D0 D9EC 312 fldlg2 ; Get L
313 ; Power
00D2 DEC9 314 fmul ; Form
00D4 DF5EFA 315 fistp power_ten ; Any r
316 ;
317 ; Check if the magnitude of the number
318 ; an integer.
319 ;
320 ; CX has the maximum number of decimal di
321 ;
00D7 9B 322 fwait ; Wait
00D8 8B46FA 323 mov ax,power_ten ; Get p
00DB 2BC1 324 sub ax,cx ; Form
00DD 7722 325 ja adjust_result ; Jump
326 ;
327 ; The number is between 1 and 10**(fiel
328 ; Test if it is an integer.
329 ;
00DF DF46FC 330 fild power_two ; Resto
00E2 8BF2 331 mov si,dx ; Save
00E4 80EAFE 332 sub dl,NORMAL-EXACT ; Conve
00E7 DB6EF0 333 fld fraction
00EA D9FD 334 fscale ; Form
00EC DDD1 335 fst st(1) ; Copy
00EE D9FC 336 frndint ; Test
00F0 D8D9 337 fcomp ; Compa
00F2 9BDD7EFE 338 fstsw status ; Save
00F6 F746FE0040 339 test status,4000H ; C3=1
00FB 7550 340 jnz convert_integer
341
00FD DDD8 342 fstp st(0) ; Remov
00FF 8BD6 343 mov dx,si ; Resto
344 ;
345 ; Scale the number to within the range
346 ; The scaling operation should produce a
347 ; of magnitude of the largest decimal num
348 ; given string width.
349 ;
350 ; The scaling power of ten value is in
351 ;
0101 352 adjust_result:
353
0101 8907 354 mov word ptr [bx],ax ; Set i
0103 F7D8 355 neg ax ; Subst
356 ; of ma
0105 E80000 E 357 call get_power_10 ; Scali
358 ; and f
0108 DB6EF0 359 fld fraction ; Get f
010B DEC9 360 fmul ; Combi
010D 8BF1 361 mov si,cx ; Form
010F D1E6 362 shl si,1 ; BCD v
0111 D1E6 363 shl si,1 ; Index
0113 D1E6 364 shl si,1
0115 DF46FC 365 fild power_two ; Combi
0118 DEC2 366 faddp st(2),st
011A D9FD 367 fscale ; Form
011C DDD9 368 fstp st(1) ; Remov
369 ;
370 ; Test the adjusted value against a tab
371 ; The combined errors of the magnitude es
372 ; result in a value one order of magnitud
373 ; correctly in the BCD field. To handle
374 ; adjusted value, if it is too small or l
375 ; adjust the power of ten value.
376 ;
011E 377 test_power:
378
011E 2EDC940800 E 379 fcom power_table[si]+type power_tabl
380 ; entry
381 ; has b
0123 9BDFE0 382 fstsw ax ; No wa
0126 A90041 383 test ax,4100H ; If C3
0129 750C 384 jnz test_for_small
385
012B 2EDE360000 R 386 fidiv const10 ; Else
0130 80E2FD 387 and dl,not EXACT ; Remov
0133 FF07 388 inc word ptr [bx] ; Adjus
0135 EB14 389 jmp short in_range ; Conve
390
0137 391 test_for_small:
392
0137 2EDC940000 E 393 fcom power_table[si] ; Test
013C 9BDFE0 394 fstsw ax ; No wa
013F A90001 395 test ax,100H ; If C0
0142 7407 396 jz in_range ; Conve
397
0144 2EDE0E0000 R 398 fimul const10 ; Adjus
0149 FF0F 399 dec word ptr [bx] ; Adjus
400
014B 401 in_range:
402
014B D9FC 403 frndint ; Form
404 ;
405 ; Assert: 0 <= TOS <= 999,999,999,999,999
406 ; The TOS number will be exactly represen
407 ;
014D 408 convert_integer:
409
014D DF76F0 410 fbstp bcd_value ; Store
411 ;
412 ; While the store BCD runs, setup regis
413 ; ASCII.
414 ;
0150 BE0800 415 mov si,BCD_SIZE-2 ; Initi
0153 B9040F 416 mov cx,0f04h ; Set s
0156 BB0100 417 mov bx,1 ; Set i
0159 8B7E0C 418 mov di,string_ptr ; Get a
015C 8CD8 419 mov ax,ds ; Copy
015E 8EC0 420 mov es,ax
0160 FC 421 cld ; Set a
0161 B02B 422 mov al,'+' ; Clear
0163 F6C201 423 test dl,MINUS ; Look
0166 7402 424 jz positive_result
425
0168 B02D 426 mov al,'-'
427
016A 428 positive_result:
429
016A AA 430 stosb ; Bump
016B 80E2FE 431 and dl,not MINUS ; Turn
016E 9B 432 fwait ; Wait
433 ;
434 ; Register usage:
435 ; ah: BCD byt
436 ; al: ASCII c
437 ; dx: Return
438 ; ch: BCD mas
439 ; cl: BCD shi
440 ; bx: ASCII s
441 ; si: BCD fie
442 ; di: ASCII s
443 ; ds,es: ASCII s
444 ;
445 ; Remove leading zeroes from the number
446 ;
016F 447 skip_leading_zeroes:
448
016F 8A62F0 449 mov ah,bcd_byte[si] ; Get B
0172 8AC4 450 mov al,ah ; Copy
0174 D2E8 451 shr al,cl ; Get h
0176 22C5 452 and al,ch ; Set z
0178 7516 453 jnz enter_odd ; Enter
454
017A 8AC4 455 mov al,ah ; Get B
017C 22C5 456 and al,ch ; Get l
017E 7518 457 jnz enter_even
458
0180 4E 459 dec si ; Decre
0181 79EC 460 jns skip_leading_zeroes
461 ;
462 ; The significand was all zeroes
463 ;
0183 B030 464 mov al,'0' ; Set i
0185 AA 465 stosb
0186 43 466 inc bx ; Bump
0187 EB16 467 jmp short exit_with_value
468 ;
469 ; Now expand the BCD string into digit
470 ;
0189 471 digit_loop:
472
0189 8A62F0 473 mov ah,bcd_byte[si] ; Get B
018C 8AC4 474 mov al,ah
018E D2E8 475 shr al,cl ; Get h
476
0190 477 enter_odd:
478
0190 0430 479 add al,'0' ; Conve
0192 AA 480 stosb ; Put d
0193 8AC4 481 mov al,ah ; Get l
0195 22C5 482 and al,ch
0197 43 483 inc bx ; Bump
484
0198 485 enter_even:
486
0198 0430 487 add al,'0' ; Conve
019A AA 488 stosb ; Put d
019B 43 489 inc bx ; Bump
019C 4E 490 dec si ; Go to
019D 79EA 491 jns digit_loop
492 ;
493 ; Conversion complete. Set the string
494 ;
019F 495 exit_with_value:
496
019F 8B7E0A 497 mov di,size_ptr
01A2 891D 498 mov word ptr [di],bx
01A4 8BC2 499 mov ax,dx ; Set r
01A6 E980FE 500 jmp exit_proc
501
502 floating_to_ascii endp
---- 503 code ends
504 end
ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS
iAPX286 MACRO ASSEMBLER Calculate the value of 10**ax 12:11:08
SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE GET_POWER_10
OBJECT MODULE PLACED IN :F3:POW10.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:POW10.AP2
LOC OBJ LINE SOURCE
1 +1 $title("Calculate the value of 10**ax")
2 ;
3 ; This subroutine will calculate the va
4 ; For values of 0 <= ax <19, the result w
5 ; All 80286 registers are transparent and
6 ; the TOS as two numbers, exponent in ST(
7 ; The exponent value can be larger than t
8 ; extended real format number. Three sta
9 ;
10 name get_power_10
11
12 public get_power_10,power_tabl
13
---- 14 stack stackseg 8
15
---- 16 code segment er public
17 ;
18 ; Use exact values from 1.0 to 1e18
19 ;
20 even ; O
0000 000000000000F0 21 power_table dq 1.0,1e1,1e2,1e3
3F
0008 00000000000024
40
0010 00000000000059
40
0018 0000000000408F
40
0020 000000000088C3 22 dq 1e4,1e5,1e6,1e7
40
0028 00000000006AF8
40
0030 0000000080842E
41
0038 00000000D01263
41
0040 0000000084D797 23 dq 1e8,1e9,1e10,1e11
41
0048 0000000065CDCD
41
0050 000000205FA002
42
0058 000000E8764837
42
0060 000000A2941A6D 24 dq 1e12,1e13,1e14,1e15
42
0068 000040E59C30A2
42
0070 0000901EC4BCD6
42
0078 00003426F56B0C
43
0080 0080E03779C341 25 dq 1e16,1e17,1e18
43
0088 00A0D885573476
43
0090 00C84E676DC1AB
43
26
0098 27 get_power_10 proc
28
0098 3D1200 29 cmp ax,18 ; T
009B 770F 30 ja out_of_range
31
009D 53 32 push bx ; G
009E 8BD8 33 mov bx,ax ; F
00A0 C1E303 34 shl bx,3
00A3 2EDD870000 R 35 fld power_table[bx] ; G
00A8 5B 36 pop bx ; R
00A9 D9F4 37 fxtract ; S
00AB C3 38 ret ; O
39 ;
40 ; Calculate the value using the exponen
41 ; The following relations are used:
42 ; 10**x = 2**(log2(10)*x)
43 ; 2**(I+F) = 2**I * 2**F
44 ; if st(1) = I and st(0) = 2**F t
45 ;
00AC 46 out_of_range:
47
00AC D9E9 48 fldl2t ; T
00AE C8040000 49 enter 4,0 ; F
00B2 8946FE 50 mov [bp-2],ax ; S
00B5 DE4EFE 51 fimul word ptr [bp-2] ; T
00B8 9BD97EFC 52 fstcw word ptr [bp-4] ; G
00BC 8B46FC 53 mov ax,word ptr [bp-4] ; G
00BF 25FFF3 54 and ax,not 0C00H ; M
00C2 0D0004 55 or ax,0400H ; S
00C5 8746FC 56 xchg ax,word ptr [bp-4] ; P
57 ; o
00C8 D9E8 58 fld1 ; S
00CA D9E0 59 fchs
00CC D9C1 60 fld st(1)
00CE D96EFC 61 fldcw word ptr [bp-4]
00D1 D9FC 62 frndint
00D3 8946FC 63 mov word ptr [bp-4],ax
00D6 D96EFC 64 fldcw word ptr [bp-4]
00D9 D9CA 65 fxch st(2)
00DB D8E2 66 fsub st,st(2)
00DD 8B46FE 67 mov ax,[bp-2]
00E0 D9FD 68 fscale
00E2 D9F0 69 f2xm1
00E4 C9 70 leave
00E5 DEE1 71 fsubr
00E7 DCC8 72 fmul st,st(0)
00E9 C3 73 ret
74
75 get_power_10 endp
76
---- 77 code ends
78 end
ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS
iAPX286 MACRO ASSEMBLER Determine TOS register contents 12:12:13 0
SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE TOS_STATUS
OBJECT MODULE PLACED IN :F3:TOSST.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:TOSST.AP2
LOC OBJ LINE SOURCE
1 +1 $title("Determine TOS register contents")
2 ;
3 ; This subroutine will return a value f
4 ; to the contents of 80287 TOS. All regi
5 ; errors are possible. The return value
6 ; of FXAM instruction.
7 ;
8 name tos_status
9
10 public tos_status
11
---- 12 stack stackseg 6 ; Alloc
13
---- 14 code segment er public
15
0000 16 tos_status proc
17
0000 D9E5 18 fxam ; Get r
0002 9BDFE0 19 fstsw ax ; Get s
0005 8AC4 20 mov al,ah ; Put b
0007 250740 21 and ax,4007h ; Mask
000A C0EC03 22 shr ah,3 ; Put b
000D 0AC4 23 or al,ah ; Put c
000F B400 24 mov ah,0 ; Clear
0011 C3 25 ret
26
27 tos_status endp
28
---- 29 code ends
30 end
ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS
Function Partitioning
Three separate modules implement the conversion. Most of the work of the
conversion is done in the module FLOATING_TO_ASCII. The other modules are
provided separately, because they have a more general use. One of them,
GET_POWER_10, is also used by the ASCII to floating-point conversion
routine. The other small module, TOS_STATUS, will identify what, if
anything, is in the top of the numeric register stack.
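The classification performed by TOS_STATUS and the status table in
FLOATING_TO_ASCII can be sketched in Python. This is an illustration only,
not part of the Intel listings: the names tos_index and classify are
hypothetical, and the table contents are read back from the object bytes
printed beside status_table in the listing.

```python
# Return codes, copied from the equates in the FLOATING_TO_ASCII listing.
MINUS, NAN, INFINITY, INVALID = 1, 4, 6, -2
ZERO, DENORMAL, UNNORMAL, NORMAL = -4, -6, -8, 0

# Indexed by the 4-bit value C3,C2,C1,C0 left in the status word by FXAM.
STATUS_TABLE = [
    UNNORMAL, NAN, UNNORMAL + MINUS, NAN + MINUS,
    NORMAL, INFINITY, NORMAL + MINUS, INFINITY + MINUS,
    ZERO, INVALID, ZERO + MINUS, INVALID,
    DENORMAL, INVALID, DENORMAL + MINUS, INVALID,
]

def tos_index(status_word):
    """Pack C3,C2,C1,C0 of a 16-bit status word into a value 0..15."""
    high_byte = (status_word >> 8) & 0xFF   # mov al,ah
    low_bits = high_byte & 0x07             # C2,C1,C0 (bits 10,9,8)
    c3 = (high_byte & 0x40) >> 3            # C3 (bit 14) moved to bit 3
    return low_bits | c3

def classify(status_word):
    """Look up the return code for a status word, as status_table[bx] does."""
    return STATUS_TABLE[tos_index(status_word)]
```

For example, a status word with only C2 set (0400H) classifies as NORMAL,
and one with C3 and C0 set (4100H) classifies as INVALID, the code returned
for an empty register.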
Exception Considerations
Care is taken inside the function to avoid generating exceptions. Any
possible numeric value will be accepted. The only exceptions possible would
occur if insufficient space exists on the numeric register stack.
The value passed in the numeric stack is checked for existence, type (NaN
or infinity), and status (unnormal, denormal, zero, sign). The string size
is tested for a minimum and maximum value. If the top of the register stack
is empty, or the string size is too small, the function will return with an
error code.
Overflow and underflow are avoided inside the function for very large or
very small numbers.
Special Instructions
The functions demonstrate the operation of several numeric instructions,
different data types, and precision control. Shown are instructions for
automatic conversion to BCD, calculating the value of 10 raised to an
integer value, establishing and maintaining concurrency, data
synchronization, and use of directed rounding on the NPX.
Without the extended precision data type and built-in exponential function,
the double precision accuracy of this function could not be attained with
the size and speed of the shown example.
The function relies on the numeric BCD data type for conversion from binary
floating-point to decimal. It is not difficult to unpack the BCD digits into
separate ASCII decimal digits. The major work involves scaling the
floating-point value to the comparatively limited range of BCD values. To
print a 9-digit result requires accurately scaling the given value to an
integer between 10^(8) and 10^(9). For example, the number +0.123456789
requires a scaling factor of 10^(9) to produce the value +123456789.0,
which can be stored in 9 BCD digits. The scale factor must be an exact
power of 10 to avoid changing any of the printed digit values.
These routines should exactly convert all values exactly representable in
decimal in the field size given. Integer values that fit in the given string
size will not be scaled, but directly stored into the BCD form. Noninteger
values exactly representable in decimal within the string size limits will
also be exactly converted. For example, 0.125 is exactly representable in
binary or decimal. To convert this floating-point value to decimal, the
scaling factor will be 1000, resulting in 125. When scaling a value, the
function must keep track of where the decimal point lies in the final
decimal value.
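The 0.125 example can be checked with exact rational arithmetic. The
following Python sketch is not the method used by the listing (which
estimates the magnitude and scales on the NPX); it simply searches for the
smallest power of ten that yields an integer, and the name scale_exact is
hypothetical.

```python
from fractions import Fraction

def scale_exact(value, max_digits):
    """Return (integer, power) with value == integer * 10**power,
    using at most max_digits decimal digits, or None if the value
    is not exactly representable in that many digits."""
    exact = Fraction(value)                 # exact binary value of the float
    for shift in range(max_digits + 1):
        scaled = exact * 10 ** shift
        if scaled.denominator == 1 and abs(scaled.numerator) < 10 ** max_digits:
            return scaled.numerator, -shift
    return None
```

Here scale_exact(0.125, 9) gives (125, -3): the scaling factor is 1000 and
the decimal point lies three places to the left of the last digit. A value
such as 0.1, which has no exact binary representation, yields None.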
Description of Operation
Converting a floating-point number to decimal ASCII takes three major
steps: identifying the magnitude of the number, scaling it for the BCD data
type, and converting the BCD data type to a decimal ASCII string.
Identifying the magnitude of the result requires finding the value X such
that the number is represented by I*10^(X), where 1.0 <= I < 10.0. Scaling
the number requires multiplying it by a scaling factor 10^(S), so that the
result is an integer requiring no more decimal digits than provided for in
the ASCII string.
Once scaled, the numeric rounding modes and BCD conversion put the number
in a form easy to convert to decimal ASCII by host software.
Implementing each of these three steps requires attention to detail. To
begin with, not all floating-point values have a numeric meaning. Values
such as infinity, indefinite, or Not a Number (NaN) may be encountered by
the conversion routine. The conversion routine should recognize these
values and identify them uniquely.
Special cases of numeric values also exist. Denormals, unnormals, and
pseudo zero all have a numeric value but should be recognized, because all
of them indicate that precision was lost during some earlier calculations.
Once it has been determined that the number has a numeric value, and it
has been normalized (setting the appropriate unnormal flags), the value must
be scaled to the BCD range.
Scaling the Value
To scale the number, its magnitude must be determined. It is sufficient to
calculate the magnitude to an accuracy of 1 unit, or within a factor of 10
of the given value. After scaling the number, a check will be made to see if
the result falls in the range expected. If not, the result can be adjusted
one decimal order of magnitude up or down. The adjustment test after the
scaling is necessary due to inevitable inaccuracies in the scaling value.
Because the magnitude estimate need only be close, a fast technique is
used. The magnitude is estimated by multiplying the power of 2, the unbiased
floating-point exponent, associated with the number by log{10}2. Rounding
the result to an integer will produce an estimate of sufficient accuracy.
Ignoring the fraction value can introduce a maximum error of 0.32 in the
result.
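A minimal Python sketch of this estimate follows. It assumes ordinary
double-precision floats rather than the NPX extended format, uses
math.frexp as a stand-in for FXTRACT, and the name estimate_power_of_ten is
hypothetical.

```python
import math

LOG10_2 = math.log10(2.0)   # the constant FLDLG2 loads on the NPX

def estimate_power_of_ten(x):
    """Estimate the decimal magnitude of a nonzero x from its binary exponent."""
    _, exponent = math.frexp(abs(x))    # abs(x) = m * 2**exponent, 0.5 <= m < 1
    power_two = exponent - 1            # exponent for a fraction in [1, 2)
    return round(power_two * LOG10_2)   # the fraction itself is ignored
```

The estimate may be off by one decimal order of magnitude. For instance,
8.0 has a power of two of 3, giving an estimate of 1 although its true
decimal magnitude is 0; the adjustment test after scaling absorbs this.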
Using the magnitude of the value and size of the number string, the scaling
factor can be calculated. Calculating the scaling factor is the most
inaccurate operation of the conversion process. The relation
10^(X)=2**(X * log{2}10) is used for this function. The exponentiate
instruction (F2XM1) will be used.
Due to restrictions on the range of values allowed by the F2XM1
instruction, the power of 2 value will be split into integer and fraction
components. The relation 2**(I + F) = 2**I * 2**F allows using the FSCALE
instruction to recombine the 2**F value, calculated through F2XM1, and the
2**I part.
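The two relations can be sketched in Python as follows. This is an
illustration under stated assumptions, not a transcription of GET_POWER_10:
math.expm1 stands in for F2XM1, math.ldexp stands in for FSCALE, the result
is a single double rather than the separate exponent and fraction the
listing returns, and the name power_of_ten is hypothetical. The halving and
squaring step reflects the 80287 restriction that the F2XM1 argument lie
between 0 and 0.5.

```python
import math

LOG2_10 = math.log2(10.0)   # the constant FLDL2T loads on the NPX

def power_of_ten(x):
    """Return 10**x, computed as 2**I * 2**F where I + F = x * log2(10)."""
    t = x * LOG2_10                 # 10**x == 2**t
    i = math.floor(t)               # integer part I (rounded toward -infinity)
    f = t - i                       # fraction part F, 0 <= F < 1
    # Keep the exponential argument within 0..0.5: compute 2**(F/2),
    # then square it to obtain 2**F.
    half = math.expm1((f / 2.0) * math.log(2.0)) + 1.0
    two_f = half * half
    return math.ldexp(two_f, i)     # recombine: 2**F * 2**I
```

In double precision this sketch overflows for large x; the listing avoids
that by returning the exponent and fraction as two separate numbers.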
Inaccuracy in Scaling
The inaccuracy of these operations arises because of the trailing zeros
placed into the fraction value when stripping off the integer valued bits.
For each integer valued bit in the power of 2 value separated from the
fraction bits, one bit of precision is lost in the fraction field due to
the zero fill occurring in the least significant bits.
Up to 14 bits may be lost in the fraction because the largest allowed
floating point exponent value is 2^(14) - 1.
Avoiding Underflow and Overflow
The fraction and exponent fields of the number are separated to avoid
underflow and overflow in calculating the scaling values. For example, to
scale 10^(-4932) to 10^(8) requires a scaling factor of 10^(4940), which
cannot be represented by the NPX.
By separating the exponent and fraction, the scaling operation involves
adding the exponents separate from multiplying the fractions. The exponent
arithmetic will involve small integers, all easily represented by the NPX.
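The benefit of keeping the exponent and fraction apart can be seen in a
small Python sketch. It assumes double-precision floats, where the product
of two very small numbers underflows to zero; the names split and
multiply_split are hypothetical.

```python
import math

def split(x):
    """Return (fraction, exponent) with x == fraction * 2**exponent."""
    return math.frexp(x)

def multiply_split(a, b):
    """Multiply two split numbers: fractions multiply, exponents add."""
    (frac_a, exp_a), (frac_b, exp_b) = a, b
    frac, carry = math.frexp(frac_a * frac_b)   # renormalize the fraction
    return frac, exp_a + exp_b + carry
```

Multiplying 1e-200 by itself directly gives 0.0 in double precision, while
multiply_split(split(1e-200), split(1e-200)) returns a fraction between 0.5
and 1 together with a small integer exponent that represents 10^(-400)
without underflow.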
Final Adjustments
It is possible that the power function (Get_Power_10) could produce a
scaling value such that it forms a scaled result larger than the ASCII field
could allow. For example, scaling 9.9999999999999999 * 10^(4900) by
1.00000000000000010 * 10^(-4883) would produce 1.00000000000000009 *
10^(18). The scale factor is within the accuracy of the NPX and the result
is within the conversion accuracy, but it cannot be represented in BCD
format. This is why there is a post-scaling test on the magnitude of the
result. The result can be multiplied or divided by 10, depending on whether
the result was too small or too large, respectively.
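The post-scaling test can be sketched in Python as below. This is a
simplified illustration that assumes a nonzero scaled value and a field of
digits decimal digits; the function name and its parameters are
hypothetical and do not correspond one-to-one with the registers used in
the listing.

```python
def adjust_scaled(value, power, digits):
    """Bring value into [10**(digits-1), 10**digits) with one step of 10."""
    upper = 10.0 ** digits
    lower = upper / 10.0
    if value >= upper:                  # too large for the field
        return value / 10.0, power + 1
    if value < lower:                   # too small: a leading digit is unused
        return value * 10.0, power - 1
    return value, power                 # already in range
```

Only one step is needed in either direction because the magnitude estimate
is accurate to within one decimal order of magnitude.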
Output Format
For maximum flexibility in output formats, the position of the decimal
point is indicated by a binary integer called the power value. If the power
value is zero, then the decimal point is assumed to be at the right of the
rightmost digit. Power values greater than zero indicate how many trailing
zeros are not shown. For each unit below zero, move the decimal point one
position to the left in the string.
The last step of the conversion is storing the result in BCD and indicating
where the decimal point lies. The BCD string is then unpacked into ASCII
decimal characters. The ASCII sign is set corresponding to the sign of the
original value.
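A caller might unpack the BCD value and apply the power value roughly as in
this Python sketch. It assumes the 10-byte packed BCD layout stored by
FBSTP (18 digits in bytes 0 through 8, least significant byte first, sign
in the top bit of byte 9); the function names are hypothetical.

```python
def unpack_bcd(bcd):
    """Convert a 10-byte packed BCD value to a signed digit string."""
    sign = "-" if bcd[9] & 0x80 else "+"
    # Most significant byte first, high nibble before low nibble.
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in reversed(bcd[:9]))
    return sign + (digits.lstrip("0") or "0")

def place_decimal_point(digits, power):
    """Apply the power value to an unsigned digit string."""
    if power >= 0:
        return digits + "0" * power             # trailing zeros not shown
    shift = -power
    if shift >= len(digits):
        return "0." + "0" * (shift - len(digits)) + digits
    return digits[:-shift] + "." + digits[-shift:]
```

With the digits "125", a power value of -3 gives "0.125", a power value of
0 gives "125", and a power value of 2 gives "12500".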
Trigonometric Calculation Examples
The 80287 instruction set does not provide a complete set of trigonometric
functions that can be used directly in calculations. Rather, the basic
building blocks for implementing trigonometric functions are provided by the
FPTAN and FPREM instructions. The example in figure 4-7 shows how three
trigonometric functions (sine, cosine, and tangent) can be implemented
using the 80287. All three functions accept a valid angle argument between
-2^(62) and +2^(62). These functions may be called from PL/M-286,
Pascal-286, FORTRAN-286, or ASM286 routines.
These trigonometric functions use the partial tangent instruction together
with trigonometric identities to calculate the result. They are accurate to
within 16 units of the low 4 bits of an extended precision value. The
functions are coded for speed and small size, with tradeoffs available for
greater accuracy.
SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE TRIG_FUNCTIONS
OBJECT MODULE PLACED IN :F3:TRIG.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:TRIG.AP2
LOC OBJ LINE SOURCE
1 +1 $title("80287 Trigonometric Functions")
2
3 name trig_functions
4 public sine,cosine,tangent
5
---- 6 stack stackseg 6 ; R
7
# 8 sw_287 record res1:1,cond3:1,top:3,co
9 res2:8
10
---- 11 code segment er public
12 ;
13 ; Define local constants
14 ;
15 even
0000 35C26821A2DA0F 16 pi_quarter dt 3FFEC90FDAA22168C235R ;
C9FE3F
000A 0000C0FF 17 indefinite dd 0FFC00000R ; I
18 +1 $eject
19 ;
20 ; This subroutine calculates the sine o
21 ; radians. The angle is in ST(0), the re
22 ; The result is accurate to within 7 unit
23 ; bits of the NPX extended real format.
24 ;
25 ; sine: procedure (angle) real external
26 ; declare angle real;
27 ; end sine;
28 ;
29 ; cosine: procedure (angle) real external
30 ; declare angle real;
31 ; end cosine;
32 ;
33 ; Three stack registers are required.
34 ; defined as follows for the following ar
35 ;
36 ; angle
37 ;
38 ; valid or unnormal less than 2**
39 ; zero
40 ; denormal
41 ; valid or unnormal greater than
42 ; infinity
43 ; NAN
44 ; empty
45 +1 $eject
46 ;
47 ; This function is based on the NPX fpt
48 ; instruction will only work with an angl
49 ; instruction, the sine or cosine of angl
50 ; calculated. The technique used by this
51 ; sine or cosine by using one of four pos
52 ;
53 ; Let R = |angle mod PI/4|
54 ; S = -1 or 1, according
55 ;
56 ; 1) sin(R) 2) cos(R) 3) sin(PI/4-R)
57 ;
58 ; The choice of the relation and the si
59 ; decision table shown below based on the
60 ;
61 ; octant sine
62 ;
63 ; 0 s*1
64 ; 1 s*4
65 ; 2 s*2
66 ; 3 s*3
67 ; 4 -s*1
68 ; 5 -s*4
69 ; 6 -s*2
70 ; 7 -s*3
71 ;
72 +1 $eject
73 ;
74 ; Angle to sine function is a zero or u
75 ;
000E 76 sine_zero_unnormal:
77
000E DDD9 78 fstp st(1)
0010 7501 79 jnz enter_sine_normalize
80 ;
81 ; Angle is a zero.
82 ;
0012 C3 83 ret
84 ;
85 ; Angle is an unnormal
86 ;
0013 87 enter_sine_normalize:
88
0013 E80901 89 call normalize_value
0016 EB2F 90 jmp short enter_sine
91
0018 92 cosine proc
93
0018 D9E5 94 fxam
001A 9BDFE0 95 fstsw ax
001D 2EDB2E0000 R 96 fld pi_quarter
0022 B101 97 mov cl,1
0024 9E 98 sahf
0025 7263 99 jc funny_parameter
100
101 ;
102 ; Angle is unnormal, normal, zero, deno
103 ;
0027 D9C9 104 fxch
0029 7A1C 105 jpe enter_sine
106 ;
107 ; Angle is an unnormal or zero
108 ;
002B DDD9 109 fstp st(1)
002D 75E4 110 jnz enter_sine_normalize
111 ;
112 ; Angle is a zero, cos(0) = 1.0
113 ;
002F DDD8 114 fstp st(0) ; R
0031 D9E8 115 fld1 ; R
0033 C3 116 ret
117 ;
118 ; All work is done as a sine function.
119 ; a cosine is converted to a sine. Of co
120 ; done to the argument but rather to the
121 ;
0034 122 sine: ; E
123
0034 D9E5 124 fxam ; L
0036 9BDFE0 125 fstsw ax ; L
0039 2EDB2E0000 R 126 fld pi_quarter ; G
003E 9E 127 sahf ; C
003F 7249 128 jc funny_parameter ; J
129 ;
130 ; Angle is unnormal, normal, zero, or d
131 ;
0041 D9C9 132 fxch ; S
0043 B100 133 mov cl,0 ; S
0045 7BC7 134 jpo sine_zero_unnormal ; J
135 ;
136 ; ST(0) is either a normal or denormal
137 ; Use the fprem instruction to accurately
138 ; angle to within 0 and PI/4 in magnitude
139 ; angle in one shot, the angle is too big
140 ; radians. Any roundoff error in the cal
141 ; could completely change the result of t
142 ; call this very rare case an error.
143 ;
0047 144 enter_sine:
0047 D9F8 145 fprem ; R
146 ; N
147 ; d
148 ; F
149 ; w
150 ; u
0049 93 151 xchg ax,bx ; S
004A 9BDFE0 152 fstsw ax ; C
153 ; Q
004D 93 154 xchg ax,bx ; P
004E F6C704 155 test bh,high(mask cond2) ; s
0051 7544 156 jnz angle_too_big
157 ;
158 ; Set sign flags and test for which eig
159 ; angle fell into.
160 ;
161 ; Assert -PI/4 < st(0) < PI/4
162 ;
0053 D9E1 163 fabs ; F
164 ; c
0055 0AC9 165 or cl,cl ; T
0057 740F 166 jz sine_select ; J
167 ;
168 ; This is a cosine function. Ignore th
169 ; and add a quarter revolution to the oct
170 ; cos(A) = sin(A+PI/2) and cos(|A|) = cos
171 ;
0059 80E4FD 172 and ah,not high(mask cond1) ; T
005C 80CF80 173 or bh,80H ; P
174 ; s
175 ; S
005F 80C740 176 add bh,high(mask cond3) ; C
0062 B000 177 mov al,0 ; E
0064 D0D0 178 rcl al,1 ; P
0066 32FB 179 xor bh,al ; A
180 ; C
181 ;
182 ; See if the argument should be reverse
183 ; which the argument fell during fprem.
184 ;
0068 185 sine_select:
186
0068 F6C702 187 test bh,high(mask cond1) ; R
006B 7404 188 jz no_sine_reverse
189 ;
190 ; Angle was in octants 1,3,5,7.
191 ;
006D DEE9 192 fsub ; I
006F EB0E 193 jmp short do_sine_fptan ; 0
194 ;
195 ; Angle was in octants 0,2,4,6
196 ; Test for a zero argument since fptan wi
197 ;
0071 198 no_sine_reverse:
199
0071 D9E4 200 ftst ; T
0073 91 201 xchg ax,cx
0074 9BDFE0 202 fstsw ax ; c
0077 91 203 xchg ax,cx
0078 DDD9 204 fstp st(1) ; R
007A F6C540 205 test ch,high(mask cond3) ; I
007D 7514 206 jnz sine_argument_zero
207 ;
208 ; Assert: 0 < st(0) <= PI/4
209 ;
007F 210 do_sine_fptan:
211
007F D9F2 212 fptan
213
0081 214 after_sine_fptan:
215
0081 F6C742 216 test bh,high(mask cond3 + mask cond1
0084 7B1A 217 jpo x_numerator
218
219 ;
220 ; Calculate the sine of the argument
221 ; sine(A) = tan(A)/sqrt(1+tan(A)**2)
222 ; sin(A) = Y/sqrt(X*X + Y*Y)
223 ;
0086 D9C1 224 fld st(1)
0088 EB1A 225 jmp short finish_sine
226 ;
227 ; The top of the stack is either NAN, i
228 ;
008A 229 funny_parameter:
230
008A DDD8 231 fstp st(0)
008C 7404 232 jz return_empty
233
008E 7B02 234 jpo return_NAN
235 ;
236 ; st(0) is infinity. Return an indefin
237 ;
0090 D9F8 238 fprem
239
0092 240 return_NAN:
0092 241 return_empty:
242
0092 C3 243 ret
244 ;
245 ; Simulate fptan with st(0) = 0
246 ;
0093 247 sine_argument_zero:
248
0093 D9E8 249 fld1
0095 EBEA 250 jmp after_sine_fptan
251 ;
252 ; The angle was too large. Remove the
253 ; stack and return an indefinite result.
254 ;
0097 255 angle_too_big:
256
0097 DED9 257 fcompp
0099 2ED9060A00 R 258 fld indefinite
009E 9B 259 fwait
009F C3 260 ret
261 ;
262 ; Calculate the cosine of the argument
263 ; cos(A) = 1/sqrt(1+tan(A)**2) if tan(
264 ; cos(A) = X/sqrt(X*X + Y*Y)
265 ;
00A0 266 X_numerator:
267
00A0 D9C0 268 fld st(0)
00A2 D9CA 269 fxch st(2)
270
00A4 271 finish_sine:
272
00A4 DCC8 273 fmul st,st(0)
00A6 D9C9 274 fxch
00A8 DCC8 275 fmul st,st(0)
00AA DEC1 276 fadd
00AC D9FA 277 fsqrt
278
279
280 ; Form the sign of the result. The two
281 ; FXAM in bh and the C0 flag from fprem i
282 ;
00AE 80E701 283 and bh,high(mask cond0)
00B1 80E402 284 and ah,high(mask cond1)
00B4 0AFC 285 or bh,ah
00B6 7A02 286 jpe positive_sine
287
00B8 D9E0 288 fchs
289
00BA 290 positive_sine:
291
00BA DEF9 292 fdiv
00BC C3 293 ret
294
295 cosine endp
296 +1 $eject
297 ;
298 ; This function will calculate the tang
299 ; The angle, in radians is passed in ST(0
300 ; in ST(0). The tangent is calculated to
301 ; least three significant bits of an exte
302 ; PLM/86 calling format is:
303 ;
304 ; tangent: procedure (angle) real external
305 ; declare angle real;
306 ; end tangent;
307 ;
308 ; Two stack registers are used. The re
309 ; defined for the following cases:
310 ;
311 ; angle
312 ;
313 ; valid or unnormal < 2**62 in ma
314 ; 0
315 ; denormal
316 ; valid or unnormal > 2**62 in ma
317 ; NAN
318 ; infinity
319 ; empty
320 ;
321 ; The tangent instruction uses the fpta
322 ; relations are used:
323 ;
324 ; Let R = |angle MOD PI/4|
325 ; S = -1 or 1 depending on the sign
326 ;
327 ; 1) tan(R) 2) tan(PI/4-R) 3) 1/tan(R)
328 ;
329 ; The following table is used to decide
330 ; on in which octant the angle fell.
331 ;
332 ; octant relation
333 ;
334 ; 0 s*1
335 ; 1 s*4
336 ; 2 -s*3
337 ; 3 -s*2
338 ; 4 s*1
339 ; 5 s*4
340 ; 6 -s*3
341 ; 7 -s*2
342 ;
00BD 343 tangent proc
344
00BD D9E5 345 fxam
00BF 9BDFE0 346 fstsw ax
00C2 2EDB2E0000 R 347 fld pi_quarter
00C7 9E 348 sahf
00C8 72C0 349 jc funny_parameter
350 ;
351 ; Angle is unnormal, normal, zero, or d
352 ;
00CA D9C9 353 fxch
00CC 7A17 354 jpe tan_zero_unnormal
355 ;
356 ; Angle is either an normal or denormal
357 ; Reduce the angle to the range -PI/4 < re
358 ; If fprem cannot perform this operation
359 ; angle must be > 2**62. Such an angle i
360 ; errors could make a very large differen
361 ; It is safest to call this very rare cas
362 ;
00CE 363 tan_normal:
364
00CE D9F8 365 fprem
366
00D0 93 367 xchg ax,bx
00D1 9BDFE0 368 fstsw ax
369
00D4 93 370 xchg ax,bx
00D5 F6C704 371 test bh,high(mask cond2)
00D8 75BD 372 jnz angle_too_big
373 ;
374 ; See if the angle must be reversed.
375 ;
376 ; Assert -PI/4 < st(0) < PI/4
377 ;
00DA D9E1 378 fabs
379
00DC F6C702 380 test bh,high(mask cond1)
00DF 740E 381 jz no_tan_reverse
382 ;
383 ; Angle fell in octants 1,3,5,7. Rever
384 ;
00E1 DEE9 385 fsub
00E3 EB18 386 jmp short do_tangent
387 ;
388 ; Angle is either zero or an unnormal
389 ;
00E5 390 tan_zero_unnormal:
391
00E5 DDD9 392 fstp st(1)
00E7 7405 393 jz tan_angle_zero
394 ;
395 ; Angle is an unnormal.
396 ;
00E9 E83300 397 call normalize_value
00EC EBE0 398 jmp tan_normal
399
00EE 400 tan_angle_zero:
401
00EE C3 402 ret
403 ;
404 ; Angle fell in octants 0,2,4,6. Test f
405 ;
00EF 406 no_tan_reverse:
407
00EF D9E4 408 ftst
00F1 91 409 xchg ax,cx
00F2 9BDFE0 410 fstsw ax
00F5 91 411 xchg ax,cx
00F6 DDD9 412 fstp st(1)
00F8 F6C540 413 test ch,high(mask cond3)
00FB 7515 414 jnz tan_zero
415
00FD 416 do_tangent:
417
00FD D9F2 418 fptan
419
00FF 420 after_tangent:
421 ;
422 ; Decide on the order of the operands a
423 ; operation while the fptan instruction i
424 ;
00FF 8AC7 425 mov al,bh
0101 254002 426 and ax,mask cond1 + high(mask cond3
427
0104 F6C742 428 test bh,high(mask cond1 + mask cond3
429
0107 7B0D 430 jpo reverse_divide
431
432 ;
433 ; Angle was in octants 0,3,4,7
434 ; Test for the sign of the result. Two n
435 ;
0109 0AC4 436 or al,ah
010B 7A02 437 jpe positive_divide
438
010D D9E0 439 fchs
440
010F 441 positive_divide:
442
010F DEF9 443 fdiv
0111 C3 444 ret
445
0112 446 tan_zero:
447
0112 D9E8 448 fld1
0114 EBE9 449 jmp after_tangent
450 ;
451 ; Angle was in octants 1,2,5,6
452 ; Set the correct sign of the result
453 ;
0116 454 reverse_divide:
455
0116 0AC4 456 or al,ah
0118 7A02 457 jpe positive_r_divide
458
011A D9E0 459 fchs
460
011C 461 positive_r_divide:
462
011C DEF1 463 fdivr
011E C3 464 ret
465
466 tangent endp
467 ;
468 ; This function will normalize the valu
469 ; Then PI/4 is placed into st(1).
470 ;
011F 471 normalize_value:
472
011F D9E1 473 fabs
0121 D9F4 474 fxtract
0123 D9E8 475 fld1
0125 DCC1 476 fadd st(1),st
0127 DEE9 477 fsub
0129 D9FD 478 fscale
012B DDD9 479 fstp st(1)
012D 2EDB2E0000 R 480 fld pi_quarter
0132 D9C9 481 fxch
0134 C3 482 ret
483
---- 484 code ends
485
ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS
FPTAN and FPREM
These trigonometric functions use the FPTAN instruction of the NPX. FPTAN
requires that the angle argument be between 0 and π/4 radians (0 to 45
degrees). The FPREM instruction is used to reduce the argument down to this
range. The low three quotient bits set by FPREM identify which octant the
original angle was in.
One FPREM instruction iteration can reduce angles of 10^(18) radians or less
in magnitude to π/4! Larger values can be reduced, but the meaning of the
result is questionable, because any errors in the least significant bits of
that value represent changes of 45 degrees or more in the reduced angle.
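The reduction described above can be modelled outside the NPX. The Python sketch below is an illustration only and is not taken from the listings: the name reduce_to_octant is hypothetical, math.fmod stands in for FPREM (both chop the quotient toward zero), and double precision replaces the 80287 Temporary Real format.

```python
import math

QUARTER_PI = math.pi / 4  # double-precision stand-in for the NPX constant


def reduce_to_octant(angle):
    """Return (reduced, octant): 0 <= reduced < pi/4 and octant in 0..7."""
    magnitude = abs(angle)                       # the sign is handled separately
    reduced = math.fmod(magnitude, QUARTER_PI)   # chopped remainder, like FPREM
    quotient = round((magnitude - reduced) / QUARTER_PI)
    return reduced, quotient & 7                 # keep the low three quotient bits
```

For example, an angle of 1.0 radian lies in octant 1, and the reduced angle is 1.0 - π/4.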
Cosine Uses Sine Code
To save code space, the cosine function uses most of the sine function
code. The relation sin(|A| + π/2) = cos(A) is used to convert the
cosine argument into a sine argument. Adding π/2 to the angle is performed
by adding 010B to the FPREM quotient bits identifying the argument's
octant.
It would be very inaccurate to add π/2 to the cosine argument if it were
very much different from π/2.
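A minimal Python sketch of the octant adjustment follows. It is an illustration under stated assumptions, not the library code: sine_from_octant and cosine are hypothetical names, math.fmod stands in for FPREM, and math.sin and math.cos on the reduced angle stand in for the values the library derives from the FPTAN ratio.

```python
import math

QUARTER_PI = math.pi / 4


def sine_from_octant(reduced, octant):
    """Sine of (octant * pi/4 + reduced), for 0 <= reduced < pi/4."""
    if octant & 1:                        # octants 1,3,5,7: reverse the angle
        reduced = QUARTER_PI - reduced
    if (octant & 3) in (0, 3):            # octants 0,3,4,7: sine relation
        value = math.sin(reduced)
    else:                                 # octants 1,2,5,6: cosine relation
        value = math.cos(reduced)
    return -value if octant & 4 else value  # octants 4..7 lie in the lower half


def cosine(angle):
    """cos(A) = sin(|A| + pi/2), done by adding 010B to the octant number."""
    magnitude = abs(angle)
    reduced = math.fmod(magnitude, QUARTER_PI)
    octant = round((magnitude - reduced) / QUARTER_PI) & 7
    return sine_from_octant(reduced, (octant + 0b010) & 7)
```

Because π/2 is added to the octant number and not to the angle itself, the reduced angle keeps all of its precision.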
Depending on which octant the argument falls in, a different relation will
be used in the sine and tangent functions. The program listings show which
relations are used.
For the tangent function, the ratio produced by FPTAN will be directly
evaluated. The sine function will use either a sine or cosine relation
depending on which octant the angle fell into. On exit, these functions will
normally leave a divide instruction in progress to maintain concurrency.
If the input angles are of a restricted range, such as from 0 to 45
degrees, then considerable optimization is possible since full angle
reduction and octant identification are not necessary.
All three functions begin by looking at the value given to them. Not a
Number (NaN), infinity, or empty registers must be specially treated.
Unnormals need to be converted to normal values before the FPTAN instruction
will work correctly. Denormals will be converted to very small unnormals
that do work correctly for the FPTAN instruction. The sign of the angle is
saved to control the sign of the result.
Within the functions, close attention was paid to maintain concurrent
execution of the 80287 and host. The concurrent execution will effectively
hide the execution time of the decision logic used in the program.
Appendix A Machine Instruction Encoding and Decoding
Machine instructions for the 80287 come in one of five different forms as
shown in table A-1. In all cases, the instructions are at least two bytes
long and begin with the bit pattern 11011B, which identifies the ESCAPE
class of instructions. Instructions that reference memory operands are
encoded much like similar CPU instructions, because all of the CPU
memory-addressing modes may be used with ESCAPE instructions.
Note that several of the processor control instructions (see table 2-11 in
Chapter Two) may be preceded by an assembler-generated CPU WAIT instruction
(encoding: 10011011B) if they are programmed using the WAIT form of their
mnemonics. The ASM286 assembler inserts a WAIT instruction only before these
specific processor control instructions; all of the numeric instructions are
automatically synchronized by the 80286 CPU, and an explicit WAIT
instruction, though allowed, is not necessary.
Table A-2 lists all 80287 machine instructions in binary sequence. This
table may be used to "disassemble" instructions in unformatted memory dumps
or instructions monitored from the data bus. Users writing exception
handlers may also find this information useful to identify the offending
instruction.
OP, OP-A, OP-B: Instruction opcode, possibly split into two fields.
MOD:            Same as 80286 CPU mode field.
R/M:            Same as 80286 CPU register/memory field.
FORMAT:         Defines memory operand
                  00 = short real
                  01 = short integer
                  10 = long real
                  11 = word integer
R:              0 = return result to stack top
                1 = return result to other register
P:              0 = do not pop stack
                1 = pop stack after operation
REG:            Register stack element
                  000 = stack top
                  001 = next on stack
                  010 = third stack element, etc.
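As a first step in disassembling a memory dump, the two bytes of an ESCAPE instruction can be split into raw bit fields. The Python sketch below is a hedged illustration: split_escape and its field names are hypothetical, it handles only an optional leading WAIT byte (segment-override and other prefixes are not handled), and it does not decide which of the five forms applies, so the 3-bit fields are returned uninterpreted.

```python
ESCAPE_PATTERN = 0b11011    # top five bits of every ESCAPE opcode
WAIT_OPCODE = 0b10011011    # CPU WAIT instruction (9B hex)


def split_escape(code):
    """Split an optional WAIT plus a two-byte ESCAPE opcode into bit fields."""
    index = 0
    has_wait = len(code) > 0 and code[0] == WAIT_OPCODE
    if has_wait:
        index = 1
    if len(code) < index + 2:
        raise ValueError("need at least two opcode bytes")
    first, second = code[index], code[index + 1]
    if first >> 3 != ESCAPE_PATTERN:
        raise ValueError("not an ESCAPE instruction")
    return {
        "wait": has_wait,
        "low3": first & 0b111,            # FORMAT/OP-A or R/P/OP-A bits
        "mod": second >> 6,               # 11B means no memory operand
        "mid3": (second >> 3) & 0b111,    # opcode bits of the second byte
        "rm": second & 0b111,             # R/M, REG, or low opcode bits
    }
```

Applied to the bytes 9B DF E0 (the FSTSW AX encoding that appears in the tangent listing), the sketch reports a WAIT prefix, MOD = 11B, and R/M = 000B.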
---------------------------------------------------------------------------
The 80286/80287 operating in Real-Address mode will execute 8087 programs
without major modification. However, because of differences in the handling
of numeric exceptions by the 80287 NPX and the 8087 NPX, exception-handling
routines may need to be changed.
This appendix summarizes the differences between the 80287 NPX and the 8087
NPX, and provides details showing how 8087 programs can be ported to the
80287.
1. The 80287 signals exceptions through a dedicated ERROR line to the
80286. The 80287 error signal does not pass through an interrupt
controller (the 8087 INT signal does). Therefore, any
interrupt-controller-oriented instructions in numeric exception
handlers for the 8087 should be deleted.
2. The 8087 instructions FENI/FNENI and FDISI/FNDISI perform no useful
function in the 80287. If the 80287 encounters one of these opcodes in
its instruction stream, the instruction will effectively be ignored;
none of the 80287 internal states will be updated. While 8087 code
containing these instructions may be executed on the 80287, it is
unlikely that the exception-handling routines containing these
instructions will be completely portable to the 80287.
3. Interrupt vector 16 must point to the numeric exception handling
routine.
4. The ESC instruction address saved in the 80287 includes any leading
prefixes before the ESC opcode. The corresponding address saved in the
8087 does not include leading prefixes.
5. In Protected-Address mode, the format of the 80287's saved
instruction and address pointers is different from that of the 8087. The
instruction opcode is not saved in Protected mode; exception handlers
will have to retrieve the opcode from memory if needed.
6. Interrupt 7 will occur in the 80286 when executing ESC instructions
with either TS (task switched) or EM (emulation) of the 80286 MSW set
(TS = 1 or EM = 1). If TS is set, then a WAIT instruction will also
cause interrupt 7. An exception handler should be included in 80287
code to handle these situations.
7. Interrupt 9 will occur if the second or subsequent words of a
floating-point operand fall outside a segment's size. Interrupt 13
will occur if the starting address of a numeric operand falls outside
a segment's size. An exception handler should be included in 80287
code to report these programming errors.
8. Except for the processor control instructions, all of the 80287
numeric instructions are automatically synchronized by the 80286 CPU;
the 80286 automatically tests the BUSY line from the 80287 to ensure
that the 80287 has completed its previous instruction before executing
the next ESC instruction. No explicit WAIT instructions are required
to assure this synchronization. For the 8087 used with 8086 and 8088
processors, explicit WAITs are required before each numeric
instruction to ensure synchronization. Although 8087 programs having
explicit WAIT instructions will execute perfectly on the 80287 without
reassembly, these WAIT instructions are unnecessary.
9. Since the 80287 does not require WAIT instructions before each
numeric instruction, the ASM286 assembler does not automatically
generate these WAIT instructions. The ASM86 assembler, however,
automatically precedes every ESC instruction with a WAIT instruction.
Although numeric routines generated using the ASM86 assembler will
generally execute correctly on the 80286/20, reassembly using ASM286
may result in a more compact code image.
The processor control instructions for the 80287 may be coded using
either a WAIT or No-WAIT form of mnemonic. The WAIT forms of these
instructions cause ASM286 to precede the ESC instruction with a
CPU WAIT instruction, in the identical manner as does ASM86.
10. A recommended way to detect the presence of an 80287 in an 80286
system (or an 8087 in an 8086 system) is shown below. It assumes that
the system hardware causes the data bus to be high if no 80287 is
present to drive the data lines during the FSTSW (Store 80287 Status
Word) instruction.
FND_287:  FNINIT              ; initialize numeric processor.
          FSTSW   STAT        ; store status word into location STAT.
          MOV     AX,STAT
          OR      AL,AL       ; Zero Flag reflects result of OR.
          JZ      GOT_287     ; Zero in AL means 80287 is present.
;
; No 80287 Present
;
          SMSW    AX
          OR      AX,0004H    ; set EM bit in Machine Status Word
          LMSW    AX          ; to enable software emulation of 287.
          JMP     CONTINUE
;
; 80287 is present in system
;
GOT_287:  SMSW    AX
          OR      AX,0002H    ; set MP bit in Machine Status Word
          LMSW    AX          ; to permit normal 80287 operation.
;
; Continue . . .
;
CONTINUE:                     ; and off we go
An 80286/80287 design must place a pullup resistor on one of the low
eight data bus bits of the 80286 to be sure it is read as a high when
no 80287 is present.
The 80287 NPX, together with standard support library software, provides an
implementation of the IEEE "A Proposed Standard for Binary Floating-Point
Arithmetic," Draft 10.0, Task P754, of December 2, 1982. The 80287 Support
Library, described in the 80287 Support Library Reference Manual, Order Number
122129, is an example of such a support library.
This appendix describes the relationship between the 80287 NPX and the IEEE
Standard. Where the Standard has options, Intel's choices in implementing
the 80287 are described. Where portions of the Standard are implemented
through software, this appendix indicates which modules of the 80287 Support
Library implement the Standard. Where special software in addition to the
Support Library may be required by your application, this appendix indicates
how to write this software.
This appendix contains many terms with precise technical meanings,
specified in the 754 Standard. Where these terms are used, they have been
capitalized to emphasize the precision of their meanings. The Glossary
provides the definitions for all capitalized phrases in this appendix.
Options Implemented in the 80287
The 80287 SHORT_REAL and LONG_REAL formats conform precisely to the
Standard's Single and Double Floating-Point Numbers, respectively. The 80287
TEMP_REAL format is the same as the Standard's Double Extended format. The
Standard allows a choice of Bias in representing the exponent; the 80287
uses the Bias 16383 decimal.
For the Double Extended format, the Standard contains an option for the
meaning of the minimum exponent combined with a nonzero significand. The
Bias for this special case can be either 16383, as in all the other cases,
or 16382, making the smallest exponent equivalent to the second-smallest
exponent. The 80287 uses the Bias 16382 for this case. This allows the 80287
to distinguish between Denormal numbers (integer part is zero, fraction is
nonzero, Biased exponent is 0) and Unnormal numbers of the same value (same
as the denormal except the Biased Exponent is 1).
The Standard allows flexibility in specifying which NaNs are trapping and
which are nontrapping. The EH287.LIB module of the 80287 Support Library
provides a software implementation of nontrapping NaNs, and defines one
distinction between trapping and nontrapping NaNs: If the most significant
bit of the fractional part of a NaN is 1, the NaN is nontrapping. If it is
0, the NaN is trapping.
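The distinction can be tested with a single bit mask. The Python sketch below applies the rule stated above to the 32-bit SHORT_REAL layout (1 sign bit, 8 exponent bits, 23 fraction bits); classify_short_real is a hypothetical name, and nothing here is taken from EH287.LIB itself.

```python
def classify_short_real(bits):
    """Classify a 32-bit SHORT_REAL bit pattern by the NaN convention above."""
    exponent = (bits >> 23) & 0xFF     # 8-bit Biased Exponent
    fraction = bits & 0x7FFFFF         # 23-bit fractional part
    if exponent != 0xFF or fraction == 0:
        return "not a NaN"             # finite number or infinity
    if fraction & 0x400000:            # most significant fraction bit is 1
        return "nontrapping"
    return "trapping"
```

For the TEMP_REAL format the same test would apply to the most significant of the 63 fractional-part bits instead.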
When a masked Invalid Operation error involves two NaN inputs, the Standard
allows flexibility in choosing which NaN is output. The 80287 selects the
NaN whose absolute value is greatest.
Areas of the Standard Implemented in Software
There are five areas of the Standard that are not implemented directly in
the 80287 hardware; these areas are instead implemented in software as part
of the 80287 Support Library.
1. The Standard requires that a Normalizing Mode be provided, in which
any nonnormal operands to functions are automatically normalized
before the function is performed. The NPX provides a "Denormal
operand" exception for this case, allowing the exception handler
the opportunity to perform the normalization specified by the
Standard. The Denormal operand exception handler provided by
EH287.LIB implements the Standard's Normalizing Mode completely for
Single- and Double-precision arguments. Normalizing mode for Double
Extended operands is implemented in EH287.LIB with one non-Standard
feature, discussed in the next section.
2. The Standard specifies that in comparing two operands whose
relationship is "unordered," the equality test yield an answer of
FALSE, with no errors or exceptions. The 80287 FCOM and FTST
instructions themselves issue an Invalid Operation exception in this
case. The error handler EH287.LIB filters out this Invalid Operation
error using the following convention: Whenever an FCOM or FTST
instruction is followed by a MOV AX,AX instruction (8BC0 Hex), and
neither argument is a trapping NaN, the error handler will assume that
a Standard equality comparison was intended, and return the correct
answer with the Invalid Operation exception flag erased. Note that the
Invalid Operation exception must be unmasked for this action to
occur.
3. The Standard requires that two kinds of NaNs be provided: trapping
and nontrapping. Nontrapping NaNs will not cause further Invalid
Operation errors when they occur as operands to calculations. The NPX
hardware directly supports only trapping NaNs; the EH287.LIB
software implements nontrapping NaNs by returning the correct answer
with the Invalid Operation exception flag erased. Note that the
Invalid Operation exception must be unmasked for this action to occur.
4. The Standard requires that all functions that convert real numbers to
integer formats automatically normalize the inputs if necessary. The
integer conversion functions contained in CEL287.LIB fully meet the
Standard in this respect; the 80287 FIST instruction alone does not
perform this normalization.
5. The Standard specifies the remainder function, which is provided by
mqerRMD in CEL287.LIB. The 80287 FPREM instruction returns answers
within a different range.
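The difference in range noted in item 5 can be seen with ordinary library functions. In the Python sketch below, math.remainder follows the IEEE definition (quotient rounded to nearest), while math.fmod chops the quotient toward zero, which is used here as a stand-in for FPREM. The wrapper names are hypothetical, and neither function is the mqerRMD routine.

```python
import math


def standard_remainder(x, y):
    """IEEE-style remainder: magnitude of the result is at most |y|/2."""
    return math.remainder(x, y)


def chopped_remainder(x, y):
    """FPREM-style remainder: magnitude is less than |y|, sign follows x."""
    return math.fmod(x, y)
```

For x = 5 and y = 3, the Standard's remainder is -1 (the quotient rounds to 2), while the chopped remainder is 2 (the quotient chops to 1).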
Additional Software to Meet the Standard
There are two cases in which additional software is required in conjunction
with the 80287 Support Library in order to meet the standard. The 80287
Support Library does not provide this software in the interest of saving
space and because the vast majority of applications will never encounter
these cases.
1. When the Invalid Operation exception is masked, Nontrapping NaNs are
not implemented fully. Likewise, the Standard's equality test for
"unordered" operands is not implemented when the Invalid Operation
exception is masked. Programmers can simulate the Standard notion of
a masked Invalid Operation exception by unmasking the 80287 Invalid
Operation exception, and providing an Invalid Operation exception
handler that supports nontrapping NaNs and the equality test, but
otherwise acts just as if the Invalid Operation exception were
masked. The 80287 Support Library Reference Manual contains examples
for programming this handler in both ASM286 and PL/M-286.
2. In Normalizing Mode, Denormal operands in the TEMP_REAL format are
converted to 0 by EH287.LIB, giving sharp Underflow to 0. The Standard
specifies that the operation be performed on the real numbers
represented by the denormals, giving gradual underflow. To correctly
perform such arithmetic while in Normalizing Mode, programmers would
have to normalize the operands into a format identical to TEMP_REAL
except for two extra exponent bits, then perform the operation on
those numbers. Thus, software must be written to handle the 17-bit
exponent explicitly.
In designing the EH287.LIB, it was felt that it would be a disadvantage to
most users to increase the size of the Normalizing routine by the amount
necessary to provide this expanded arithmetic. Because the TEMP_REAL
exponent field is so much larger than the LONG_REAL exponent field, it is
extremely unlikely that TEMP_REAL underflow will be encountered in most
applications.
If meeting the Standard is a more important criterion for your application
than the choice between Normalizing and warning modes, then you can select
warning mode (Denormal operand exceptions masked), which fully meets the
Standard.
If you do wish to implement the Normalization of denormal operands in
TEMP_REAL format using extra exponent bits, the list below indicates some
useful pointers about handling Denormal operand exceptions:
1. TEMP_REAL numbers are considered Denormal by the NPX whenever the
Biased Exponent is 0 (minimum exponent). This is true even if the
explicit integer bit of the significand is 1. Such numbers can occur
as the result of Underflow.
2. The 80287 FLD instruction can cause a Denormal Operand error if a
number is being loaded from memory. It will not cause this exception
if the number is being loaded from elsewhere in the 80287 stack.
3. The 80287 FCOM and FTST instructions will cause a Denormal Operand
exception for unnormal operands as well as for denormal operands.
4. In cases where both the Denormal Operand and Invalid Operation
exceptions occur, you will want to know which is signalled first. When
a comparison instruction operates between a nonexistent stack element
and a denormal number in 80286 memory, the D and I exceptions are
issued simultaneously. In all other situations, a Denormal Operand
exception takes precedence over a nonstack Invalid operation
exception, while a stack Invalid Operation exception takes precedence
over a Denormal Operand exception.
This glossary defines many terms that have precise technical meanings as
specified in the IEEE 754 Standard. Where these terms are used, they have
been capitalized to emphasize the precision of their meanings.
Affine Mode:
a state of the 80287, selected in the 80287 Control Word, in which
infinities are treated as having a sign. Thus, the values +INFINITY and
-INFINITY are considered different; they can be compared with finite
numbers and with each other.
Base:
(1) a term used in logarithms and exponentials. In both contexts, it is
a number that is being raised to a power. The two equations (y = log
base b of x) and (b^(y) = x) are the same.
Base:
(2) a number that defines the representation being used for a string of
digits. Base 2 is the binary representation; Base 10 is the decimal
representation; Base 16 is the hexadecimal representation. In each case,
the Base is the factor of increased significance for each succeeding
digit (working up from the bottom).
Bias:
the difference between the unsigned Integer that appears in the Exponent
field of a Floating-Point Number and the true Exponent that it
represents. To obtain the true Exponent, you must subtract the Bias from
the given Exponent. For example, the Short Real format has a Bias of 127
whenever the given Exponent is nonzero. If the 8-bit Exponent field
contains 10000011, which is 131, the true Exponent is 131-127, or +4.
Biased Exponent:
the Exponent as it appears in a Floating-Point Number, interpreted as an
unsigned, positive number. In the above example, 131 is the Biased
Exponent.
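The arithmetic in the Bias example can be checked mechanically. This Python sketch is an illustration with hypothetical names; it packs a value as a 32-bit IEEE single (the Short Real layout) with the standard struct module and separates the fields.

```python
import struct

SHORT_REAL_BIAS = 127


def short_real_fields(value):
    """Return (sign, biased_exponent, significand_field) of a Short Real."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF   # 8-bit Exponent field
    significand_field = bits & 0x7FFFFF     # 23 explicit Significand bits
    return sign, biased_exponent, significand_field


def true_exponent(biased_exponent):
    """Subtract the Bias to obtain the true Exponent (nonzero fields only)."""
    return biased_exponent - SHORT_REAL_BIAS
```

The value 16.0 (1.0 times 2 to the 4th power) has the Exponent field 10000011, or 131, and true_exponent(131) gives +4.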
Binary Coded Decimal:
a method of storing numbers that retains a base 10 representation. Each
decimal digit occupies 4 full bits (one hexadecimal digit). The hex
values A through F (1010 through 1111) are not used. The 80287 supports
a Packed Decimal format that consists of 9 bytes of Binary Coded Decimal
(18 decimal digits) and one sign byte.
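A minimal Python sketch of the Packed Decimal layout follows. The function name is hypothetical, and two details are assumptions not stated in this entry: the least significant digit pair is placed at the lowest address, and the sign byte is 80 hex for negative values and 00 hex otherwise.

```python
def to_packed_decimal(number):
    """Encode an integer as 10 bytes: 9 bytes of BCD digits plus a sign byte."""
    if abs(number) >= 10 ** 18:
        raise ValueError("Packed Decimal holds at most 18 decimal digits")
    digits = f"{abs(number):018d}"            # 18 digits, zero padded
    encoded = bytearray()
    for start in range(16, -2, -2):           # least significant pair first
        high, low = int(digits[start]), int(digits[start + 1])
        encoded.append((high << 4) | low)     # one decimal digit per 4 bits
    encoded.append(0x80 if number < 0 else 0x00)   # sign byte (assumed values)
    return bytes(encoded)
```

Under these assumptions, -1234 encodes as the bytes 34 12 00 00 00 00 00 00 00 80.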
Binary Point:
an entity just like a decimal point, except that it exists in binary
numbers. Each binary digit to the right of the Binary Point is
multiplied by an increasing negative power of two.
C3-C0:
the four "condition code" bits of the 80287 Status Word. These bits are
set to certain values by the compare, test, examine, and remainder
functions of the 80287.
Characteristic:
a term used for some non-Intel computers, meaning the Exponent field of
a Floating-Point Number.
Chop:
to set the fractional part of a real number to zero, yielding the
nearest integer in the direction of zero.
Control Word:
a 16-bit 80287 register that the user can set, to determine the modes of
computation the 80287 will use, and the error interrupts that will be
enabled.
Denormal:
a special form of Floating-Point Number, produced when an Underflow
occurs. On the 80287, a Denormal is defined as a number with a Biased
Exponent that is zero. By providing a Significand with leading zeros,
the range of possible negative Exponents can be extended by the number
of bits in the Significand. Each leading zero is a bit of lost accuracy,
so the extended Exponent range is obtained by reducing significance.
Double Extended:
the Standard's term for the 80287 Temporary Real format, with more
Exponent and Significand bits than the Double (Long Real) format, and an
explicit Integer bit in the Significand.
Double Floating Point Number:
the Standard's term for the 80287's 64-bit Long Real format.
Environment:
the 14 bytes of 80287 registers affected by the FSTENV and FLDENV
instructions. It encompasses the entire state of the 80287, except for
the 8 Temporary Real numbers of the 80287 stack. Included are the
Control Word, Status Word, Tag Word, and the instruction, opcode, and
operand information provided by interrupts.
Exception:
any of the six error conditions (I, D, O, U, Z, P) signalled by the
80287.
Exponent:
(1) any power that is raised by an exponential function. For example,
the operand to the function mqerEXP is an Exponent. The Integer operand
to mqerYI2 is an Exponent.
Exponent:
(2) the field of a Floating-Point Number that indicates the magnitude of
the number. This would fall under the above more general definition (1),
except that a Bias sometimes needs to be subtracted to obtain the
correct power.
Floating-Point Number:
a sequence of data bytes that, when interpreted in a standardized way,
represents a Real number. Floating-Point Numbers are more versatile than
Integer representations in two ways. First, they include fractions.
Second, their Exponent parts allow a much wider range of magnitude than
possible with fixed-length Integer representations.
Gradual Underflow:
a method of handling the Underflow error condition that minimizes the
loss of accuracy in the result. If there is a Denormal number that
represents the correct result, that Denormal is returned. Thus, digits
are lost only to the extent of denormalization. Most computers return
zero when Underflow occurs, losing all significant digits.
Implicit Integer Bit:
a part of the Significand in the Short Real and Long Real formats that
is not explicitly given. In these formats, the entire given Significand
is considered to be to the right of the Binary Point. A single Implicit
Integer Bit to the left of the Binary Point is always 1, except in one
case. When the Exponent is the minimum (Biased Exponent is 0), the
Implicit Integer Bit is 0.
Indefinite:
a special value that is returned by functions when the inputs are such
that no other sensible answer is possible. For each Floating-Point
format there exists one Nontrapping NaN that is designated as the
Indefinite value. For binary Integer formats, the negative number
furthest from zero is often considered the Indefinite value. For the
80287 Packed Decimal format, the Indefinite value contains all 1's in
the sign byte and the uppermost digits byte.
Infinity:
a value that has greater magnitude than any Integer or any Real number.
The existence of Infinity is subject to heated philosophical debate.
However, it is often useful to consider Infinity as another number,
subject to special rules of arithmetic. All three Intel Floating-Point
formats provide representations for +INFINITY and -INFINITY. They
support two ways of dealing with Infinity: Projective (unsigned) and
Affine (signed).
Integer:
a number (positive, negative, or zero) that is finite and has no
fractional part. Integer can also mean the computer representation for
such a number: a sequence of data bytes, interpreted in a standard way.
It is perfectly reasonable for Integers to be represented in a
Floating-Point format; this is what the 80287 does whenever an Integer is
pushed onto the 80287 stack.
Invalid Operation:
the error condition for the 80287 that covers all cases not covered by
other errors. Included are 80287 stack overflow and underflow, NaN
inputs, illegal infinite inputs, out-of-range inputs, and illegal
unnormal inputs.
Long Integer:
an Integer format supported by the 80287 that consists of a 64-bit Two's
Complement quantity.
Long Real:
a Floating-Point Format supported by the 80287 that consists of a sign,
an 11-bit Biased Exponent, an Implicit Integer Bit, and a 52-bit
Significand, for a total of 64 explicit bits.
Mantissa:
a term used for some non-Intel computers, meaning the Significand of a
Floating-Point Number.
Masked:
a term that applies to each of the six 80287 Exceptions I,D,Z,O,U,P. An
exception is Masked if a corresponding bit in the 80287 Control Word is
set to 1. If an exception is Masked, the 80287 will not generate an
interrupt when the error condition occurs; it will instead provide its
own error recovery.
NaN:
an abbreviation for Not a Number; a Floating-Point quantity that does
not represent any numeric or infinite quantity. NaNs should be returned
by functions that encounter serious errors. If created during a sequence
of calculations, they are transmitted to the final answer and can
contain information about where the error occurred.
Nontrapping NaN:
a NaN in which the most significant bit of the fractional part of the
Significand is 1. By convention, these NaNs can undergo certain
operations without visible error. Nontrapping NaNs are implemented for
the 80287 via the software in EH287.LIB.
Normal:
the representation of a number in a Floating-Point format in which the
Significand has an Integer bit 1 (either explicit or Implicit).
Normalizing Mode:
a state in which nonnormal inputs are automatically converted to normal
inputs whenever they are used in arithmetic. Normalizing Mode is
implemented for the 80287 via the software in EH287.LIB.
NPX:
Numeric Processor Extension. This is the 80287.
Overflow:
an error condition in which the correct answer is finite, but has
magnitude too great to be represented in the destination format.
Packed Decimal:
an Integer format supported by the 80287. A Packed Decimal number is a
10-byte quantity, with nine bytes of 18 Binary Coded Decimal digits, and
one byte for the sign.
Pop:
to remove from a stack the last item that was placed on the stack.
Precision Control:
an option, programmed through the 80287 Control Word, that allows all
80287 arithmetic to be performed with reduced precision. Because no
speed advantage results from this option, its only use is for strict
compatibility with the IEEE Standard, and with other computer
systems.
Precision Exception:
an 80287 error condition that results when a calculation does not return
an exact answer. This exception is usually Masked and ignored; it is
used only in extremely critical applications, when the user must know if
the results are exact.
Projective Mode:
a state of the 80287, selected in the 80287 Control Word, in which
infinities are treated as not having a sign. Thus the values +INFINITY
and -INFINITY are considered the same. Certain operations, such as
comparison to finite numbers, are illegal in Projective Mode but legal
in Affine Mode. Thus Projective Mode gives you a greater degree of error
control over infinite inputs.
Pseudo Zero:
a special value of the Temporary Real format. It is a number with a zero
significand and an Exponent that is neither all zeros nor all ones.
Pseudo zeros can come about as the result of multiplication of two
Unnormal numbers; but they are very rare.
Real:
any finite value (negative, positive, or zero) that can be represented
by a decimal expansion. The fractional part of the decimal expansion can
contain an infinite number of digits. Reals can be represented as the
points of a line marked off like a ruler. The term Real can also refer
to a Floating-Point Number that represents a Real value.
Short Integer:
an Integer format supported by the 80287 that consists of a 32-bit Two's
Complement quantity. Short Integer is not the shortest 80287 Integer
format; the 16-bit Word Integer is.
Short Real:
a Floating-Point Format supported by the 80287, which consists of a
sign, an 8-bit Biased Exponent, an Implicit Integer Bit, and a 23-bit
Significand, for a total of 32 explicit bits.
Significand:
the part of a Floating-Point Number that consists of the most
significant nonzero bits of the number, if the number were written out
in an unlimited binary format. The Significand alone is considered to
have a Binary Point after the first (possibly Implicit) bit; the Binary
Point is then moved according to the value of the Exponent.
Single Extended:
a Floating-Point format, required by the Standard, that provides greater
precision than Single; it also provides an explicit Integer Significand
bit. The 80287's Temporary Real format meets the Single Extended
requirement as well as the Double Extended requirement.
Single Floating-Point Number:
the Standard's term for the 80287's 32-bit Short Real format.
Standard:
"A Proposed Standard for Binary Floating-Point Arithmetic," Draft 10.0
of IEEE Task P754, December 2, 1982.
Status Word:
a 16-bit 80287 register that can be manually set, but which is usually
controlled by side effects to 80287 instructions. It contains condition
codes, the 80287 stack pointer, busy and interrupt bits, and error
flags.
Tag Word:
a 16-bit 80287 register that is automatically maintained by the 80287.
For each space in the 80287 stack, it tells if the space is occupied by
a number; if so, it gives information about what kind of number.
Temporary Real:
the main Floating-Point Format used by the 80287. It consists of a sign,
a 15-bit Biased Exponent, and a Significand with an explicit Integer bit
and 63 fractional-part bits.
Transcendental:
one of a class of functions for which polynomial formulas are always
approximate, never exact for more than isolated values. The 80287
supports trigonometric, exponential, and logarithmic functions; all are
Transcendental.
Trapping NaN:
a NaN that causes an I error whenever it enters into a calculation or
comparison, even a nonordered comparison.
Two's Complement:
a method of representing Integers. If the uppermost bit is 0, the number
is considered positive, with the value given by the rest of the bits. If
the uppermost bit is 1, the number is negative, with the value obtained
by subtracting (2^(bit count)) from all the given bits. For example, the
8-bit number 11111100 is -4, obtained by subtracting 2^(8) from 252.
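The rule can be written as a short function. This Python sketch is illustrative, and the function name is hypothetical.

```python
def from_twos_complement(bits, bit_count):
    """Interpret an unsigned bit pattern as a Two's Complement Integer."""
    if not 0 <= bits < (1 << bit_count):
        raise ValueError("bit pattern does not fit in bit_count bits")
    if bits & (1 << (bit_count - 1)):       # uppermost bit is 1: negative
        return bits - (1 << bit_count)      # subtract 2^(bit count)
    return bits                             # uppermost bit is 0: positive
```

The call from_twos_complement(0b11111100, 8) subtracts 256 from 252 and returns -4, matching the example above.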
Unbiased Exponent:
the true value that tells how far and in which direction to move the
Binary Point of the Significand of a Floating-Point Number. For example,
if a Short Real Exponent is 131, we subtract the Bias 127 to obtain the
Unbiased Exponent +4. Thus, the Real number being represented is the
Significand with the Binary Point shifted 4 bits to the right.
Underflow:
an error condition in which the correct answer is nonzero, but has a
magnitude too small to be represented as a Normal number in the
destination Floating-Point format. The Standard specifies that an
attempt be made to represent the number as a Denormal.
Unmasked:
a term that applies to each of the six 80287 Exceptions: I,D,Z,O,U,P. An
exception is Unmasked if a corresponding bit in the 80287 Control Word
is set to 0. If an exception is Unmasked, the 80287 will generate an
interrupt when the error condition occurs. You can provide an interrupt
routine that customizes your error recovery.
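The Masked and Unmasked states can be read from a Control Word value with simple bit tests. In this Python sketch the function names are hypothetical, and the bit positions (bit 0 for I through bit 5 for P) are an assumption based on the usual control word layout of this processor family; this glossary does not list them.

```python
# Assumed bit positions of the six exception masks in the Control Word.
MASK_BITS = {"I": 0, "D": 1, "Z": 2, "O": 3, "U": 4, "P": 5}


def masked_exceptions(control_word):
    """Return the set of exceptions whose mask bit is 1 (Masked)."""
    return {name for name, bit in MASK_BITS.items()
            if (control_word >> bit) & 1}


def unmasked_exceptions(control_word):
    """Return the set of exceptions whose mask bit is 0 (Unmasked)."""
    return set(MASK_BITS) - masked_exceptions(control_word)
```

With the low six bits equal to 111110B, only the Invalid Operation exception is Unmasked under this assumed layout.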
Unnormal:
a Temporary Real representation in which the explicit Integer bit of the
Significand is zero, and the exponent is nonzero. We consider Unnormal
numbers distinct from Denormal numbers.
Word Integer:
an Integer format supported by both the 80286 and the 80287 that
consists of a 16-bit Two's Complement quantity.
Zero Divide:
an error condition in which the inputs are finite, but the correct
answer, even with an unlimited exponent, has infinite magnitude.
Index
---------------------------------------------------------------------------
B
---------------------------------------------------------------------------
Binary Integers
C
---------------------------------------------------------------------------
Comparison Instructions
Compatibility Between the 80287 and 8087
Computation Fundamentals
Concurrent (80286 and 80287) Processing
Condition Codes Interpretation
Constant Instructions
Control Word
D
---------------------------------------------------------------------------
Data Synchronization
Data Transfer Instructions
Data Types and Formats
  Binary Integers
  Decimal Integers
  Encoding of Data Type
  Infinity Control
  Precision Control
  Real Numbers
  Rounding Control
Decimal Integers
Denormalization
Denormalized Operand
Denormals
Destination Operands
E
---------------------------------------------------------------------------
EM (Emulation Mode) Bit in 80286
Emulation of 80287
Encoding of Data Types
Error Synchronization
Exception Handling Examples
Exception Handling, Numeric Processing
Exceptions, Numeric
  Automatic Exception Handling
  Handling Numeric Errors
  Inexact Result
  Invalid Operation
  Masked Response
  Numeric Overflow and Underflow
  Software Exception Handling
  Zero Divisor
Exponent Field
F
---------------------------------------------------------------------------
F2XM1 (Exponentiation)
FADD (Add Real)
FADDP (Add Real and POP)
FABS (Absolute Value)
FBLD (Packed Decimal (BCD) Load)
FBSTP (Packed Decimal (BCD) Store and Pop)
FCHS (Change Sign)
FCLEX/FNCLEX (Clear Exceptions)
FCOM (Compare Real)
FCOMP (Compare Real and Pop)
FCOMPP (Compare Real and Pop Twice)
FDECSTP (Decrement Stack Pointer)
FDISI/FNDISI
FDIV (Divide Real)
FDIV DWORD PTR (Division, Single Precision)
FDIVP (Divide Real and Pop)
FDIVR (Divide Real Reversed)
FDIVRP (Divide Real Reversed and Pop)
FENI/FNENI
FFREE (Free Register)
FIADD (Integer Add)
FICOM (Integer Compare)
FICOMP (Integer Compare and Pop)
FIDIV (Integer Divide)
FIDIVR (Integer Divide Reversed)
FILD (Integer Load)
FIMUL (Integer Multiply)
FINCSTP (Increment Stack Pointer)
FINIT/FNINIT (Initialize Processor)
FIST (Integer Store)
FISTP (Integer Store and Pop)
FISUB (Integer Subtract)
FISUBR (Integer Subtract Reversed)
FLD (Load Real)
FLD1 (Load One)
FLDCW (Load Control Word)
FLDENV (Load Environment)
FLDL2E (Load Log Base 2 of e)
FLDL2T (Load Log Base 2 of 10)
FLDLG2 (Load Log Base 10 of 2)
FLDLN2 (Load Log Base e of 2)
FLDPI (Load PI)
FLDZ (Load Zero)
FMUL (Multiply Real)
FMULP (Multiply Real and Pop)
FNOP (No Operation)
FPATAN (Partial Arctangent)
FPREM (Partial Remainder)
FPTAN (Partial Tangent)
FRNDINT (Round to Integer)
FRSTOR (Restore State)
FSAVE/FNSAVE (Save State)
FSCALE (Scale)
FSETPM (Set Protected Mode)
FSQRT (Square Root)
FST (Store Real)
FSTCW/FNSTCW (Store Control Word)
FSTENV/FNSTENV (Store Environment)
FSTP (Store Real and Pop)
FSTSW/FNSTSW (Store Status Word)
FSTSW AX/FNSTSW AX (Store Status Word in AX)
FSUB (Subtract Real)
FSUBP (Subtract Real and Pop)
FSUBR (Subtract Real Reversed)
FSUBRP (Subtract Real Reversed and Pop)
FTST (Test)
FWAIT (CPU Wait)
FXAM (Examine)
FXCH (Exchange Registers)
FXTRACT (Extract Exponent and Significand)
FYL2X (Logarithm of x)
FYL2XP1 (Logarithm of x+1)
G
---------------------------------------------------------------------------
GET$REAL$ERROR (Store, then Clear, Exception Flags)
H
---------------------------------------------------------------------------
Handling Numeric Errors
Hardware Interface
I
---------------------------------------------------------------------------
I/O Locations (Dedicated and Reserved)
IEEE P754 Standard, Implementation
Indefinite
Inexact Result
Infinity
Infinity Control
INIT$REAL$MATH$UNIT (Initialize Processor Procedure)
Initialization and Control
Instruction Coding and Decoding
Instruction Execution Times
Instruction Length
Integer Bit
Introduction to Numeric Processor 80287
Invalid Operation
L
---------------------------------------------------------------------------
Long Integer Format
Long Real Format
M
---------------------------------------------------------------------------
Machine Instruction Encoding and Decoding
Masked Response
MP (Math Present) Flag
N
---------------------------------------------------------------------------
NaN (Not a Number)
No-Wait Form
Nonnormal Real Numbers
Number System
Numeric Exceptions
Numeric Operands
Numeric Overflow and Underflow
Numeric Processor Overview
O
---------------------------------------------------------------------------
Output Format
Overflow
P
---------------------------------------------------------------------------
Packed Decimal Notation
Precision Control
PL/M-286
Pointers (Instruction/Data)
Processor Control Instructions
Programming Examples
  Comparative
  Conditional Branching
  Exception Handling
  Floating Point to ASCII Conversion
  Function Partitioning
  Special Instructions
Programming Interface
Pseudo Zeros and Zeros
R
---------------------------------------------------------------------------
Real Number Range
Real Numbers
Recognizing the 80287
Register Stack
RESTORE$REAL$STATUS (Restore Processor State)
Rounding Control
S
---------------------------------------------------------------------------
SAVE$REAL$STATUS (Save Processor State)
Scaling
SET$REAL$MODE (Set Exception Masks, Rounding, Precision, and Infinity
Controls)
Short Integer Format
Short Real Format
Significand
Software Exception Handling
Source Operands
Status Word
T
---------------------------------------------------------------------------
Tag Word
Temporary Real Format
Transcendental Instructions
Trigonometric Calculation Examples
U
---------------------------------------------------------------------------
Underflow
Unnormals
Upgradability
W
---------------------------------------------------------------------------
WAIT Form
Word Integer Format
Z
---------------------------------------------------------------------------
Zero Divisor