* MOTOROLA MICROPROCESSOR & MEMORY TECHNOLOGY GROUP
* M68000 Hi-Performance Microprocessor Division
* M68040 Software Package
*
* M68040 Software Package Copyright (c) 1993, 1994 Motorola Inc.
* All rights reserved.
*
* THE SOFTWARE is provided on an "AS IS" basis and without warranty.
* To the maximum extent permitted by applicable law,
* MOTOROLA DISCLAIMS ALL WARRANTIES WHETHER EXPRESS OR IMPLIED,
* INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
* PARTICULAR PURPOSE and any warranty against infringement with
* regard to the SOFTWARE (INCLUDING ANY MODIFIED VERSIONS THEREOF)
* and any accompanying written materials.
*
* To the maximum extent permitted by applicable law,
* IN NO EVENT SHALL MOTOROLA BE LIABLE FOR ANY DAMAGES WHATSOEVER
* (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS
* PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR
* OTHER PECUNIARY LOSS) ARISING OF THE USE OR INABILITY TO USE THE
* SOFTWARE. Motorola assumes no responsibility for the maintenance
* and support of the SOFTWARE.
*
* You are hereby granted a copyright license to use, modify, and
* distribute the SOFTWARE so long as this entire notice is retained
* without alteration in any modified and/or redistributed versions,
* and that such modified versions are clearly identified as such.
* No licenses are granted by implication, estoppel or otherwise
* under any patents or trademarks of Motorola, Inc.
*
* satan.sa 3.3 12/19/90
*
* The entry point satan computes the arctagent of an
* input value. satand does the same except the input value is a
* denormalized number.
*
* Input: Double-extended value in memory location pointed to by address
* register a0.
*
* Output: Arctan(X) returned in floating-point register Fp0.
*
* Accuracy and Monotonicity: The returned result is within 2 ulps in
* 64 significant bit, i.e. within 0.5001 ulp to 53 bits if the
* result is subsequently rounded to double precision. The
* result is provably monotonic in double precision.
*
* Speed: The program satan takes approximately 160 cycles for input
* argument X such that 1/16 < |X| < 16. For the other arguments,
* the program will run no worse than 10% slower.
*
* Algorithm:
* Step 1. If |X| >= 16 or |X| < 1/16, go to Step 5.
*
* Step 2. Let X = sgn * 2**k * 1.xxxxxxxx...x. Note that k = -4, -3,..., or 3.
* Define F = sgn * 2**k * 1.xxxx1, i.e. the first 5 significant bits
* of X with a bit-1 attached at the 6-th bit position. Define u
* to be u = (X-F) / (1 + X*F).
*
* Step 3. Approximate arctan(u) by a polynomial poly.
*
* Step 4. Return arctan(F) + poly, arctan(F) is fetched from a table of values
* calculated beforehand. Exit.
*
* Step 5. If |X| >= 16, go to Step 7.
*
* Step 6. Approximate arctan(X) by an odd polynomial in X. Exit.
*
* Step 7. Define X' = -1/X. Approximate arctan(X') by an odd polynomial in X'.
* Arctan(X) = sign(X)*Pi/2 + arctan(X'). Exit.
*
satan IDNT 2,1 Motorola 040 Floating Point Software Package
*--THE MOST LIKELY CASE, |X| IN [1/16, 16). WE USE TABLE TECHNIQUE
*--THE IDEA IS ATAN(X) = ATAN(F) + ATAN( [X-F] / [1+XF] ).
*--SO IF F IS CHOSEN TO BE CLOSE TO X AND ATAN(F) IS STORED IN
*--A TABLE, ALL WE NEED IS TO APPROXIMATE ATAN(U) WHERE
*--U = (X-F)/(1+XF) IS SMALL (REMEMBER F IS CLOSE TO X). IT IS
*--TRUE THAT A DIVIDE IS NOW NEEDED, BUT THE APPROXIMATION FOR
*--ATAN(U) IS A VERY SHORT POLYNOMIAL AND THE INDEXING TO
*--FETCH F AND SAVING OF REGISTERS CAN BE ALL HIDED UNDER THE
*--DIVIDE. IN THE END THIS METHOD IS MUCH FASTER THAN A TRADITIONAL
*--ONE. NOTE ALSO THAT THE TRADITIONAL SCHEME THAT APPROXIMATE
*--ATAN(X) DIRECTLY WILL NEED TO USE A RATIONAL APPROXIMATION
*--(DIVISION NEEDED) ANYWAY BECAUSE A POLYNOMIAL APPROXIMATION
*--WILL INVOLVE A VERY LONG POLYNOMIAL.
*--NOW WE SEE X AS +-2^K * 1.BBBBBBB....B <- 1. + 63 BITS
*--WE CHOSE F TO BE +-2^K * 1.BBBB1
*--THAT IS IT MATCHES THE EXPONENT AND FIRST 5 BITS OF X, THE
*--SIXTH BITS IS SET TO BE 1. SINCE K = -4, -3, ..., 3, THERE
*--ARE ONLY 8 TIMES 16 = 2^7 = 128 |F|'S. SINCE ATAN(-|F|) IS
*-- -ATAN(|F|), WE NEED TO STORE ONLY ATAN(|F|).
ATANMAIN:
CLR.W XDCARE(a6) ...CLEAN UP X JUST IN CASE
ANDI.L #$F8000000,XFRAC(a6) ...FIRST 5 BITS
ORI.L #$04000000,XFRAC(a6) ...SET 6-TH BIT TO 1
CLR.L XFRACLO(a6) ...LOCATION OF X IS NOW F
FMOVE.X FP0,FP1 ...FP1 IS X
FMUL.X X(a6),FP1 ...FP1 IS X*F, NOTE THAT X*F > 0
FSUB.X X(a6),FP0 ...FP0 IS X-F
FADD.S #:3F800000,FP1 ...FP1 IS 1 + X*F
FDIV.X FP1,FP0 ...FP0 IS U = (X-F)/(1+X*F)
*--WHILE THE DIVISION IS TAKING ITS TIME, WE FETCH ATAN(|F|)
*--CREATE ATAN(F) AND STORE IT IN ATANF, AND
*--SAVE REGISTERS FP2.
MOVE.L d2,-(a7) ...SAVE d2 TEMPORARILY
MOVE.L d0,d2 ...THE EXPO AND 16 BITS OF X
ANDI.L #$00007800,d0 ...4 VARYING BITS OF F'S FRACTION
ANDI.L #$7FFF0000,d2 ...EXPONENT OF F
SUBI.L #$3FFB0000,d2 ...K+4
ASR.L #1,d2
ADD.L d2,d0 ...THE 7 BITS IDENTIFYING F
ASR.L #7,d0 ...INDEX INTO TBL OF ATAN(|F|)
LEA ATANTBL,a1
ADDA.L d0,a1 ...ADDRESS OF ATAN(|F|)
MOVE.L (a1)+,ATANF(a6)
MOVE.L (a1)+,ATANFHI(a6)
MOVE.L (a1)+,ATANFLO(a6) ...ATANF IS NOW ATAN(|F|)
MOVE.L X(a6),d0 ...LOAD SIGN AND EXPO. AGAIN
ANDI.L #$80000000,d0 ...SIGN(F)
OR.L d0,ATANF(a6) ...ATANF IS NOW SIGN(F)*ATAN(|F|)
MOVE.L (a7)+,d2 ...RESTORE d2
*--THAT'S ALL I HAVE TO DO FOR NOW,
*--BUT ALAS, THE DIVIDE IS STILL CRANKING!
*--U IN FP0, WE ARE NOW READY TO COMPUTE ATAN(U) AS
*--U + A1*U*V*(A2 + V*(A3 + V)), V = U*U
*--THE POLYNOMIAL MAY LOOK STRANGE, BUT IS NEVERTHELESS CORRECT.
*--THE NATURAL FORM IS U + U*V*(A1 + V*(A2 + V*A3))
*--WHAT WE HAVE HERE IS MERELY A1 = A3, A2 = A1/A3, A3 = A2/A3.
*--THE REASON FOR THIS REARRANGEMENT IS TO MAKE THE INDEPENDENT
*--PARTS A1*U*V AND (A2 + ... STUFF) MORE LOAD-BALANCED
FADD.X FP1,FP0 ...ATAN(U), FP1 RELEASED
FMOVE.L d1,FPCR ;restore users exceptions
FADD.X ATANF(a6),FP0 ...ATAN(X)
bra t_frcinx
ATANBORS:
*--|X| IS IN d0 IN COMPACT FORM. FP1, d0 SAVED.
*--FP0 IS X AND |X| <= 1/16 OR |X| >= 16.
CMPI.L #$3FFF8000,d0
BGT.W ATANBIG ...I.E. |X| >= 16
ATANSM:
*--|X| <= 1/16
*--IF |X| < 2^(-40), RETURN X AS ANSWER. OTHERWISE, APPROXIMATE
*--ATAN(X) BY X + X*Y*(B1+Y*(B2+Y*(B3+Y*(B4+Y*(B5+Y*B6)))))
*--WHICH IS X + X*Y*( [B1+Z*(B3+Z*B5)] + [Y*(B2+Z*(B4+Z*B6)] )
*--WHERE Y = X*X, AND Z = Y*Y.
CMPI.L #$3FD78000,d0
BLT.W ATANTINY
*--COMPUTE POLYNOMIAL
FMUL.X FP0,FP0 ...FP0 IS Y = X*X
*--APPROXIMATE ATAN(-1/X) BY
*--X'+X'*Y*(C1+Y*(C2+Y*(C3+Y*(C4+Y*C5)))), X' = -1/X, Y = X'*X'
*--THIS CAN BE RE-WRITTEN AS
*--X'+X'*Y*( [C1+Z*(C3+Z*C5)] + [Y*(C2+Z*C4)] ), Z = Y*Y.
FMOVE.S #:BF800000,FP1 ...LOAD -1
FDIV.X FP0,FP1 ...FP1 IS -1/X
*--DIVIDE IS STILL CRANKING
FMOVE.X FP1,FP0 ...FP0 IS X'
FMUL.X FP0,FP0 ...FP0 IS Y = X'*X'
FMOVE.X FP1,X(a6) ...X IS REALLY X'