* MOTOROLA MICROPROCESSOR & MEMORY TECHNOLOGY GROUP
* M68000 Hi-Performance Microprocessor Division
* M68040 Software Package
*
* M68040 Software Package Copyright (c) 1993, 1994 Motorola Inc.
* All rights reserved.
*
* THE SOFTWARE is provided on an "AS IS" basis and without warranty.
* To the maximum extent permitted by applicable law,
* MOTOROLA DISCLAIMS ALL WARRANTIES WHETHER EXPRESS OR IMPLIED,
* INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
* PARTICULAR PURPOSE and any warranty against infringement with
* regard to the SOFTWARE (INCLUDING ANY MODIFIED VERSIONS THEREOF)
* and any accompanying written materials.
*
* To the maximum extent permitted by applicable law,
* IN NO EVENT SHALL MOTOROLA BE LIABLE FOR ANY DAMAGES WHATSOEVER
* (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS
* PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR
* OTHER PECUNIARY LOSS) ARISING OF THE USE OR INABILITY TO USE THE
* SOFTWARE. Motorola assumes no responsibility for the maintenance
* and support of the SOFTWARE.
*
* You are hereby granted a copyright license to use, modify, and
* distribute the SOFTWARE so long as this entire notice is retained
* without alteration in any modified and/or redistributed versions,
* and that such modified versions are clearly identified as such.
* No licenses are granted by implication, estoppel or otherwise
* under any patents or trademarks of Motorola, Inc.
*
* slogn.sa 3.1 12/10/90
*
* slogn computes the natural logarithm of an
* input value. slognd does the same except the input value is a
* denormalized number. slognp1 computes log(1+X), and slognp1d
* computes log(1+X) for denormalized X.
*
* Input: Double-extended value in memory location pointed to by address
* register a0.
*
* Output: log(X) or log(1+X) returned in floating-point register Fp0.
*
* Accuracy and Monotonicity: The returned result is within 2 ulps in
* 64 significant bit, i.e. within 0.5001 ulp to 53 bits if the
* result is subsequently rounded to double precision. The
* result is provably monotonic in double precision.
*
* Speed: The program slogn takes approximately 190 cycles for input
* argument X such that |X-1| >= 1/16, which is the usual
* situation. For those arguments, slognp1 takes approximately
* 210 cycles. For the less common arguments, the program will
* run no worse than 10% slower.
*
* Algorithm:
* LOGN:
* Step 1. If |X-1| < 1/16, approximate log(X) by an odd polynomial in
* u, where u = 2(X-1)/(X+1). Otherwise, move on to Step 2.
*
* Step 2. X = 2**k * Y where 1 <= Y < 2. Define F to be the first seven
* significant bits of Y plus 2**(-7), i.e. F = 1.xxxxxx1 in base
* 2 where the six "x" match those of Y. Note that |Y-F| <= 2**(-7).
*
* Step 3. Define u = (Y-F)/F. Approximate log(1+u) by a polynomial in u,
* log(1+u) = poly.
*
* Step 4. Reconstruct log(X) = log( 2**k * Y ) = k*log(2) + log(F) + log(1+u)
* by k*log(2) + (log(F) + poly). The values of log(F) are calculated
* beforehand and stored in the program.
*
* lognp1:
* Step 1: If |X| < 1/16, approximate log(1+X) by an odd polynomial in
* u where u = 2X/(2+X). Otherwise, move on to Step 2.
*
* Step 2: Let 1+X = 2**k * Y, where 1 <= Y < 2. Define F as done in Step 2
* of the algorithm for LOGN and compute log(1+X) as
* k*log(2) + log(F) + poly where poly approximates log(1+u),
* u = (Y-F)/F.
*
* Implementation Notes:
* Note 1. There are 64 different possible values for F, thus 64 log(F)'s
* need to be tabulated. Moreover, the values of 1/F are also
* tabulated so that the division in (Y-F)/F can be performed by a
* multiplication.
*
* Note 2. In Step 2 of lognp1, in order to preserved accuracy, the value
* Y-F has to be calculated carefully when 1/2 <= X < 3/2.
*
* Note 3. To fully exploit the pipeline, polynomials are usually separated
* into two parts evaluated independently before being added up.
*
slogn IDNT 2,1 Motorola 040 Floating Point Software Package
xdef slognd
slognd:
*--ENTRY POINT FOR LOG(X) FOR DENORMALIZED INPUT
MOVE.L #-100,ADJK(a6) ...INPUT = 2^(ADJK) * FP0
*----normalize the input value by left shifting k bits (k to be determined
*----below), adjusting exponent and storing -k to ADJK
*----the value TWOTO100 is no longer needed.
*----Note that this code assumes the denormalized input is NON-ZERO.
MoveM.L D2-D7,-(A7) ...save some registers
Clr.L D3 ...D3 is exponent of smallest norm. #
Move.L 4(A0),D4
Move.L 8(A0),D5 ...(D4,D5) is (Hi_X,Lo_X)
Clr.L D2 ...D2 used for holding K
TST.L D0 ...CHECK IF X IS NEGATIVE
BLT.W LOGNEG ...LOG OF NEGATIVE ARGUMENT IS INVALID
CMP2.L BOUNDS1,D0 ...X IS POSITIVE, CHECK IF X IS NEAR 1
BCC.W LOGNEAR1 ...BOUNDS IS ROUGHLY [15/16, 17/16]
LOGMAIN:
*--THIS SHOULD BE THE USUAL CASE, X NOT VERY CLOSE TO 1
*--X = 2^(K) * Y, 1 <= Y < 2. THUS, Y = 1.XXXXXXXX....XX IN BINARY.
*--WE DEFINE F = 1.XXXXXX1, I.E. FIRST 7 BITS OF Y AND ATTACH A 1.
*--THE IDEA IS THAT LOG(X) = K*LOG2 + LOG(Y)
*-- = K*LOG2 + LOG(F) + LOG(1 + (Y-F)/F).
*--NOTE THAT U = (Y-F)/F IS VERY SMALL AND THUS APPROXIMATING
*--LOG(1+U) CAN BE VERY EFFICIENT.
*--ALSO NOTE THAT THE VALUE 1/F IS STORED IN A TABLE SO THAT NO
*--DIVISION IS NEEDED TO CALCULATE (Y-F)/F.
*--GET K, Y, F, AND ADDRESS OF 1/F.
ASR.L #8,D0
ASR.L #8,D0 ...SHIFTED 16 BITS, BIASED EXPO. OF X
SUBI.L #$3FFF,D0 ...THIS IS K
ADD.L ADJK(a6),D0 ...ADJUST K, ORIGINAL INPUT MAY BE DENORM.
LEA LOGTBL,A0 ...BASE ADDRESS OF 1/F AND LOG(F)
FMOVE.L D0,FP1 ...CONVERT K TO FLOATING-POINT FORMAT
*--WHILE THE CONVERSION IS GOING ON, WE GET F AND ADDRESS OF 1/F
MOVE.L #$3FFF0000,X(a6) ...X IS NOW Y, I.E. 2^(-K)*X
MOVE.L XFRAC(a6),FFRAC(a6)
ANDI.L #$FE000000,FFRAC(a6) ...FIRST 7 BITS OF Y
ORI.L #$01000000,FFRAC(a6) ...GET F: ATTACH A 1 AT THE EIGHTH BIT
MOVE.L FFRAC(a6),D0 ...READY TO GET ADDRESS OF 1/F
ANDI.L #$7E000000,D0
ASR.L #8,D0
ASR.L #8,D0
ASR.L #4,D0 ...SHIFTED 20, D0 IS THE DISPLACEMENT
ADDA.L D0,A0 ...A0 IS THE ADDRESS FOR 1/F
FMOVE.X X(a6),FP0
move.l #$3fff0000,F(a6)
clr.l F+8(a6)
FSUB.X F(a6),FP0 ...Y-F
FMOVEm.X FP2/fp3,-(sp) ...SAVE FP2 WHILE FP0 IS NOT READY
*--SUMMARY: FP0 IS Y-F, A0 IS ADDRESS OF 1/F, FP1 IS K
*--REGISTERS SAVED: FPCR, FP1, FP2
LP1CONT1:
*--AN RE-ENTRY POINT FOR LOGNP1
FMUL.X (A0),FP0 ...FP0 IS U = (Y-F)/F
FMUL.X LOGOF2,FP1 ...GET K*LOG2 WHILE FP0 IS NOT READY
FMOVE.X FP0,FP2
FMUL.X FP2,FP2 ...FP2 IS V=U*U
FMOVE.X FP1,KLOG2(a6) ...PUT K*LOG2 IN MEMORY, FREE FP1
*--LOG(1+U) IS APPROXIMATED BY
*--U + V*(A1+U*(A2+U*(A3+U*(A4+U*(A5+U*A6))))) WHICH IS
*--[U + V*(A1+V*(A3+V*A5))] + [U*V*(A2+V*(A4+V*A6))]
fmove.l d1,fpcr
FADD.X KLOG2(a6),FP0 ...FINAL ADD
bra t_frcinx
LOGNEAR1:
*--REGISTERS SAVED: FPCR, FP1. FP0 CONTAINS THE INPUT.
FMOVE.X FP0,FP1
FSUB.S one,FP1 ...FP1 IS X-1
FADD.S one,FP0 ...FP0 IS X+1
FADD.X FP1,FP1 ...FP1 IS 2(X-1)
*--LOG(X) = LOG(1+U/2)-LOG(1-U/2) WHICH IS AN ODD POLYNOMIAL
*--IN U, U = 2(X-1)/(X+1) = FP1/FP0
LP1CONT2:
*--THIS IS AN RE-ENTRY POINT FOR LOGNP1
FDIV.X FP0,FP1 ...FP1 IS U
FMOVEm.X FP2/fp3,-(sp) ...SAVE FP2
*--REGISTERS SAVED ARE NOW FPCR,FP1,FP2,FP3
*--LET V=U*U, W=V*V, CALCULATE
*--U + U*V*(B1 + V*(B2 + V*(B3 + V*(B4 + V*B5)))) BY
*--U + U*V*( [B1 + W*(B3 + W*B5)] + [V*(B2 + W*B4)] )
FMOVE.X FP1,FP0
FMUL.X FP0,FP0 ...FP0 IS V
FMOVE.X FP1,SAVEU(a6) ...STORE U IN MEMORY, FREE FP1
FMOVE.X FP0,FP1
FMUL.X FP1,FP1 ...FP1 IS W
fmove.l d1,fpcr
FADD.X SAVEU(a6),FP0
bra t_frcinx
rts
LOGNEG:
*--REGISTERS SAVED FPCR. LOG(-VE) IS INVALID
bra t_operr
xdef slognp1d
slognp1d:
*--ENTRY POINT FOR LOG(1+Z) FOR DENORMALIZED INPUT
* Simply return the denorm
bra t_extdnrm
xdef slognp1
slognp1:
*--ENTRY POINT FOR LOG(1+X) FOR X FINITE, NON-ZERO, NOT NAN'S
FMOVE.X (A0),FP0 ...LOAD INPUT
fabs.x fp0 ;test magnitude
fcmp.x LTHOLD,fp0 ;compare with min threshold
fbgt.w LP1REAL ;if greater, continue
fmove.l #0,fpsr ;clr N flag from compare
fmove.l d1,fpcr
fmove.x (a0),fp0 ;return signed argument
bra t_frcinx
LP1REAL:
FMOVE.X (A0),FP0 ...LOAD INPUT
CLR.L ADJK(a6)
FMOVE.X FP0,FP1 ...FP1 IS INPUT Z
FADD.S one,FP0 ...X := ROUND(1+Z)
FMOVE.X FP0,X(a6)
MOVE.W XFRAC(a6),XDCARE(a6)
MOVE.L X(a6),D0
TST.L D0
BLE.W LP1NEG0 ...LOG OF ZERO OR -VE
CMP2.L BOUNDS2,D0
BCS.W LOGMAIN ...BOUNDS2 IS [1/2,3/2]
*--IF 1+Z > 3/2 OR 1+Z < 1/2, THEN X, WHICH IS ROUNDING 1+Z,
*--CONTAINS AT LEAST 63 BITS OF INFORMATION OF Z. IN THAT CASE,
*--SIMPLY INVOKE LOG(X) FOR LOG(1+Z).
LP1NEAR1:
*--NEXT SEE IF EXP(-1/16) < X < EXP(1/16)
CMP2.L BOUNDS1,D0
BCS.B LP1CARE
LP1ONE16:
*--EXP(-1/16) < X < EXP(1/16). LOG(1+Z) = LOG(1+U/2) - LOG(1-U/2)
*--WHERE U = 2Z/(2+Z) = 2Z/(1+X).
FADD.X FP1,FP1 ...FP1 IS 2Z
FADD.S one,FP0 ...FP0 IS 1+X
*--U = FP1/FP0
BRA.W LP1CONT2
LP1CARE:
*--HERE WE USE THE USUAL TABLE DRIVEN APPROACH. CARE HAS TO BE
*--TAKEN BECAUSE 1+Z CAN HAVE 67 BITS OF INFORMATION AND WE MUST
*--PRESERVE ALL THE INFORMATION. BECAUSE 1+Z IS IN [1/2,3/2],
*--THERE ARE ONLY TWO CASES.
*--CASE 1: 1+Z < 1, THEN K = -1 AND Y-F = (2-F) + 2Z
*--CASE 2: 1+Z > 1, THEN K = 0 AND Y-F = (1-F) + Z
*--ON RETURNING TO LP1CONT1, WE MUST HAVE K IN FP1, ADDRESS OF
*--(1/F) IN A0, Y-F IN FP0, AND FP2 SAVED.