Aalice.349
net.chess
utzoo!decvax!ucbvax!mhtsa!alice!ken
Thu Jan  7 14:31:21 1982
chess strength
This is in response to Gillogly's
recent comments on computer chess
strength (and projections of same).

I performed the following experiment:
There are 7 programs named P3, P4, ... P9.

P3 performs the standard (Chess 4.X clone)
search to 3 plies and then enters a
quiescence capture search with full
evaluation at the leaves. (Getting out
of check does not count as a ply.)

P4, P5 ... are identical programs that
search progressively deeper.

Program Px played 20 games with Py
(for all x, y with x not equal to y).
Each program played both sides of 10
randomly selected opening positions.
The positions were chosen by a random
number generator from ECO (Encyclopedia
of Chess Openings), restricted to
positions with 10 moves played by each
side that were judged (by mini-max) to
be "equal," "unclear," or "with
compensation."

Altogether there were 420 games played.
((7x7 - 7)/2 * 20.)

The performance rating of each program
was calculated (on an arbitrary base)
by finding the rating each program
would need for its expected score to
equal the score it actually obtained.
This was iterated until the ratings
converged to within 1 rating point
for all programs.
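
The iteration can be sketched as
follows. This is a minimal illustration
assuming the standard Elo expected-score
curve; the actual procedure and curve
used here may differ.

```python
# Minimal sketch of an iterative performance-rating calculation.
# Assumes the standard Elo expected-score curve; the exact method
# used in the experiment is not specified above.

def expected(ra, rb):
    """Elo expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

def performance_ratings(scores, games=20, tol=1.0, damp=100.0, base=1):
    """scores[i][j] = points program i took from program j.
    Iterates until no rating moves by more than tol (1 rating point),
    then reports ratings relative to program `base` (P4 above)."""
    n = len(scores)
    ratings = [0.0] * n
    for _ in range(100000):                       # safety cap
        new = list(ratings)
        for i in range(n):
            actual = sum(scores[i][j] for j in range(n) if j != i)
            exp = sum(games * expected(ratings[i], ratings[j])
                      for j in range(n) if j != i)
            # nudge the rating toward matching the actual score
            new[i] = ratings[i] + damp * (actual - exp) / (games * (n - 1))
        done = max(abs(a - b) for a, b in zip(new, ratings)) < tol
        ratings = new
        if done:
            break
    return [round(r - ratings[base]) for r in ratings]
```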

Here is the crosstable (a trailing "+"
denotes an extra half point):

       P3      P4      P5      P6      P7      P8      P9
P3      --      8       0       0       0       0       0+
P4      12      --      5       0+      0       0       0
P5      20      15      --      3+      3       0+      0
P6      20      19+     16+     --      4       1+      1+
P7      20      20      17      16      --      5       4
P8      20      20      19+     18+     15      --      5+
P9      19+     20      20      18+     16      14+     --
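
As a sanity check, the two entries for
each pair should sum to 20 games; with
"+" read as half a point, they do:

```python
# Verify the crosstable above: reading a trailing "+" as an extra
# half point, each pair's two results must sum to 20 games.
table = [
    ['--',  '8',   '0',   '0',   '0',  '0',   '0+'],
    ['12',  '--',  '5',   '0+',  '0',  '0',   '0' ],
    ['20',  '15',  '--',  '3+',  '3',  '0+',  '0' ],
    ['20',  '19+', '16+', '--',  '4',  '1+',  '1+'],
    ['20',  '20',  '17',  '16',  '--', '5',   '4' ],
    ['20',  '20',  '19+', '18+', '15', '--',  '5+'],
    ['19+', '20',  '20',  '18+', '16', '14+', '--'],
]

def points(entry):
    """'3+' -> 3.5, '0+' -> 0.5, '16' -> 16.0"""
    half = 0.5 if entry.endswith('+') else 0.0
    return float(entry.rstrip('+')) + half

ok = all(points(table[i][j]) + points(table[j][i]) == 20
         for i in range(7) for j in range(7) if i != j)
print(ok)   # True
```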

The performance rating of the programs
(with P4 chosen as the base of 0)

P3      -134
P4      0000
P5      0335
P6      0591
P7      0796
P8      0973
P9      1093

If you plot the performance data,
you will note that P3 performs
better than expected. This is real:
due to edge effects in my chess
hardware, P3 searches approximately
3.2 plies on average. This is why P4
was chosen as the zero point.

Other empirical results:

1. Newborn: double the power = 100 rating points.
  If you assume 5.5 branching factor, this means
  that a ply will yield 250 points.
  In the linear range of P4-P7 (in fact the
  range observed by Newborn in existing programs)
  this relation holds.
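
The arithmetic behind that conversion:

```python
import math

# Newborn's rule: doubling the node count is worth about 100 points.
# One extra ply multiplies the node count by the branching factor,
# i.e. log2(5.5) ~ 2.46 doublings per ply:
points_per_ply = 100 * math.log2(5.5)
print(round(points_per_ply))   # 246, roughly the 250 quoted above
```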

2. Thompson: rating = 400 * nodes ** (1/8)
  Cahlander: rating = 400 * 2**(1/8) * m**(d/16)
       m = legal moves, d = depth (ply)
  This formula predicted the performance up
  to about 2000, but is obviously not valid
  above that.
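
The two forms agree if the node count
is taken as roughly 2 * m**(d/2), about
the size of a minimal alpha-beta tree.
That substitution is an assumption of
this sketch, not stated above.

```python
# Check that Cahlander's form matches Thompson's form under the
# assumption nodes = 2 * m**(d/2) (minimal alpha-beta tree, roughly).
def rating_thompson(nodes):
    return 400 * nodes ** (1 / 8)

def rating_cahlander(m, d):
    return 400 * 2 ** (1 / 8) * m ** (d / 16)

m, d = 30, 8                      # 30 legal moves, 8 plies (illustrative)
nodes = 2 * m ** (d / 2)
print(abs(rating_thompson(nodes) - rating_cahlander(m, d)) < 1e-9)   # True
```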

3. Gillogly: linear in time, 80 points/30 minutes.
  This is based on a few games over a very
  small time range (a factor of 5). I think
  it was optimistic to extrapolate
  these results.

I urge curve-fitters to guess at the
nature of the P4-P9 ratings. I would
also like to see discussion of the
relevance of the whole experiment.
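
For the curve-fitters, the per-ply
gains implied by the table above:

```python
# Per-ply rating gains from the P4-P9 figures above; a candidate
# curve should reproduce this diminishing return per added ply.
ratings = {4: 0, 5: 335, 6: 591, 7: 796, 8: 973, 9: 1093}
gains = [ratings[d + 1] - ratings[d] for d in range(4, 9)]
print(gains)   # [335, 256, 205, 177, 120]
```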

Ken Thompson

-----------------------------------------------------------------
gopher://quux.org/ conversion by John Goerzen <[email protected]>
of http://communication.ucsd.edu/A-News/


This Usenet Oldnews Archive
article may be copied and distributed freely, provided:

1. There is no money collected for the text(s) of the articles.

2. The following notice remains appended to each copy:

The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996
Bruce Jones, Henry Spencer, David Wiseman.