Aalice.349
net.chess
utzoo!decvax!ucbvax!mhtsa!alice!ken
Thu Jan  7 14:31:21 1982
chess strength
This is in response to Gillogly's
recent comments on computer chess
strength (and projections of same).

I performed the following experiment:
There are 7 programs named P3, P4, ... P9.

P3 performs the standard (Chess 4.X clone)
search to 3 plies and then enters a
quiescence capture search with full
evaluation at the leaves. (Getting out
of check does not count as a ply.)

P4, P5 ... are identical programs that
search progressively deeper.

Program Px played 20 games with Py
(for all x, y with x not equal to y).
Each program played both sides of 10
randomly selected opening positions.
The positions were chosen by a random
number generator from ECO (Encyclopedia
of Chess Openings), restricted to
positions with 10 moves played by each
side that were judged (by mini-max) to
be "equal," "unclear," or "with
compensation."

Altogether there were 420 games played.
((7x7 - 7)/2 * 20.)

The performance rating of each program
was calculated (on an arbitrary base)
by finding the rating each program
would need for its expected score to
equal the score it actually obtained.
This was iterated until the ratings
converged to within 1 rating point
for all programs.
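
The iteration can be sketched as
follows. This is a minimal illustration
assuming the standard Elo expected-score
curve; the actual procedure and curve
used here may differ.

```python
# Minimal sketch of an iterative performance-rating calculation.
# Assumes the standard Elo expected-score curve; the exact method
# used in the experiment is not specified above.

def expected(ra, rb):
    """Elo expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

def performance_ratings(scores, games=20, tol=1.0, damp=100.0, base=1):
    """scores[i][j] = points program i took from program j.
    Iterates until no rating moves by more than tol (1 rating point),
    then reports ratings relative to program `base` (P4 above)."""
    n = len(scores)
    ratings = [0.0] * n
    for _ in range(100000):                       # safety cap
        new = list(ratings)
        for i in range(n):
            actual = sum(scores[i][j] for j in range(n) if j != i)
            exp = sum(games * expected(ratings[i], ratings[j])
                      for j in range(n) if j != i)
            # nudge the rating toward matching the actual score
            new[i] = ratings[i] + damp * (actual - exp) / (games * (n - 1))
        done = max(abs(a - b) for a, b in zip(new, ratings)) < tol
        ratings = new
        if done:
            break
    return [round(r - ratings[base]) for r in ratings]
```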

Here is the crosstable (a trailing "+"
denotes an extra half point):

       P3      P4      P5      P6      P7      P8      P9
P3      --      8       0       0       0       0       0+
P4      12      --      5       0+      0       0       0
P5      20      15      --      3+      3       0+      0
P6      20      19+     16+     --      4       1+      1+
P7      20      20      17      16      --      5       4
P8      20      20      19+     18+     15      --      5+
P9      19+     20      20      18+     16      14+     --
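
As a sanity check, the two entries for
each pair should sum to 20 games; with
"+" read as half a point, they do:

```python
# Verify the crosstable above: reading a trailing "+" as an extra
# half point, each pair's two results must sum to 20 games.
table = [
    ['--',  '8',   '0',   '0',   '0',  '0',   '0+'],
    ['12',  '--',  '5',   '0+',  '0',  '0',   '0' ],
    ['20',  '15',  '--',  '3+',  '3',  '0+',  '0' ],
    ['20',  '19+', '16+', '--',  '4',  '1+',  '1+'],
    ['20',  '20',  '17',  '16',  '--', '5',   '4' ],
    ['20',  '20',  '19+', '18+', '15', '--',  '5+'],
    ['19+', '20',  '20',  '18+', '16', '14+', '--'],
]

def points(entry):
    """'3+' -> 3.5, '0+' -> 0.5, '16' -> 16.0"""
    half = 0.5 if entry.endswith('+') else 0.0
    return float(entry.rstrip('+')) + half

ok = all(points(table[i][j]) + points(table[j][i]) == 20
         for i in range(7) for j in range(7) if i != j)
print(ok)   # True
```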

The performance rating of the programs
(with P4 chosen as the base of 0)

P3      -134
P4      0000
P5      0335
P6      0591
P7      0796
P8      0973
P9      1093

If you plot the performance data,
you will note that P3 performs
better than expected. This is real:
due to edge effects in my chess
hardware, P3 searches approximately
3.2 plies on average. This is why P4
was chosen as the zero point.

Other empirical results:

1. Newborn: double the power = 100 rating points.
  If you assume 5.5 branching factor, this means
  that a ply will yield 250 points.
  In the linear range of P4-P7 (in fact the
  range observed by Newborn in existing programs)
  this relation holds.
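
The arithmetic behind that conversion:

```python
import math

# Newborn's rule: doubling the node count is worth about 100 points.
# One extra ply multiplies the node count by the branching factor,
# i.e. log2(5.5) ~ 2.46 doublings per ply:
points_per_ply = 100 * math.log2(5.5)
print(round(points_per_ply))   # 246, roughly the 250 quoted above
```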

2. Thompson: rating = 400 * nodes ** (1/8)
  Cahlander: rating = 400 * 2**(1/8) * m**(d/16)
       m = legal moves, d = depth (ply)
  This formula predicted the performance up
  to about 2000, but is obviously not valid
  above that.
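
The two forms agree if the node count
is taken as roughly 2 * m**(d/2), about
the size of a minimal alpha-beta tree.
That substitution is an assumption of
this sketch, not stated above.

```python
# Check that Cahlander's form matches Thompson's form under the
# assumption nodes = 2 * m**(d/2) (minimal alpha-beta tree, roughly).
def rating_thompson(nodes):
    return 400 * nodes ** (1 / 8)

def rating_cahlander(m, d):
    return 400 * 2 ** (1 / 8) * m ** (d / 16)

m, d = 30, 8                      # 30 legal moves, 8 plies (illustrative)
nodes = 2 * m ** (d / 2)
print(abs(rating_thompson(nodes) - rating_cahlander(m, d)) < 1e-9)   # True
```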

3. Gillogly: linear in time, 80 points/30 minutes.
  This is based on a few games over a very
  small time range (a factor of 5). I think
  it was optimistic to extrapolate
  these results.

I urge curve-fitters to guess at the
nature of the P4-P9 ratings. I would
also like to see discussion of the
relevance of the whole experiment.
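
For the curve-fitters, the per-ply
gains implied by the table above:

```python
# Per-ply rating gains from the P4-P9 figures above; a candidate
# curve should reproduce this diminishing return per added ply.
ratings = {4: 0, 5: 335, 6: 591, 7: 796, 8: 973, 9: 1093}
gains = [ratings[d + 1] - ratings[d] for d in range(4, 9)]
print(gains)   # [335, 256, 205, 177, 120]
```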

Ken Thompson

-----------------------------------------------------------------
gopher://quux.org/ conversion by John Goerzen <[email protected]>
of http://communication.ucsd.edu/A-News/


This Usenet Oldnews Archive
article may be copied and distributed freely, provided:

1. There is no money collected for the text(s) of the articles.

2. The following notice remains appended to each copy:

The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996
Bruce Jones, Henry Spencer, David Wiseman.