30-Nov-86 16:01:00-PST,16724;000000000000

Mail-From: NEUMANN created at 30-Nov-86 15:58:57
Date: Sun 30 Nov 86 15:58:57-PST
From: RISKS FORUM (Peter G. Neumann -- Coordinator) <[email protected]>
Subject: RISKS DIGEST 4.21
Sender: [email protected]
To: [email protected]

RISKS-LIST: RISKS-FORUM Digest, Sunday, 30 November 1986 Volume 4 : Issue 21

FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS
ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
Risks of Computer Modeling and Related Subjects (Mike Williams--LONG MESSAGE)

The RISKS Forum is moderated. Contributions should be relevant, sound, in good
taste, objective, coherent, concise, nonrepetitious. Diversity is welcome.
(Contributions to [email protected], Requests to [email protected])
(Back issues Vol i Issue j available in CSL.SRI.COM:<RISKS>RISKS-i.j. MAXj:
Summary Contents Vol 1: RISKS-1.46; Vol 2: RISKS-2.57; Vol 3: RISKS-3.92.)

----------------------------------------------------------------------

Date: Fri, 28 Nov 86 13:02 EST
From: "John Michael (Mike) Williams" <[email protected]>
To: [email protected]
Subject: Risks of Computer Modeling and Related Subjects (LONG MESSAGE)

Taking the meretricious "con" out of econometrics and computer modeling:
"Con"juring the Witch of Endor
John Michael Williams, Bethesda MD

Quite a few years ago, the Club of Rome perpetrated its "Limits to Growth"
public relations exercise. Although not my field, I instinctively found it
bordering on Aquarian numerology to assign a quantity, scalar or otherwise,
to "Quality of Life," and a gross abuse of both scientific method and
scientific responsibility to the culture at large. Well after the initial
report's firestorm, I heard that a researcher at McGill proved the model was
not even internally consistent, had serious typographical/syntactical errors
that produced at least an order of magnitude error, and that when the errors
were corrected, the model actually predicted an improving, not declining
"Quality of Life." I called the publisher of "Limits to Growth," into its
umpteenth edition, and asked if they intended to publish a correction or
retraction. They were not enthusiastic, what with Jerry Brown, as Governor
and candidate for Presidential nomination, providing so much lucrative
publicity. Jimmy Carter's "malaise" and other speeches suggest that these
dangerously flawed theses also affected, and not for the better, both his
campaign and administration.

This shaman-esque misuse of computers embarrassed the computing
community, but with no observable effect.

On 31 October 1986, Science ran a depressing article entitled: "Asking
Impossible Questions About the Economy and Getting Impossible Answers"
(Gina Kolata, Research News, Vol. 234, Issue 4776, pp. 545-546). The
subtitle and the sidebar insert are informative:

Some economists say that large-scale computer models of the economy are no
better at forecasting than economists who simply use their best judgment...
"People are overly impressed by answers that come out of a computer"...

Additional pertinent citations (cited with permission):

"There are two things you would be better not seeing in the making--
sausages and econometric estimates," says Edward Leamer, an economist at
[UCLA]. These estimates are used by policymakers to decide, for example,
how the new tax law will affect the economy or what would happen if a new
oil import tax were imposed. They are also used by businesses to decide
whether there is a demand for a new product. Yet the computer models that
generate these estimates, say knowledgeable critics, have so many flaws
that, in Leamer's words, it is time to take the "con out of econometrics."

...[E]ven the defenders of the models... [such as e]conomists Kenneth
Arrow of Stanford and Stephen McNees of the Federal Reserve Board in Boston
say they believe the models can be useful but also say that one reason the
models are made and their predictions so avidly purchased is that people
want answers to impossible questions and are overly impressed by answers
that come out of a computer...

The problem, says statistician David Freedman of the University of
California at Berkeley, is that "there is no economic theory that tells you
exactly what the equations should look like." Some model builders do not
even try to use economic theory...: most end up curve-fitting--a risky
business since there are an infinite number of equations that will fit any
particular data set...
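The curve-fitting hazard Freedman describes can be made concrete with a
minimal sketch. The data points and both "models" below are invented for
illustration and come from neither the Science article nor any real
econometric package. Two different equations pass exactly through the same
three observations, yet disagree as soon as they are asked about a point
outside the data:

```python
# Hypothetical observations (x, y); not taken from any real data set.
observations = [(0, 0), (1, 1), (2, 2)]

def linear_model(x):
    """Simplest equation that fits the observations: y = x."""
    return x

def cubic_model(x, c=5):
    """A second equation that fits the same observations exactly.

    The added term c*x*(x-1)*(x-2) is zero at x = 0, 1 and 2, so every
    choice of c gives another perfect fit to the observed data.
    """
    return x + c * x * (x - 1) * (x - 2)

# Both models reproduce every observation with zero error.
fits_linear = all(linear_model(x) == y for x, y in observations)
fits_cubic = all(cubic_model(x) == y for x, y in observations)

# Outside the observed range the two "equally good" models diverge.
forecast_linear = linear_model(3)
forecast_cubic = cubic_model(3)

print(fits_linear, fits_cubic, forecast_linear, forecast_cubic)
```

Because the constant c is arbitrary, the fit to past data alone cannot
choose between these equations; only theory or out-of-sample testing can.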

"What you really have," says William Ascher of Duke University, "is a man-
model system." And this system, say the critics, is hardly scientific.
Wassily Leontief of New York University remarks, "I'm very much in favor of
mathematics, but you can do silly things with mathematics as well as with
anything else."

Defenders of the models point out that economists are just making the best
of an impossible situation. Their theory is inadequate and it is
impossible to write down a set of equations to describe the economy in any
event... But the critics of the models say that none of these defenses
makes up for the fact that the models are, as Leontief says, "hot air."
Very few of the models predict accurately, the economic theory behind the
models is extremely weak if it exists at all, in many cases the data used to
build the models are of such poor quality as to be essentially useless, and
the model builders, with their subjective adjustments, produce what is,
according to Leamer, "an uncertain mixture of data and judgment."

When David Stockman made "subjective adjustments," he was reviled for
cooking the numbers. It seems they may have been hash to begin with.

[Douglas Hale, director of quality assurance at the (Federal) Energy
Information Administration] whose agency is one of the few that regularly
assess models to see how they are doing, reports that, "in many cases, the
models are oversold. The scholarship is very poor, the degree of testing
and peer review is far from adequate by any scientific measure, and there
is very little you can point to where one piece of work is a building block
for the next."

For example, the Energy Information Administration looked at the accuracy
of short-term forecasts for the cost of crude oil... At first glance, it
looks as if they did not do too badly... But, says Hale, "what we are
really interested in is how much does the price change over time. The
error in predicting change is 91%."

This is about the same error, to the hour, of a stopped clock.
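Hale's distinction between error in the level and error in the change can be
sketched as follows. The article does not say how the Energy Information
Administration computed its 91% figure, so the two formulas below are
assumptions chosen for illustration, and the prices are invented:

```python
def level_error(forecast, actual):
    """Relative error in the forecast price level (assumed definition)."""
    return abs(forecast - actual) / abs(actual)

def change_error(previous, forecast, actual):
    """Relative error in the forecast price change (assumed definition).

    Compares the predicted movement from the last known price with the
    movement that actually occurred.
    """
    predicted_change = forecast - previous
    actual_change = actual - previous
    return abs(predicted_change - actual_change) / abs(actual_change)

# Hypothetical crude oil prices in dollars per barrel.
previous_price = 30.0   # last observed price
forecast_price = 31.0   # what the model predicted
actual_price = 30.5     # what actually happened

lvl = level_error(forecast_price, actual_price)
chg = change_error(previous_price, forecast_price, actual_price)

print(f"level error:  {lvl:.1%}")
print(f"change error: {chg:.1%}")
```

With these made-up numbers the level looks accurate to within about 2%,
while the predicted change is off by the full size of the actual change.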

In the Washington Post for 23 November 1986, pg K1 et seq., in an
interview entitled "In Defense of Public Choice," Assar Lindbeck,
chairman of the Swedish Royal Academy's committee for selecting the
Nobel Prize in economics, explains the committee's choice of Professor
James M. Buchanan, and is asked by reporter Jane Seaberry:

It seems the economics profession has come into some disrepute. Economists
forecast economic growth and forecasts are wrong. The Reagan administration
has really downplayed advice from economists. What do you think about the
economics profession today?

Chairman Lindbeck replies:

Well, there's something in what you say in the following sense, I think,
that in the 1960s, it was a kind of hubris development in the economic
profession ... in the sense that it was an overestimation of what research
and scientific knowledge can provide about the possibilities of
understanding the complex economic system. And also an overestimation
about the abilities of economists to give good advice and an overestimation
of the abilities of politicians and public administrators to pursue public
policy according to that advice.

The idea about fine tuning the economy was based on an oversimplified
vision of the economy. So from that point of view, for instance,
economists engaged in forecasting--they are, in my opinion, very much
overestimating the possibilities of making forecasts because the economic
system is too complex to forecast. Buchanan has never been engaged in
forecasting. He does not even give policy advice because he thinks it's
quite meaningless...

What econometric computer model is not "an oversimplified vision of the
economy?" When is forecasting an "economic system ... too complex to
forecast" not fortune-telling?

To return to Kolata's article:

[Victor Zarnowitz of the University of Chicago] finds that "when you
combine the forecasts from the large models, and take an average, they are
no better than the average of forecasts from people who just use their best
judgment and do not use a model."

I cannot resist noting that when a President used his own judgment, and
pursued an economic policy that created the greatest Federal deficit in
history but the lowest interest rates in more than a decade, the high
priests of the dismal science called it "voodoo economics." It takes one
to know one, I guess.

Ascher finds that "econometric models do a little bit worse than judgment.
And for all the elaboration over the years they haven't gotten any better.
Refining the models hasn't helped." Ascher says he finds it "somewhat
surprising that the models perform worse than judgment since judgment is
actually part of the models; it is incorporated in when modelers readjust
their data to conform to their judgment."

Fascinating! Assuming the same persons are rendering "judgments," at
different times perhaps, it implies that the elaboration and mathematical
sophistry of the models actually cloud their judgment when expressed through
the models: they appear to have lost sight of the real forest for the
papier-mache trees.

Another way of assessing models is to ask whether you would be better off
using them, or just predicting that next year will be like this year. This
is the approach taken by McNees... "I would argue that, if you average
over all the periods [1974-1982] you would make smaller errors with the
models [on GNP and inflation rates] than you would by simply assuming that
next year will be just like this year," he says. "But the errors would not
be tremendously smaller. We're talking about relatively small orders of
improvement."
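The comparison McNees describes, a model against the assumption that next
year will be just like this year, can be sketched as below. All figures are
hypothetical, and mean absolute error is an assumed choice of yardstick; the
article does not say which error measure McNees used:

```python
from statistics import mean

def mean_abs_error(forecasts, actuals):
    """Average absolute gap between forecasts and outcomes."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))

def persistence_forecasts(series):
    """Naive baseline: forecast each period as the previous period's value."""
    return series[:-1]

# Hypothetical annual growth rates in percent, five consecutive years.
actual = [2.0, 3.5, 1.0, 4.0, 2.5]
# Hypothetical model forecasts for years 2 through 5.
model = [3.0, 2.0, 3.0, 3.0]

targets = actual[1:]                    # outcomes being forecast
naive = persistence_forecasts(actual)   # "next year = this year"

model_mae = mean_abs_error(model, targets)
naive_mae = mean_abs_error(naive, targets)

# Skill relative to the naive baseline: 0 means no better than
# persistence, 1 means perfect, negative means worse than persistence.
skill = 1 - model_mae / naive_mae

print(model_mae, naive_mae, round(skill, 3))
```

A skill score near zero would correspond to the "relatively small orders of
improvement" McNees reports; the invented numbers here are not meant to
reproduce his findings.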

I seem to recall that this is the secret of the Farmer's Almanac's success
in predicting weather, and that one will only be wrong 15% of the time
if one predicts tomorrow's weather will be exactly like today's.

Other investigators are asking whether the models' results are
reproducible... Surprisingly the answer seems to be no. "There is a real
problem with scholarship in the profession," says Hale of the Energy
Information Administration. "Models are rarely documented well enough so
that someone else can get the same result..."

[In one study, about two-thirds of the] 62 authors whose papers were
published in the [J]ournal [of Money, Credit and Banking]... were unwilling
to supply their data in enough detail for replication. In those cases
where the data and equations were available, [the researchers] succeeded in
replicating the original results only about half the time...

What a sorry testament! What has become of scientific method, peer review?

"Even if you think the models are complete garbage, until there is an
obviously superior alternative, people will continue to use them," [McNees]
says.

Saul, failing to receive a sign from Jehovah, consulted a fortune-teller on the
eve of a major battle. The Witch of Endor's "model" was the wraith of Samuel,
and it wasn't terribly good for the body politic either. I keep a sprig of
laurel on my CRT, a "model" I gathered from the tree at Delphi, used to send
the Oracle into trance, to speak Apollo's "truth." I do it as amusement and
memento, not as talisman for public policy. History and literature are filled
with the mischief that superstition and fortune-telling have wrought, yet
some economic and computer scientists, the latter apparently as inept as the
Sorcerer's Apprentice, are perpetuating these ancient evils. Are Dynamo and
descendants serving as late-twentieth-century substitutes for I Ching sticks?

Is the problem restricted to econometrics, or is the abuse of computer
modeling widespread? Who reproduces the results of weather models, for
instance? Who regularly assesses and reports on, and culls the unworthy
models? Weather models are interesting because they may be among the
most easily "validated," yet there remains the institutional question:
when the Washington Redskins buy a weather service, for example, to
predict the next game's weather, how can they objectively predetermine
that they are buying acceptable, "validated" modeling rather than snake
oil? After all, even snake oil can be objectively graded SAE 10W-40, or
not. A posteriori "invalidation" by losing while playing in the "wrong"
weather is no answer, any more than invalidation by catastrophic engine
failure would be in motor oils. The Society of Automotive Engineers at
least has promulgated a viscosity standard: what have we done?

Where is scientific method at work in computer modeling? When peer review
is necessarily limited by classification, in such applications as missile
engagement modeling and war gaming, what body of standards may the closed
community use to detect and eliminate profitable, or deadly, hokum? Is this
just one more instance of falsified data and experiments in science
generally, of the sort reported on the front page of the Washington Post as
or before it hits the journals? (See: "Harvard Researchers Retract
Published Medical 'Discovery;'" Boyce Rensberger, Washington Post, 22
November 1986 pg 1 et seq.; and Science, Letters, 28 November 1986.)

Several reforms (based on the "publish or perish" practice that is
itself in need of reform) immediately suggest themselves. I offer them
both as a basis for discussion, and as a call to action, or we shall
experience another aspect of Limits to Growth--widespread rejection of
the contributions of computer science, as a suspect specialty:

o Refusal to supply data to a peer for purposes of replication might
result in the journal immediately disclaiming the article, and temporary
or permanent prohibition from publication in the journal in question.

o Discovery of falsified data in one publication might result in
restriction from publication (except replies, clarification or
retraction) in all publications of the affiliated societies. In
computer science, this might be all IEEE publications at the first
level, then AFIPS, IFIP and so on.

o Widespread and continuing publication of the identities of the authors,
and in cases of multiple infractions, their sponsoring institutions, in
those same journals, as a databank of refuseniks and frauds.

o Prohibition of the use of computer models in public policymaking (as in
sworn testimony before Congress) that have not been certified, or audited,
much as financial statements of publicly traded companies must now be audited.

o Licensing by the state of sale and conveyance of computer models of
general economic or social significance, perhaps as defined and
maintained by the National Academy of Sciences.

The last is extreme, of course, implying enormous bureaucracy and
infrastructure to accomplish, and probably itself inevitably subject to
abuse. The reforms are all distasteful in a free society. But if we do
nothing to put our house in order, much worse is likely to come from the
pen or word-processor of a technically naive legislator.

In exchange for a profession's privileged status, society demands it be
self-policing. Doctors, lawyers, CPAs and the like are expected to
discipline their membership and reform their methods when (preferably
before) there are gross abuses. Although some of them have failed to do
so in recent years, is that an excuse for us not to?

Finally, how can we ensure that McNees' prediction, that people will
continue to re-engineer our society on models no better than garbage,
will prove as false as the models he has described?

------------------------------

End of RISKS-FORUM Digest
************************
-------