Mail-From: NEUMANN created at 17-Mar-86 19:23:21
Date: Mon 17 Mar 86 19:23:21-PST
From: RISKS FORUM    (Peter G. Neumann, Coordinator) <[email protected]>
Subject: RISKS-2.29
Sender: [email protected]
To: [email protected]

RISKS-LIST: RISKS-FORUM Digest,  Monday, 17 Mar 1986  Volume 2 : Issue 29

          FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS
  ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
 Commission vs. Omission (Martin J. Moore plus an example from Dave Parnas)
 A Stitch in Time (Jagan Jagannathan)
 Clockenspiel (Jim Horning)
 Cordless phones (Chris Koenigsberg)
 Money talks (Dirk Grunwald, date correction from Matthew Kruk)
 [Non]computerized train wreck (Mark Brader)
 On-line Safety Database (Ken Dymond)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in good
taste, objective, coherent, concise, nonrepetitious.  Diversity is welcome.
(Contributions to [email protected], Requests to [email protected].)
(Back issues Vol i Issue j stored in SRI-CSL:<RISKS>RISKS-i.j.  Vol 1: MAXj=45)

----------------------------------------------------------------------

Received: from eglin-vax.ARPA ... Mon 17 Mar 86 17:26:49-PST
Date: 0  0 00:00:00 CDT
From: "MARTIN J. MOORE" <mooremj@eglin-vax>
Subject: Commission vs. Omission
To: "risks" <risks@sri-csl>

Dave Parnas's points regarding the shuttle destruct system are well taken.
The policy, stated informally, was that "it better work if we need it --
but it absolutely better NOT 'work' when we DON'T need it" which generated the
extreme emphasis on preventing what Dave calls "risks of commission."  I feel
that the risk of commission on the destruct system is extremely small, while
the risk of omission is somewhat higher, although still small.  During
validation testing and in every pre-launch checkout, we performed "exhaustive"
checks -- "exhaustive" meaning that we tried every combination of
 [(2 central computers) * (6 remote sites) * (2 computers per site)
     * (2 transmitters per site) * (2 comm paths to each site)
     * (2 possible commands in various sequences)].
Yeah, this takes a *LONG* time (with practice, we got it down
to several hours if everything went smoothly).  On one occasion during
validation testing, we did find a software error which only manifested on a
particular (central computer/comm path/remote computer/unusual command
sequence) combination.  Exhaustive tests *are* necessary.
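
To give a feel for the size of that test matrix, here is a minimal sketch
that enumerates it.  The counts come from the message above; every label
is invented, and the reading of "in various sequences" as all ordered
sequences of one or two commands is purely an assumption, since the actual
sequence list is not given.

```python
from itertools import product

# Counts are from the message; the labels themselves are made up.
CENTRAL_COMPUTERS = ["central-1", "central-2"]
REMOTE_SITES = [f"site-{n}" for n in range(1, 7)]
SITE_COMPUTERS = ["A", "B"]
TRANSMITTERS = ["tx-1", "tx-2"]
COMM_PATHS = ["path-1", "path-2"]
COMMANDS = ["cmd-1", "cmd-2"]  # placeholders for the two commands


def command_sequences(max_len):
    """Every ordered sequence of 1..max_len commands.

    This is an assumed reading of "in various sequences"; the message
    does not say which sequences were actually exercised.
    """
    for length in range(1, max_len + 1):
        yield from product(COMMANDS, repeat=length)


def test_cases(max_len=2):
    """Yield one tuple per hardware path and command sequence."""
    sequences = list(command_sequences(max_len))
    yield from product(CENTRAL_COMPUTERS, REMOTE_SITES, SITE_COMPUTERS,
                       TRANSMITTERS, COMM_PATHS, sequences)


hardware_paths = (len(CENTRAL_COMPUTERS) * len(REMOTE_SITES)
                  * len(SITE_COMPUTERS) * len(TRANSMITTERS) * len(COMM_PATHS))
total_cases = sum(1 for _ in test_cases(max_len=2))
print(hardware_paths, "hardware paths,", total_cases, "test cases")
```

Even under this modest assumption the case count is several times the
number of hardware paths, and it grows quickly if longer command sequences
are allowed.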

I have often wondered why the emphasis was to prevent errors of commission
over errors of omission (not to say that we wanted either kind, but errors
of commission were definitely considered to be worse!).  An erroneous
destruct would cost the lives of the flight crew, loss of the Orbiter, and
possibly damage on the ground if it occurred early in the flight (e.g.,
windows blown out, etc.)  An erroneous non-destruct, in the worst case (if
the ET were to detonate near the crowded spectator area on the NASA
causeway), could cause the loss of TENS OF THOUSANDS of lives.  Certainly
this is worse than an erroneous destruct.  I believe there may be a
subconscious feeling that an erroneous destruct means the difference between
a success and a disaster, while an erroneous non-destruct means the
difference between a disaster and a worse disaster.  Subjectively, that
difference is not as great as the first, although objectively it may be much
greater.
                                    Martin Moore

<The usual disclaimers.  I'm too tired to type in the whole silly thing.>

       [By the way, Dave Parnas suggested the following example to
        illustrate his message in RISKS-2.28:]

        "Consider elevators.  Consider how much easier it is to prevent the
        floor indicator from saying "13" than to assure that the floor
        indicator will always give the actual floor that the elevator is
        on.  The risk of indicating "13" can be gotten acceptably low by
        eliminating "13" from the set of indicator lights.  The risk of
        indicating an incorrect floor or not indicating the current floor
        is much harder to eliminate."  [Dave Parnas]

------------------------------

Date: Mon 17 Mar 86 11:43:53-PST
From: [email protected]
Subject: A Stitch in Time
To: [email protected]

  [As it now turns out, the reboot occurred just moments BEFORE I logged
   in Sunday night.  Here are some further details.  PGN]

This is the probable sequence of events that led us back in time on CSLA:
1. A power glitch (late night SUNDAY) caused the F4 to hard boot.
2. During a hard boot, the TIME is retrieved from eleven independent sources
  (which are assumed to be correct!)
3. One of these sources had the incorrect time of some warm day in 1972,
  which skewed the computed average to Dec 6th, 1985.

Suggestion:
1. Change the statistical measure from MEAN to something less sensitive to
  one or two abnormal times; for example the average of the 5th, 6th, and 7th
  largest times.
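
A minimal sketch of the suggestion, comparing the plain mean with the
average of the 5th, 6th, and 7th largest of eleven readings (and with the
median).  The readings are invented for illustration and are not meant to
reproduce the Dec 6th figure; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, median


def middle_three_mean(readings):
    """Average of the 5th, 6th and 7th largest of eleven readings."""
    if len(readings) != 11:
        raise ValueError("expected exactly eleven readings")
    ordered = sorted(readings, reverse=True)
    return mean(ordered[4:7])


def show(seconds):
    """Format seconds since the epoch as a UTC date and time."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Invented readings: ten sources within a few seconds of the true time,
# and one source stuck in 1972.
true_time = datetime(1986, 3, 16, 23, 0, 0, tzinfo=timezone.utc)
readings = [(true_time + timedelta(seconds=s)).timestamp()
            for s in (-4, -3, -2, -1, 0, 0, 1, 2, 3, 4)]
readings.append(
    datetime(1972, 7, 1, 12, 0, 0, tzinfo=timezone.utc).timestamp())

naive = mean(readings)
robust = middle_three_mean(readings)
middle = median(readings)

print("mean:        ", show(naive))
print("middle three:", show(robust))
print("median:      ", show(middle))
```

With these readings the single 1972 value drags the mean back by more
than a year, while the middle-three average and the median both stay
within a second of the true time.  The same holds if the stray reading
lies far in the future instead.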

      [IT IS ABSOLUTELY INCREDIBLE THAT UNSAFE ALGORITHMS continue to
       be used.  This problem is as old as the hills.  Statisticians
       routinely throw out the absurd values before computing the
       mean.  Dorothy Denning pointed out the pun in their terminology
       (applicable to Byzantine agreement algorithms, where you don't
       trust anyone): the OUT-LIERS are really the OUT-LIARS.

       EVEN WORSE, Jagan points out that if the clock had been accidentally
       set INTO THE FUTURE, things could also get very sticky.  We also
       have a problem of nonunique clock readings during the hour at 2AM when
       Daylight Saving Time ends.  A good time to be asleep.  PGN]

[Here is some more background.]

 Date: Mon 17 Mar 86 12:37:37-PST
 From: Mark Lottor <[email protected]>
 Subject: [Louis A. Mamakos <[email protected]>: time]
 To: [email protected]

 This was just to verify that the problem was on the
 remote system and not some local problem...
               ---------------

 Date: Mon, 17 Mar 86 15:31:14 EST
 From: Louis A. Mamakos <[email protected]>
 To: [email protected]
 In-Reply-To: Mark Lottor's message of Mon 17 Mar 86 11:34:41-PST
 Subject: time

 Yes, I can verify that it was indeed the clock (actually the host the clock
 was on) that was screwed up.  It is unfortunate that there is no way to get
 the current year out of the WWVB clock.  There was work being done in the
 computer room, which reset our LSI-11/73 host, which subsequently got
 confused.  Sorry about the problem.

 Louis A. Mamakos  WA3YMH    Internet: [email protected]
 University of Maryland, Computer Science Center - Systems Programming

------------------------------

From: [email protected] (Jim Horning)
Date: 17 Mar 1986 1436-PST (Monday)
To: [email protected]
Subject: Clockenspiel

Your errant clock reminds me of something that happened at Stanford in
the mid-sixties. I was apparently one of the first users of the 360/67
to run a job that started on one day and finished on the next. When the
statement for my account arrived in the mail, I had quite a job
convincing my wife that the huge figure (to a graduate student couple)
was nothing to worry about: It was a CREDIT resulting from a job that
was charged for minus 23 hours and 58 minutes!
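
One plausible way to arrive at such a figure is to subtract time-of-day
readings while ignoring the date.  The sketch below shows that failure
next to a date-aware calculation.  It is a guess at the kind of bug
involved, not the actual 360/67 accounting code, and the start and end
times are invented.

```python
from datetime import datetime, timedelta


def elapsed_time_of_day_only(start, end):
    """Buggy: subtracts clock readings and ignores the calendar date."""
    s = timedelta(hours=start.hour, minutes=start.minute,
                  seconds=start.second)
    e = timedelta(hours=end.hour, minutes=end.minute, seconds=end.second)
    return e - s


def elapsed(start, end):
    """Correct: subtracts full date-and-time values."""
    return end - start


# Invented example: a two-minute job that straddles midnight.
start = datetime(1966, 5, 1, 23, 59)
end = datetime(1966, 5, 2, 0, 1)

buggy = elapsed_time_of_day_only(start, end)
correct = elapsed(start, end)
print("time-of-day only, in minutes:", buggy.total_seconds() / 60)
print("date-aware, in minutes:      ", correct.total_seconds() / 60)
```

For these times the buggy version yields minus 23 hours and 58 minutes,
and the date-aware version yields two minutes.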

The Xerox Alto operating system had a compiled-in reasonableness check
on the date and time. When it started up, if the local clock wasn't
"reasonable," it sent a request over the Ethernet and put "Date and
Time Unknown" in the banner. Well, you guessed it: The day came when
the (correct) time from the time server was no longer "reasonable," and
therefore couldn't be corrected by appeal to the time server....
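
The trap can be sketched as follows.  This is not the Alto code: the
bounds, the function names, and the way the time server is queried are
all hypothetical, chosen only to show how a compiled-in window eventually
rejects the correct answer.

```python
from datetime import datetime

# Hypothetical compiled-in bounds; the real Alto values are not given.
EARLIEST = datetime(1975, 1, 1)
LATEST = datetime(1985, 1, 1)


def is_reasonable(t):
    """True if t falls inside the compiled-in window."""
    return EARLIEST <= t < LATEST


def startup_time(local_clock, ask_time_server):
    """Return the time to use, or None for "Date and Time Unknown"."""
    if is_reasonable(local_clock):
        return local_clock
    server_time = ask_time_server()
    if is_reasonable(server_time):
        return server_time
    # The server may be perfectly correct, but the same fixed window is
    # applied to its answer, so nothing can repair the clock.
    return None


bad_local = datetime(1970, 1, 1)
inside_window = startup_time(bad_local, lambda: datetime(1980, 6, 1))
past_window = startup_time(bad_local, lambda: datetime(1986, 3, 17))
print(inside_window, past_window)
```

While real time stays inside the window, a bad local clock is repaired
from the server.  Once real time passes the upper bound, the correct
server time fails the same check and the result is always unknown.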

Jim H.

------------------------------

Date: 17 Mar 1986 11:48-EST
From: [email protected]
To: RISKS FORUM    (Peter G. Neumann, Coordinator) <[email protected]>
Subject: RISKS re: Cordless phones

My roommate has a cordless phone and it goes on the blink every few weeks.
All the phones in the house stop working. When you pick any one up, all you
get is a very loud static sound and you can't dial out. I have learned that
I can fix this problem by sneaking in his room and unplugging the cradle for
his cordless phone. A visitor in the house was very frightened one night
when she was left alone and thought someone had cut the phone lines or
something. It was the cordless phone on the blink again.

Chris Koenigsberg
[email protected] , or ckk%[email protected]
{harvard,seismo,topaz,ucbvax}!g.cs.cmu.edu!ckk
(412)268-8526 office, (412)362-6422 home
Center for Design of Educational Computing
Carnegie-Mellon U.
Pgh, Pa. 15213

------------------------------

Date: Mon, 17 Mar 86 16:44:15 CST
From: [email protected] (Dirk Grunwald)
To: [email protected]
Subject: re: money talks

I read the 'money talks' article with great amusement. One of the risks to
society which is worth talking about is the risk of using inappropriate, or
downright silly, technology.

Talking money would appear to be such a waste of resources. Certainly some
other method of denomination discrimination could be devised for the
visually impaired. Raised lettering, coinage instead of paper money, different
sized paper money, different paper stock. But talking money?

dirk grunwald
university of illinois

------------------------------

Date: Mon, 17 Mar 86 09:07:32 PST
From: Matthew_Kruk%[email protected]
To: [email protected]
Subject: Money Talks

Correction to my previous message: The date of the article should be
March 14th (Friday).

------------------------------

Date: Mon, 17 Mar 86 19:04:03 EST
From: [email protected]
To: [email protected]
Subject: [Non]computerized train wreck

Me:
> Not only was this NOT a case of computer malfunction, but indeed, a more
> fully computerized system (with cab signalling and automatic train stopping)
> would probably have prevented the accident.

PGN:
> [... Thus, even though it appears NOT to be a computer problem,
> we discover that the computer could have done better!  But, of course,
> don't blame the computer system.  Blame the people who specified,
> designed, and implemented it -- not JUST the train operator(s).  PGN]

You sound more critical than I meant to be.  The cost of equipping all major
railways with cab signalling and the like would be considerable, to say the
least.  While such installations certainly do exist, especially on busy
high-speed lines, the "centralized traffic control" in use on the route in
question is probably much more common.  Are you calling on all railways to
upgrade their signaling systems long before they are life-expired, every
time something somewhat better comes along?

Mark Brader  (ihnp4!utzoo!lsuc!msb and ...!dciem!msb are both me.)

  [One would hope that new improvements do not always require everything
   to be thrown out.  Long ago we discovered the advantages of software
   solutions over hardware solutions.  But when human lives are at stake,
   safer systems may be worth the price of upgrading equipment.  I think
   that the incredible escalation of law-suit awards and of rates for
   malpractice and liability insurance may provide some new incentives.  PGN]

------------------------------

Date: 17 Mar 86 15:14:00 EST
From: "DYMOND, KEN" <[email protected]>
Subject: On-line Safety Database
To: "risks" <[email protected]>

Our Library Bulletin (and as a frequent user I'd have to say that the NBS
has one of the best technical libraries going) for February contained a
notice that Pergamon Infoline (evidently a supplier of such services) is
offering a new online database service, SAFETY: "SAFETY, produced by
Cambridge Scientific Abstracts, provides broad interdisciplinary
coverage of safety, including industrial, transportation, environmental,
and medical safety.  This database indexes journals, books, reports,
patents, and proceedings published in 1981 or later."  If someone on
the list uses this database, please let us know how well it covers
computer and software safety.

Ken Dymond
National Bureau of Standards

------------------------------

End of RISKS-FORUM Digest
************************
-------