Aucbvax.5298
fa.space
utzoo!decvax!ucbvax!space
Tue Nov 24 03:22:30 1981
SPACE Digest V2 #40
>From OTA@S1-A Tue Nov 24 03:17:50 1981

SPACE Digest                                      Volume 2 : Issue 40

Today's Topics:
               STS-1 -- "The Bug Heard 'Round the World"
----------------------------------------------------------------------

Date: 24 Nov 1981 01:24:40-PST
From: decvax!duke!unc!smb at Berkeley
In-real-life: Steven M. Bellovin
To: decvax!duke!unc!space@Berkeley
Subject: STS-1 -- "The Bug Heard 'Round the World"

There's a very interesting article on just what delayed the launch of
STS-1 in the October 1981 issue of SOFTWARE ENGINEERING NOTES.  It's
written by John R. Garman, the deputy chief of the Spacecraft Software
Division at the Johnson Space Center.  I won't try to summarize the
article -- it's fairly complex, and describes how the 4 identically
programmed computers and the backup computer with different software
co-exist.  But the origin of the bug is interesting.

The problem was caused when a time delay in an initialization
subroutine was changed to avoid problems during system
reconfigurations; this affected the system's idea of what the time of
day was, and hence affected the scheduling of certain asynchronous
processes.  (Because all 4 computers must have *identical* ideas of
what time it is, they use the operating system's timer queue; hence,
any use of the timer before the other initialization code ran could
cause trouble.  The real TOD clock is used only during cold starts of
the first computer.)  Because of the nature of this change, there was
only a 1-in-67 chance of a failure.

       "No 'mapping' analyzer built today could have found that
       linkage.  Testing might have.  But the window wasn't opened
       until late in the test program (relative to this code), and
       even then, *most* simulations didn't go through the expense of
       initializing 'from scratch'.  And even where they did, it would
       have to have been in a lab with a reasonably accurate model of
       the telemetry system *plus* a simulation or test involving both
       PASS [Primary Avionics Software System] *and* BFS [Backup
       Flight Control System], and it would still be fighting the low
       probability.  Even then, the temptation would be to try
       again....and never be able to repeat it; and never be sure it
       wasn't a 'funny' in the lab set-up.... or a similar problem
       fixed by another software change.  That, in fact, apparently
       did happen in one of the labs....about 4 months prior to the
       flight..

       "And then, on *the* day that the first GPC [General Purpose
       Computer] was turned on, 30 hours before scheduled launch, we
       hit the problem......"

------------------------------

End of SPACE Digest
*******************

-----------------------------------------------------------------
gopher://quux.org/ conversion by John Goerzen
of http://communication.ucsd.edu/A-News/


This Usenet Oldnews Archive
article may be copied and distributed freely, provided:

1. There is no money collected for the text(s) of the articles.

2. The following notice remains appended to each copy:

The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996
Bruce Jones, Henry Spencer, David Wiseman.