Aucbvax.5298
fa.space
utzoo!decvax!ucbvax!space
Tue Nov 24 03:22:30 1981
SPACE Digest V2 #40
>From OTA@S1-A Tue Nov 24 03:17:50 1981
SPACE Digest Volume 2 : Issue 40
Today's Topics:
STS-1 -- "The Bug Heard 'Round the World"
----------------------------------------------------------------------
Date: 24 Nov 1981 01:24:40-PST
From: decvax!duke!unc!smb at Berkeley
In-real-life: Steven M. Bellovin
To: decvax!duke!unc!space@Berkeley
Subject: STS-1 -- "The Bug Heard 'Round the World"
There's a very interesting article on just what delayed the launch of
STS-1 in the October 1981 issue of SOFTWARE ENGINEERING NOTES. It's
written by John R. Garman, the deputy chief of the Spacecraft Software
Division at the Johnson Space Center. I won't try to summarize the
article -- it's fairly complex, and describes how the 4 identically-
programmed computers and the backup computer with different software
co-exist. But the origin of the bug is interesting.
The problem was caused when a time delay in an initialization
subroutine was changed to avoid problems during system
reconfigurations; this affected the system's idea of what the time of
day was, and hence affected the scheduling of certain asynchronous
processes. (Because all 4 computers must have *identical* ideas of
what time it is, they use the operating system's timer queue; hence,
any use of the timer before the other initialization code ran could
cause trouble. The real TOD clock is used only during cold-starts of
the first computer.) The nature of this change was such that there was
only a 1 in 67 chance of a failure.
"No 'mapping' analyzer built today could have found that
linkage. Testing might have. But the window wasn't opened
until late in the test program (relative to this code), and
even then, *most* simulations didn't go through the expense of
initializing 'from scratch'. And even where they did, it would
have to have been in a lab with a reasonably accurate model of
the telemetry system *plus* a simulation or test involving both
PASS [Primary Avionics Software System] *and* BFS [Backup
Flight Control System], and it would still be fighting the low
probability. Even then, the temptation would be to try
again....and never be able to repeat it; and never be sure it
wasn't a 'funny' in the lab set-up.... or a similar problem
fixed by another software change. That, in fact, apparently
did happen in one of the labs....about 4 months prior to the
flight.
"And then, on *the* day that the first GPC [General Purpose
Computer] was turned on, 30 hours before scheduled launch, we
hit the problem......"
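To put the "1 in 67" figure and the remark about "fighting the low
probability" in perspective, here is a rough sketch of the arithmetic.
It is not from Garman's article. It assumes each from-scratch
initialization is an independent trial that fails with probability
1/67; the function names are made up for illustration.

```python
import math

# Per-initialization failure chance quoted in the post.
# Treating every cold start as an independent trial is an assumption.
P_FAIL = 1 / 67


def chance_of_seeing_bug(runs, p=P_FAIL):
    """Probability that at least one of `runs` independent starts fails."""
    return 1 - (1 - p) ** runs


def runs_needed(confidence, p=P_FAIL):
    """Smallest number of independent starts that gives at least
    `confidence` probability of seeing one failure."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


if __name__ == "__main__":
    for n in (1, 10, 50, 100, 200):
        print(f"{n:4d} starts: {chance_of_seeing_bug(n):.1%} chance of a failure")
    print("starts for even odds:", runs_needed(0.5))
    print("starts for 95% odds: ", runs_needed(0.95))
```

Under that assumption it takes roughly 47 full initializations to have
even odds of seeing the failure once, and roughly 200 to reach 95%,
which fits the point that a single retry in the lab would almost
always come up clean.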
------------------------------
End of SPACE Digest
*******************
-----------------------------------------------------------------
gopher://quux.org/ conversion by John Goerzen
of http://communication.ucsd.edu/A-News/
This Usenet Oldnews Archive
article may be copied and distributed freely, provided:
1. There is no money collected for the text(s) of the articles.
2. The following notice remains appended to each copy:
The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996
Bruce Jones, Henry Spencer, David Wiseman.