[12] WHEN ARE THE SYSTEM MAINTENANCE WINDOWS? WHY THE LOW UPTIME?

    Typically the SDF Public Access UNIX System is available to its
    members and, in some cases, the general public 24 hours a day,
    7 days a week, 365 days a year, 10 years a decade, 25 years a
    quarter century .. and so on.

    That being said, there are unforeseen issues that can cause the
    system to become unavailable:

       1.  Hard Disk Crash - We have several spare drives, some of
           them already plugged in and ready to be used.  In the
           best case scenario no maintenance window is required.

       2.  Fire - In the case of fire all SDF machines must be shut
           down unless the fire is an isolated occurrence.

       3.  Natural Disaster - In the Spring (Apr-May) we do get
           affected by lightning strikes in our area due to heavy
           thunderstorms.  In the best case scenario the UPS systems
           filter the spikes and dips, which allows SDF to run
           uninterrupted.

       4.  Software Bug - These do crop up from time to time and are
           usually related to system updates.  On SDF we typically
           will let the public access machines lag behind NetBSD
           development in order to test new releases in our lab before
           subjecting the userbase to 'new bugs'.

       5.  Routine and Scheduled Maintenance - Please read below.

       6.  Hardware Component Failure - We have many spare machines,
           some completely cabled up and ready to go at the flick of
           a remote command.  If an SDF client host becomes completely
           unrecoverable, a spare can be put into operation within
           minutes.  Keep in mind that while all of your personal files
           are hosted on the file server, the /tmp directory is exclusive
           to each SDF client host.

    ROUTINE AND SCHEDULED MAINTENANCE

    There is a weekly maintenance window on Sunday mornings from 02:00 AM
    until 03:00 AM.  This window is not always used, and when it is, it is
    used very briefly.  Five minutes prior to a shutdown or runlevel
    transition, all logged-in members will be notified on their terminals.
    If you see this message alerting you to system maintenance, you should
    save all open files and prepare to log out.
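
    If you want to work out when the window falls relative to your own
    session, here is a minimal Python sketch.  The function name
    next_maintenance_window is made up for this example, and it assumes
    the times are in the server's local time, which this FAQ does not
    actually state.

```python
from datetime import datetime, timedelta


def next_maintenance_window(now):
    """Return (start, end) of the current or next Sunday 02:00-03:00 window.

    `now` is a naive datetime, assumed to be in the server's local time.
    """
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    days_ahead = (6 - now.weekday()) % 7
    start = (now + timedelta(days=days_ahead)).replace(
        hour=2, minute=0, second=0, microsecond=0
    )
    end = start + timedelta(hours=1)
    # If today is Sunday and the window has already closed, use next week's.
    if now >= end:
        start += timedelta(days=7)
        end += timedelta(days=7)
    return start, end


if __name__ == "__main__":
    start, end = next_maintenance_window(datetime.now())
    print("Window opens: ", start)
    print("Window closes:", end)
```

    Remember that the window is not always used, so this only tells you
    when a brief interruption is possible, not that one will happen.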

    Scheduled maintenance is always announced several days in advance on
    the bboard in the <ANNOUNCE> board.  If that maintenance window
    requires extended time (basically anything over 5 to 10 minutes), the
    /etc/motd file (displayed at login) will note the details of the event.

    Scheduled maintenance is really only used when hardware upgrades have
    to take place.  In most cases, software updates can occur while the
    systems are up and available.

WHY THE LOW UPTIME?

    Uptime is relative.  What we're after is 'high availability'.  This
    means that our goal is to have the servers answering at least 99.9%
    of the time.  In its 20+ years of service, SDF has been able to meet
    this goal.  The most uptime you'll see on any given server will be
    about 3 to 4 weeks.  After 3 weeks, performing maintenance is necessary.
    This helps with clearing buffers, caches and other inconsistencies
    that can occur as the systems run from cold or warm boot.  Rather
    than waiting for the system to fail due to a kernel panic or a hang,
    a warm boot, which takes roughly 5 minutes or less, is performed
    during the weekly maintenance window.  Keep in mind, this doesn't occur
    weekly but usually after 3 to 4 weeks of continuous uptime.
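
    To see why a short uptime and high availability are compatible, the
    arithmetic can be sketched in Python.  The function names are made
    up for this example, and it assumes the only downtime is a single
    5-minute warm boot every 3 weeks, which is a simplification.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability, days):
    """Minutes of downtime allowed over `days` at the given availability."""
    return (1.0 - availability) * days * MINUTES_PER_DAY


def availability_with_reboot(reboot_minutes, days_between_reboots):
    """Availability if the only downtime is one reboot per interval."""
    total_minutes = days_between_reboots * MINUTES_PER_DAY
    return 1.0 - reboot_minutes / total_minutes


# Downtime allowed by the 99.9% goal over a year and over 3 weeks.
yearly_budget = downtime_budget_minutes(0.999, 365)
three_week_budget = downtime_budget_minutes(0.999, 21)

# Availability achieved with one 5-minute warm boot every 21 days.
reboot_availability = availability_with_reboot(5, 21)

print(f"Yearly budget at 99.9%:  {yearly_budget:.1f} minutes")
print(f"3-week budget at 99.9%:  {three_week_budget:.2f} minutes")
print(f"Availability with boot:  {reboot_availability * 100:.3f}%")
```

    A 99.9% goal leaves roughly 30 minutes of downtime in any 3-week
    period, so a 5-minute warm boot fits well within it.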

    Why is this necessary? (aka "My box runs for years under my desk").
    We too have very low-usage, non-public NetBSD systems that run for years
    without requiring a reboot.  However, SDF is extremely high volume with
    sophisticated NFS, NIS and VNODE caching.  While these do not cause
    problems with light loads, with 40,000 active users they become an
    issue.  Again, our goal is high availability, which doesn't necessarily
    have to translate into long uptimes.