[09] WHAT SHOULD I DO IF A SYSTEM CRASHES OR LOCKS UP?

    Hopefully this will never happen to you, but if you experience
    'lock ups' or 'freezes', please follow these steps to help prevent
    loss of your data.

    Also, it is important to note that you do not have a direct connection
    to SDF and are most likely hopping through 10 or more networks to
    get to SDF.  You can use ping and traceroute to measure lag between
    your computer and SDF.  The lag you experience on SDF therefore depends
    partly on your own network path, and it is very important for you to
    understand that.
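
    As a rough sketch, the two measurements could be wrapped in a small
    shell function.  The function name measure_lag and the default host
    sdf.org are assumptions made for illustration; substitute the host
    you actually log in to.

```shell
# Hypothetical helper: measure lag between your computer and a host.
# The default host below is an assumption; pass your own as $1.
measure_lag() {
    host=${1:-sdf.org}

    # Five round-trip probes; compare the times with what you feel
    # as lag in your session.
    ping -c 5 "$host"

    # List the networks (hops) between you and the host, with the
    # delay measured at each one.
    traceroute "$host"
}

# Example call (run this on your own computer, not on SDF):
#   measure_lag
```

    If the ping times are already high, or rise sharply at one hop in
    the traceroute output, the lag is in the network path rather than
    on SDF.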

    Typically a lockup will occur when you are trying to access a
    file that is resident on the fileserver.  For instance, say you
    are trying to cat a file and instead of seeing the contents you
    get either nothing or a message similar to:

    ol1:/sys: not responding

    Be patient: the fileserver will recover shortly and your task
    will be completed.  You will probably see:

    ol1:/sys: is alive again

    which means your request will actually begin to be processed.

    During the hang, you can use ^T (CTRL T) to display the
    status of your job.  For instance:

    load: 2.04  cmd: tail 12966 [select] 0.00u 0.00s 0% 808k

    [select] is the current state of process id 12966, which
    is the 'tail' program.  If the system is waiting on actual
    disk I/O, you'll probably see [biowait].  In cases of a hang
    you may see either [nfsrcvlk] (Network File System Receive Lock)
    or [vnlock] (Virtual Node Lock).  The system will usually
    recover from these, but a prolonged state of this kind can
    indicate a serious resource problem on the NFS client.
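
    To make the fields concrete, here is a minimal sketch that pulls
    the bracketed state out of a saved ^T status line and describes it.
    The function names wait_state and describe_state are hypothetical,
    and the descriptions simply restate the explanations above (plus
    the fact that [select] means the process is sleeping in select).

```shell
# Hypothetical helper: print the text between [ and ] in a status line.
wait_state() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: describe a wait state in plain words.
describe_state() {
    case "$1" in
        select)   echo "sleeping in select, waiting for input" ;;
        biowait)  echo "waiting on actual disk I/O" ;;
        nfsrcvlk) echo "NFS receive lock, waiting on the fileserver" ;;
        vnlock)   echo "vnode lock, a client problem if prolonged" ;;
        *)        echo "not covered in this FAQ entry" ;;
    esac
}

# The sample status line shown above.
line='load: 2.04  cmd: tail 12966 [select] 0.00u 0.00s 0% 808k'
state=$(wait_state "$line")
echo "$state: $(describe_state "$state")"
```

    The remaining fields are the load average, the command name and
    process id, user and system CPU time, CPU percentage and memory
    in use.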

    In the event that the fileserver becomes unavailable, it is
    important that you do not become impatient and interrupt, quit
    or suspend your jobs (^C, ^\ or ^Z) but rather, wait them out.
    If you are patient your chances of losing data will be
    significantly reduced.  The fileserver will usually respond
    within a few seconds, and rarely takes longer.  When the
    problem lies with the NFS client (vnlock for more than, say, 20
    seconds), that particular host will most likely need to be reset.

    More on this.  SDF is pushing NetBSD to its limits and we are
    currently (2003-2004) doing quite a bit of investigation with
    the uvm/vfs/vnode code developers to help NetBSD become scalable
    in high usage situations such as the loads we experience on SDF.
    Solutions we find will be incorporated into the public code.