* * * * *
Swapping disks
Tonight Mark [1] and I replaced a bad disk on swift, the colocated server
currently serving up our sites. The bad disk is the system disk; the websites
themselves (along with some other services we have) all reside on another
disk.
There was much discussion before heading over there as to the best way to
approach the problem of copying the data off the bad drive. The first method
would be to install the new disk in the machine and do a disk-to-disk copy.
The downside is that swift is a 1U (Rack Unit—1.75″) system with no room for
a third drive (no matter how temporary). Also, the unit is designed to run
with the cover on, and we were unsure how it would deal with running
uncovered. The other option would be a network-based copy, from swift to
another machine
with the new drive in it. The problem here was speed—even though we could
hook the second machine directly to swift (on the secondary ethernet port) at
100Mbps (megabits per second), it would still take a while to copy over
several gigs' worth of files. We took a second computer along (the Windows
box Spring [2] and I share), having decided to make the final call once we
got to the colocation facility.
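A back-of-the-envelope sketch of the network copy time, for illustration only. The post says just "several gigs", so the sizes below and the 80% efficiency factor (a stand-in for protocol overhead and a struggling source disk) are assumptions, not measurements.

```python
def transfer_seconds(size_gigabytes: float, link_mbps: float,
                     efficiency: float = 1.0) -> float:
    """Estimate the seconds needed to move size_gigabytes over a link.

    size_gigabytes uses decimal gigabytes (10**9 bytes); link_mbps is the
    nominal link speed in megabits per second; efficiency is the assumed
    fraction of that speed actually achieved (0 < efficiency <= 1).
    """
    if size_gigabytes < 0:
        raise ValueError("size_gigabytes must not be negative")
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    bits = size_gigabytes * 1e9 * 8          # bytes -> bits
    return bits / (link_mbps * 1e6 * efficiency)


if __name__ == "__main__":
    # Hypothetical data sizes; the real amount copied is not recorded.
    for size in (5, 10, 20):
        minutes = transfer_seconds(size, 100, efficiency=0.8) / 60
        print(f"{size} GB at 100Mbps: roughly {minutes:.0f} minutes")
```

Under those assumptions, 10 GB works out to 1000 seconds, or about 17 minutes. That is slow next to a local disk-to-disk copy, but not unreasonable for a one-off job.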
When we got there and examined swift, we decided to use the temporary
computer and do a network copy. We had some difficulty in getting the Windows
box to recognize the new SCSI (Small Computer System Interface) disk (Mark
had some extra SCSI controllers and disks); it was certainly news to me that
the BIOS (Basic Input/Output System) setup was on the hard drive instead of
in ROM (Read-Only Memory), much like the very old days of PCs (Personal
Computers). Once we straightened that out, it was pretty straightforward to
boot Gentoo [3] from a live CD (Compact Disc), partition and format the new
drive.
Then it was time to copy the files. It took some work to figure out how to
run rsync over its own native protocol, and it still took us two attempts to
get everything (the first time, rsync ran without root privileges, which
limited the number of files copied). Once that finished (and still on the
temporary machine) we recompiled the kernel to support SCSI, then set about
making the drive bootable.
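The exact rsync invocation isn't recorded, so the following is only a sketch of what a pull over the rsync protocol could look like. The host address, the module name `root` and the destination path are all hypothetical, and it assumes an rsync daemon exporting the old disk on swift. The flags themselves (`-a`, `-H`, `--numeric-ids`) are standard rsync options.

```python
import os
from typing import List


def build_rsync_command(host: str, module: str, dest: str) -> List[str]:
    """Build an rsync command that pulls a daemon module into dest.

    An rsync:// URL selects the native rsync protocol (a daemon on the
    remote side) rather than a remote shell such as ssh.
    """
    source = f"rsync://{host}/{module}/"
    return [
        "rsync",
        "-aH",            # archive mode, and preserve hard links
        "--numeric-ids",  # keep uid/gid numbers instead of mapping names
        source,
        dest,
    ]


def running_as_root() -> bool:
    """True when the current process has root privileges (Unix only)."""
    return os.geteuid() == 0


if __name__ == "__main__":
    # Hypothetical address, module name, and mount point.
    command = build_rsync_command("192.168.1.1", "root", "/mnt/newdisk/")
    if not running_as_root():
        # Without root, ownership cannot be restored on the new disk, and
        # files unreadable to the unprivileged user are skipped.
        print("warning: not root; the copy would likely be incomplete")
    print(" ".join(command))
```

The script only prints the command rather than running it. The same privilege concern applies on the daemon side, which also has to be able to read every file it serves.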
The problem here was that Gentoo was a bit too aggressive in identifying
hardware, and since the Linux kernel sticks USB (Universal Serial Bus)
storage devices under the SCSI layer, the hard drive ended up with an ID that
it wouldn't have in swift. We ended up having to reboot the Gentoo CD,
remove the loaded USB drivers, then mount the SCSI drive, then make the drive
bootable. Once that was done, the temporary system booted up without a
problem.
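A deliberately simplified model of why the extra USB drivers mattered. It assumes the "ID" in question is the device name (such as /dev/sda) and that names are handed out purely in detection order. The real kernel logic is more involved, and the device labels below are made up for illustration.

```python
from typing import Dict, List


def assign_sd_names(probe_order: List[str]) -> Dict[str, str]:
    """Map each detected device to an sd name in detection order.

    Simplified: the first device found becomes /dev/sda, the second
    /dev/sdb, and so on. Only single-letter names are modeled.
    """
    if len(probe_order) > 26:
        raise ValueError("this sketch only models /dev/sda through /dev/sdz")
    names = {}
    for index, device in enumerate(probe_order):
        names[device] = "/dev/sd" + chr(ord("a") + index)
    return names


if __name__ == "__main__":
    # On the temporary machine, with USB storage claimed first (hypothetical).
    print(assign_sd_names(["usb-storage", "new-scsi-disk"]))
    # On swift, or with the USB drivers unloaded: the disk comes first.
    print(assign_sd_names(["new-scsi-disk"]))
```

In this model the new disk is /dev/sdb in the first case and /dev/sda in the second, so boot settings written against the first name would point at the wrong device once the disk moved to swift.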
We then removed the drive and controller, cleared the area (so we could have
room to move about) and spent a few minutes making a game plan for swapping
the bad drive for the new one. The physical swap went fairly smoothly. It was
reconfiguring the BIOS that proved to be rather difficult. We couldn't get
into the BIOS configuration. A search of possible key sequences to get into
the BIOS configuration revealed:
1. DEL
2. F1
3. F2
4. F10
5. Ctrl-Alt-Esc
6. Ctrl-Esc
7. Alt-Esc
8. INS
9. Esc
10. Ctrl-Alt-Ins
We ran down the entire list, and not one worked. Mark then had the brainstorm
to hold down each key as the machine was powered up. The first key he tried,
DEL, got us into the BIOS.
Talk about having plenty of time to get into the BIOS configuration.
Once the BIOS was configured with the new drive, it rebooted without a
problem.
All told, we spent maybe five hours doing the drive swap, with the websites
unavailable for maybe fifteen minutes tops. It was a bit scary at times
though, watching the copying go with numerous disk errors. But so far,
nothing important seems to have been corrupted, unlike most of the files in
Mark's home directory (but he had current backups of that data anyway).
[1] http://grumpy.conman.org/
[2] http://www.springdew.com/
[3] http://www.gentoo.org/