I'm in the process of creating tarballs of some old hard drives, each of
them several hundred GB in size. I'd like to have a checksum of each
tarball. The naive approach:
tar cf "$i".tar "$i"
sha1sum "$i".tar >"$i".tar.sha1sum
What's bad about this is that it reads the data twice: once from the
drive to create the tarball, and then the whole tarball again just to
create the checksum file.
Reading 200 GB at 80 MB/s takes about 42 minutes. Ugh.
Can't we do that in one go? Yes, we can:
tar cf - "$i" | tee >(sha1sum | sed "s/ -\$/ $i.tar/" >"$i".tar.sha1sum) >"$i".tar
The call to `sed` is a bit ugly, and the `>(...)` construct is Bash
process substitution, so it won't work in a plain POSIX shell, but it
saves a lot of time.
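
The same single-pass trick also works as a straight pipeline, without the
process substitution, if you let `tee` write the tarball directly. A variant
I haven't actually run on these drives:

tar cf - "$i" | tee "$i".tar | sha1sum | sed "s/ -\$/ $i.tar/" >"$i".tar.sha1sum

Here `tee` copies the tar stream to disk and passes it on to `sha1sum`, so
the data still gets read only once.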
Oh, damn it. Now I realize I also need an index file (a dump of `tar
tvf "$i".tar`). I could have done that by adding another process
substitution, but I've already finished a few drives ...
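
For the remaining drives, something like this should produce the index in
the same pass (an untested sketch; the `.index` suffix is just what I'd
pick for the listing file):

tar cf - "$i" | tee >(sha1sum | sed "s/ -\$/ $i.tar/" >"$i".tar.sha1sum) >(tar tvf - >"$i".tar.index) >"$i".tar

`tee` feeds the tar stream to both process substitutions, one computing the
checksum and one listing the archive contents, while stdout still goes to
the tarball itself.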
Those old drives are from former computers of mine, some dating back to
when I still used Windows. File organization in Windows is really messy:
your stuff is scattered all over `C:` and `D:` and `E:` and what not.
Maybe it's different today, I don't know, but back then, Windows (or
rather, DOS) didn't really encourage you to organize your files in a
meaningful way. That's why I'm simply dumping the entire drives. I don't
know where my stuff is.
On UNIXoid systems, you at least have most of your stuff in `$HOME`. If
you do it right, *all* your stuff is in `$HOME`. (I screwed up a few
times and put important data somewhere else like `/var/www` ... Don't do
that on personal computers.)