MY OWN BEST PRACTICES, WITH (PORTABLE) SHELL SCRIPTING

Shell scripting is similar to a martial art: it is *hard*, it takes ages
to master, and in the long run it gives you incredible powers.

This document is a humble collection of best practices for shell scripting
that I have gathered over time. The focus is on the POSIX shell, and on
those extensions that are common among the various implementations.

Target platforms:

* GNU/Linux: bash, dash, busybox
* OpenBSD: ksh
* FreeBSD: sh

Changelog:

2021-01-19 - First draft.

I am still gathering the practices as I use them. The actual portability
of the techniques has not been verified yet, although that is a goal.
Things are known to work under bash and dash.

** PARAMETERS VIA GETOPTS **

The `getopts` builtin can be used to parse single-dash flags, both at
script level and at function level.

The second case is obviously more complex than plain positional
arguments, and probably makes sense only if the function is user-facing,
that is, if the script is sourced and the function is exposed for
interactive use.

foo() {
    local opt
    local a=
    local b=
    local c=

    OPTIND=1; while getopts 'a:bc' opt; do
        case "$opt" in
            a) a="$OPTARG" ;;
            b) b=1 ;;
            c) c=1 ;;
            *) return 1 ;;   # unknown flag, or missing argument for -a
        esac
    done
    shift $((OPTIND - 1))

    ...
}

The previous example shows some useful patterns to keep in mind:

* The `OPTIND` global must be reset to 1 before the first invocation of
`getopts`. Failing to do so exposes the option scanning to the side
effects from previous `getopts` invocations;

* The local variables `a`, `b` and `c` need to be explicitly emptied in
order to avoid unexpected values being assigned from previous scopes.
The local variables might in fact shadow other equally named variables
(globals, or local to other functions), while also taking their values.
See VARIABLE LOCALITY.

* The `b` and `c` variables act as booleans: false is represented by
an empty value, while true is represented by an arbitrary non-empty
value. See BOOLEANS.
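
As a minimal sketch of the patterns above, here is a self-contained
variant that can be pasted into a shell. The name `show_flags`, its flags
and the printed format are made up for illustration; `local` is assumed
to be available, as everywhere else in this document.

```shell
# Hypothetical function: parses -a VALUE and -b, then prints what it got.
show_flags() {
    local opt
    local a=
    local b=

    OPTIND=1
    while getopts 'a:b' opt; do
        case "$opt" in
            a) a="$OPTARG" ;;
            b) b=1 ;;
            *) return 2 ;;      # unknown flag, or missing argument
        esac
    done
    shift $((OPTIND - 1))

    echo "a=${a} b=${b:-0} rest=$*"
}

show_flags -a one -b file1      # prints: a=one b=1 rest=file1
show_flags -b file2             # prints: a= b=1 rest=file2
```

The second call starts scanning from the first argument again only
because `OPTIND` is reset at the top of the function.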

** VARIABLE LOCALITY **

The goal is to avoid polluting the global namespace, and to avoid
unexpected side effects on it.

Unfortunately `local` is not POSIX, but it is supported by most shells,
although it shows different behaviours in some corner cases.

Declaring a local variable (within a function) that shadows an existing
variable (global, or local to a caller's scope) should be safe. The
shadowing might be unintentional, though, and doing it on purpose is a
questionable practice.
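
A small sketch of the shadowing case, with hypothetical names (`name`,
`shadow`). The explicit empty assignment is what makes the result the
same across shells: with a bare `local name`, dash starts from the
caller's value, while bash starts from an unset variable.

```shell
name="global value"

shadow() {
    local name=                 # explicitly emptied on declaration
    echo "inside, before assignment: [$name]"
    name="local value"
    echo "inside, after assignment: [$name]"
}

shadow
echo "outside: [$name]"
# prints:
#   inside, before assignment: []
#   inside, after assignment: [local value]
#   outside: [global value]
```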

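The declaration and the assignment of a local variable should be distinct
statements (see ShellCheck SC2155): `local` returns its own exit status,
which hides the status of a command substitution used in the assignment.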
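
To see why the declaration and the assignment are better kept apart, here
is a sketch comparing the two styles. The functions `fails`, `combined`
and `separate` are hypothetical and exist only for this comparison.

```shell
# Hypothetical command that always fails with status 3.
fails() {
    return 3
}

combined() {
    local out="$(fails)"        # "$?" is now the status of `local`
    echo "combined: $?"
}

separate() {
    local out
    local status=0
    out="$(fails)" || status=$? # "$?" is the status of `fails`
    echo "separate: $status"
}

combined                        # prints: combined: 0
separate                        # prints: separate: 3
```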

** WISE USE OF TRAPS **

Traps are a great way to clean up residual state when the script exits.
Heads up: the handler can interfere with the exit status of the script.

atexit() {
    # First thing: take a copy of "$?", declaring the local variable and
    # assigning it in a single statement. Assigning a local variable
    # while declaring it is generally not recommended, but this is an
    # important exception: a separate `local ex` would succeed,
    # resetting "$?" to 0 before it can be saved.
    local ex="$?"

    # Here goes the clean-up, which might succeed or fail independently
    # of the rest of the script!
    # If the script uses `set -e`, make sure that a failing command does
    # not terminate the handler prematurely:
    false || :

    # This is the right moment to clean up multiple resources.
    rm -rf "$tempfile"

    exit "$ex"
}
trap atexit EXIT

Unfortunately there's room for only one exit handler, which is bound
to "know everything". It is wise to keep C++ programmers away from shell
scripts: I've seen clumsy attempts at implementing RAII, and the
complexity rose to infinity!
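
A runnable sketch of the handler, wrapped in a subshell so that it can be
tried without closing the current shell. The exit status 7 is arbitrary,
and `mktemp` is assumed to be installed (it is not POSIX).

```shell
tempfile="$(mktemp)"            # assumption: mktemp is available
status=0

(
    atexit() {
        local ex="$?"
        false || :              # a failing clean-up step, made harmless
        rm -f "$tempfile"
        exit "$ex"
    }
    trap atexit EXIT

    exit 7                      # pretend the script failed with status 7
) || status=$?

echo "exit status seen by the caller: $status"
# prints: exit status seen by the caller: 7
```

The caller still sees 7, although the last command run by the handler
(`rm`) succeeded.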

** ERROR HANDLING 1 **

pipefail is a bashism, too bad!

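Remember:

* The commands of a pipeline generally run in subshells, and only the
exit status of the last one (the tail) is reported.
* `set -e` is your friend, but a shady one!
E.g. it is not honoured for the commands of a pipeline other than the
last.
* Within functions, always behave as if `set -e` were not in place: it is
ignored whenever the function is called as part of a condition (`if`,
`while`, `&&`, `||`).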
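
The points above can be observed with a short sketch. The functions
`careless` and `careful` are hypothetical; the subshell is only there to
keep `set -e` away from the current shell.

```shell
# Only the tail of a pipeline decides its exit status.
if false | true; then
    echo "pipeline: the failure of 'false' went unnoticed"
fi

careless() {
    false                       # relies on `set -e` to stop here
    echo "careless: still running"
}

careful() {
    false || return 1           # explicit check, `set -e` not needed
    echo "careful: not reached"
}

(
    set -e
    # `set -e` is ignored inside a function called as a condition.
    if careless; then echo "careless: reported success"; fi
    careful || echo "careful: reported failure"
)
# prints:
#   pipeline: the failure of 'false' went unnoticed
#   careless: still running
#   careless: reported success
#   careful: reported failure
```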

** ERROR HANDLING 2 **

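Default values of variables.

tl;dr: prefer this form:

[ "$variable" ] || variable="$(gen_variable)"

The following form holds surprises, as error checking is not effective
(seen with dash): the exit status of `gen_variable` is discarded, since
only the status of `:` counts.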

: "${variable:="$(gen_variable)"}"
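
A sketch that makes the difference visible, using a stand-in
`gen_variable` that always fails. The wrappers `form1` and `form2` are
hypothetical names for the two forms.

```shell
# Hypothetical generator that always fails, for illustration only.
gen_variable() {
    return 1
}

form1() {
    variable=
    [ "$variable" ] || variable="$(gen_variable)"
}

form2() {
    variable=
    : "${variable:="$(gen_variable)"}"
}

if form1; then
    echo "form 1: failure hidden"
else
    echo "form 1: failure detected"
fi

if form2; then
    echo "form 2: failure hidden"
else
    echo "form 2: failure detected"
fi
# prints:
#   form 1: failure detected
#   form 2: failure hidden
```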

** BOOLEANS **

Perl-style: use the empty string for false, and any non-empty value for
true. Advantage: the test is as terse as it gets:

if [ "$variable" ]; then
    ...
fi
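
One consequence worth keeping in mind, sketched with a hypothetical
helper `describe`: every non-empty string is true, including "0" and
"false".

```shell
# Hypothetical helper: reports how a value is interpreted as a boolean.
describe() {
    if [ "$1" ]; then
        echo "true"
    else
        echo "false"
    fi
}

describe ""                     # prints: false
describe 1                      # prints: true
describe 0                      # prints: true
describe false                  # prints: true
```

Setting a flag to false therefore means emptying it (`flag=`), never
assigning 0 to it.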

** REMOTE COMMANDS VIA SSH **

It is generally possible to do

ssh -T user@host command args ...

It is a bit difficult to pull this off if we want to run scripts of some
complexity remotely.

First off, consider using tools like Ansible, although they typically
require some additional packages to be installed remotely (e.g. Python,
in Ansible's case).

Alternative method:

ssh_pipe() {
    ssh -T -l "${1:?user}" "${2:?host}" sh -xe
}

ssh_pipe bob 192.168.1.1 <<EOF
... # here goes a shell script!
EOF

Beware of variable expansions: the local shell variables are expanded
within the heredoc, unless the delimiter is quoted. To understand this,
compare the following forms:

$ cat <<EOF
$USER
EOF

$ cat <<'EOF'
$USER
EOF

Corollary: a neat trick to pass around local variables is the following:

{ cat <<END_OF_VARS; cat <<'END_OF_SCRIPT'; } | ssh_pipe bob 192.168.1.1
remote_var="$local_var"
END_OF_VARS
echo "I can use $remote_var"
END_OF_SCRIPT
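
The trick can be tried without a remote host. In the sketch below,
`local_pipe` is a hypothetical stand-in for `ssh_pipe` that feeds the
script to a local `sh`, and `send_script` is just a wrapper to make the
example callable; the heredoc bodies must stay at the start of the line.

```shell
# Hypothetical stand-in for ssh_pipe: same interface on stdin, but the
# script is run by a local `sh` instead of a remote one.
local_pipe() {
    sh -e
}

send_script() {
    { cat <<END_OF_VARS; cat <<'END_OF_SCRIPT'; } | local_pipe
remote_var="$local_var"
END_OF_VARS
echo "I can use $remote_var"
END_OF_SCRIPT
}

local_var="hello"
send_script                     # prints: I can use hello
```

Mind that the value is pasted verbatim between double quotes: a
`$local_var` containing double quotes, backslashes, `$` or backquotes
would need escaping first.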