Compilation with security flags (1)

My current task is about enabling the security flags for the
compilation of a few binaries. We are talking about embedded
software.

I started by reading up a little on these flags and how they
work, so I'll note down a few useful pointers here for future
reference.

These protections mitigate stack smashing techniques, which I'm
familiar with at a conceptual level, and only superficially at a
practical level.

1. _FORTIFY_SOURCE
2. x86_64 assembly
3. Various buffer overflow protection techniques
4. The stack goes down
5. Also
6. Updates (edit from a few days later) <- Acquired wisdom.

== 1. _FORTIFY_SOURCE

The first result on DuckDuckGo leads to this article by Red Hat:
https://www.redhat.com/en/blog/enhance-application-security-fortifysource

In a nutshell, the idea is to detect problems arising from the
[mis]use of a few well-known functions. This is done at compile
time when possible (e.g. a constant size argument is known to
overflow the destination buffer), and at runtime otherwise. The
runtime check is achieved by means of special versions of the
original call (e.g. memcpy being replaced with __memcpy_chk,
which also receives the destination size known to the compiler).
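A minimal sketch of the idea, assuming GCC or Clang. The names
`sketch_memcpy_chk` and `SKETCH_MEMCPY` are hypothetical and this is
not the real glibc or Newlib code. The macro passes the destination
size the compiler can see (through the `__builtin_object_size`
builtin, which yields `(size_t)-1` when the size is unknown) to a
checking wrapper. That is roughly the shape of the memcpy to
__memcpy_chk substitution.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __memcpy_chk: dstlen is the destination
 * size known at compile time, or (size_t)-1 when it is unknown. */
void *sketch_memcpy_chk(void *dst, const void *src, size_t n, size_t dstlen)
{
    if (n > dstlen) {
        /* The real check aborts the program here; this sketch
         * returns NULL so the failure can be observed. */
        return NULL;
    }
    return memcpy(dst, src, n);
}

/* Roughly what fortified headers do: forward the object size of dst. */
#define SKETCH_MEMCPY(dst, src, n) \
    sketch_memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))
```

If the size is unknown, the check degenerates to a plain memcpy.
That is why fortification only helps where the compiler can see
the buffer.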

It is easy to understand how it works, even though the discussion
is about x86_64 while I'm working with ARM architectures. Also,
I'm not very used to assembly, since I mostly work with C.

== 2. x86_64 assembly

A useful book on the topic.
https://en.wikibooks.org/wiki/X86_Assembly/

Even if this is not my target platform (I'm on ARM), I
appreciated a few interesting details.

In the "Address operand syntax" section there are a few examples
of operands. Interesting how the Load Effective Address (LEA)
instruction is handy for doing arithmetic.
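Here is a small illustration of LEA used for arithmetic. It assumes
GCC or Clang on x86_64, and `times5` is a hypothetical name. The
addressing form base + index*scale multiplies by 5 without touching
memory or the flags. On other targets the sketch falls back to plain
C.

```c
#include <stdint.h>

/* Computes 5*x; on x86_64 via a single LEA: x + x*4. */
int64_t times5(int64_t x)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    int64_t r;
    /* AT&T syntax: leaq (base,index,scale), dest -> dest = base + index*scale */
    __asm__("leaq (%1,%1,4), %0" : "=r"(r) : "r"(x));
    return r;
#else
    return x * 5; /* portable fallback for non-x86_64 targets */
#endif
}
```

Compilers often emit exactly this kind of LEA on their own for
`x * 5`, which is a good way to spot it in disassembly.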

I had to refresh my memory on a few registers: the use of %ebp
(%rbp on x86_64) as a base address for local variables. I also
learned a little about segmentation support, which modern
operating systems disable in favour of paging, but which is still
used for thread-specific data (see the segment registers FS and
GS).
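The following sketch shows thread-specific data from the C side. It
assumes a POSIX system with pthreads (older glibc may need
`-pthread`), and the names are hypothetical. Each thread gets its
own copy of a `_Thread_local` variable. On x86_64 Linux the compiler
typically reaches that copy relative to the FS segment base, although
the exact code depends on the toolchain and ABI.

```c
#include <pthread.h>
#include <stddef.h>

/* One instance per thread (often FS-relative on x86_64 Linux). */
static _Thread_local int tls_counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    tls_counter = 42; /* writes only the worker thread's copy */
    return NULL;
}

/* Sets the caller's copy, lets another thread write its own copy,
 * then returns the caller's value (expected to be unchanged: 7). */
int tls_demo(void)
{
    pthread_t t;

    tls_counter = 7;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return tls_counter;
}
```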

What I find hard about assembly is that it is mainly about
conventions, and since I don't work on this stuff often, it is
difficult to keep them in mind.

== 3. Various buffer overflow protection techniques

https://en.wikipedia.org/wiki/Buffer_overflow_protection

== 4. The stack goes down

Reasoning further from the Wikipedia article above: by design, the
variables are laid out in a way that favours stack smashing. The
beginning of a buffer sits at a lower address than the control
information and the return address, so a write beyond the end of
the buffer overwrites them.
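Below is a host-side model of that layout, built on a loud
assumption. A real compiler decides the frame layout itself, so the
hypothetical `struct fake_frame` only mimics a downward-growing stack
where the buffer sits below the saved return address. The copy works
on the raw bytes of the whole struct, so the simulated overflow stays
well-defined C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame: buffer at lower addresses, control data above. */
struct fake_frame {
    char buffer[16];
    uint64_t saved_ret; /* stands in for the saved return address */
};

_Static_assert(offsetof(struct fake_frame, saved_ret) == 16,
               "model assumes saved_ret right after buffer");

/* Unchecked copy starting at the buffer, like a careless memcpy/strcpy.
 * Clamped to the struct so the simulation itself never leaves it. */
void unchecked_copy(struct fake_frame *f, const unsigned char *src, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy((unsigned char *)f, src, len);
}
```

Copying 16 bytes leaves `saved_ret` intact. Copying 24 replaces it
with attacker-chosen bytes, which is the essence of stack smashing.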

Would it work to have an upwards-growing stack?
https://security.stackexchange.com/questions/44801/smashing-the-stack-if-it-grows-upwards

Short answer: no. It only protects the "closest" stack frame
better, which is hardly a real gain.

== 5. Also

Keep the intermediate build trees of CMake's try_compile() checks
(e.g. the ones behind compiler-flag checks) by passing
'--debug-trycompile' when invoking cmake.

== 6. Updates (edit from a few days later)

After studying the matter, and getting some clue on how stack
smashing protection works, I started to experiment on the target
system. The target is a bare-metal build (no operating system)
on an ARM CPU.

We are using Newlib, in which I found some code implementing
stack smashing protection. That code relies on file descriptors
and the like, which are not available in our firmware. So I was
expecting -D_FORTIFY_SOURCE to cause at least some link-time
issues, but I did not see any.

I then tried to smash the stack on purpose: unsurprisingly, with
no effect. A comparison between binaries compiled with and
without the macro showed that nothing had changed at all.

Later (the next working day) I gave it another try, using
-fstack-protector instead of _FORTIFY_SOURCE. This time I
started to see some linking problems, which meant I was on the
right track.

A quick analysis of the object file (`objdump -t`) showed
that the compiled code depended on a symbol called
__stack_chk_fail. A disassembly (`objdump -D`) showed how the
check is implemented: a canary value is compared before the
function returns, and on a mismatch the code calls
__stack_chk_fail with a bl (branch with link) instruction.
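The next sketch models the check by hand. It shows what the
compiler-generated code does, not what it literally emits.
`guarded_copy` and `sketch_guard` are hypothetical, and the guard is
a fixed value only for the sketch. With GCC on targets without a
thread-local guard (bare-metal ARM included, as far as I know), the
real reference value lives in a global called `__stack_chk_guard`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard; a real one should be random, not a constant. */
static const uint32_t sketch_guard = 0xDEADBEEFu;

/* Hypothetical frame with the canary between buffer and control data. */
struct guarded_frame {
    char buffer[16];
    uint32_t canary;
    uint32_t saved_lr; /* stands in for the saved return address */
};

_Static_assert(offsetof(struct guarded_frame, canary) == 16,
               "model assumes canary right after buffer");

/* Returns 1 if the canary survived the copy, 0 if it was clobbered. */
int guarded_copy(const unsigned char *src, size_t len)
{
    struct guarded_frame f;

    f.canary = sketch_guard; /* "prologue": store the canary */
    f.saved_lr = 0;
    if (len > sizeof f)
        len = sizeof f;
    memcpy((unsigned char *)&f, src, len); /* unchecked copy into buffer */

    /* "epilogue": real code would call __stack_chk_fail on mismatch */
    return f.canary == sketch_guard;
}
```

Any overflow long enough to reach the return address has to run
over the canary first, which is what makes the cheap comparison
effective.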

I used the --wrap linker flag to replace Newlib's unsuitable
__stack_chk_fail handler with a function of my own, called
__wrap___stack_chk_fail.

The handler implementation is simple: print an error message on
the UART and halt the firmware execution.

A small but nice detail is that the error message shows the
content of the lr register. The bl instruction sets lr to the
return address, so by printing it the handler can tell where in
the object code the stack smashing was detected.
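A minimal sketch of such a handler follows. It assumes GCC and
linking with `-Wl,--wrap=__stack_chk_fail`. `board_uart_puts` is a
hypothetical placeholder for the firmware's UART routine; it gets a
weak no-op default here only so the file links on a host. The sketch
uses `__builtin_return_address(0)` as a portable way to read the lr
value at entry instead of inline assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical board hook: the firmware provides the real UART output.
 * The weak no-op default only exists so this file links on its own. */
__attribute__((weak)) void board_uart_puts(const char *s)
{
    (void)s;
}

/* Formats the report into out; returns what snprintf returns. */
int ssp_format_report(char *out, size_t size, uintptr_t lr)
{
    return snprintf(out, size, "stack smashing detected, lr=0x%08lx\r\n",
                    (unsigned long)lr);
}

/* With -Wl,--wrap=__stack_chk_fail, every call to __stack_chk_fail
 * resolves to this function instead of Newlib's handler. */
__attribute__((noreturn)) void __wrap___stack_chk_fail(void)
{
    char msg[64];
    /* lr at entry: the return address written by the bl in the
     * function whose canary check failed. */
    uintptr_t lr = (uintptr_t)__builtin_return_address(0);

    ssp_format_report(msg, sizeof msg, lr);
    board_uart_puts(msg);

    for (;;) {
        /* halt: never return into the corrupted frame */
    }
}
```

The printed address points just after the bl in the offending
function, so it can be looked up in the `objdump -D` output. In
Thumb code the value may have bit 0 set.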