<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
       <title>The Embedded Software Developer diaries</title>
<description><![CDATA[This diary contains my notes and memories about my work as an
embedded software developer.

I've always been interested in operating systems, low level
development and networking.  After a bachelor thesis in robotics,
and a specialization in embedded systems, I decided against all
odds to wrap up uni with a master thesis in distributed systems.
The choice was motivated by the job opportunity that came with
it, more than by a genuine interest in the subject.

After a few years of dissatisfaction I found the guts to change
the situation.  I then worked as a backend developer, until the
pervasive use of containers and the cloud-induced complexity wore
away my interest in this line of work.

In mid 2021 I landed a job as an embedded developer, effectively
coming back to my origins, only to find out how much I'd have to
learn in order to catch up.
]]></description>
       <link>gopher://example.conf:70/1/esd-diaries/</link>
       <item>
               <title>Hardening dropbear</title>
               <link>gopher://example.conf:70/0/esd-diaries/2024-01-11.txt</link>
               <description><![CDATA[I've been notified about the results of the ssh-audit[1] security
scanner.

This gave me the opportunity to learn a bit about how SSH works under
the hood.

This article[2] explains how the session is established.
I learned from it that the Diffie-Hellman procedure used to exchange
the session key between client and server is based on temporary keys.
The persistent host keys are only used for authentication.

MAC-then-encrypt or encrypt-then-MAC?[3]
The MAC authenticates the message.
* MAC-then-encrypt (TLS):
   Encrypt(PlainText . MAC(PlainText))
* Encrypt-and-MAC (SSH):
   Encrypt(PlainText) . MAC(PlainText)
* encrypt-then-MAC (IPsec, and the modern SSH "-etm" MACs):
   Encrypt(PlainText) . MAC(Encrypt(PlainText))
From the stackexchange thread, I could infer that the last one has
the strongest guarantees.
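To make the orderings concrete, here is a toy sketch in C of the
encrypt-then-MAC composition.  The XOR "cipher" and the additive
checksum "MAC" are stand-ins for real primitives, and all the names
are mine:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy primitives, NOT real crypto, just to make the shape concrete. */
static void toy_encrypt(uint8_t *buf, size_t len, uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

static uint8_t toy_mac(const uint8_t *buf, size_t len, uint8_t key)
{
	uint8_t tag = key;
	for (size_t i = 0; i < len; i++)
		tag = (uint8_t)(tag + buf[i]);
	return tag;
}

/* encrypt-then-MAC: the tag covers the ciphertext, so tampering is
 * detected before any decryption happens.  The caller must provide
 * a buffer of len + 1 bytes to make room for the tag. */
size_t encrypt_then_mac(uint8_t *msg, size_t len, uint8_t ekey, uint8_t mkey)
{
	toy_encrypt(msg, len, ekey);
	msg[len] = toy_mac(msg, len, mkey);
	return len + 1;
}

int verify_then_decrypt(uint8_t *msg, size_t len, uint8_t ekey, uint8_t mkey)
{
	if (len < 1 || toy_mac(msg, len - 1, mkey) != msg[len - 1])
		return 0;                    /* reject, nothing decrypted */
	toy_encrypt(msg, len - 1, ekey);     /* XOR is its own inverse */
	return 1;
}
```

The point the thread makes is visible in verify_then_decrypt: a
tampered ciphertext is rejected before the cipher is ever touched.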

Other interesting reads:
MAC wikipedia page[4].


[1] https://github.com/jtesta/ssh-audit
[2] https://www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process#authenticating-the-user-s-access-to-the-server
[3] https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
[4] https://en.wikipedia.org/wiki/Message_authentication_code
]]></description>
       </item>
       <item>
               <title>The joy of libclang in Yocto</title>
               <link>gopher://example.conf:70/0/esd-diaries/2023-10-02.txt</link>
               <description><![CDATA[----------------------------

Once upon a time I was asked to introduce libclang in our Yocto build.

The back story, abridged version, is about a certain team that
requires a certain tool for certain logic to be copied and pasted from
another project.

In case you're not aware of it, Yocto uses GCC for C and C++.
As far as I know there are efforts aiming at integrating Clang
into it, but in our case we are just interested in libclang, not
in the compiler.

The task seemed a little complex, so I started with a preliminary
study, trying to figure out how difficult this could be.
The compilation of the library is not complicated in itself, but even
restricting the build to the library alone takes around 2 hours.
By tweaking the compilation a little, I could shrink down the final
image size to 150 MiB, which is quite a lot, but better than the
original 500 MiB.  I wish this data had convinced the management,
but it clearly wasn't enough.  I started to delve into the task.

Let's mention the good news first: since this is about code
generation, I don't need to build it for the target machine.

The build is just needed for the Host and for the SDK.  If you're
reading this journal entry without Yocto experience, you might be
wondering what that means.  By Host I mean the "regular" build
environment within Yocto. The SDK is a self-contained and installable
build environment that enables a workstation to build software for the
target platform without having Yocto installed.  The software that is
distributed with the SDK is effectively cross compiled, as if it were
for a different architecture.

With the previous paragraph hopefully giving enough context, it is
time to mention that the build system of Clang requires certain
tools that are compiled on the fly, and invoked afterwards, by the
very same build system.
However, since this is a cross-build, the Clang build system will
fail while trying to execute the executables it has just built.
Those executables are in fact tailored for the SDK.

What I need is to force the CMake of Clang to use those that I
compiled in the host.

Since the tools are made for the SDK, they can't be executed: they
point at a non-existent linker script (expected to exist
under /opt/infn-xr/..., where it will not be installed until the SDK
is installed), so this is a chicken-and-egg problem.

Since the tools are not meant to be found in the system root, the
CMake configuration will restrict the search path to the build path.
Therefore I need to patch the CMake configuration.

Since the tools are not meant for installation, the CMake
configuration will not declare them as installable.
Therefore, again, I need to patch the CMake configuration.

Patching cmake configuration implies the invalidation of the Shared
State Cache of Yocto, which means I have to endure two hours of
rebuild at each attempt.

Just wonderful.
]]></description>
       </item>
       <item>
               <title>Yocto, native and nativesdk</title>
               <link>gopher://example.conf:70/0/esd-diaries/2023-09-30.txt</link>
               <description><![CDATA[---------------------------

For native-only recipes, it is probably more convenient to go for the
foobar-native.bb alternative.

As explicitly mentioned in the documentation, if a recipe is
nativesdk, the conventional filename is not "foobar-native.bb" but
"nativesdk-foobar.bb".  Not very consistent, but so it is.

A nativesdk package "foo" might have dependencies on a package "bar".
Should "nativesdk-foo.bb" depend on "nativesdk-bar" or "bar-native"?
The answer is: go for "bar-native".

Anecdotal report: I've recently defined a recipe for "nativesdk-foo",
which depends on a "bar" recipe.  The compilation of "foo" requires an
executable to generate the code, and the executable is provided by
"bar".  Depending on nativesdk-bar ended up failing because the
executable could not be invoked.  The failure was due to the dynamic
linking interpreter ld.so(8) being missing.
The executable installed by nativesdk-bar was referring to a missing
loader, pointing to a position in the filesystem where it would only
have been placed by the SDK setup.
]]></description>
       </item>
       <item>
               <title>Notes on storage</title>
               <link>gopher://example.conf:70/0/esd-diaries/2023-04-23.txt</link>
               <description><![CDATA[A few documents I'm reading today.  My goal is getting to know
the theory behind UBI (Unsorted Block Images).
Warning: the following is a list of bookmarks.  The notes below
each link are relevant to me because they fill my gaps, and
because they act as a personal handbook: they are not meant as a
summary of the page!


** Kernel.org / UBIFS

https://kernel.org/doc/html/v6.3-rc7/filesystems/ubifs.html

In regular MTD devices, read/write ops refer to some offset of an
erase block (variable, rather large size).  By contrast, block
devices allow for 512-byte read/write operations.

UBI sits on top of MTD, and abstracts away the wear leveling of
erase blocks and the error detection.


https://en.wikipedia.org/wiki/Logical_block_addressing

Abstracts away the access details of the hard drive.  In "classic"
disks the addressing is defined by a CHS tuple (Cylinder, Head,
Sector).

With the advent of SCSI, Logical Block Addressing (LBA) was
introduced, abstracting away the accessing logic from the operating
system.  Bad blocks became irrelevant: they are automatically
replaced under the hood.


** UBI **

http://www.linux-mtd.infradead.org/doc/ubi.html

LEB = Logical Erase Block
PEB = Physical Erase Block

Static volumes are read-only and use CRC-32 to detect errors,
whereas dynamic volumes assume a read-write FS to sit on top
of them and do the error detection.

Scrubbing: UBI transparently moves data away from worn PEBs.

Headers:

Each non-bad PEB contains EC and VID.
EC = Erase Count
VID = Volume IDentifier (the PEB belongs to a LEB in a volume)

UBI claims some space on the PEB to store EC + VID + CRC(EC +
VID).  Attaching an MTD device to UBI requires a linear scan to
rebuild the metadata in RAM.

VID is undefined for unmapped PEBs, and is stored upon mapping
the PEB to a LEB.  Subsequent writes do not touch metadata.

Erasure clears everything (including VID).  The EC is always
rewritten after erasure: if not found (e.g. power loss between the
erase and the EC write) it is estimated as the average of the whole MTD.

Volume table

A UBI device supports up to 128 volumes, each described by a
record in a special "layout volume".  The layout volume itself
uses 2 LEBs, corresponding to two copies, for fault tolerance.

The volume table info seems to correspond to the cfg file in
ubinize.  TODO: verify if this is the case.



** Further things I'd like to learn more about: **

- http://www.linux-mtd.infradead.org/doc/ubifs.html
- file system journaling
- FTL (Flash Translation Layer)
- SLC NAND flashes
]]></description>
       </item>
       <item>
               <title>Quest for software verification (2)</title>
               <link>gopher://example.conf:70/0/esd-diaries/2023-04-21.txt</link>
               <description><![CDATA[Finally some progress, after a long battle.

In the previous episode, I was wondering if the SPI-NOR flash can
be memory mapped, so that it can be seen by the JFFS2 driver.

I'm still unable to access the flash directly.  I can indirectly
copy some of it to a buffer by issuing a command to the SPI
controller (`sf read`).

The reference manual of the processor claims it should be
possible to memory map it, but does not seem to delve into the
needed details.  Could the information be in yet-another
document?

Also, I have been told that the processor is executing from flash
(see also 'XIP'), which in my opinion implies that the flash
should in fact be accessible via a memory mapping.
But this is just intuition, and I wouldn't be surprised if
someone proved me wrong.

Some facts that caught my attention:

1. By using objdump to disassemble the u-boot ELF, I can see
  that the text section starts at 0x40100000.  I might be reading
  this incorrectly, but 0x100000 is the offset in flash of the
  u-boot partition.  Then 0x40000000 should be the start of the
  memory mapping.

2. Reading from 0x40000000 via `md` (a U-Boot command that reads
  from an arbitrary memory location) will trigger the
  Synchronous Abort handler.  Could this be due to a conflict
  whereby I cannot read while executing?

The good news is that I've found a way to force arbitrary flash and
partition offsets in the JFFS2 driver: if the CMD_MTDPARTS
configuration is turned off, the JFFS2 driver will supply its own
(quoting) "'Static' version of command line mtdparts_init() routine",
which will allow for a (quoting) "Single partition on a single device
configuration.".

Even if the code quality is questionable, it has the great
advantage of allowing the injection of constants, which means I
can manually supply the memory area where the flash is mapped.

And since I don't know how to reach the memory-mapped flash, I can
simply leverage the `sf read` command to load the JFFS2 partition in a
static buffer, inject the buffer address in the JFFS2 static
partition/device, and effectively load the filesystem content without
involving the memory mapping.

Following this plan, a few days ago I managed to list the content
of the file system.  Today I fixed a little bug that crashed
u-boot during the loading of a file.  The bug is trivial (a
pointer assigned to a signed integer, resulting in truncation on
our 64-bit architecture).  I should probably send a patch
upstream, if it applies.
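That bug class is easy to reproduce.  A minimal sketch, assuming a
64-bit platform (the function names are mine):

```c
#include <stdint.h>

/* Buggy pattern: a 32-bit signed int cannot hold a 64-bit pointer,
 * so the round trip loses the high bits. */
void *roundtrip_through_int(void *p)
{
	int truncated = (int)(intptr_t)p;   /* high 32 bits lost */
	return (void *)(intptr_t)truncated;
}

/* Fixed pattern: uintptr_t is guaranteed to round-trip a pointer. */
void *roundtrip_through_uintptr(void *p)
{
	uintptr_t bits = (uintptr_t)p;
	return (void *)bits;
}
```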
]]></description>
       </item>
       <item>
               <title>Quest for software verification</title>
               <link>gopher://example.conf:70/0/esd-diaries/2023-04-11.txt</link>
               <description><![CDATA[Since more than a week now, I'm after a task which was originally
estimated to take a few days.  An experienced colleague pushed to
migrate the boot procedure from a reasonable setup based on
FIT[1] to an arrangement where the kernel and the device tree is
loaded from the root filesystem.

The reason behind this migration has to do with the convenience
of implementing the cryptographic verification of the software in
one step: only the root filesystem is verified, and this implies
the verification of the kernel and device tree contained in it.

The system I'm trying to modify is using SquashFS for the root
filesystem.  SquashFS is supported by U-Boot, so it should have
been as simple as requesting the loading of a certain filesystem
path (e.g. "/boot/vmlinuz") to a designated memory area, and boot
from there.

As it turns out, the SquashFS support of U-Boot is unfortunately
quite limited.  It requires SquashFS to be constructed on top
of the UBI[2] support.

It was suggested that I fall back to JFFS2, which should be
supported in read-only mode by U-Boot.  The unfortunate side
effect is that our root filesystem will no longer be inherently
read-only (SquashFS is read-only by design, JFFS2 can be mounted
read-write under Linux), and of course that we will have to
migrate to the new (old) filesystem.

Ironically, the easy part was figuring out how to do so with
Yocto!  The IMAGE_FSTYPES[3] setting lists the formats
the build system should use for the root filesystem.

I copied the image on the designated flash partition, and tried
to load it from U-Boot.  It took me a while to find out about
'mtdids'+'mtdparts'[4], what the correct values for them should
be, and what build-time configurations to enable in order to
get the desired code activated.  All those build-time
conditionals, and the practice of silently skipping disabled
code, can make U-Boot very difficult to work with.

I reached the point where the values of 'mtdids' and 'mtdparts'
determined a promising long delay between the 'fsls' command and
a remarkably empty list of files.  Actually, nothing really
prevents those pesky 'mtdids' and 'mtdparts' variables from
still being wrong.

I tried to verify if the filesystem is usable, first by
successfully mounting it on my Linux workstation, and then by
mounting it, still on Linux but on the target board.
The second step was not trivial but eventually successful.

It was not trivial because the flash erase block size matters to
JFFS2, and the kernel has a configuration called
CONFIG_MTD_SPI_NOR_USE_4K_SECTORS, active by default, which will
force our flash to use 4 KiB erase blocks.  That configuration
does not work well with our NOR flash.

I finally managed to mount the new root filesystem under Linux
(on target) by disabling that configuration, and by configuring
Yocto to use 64 KiB erase blocks for the image (the right knob for
the purpose is JFFS2_ERASEBLOCK[5]).  A build-time configuration
with the same name, meaning, and default value also exists under
U-Boot, so I had to disable it consistently.

Despite my efforts, I still cannot access the filesystem from
U-Boot!  I can boot the kernel with the JFFS2 image as root
filesystem, which means that the problem is not in the image, nor
in the kernel.  I can also correctly load the filesystem in
memory from U-Boot (acting on the 'sf' commands, which result in
a memory transfer via the SPI bus to the main memory).  I know
the image is correct because I can dump the loaded bytes with the
'md' command, and it checks out.

The thing is, unfortunately, that the JFFS2 driver will expect
the data to be available from a memory mapped flash, and I'm not
sure if the flash can even be accessed this way.  I will now
delve into the data-sheet of our target ASIC.

Even if this battle is not yet over, it feels good to annotate
the progress made so far.  I'm no longer sure that this was a
good idea, but at least I've gained some precious know-how. :)


Additional reads:

https://en.wikipedia.org/wiki/Serial_Peripheral_Interface
https://www.embedded.com/flash-101-the-nor-flash-electrical-interface/


References

[1] Flattened Image Trees
   https://www.elinux.org/Fit-boot

[2] Unsorted Block Images
   https://en.wikipedia.org/wiki/UBIFS#UBI

[3] IMAGE_FSTYPES
   https://docs.yoctoproject.org/ref-manual/variables.html#term-IMAGE_FSTYPES

[4] mtdparts
   https://github.com/u-boot/u-boot/blob/9e804638bfe2693a908abf066ff66c251572afa7/cmd/mtdparts.c#L26

[5] JFFS2_ERASEBLOCK
   https://docs.yoctoproject.org/ref-manual/variables.html#term-JFFS2_ERASEBLOCK
]]></description>
       </item>
       <item>
               <title>Compilation with security flags (2)</title>
               <link>gopher://example.conf:70/0/esd-diaries/2023-01-10.txt</link>
               <description><![CDATA[Some updates on the topic of hardening via CFLAGS.

The explorations reported on the 2022-12-22 entry of this phlog
turned out to be successful, in that I managed to enable stack
smashing protection (-fstack-protector and variants) under a
bare-metal build.  I updated the 2022-12-22 entry with my
findings about -fstack-protector.

I later discovered that the -D_FORTIFY_SOURCE setting is enabled
by default in the toolchain configuration, which would explain
why I could not spot any difference between a build where it was
enabled and a build where it was not.  It is still not clear
whether -D_FORTIFY_SOURCE accomplishes anything on our build,
given that I couldn't observe any additional safety until
-fstack-protector was enabled.

My working implementation is currently using the "terminator
canary", which is a well known constant canary value.  A constant
canary is not as robust as a random canary, but it provides a
certain degree of protection (plenty of details can be found
online).  A constant canary has the advantage of being extremely
simple to implement.

Today a colleague suggested using a randomised canary, so I've
spent some additional time refining my implementation.

The newlib implementation can optionally use a randomised canary,
but it relies on features that are not available on our
bare-metal firmware.

We do have a (hardware) random number generator which we could
use for the purpose, but it needs to be initialised before we get
to use it.  Unfortunately, assigning the canary after the C
runtime has been set up breaks the stack smashing detection
mechanism.  Any return from a function is detected as a smashing.
This comes with no surprise: I had a clue when I noticed that
newlib assigns a random value to the canary within function
marked as '__attribute__((constructor))'

I don't believe that initialising the hardware from crt0 is a
good idea :D
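For reference, the constructor trick can be sketched like this (a
portable toy, not newlib's actual code; 'guard' and the constant
are stand-ins for the real guard symbol and the random value):

```c
#include <stdint.h>

static uintptr_t guard;

/* GCC/Clang run constructor-attributed functions after the C runtime
 * is up but before main(), which is the slot newlib uses to seed the
 * canary before any protected function runs. */
__attribute__((constructor))
static void seed_guard(void)
{
	guard = 0x000aff0d;  /* stand-in for a value read from an RNG */
}

uintptr_t current_guard(void)
{
	return guard;
}
```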
]]></description>
       </item>
       <item>
               <title>Linux explorations: flash memory (2)</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-12-28.txt</link>
               <description><![CDATA[Continuing yesterday's work on learning about flash memories and
how they work in Linux.

1. More wikipedia
2. MTD documentation

== 1. More wikipedia

https://en.wikipedia.org/wiki/Flash_memory
Lots of information, of which I'm summing up what I found most
significant:

* EEPROM:
The whole device needs to be erased before rewrites.

* NOR:
Random access reads, random access writes on erased words,
block erase.
Erase block size: 64/128/256 KiB.
Suitable for configuration data and firmware.
Suitable for XIP - Execute In Place, firmware directly executed
from flash.
More reliable than NAND (less likely to have bit flips).
Mnemonic: NOeRror :)

* NAND:
Block devices.
Blocks are composed of Pages.
The typical page size is 512/2048/4096 B.
Reads and writes are per-page, erasures are per block.
Less expensive than NOR.
More likely to have errors (each page has its ECC - Error
Correcting Codes).
Faster erase.
Best suited for large size -> common for storage, e.g. USB
drives, [e]MMC...

* Common:
Erasing a block turns every bit into 1.
Programming turns selected bits into 0.
It is feasible (though not necessarily implemented) to update
words, as long as the update only turns further 1 bits into 0.

* Serial Flash (e.g. via SPI bus):
Common as it makes the PCB design simpler.
A RAM buffer might be placed between the SPI bus and the flash,
to increase the speed of data modifications.
A RAM buffer might be placed between the SPI bus and the CPU,
to improve speed (code shadowing).
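The 1-to-0 rule above can be modelled in a couple of lines of C
(a toy single-byte cell; the names are mine):

```c
#include <stdint.h>

#define ERASED 0xFFu  /* erasure sets every bit of the cell to 1 */

/* Programming can only clear bits (1 -> 0): the result is the AND
 * of the current cell content with the written value. */
uint8_t flash_program(uint8_t cell, uint8_t value)
{
	return cell & value;
}

/* An in-place update is only feasible when the new value has no 1
 * where the cell already holds a 0. */
int update_feasible(uint8_t cell, uint8_t value)
{
	return (value & (uint8_t)~cell) == 0;
}
```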

== 2. MTD documentation

http://www.linux-mtd.infradead.org/doc/general.html

I found this one interesting:
http://www.linux-mtd.infradead.org/doc/nand.html
]]></description>
       </item>
       <item>
               <title>Linux explorations: flash memory (1)</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-12-27.txt</link>
               <description><![CDATA[My current task is about flash memories, and certain operations about flash
memory security.
I'll have to build up some knowledge before being able to carry it on, and
such knowledge involves implementation details of the Linux kernel.

As I've done before, I'll track my progress here, marking useful (or seemingly
useful) links along the way, maybe starting with some focus music.
https://www.youtube.com/embed/eCs8LT290a4

1. User-space standpoint
2. A few Wikipedia pages
3. Linux docs: MTD, NOR and NAND


== 1. User-space standpoint

https://www.coresecurity.com/core-labs/articles/linux-flash-newbies-how-linux-works-flash

It is possible to emulate flash in RAM to do testing: nandsim (kernel module)

Besides using dd(1) on /dev/mtdX, there are many higher level tools available,
to write flash, dump it, and then some (see mtd-utils).

/dev/mtdblockX: a crude abstraction that presents flash as a block device.
Even small writes might erase a whole erase-block, which might wear out
the flash memory.

* Using romfs:
- genromfs(8) to generate read-only image from a directory
- flash_erase to clean /dev/mtdX
- nandwrite to write generated rom image to /dev/mtdX
- mount(8) /dev/mtdblockX, specifying romfs as type.  It is readonly.

* Using jffs2:
- mkfs.jffs2(1) can generate a filesystem image.  The image is flash aware
 (e.g. parametrised with the correct erase block size it will behave better)

Thoughts:
- Using a plain dd(1) probably goes through the regular 'read' callback
 of the driver, as I imagine it.  TODO: verify what the difference
 with nanddump is, for instance.


== 2. A few Wikipedia pages

https://en.wikipedia.org/wiki/Memory_Technology_Device
https://en.wikipedia.org/wiki/JFFS2
https://en.wikipedia.org/wiki/YAFFS
And more filesystems exist.
Which ones are the most popular, and why?


== 3. Linux docs: MTD, NOR and NAND

MTD = Memory Technology Device

https://www.kernel.org/doc/html/latest/driver-api/mtd/
https://www.kernel.org/doc/html/latest/driver-api/mtd/spi-nor.html

The kernel docs are not very structured, and quite hard to follow.
I'm not sure what I'm reading about.

Probably a better bet for tomorrow.
http://www.linux-mtd.infradead.org/doc/general.html
]]></description>
       </item>
       <item>
               <title>Compilation with security flags (1)</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-12-22.txt</link>
               <description><![CDATA[My current task is about enabling the security flags for the
compilation of a few binaries.  We are talking about embedded
software.

I started by reading up a little about these flags, and how they
work.  So I'll annotate here a few useful pointers, for future
reference.

The protection mitigates stack smashing techniques, which I'm
familiar with at a conceptual level, and only superficially at a
practical level.

1. _FORTIFY_SOURCE
2. x86_64 assembly
3. Various buffer overflow protection techniques
4. The stack goes down
5. Also
6. Updates (edit from a few days later) <- Acquired wisdom.


== 1. _FORTIFY_SOURCE

The first result on DuckDuckGo leads to this article by
Red Hat.
https://www.redhat.com/en/blog/enhance-application-security-fortifysource

In a nutshell, the idea is to detect problems arising from the
[mis]use of a few well-known functions.  This is done at compile
time when possible (e.g. constant parameters are wrong), and at
runtime otherwise.  The runtime check is achieved by means of
special versions of the original call (e.g. memcpy being replaced
with __memcpy_chk).
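The runtime part can be sketched with a hand-written analogue of
those checked variants (checked_memcpy is my name, not glibc's; the
real __memcpy_chk aborts via __chk_fail instead of returning):

```c
#include <stddef.h>
#include <string.h>

/* Analogue of glibc's __memcpy_chk: the compiler passes along the
 * destination size it discovered, and the checked copy refuses to
 * overflow.  This sketch reports instead of aborting. */
int checked_memcpy(void *dst, size_t dst_size, const void *src, size_t n)
{
	if (n > dst_size)
		return -1;       /* would overflow the destination */
	memcpy(dst, src, n);
	return 0;
}
```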

It is easy to understand how it works, even if the discussion is
about x86_64 while I'm working with ARM architectures.  Also, I'm
not very used to assembly since I mostly work with C.


== 2. x86_64 assembly

A useful book on the topic.
https://en.wikibooks.org/wiki/X86_Assembly/

Even if this is not my target platform (I'm on ARM), I
appreciated a few interesting details.

In the "Address operand syntax" section there are a few examples
about the operands.  It is interesting how the Load Effective Address
(LEA) operation is handy for doing some math.

I had to rehearse the meaning of a few registers: the use of %ebp
as a base address for local variables.  I also learned a little
about the segment support, which is disabled in favour of paging
on modern operating systems, but still in use for thread-specific
data (see the segment registers FS and GS).

What I find hard about assembly is that it is mainly about
conventions, and, not working on this stuff often, it is difficult
to keep them in mind.


== 3. Various buffer overflow protection techniques

https://en.wikipedia.org/wiki/Buffer_overflow_protection


== 4. The stack goes down

Reasoning up from the Wikipedia article above, the variables are
positioned by design in a way that fosters stack smashing (the
beginning of a buffer is positioned at an address which is lower
than the control information and return address => a write
beyond the boundary overwrites them).

Would it work to have an upwards-growing stack?
https://security.stackexchange.com/questions/44801/smashing-the-stack-if-it-grows-upwards

Short answer: no, it just improves the protection of the
"closest" stack frame, but that's hardly an improvement.


== 5. Also

Keep intermediate object codes with CMake: pass
'--debug-trycompile' when invoking cmake.


== 6. Updates (edit from a few days later)

After studying the matter, and getting some clues on how stack
smashing protection works, I started to experiment on the target
system.  The target is a bare-metal build (no operating system)
on an ARM CPU.

We are using Newlib, in which I could find some code implementing
some stack smashing protection.  Such code relies on file
descriptors and such, which are not available in our firmware.
So I was expecting -D_FORTIFY_SOURCE to produce some link-time
issues at least, but I did not see anything.

Unsurprisingly, smashing the stack on purpose had no effect.
Then I compared binaries compiled with and without the macro:
nothing changed at all.

Later (next working day) I gave another try, using
-fstack-protector instead of _FORTIFY_SOURCE.  This time I
started to see some linking problem, which meant I was on the
right track.

A quick analysis of the object file (`objdump -t`) showed
that the compiled code was depending on a symbol called
__stack_chk_fail.  A disassembly (`objdump -D`) showed how
this is implemented: a canary check and a branch (assembly
instruction bl) to __stack_chk_fail.

I used the --wrap linker flag to replace the unsuitable
__stack_chk_fail handler from Newlib with a function called
__wrap___stack_chk_fail, that I implemented.

The handler implementation is simple: print an error message on
the UART and halt the firmware execution.

A little but nice detail is that the error message shows the
content of the lr register.  The lr register is set to the
return address by the bl assembly instruction, so by printing it
the handler can tell where, in the object code, the stack smashing
happened.
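The overall mechanism can be mimicked in portable C with an explicit
frame.  This is only a toy model of what the compiler emits (GUARD,
struct frame, and the function names are all mine):

```c
#include <string.h>

#define GUARD 0x000aff0dUL  /* classic "terminator" canary value */

/* Toy stack frame: the compiler places the canary between the local
 * buffer and the control data, so an overrun hits it first. */
struct frame {
	char buf[8];
	unsigned long canary;
};

/* Prologue: store the canary. */
void frame_enter(struct frame *f)
{
	f->canary = GUARD;
}

/* Epilogue: returns 1 when the canary is intact, 0 when smashed
 * (the real epilogue branches to __stack_chk_fail instead). */
int frame_leave(const struct frame *f)
{
	return f->canary == GUARD;
}
```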
]]></description>
       </item>
       <item>
               <title>U-boot environment again.</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-12-09.txt</link>
               <description><![CDATA[It has been a while since last time I handled a u-boot environment!
As stupid as this sounds, I forgot a few things, and I had to find out again.

The address and size of the environment is dictated by two constants
that are part of the platform definition: CONFIG_ENV_ADDR and
CONFIG_ENV_SIZE.

The environment image can be constructed using the mkenvimage tool.
The tool accepts a '-r' flag (mnemonic: redundant) which is necessary when
the produced image is stored in two redundant copies.

The format is simple: 4 bytes of CRC, one optional byte defining which of the
redundant copies should be booted, and a sequence of key-value pairs.
Easily understandable by means of a simple hex dump.

Using the '-r' flag will only add the optional redundancy byte.  It will not
duplicate the data in the image.  The resulting image should be copied on
the platform, in the two redundant memory areas.
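A sketch of how I picture the layout, plus a walker over the
key-value area.  The struct and function names are mine, and the
layout is what I inferred from the hex dump, not from the U-Boot
sources:

```c
#include <stddef.h>
#include <string.h>

/* Redundant environment image: 4-byte CRC, one flags byte (only
 * present with -r), then NUL-terminated "key=value" strings,
 * terminated by an empty string. */
struct env_image_redundant {
	unsigned char crc[4];   /* CRC-32 of the data that follows */
	unsigned char flags;    /* which redundant copy is current */
	char data[];            /* "key=value\0key=value\0...\0" */
};

/* Count the key=value pairs in a data area of at most max bytes
 * (assumes the area is NUL-terminated within max). */
size_t env_count_pairs(const char *data, size_t max)
{
	size_t n = 0, i = 0;

	while (i < max && data[i] != '\0') {
		n++;
		i += strlen(data + i) + 1;
	}
	return n;
}
```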
]]></description>
       </item>
       <item>
               <title>Reading ARM docs</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-11-01.txt</link>
               <description><![CDATA[I'm reading up for work-related reasons.  On our platform we use
ARM Cortex A53, and my goal is to enable some memory-related CPU
features.

This is a great opportunity to learn something new, and I see how it dusts
off some knowledge I gained back at uni.  I'm happy to have the
opportunity to review that knowledge and use it in practice!

I've created a new text file, and indexed it in the top level page as
"ARM Cortex A53 notes".

It is definitely not intended as a guide, rather as a tracking of my
personal advancement in learning about these topics.
]]></description>
       </item>
       <item>
               <title>Yocto, deploy.bbclass and dependencies</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-10-12.txt</link>
               <description><![CDATA[I noticed that the Yocto recipes of a few colleagues inherit a
bbclass named 'deploy.bbclass'.

This feature has ramifications in a few variables[1][2], but I couldn't really
grasp the point of it without a query on the #yocto IRC channel.
After asking, I understood what follows, although I still have to
personally verify how things work.


Notes on the discussion:

1. A dependency Y -> X (Y depends on X) will ensure that the sysroot of Y
  will be populated with the artefacts installed (do_install) by the
  recipe X.

2. The sysroot is meant for things that should appear on the filesystem in
  the linux userspace.  This is not the case for the kernel, or u-boot
  (quoting qshultz)

3. There's no guarantee that the files of X will be deployed by the
  time do_configure starts in Y (quoting vm1).
  DEPENDS doesn't guarantee binaries are deployed (confirmed by qshultz)

4. If deployed artifacts from X are needed in Y's do_compile, a dependency
  can be specified in a way that looks like this (quoting qshultz, not
  verified):

    do_compile[depends] += "recipeX:do_deploy"

Example: trusted firmware needed for compiling U-Boot, but not meaningful
in the sysroot.


[1] https://docs.yoctoproject.org/ref-manual/variables.html#term-DEPLOYDIR
[2] https://docs.yoctoproject.org/ref-manual/variables.html#term-DEPLOY_DIR_IMAGE
]]></description>
       </item>
       <item>
               <title>Acquired wisdom</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-07-15.txt</link>
               <description><![CDATA[I'm working on a second device driver today.

When the driver is supposed to poke or read certain memory mapped registers
(and I'd dare say that's quite common) the normal thing is to define a C
structure with the fields specified by the data sheet, and access the
registers through a volatile pointer to a structure of that type.

Needless to say, it is of paramount importance to nail the correct offset for
each register.  This is not tricky in itself, except for those sneaky paddings,
which must correspond to appropriate unused fields in the defined struct.

So far I've always found that data sheets report the register offsets from
the base address where the device is accessible.  Given how common it is to use
the C language in embedded systems, I think it would be quite useful if data
sheets came with the definition of a C struct already.

Since the compiler won't protect you against accidentally missing fields or
wrong paddings, it is wise to cross-check the defined structure by copying it
into a little test program and printing out the offset of each field.


       #include <stdio.h>
       #include <stdint.h>
       #include <stddef.h>

       #define ShowRegister(StructName, RegName)          \
               printf(#RegName ": %02zx\n",               \
                      offsetof(struct StructName, RegName))

       // remember the packed/aligned attributes, as the data sheet requires...
       struct my_struct
       {
               uint32_t field_1;
               uint32_t field_2;
               uint32_t field_3;
               // ...
       };

       int main(void)
       {
               ShowRegister(my_struct, field_1);
               ShowRegister(my_struct, field_2);
               ShowRegister(my_struct, field_3);
               // ...
               return 0;
       }
]]></description>
       </item>
       <item>
               <title>What the heck does that mean?</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-07-13.txt</link>
               <description><![CDATA[Acronyms! Acronyms everywhere.

 RAZ/WI - Read as Zero, Writes Ignored (ARM Slang).

I find it annoying when I'm reading a document that is riddled with
acronyms.  If I'm not familiar with the acronym in question, my brain will
automatically skip over it and attempt (and often fail) to make sense of the
phrase without that bit of information.  If there are a few of them in a row,
I often just get a /Fnord/ effect.

Fortunately, and not so surprisingly, I'm gradually adapting to all this
slang.
]]></description>
       </item>
       <item>
               <title>Readings of the day</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-07-08.txt</link>
               <description><![CDATA[https://www.kernel.org/doc/Documentation/memory-barriers.txt
https://en.cppreference.com/w/cpp/atomic/memory_order

I always found memory order difficult to understand.

I had seen it before while working with C++, since in a previous job I used
to maintain a multi-threaded C++ daemon.

I stumbled into memory order again while writing a device driver.  The device
driver is not designed as a multi-threaded application, but it so happens
that the controlled device works independently, and its behaviour might
effectively be affected by the re-ordering of memory accesses performed by
the processor or by the compiler.

I've been pointed to the kernel documentation above, which is a very good
complement to what is explained on cppreference.com.  As a matter of fact, it
allowed me to actually understand how the different memory barriers work.
]]></description>
       </item>
       <item>
               <title>Readings of the day</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-07-06.txt</link>
               <description><![CDATA[https://en.wikipedia.org/wiki/Entropy_(information_theory)

Entropy!

I'm messing with a random number generator, and it is time to dust off
what I learned in the Machine Learning course at uni.
]]></description>
       </item>
       <item>
               <title>A quick recap</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-06-29.txt</link>
               <description><![CDATA[This diary is supposed to trace my personal experiences, while
reaching my goals and getting into embedded software development.

Also, I would like to get into technical details, at least for those
things that are general, and not too entangled with the intellectual
property of the company I'm working for.

In reality I'm often way too busy or tired to write down what is
going on, so I might have missed some entries.  I will make up for it by
briefly listing a few achievements of the last few months.


== I worked on flash memory

At the very beginning of my career as an embedded systems developer, I was
assigned the task of ensuring that security-sensitive information
can be properly erased from flash memory, specifically from MMC/eMMC
([embedded] multi-media card) devices.

The takeaway from it is that disk encryption is always good, even if
eMMC devices support secure erasure of their content.  The standards
mandate that vendors expose specific knobs for secure erasure.
Unfortunately it is pretty much impossible to distinguish a plain erasure
from a "secure" one, at least from software.  Everything boils down to
trusting the memory vendors to do their job correctly.  I can trust
people to be in good faith, but honest mistakes also happen.
Hardware is usually closed source, thus not verifiable.

Multi-layer security is *always* a good idea.  Use disk encryption.


== I'm currently writing a device driver

It is an interesting experience.  It is not trivial, it takes a lot of
reading, but it is also extremely satisfying to see the device
responding as expected.


== I've learned a few things

I'm improving my knowledge of the GNU Debugger.

I've learned some Tcl.  It is old, but still somewhat popular in this
field.  It is quite quirky too.

I'm getting more comfortable with acronyms.  As a software developer,
I tend to be disoriented by the [ab]use of acronyms, which turns out
to be typical in the hardware club.  Sometimes they're just not
meaningful at all.  For example, "IP" indicates a hardware component.
What does IP stand for? "Intellectual Property"[1].  I find this quite
ridiculous, which paradoxically makes it easy to remember.


Let's go on.


[1] https://en.wikipedia.org/wiki/Semiconductor_intellectual_property_core
]]></description>
       </item>
       <item>
               <title>The unreasonable difficulty of string-to-number</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-05-11.txt</link>
               <description><![CDATA[tl;dr - properly use strto* in C.  Jump to the bottom for gotchas.

                                    ~~~

String to integer conversion is trivial, or at least it should be.
As much as I love the C programming language, the standard library is
unfortunately well known for being quite bad at such trivial tasks.

The casual approach to string to integer conversion is what a lot of
people learned in their first programming course at school: atoi(3).
In case you don't know it, the problem with atoi(3) is that the caller
cannot tell a parsing error apart from a legitimate "0".

The right way of parsing an integer is by calling strtol(3) and
friends.  It is a family of functions, in that there is one dedicated
converter for each primitive type.  They all allow proper error checking,
although it is a little tricky to get it right.

This is the (strict) pattern I would recommend (example with strtoul):

   char *endptr = NULL;
   unsigned long value;

   errno = 0;
   value = strtoul(value_string, &endptr, base);
   if (errno)
       return handle_error(errno);

   /* to be more strict: reject empty input and trailing garbage */
   if (endptr == value_string || *endptr)
       return handle_error(EINVAL);

   /* All good, we may use value */

In short, errno should be set to zero before calling the string
conversion function, and checked afterwards.  Optionally, the endptr
parameter can be used to verify that the whole string was properly
parsed.

I find it incredible, but I keep seeing people get it wrong.  The
most typical error I see is neglecting the (admittedly awkward) errno
dance.  Yet it must be common enough for BSD to come up with strtonum(3)
(available on other operating systems by linking against libbsd).

                                    ~~~

Lately I've been working with an arguably bad codebase, which is affected
among other things by a sloppy approach to compiler warnings.  Some of these
are related to integer conversion.  Adding a dependency on libbsd is not
an option.  I've decided to improve things by writing a simple wrapper
that implements the pattern above in a sensible way.

Here are my type-specific headers:

   int to_slong(signed long *dst, const char *src,
                int base, signed long min, signed long max);
   int to_ulong(unsigned long *dst, const char *src,
                int base, unsigned long min, unsigned long max);

   int to_sint(signed int *dst, const char *src,
               int base, signed int min, signed int max);
   int to_uint(unsigned int *dst, const char *src,
               int base, unsigned int min, unsigned int max);

   int to_sshort(signed short *dst, const char *src,
               int base, signed short min, signed short max);
   int to_ushort(unsigned short *dst, const char *src,
               int base, unsigned short min, unsigned short max);

   int to_schar(signed char *dst, const char *src,
               int base, signed char min, signed char max);
   int to_uchar(unsigned char *dst, const char *src,
               int base, unsigned char min, unsigned char max);
   int to_char(char *dst, const char *src,
               int base, char min, char max);

And here is my generic to_number, using the _Generic of C11[1]:

   #define to_number(dst, src, base, min, max) _Generic((dst), \
       signed long *     :      to_slong,                      \
       unsigned long *   :      to_ulong,                      \
       signed int *      :      to_sint,                       \
       unsigned int *    :      to_uint,                       \
       signed short *    :      to_sshort,                     \
       unsigned short *  :      to_ushort,                     \
       signed char *     :      to_schar,                      \
       unsigned char *   :      to_uchar,                      \
       char *            :      to_char)                       \
   (dst, src, base, min, max)

The implementation trivially uses the pattern above.


                                    ~~~

The interesting part of all this is what I learned about the C language:

1. There are a few strto* variants that I wasn't aware of:
  strtoimax/strtoumax (declared in <inttypes.h>)

2. The `char` type is weird in that `char` is always equivalent to either
  `unsigned char` or `signed char`, yet it is a distinct type[2].

  It is possible to distinguish the two possible cases by checking the value
  of CHAR_MIN[3], and then proceed with a casting.

  Basically this is how I implemented to_char:

      int to_char(char *dst, const char *src, int base, char min, char max)
      {
      #if CHAR_MIN == 0
          return to_uchar((unsigned char *)dst, src, base,
               (unsigned char)min, (unsigned char)max);
      #else
          return to_schar((signed char *)dst, src, base,
               (signed char)min, (signed char)max);
      #endif
      }

3. This is more of a pitfall: the above preprocessor conditional will work
  even if CHAR_MIN is not defined.  Be sure that <limits.h> is included.

4. Bonus gotcha: strtoul seems to happily accept negative integers.  In other
  words, `strtoul("-123", NULL, 10)` will return `(unsigned long)-123`.
  This works at least on glibc.  I should check if this holds everywhere.

  Astonished at first, I came to find it reasonable after reading
  paragraph 6.3.1.3 of the C standard:

      When a value with integer type is converted to another integer type
      other than _Bool, if the value can be represented by the new type, it
      is unchanged.

      Otherwise, if the new type is unsigned, the value is converted by
      repeatedly adding or subtracting one more than the maximum value that
      can be represented in the new type until the value is in the range of
      the new type.

      Otherwise, the new type is signed and the value cannot be represented
      in it; either the result is implementation-defined or an
      implementation-defined signal is raised.

  Yet, this might lead to bad surprises!

                                    ~~~

A big shout out to the good folks in #c, on freenode!


References:

[1] https://en.cppreference.com/w/c/language/generic
[2] https://www.iso-9899.info/n1570.html#6.2.5p15
[3] http://www.iso-9899.info/n1570.html#FOOTNOTE.45
]]></description>
       </item>
       <item>
               <title>Readings of the day</title>
               <link>gopher://example.conf:70/0/esd-diaries/2022-04-06.txt</link>
<description><![CDATA[I'm looking at a LINKER SCRIPT, and I figured I should rehearse what I
learned about them so many years ago.  I've dusted off the GNU ld manual,
and I found it quite straightforward.

Admittedly, I don't expect to look at linker scripts often, but I've always
found the idea of messing with memory layouts fascinating.

The linker script I'm looking at refers to a few libraries that are added via
the GROUP command (see the GNU ld manual).

 GROUP(-lgcc -lc -lnosys)

WHAT IS LIBNOSYS?  It is a component of newlib[1].  It provides empty stubs
for a bunch of calls that the developer might want to "gloss over", basically
allowing the program to be built without the linker complaining about missing
symbols.  See the introduction to libgloss[2], then search for "libnosys" in
the newlib FAQ[3].


[1] https://sourceware.org/newlib/
[2] https://sourceware.org/newlib/libgloss.html#Libgloss
[3] https://sourceware.org/newlib/faq.html
]]></description>
       </item>
</channel>
</rss>