Title: OpenBSD: pkg_add performance analysis
Author: Solène
Date: 08 July 2021
Tags: bandwidth openbsd unix
Description:
# Introduction
The OpenBSD package manager pkg_add is known to be quite slow and to
use a lot of bandwidth. I'm trying to find easy ways to improve it,
and I may have nailed something today by replacing the ftp(1) HTTP
client with curl.
# Testing protocol
On an OpenBSD -current amd64 system, I ran the command
"pkg_add -u -v | head -n 70". It checks for updates of the first 70
packages and then stops. The tested packages are always the same, so
the test is reproducible.
I tested the traditional ftp(1) client, and also "curl" and "curl -N"
(-N disables curl's output buffering).
Bandwidth usage was accounted with "pfctl -s labels", using a labeled
match rule on the mirror's IP address. The counters were reset after
each test.
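Here is a minimal Python sketch of how these measurements could be automated. It is not necessarily how I ran the tests, and it rests on several assumptions:
* The accounting rule looks like the commented pf.conf line, where 192.0.2.10 is a placeholder for the mirror IP.
* The counters are reset with "pfctl -z".
* The fourth column of "pfctl -s labels" output is the byte total.

The helpers label_bytes and run_trial are hypothetical names.

```python
from __future__ import annotations

import os
import subprocess
import time

# Assumed accounting rule in pf.conf (192.0.2.10 stands in for the mirror IP):
#   match out to 192.0.2.10 label "mirror"


def label_bytes(pfctl_output: str, label: str) -> int:
    """Sum the byte counters of every `pfctl -s labels` line for `label`.

    Assumes the column order: label, evaluations, packets, bytes, ...
    and a label name without spaces.
    """
    total = 0
    for line in pfctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == label:
            total += int(fields[3])
    return total


def run_trial(fetch_cmd: str, label: str = "mirror") -> tuple[float, int]:
    """Time one `pkg_add -u -v | head -n 70` run; return (seconds, bytes)."""
    subprocess.run(["pfctl", "-z"], check=True)  # clear per-rule counters
    env = dict(os.environ, FETCH_CMD=fetch_cmd)
    start = time.monotonic()
    # head exits after 70 lines; pkg_add then gets SIGPIPE on its next write.
    subprocess.run("pkg_add -u -v | head -n 70", shell=True, env=env)
    elapsed = time.monotonic() - start
    labels = subprocess.run(
        ["pfctl", "-s", "labels"], capture_output=True, text=True, check=True
    ).stdout
    return elapsed, label_bytes(labels, label)
```

For example, run_trial("/usr/local/bin/curl -L -s -q -N") would time one run with curl as the fetcher. It needs root, both for pfctl and for pkg_add.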
# What happens when pkg_add runs
Here is a quick overview of what happens in the code when you run
"pkg_add -u" against an http:// mirror:
* pkg_add downloads the package list from the mirror (essentially an
index.html file), which weighs about 2.5 MB. If you add two packages
in two separate pkg_add runs, the index is downloaded twice.
* pkg_add runs /usr/bin/ftp on the first package to upgrade to read its
first bytes. These are piped to gunzip (done in Perl from pkg_add) and
then to signify to check the package signature. Don't be misled: there
are two different signatures. The first is the package signature, the
list of dependencies and their versions, which pkg_add uses to know
whether the package needs an update. The second is the signify
signature of the whole package, which is stored in the gzip header and
used when the whole package is downloaded.
* If everything is fine, the package is downloaded and the old one is
replaced.
* If there is no need to update, the package is skipped.
* Each new package means a new ftp(1) connection and new pipes to set
up.
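Reading only the start of a package is enough because the gzip header sits at the very beginning of the file. As an illustration, here is a minimal Python sketch that extracts the comment field from a gzip header, following the layout in RFC 1952. It assumes that signify stores its signature in that comment field, which is my understanding of its -z mode. gzip_header_comment is a hypothetical helper and is not taken from pkg_add, which is written in Perl.

```python
from __future__ import annotations

import struct

# Gzip header flag bits (RFC 1952).
FEXTRA, FNAME, FCOMMENT = 0x04, 0x08, 0x10


def gzip_header_comment(head: bytes) -> bytes | None:
    """Return the FCOMMENT field of a gzip header, or None if absent.

    `head` only needs to hold the first bytes of the file.
    """
    if len(head) < 10 or head[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip stream")
    flags = head[3]
    pos = 10  # fixed part: magic (2), CM, FLG, MTIME (4), XFL, OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", head, pos)
        pos += 2 + xlen  # skip the length field and the extra data
    if flags & FNAME:
        pos = head.index(b"\x00", pos) + 1  # skip the zero-terminated name
    if not flags & FCOMMENT:
        return None
    end = head.index(b"\x00", pos)  # raises ValueError if truncated
    return head[pos:end]
```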
With the FETCH_CMD variable, you can tell pkg_add to use a command
other than /usr/bin/ftp. That command must understand the "-o -"
parameter and, for https:// connections, "-S session". Because curl
doesn't support the "-S session=..." parameter, I used a shell wrapper
that discards it.
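My wrapper was a shell script. Here is an equivalent sketch in Python, for illustration only. It drops "-S" and its value, whether they arrive as two arguments or joined as "-Ssession=...", and then replaces itself with curl using the options from the conclusion. Unlike in the original FETCH_CMD, -q is placed first, because curl only honors it (skipping .curlrc) when it is the very first option. The helper name curl_argv is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical FETCH_CMD wrapper: drop ftp(1)-only "-S" options, run curl."""
from __future__ import annotations

import os
import sys

CURL = "/usr/local/bin/curl"
# -q must come first: curl only skips reading .curlrc when it is the first option.
CURL_OPTS = ["-q", "-L", "-s", "-N"]


def curl_argv(args: list[str]) -> list[str]:
    """Build curl's argv from the arguments pkg_add passes to FETCH_CMD."""
    out = []
    it = iter(args)
    for arg in it:
        if arg == "-S":
            next(it, None)  # also skip the option's value, e.g. "session=/path"
        elif arg.startswith("-S"):
            continue  # joined form such as "-Ssession=/path"
        else:
            out.append(arg)  # keep "-o", "-" and the URL untouched
    return [CURL, *CURL_OPTS, *out]


# pkg_add always passes arguments; invoked without any, do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    argv = curl_argv(sys.argv[1:])
    # Replace this process so curl writes ("-o -") straight to pkg_add's pipe.
    os.execv(argv[0], argv)
```

You would install it as an executable, for example at the made-up path /usr/local/bin/curl-fetch, and select it with FETCH_CMD=/usr/local/bin/curl-fetch. pkg_add starts the fetcher once per package, so starting a Python interpreter for every package adds overhead that a plain shell wrapper avoids.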
# Raw results
I measured the total execution time and the total amount of data
downloaded for each combination. I don't show every run here, but I
repeated each test several times and the standard deviation was close
to 0. In other words, repeated runs gave the same result every time.
```
operation          time to run (s)   data transferred (MB)
---------          ---------------   ---------------------
ftp http://        39.01             26
curl -N http://    28.74             12
curl http://       31.76             14
ftp https://       76.55             26
curl -N https://   55.62             15
curl https://      54.51             15
```
Charts with results
# Analysis
A few facts from the results are surprising:
* ftp(1) takes about twice as long over https:// as over http://
(76.55 versus 39.01). This happens even though it is supposed to reuse
its TLS session to avoid a full handshake for every package.
* ftp(1) downloads drastically more data than curl (26 versus 12 to
15), and the difference in run time seems to follow the difference in
bandwidth.
* curl -N and curl perform almost exactly the same over https://.
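To look more closely at how time relates to bandwidth, here is a small Python sketch. For each curl variant, it computes how much longer ftp(1) takes and how much more data it transfers, using only the figures from the results table. ratios_vs_ftp is a hypothetical helper name.

```python
# Figures copied from the results table: (time to run, data transferred).
RESULTS = {
    ("ftp", "http"): (39.01, 26),
    ("curl -N", "http"): (28.74, 12),
    ("curl", "http"): (31.76, 14),
    ("ftp", "https"): (76.55, 26),
    ("curl -N", "https"): (55.62, 15),
    ("curl", "https"): (54.51, 15),
}


def ratios_vs_ftp(results):
    """For each curl variant, return (time ratio, data ratio) of ftp over it."""
    out = {}
    for (client, proto), (secs, data) in results.items():
        if client == "ftp":
            continue
        ftp_secs, ftp_data = results[("ftp", proto)]
        out[(client, proto)] = (ftp_secs / secs, ftp_data / data)
    return out


for (client, proto), (t, d) in sorted(ratios_vs_ftp(RESULTS).items()):
    print(f"{client:8} {proto:6} ftp: {t:.2f}x the time, {d:.2f}x the data")
```

With these numbers, ftp(1) takes 1.23x to 1.40x the time of curl while transferring 1.73x to 2.17x the data. The extra data therefore goes along with extra time, but not one-for-one.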
# Conclusion
Using http:// is much faster than https://. The risk is about privacy:
a man in the middle can see which packages are downloaded. However,
the signify signature prevents any maliciously modified package from
being installed. Using
'FETCH_CMD="/usr/local/bin/curl -L -s -q -N"'
gave the best results.
However, I can't explain yet the very different behaviors of ftp and
curl, or of http and https.
# Extra: set a download speed limit for pkg_add operations
When curl is used as FETCH_CMD, you can add the "--limit-rate 900k"
parameter to limit the transfer speed to the given rate.
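The rate accepts a unit suffix. The Python sketch below converts such a value to bytes per second. It then gives a lower bound on how long a download takes under the cap, using the ~2.5 MB package index as an example. It relies on curl's documented rule that the k, M and G suffixes are 1024-based. rate_bytes_per_sec and min_transfer_seconds are hypothetical helpers, and reading "2.5 MB" as MiB is an assumption.

```python
# Multipliers for curl's --limit-rate suffixes, which curl documents as 1024-based.
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3}


def rate_bytes_per_sec(rate: str) -> int:
    """Convert a --limit-rate value such as "900k" to bytes per second."""
    rate = rate.strip().lower()
    if rate and rate[-1] in SUFFIXES:
        return int(float(rate[:-1]) * SUFFIXES[rate[-1]])
    return int(rate)  # no suffix: plain bytes per second


def min_transfer_seconds(size_bytes: int, rate: str) -> float:
    """Lower bound on the time to fetch size_bytes under the given cap."""
    return size_bytes / rate_bytes_per_sec(rate)


# Example: the ~2.5 MB package index at "--limit-rate 900k".
index_size = int(2.5 * 1024**2)  # assumes "MB" means MiB here
print(f"{min_transfer_seconds(index_size, '900k'):.1f} s at least")  # prints: 2.8 s at least
```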