| Title: OpenBSD: pkg_add performance analysis | |
| Author: Solène | |
| Date: 08 July 2021 | |
| Tags: bandwidth openbsd unix | |
| Description: | |
| # Introduction | |
| The OpenBSD package manager pkg_add is known to be quite slow and to | |
| use a lot of bandwidth. I'm trying to find easy ways to improve it, and | |
| I may have nailed something today by replacing the ftp(1) HTTP client | |
| with curl. | |
| # Testing protocol | |
| On an OpenBSD -current amd64 system, I ran the command "pkg_add -u | |
| -v | head -n 70", which checks for updates of the first 70 packages | |
| and then stops. The packages tested are always the same, so the test | |
| is reproducible. | |
| The traditional "ftp" will be tested, but also "curl" and "curl -N". | |
| Bandwidth usage was measured with "pfctl -s labels", using a labeled | |
| match rule matching the mirror IP. The counters were reset after each | |
| test. | |
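The accounting step could be scripted roughly as follows. This is only a sketch: the label name `pkg_mirror` and the helper `label_bytes` are made up for illustration, and it assumes the column order documented in pfctl(8) for "-s labels" (label, evaluations, packets total, bytes total, ...), which is worth checking on your own system.

```shell
# Print the "bytes total" counter (assumed to be the 4th column of
# "pfctl -s labels") for a single label read from standard input.
# Assumes the label itself contains no whitespace.
label_bytes() {
    awk -v label="$1" '$1 == label { print $4 }'
}

# Intended use on the OpenBSD machine, as root (commented out here):
#   pfctl -s labels | label_bytes pkg_mirror
#   pfctl -z    # documented as clearing per-rule statistics
```

The post does not say how the counters were reset between tests; `pfctl -z` is one documented way to clear per-rule statistics, not necessarily the one used here.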
| # What happens when pkg_add runs | |
| Here is a quick intro to what happens in the code when you run pkg_add | |
| -u against an http:// mirror: | |
| * pkg_add downloads the package list from the mirror (which could be | |
| considered to be an index.html file), which weighs ~2.5 MB. If you add | |
| two packages separately, the index is downloaded twice. | |
| * pkg_add runs /usr/bin/ftp on the first package to upgrade in order to | |
| read its first bytes. That output is piped to gunzip (done in Perl, | |
| inside pkg_add) and to signify to check the package signature. There | |
| are 2 signatures here, don't be misled: the package signature is the | |
| list of dependencies and their versions, which pkg_add uses to know | |
| whether the package requires an update, while the signify signature of | |
| the whole package is stored in the gzip header, for the case where the | |
| whole package is downloaded. | |
| * If everything is fine, the package is downloaded and the old one is | |
| replaced. | |
| * If there is no need to update, the package is skipped. | |
| * Each new package means a new ftp(1) connection and new pipes to set | |
| up. | |
| Using the FETCH_CMD variable, it's possible to tell pkg_add to use a | |
| command other than /usr/bin/ftp, as long as it understands the "-o -" | |
| parameter and also "-S session" for https:// connections. Because curl | |
| doesn't support the "-S session=..." parameter, I used a shell wrapper | |
| that discards this parameter. | |
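The wrapper itself is not shown in the post. A minimal sketch of what such a wrapper could look like is given below; it is not the author's script. The function name `fetch_with_curl` and the `CURL_BIN` override are hypothetical, and the curl flags are simply the ones quoted in the conclusion.

```shell
#!/bin/sh
# Sketch of a FETCH_CMD wrapper: drop ftp(1)'s "-S session=..." option,
# then hand the remaining arguments to curl unchanged.
# CURL_BIN is a made-up override for testing, not a pkg_add or curl variable.

fetch_with_curl() {
    skip_next=0
    for arg do
        shift                       # remove the argument being examined
        if [ "$skip_next" -eq 1 ]; then
            skip_next=0             # this is the value that followed "-S"
            continue
        fi
        case "$arg" in
            -S)   skip_next=1 ;;    # "-S session=..." passed as two words
            -S?*) ;;                # "-Ssession=..." passed as one word
            *)    set -- "$@" "$arg" ;;   # keep everything else, in order
        esac
    done
    "${CURL_BIN:-/usr/local/bin/curl}" -L -s -q -N "$@"
}

# Only run when called with arguments, as pkg_add would call it.
if [ "$#" -gt 0 ]; then
    fetch_with_curl "$@"
fi
```

Saved as an executable file (the location is up to you), the script's path would then be given as the value of FETCH_CMD when running pkg_add.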
| # Raw results | |
| I measured the whole execution time and the total bytes downloaded for | |
| each combination. I don't show the full results here, but I ran the | |
| tests multiple times and the standard deviation was close to 0, meaning | |
| that repeated runs gave the same result each time. | |
| ``` | |
| operation          time to run   data transferred | |
| ---------          -----------   ---------------- | |
| ftp http://        39.01         26               | |
| curl -N http://    28.74         12               | |
| curl http://       31.76         14               | |
| ftp https://       76.55         26               | |
| curl -N https://   55.62         15               | |
| curl https://      54.51         15               | |
| ``` | |
| Charts with results | |
| # Analysis | |
| There are a few surprising facts from the results. | |
| * ftp(1) does not take the same time over http and https, although it | |
| is supposed to reuse the same TLS session to avoid a handshake for | |
| every package. | |
| * ftp(1) uses drastically more bandwidth than curl, and time seems | |
| proportional to the bandwidth difference. | |
| * curl -N and curl perform almost exactly the same over https. | |
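To compare the runs more directly, each curl result can be expressed as a percentage of the ftp(1) result for the same protocol. The sketch below does that with awk on the rows of the results table. The helper name `relative_to_ftp` is made up, and since the table gives no units, only unit-free ratios are computed.

```shell
# Rows copied from the results table: operation, time to run, data transferred.
results='ftp http:// 39.01 26
curl -N http:// 28.74 12
curl http:// 31.76 14
ftp https:// 76.55 26
curl -N https:// 55.62 15
curl https:// 54.51 15'

# For each curl row, print time and data as a percentage of the ftp row
# that uses the same URL scheme (ftp rows must come first, as in the table).
relative_to_ftp() {
    awk '
        { scheme = $(NF-2); t = $(NF-1); d = $NF }
        $1 == "ftp" { base_t[scheme] = t; base_d[scheme] = d; next }
        {
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            printf "%-17s time %.0f%% data %.0f%%\n", name,
                   100 * t / base_t[scheme], 100 * d / base_d[scheme]
        }
    '
}

printf '%s\n' "$results" | relative_to_ftp
```

Putting both percentages side by side makes it easier to judge how closely the time saved tracks the bandwidth saved.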
| # Conclusion | |
| Using http:// is way faster than https://. The risk is about privacy, | |
| because with a man in the middle the downloaded packages will be | |
| known, but the signify signature will prevent any maliciously modified | |
| package from being installed. Using 'FETCH_CMD="/usr/local/bin/curl -L | |
| -s -q -N"' gave the best results. | |
| However, I can't yet explain the very different behaviors between ftp | |
| and curl, or between http and https. | |
| # Extra: set a download speed limit for pkg_add operations | |
| By using curl as FETCH_CMD you can use the "--limit-rate 900k" | |
| parameter to limit the transfer speed to the given rate. |
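As a sketch, the rate limit can be appended to the curl command line quoted in the conclusion. The `RATE` variable is a hypothetical helper added here for convenience. This assumes an http:// mirror, because for https:// the post relies on a wrapper to discard "-S session=...".

```shell
# RATE is a made-up helper variable; 900k is the value used in the text.
RATE=${RATE:-900k}

# Same curl flags as in the conclusion, plus the transfer speed cap.
FETCH_CMD="/usr/local/bin/curl -L -s -q -N --limit-rate $RATE"
export FETCH_CMD

echo "$FETCH_CMD"

# Then, on the OpenBSD machine, as root and in the same shell:
#   pkg_add -u
```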