Title: Host your own wikipedia backup
Author: Solène
Date: 13 November 2019
Tags: openbsd wikipedia life
Description:
## Wikipedia and openzim

If you ever wanted to host your own Wikipedia replica, here is the
simplest way.

Wikipedia is REALLY huge, so you don't really want to host the PHP
MediaWiki software and load the huge database. Instead, the project
made the *openzim* format, which compresses the huge database that
Wikipedia became while still allowing fast searches.

Sadly, on OpenBSD, we have no software reading zim files, and most
software requires the openzim library, which needs extra work to get
packaged on OpenBSD.

Fortunately, there is a Python package implementing everything you need
in pure Python to serve zim files over http, and it's easy to install.

This tutorial should work on all other unix-like systems, but package
or binary names may differ.

## Downloading wikipedia

The Kiwix project is responsible for the Wikipedia files. They
regularly create files from various projects (including stackexchange,
gutenberg, wikibooks, etc.), but for this tutorial we want Wikipedia:

[https://wiki.kiwix.org/wiki/Content_in_all_languages](https://wiki.kiwix.org/wiki/Content_in_all_languages)

You will find a lot of files; the language is contained in the
filename. Some filenames are also self-explanatory about whether they
contain everything or only some categories, and whether they have
pictures or not.

The full French file is 31.4 GB.
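
As a rough illustration of how much a filename tells you, here is a small Python sketch that splits a name such as `wikipedia_fr_all_maxi_2019-08.zim` into its parts. It assumes a `project_language_selection_flavour_date.zim` layout, which is what the Kiwix filenames appear to follow, with the flavour part (for example `maxi` or `nopic`) being optional. The `describe_zim` function and its field names are made up for this example.

```python
import re

# Assumed layout: project_language_selection[_flavour]_YYYY-MM.zim
# (inferred from names like wikipedia_fr_all_maxi_2019-08.zim)
ZIM_NAME = re.compile(
    r"^(?P<project>[^_]+)_(?P<language>[^_]+)_(?P<selection>.+?)"
    r"(?:_(?P<flavour>maxi|mini|nopic))?_(?P<date>\d{4}-\d{2})\.zim$"
)


def describe_zim(filename):
    """Return the parts of a Kiwix-style filename, or None if it does not match."""
    match = ZIM_NAME.match(filename)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(describe_zim("wikipedia_fr_all_maxi_2019-08.zim"))
```

Names that do not follow the assumed layout simply return `None`, so the sketch can be run over a whole directory listing without failing.
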
## Running the server

For the next steps, I recommend setting up a new user dedicated to
this.

On OpenBSD, we need python3 and pip:

    $ doas pkg_add py3-pip--

Then we can use pip to fetch and install the zimply software and its
dependencies. The `--user` flag is rather important, as it allows any
user to download and install Python libraries in their home folder
instead of polluting the whole system as root.

    $ pip3.7 install --user --upgrade zimply
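
If you are curious where `--user` puts things, the standard `site` module can tell you. This is only a small sketch using Python's standard library, nothing specific to zimply, and the paths printed depend on your system and user.

```python
import site

# Base directory for per-user installs (often ~/.local on unix-like systems)
print("user base:", site.getuserbase())

# Directory where `pip install --user` places Python libraries
print("user site-packages:", site.getusersitepackages())
```
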

I wrote a small script to start the server using the zim file as a
parameter. I rarely write Python, so the script may not be up to high
standards.

File **server.py**:

    from zimply import ZIMServer
    import sys
    import os.path

    # A zim file path is required as the only argument
    if len(sys.argv) == 1:
        print("usage: " + sys.argv[0] + " file")
        sys.exit(1)

    # Start the server only if the file exists
    if os.path.exists(sys.argv[1]):
        ZIMServer(sys.argv[1])
    else:
        print("Can't find file " + sys.argv[1])
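
Since these downloads are large, it can be worth checking that a file really looks like a zim archive before handing it to the server. The sketch below is not part of zimply. It assumes, based on my understanding of the openzim format description, that a zim file starts with the 4-byte little-endian magic number 72173914 (0x044D495A). The `looks_like_zim` helper is a name made up for this example.

```python
import struct

# Magic number expected at the start of a zim file header: a 4-byte
# little-endian integer (an assumption, not something taken from zimply)
ZIM_MAGIC = 72173914  # 0x044D495A


def looks_like_zim(path):
    """Return True if the first four bytes of the file match the zim magic number."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4:
        # Too short to contain the magic number at all
        return False
    (magic,) = struct.unpack("<I", header)
    return magic == ZIM_MAGIC
```

In `server.py`, such a helper could be called on `sys.argv[1]` right after the existence check, so that a truncated or mislabelled download is reported before the server starts.
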

Then you can start the server with the command:

    $ python3.7 server.py /path/to/wikipedia_fr_all_maxi_2019-08.zim

You will then be able to access Wikipedia at the URL http://localhost:9454/
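
To confirm from a script that the server is answering, a quick check with the standard library is enough. This sketch assumes the server listens on the `http://localhost:9454/` address given above. The `server_is_up` helper name is made up for this example.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:9454/", timeout=5):
    """Return True if an HTTP GET on url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


if __name__ == "__main__":
    print("server is up" if server_is_up() else "server is not reachable")
```
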

Note that this is not a "wiki", as you can't see the history or
edit/create pages.

This kind of backup is used in places like Cuba or parts of Africa,
where people don't have unlimited internet access. The project led by
Kiwix allows more people to access knowledge.