Title: Host your own wikipedia backup
Author: Solène
Date: 13 November 2019
Tags: openbsd wikipedia life
Description:

## Wikipedia and openzim

If you ever wanted to host your own Wikipedia replica, here is the simplest way.

Wikipedia is REALLY huge, so you don't really want to run the PHP MediaWiki software and load its enormous database. Instead, the project created the *openzim* format, which compresses the huge database that Wikipedia became while still allowing fast searches.

Sadly, OpenBSD has no software for reading zim files: most programs depend on the openzim library, and getting it packaged on OpenBSD requires extra work.

Fortunately, there is a Python package that implements everything you need in pure Python to serve zim files over HTTP, and it is easy to install.

This tutorial should work on all other Unix-like systems, but package or binary names may differ.

## Downloading wikipedia

The Kiwix project is responsible for the Wikipedia files. It regularly creates files from various projects (including stackexchange, gutenberg, wikibooks, etc.), but for this tutorial we want Wikipedia:

[https://wiki.kiwix.org/wiki/Content_in_all_languages](https://wiki.kiwix.org/wiki/Content_in_all_languages)

You will find a lot of files. The language is part of the filename. Some filenames also tell you whether the file contains everything or only some categories, and whether it includes pictures.

The full French file weighs 31.4 GB.

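To make the naming scheme concrete, here is a minimal Python sketch that splits a file name into its parts. The five-field layout (project, language, selection, variant, date) is an assumption inferred from the single file name used later in this article, `wikipedia_fr_all_maxi_2019-08.zim`; other files on the Kiwix page may not follow it, and `describe_zim` is a hypothetical helper, not part of any Kiwix tool.

```python
from pathlib import Path


def describe_zim(path):
    """Split a zim file name into its underscore-separated fields.

    Assumed layout: project_language_selection_variant_date.zim
    (inferred from one example file name, not an official specification).
    """
    parts = Path(path).stem.split("_")
    if len(parts) != 5:
        raise ValueError("unexpected file name layout: " + str(path))
    project, language, selection, variant, date = parts
    return {
        "project": project,
        "language": language,
        "selection": selection,  # e.g. "all" for the whole project
        "variant": variant,      # e.g. "maxi"; assumed to describe the content level
        "date": date,
    }


info = describe_zim("/path/to/wikipedia_fr_all_maxi_2019-08.zim")
print(info["language"], info["variant"], info["date"])
```

A file name that does not have exactly five fields raises a `ValueError` rather than being guessed at.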
## Running the server

For the next steps, I recommend setting up a new user dedicated to this.

On OpenBSD, we need python3 and pip:

    $ doas pkg_add py3-pip--

Then we can use pip to fetch and install the zimply software and its dependencies. The `--user` flag is rather important: it lets any user download and install Python libraries into their home folder instead of polluting the whole system as root.

    $ pip3.7 install --user --upgrade zimply

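If you are curious where `--user` puts the libraries, the standard `site` module can tell you. This is only a small sketch using the Python standard library; the exact path depends on your system and Python version.

```python
import site
import sys

# Directory inside the home folder where "pip install --user" places libraries.
user_site = site.getusersitepackages()
print("user site-packages:", user_site)

# Python only adds this directory to the import path when it exists
# and user site-packages are enabled.
print("on import path:", user_site in sys.path)
```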
I wrote a small script to start the server using the zim file as a parameter. I rarely write Python, so the script may not be up to high standards.

File **server.py**:

    from zimply import ZIMServer
    import sys
    import os.path

    # Require the path to the zim file as the only argument.
    if len(sys.argv) != 2:
        print("usage: " + sys.argv[0] + " file")
        sys.exit(1)

    # Start the HTTP server only if the file exists.
    if os.path.exists(sys.argv[1]):
        ZIMServer(sys.argv[1])
    else:
        print("Can't find file " + sys.argv[1])

And then you can start the server using the command:

    $ python3.7 server.py /path/to/wikipedia_fr_all_maxi_2019-08.zim

You will be able to access Wikipedia at the URL http://localhost:9454/

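To check from a script that the server answers, a quick sketch with the standard library could look like this. It assumes the server listens on the URL given above; `server_is_up` is a hypothetical helper name, and treating only an HTTP 200 reply as "up" is a choice made for this example.

```python
import urllib.request


def server_is_up(url="http://localhost:9454/", timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses,
        # which urllib reports as subclasses of OSError.
        return False


print("server reachable:", server_is_up())
```

Run it in another terminal while `server.py` is running. Loading a large zim file may take a while, so a first negative answer does not necessarily mean something is broken.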

Note that this is not a "wiki", as you can't see page history or edit/create pages.

This kind of backup is used in places like Cuba or parts of Africa where people don't have unlimited internet access. The project led by Kiwix allows more people to access knowledge.