Title: Download files listed in a http index with wget
Author: Solène
Date: 16 June 2020
Tags: wget internet
Description:
Sometimes I need to download files through HTTP from a list on an
"autoindex" page, and it's always painful to find the correct command
for this.

The easy solution is **wget**, but you need to use the right parameters,
because wget has a lot of mirroring options and you only want specific
ones to achieve this goal.

I ended up with the following command:

    wget --continue --accept "*.tgz" --no-directories --no-parent --recursive http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/

This will download every tgz file available at the address given as the
last parameter.

The parameters given will filter to only download the **tgz** files, put
the files in the current working directory and, most importantly, not
try to escape to the parent directory to start downloading again. The
`--continue` parameter allows you to interrupt wget and start again:
already downloaded files will be skipped and partially downloaded files
will be completed.
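
Since I never remember these options, the command can be wrapped in a
small shell function. This is only a sketch: `fetch_index` is a
hypothetical helper name of my own (it is not part of wget), and the
default `*.tgz` pattern is simply the one from the command above.

```shell
#!/bin/sh
# fetch_index: hypothetical helper wrapping the wget command above.
# Usage: fetch_index URL [PATTERN]
#   URL      address of the autoindex page
#   PATTERN  glob of the files to keep (assumed default: *.tgz)
fetch_index() {
    url=$1
    pattern=${2:-*.tgz}
    # --continue        resume partially downloaded files
    # --accept          only keep files matching the pattern
    # --no-directories  put everything in the current directory
    # --no-parent       never climb above the starting directory
    # --recursive       follow the links found in the index page
    wget --continue \
         --accept "$pattern" \
         --no-directories \
         --no-parent \
         --recursive \
         "$url"
}
```

Once the function is loaded in the shell, running
`fetch_index http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/` should be
equivalent to the original command.
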
**Do not reuse this command if files changed on the remote server**,
because the continue feature only works if your local file and the
remote file are the same. It simply looks at the local and remote names
and asks the remote server to start sending data from the current size
of your local file. If the remote file changed in the meantime, you
will end up with a mix of the old and new file.
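
The mix of old and new content can be reproduced locally, without any
network access. The script below is only a simulation under my own
assumptions: the file names `remote_old`, `remote_new` and `local` are
made up, and `tail -c` stands in for the server answering a request that
starts at a given byte offset.

```shell
#!/bin/sh
# Simulate a resumed download after the remote file was replaced.
# remote_old / remote_new are stand-ins for the same file on the server
# before and after it changed.
workdir=$(mktemp -d)
printf 'AAAAAAAAAA' > "$workdir/remote_old"   # 10 bytes, first version
printf 'BBBBBBBBBB' > "$workdir/remote_new"   # 10 bytes, new version

# The first download was interrupted after 4 bytes of the old version.
printf 'AAAA' > "$workdir/local"

# Resuming means: ask for everything after the bytes we already have
# and append it to the local file. tail -c +N starts at byte N (1-based).
offset=$(wc -c < "$workdir/local" | tr -d ' ')
tail -c +"$((offset + 1))" "$workdir/remote_new" >> "$workdir/local"

result=$(cat "$workdir/local")
echo "$result"   # prints AAAABBBBBB: old beginning, new ending
rm -rf "$workdir"
```

The resulting file has the right size but matches neither version, which
is why the local copies should be removed before running the command
again if the remote files changed.
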

Obviously the FTP protocol would be better suited for this download job,
but FTP is less and less available, so I find **wget** to be a nice
workaround for this.