Title: Download files listed in a http index with wget
Author: Solène
Date: 16 June 2020
Tags: wget internet
Description:
Sometimes I need to download files through http from a list on an "autoindex"
page, and it's always painful to find the correct command for this. The easy
solution is **wget**, but you need to use the correct parameters, because wget
has a lot of mirroring options and you only want specific ones to achieve this
goal.

I ended up with the following command:
    wget --continue --accept "*.tgz" --no-directories --no-parent \
        --recursive http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/
This will download every tgz file available at the address given as the last
parameter. The parameters will filter to only download the **tgz** files, put
the files in the current working directory and, most importantly, not try to
escape to the parent directory and start downloading again. The `--continue`
parameter allows interrupting wget and starting again: already downloaded files
will be skipped and partially downloaded files will be completed.
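The `--accept "*.tgz"` filter behaves like a glob match against each file name
found in the index. A minimal sketch using plain shell globbing (not wget
itself, and the file names are made-up examples of an autoindex listing):

```shell
# Simulate wget's --accept "*.tgz" filter with shell glob matching.
# The file names are hypothetical examples of an autoindex listing.
for f in base67.tgz comp67.tgz index.txt SHA256.sig; do
    case "$f" in
        *.tgz) echo "accept $f" ;;  # would be downloaded
        *)     echo "reject $f" ;;  # skipped by the filter
    esac
done
```

Note that `--accept` also takes a comma-separated list, so something like
`--accept "*.tgz,*.sig"` would keep both kinds of files.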
**Do not reuse this command if files changed on the remote server**, because
the continue feature only works if your local file and the remote file are the
same: wget simply looks at the local and remote names and asks the remote
server to start downloading at the current byte offset of your local file. If
the remote file changed in the meantime, you will end up with a mix of the old
and the new file.
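This mixing can be demonstrated without wget at all; a minimal sketch with
local files standing in for the remote server (all file names here are
made up):

```shell
# Simulate --continue resuming after the remote file changed.
cd "$(mktemp -d)"
printf 'AAAAAAAAAA' > remote_v1   # remote file at first download (10 bytes)
head -c 5 remote_v1 > local       # download interrupted after 5 bytes
printf 'BBBBBBBBBB' > remote_v2   # remote file replaced meanwhile
# --continue would request bytes 6..10 of the *new* remote file:
tail -c +6 remote_v2 >> local
cat local                         # AAAAABBBBB: half old file, half new file
```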
Obviously the ftp protocol would be better suited for this download job, but
ftp is less and less available, so I find **wget** to be a nice workaround for
this.