url fetch improvements - annna - Annna the nice friendly bot.
git clone git://bitreich.org/annna/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws6…
--- | |
commit 38e087783a9d8a080204b5c7d4d37cf088531a67
parent 0d34d3fa3e153cfe4996b76b30db1f2ce401b189
Author: Annna Robert-Houdin <[email protected]>
Date: Sun, 9 Dec 2018 12:54:37 +0100
url fetch improvements

Move URL fetching into its own script so it can be changed in one
place.
Title and content fetching used to fetch the page in two different
ways. Both now use the new fetch-url script, so the page is fetched in
a single request (via Tor).
Remove the control-character trimming from the grabtitle call; it is
now done in the binary.
Diffstat:
M annna-start-services | 10 +++++-----
A curl-grabtitle       |  9 +++++++++
A fetch-url            | 14 ++++++++++++++
3 files changed, 28 insertions(+), 5 deletions(-)
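
For reference, the cleanup that this commit removes from the shell side can be sketched as a small filter. This is only an illustration of the removed pipeline (`tr`, `sed`, `cut`) as it appears in the diff; the function name `clean_title` is hypothetical and not part of annna.

```sh
#!/bin/sh
# clean_title is a hypothetical name; the pipeline itself is the one
# removed from annna-start-services in this commit.
clean_title() {
	# Turn control characters (tabs, newlines, ...) into spaces,
	# strip leading spaces, and keep at most 200 characters.
	tr '[:cntrl:]' ' ' | sed 's@^ *@@' | cut -c -200
}

printf '\t  Hello\tworld\n' | clean_title
```

Note that the trailing newline is a control character too, so it ends up as a trailing space in the cleaned title.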
--- | |
diff --git a/annna-start-services b/annna-start-services
@@ -89,10 +89,9 @@ then
*)
if [ -n "$uri" ];
then
- urititle="$(curl-grabtitle "${uri}" \
- | tr '[:cntrl:]' ' ' \
- | sed 's@^ *@@' \
- | cut -c -200)"
+ tmpf=$(mktemp)
+ fetch-url "${uri}" > "${tmpf}"
+ urititle="$(grabtitle < "${tmpf}" | sed 's@^ *…
if [ -n "$urititle" ];
then
case "${urititle}" in
@@ -107,12 +106,13 @@ then
then
annna-say -c "#bitreic…
else
- purl="$(curl -sL "${ur…
+ purl="$(9 htmlfmt < "$…
annna-say -c "#bitreic…
fi
;;
esac
fi
+ rm -f "${tmpf}"
continue
fi
;;
diff --git a/curl-grabtitle b/curl-grabtitle
@@ -0,0 +1,9 @@
+#!/bin/sh
+export PATH="$HOME/bin:$PATH"
+
+if test x"$1" = x""; then
+ echo "usage: $0 <url>" >&2
+ exit 1
+fi
+
+fetch-url "$1" | grabtitle
diff --git a/fetch-url b/fetch-url
@@ -0,0 +1,14 @@
+#!/bin/sh
+
+if test x"$1" = x""; then
+ echo "usage: $0 <url>" >&2
+ exit 1
+fi
+
+curl \
+ --preproxy socks5://127.0.0.1:9100 \
+ -s \
+ -L --max-redirs 3 \
+ -m 5 \
+ -H 'User-Agent:' \
+ "$1" 2>/dev/null
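
The fetch-once pattern introduced in annna-start-services (download to a temporary file, read it for the title, read it again for the content, then remove it) can be sketched in isolation as below. This is a minimal sketch, not the bot's code: `fetch_url` and `grab_title` are hypothetical stand-ins for the real `fetch-url` script and `grabtitle` binary so that the example runs without network access, and `page_summary` is an invented wrapper name.

```sh
#!/bin/sh
# Hypothetical stand-in for fetch-url: the real script runs curl
# through a SOCKS proxy; here a fixed page is printed instead.
fetch_url() {
	printf '<html><head><title>  Example page</title></head><body>hi</body></html>\n'
}

# Hypothetical stand-in for grabtitle: print the text between
# <title> and </title> read from stdin.
grab_title() {
	sed -n 's@.*<title>\(.*\)</title>.*@\1@p'
}

# Fetch the page once, then reuse the temporary file for both the
# title and a second pass over the content.
page_summary() {
	uri="$1"
	tmpf="$(mktemp)" || return 1
	fetch_url "$uri" > "$tmpf"
	title="$(grab_title < "$tmpf" | sed 's@^ *@@' | cut -c -200)"
	lines="$(wc -l < "$tmpf" | tr -d ' ')"
	rm -f "$tmpf"
	printf '%s (%s lines)\n' "$title" "$lines"
}

page_summary "http://example.invalid/"
```

Reading the saved file twice avoids a second request to the remote server, which is the point of routing both consumers through one fetch.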