Introduction
Introduction Statistics Contact Development Disclaimer Help
man page documentation for all tools: copied from sfeed and changed - tscrape -…
git clone git://git.codemadness.org/tscrape
Log
Files
Refs
README
LICENSE
---
commit f05f3eb6c90f7b1baf7369498609dc5d5d212b63
parent 797f715398aeac08febc8067ed6423da727e4f45
Author: Hiltjo Posthuma <[email protected]>
Date: Sat, 17 Aug 2019 12:10:00 +0200
man page documentation for all tools: copied from sfeed and changed
initial version, needs some more work.
Diffstat:
M Makefile | 15 ++++++++++++---
A tscrape.5 | 47 +++++++++++++++++++++++++++++…
A tscrape_html.1 | 34 +++++++++++++++++++++++++++++…
A tscrape_plain.1 | 46 +++++++++++++++++++++++++++++…
A tscrape_update.1 | 89 +++++++++++++++++++++++++++++…
A tscraperc.5 | 98 +++++++++++++++++++++++++++++…
6 files changed, 326 insertions(+), 3 deletions(-)
---
diff --git a/Makefile b/Makefile
@@ -36,7 +36,11 @@ COMPATOBJ =\
LIB = ${LIBUTIL} ${LIBXML} ${COMPATOBJ}
-MAN1 = tscrape.1
+MAN1 = ${BIN:=.1}\
+ ${SCRIPTS:=.1}
+MAN5 = \
+ tscrape.5\
+ tscraperc.5
DOC = \
LICENSE\
README
@@ -66,7 +70,7 @@ ${LIBXML}: ${LIBXMLOBJ}
dist:
rm -rf "${NAME}-${VERSION}"
mkdir -p "${NAME}-${VERSION}"
- cp -f ${MAN1} ${DOC} ${HDR} \
+ cp -f ${MAN1} ${MAN5} ${DOC} ${HDR} \
${SRC} ${LIBXMLSRC} ${LIBUTILSRC} ${COMPATSRC} ${SCRIPTS} \
Makefile config.mk \
tscraperc.example style.css \
@@ -90,10 +94,14 @@ install: all
style.css\
README\
"${DESTDIR}${DOCPREFIX}"
- # installing manual pages for tools.
+ # installing manual pages for general commands: section 1.
mkdir -p "${DESTDIR}${MANPREFIX}/man1"
cp -f ${MAN1} "${DESTDIR}${MANPREFIX}/man1"
for m in $(MAN1); do chmod 644 "${DESTDIR}${MANPREFIX}/man1/$$m"; done
+ # installing manual pages for file formats: section 5.
+ mkdir -p "${DESTDIR}${MANPREFIX}/man5"
+ cp -f ${MAN5} "${DESTDIR}${MANPREFIX}/man5"
+ for m in ${MAN5}; do chmod 644 "${DESTDIR}${MANPREFIX}/man5/$$m"; done
uninstall:
# removing executable files and scripts.
@@ -106,5 +114,6 @@ uninstall:
-rmdir "${DESTDIR}${DOCPREFIX}"
# removing manual pages.
for m in $(MAN1); do rm -f "${DESTDIR}${MANPREFIX}/man1/$$m"; done
+ for m in ${MAN5}; do rm -f "${DESTDIR}${MANPREFIX}/man5/$$m"; done
.PHONY: all clean dist install uninstall
diff --git a/tscrape.5 b/tscrape.5
@@ -0,0 +1,47 @@
+.Dd July 20, 2019
+.Dt TSCRAPE 5
+.Os
+.Sh NAME
+.Nm tscrape
+.Nd output format
+.Sh SYNOPSIS
+.Nm
+.Sh DESCRIPTION
+.Xr tscrape 1
+writes the feed data in a TAB-separated format to stdout.
+.Sh TAB-SEPARATED FORMAT FIELDS
+The items are output per line in a TSV-like format.
+.Pp
+The fields are not allowed to have newlines and TABs, all whitespace characters
+are replaced by a single space character.
+Control characters are removed.
+.Sh TAB-SEPARATED FORMAT FIELDS
+The items are saved in a TSV-like format. Control characters are replaced
+by a single space.
+.Pp
+The order and format of the fields are:
+.Bl -tag -width 17n
+.It UNIX timestamp
+UNIX timestamp in UTC+0.
+.It username
+Twitter username (can be a retweet).
+.It fullname
+Twitter fullname (can be a retweet).
+.It tweet text
+Tweet text.
+.It item id
+Item id.
+.It item username
+Item username.
+.It item fullname
+Item fullname.
+.It item retweetid
+Item Retweet ID.
+.It item is pinned
+Item is pinned or not? 0 or 1.
+.El
+.Sh SEE ALSO
+.Xr tscrape 1 ,
+.Xr tscrape_plain 1
+.Sh AUTHORS
+.An Hiltjo Posthuma Aq Mt [email protected]
diff --git a/tscrape_html.1 b/tscrape_html.1
@@ -0,0 +1,34 @@
+.Dd July 20, 2019
+.Dt TSCRAPE_HTML 1
+.Os
+.Sh NAME
+.Nm tscrape_html
+.Nd format TSV data to HTML
+.Sh SYNOPSIS
+.Nm
+.Op Ar file...
+.Sh DESCRIPTION
+.Nm
+formats tscrape data (TSV) from
+.Xr tscrape 1
+from stdin or
+.Ar file
+to stdout in HTML.
+If one or more
+.Ar file
+are specified, the basename of the
+.Ar file
+is used as the feed name in the output.
+If no
+.Ar file
+parameters are specified and so the data is read from stdin the feed name
+is empty.
+.Pp
+Items with a timestamp from the last day compared to the system time at the
+time of formatting are counted and marked as new.
+.Sh SEE ALSO
+.Xr tscrape 1 ,
+.Xr tscrape_plain 1 ,
+.Xr tscrape 5
+.Sh AUTHORS
+.An Hiltjo Posthuma Aq Mt [email protected]
diff --git a/tscrape_plain.1 b/tscrape_plain.1
@@ -0,0 +1,46 @@
+.Dd July 20, 2019
+.Dt TSCRAPE_PLAIN 1
+.Os
+.Sh NAME
+.Nm tscrape_plain
+.Nd format tscrape data to a plain-text list
+.Sh SYNOPSIS
+.Nm
+.Op Ar file...
+.Sh DESCRIPTION
+.Nm
+formats tscrape data (TSV) from
+.Xr tscrape 1
+from stdin or
+.Ar file
+to stdout as a plain-text list.
+If one or more
+.Ar file
+are specified, the basename of the
+.Ar file
+is used as the feed name in the output.
+If no
+.Ar file
+parameters are specified and so the data is read from stdin the feed name
+is empty.
+.Pp
+Items with a timestamp from the last day compared to the system time at the
+time of formatting are marked as new.
+.Pp
+.Nm
+aligns the output.
+It shows a maximum of 70 column-wide characters for the title and outputs
+an ellipsis symbol if the title is longer and truncated.
+Make sure the environment variable
+.Ev LC_CTYPE
+is set to a UTF-8 locale, so it can determine the proper column-width
+per rune, using
+.Xr mbtowc 3
+and
+.Xr wcwidth 3 .
+.Sh SEE ALSO
+.Xr tscrape 1 ,
+.Xr tscrape_html 1 ,
+.Xr tscrape 5
+.Sh AUTHORS
+.An Hiltjo Posthuma Aq Mt [email protected]
diff --git a/tscrape_update.1 b/tscrape_update.1
@@ -0,0 +1,89 @@
+.Dd August 17, 2019
+.Dt TSCRAPE_UPDATE 1
+.Os
+.Sh NAME
+.Nm tscrape_update
+.Nd update feeds and merge with old feeds
+.Sh SYNOPSIS
+.Nm
+.Op Ar tscraperc
+.Sh DESCRIPTION
+.Nm
+updates feeds files and merges the new data with the previous files.
+These are the files in the directory
+.Pa $HOME/.tscrape/feeds
+by default.
+.Sh OPTIONS
+.Bl -tag -width 17n
+.It Ar tscraperc
+Config file, if not specified uses the path
+.Pa $HOME/.tscrape/tscraperc
+by default.
+See the
+.Sx FILES READ
+section for more information.
+.El
+.Sh FILES READ
+.Bl -tag -width 17n
+.It Ar tscraperc
+Config file, see the tscraperc.example file for an example.
+This file is evaluated as a shellscript in
+.Nm .
+.Pp
+Atleast the following functions can be overridden per feed:
+.Bl -tag -width 17n
+.It Fn fetch
+to use
+.Xr wget 1 ,
+OpenBSD
+.Xr ftp 1
+or an other download program.
+.It Fn merge
+to change the merge logic.
+.It Fn filter
+to filter on fields.
+.It Fn order
+to change the sort order.
+.El
+.Pp
+The
+.Fn feeds
+function is called to process the feeds.
+The default
+.Fn feed
+function is executed concurrently as a background job in your
+.Xr tscraperc 5
+config file to make updating faster.
+The variable
+.Va maxjobs
+can be changed to limit or increase the amount of concurrent jobs (8 by
+default).
+.El
+.Sh FILES WRITTEN
+.Bl -tag -width 17n
+.It feedname
+TAB-separated format containing all items per feed.
+The
+.Nm
+script merges new items with this file.
+The filename cannot contain '/' characters, they will be replaced with '_'.
+.El
+.Sh EXAMPLES
+To update your feeds and format them in various formats:
+.Bd -literal
+# Update
+tscrape_update "configfile"
+# Plain-text list
+tscrape_plain $HOME/.tscrape/feeds/* > $HOME/.tscrape/feeds.txt
+# HTML
+tscrape_html $HOME/.tscrape/feeds/* > $HOME/.tscrape/feeds.html
+.Ed
+.Sh SEE ALSO
+.Xr tscrape 1 ,
+.Xr tscrape_html 1 ,
+.Xr tscrape_plain 1 ,
+.Xr sh 1 ,
+.Xr tscrape 5 ,
+.Xr tscraperc 5
+.Sh AUTHORS
+.An Hiltjo Posthuma Aq Mt [email protected]
diff --git a/tscraperc.5 b/tscraperc.5
@@ -0,0 +1,98 @@
+.Dd July 14, 2019
+.Dt TSCRAPERC 5
+.Os
+.Sh NAME
+.Nm tscraperc
+.Nd tscrape_update(1) configuration file
+.Sh DESCRIPTION
+.Nm
+is the configuration file for
+.Xr tscrape_update 1 .
+.Pp
+The variable
+.Va tscrapepath
+can be set for the directory to store the TAB-separated feed files,
+by default this is
+.Pa $HOME/.tscrape/feeds .
+.
+.Sh FUNCTIONS
+The following functions must be defined in a
+.Nm
+file:
+.Bl -tag -width Ds
+.It Fn feeds
+This function is like a "main" function called from
+.Xr tscrape_update 1 .
+.It Fn feed "name" "feedurl"
+Function to process the feed, its arguments are in the order:
+.Bl -tag -width Ds
+.It Fa name
+Name of the feed, this is also used as the filename for the TAB-separated
+feed file.
+The filename cannot contain '/' characters, they will be replaced with '_'.
+.It Fa feedurl
+Uri to fetch the RSS/Atom data from, usually a HTTP or HTTPS uri.
+.El
+.El
+.Sh OVERRIDE FUNCTIONS
+Because
+.Xr tscrape_update 1
+is a shellscript each function can be overridden to change its behaviour,
+notable functions are:
+.Bl -tag -width Ds
+.It Fn fetch "name" "uri" "feedfile"
+Fetch feed from url and writes data to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Specified name in configuration file (useful for logging).
+.It Fa uri
+Uri to fetch.
+.It Fa feedfile
+Used feedfile (useful for comparing modification times).
+.El
+.It Fn merge "name" "oldfile" "newfile"
+Merge data of oldfile with newfile and writes it to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Feed name.
+.It Fa oldfile
+Old file.
+.It Fa newfile
+New file.
+.El
+.It Fn filter "name"
+Filter
+.Xr tscrape 5
+data from stdin, write to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Feed name.
+.El
+.It Fn order "name"
+Sort
+.Xr tscrape 5
+data from stdin, write to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Feed name.
+.El
+.El
+.Sh EXAMPLES
+An example configuration file is included named tscraperc.example and also
+shown below:
+.Bd -literal
+#tscrapepath="$HOME/.tscrape/feeds"
+
+# list of feeds to fetch:
+feeds() {
+ # feed <name> <feedurl>
+ feed "Rich Felker" "https://twitter.com/richfelker"
+ feed "Internet of shit" "https://twitter.com/internetofshit"
+ feed "Donald Trump" "https://twitter.com/realdonaldtrump"
+}
+.Ed
+.Sh SEE ALSO
+.Xr tscrape_update 1 ,
+.Xr sh 1
+.Sh AUTHORS
+.An Hiltjo Posthuma Aq Mt [email protected]
You are viewing proxied material from codemadness.org. The copyright of proxied material belongs to its original authors. Any comments or complaints in relation to proxied material should be directed to the original authors of the content concerned. Please see the disclaimer for more details.