CROSSBOW(7) Miscellaneous Information Manual (urm) CROSSBOW(7)
NAME
crossbow-cookbook - cookbookish examples of crossbow(1) usage
DESCRIPTION
This manual page contains short recipes demonstrating how to use the
crossbow feed aggregator.
Table of contents
1. Simple local mail notification
2. Incremental files collection
3. Download the full article
4. One mail per entry
5. Maintain a local multimedia collection
EXAMPLES
Simple local mail notification
We want a periodic notification, via local mail, of the availability of
new stories on a site.
The configuration in crossbow.conf(5) would look like this:
feed debian_micro
url https://micronews.debian.org/feeds/feed.rss
format %ft: %l\n
The invocation of crossbow(1) will emit on stdout(3) a line like the
following for each new item:
Debian micronews: https://micronews.debian.org/....html
Placing the following line in a crontab(5) runs a check for updates
automatically every two hours:
0 0-23/2 * * * crossbow
Assuming that local mail delivery is enabled, the user will receive a
message with one line for each entry that appeared in the last two hours,
since the output of a cron job is mailed to the owner of the crontab(5).
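To make the schedule concrete, the hour field 0-23/2 selects every second hour starting at midnight. The sketch below only lists those hours; cron_hours is a hypothetical helper written for this illustration and is not part of crossbow(1) or cron(8).

```shell
#!/bin/sh
# List the hours matched by a cron range with a step, e.g. "0-23/2".
# Arguments: first hour, last hour, step. Illustration only.
cron_hours() {
    h="$1"
    out=""
    while [ "$h" -le "$2" ]; do
        out="${out:+$out }$h"     # append, space separated
        h=$((h + $3))
    done
    printf '%s\n' "$out"
}

cron_hours 0 23 2
```

With this schedule crossbow(1) runs twelve times a day, at minute 0 of each listed hour.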
Incremental files collection
Let's consider a feed whose XML reports the whole article for each entry.
We want to store each article in a separate file, under a specific
directory on the filesystem.
The configuration in crossbow.conf(5) would look like this:
feed cosmic.voyage
url gopher://cosmic.voyage:70/0/atom.xml
handler pipe
command sed -n w%n.txt
chdir ~/scifi_stories/cosmic.voyage/
The invocation of crossbow(1) will spawn one sed(1) process for each new
entry. The content, corresponding to the %d placeholder, will be piped
to the subprocess, which in turn will write it to the specified file (w
command), but not to stdout(3) (-n flag).
As a result, the ~/scifi_stories/cosmic.voyage directory will be
populated with files named 000000.txt, 000001.txt, 000002.txt, and so on,
since %n expands to an incremental numeric value. See
crossbow-format(5).
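The behaviour of the sed(1) command can be checked outside crossbow(1). In this minimal sketch a temporary directory stands in for the chdir target, the file name 000000.txt stands in for the expansion of %n, and the two input lines are made-up sample content.

```shell
#!/bin/sh
# Demonstrate "sed -n wFILE": the input is copied to FILE and nothing
# reaches stdout. The directory and the sample text are assumptions.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Capture whatever sed prints on stdout (expected: nothing).
shown="$(printf 'First line\nSecond line\n' | sed -n 'w000000.txt')"

printf 'stdout was: [%s]\n' "$shown"
cat 000000.txt
```

The first line of output shows an empty pair of brackets, while the file holds the full input.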
Security remark: unless the feed is trusted, it is strongly discouraged
to name filesystem paths after entry properties other than %n. Consider
for example the case where %t is used as a file name, and the title of a
post is something like ../../.profile. %n is safe to use, since its
value does not depend on the feed content.
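When a handler script must nevertheless derive a file name from feed data, it can validate the name first. The following is a sketch under stated assumptions: is_safe_name is a hypothetical check, not a crossbow(1) feature, and the accepted character set is an arbitrary conservative choice.

```shell
#!/bin/sh
# Accept only names made of letters, digits, dots, underscores and dashes
# that do not start with a dot. This rejects "/" and thus "../" tricks.
is_safe_name() {
    case "$1" in
        ''|.*) return 1 ;;              # empty, hidden file, "." or ".."
        *[!A-Za-z0-9._-]*) return 1 ;;  # any character outside the set
        *) return 0 ;;
    esac
}

for name in 000001.txt ../../.profile 'My Post'; do
    if is_safe_name "$name"; then
        echo "ok:       $name"
    else
        echo "rejected: $name"
    fi
done
```

Only 000001.txt passes; the other two names are rejected.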
Download the full article
This scenario is similar to the previous one, but it tackles the
situation where the feed entry does not contain the full content, while
the entry's link field contains a valid URL, which is intended to be
reached by means of a web browser.
In this case we can leverage curl(1) to do the retrieval:
feed debian_micro
url https://micronews.debian.org/feeds/feed.rss
handler exec
command curl -o %n.html %l
chdir ~/debian_micronews/
The "%n" and "%l" placeholders do not need to be quoted: they are handled
safely even when their expansions contain whitespace. See
crossbow-format(5).
It is of course possible to use any non-interactive download manager in
place of curl(1), or a specialized script that fetches the entry link and
scrapes the content out of it.
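Such a specialized script could look like the sketch below. The function name fetch_entry, the CURL override variable and the restriction to http(s) links are assumptions made for this example and are not taken from crossbow(1). It would be invoked with a command such as "crossbow-fetch %l %n", where crossbow-fetch is a hypothetical script name.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse anything that is not an http(s) link, then
# download it with curl(1). Setting CURL=echo performs a dry run.
fetch_entry() {
    link="${1:?missing link}"
    id="${2:?missing incremental id}"
    case "$link" in
        http://*|https://*) ;;
        *) echo "refusing non-HTTP link: $link" >&2; return 1 ;;
    esac
    # -f: fail on HTTP errors, -sS: quiet but report errors,
    # -L: follow redirects, -o: output file
    "${CURL:-curl}" -fsSL -o "$id.html" "$link"
}

# Dry run: print the curl arguments instead of downloading.
CURL=echo fetch_entry https://micronews.debian.org/ 000001
```

Scheme filtering is a design choice of this sketch: it keeps a hostile feed from pointing the downloader at local files.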
One mail per entry
We want to turn individual feed entries into plain (HTML-free) text
messages, and deliver them via email.
Our goal can be achieved by means of a generic shell script like the
following:
#!/bin/sh
set -e
feed_title="$1"
post_title="$2"
link="$3"
lynx "${link:--stdin}" -dump -force_html |
sed "s/^~/~~/" | # Escape dangerous tilde expressions
mail -s "${feed_title:+${feed_title}: }${post_title:-...}" "${USER:?}"
The script can be installed in the PATH, e.g. as
/usr/local/bin/crossbow-to-mail, and then integrated in crossbow(1) as
follows:
• If the tracked feed encloses the whole content in the XML:
feed debian_micro
url https://micronews.debian.org/feeds/feed.rss
handler pipe
command crossbow-to-mail %ft %t
• If the feed entries only relay the link to the article:
feed lobsters.c
url https://lobste.rs/t/c.rss
handler exec
command crossbow-to-mail %ft %t %l
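The two configurations differ only in whether a third argument is passed, and the script copes with that through shell parameter expansion. The sketch below reproduces the expansions in isolation; mail_subject and lynx_target are hypothetical helper names and the titles are sample values.

```shell
#!/bin/sh
# ${var:+text} expands to text only if var is set and non-empty;
# ${var:-text} expands to text if var is unset or empty.
mail_subject() {
    feed_title="$1"
    post_title="$2"
    printf '%s\n' "${feed_title:+${feed_title}: }${post_title:-...}"
}

# Without a link, lynx(1) is told to read the HTML from standard input.
lynx_target() {
    link="$1"
    printf '%s\n' "${link:--stdin}"
}

mail_subject 'Debian micronews' 'New release'  # Debian micronews: New release
mail_subject ''                 'New release'  # New release
mail_subject 'Debian micronews' ''             # Debian micronews: ...
lynx_target ''                                 # -stdin
lynx_target 'https://lobste.rs/'               # https://lobste.rs/
```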
Note: The crossbow-to-mail script leverages lynx(1) to download and parse
the HTML into textual form. Any other
Security remark: The "s/^~/~~/" sed(1) regex prevents accidental or
malicious tilde escapes from being interpreted by the mail(1) program.
The mutt(1) mail user agent, if available, can be used as a safer drop-in
replacement.
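The effect of the filter can be observed on a sample message body. In this sketch escape_tildes is a hypothetical name for the sed(1) stage of the pipeline and the input lines are made up. In mail(1) implementations that honor tilde escapes, a line starting with "~!" would run a shell command, whereas "~~" stands for a literal tilde.

```shell
#!/bin/sh
# Double a tilde at the start of a line; leave every other tilde alone.
escape_tildes() {
    sed 's/^~/~~/'
}

printf '%s\n' 'Regular text' '~!touch /tmp/pwned' 'a ~ in the middle' |
    escape_tildes
```

Only the second line changes, becoming "~~!touch /tmp/pwned".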
Maintain a local multimedia collection
Many sites specialized in multimedia delivery can be scraped using tools
such as youtube-dl(1). If the web site offers a feed to subscribe to,
crossbow(1) can be combined with these tools in order to incrementally
maintain a local collection of files.
For example, YouTube provides feeds for users, channels and playlists.
Each of these entities is assigned a unique identifier, which can easily
be found by looking at the web URL.
• Given a user identifier UID, the feed is
https://youtube.com/feeds/videos.xml?user=UID
• Given a channel identifier CID, the feed is
https://youtube.com/feeds/videos.xml?channel_id=CID
• Given a playlist identifier PID, the feed is
https://youtube.com/feeds/videos.xml?playlist_id=PID
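The three URL patterns above can be captured in a small helper, for instance when generating crossbow.conf(5) entries by script. This is only a sketch: youtube_feed_url is a hypothetical function name and the kind keywords are chosen for this example.

```shell
#!/bin/sh
# Map an entity kind (user, channel, playlist) and its identifier to the
# corresponding feed URL.
youtube_feed_url() {
    kind="$1"
    id="${2:?missing identifier}"
    case "$kind" in
        user)     param=user ;;
        channel)  param=channel_id ;;
        playlist) param=playlist_id ;;
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
    printf 'https://youtube.com/feeds/videos.xml?%s=%s\n' "$param" "$id"
}

youtube_feed_url user Computerphile
```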
What follows is a convenient wrapper script that ensures proper file
naming (although it is always wiser to use %n, as explained above):
#!/bin/sh
link="${1:?missing link}"
incremental_id="${2:?missing incremental id}"
format="$3"    # optional youtube-dl format selector

# Transform a title into a reasonably safe 'slug'
slugify() {
    tr -d '\n' |                     # explicitly drop new-lines
    tr '/[:punct:][:space:]' . |     # turn all sly chars into dots
    tr -cs '[:alnum:]'               # squeeze repetitions
}

# Ask youtube-dl for the file name it would use, without downloading
fname="$(
    youtube-dl \
        --get-filename \
        -o "%(id)s_%(title)s.%(ext)s" \
        "$link"
)" || exit 1

# Download, prefixing the slug with the incremental id
youtube-dl \
    ${format:+-f "$format"} \
    -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
    --no-progress \
    "$link"
Once again, the script can be installed in the PATH, e.g. as
/usr/local/bin/crossbow-ytdl, and then integrated in crossbow(1) as
follows:
• To save each published video:
feed computerophile
url https://youtube.com/feeds/videos.xml?user=Computerphile
handler exec
command crossbow-ytdl %l %n
• To save only the audio of each published video:
feed nodumb
url https://youtube.com/feeds/videos.xml?channel_id=UCVnIvJuTZqM5nnwGFpA57_Q
handler exec
command crossbow-ytdl %l %n bestaudio
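The slug transformation used by crossbow-ytdl can be tried on its own. The sketch below is a single-tr variant, not the script's exact pipeline: slugify_portable is a hypothetical name, and unlike the original it turns new-lines into dots instead of deleting them. The sample title is made up.

```shell
#!/bin/sh
# Collapse every run of non-alphanumeric bytes into a single dot.
# -c: complement of the alphanumeric set, -s: squeeze repeated output.
slugify_portable() {
    LC_ALL=C tr -cs '[:alnum:]' '[.*]'
}

printf '%s' '000042_My Video: Part 1/2.mp4' | slugify_portable
echo
```

This prints 000042.My.Video.Part.1.2.mp4. A hostile title such as ../../.profile would be reduced to .profile, which is why the wrapper prefixes the incremental id: the resulting name never starts with a dot and never contains a slash.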
SEE ALSO
crossbow(1), lynx(1), sed(1), youtube-dl(1), crontab(5), cron(8)
AUTHORS
Giovanni Simoni
October 9, 2021