CROSSBOW(7)            Miscellaneous Information Manual            CROSSBOW(7)

NAME
    crossbow-cookbook – cookbookish examples of crossbow(1) usage

DESCRIPTION
    This manual page contains short recipes demonstrating how to use the
    crossbow feed aggregator.

  Table of contents
    1.   Simple local mail notification
    2.   Incremental files collection
    3.   Download the full article
    4.   One mail per entry
    5.   Maintain a local multimedia collection

EXAMPLES
  Simple local mail notification
    We want a periodic notification, via local mail, of the availability of
    new stories on a site.

    The configuration in crossbow.conf(5) would look like this:

          feed debian_micro
            url https://micronews.debian.org/feeds/feed.rss
            format %ft: %l\n

    The invocation of crossbow(1) will emit to stdout(3) a line like the
    following for each new item:

          Debian micronews: https://micronews.debian.org/....html

    By placing the following line in a crontab(5), a check for updates will
    run automatically every two hours:

        0 0-23/2 * * * crossbow

    Assuming that local mail delivery is enabled, and since the output of a
    cronjob is mailed to the owner of the crontab(5), the user will receive a
    mail with one line for each entry that appeared in the last two hours.
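    If the notifications should reach a different mailbox, most cron(8)
    implementations honour the MAILTO variable in the crontab(5).  A minimal
    sketch (the address is a placeholder):

```
MAILTO=user@example.org
0 0-23/2 * * * crossbow
```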

  Incremental files collection
    Let's consider a feed whose XML reports the whole article for each entry.
    We want to store individual articles in a separate file, under a specific
    directory on the filesystem.

    The configuration in crossbow.conf(5) would look like this:

          feed cosmic.voyage
            url gopher://cosmic.voyage:70/0/atom.xml
            handler pipe
            command sed -n w%n.txt
            chdir ~/scifi_stories/cosmic.voyage/

    The invocation of crossbow(1) will spawn one sed(1) process for each new
    entry.  The content, corresponding to the %d placeholder, will be piped
    to the subprocess, which in turn will write it to the specified file (w
    command) but not to stdout(3) (-n flag).

    As a result, the ~/scifi_stories/cosmic.voyage directory will be
    populated with files named 000000.txt, 000001.txt, 000002.txt and so on,
    since %n expands to an incremental numeric value.  See
    crossbow-format(5).

    Security remark: unless the feed is trusted, naming filesystem paths
    after entry properties other than %n is strongly discouraged.  Consider
    for example the case where %t is used as a file name, and the title of a
    post is something like ../../.profile.  %n is safe to use, since its
    value does not depend on the feed content.
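    The risk can be illustrated with a hypothetical title; the sanitizing
    rule below (collapsing slashes and dots into underscores) is only an
    example, not something crossbow(1) performs by itself:

```shell
# A malicious entry title: used verbatim as a file name, its relative
# path components would escape the target directory.
title='../../.profile'

# Defensive rewrite: translate '/' and '.' to '_' and squeeze repeats,
# so the result can no longer traverse directories.
safe="$(printf %s "$title" | tr -s '/.' '_')"
printf '%s\n' "scifi_stories/$safe"
```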

  Download the full article
    This scenario is similar to the previous one, but it tackles the
    situation where the feed entry does not contain the full content, while
    the entry's link field contains a valid URL, which is intended to be
    reached by means of a web browser.

    In this case we can leverage curl(1) to do the retrieval:

          feed debian_micro
            url https://micronews.debian.org/feeds/feed.rss
            handler exec
            command curl -o %n.html %l
            chdir ~/debian_micronews/

    The "%n" and "%l" placeholders do not need to be quoted: they are handled
    safely even when their expansions contain white spaces.  See
    crossbow-format(5).

    It is of course possible to use any non-interactive download manager in
    place of curl(1), or maybe a specialized script that fetches the entry
    link and scrapes the content out of it.
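    The scraping step of such a specialized script could look like the
    following sketch.  The extract_article function and the sample page are
    hypothetical; a naive sed(1) range is used here, while real pages may
    require a proper HTML parser:

```shell
# Hypothetical extraction step: keep only the lines between the
# <article> and </article> markers of a downloaded page.
extract_article() {
    sed -n '/<article>/,/<\/article>/p'
}

# Illustrative input standing in for a page fetched from the entry link:
page='<html><body><article>
Full text here.
</article></body></html>'

printf '%s\n' "$page" | extract_article
```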

  One mail per entry
    We want to turn individual feed entries into plain (HTML-free) text
    messages, and deliver them via email.

    Our goal can be achieved by means of a generic shell script like the
    following:

          #!/bin/sh

          set -e

          feed_title="$1"
          post_title="$2"
          link="$3"

          lynx "${link:--stdin}" -dump -force_html |
              sed "s/^~/~~/" |    # Escape dangerous tilde expressions
              mail -s "${feed_title:+${feed_title}: }${post_title:-...}" "${USER:?}"

    The script can be installed in the PATH, e.g. as
    /usr/local/bin/crossbow-to-mail, and then integrated in crossbow(1) as
    follows:

    •   If the tracked feed encloses the whole content in the XML:

              feed debian_micro
                url https://micronews.debian.org/feeds/feed.rss
                handler pipe
                command crossbow-to-mail %ft %t

    •   If the feed entries only relay the link to the article:

              feed lobsters.c
                url https://lobste.rs/t/c.rss
                handler exec
                command crossbow-to-mail %ft %t %l

    Note: The crossbow-to-mail script leverages lynx(1) to download and parse
    the HTML into textual form.  Any other program that turns HTML into
    plain text can serve the same purpose.

    Security remark: The "s/^~/~~/" sed(1) regex prevents accidental or
    malicious tilde escapes from being interpreted by the mail(1) program.
    The mutt(1) mail user agent, if available, can be used as a safer drop-in
    replacement.
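    The effect of that escape can be seen in isolation; here ~!date stands
    in for any tilde escape that mail(1) would otherwise honour (~! runs a
    shell command):

```shell
# Doubling a leading tilde neutralizes mail(1) command escapes.
printf '%s\n' '~!date' | sed 's/^~/~~/'
```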

  Maintain a local multimedia collection
    Many sites specialized in multimedia delivery can be scraped using tools
    such as youtube-dl(1).  If the web site offers a feed, crossbow(1) can
    be combined with these tools to incrementally maintain a local
    collection of files.

    For example, YouTube provides feeds for users, channels and playlists.
    Each of these entities is assigned a unique identifier, which can easily
    be figured out by looking at the web URL.

    •   Given a user identifier UID, the feed is
        https://youtube.com/feeds/videos.xml?user=UID

    •   Given a channel identifier CID, the feed is
        https://youtube.com/feeds/videos.xml?channel_id=CID

    •   Given a playlist identifier PID, the feed is
        https://youtube.com/feeds/videos.xml?playlist_id=PID
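    These three URL patterns can be assembled with a small helper; the
    yt_feed function is a hypothetical convenience, not part of crossbow(1):

```shell
# Print the YouTube feed URL for a given identifier kind
# (user, channel_id or playlist_id) and identifier value.
yt_feed() {
    kind="${1:?user, channel_id or playlist_id}"
    id="${2:?identifier}"
    printf 'https://youtube.com/feeds/videos.xml?%s=%s\n' "$kind" "$id"
}

yt_feed user Computerphile
```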

    What follows is a convenient wrapper script that ensures proper file
    naming (although it is always wiser to use %n, as explained above):

          #!/bin/sh

          link="${1:?missing link}"
          incremental_id="${2:?missing incremental id}"
          format="$3"

          # Transform a title into a reasonably safe 'slug'
          slugify() {
              tr -d '\n' |                    # explicitly drop new-lines
              tr '/[:punct:][:space:]' . |    # turn all sly chars into dots
              tr -cs '[:alnum:]'              # squeeze repetitions
          }

          fname="$(
              youtube-dl \
                  --get-filename \
                  -o "%(id)s_%(title)s.%(ext)s" \
                  "$link"
          )" || exit 1

          youtube-dl \
              ${format:+-f "$format"} \
              -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
              --no-progress \
              "$link"

    Once again, the script can be installed in the PATH, e.g. as
    /usr/local/bin/crossbow-ytdl, and then integrated in crossbow(1) as
    follows:

    •   To save each published video:

              feed computerphile
                url https://youtube.com/feeds/videos.xml?user=Computerphile
                handler exec
                command crossbow-ytdl %l %n

    •   To save only the audio of each published video:

              feed nodumb
                url https://youtube.com/feeds/videos.xml?channel_id=UCVnIvJuTZqM5nnwGFpA57_Q
                handler exec
                command crossbow-ytdl %l %n bestaudio

SEE ALSO
    crossbow(1), curl(1), lynx(1), mail(1), sed(1), youtube-dl(1),
    crossbow.conf(5), crossbow-format(5), crontab(5), cron(8)

AUTHORS
    Giovanni Simoni <[email protected]>

                               October 9, 2021