sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

        $ make
        # make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

        $ make SFEED_CURSES=""
        # make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

        $ make SFEED_THEME="templeos"
        # make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

        mkdir -p "$HOME/.sfeed/feeds"
        cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

        $EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

        sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import it
for sfeed_update:

        newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import it for sfeed_update:

        r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds. This script merges the new items; see sfeed_update(1) for more
information about what it can do:

        sfeed_update

Format feeds:

Plain-text list:

        sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

        cp style.css "$HOME/.sfeed/style.css"
        sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

        mkdir -p "$HOME/.sfeed/frames"
        cp style.css "$HOME/.sfeed/frames/style.css"
        cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.
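
For example, a minimal sketch of such a wrapper and its crontab(5) entry; the
script name, its installed path and the hourly schedule are assumptions:

        #!/bin/sh
        # sfeed_cron.sh: hypothetical wrapper, update then reformat.
        sfeed_update
        sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

and in crontab(5):

        # update and format feeds at every whole hour.
        0 * * * * /usr/local/bin/sfeed_cron.sh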

Most protocols are supported because curl(1) is used by default and also proxy
settings from the environment (such as the $http_proxy environment variable)
are used.
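
For example, to fetch all feeds through a hypothetical local HTTP proxy for one
update run:

        http_proxy="http://127.0.0.1:8080" https_proxy="http://127.0.0.1:8080" sfeed_update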

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
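
As a sketch of this, feed XML can be piped to sfeed(1) over any transport; the
remote host and file path below are hypothetical:

        ssh user@example.org 'cat feed.xml' | sfeed | sfeed_plain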

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.
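
As a minimal sketch of such a direct conversion, the below loop turns a twtxt
feed (one RFC 3339 timestamp and text per line, TAB-separated) into the first
two sfeed(5) fields (UNIX timestamp and title). It assumes GNU date(1) for the
timestamp parsing and a hypothetical input file feed.twtxt:

        #!/bin/sh
        # convert twtxt lines to the first sfeed(5) TSV fields.
        tab="$(printf '\t')"
        while IFS="$tab" read -r ts text; do
                # skip comment/metadata lines.
                case "$ts" in "#"*) continue;; esac
                printf '%s\t%s\n' "$(date -d "$ts" +'%s')" "$text"
        done < feed.twtxt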


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feeds from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details. A minimal
sfeedrc sketch with an order override is shown after the next paragraph.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
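
A minimal sfeedrc sketch illustrating this; the feed URLs are examples and the
order() override is an assumption based on the default newest-first sort
described in sfeedrc(5):

        # limit the amount of concurrent fetch jobs.
        maxjobs=4

        # override the sort order for all feeds: sort oldest first here.
        # order(name)
        order() {
                sort -t "$(printf '\t')" -k1n,1
        }

        # list of feeds to fetch:
        # feed <name> <feedurl> [basesiteurl] [encoding]
        feeds() {
                feed "codemadness" "https://codemadness.org/atom.xml"
                feed "xkcd" "https://xkcd.com/atom.xml"
        }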


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

        man 5 sfeed
        man 5 sfeedrc
        man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

        url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

        https://codemadness.org/atom.xml	application/atom+xml
        https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (the configfile argument is optional):

        sfeed_update "configfile"

Format the feeds files:

        # Plain-text list.
        sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
        # HTML view (no frames), copy style.css for a default style.
        sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
        # HTML view with the menu as frames, copy style.css for a default style.
        mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

        $BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

        $EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface. The interface has a look inspired
by the mutt mail client. It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like this:

        sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

        sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. This
behaviour might be overridden by setting the environment variable
$SFEED_NEW_AGE to the desired maximum age in seconds. To manage read/unread
items in a different way a plain-text file with a list of the read URLs can be
used. To enable this behaviour the path to this file can be specified by
setting the environment variable $SFEED_URL_FILE to the URL file:

        export SFEED_URL_FILE="$HOME/.sfeed/urls"
        [ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
        sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.
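
For example, to mark the items of the last week as new instead of the last day,
using the $SFEED_NEW_AGE variable described above:

        SFEED_NEW_AGE=$((7 * 24 * 3600)) sfeed_curses ~/.sfeed/feeds/*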

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

        #!/bin/sh
        url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
                sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
        test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

        sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

        sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

        # filter fields.
        # filter(name, url)
        filter() {
                case "$1" in
                "tweakers")
                        awk -F '\t' 'BEGIN { OFS = "\t"; }
                        # skip ads.
                        $2 ~ /^ADV:/ {
                                next;
                        }
                        # shorten link.
                        {
                                if (match($3, /^https:\/\/tweakers\.net\/nieuws\/[0-9]+\//)) {
                                        $3 = substr($3, RSTART, RLENGTH);
                                }
                                print $0;
                        }';;
                "yt BSDNow")
                        # filter only BSD Now from channel.
                        awk -F '\t' '$2 ~ / \| BSD Now/';;
                *)
                        cat;;
                esac | \
                # replace youtube links with embed links.
                sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

                awk -F '\t' 'BEGIN { OFS = "\t"; }
                function filterlink(s) {
                        # protocol must start with http, https or gopher.
                        if (match(s, /^(http|https|gopher):\/\//) == 0) {
                                return "";
                        }

                        # shorten feedburner links.
                        if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
                                s = substr(s, RSTART, RLENGTH);
                        }

                        # strip tracking parameters.
                        # urchin, facebook, piwik, webtrekk and generic.
                        gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
                        gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

                        gsub(/\?&/, "?", s);
                        gsub(/[\?&]+$/, "", s);

                        return s;
                }
                {
                        $3 = filterlink($3); # link
                        $8 = filterlink($8); # enclosure

                        # try to remove tracking pixels: <img/> tags with 1px width or height.
                        gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"]?1[\" ][^>]*>", "", $4);

                        print $0;
                }'
        }

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. Prefix the feed name in the title. Convert the TSV output to
an Atom XML feed (again):

        #!/bin/sh
        cd ~/.sfeed/feeds/ || exit 1

        awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
        BEGIN { OFS = "\t"; }
        int($1) >= old {
                $2 = "[" FILENAME "] " $2;
                print $0;
        }' * | \
        sort -k1,1rn | \
        sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

        fifo="/tmp/sfeed_fifo"
        mkfifo "$fifo"

On the reading side:

        # This keeps track of unique lines so it might consume much memory.
        # It tries to reopen the $fifo after 1 second if it fails.
        while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

        feedsdir="$HOME/.sfeed/feeds/"
        cd "$feedsdir" || exit 1
        test -p "$fifo" || exit 1

        # 1 day is old news, don't write older items.
        awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
        BEGIN { OFS = "\t"; }
        int($1) >= old {
                $2 = "[" FILENAME "] " $2;
                print $0;
        }' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably some audio file):

        awk -F '\t' 'BEGIN { latest = 0; }
        length($8) {
                ts = int($1);
                if (ts > latest) {
                        url = $8;
                        latest = ts;
                }
        }
        END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

        awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive items of a
feed from (roughly) the last week by doing for example:

        awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
        mv feed feed.bak
        mv feed.new feed

This could also be run weekly in a crontab to archive the feeds. Like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

        set unmatched-mail keep

        account "sfeed" mbox "%[home]/.sfeed/mbox"
                $cachepath = "%[home]/.sfeed/fdm.cache"
                cache "${cachepath}"
                $maildir = "%[home]/feeds/"

                # Check if message is in the cache by Message-ID.
                match case "^Message-ID: (.*)" in headers
                        action {
                                tag "msgid" value "%1"
                        }
                        continue

                # If it is in the cache, stop.
                match matched and in-cache "${cachepath}" key "%[msgid]"
                        action {
                                keep
                        }

                # Not in the cache, process it and add to cache.
                match case "^X-Feedname: (.*)" in headers
                        action {
                                # Store to local maildir.
                                maildir "${maildir}%1"

                                add-to-cache "${cachepath}" key "%[msgid]"
                                keep
                        }

Now run:

        $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
        $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from mbox and filter duplicate messages using the fdm program and deliver
them to a SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

        set unmatched-mail keep

        account "sfeed" mbox "%[home]/.sfeed/mbox"
                $cachepath = "%[home]/.sfeed/fdm.cache"
                cache "${cachepath}"

                # Check if message is in the cache by Message-ID.
                match case "^Message-ID: (.*)" in headers
                        action {
                                tag "msgid" value "%1"
                        }
                        continue

                # If it is in the cache, stop.
                match matched and in-cache "${cachepath}" key "%[msgid]"
                        action {
                                keep
                        }

                # Not in the cache, process it and add to cache.
                match case "^X-Feedname: (.*)" in headers
                        action {
                                # Connect to a SMTP server and attempt to
                                # deliver the mail to it.
                                # Of course change the server and e-mail below.
                                smtp server "codemadness.org" to "hiltjo@codemadness.org"

                                add-to-cache "${cachepath}" key "%[msgid]"
                                keep
                        }

Now run:

        $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
        $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

        maildir="$HOME/feeds"
        feedsdir="$HOME/.sfeed/feeds"
        procmailconfig="$HOME/.sfeed/procmailrc"

        # message-id cache to prevent duplicates.
        mkdir -p "${maildir}/.cache"

        if ! test -r "${procmailconfig}"; then
                printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
                echo "See procmailrc.example for an example." >&2
                exit 1
        fi

        find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
                name=$(basename "${d}")
                mkdir -p "${maildir}/${name}/cur"
                mkdir -p "${maildir}/${name}/new"
                mkdir -p "${maildir}/${name}/tmp"
                printf 'Mailbox %s\n' "${name}"
                sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
        done

Procmailrc(5) file:

        # Example for use with sfeed_mbox(1).
        # The header X-Feedname is used to split into separate maildirs. It is
        # assumed this name is sane.

        MAILDIR="$HOME/feeds/"

        :0
        * ^X-Feedname: \/.*
        {
                FEED="$MATCH"

                :0 Wh: "msgid_$FEED.lock"
                | formail -D 1024000 ".cache/msgid_$FEED.cache"

                :0
                "$FEED"/
        }

Now run:

        $ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) for sfeed_update with any other client to fetch
the RSS/Atom data or changing the default curl options:

        # fetch a feed via HTTP/HTTPS etc.
        # fetch(name, url, feedfile)
        fetch() {
                hurl -m 1048576 -t 15 "$2" 2>/dev/null
        }

- - -

Caching, incremental data updates and bandwidth saving

For servers that support it some incremental updates and bandwidth saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags and modification timestamps per feed:

        mkdir -p ~/.sfeed/etags ~/.sfeed/lastmod

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

        # fetch(name, url, feedfile)
        fetch() {
                basename="$(basename "$3")"
                etag="$HOME/.sfeed/etags/${basename}"
                lastmod="$HOME/.sfeed/lastmod/${basename}"
                output="${sfeedtmpdir}/feeds/${filename}.xml"

                curl \
                        -f -s -m 15 \
                        -L --max-redirs 0 \
                        -H "User-Agent: sfeed" \
                        --compressed \
                        --etag-save "${etag}" --etag-compare "${etag}" \
                        -R -o "${output}" \
                        -z "${lastmod}" \
                        "$2" 2>/dev/null || return 1

                # successful, but no file written: assume it is OK and Not Modified.
                [ -e "${output}" ] || return 0

                # use server timestamp from curl -R to set Last-Modified.
                touch -r "${output}" "${lastmod}" 2>/dev/null
                cat "${output}" 2>/dev/null
                # use write output status, other errors are ignored here.
                fetchstatus="$?"
                rm -f "${output}" 2>/dev/null
                return "${fetchstatus}"
        }

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.

A custom User-Agent can be set by using the curl -H option, like so:

        curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
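
A sketch of how this can be combined with the fetch() override in the sfeedrc
file; the User-Agent string is the example above and the other curl options
mirror the defaults shown elsewhere in this README:

        # fetch(name, url, feedfile)
        fetch() {
                curl -L --max-redirs 0 -f -s -m 15 \
                        -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
                        "$2" 2>/dev/null
        }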

- - -

Page redirects

For security and efficiency reasons by default redirects are not allowed and
are treated as an error.

For example this prevents hijacking an unencrypted http:// to https:// redirect
and avoids the added time of an unnecessary page redirect on each request. It
is encouraged to use the final redirected URL in the sfeedrc config file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
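
For example, a sketch of a fetch() override that allows a few redirects; the
limit of 3 is an arbitrary choice:

        # fetch(name, url, feedfile)
        fetch() {
                # allow up to 3 redirects instead of none.
                curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
                        "$2" 2>/dev/null
        }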

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs for downloading podcasts,
webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

        #!/bin/sh
        # sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
        # Dependencies: awk, curl, flock, xargs (-P), yt-dlp.

        cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
        jobs="${SFEED_JOBS:-4}"
        lockfile="${HOME}/.sfeed/sfeed_download.lock"

        # log(feedname, s, status)
        log() {
                if [ "$1" != "-" ]; then
                        s="[$1] $2"
                else
                        s="$2"
                fi
                printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
        }

        # fetch(url, feedname)
        fetch() {
                case "$1" in
                *youtube.com*)
                        yt-dlp "$1";;
                *.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
                        # allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
                        curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
                esac
        }

        # downloader(url, title, feedname)
        downloader() {
                url="$1"
                title="$2"
                feedname="${3##*/}"

                msg="${title}: ${url}"

                # download directory.
                if [ "${feedname}" != "-" ]; then
                        mkdir -p "${feedname}"
                        if ! cd "${feedname}"; then
                                log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
                                return 1
                        fi
                fi

                log "${feedname}" "${msg}" "START"
                if fetch "${url}" "${feedname}"; then
                        log "${feedname}" "${msg}" "OK"

                        # append it safely in parallel to the cachefile on a
                        # successful download.
                        (flock 9 || exit 1
                        printf '%s\n' "${url}" >> "${cachefile}"
                        ) 9>"${lockfile}"
                else
                        log "${feedname}" "${msg}" "FAIL" >&2
                        return 1
                fi
                return 0
        }

        if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
                # Downloader helper for parallel downloading.
                # Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
                # It should write the URI to the cachefile if it is successful.
                downloader "$1" "$2" "$3"
                exit $?
        fi

        # ...else parent mode:

        tmp="$(mktemp)" || exit 1
        trap "rm -f ${tmp}" EXIT

        [ -f "${cachefile}" ] || touch "${cachefile}"
        cat "${cachefile}" > "${tmp}"
        echo >> "${tmp}" # force it to have one line for awk.

        LC_ALL=C awk -F '\t' '
        # fast prefilter what to download or not.
        function filter(url, field, feedname) {
                u = tolower(url);
                return (match(u, "youtube\\.com") ||
                        match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
        }
        function download(url, field, title, filename) {
                if (!length(url) || urls[url] || !filter(url, field, filename))
                        return;
                # NUL-separated for xargs -0.
                printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
                urls[url] = 1; # print once.
        }
        {
                FILENR += (FNR == 1);
        }
        # lookup table from cachefile which contains downloaded URLs.
        FILENR == 1 {
                urls[$0] = 1;
        }
        # feed file(s).
        FILENR != 1 {
                download($3, 3, $2, FILENAME); # link
                download($8, 8, $2, FILENAME); # enclosure
        }
        ' "${tmp}" "${@:--}" | \
        SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
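
Example usage, assuming the script above is saved and made executable as
sfeed_download somewhere in $PATH:

        sfeed_download ~/.sfeed/feeds/*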

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the
sfeed(5) TSV format.

        #!/bin/sh
        # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed(5) format.
        # The data is split per file per feed with the name of the newsboat feed title.
        # It writes the URLs of the read items line by line to a "urls" file.
        #
        # Dependencies: sqlite3, awk.
        #
        # Usage: create some directory to store the feeds then run this script.

        # newsboat cache.db file.
        cachefile="$HOME/.newsboat/cache.db"
        test -n "$1" && cachefile="$1"

        # dump data.
        # .mode ascii: Columns/rows delimited by 0x1F and 0x1E.
        # get the first fields in the order of the sfeed(5) format.
        sqlite3 "$cachefile" <<!EOF |
        .headers off
        .mode ascii
        .output
        SELECT
                i.pubDate, i.title, i.url, i.content, i.content_mime_type,
                i.guid, i.author, i.enclosure_url,
                f.rssurl AS rssurl, f.title AS feedtitle, i.unread
                -- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted
        FROM rss_feed f
        INNER JOIN rss_item i ON i.feedurl = f.rssurl
        ORDER BY
                i.feedurl ASC, i.pubDate DESC;
        .quit
        !EOF
        # convert to sfeed(5) TSV format.
        LC_ALL=C awk '
        BEGIN {
                FS = "\x1f";
                RS = "\x1e";
        }
        # normal non-content fields.
        function field(s) {
                gsub("^[[:space:]]*", "", s);
                gsub("[[:space:]]*$", "", s);
                gsub("[[:space:]]", " ", s);
                gsub("[[:cntrl:]]", "", s);
                return s;
        }
        # content field.
        function content(s) {
                gsub("^[[:space:]]*", "", s);
                gsub("[[:space:]]*$", "", s);
                # escape chars in content field.
                gsub("\\\\", "\\\\", s);
                gsub("\n", "\\n", s);
                gsub("\t", "\\t", s);
                return s;
        }
        function feedname(feedurl, feedtitle) {
                if (feedtitle == "") {
                        gsub("/", "_", feedurl);
                        return feedurl;
                }
                gsub("/", "_", feedtitle);
                return feedtitle;
        }
        {
                fname = feedname($9, $10);
                if (!feed[fname]++) {
                        print "Writing file: \"" fname "\" (title: " $10 ")";
                }

                contenttype = field($5);
                if (contenttype == "")
                        contenttype = "html";
                else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
                        contenttype = "html";
                else
                        contenttype = "plain";

                print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
                        contenttype "\t" field($6) "\t" field($7) "\t" field($8) \
                        > fname;

                # write URLs of the read items to a file line by line.
                if ($11 == "0") {
                        print $3 > "urls";
                }
        }'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines

        #!/bin/sh
        # Progress indicator script.

        # Pass lines as input to stdin and write progress status to stderr.
        # progress(totallines)
        progress() {
                total="$(($1 + 0))" # must be a number, no divide by zero.
                test "${total}" -le 0 -o "$1" != "${total}" && return
                LC_ALL=C awk -v "total=${total}" '
                {
                        counter++;
                        percent = (counter * 100) / total;
                        printf("\033[K") > "/dev/stderr"; # clear EOL
                        print $0;
                        printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
                        fflush(); # flush all buffers per line.
                }
                END {
                        printf("\033[K") > "/dev/stderr";
                }'
        }

        # Counts the feeds from the sfeedrc config.
        countfeeds() {
                count=0
                . "$1"
                feed() {
                        count=$((count + 1))
                }
                feeds
                echo "${count}"
        }

        config="${1:-$HOME/.sfeed/sfeedrc}"
        total=$(countfeeds "${config}")
        sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

        #!/bin/sh
        # Count the new items of the last day.
        LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
        {
                total++;
        }
        int($1) >= old {
                totalnew++;
        }
        END {
                print "New: " totalnew;
                print "Total: " total;
        }' ~/.sfeed/feeds/*

The below example script counts the unread and total items using the
sfeed_curses URL file:

        #!/bin/sh
        # Count the unread and total items from feeds using the URL file.
        LC_ALL=C awk -F '\t' '
        # URL file: amount of fields is 1.
        NF == 1 {
                u[$0] = 1; # lookup table of URLs.
                next;
        }
        # feed file: check by URL or id.
        {
                total++;
                if (length($3)) {
                        if (u[$3])
                                read++;
                } else if (length($6)) {
                        if (u[$6])
                                read++;
                }
        }
        END {
                print "Unread: " (total - read);
                print "Total: " total;
        }' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field or you can map
  it to an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag is just using the inner data of the XML tag, then this
  definition is enough. If it for example has to parse a certain attribute, you
  have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
 	RSSTagGuidPermalinkTrue,
 	/* must be defined after GUID, because it can be a link (isPermaLink) */
 	RSSTagLink,
-	RSSTagEnclosure,
+	RSSTagMediaContent, RSSTagEnclosure,
 	RSSTagAuthor, RSSTagDccreator,
 	RSSTagCategory,
 	/* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
 enum {
 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
-	FeedFieldLast
+	FeedFieldMediaContent, FeedFieldLast
 };
 
 typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
 	{ STRP("enclosure"),         RSSTagEnclosure },
 	{ STRP("guid"),              RSSTagGuid },
 	{ STRP("link"),              RSSTagLink },
+	{ STRP("media:content"),     RSSTagMediaContent },
 	{ STRP("media:description"), RSSTagMediaDescription },
 	{ STRP("pubdate"),           RSSTagPubdate },
 	{ STRP("title"),             RSSTagTitle }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
 	[RSSTagGuidPermalinkTrue]  = FeedFieldId, /* special-case: both a link and an id */
 	[RSSTagLink]               = FeedFieldLink,
+	[RSSTagMediaContent]       = FeedFieldMediaContent,
 	[RSSTagEnclosure]          = FeedFieldEnclosure,
 	[RSSTagAuthor]             = FeedFieldAuthor,
 	[RSSTagDccreator]          = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
 	putchar(FieldSeparator);
 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+	putchar(FieldSeparator);
+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
 	putchar('\n');
 
 	if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
 	}
 
 	if (ctx.feedtype == FeedTypeRSS) {
-		if (ctx.tag.id == RSSTagEnclosure &&
+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
 		    isattr(n, nl, STRP("url"))) {
 			string_append(&tmpstr, v, vl);
 		} else if (ctx.tag.id == RSSTagGuid &&

- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful for example to
sync items or mark all items across all feeds as read. It can be comfortable to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

        case 'M':
                forkexec((char *[]) { "markallread.sh", NULL }, 0);
                break;

or

        case 'S':
                forkexec((char *[]) { "syncnews.sh", NULL }, 1);
                break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

        #!/bin/sh
        # mark all items/URLs as read.
        tmp="$(mktemp)" || exit 1
        (cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
        awk '!x[$0]++' > "$tmp" &&
        mv "$tmp" ~/.sfeed/urls &&
        pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

        #!/bin/sh
        sfeed_update
        pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available the following wrapper command
can be used to run the program in a new session, for a plumb program:

        setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().


Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

        SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

        SFEED_YANKER="tmux set-buffer \`cat\`"


Alternative for xargs -P and -0
-------------------------------

Most xargs implementations support the options -P and -0.
GNU and *BSD xargs have supported them for over 20 years!

These functions in sfeed_update can be overridden in sfeedrc, if you don't want
to use xargs:

        feed() {
                # wait until ${maxjobs} are finished: this will stall the queue
                # if an item is slow, but it is portable.
                [ ${signo} -ne 0 ] && return
                [ $((curjobs % maxjobs)) -eq 0 ] && wait
                [ ${signo} -ne 0 ] && return
                curjobs=$((curjobs + 1))

                _feed "$@" &
        }

        runfeeds() {
                # job counter.
                curjobs=0
                # fetch feeds specified in config file.
                feeds
                # wait till all feeds are fetched (concurrently).
                [ ${signo} -eq 0 ] && wait
        }


Known terminal issues
---------------------

Below is a list of some bugs or missing features in terminals that were found
while testing sfeed_curses. Some of them might be fixed already upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button and right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) in some
  terminals is unsupported or maps to the same button: for example side-buttons
  7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>