webdump.1 - webdump - HTML to plain-text converter for webpages
git clone git://git.codemadness.org/webdump
---
webdump.1 (3230B)
---
.Dd October 6, 2023
.Dt WEBDUMP 1
.Os
.Sh NAME
.Nm webdump
.Nd convert HTML to plain-text
.Sh SYNOPSIS
.Nm
.Op Fl 8adiIlrx
.Op Fl b Ar baseurl
.Op Fl s Ar selector
.Op Fl u Ar selector
.Op Fl w Ar termwidth
.Sh DESCRIPTION
.Nm
reads UTF-8 HTML data from stdin.
It converts and writes the output as plain-text to stdout.
A
.Ar baseurl
can be specified if the links in the HTML data are relative URLs.
This must be an absolute URI.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl 8
Use UTF-8 symbols for certain items like bullet items and rulers to make…
output fancier.
.It Fl a
Toggle usage of ANSI escape codes.
By default it is not enabled.
.It Fl b Ar baseurl
Base URL of links.
This is used to make links absolute.
The specified URL is always preferred over the value in a <base/> tag.
.It Fl d
Deduplicate link references.
When a duplicate link reference is found reuse the same link reference n…
.It Fl i
Toggle if link reference numbers are displayed inline or not, by default…
not enabled.
.It Fl I
Toggle if URLs for link reference are displayed inline or not, by defaul…
not enabled.
.It Fl l
Toggle if link references are displayed at the bottom or not, by default…
not enabled.
.It Fl r
Toggle line-wrapping mode.
By default it is not enabled.
.It Fl s Ar selector
CSS-like selectors, this sets a reader mode to show only content matchin…
selector, see the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl u Ar selector
CSS-like selectors, this sets a reader mode to hide content matching the
selector, see the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl w Ar termwidth
The terminal width.
The default is 77 characters.
.It Fl x
Write resources as TAB-separated lines to file descriptor 3.
.El
.Sh SELECTOR SYNTAX
The syntax has some inspiration from CSS, but it is more limited.
Some examples:
.Bl -item
.It
"main" would match on the "main" tags.
.It
"#someid" would match on any tag which has the id attribute set to "some…
.It
".someclass" would match on any tag which has the class attribute set to
"someclass".
.It
"main#someid" would match on the "main" tag which has the id attribute s…
"someid".
.It
"main.someclass" would match on the "main" tags which have the class
attribute set to "someclass".
.It
"ul li" would match on any "li" tag which also has a parent "ul" tag.
.It
"li@0" would match on any "li" tag which is also the first child element…
parent container.
Note that this differs from filtering on a collection of "li" elements.
.El
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
url='https://codemadness.org/sfeed.html'

curl -s "$url" | webdump -r -b "$url" | less

curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R

curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
.Ed
.Pp
To use
.Nm
as an HTML to text filter, for example in the mutt mail client, change in
~/.mailcap:
.Bd -literal
text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
.Ed
.Sh SEE ALSO
.Xr curl 1 ,
.Xr xmllint 1 ,
.Xr xmlstarlet 1 ,
.Xr ftp 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt [email protected]
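
The -s, -u and -x options described in the manual page can be tried out on a small local file. The following is a minimal sketch and is not taken from the webdump sources: page.html, page.txt and resources.tsv are hypothetical file names, the HTML content and the example.org base URL are made up for illustration, and the webdump calls are skipped when the tool is not installed.

```shell
#!/bin/sh
# Hypothetical sample page; the tags, ids and classes are made up.
cat > page.html <<'EOF'
<html><body>
<nav class="menu"><ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul></nav>
<main id="content">
<h1>Title</h1>
<p>Body text with a <a href="doc.html">link</a>.</p>
<p class="note">A side note.</p>
</main>
</body></html>
EOF

# Skip the webdump calls when the tool is not installed.
if command -v webdump >/dev/null 2>&1; then
    # Reader mode: show only content matching the "main" selector.
    webdump -s 'main' < page.html

    # Reader mode: hide the "nav" tags and anything with class "note".
    # Multiple selectors are separated by a comma.
    webdump -u 'nav,.note' < page.html

    # Make links absolute with -b, list link references at the bottom
    # with -l and write resources to file descriptor 3 with -x.
    # The shell redirection 3> opens that descriptor on resources.tsv.
    webdump -l -x -b 'https://example.org/' < page.html > page.txt 3> resources.tsv
fi
```

The field layout of the TAB-separated lines written by -x is not described in the manual page, so it is worth inspecting resources.tsv before parsing it.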