webdump.1 - webdump - HTML to plain-text converter for webpages
git clone git://git.codemadness.org/webdump | |
.Dd October 6, 2023
.Dt WEBDUMP 1
.Os
.Sh NAME
.Nm webdump
.Nd convert HTML to plain-text
.Sh SYNOPSIS
.Nm
.Op Fl 8adiIlrx
.Op Fl b Ar baseurl
.Op Fl s Ar selector
.Op Fl u Ar selector
.Op Fl w Ar termwidth
.Sh DESCRIPTION
.Nm
reads UTF-8 HTML data from stdin.
It converts the data to plain text and writes the result to stdout.
A
.Ar baseurl
can be specified if the links in the HTML data are relative URLs.
This must be an absolute URI.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl 8
Use UTF-8 symbols for certain items, such as bullet items and rulers, to make…
output fancier.
.It Fl a
Toggle the use of ANSI escape codes.
By default it is not enabled.
.It Fl b Ar baseurl
Base URL of links.
This is used to make relative links absolute.
The specified URL is always preferred over the value in a <base/> tag.
.It Fl d
Deduplicate link references.
When a duplicate link reference is found, reuse the same link reference n…
.It Fl i
Toggle whether link reference numbers are displayed inline, by default…
not enabled.
.It Fl I
Toggle whether URLs for link references are displayed inline, by defaul…
not enabled.
.It Fl l
Toggle whether link references are displayed at the bottom, by default…
not enabled.
.It Fl r
Toggle line-wrapping mode.
By default it is not enabled.
.It Fl s Ar selector
CSS-like selectors.
This sets a reader mode to show only content matchin…
selector; see the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl u Ar selector
CSS-like selectors.
This sets a reader mode to hide content matching the
selector; see the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl w Ar termwidth
The terminal width.
The default is 77 characters.
.It Fl x
Write resources as TAB-separated lines to file descriptor 3.
.El
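.Pp
For illustration only, the lines written with
.Fl x
can be captured with an ordinary shell redirection of file descriptor 3, while the plain-text rendering still goes to stdout.
The sketch below is a minimal POSIX sh example; the function name save_resources is hypothetical, and because the layout of the TAB-separated fields is not described here, it stores the lines unmodified:

```sh
# Hypothetical helper, not part of webdump: save_resources URL OUTFILE
# Fetches URL with curl, prints the plain-text rendering on stdout and
# redirects file descriptor 3 of webdump, where -x writes the
# TAB-separated resource lines, to OUTFILE.
save_resources() {
	curl -s "$1" | webdump -x -b "$1" 3>"$2"
}

# Usage (assumes curl and webdump are installed):
# save_resources 'https://codemadness.org/sfeed.html' resources.tsv | less
```

Putting the descriptor 3 redirection on the webdump command itself keeps the resource lines out of the text stream, so the two outputs can be consumed independently.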
.Sh SELECTOR SYNTAX
The syntax is inspired by CSS, but it is more limited.
Some examples:
.Bl -item
.It
"main" would match any "main" tag.
.It
"#someid" would match any tag which has the id attribute set to "some…
.It
".someclass" would match any tag which has the class attribute set to
"someclass".
.It
"main#someid" would match the "main" tag which has the id attribute s…
"someid".
.It
"main.someclass" would match any "main" tag which has the class
attribute set to "someclass".
.It
"ul li" would match any "li" tag which also has a parent "ul" tag.
.It
"li@0" would match any "li" tag which is also the first child element…
parent container.
Note that this differs from filtering on a collection of "li" elements.
.El
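.Pp
Selectors are passed as a single argument, so they should be quoted in the shell: an unquoted word starting with
.Sq #
begins a comment, and a selector such as "ul li" contains a space.
A minimal POSIX sh sketch of a hypothetical wrapper named readable, which shows only content matching "main" or the (hypothetical) class "content", wraps lines at 72 columns and makes relative links absolute:

```sh
# Hypothetical wrapper, not part of webdump: readable URL
# -s takes comma-separated selectors; the quotes keep the shell from
# splitting or interpreting them. -r toggles line-wrapping on (it is
# off by default), -w sets the width and -b makes links absolute.
readable() {
	curl -s "$1" | webdump -s 'main,.content' -r -w 72 -b "$1"
}

# Usage (assumes curl and webdump are installed):
# readable 'https://codemadness.org/sfeed.html' | less
```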
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
url='https://codemadness.org/sfeed.html'

curl -s "$url" | webdump -r -b "$url" | less

curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R

curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
.Ed
.Pp
To use
.Nm
as an HTML-to-text filter, for example in the mutt mail client, add the
following line to
.Pa ~/.mailcap :
.Bd -literal
text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
.Ed
.Sh SEE ALSO
.Xr curl 1 ,
.Xr xmllint 1 ,
.Xr xmlstarlet 1 ,
.Xr ftp 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt [email protected]