extractjson.1 - extractjson - extract embedded JSON metadata from HTML pages
git clone git://git.codemadness.org/extractjson
---
extractjson.1 (862B)
---
.Dd August 14, 2022
.Dt EXTRACTJSON 1
.Os
.Sh NAME
.Nm extractjson
.Nd extracts embedded JSON metadata from HTML pages
.Sh SYNOPSIS
.Nm
.Sh DESCRIPTION
.Nm
extracts embedded JSON metadata from HTML pages, such as data in the tag…
<script type="application/ld+json">
.Pp
It reads HTML from stdin and writes each embedded JSON fragment on its own line to stdout.
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
curl -s https://www.imdb.com/title/tt0107048/ | extractjson | sed 1q | j…
.Ed
.Pp
This extracts the JSON metadata from the IMDB page of the movie "Ground …
The sed 1q step keeps only the first embedded JSON fragment, which is then piped to json2tsv.
The resulting output can be processed further with awk to get the relevant data.
.Pp
.Nm
can also be useful for extracting video streams from webpages.
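.Pp
As a minimal sketch of the awk post-processing step mentioned in the example, the following assumes that json2tsv writes one line per JSON node with three tab-separated fields: node name, type and value.
The sample lines and the node name ".name" are hypothetical stand-ins for real output, not data taken from any page.

```shell
# Hypothetical sample of json2tsv output: node name, type and value,
# separated by tabs. The values are made up for illustration.
sample() {
	printf '.name\ts\tExample Title\n'
	printf '.datePublished\ts\t1999-01-01\n'
	printf '.aggregateRating.ratingValue\tn\t7.5\n'
}

# Print the value (third field) of the node named ".name".
title=$(sample | awk -F '\t' '$1 == ".name" { print $3 }')
echo "$title"
```

In a real pipeline the sample function would be replaced by the output of the extractjson and json2tsv commands, and the node name would depend on the structure of the embedded JSON.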
.Sh SEE ALSO
.Xr curl 1 ,
.Xr json2tsv 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt [email protected]