extractjson.1 - extractjson - extract embedded JSON metadata from HTML pages
git clone git://git.codemadness.org/extractjson
---
extractjson.1 (862B)
---
.Dd August 14, 2022
.Dt EXTRACTJSON 1
.Os
.Sh NAME
.Nm extractjson
.Nd extracts embedded JSON metadata from HTML pages
.Sh SYNOPSIS
.Nm
.Sh DESCRIPTION
.Nm
extracts embedded JSON metadata from HTML pages, such as data in the tag…
<script type="application/ld+json">
.Pp
It reads HTML from stdin and writes each JSON fragment on its own line to stdout.
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
curl -s https://www.imdb.com/title/tt0107048/ | extractjson | sed 1q | j…
.Ed
.Pp
This extracts the JSON metadata from the IMDB page of the movie "Ground …
It uses the first embedded JSON fragment and pipes it to json2tsv.
It can then be further processed using awk to get the relevant data.
.Pp
It can also be useful for extracting video streams from webpages.
.Sh SEE ALSO
.Xr curl 1 ,
.Xr json2tsv 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt [email protected]
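
For readers who want to see the idea behind the DESCRIPTION section, the following is a minimal Python sketch of the same technique: collect the contents of each `<script type="application/ld+json">` element and emit one JSON value per line. It is not the extractjson implementation and may differ from it in details such as error handling and which script types are matched. The names `LdJsonExtractor`, `extract_ld_json`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collect the text of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._buf = None  # None while outside a matching script element

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.fragments.append("".join(self._buf))
            self._buf = None


def extract_ld_json(html):
    """Return each embedded JSON fragment re-serialized as a single line."""
    parser = LdJsonExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for raw in parser.fragments:
        try:
            value = json.loads(raw)
        except ValueError:
            continue  # assumption: silently skip fragments that are not valid JSON
        lines.append(json.dumps(value, separators=(",", ":")))
    return lines


# Hypothetical sample page, not taken from any real site.
SAMPLE = (
    '<html><head><script type="application/ld+json">\n'
    '{"@type": "Movie", "name": "Example"}\n'
    "</script><script>var x = 1;</script></head></html>"
)

for line in extract_ld_json(SAMPLE):
    print(line)
```

Re-serializing with `json.dumps` guarantees that each fragment occupies exactly one output line, which is what makes a downstream `sed 1q` select the first fragment.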