git clone git://git.codemadness.org/extractjson
extractjson
-----------

Extracts embedded JSON metadata from HTML pages, such as data in the tag…
	<script type="application/ld+json">

It reads HTML from stdin and writes each JSON fragment it finds to stdout, one per line.

Example:

	curl -s https://www.imdb.com/title/tt0107048/ | extractjson | se…

This extracts the JSON metadata from the IMDb page of the movie "Ground …
The command keeps only the first embedded JSON fragment and passes it to json2tsv.
The TSV output can then be processed further with awk to get the relevant data.

It can also be useful for extracting video streams from webpages.