Title: Port of the week: pup
Author: Solène
Date: 22 April 2021
Tags: internet
Description:

# Introduction

Today I will introduce you to the utility "pup", which filters HTML
documents using CSS selectors. It is a perfect companion to curl for
fetching only specific data from an HTML page.

On OpenBSD you can install it with `pkg_add pup` and check its
documentation at /usr/local/share/doc/pup/README.md

pup official project

# Examples

pup is quite easy to use once you understand the filters. Let's see a
few examples to illustrate practical uses.

## Fetch the list of my blog titles as JSON

The following command returns a JSON array describing every "a" tag
found within an "h4" tag.

```command line example
curl https://dataswamp.org/~solene/index.html | pup "h4 a json{}"
```

The output (only an extract here) looks like this:

```output truncated
[
  {
    "href": "2021-04-18-ipfs-bandwidth-mgmt.html",
    "tag": "a",
    "text": "Bandwidth management in go-IPFS"
  },
  {
    "href": "2021-04-17-ipfs-openbsd.html",
    "tag": "a",
    "text": "Introduction to IPFS"
  },
  [truncated]
  {
    "href": "2016-05-02-3.html",
    "tag": "a",
    "text": "How to add a route through a specific interface on FreeBSD 10"
  }
]
```

## Fetch OpenBSD -current specific changes

The page https://www.openbsd.org/faq/current.html contains specific
instructions that are required for people using OpenBSD -current, and
you may want to be notified of changes. With pup, it's easy to write a
script that compares the current output with the last one you saved to
see what has been appended.

```command line
curl https://www.openbsd.org/faq/current.html | pup "h3 json{}"
```

Here is a sample of the JSON output, which is well suited to further
processing with a scripting language.

```JSON output sample
[
  {
    "id": "r20201107",
    "tag": "h3",
    "text": "2020/11/07 - iked.conf \u0026#34;to dynamic\u0026#34;"
  },
  {
    "id": "r20210312",
    "tag": "h3",
    "text": "2021/03/12 - IPv6 privacy addresses renamed to temporary addresses"
  },
  {
    "id": "r20210329",
    "tag": "h3",
    "text": "2021/03/29 - [packages] yubiserve replaced with yubikeyedup"
  }
]
```
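
The comparison against a saved copy mentioned earlier could be sketched as follows. This is only a minimal illustration, not a finished script: it assumes the headings are saved one per line (for instance with pup's `text{}` display function instead of `json{}`), and the function name `new_entries` and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lines found in the new snapshot ($2)
# that are absent from the old snapshot ($1).
new_entries() {
    # -F: treat patterns as fixed strings, -x: match whole lines only,
    # -v: keep the lines that do NOT match, -f: read patterns from the old file
    grep -Fxv -f "$1" "$2"
}

# Possible use (assumed file names, shown as comments so nothing is fetched here):
#   touch current.old
#   curl -s https://www.openbsd.org/faq/current.html | pup 'h3 text{}' > current.new
#   new_entries current.old current.new
#   mv current.new current.old
```

Matching whole lines as fixed strings avoids surprises when a heading contains characters such as brackets or dots, which would otherwise be read as regular expression syntax.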

I provide an RSS feed that tracks these changes

# Conclusion

There are many possibilities with pup and I won't list them all. I
highly recommend reading the README.md file from the project, because
it is the tool's documentation and explains the filtering syntax.