Title: Port of the week: pup
Author: Solène
Date: 22 April 2021
Tags: internet
Description:
# Introduction
Today I will introduce you to the utility "pup", which filters HTML
documents using CSS selectors. It is a perfect companion to curl for
extracting only specific data from an HTML page.
On OpenBSD you can install it with `pkg_add pup` and read its
documentation at /usr/local/share/doc/pup/README.md.
pup official project
# Examples
pup is quite easy to use once you understand the filters. Let's see a
few examples of practical uses.
## Fetch my blog's list of titles as JSON
The following command returns a JSON array with the data of every "a"
tag found within an "h4" tag.
```command line example
curl https://dataswamp.org/~solene/index.html | pup "h4 a json{}"
```
The output (only an extract here) looks like this:
```output truncated
[
{
"href": "2021-04-18-ipfs-bandwidth-mgmt.html",
"tag": "a",
"text": "Bandwidth management in go-IPFS"
},
{
"href": "2021-04-17-ipfs-openbsd.html",
"tag": "a",
"text": "Introduction to IPFS"
},
[truncated]
{
"href": "2016-05-02-3.html",
"tag": "a",
"text": "How to add a route through a specific interface on FreeBSD 10"
}
]
```
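If you only need the links, pup's `attr{href}` display function (described in its README) prints one attribute value per line instead of a JSON structure. The sketch below assumes, as in the output above, that every href is relative to the blog's base URL; `BASE`, `absolutize` and `list_posts` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: list absolute URLs of the blog posts, one per line.
# Assumption: every href is relative to BASE, as in the JSON sample above.
BASE="https://dataswamp.org/~solene/"

# absolutize: prefix each line read on stdin with BASE
absolutize() {
    sed "s|^|$BASE|"
}

# list_posts: fetch the index and print the full URL of each "h4 a" link
list_posts() {
    curl -s "${BASE}index.html" | pup 'h4 a attr{href}' | absolutize
}
```

Calling `list_posts` would print lines such as https://dataswamp.org/~solene/2021-04-18-ipfs-bandwidth-mgmt.html, which are easy to feed to other command line tools.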
## Fetch OpenBSD -current specific changes
The page https://www.openbsd.org/faq/current.html contains instructions
required for people running OpenBSD -current, and you may want to be
notified when it changes. With pup, it's easy to write a script that
compares the latest data with the previous run to see what has been appended.
```command line
curl https://www.openbsd.org/faq/current.html | pup "h3 json{}"
```
Output sample as JSON, perfect for further processing with a scripting
language.
```JSON output sample
[
{
"id": "r20201107",
"tag": "h3",
"text": "2020/11/07 - iked.conf \u0026#34;to dynamic\u0026#34;"
},
{
"id": "r20210312",
"tag": "h3",
"text": "2021/03/12 - IPv6 privacy addresses renamed to temporary addresses"
},
{
"id": "r20210329",
"tag": "h3",
"text": "2021/03/29 - [packages] yubiserve replaced with yubikeyedup"
}
]
```
I provide an RSS feed for that.
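If you would rather run the comparison yourself, here is a minimal shell sketch of the approach described earlier. It uses pup's `attr{id}` display function to print only the `id` of each "h3" tag, keeps the sorted list from the previous run in a state file, and prints the ids that are new. The state file path and the function names `new_entries` and `check_current` are hypothetical choices, not something provided by pup or the OpenBSD website.

```shell
#!/bin/sh
# Minimal sketch: print the ids of h3 entries of the OpenBSD -current FAQ
# that were not present during the previous run.
# Assumption: the STATE_FILE location is a hypothetical choice.
URL="https://www.openbsd.org/faq/current.html"
STATE_FILE="${STATE_FILE:-$HOME/.cache/openbsd-current.ids}"

# new_entries OLD NEW: print lines found in sorted file NEW but not in sorted file OLD
new_entries() {
    comm -13 "$1" "$2"
}

# check_current: fetch the page, print new ids, then update the state file
check_current() {
    mkdir -p "$(dirname "$STATE_FILE")"
    [ -f "$STATE_FILE" ] || : > "$STATE_FILE"
    tmp="$(mktemp)" || return 1
    # attr{id} prints one id value (e.g. r20210329) per line
    curl -s "$URL" | pup 'h3 attr{id}' | sort > "$tmp"
    # keep the old state if the download or filtering produced nothing
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        return 1
    fi
    new_entries "$STATE_FILE" "$tmp"
    mv "$tmp" "$STATE_FILE"
}
```

Adding a `check_current` line at the end turns this into a standalone script; when run from cron, any output (new ids such as `r20210329`) would be mailed to you. On the first run every id is reported, since the state file starts empty. Comparing ids rather than full titles is a design choice: an id should stay stable even if the title text is later reworded.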
# Conclusion
There are many possibilities with pup and I won't list them all. I
highly recommend reading the project's README.md file, as it serves as
the documentation and explains the filtering syntax.