WEB-DL
I've become quite fond of my little ytb script for browsing
YouTube in ELinks or Less, made as a work-around for the YouTube
website itself no longer displaying without Javascript. I rather
doubt that anyone else uses it, though it's there in the scripts
section; there are plenty of other text-only YouTube browsers
around, but none of them were quite to my taste either.
The reason there are so many, and that I could find the time to
make my own, is all down to the youtube-dl project. It does all
the hard stuff of figuring out how to extract the information from
YouTube's API and, even more significantly I think, it gets
regularly updated to adapt to changes in YouTube itself. Then you
can just get it to spit out key info like the description, title,
and video length (either in plain text, or as JSON). So it's
pretty trivial to take that raw data and wrap it up in whatever
interface you like.
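As a rough illustration of how little is left for the wrapper to
do, here's a minimal sketch using youtube-dl's Python module (the
command-line flags --get-title, --get-duration, --get-description
and -j for a JSON dump give the same information); the URL below
is just a placeholder:

  import youtube_dl  # same package the youtube-dl command ships in

  url = "https://www.youtube.com/watch?v=EXAMPLE"  # placeholder

  # quiet=True stops youtube-dl printing progress noise
  with youtube_dl.YoutubeDL({"quiet": True}) as ydl:
      # download=False fetches only the metadata, not the video
      info = ydl.extract_info(url, download=False)

  print(info["title"])
  print(info["duration"])     # video length in seconds
  print(info["description"])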
But whenever I use it I just wish there were 100 other programs
like it for all the other sites that require Javascript or full
CSS support: shopping sites like eBay, blogs based on common
platforms such as WordPress, forums, news websites, weather, TV
Guide. What if they all had little tools that would just spit out
the parts I want, so that I could build them into my own
interface? That would be so much more usable than the
barely-navigable mess of broken Javascript and CSS that Dillo,
Lynx, etc. normally present. Little ebay-dl, wordpress-dl, etc.
programs that take a URL or (where appropriate) a search query,
plus command-line options such as --get-description and
--get-title, as well as the option of dumping all the separate
parts into a machine-readable format.
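Just to make that interface concrete, here's a rough sketch of
what a hypothetical wordpress-dl might look like, assuming the
target blog exposes the standard WordPress REST API under
/wp-json/wp/v2/ (the tool name, flags and behaviour here are
invented purely for illustration):

  #!/usr/bin/env python3
  # wordpress-dl: hypothetical sketch, not a real tool.  Assumes
  # the blog exposes the standard WordPress REST API at
  # <site>/wp-json/wp/v2/posts, which many WordPress sites do.
  import argparse, json, urllib.parse, urllib.request

  def fetch_posts(site, query=None):
      url = site.rstrip("/") + "/wp-json/wp/v2/posts"
      if query:
          url += "?search=" + urllib.parse.quote(query)
      with urllib.request.urlopen(url) as resp:
          return json.load(resp)

  def main():
      p = argparse.ArgumentParser()
      p.add_argument("site")             # e.g. https://example.com
      p.add_argument("--search")         # optional search query
      p.add_argument("--get-title", action="store_true")
      p.add_argument("--get-description", action="store_true")
      p.add_argument("--dump-json", action="store_true")
      args = p.parse_args()

      for post in fetch_posts(args.site, args.search):
          if args.dump_json:
              print(json.dumps(post))
          if args.get_title:
              print(post["title"]["rendered"])
          if args.get_description:
              # "rendered" fields are HTML fragments; a real tool
              # would strip the tags for plain-text output
              print(post["excerpt"]["rendered"])

  if __name__ == "__main__":
      main()

Then something like "wordpress-dl https://example.com --search
elinks --get-title" would print the matching post titles, ready to
be glued into whatever front-end you like.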
There are some things like this besides youtube-dl. There's the
XMLTV project (http://wiki.xmltv.org/index.php/Main_Page), various
weather data grabbers for things like the wttr.in service, and
even the units_cur script that grabs the currency data for my
Gopher currency converter. The trouble is that they're very hard
to find, usually poorly documented (the format that units_cur
spits out isn't documented at all), and rarely have much in common
as far as interface or output formatting goes. Some sites like the
Internet Archive have APIs that might be used, but one has to
spend the time figuring them out, and they could always change on
a whim.
Youtube-dl is great in that it offers both plain-text output of
the requested data and the option of dumping it all into a
machine-readable file, so lazy people like me don't even have to
worry about parsing the JSON output if they don't want to. One
improvement would be to cache the downloaded page(s), so that you
could make an independent program call for each bit of info
instead of having to split the plain-text output in a dodgy way
based on its content, like I did with ytb detecting the
video-length line. It also attracts a lot of developers (somehow,
continuously revising code to adapt to sudden changes in website
structure doesn't appeal to me personally in the slightest),
though projects like XMLTV don't seem to be so lucky.
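That caching idea can actually be faked on top of the existing -j
option already: dump the JSON once, keep it in a file, and let
every later call just read one field out of it. A rough sketch
(the script name and cache location are invented):

  #!/usr/bin/env python3
  # ytinfo: sketch of caching youtube-dl's JSON dump so that calls
  # like "ytinfo URL title" and "ytinfo URL duration" can be made
  # independently without hitting YouTube again each time.
  import hashlib, json, os, subprocess, sys

  CACHE_DIR = os.path.expanduser("~/.cache/ytinfo")  # invented path

  def info_for(url):
      os.makedirs(CACHE_DIR, exist_ok=True)
      name = hashlib.sha1(url.encode()).hexdigest() + ".json"
      path = os.path.join(CACHE_DIR, name)
      if not os.path.exists(path):
          # -j makes youtube-dl print the metadata as one JSON object
          with open(path, "wb") as f:
              f.write(subprocess.check_output(["youtube-dl", "-j", url]))
      with open(path) as f:
          return json.load(f)

  url, field = sys.argv[1], sys.argv[2]
  print(info_for(url)[field])   # e.g. title, duration, description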
So my proposal would be a common web-dl project uniting a whole
lot of individual tools for downloading/extracting data from
websites. The tools would all try to work in a similar way and
provide similar sorts of output, and people would also be able to
share programs built on top of these "downloaders", such as
terminal-based site-specific browsers and plug-ins for Dillo. When
a website changes, only the -dl tool needs to be adapted to keep
spitting out the same info as before to the program running above
it.
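Purely to illustrate what "working in a similar way" could mean,
the shared convention might be as small as a common base that each
site-specific downloader fills in; this is only a hypothetical
sketch, not something from any existing project:

  import json

  class SiteDownloader:
      # Each *-dl tool would subclass this and implement fetch(),
      # returning a plain dict of extracted fields for a given URL
      # or search query.
      def fetch(self, target):
          raise NotImplementedError

      def get(self, target, field):
          # Shared plain-text behaviour: one field per call (what
          # flags like --get-title would map to).
          return str(self.fetch(target).get(field, ""))

      def dump_json(self, target):
          # Shared machine-readable behaviour (the -j equivalent).
          return json.dumps(self.fetch(target))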
Granted, it could all be avoided if commercial/trendy websites
weren't such a mess to begin with and could actually be navigated
conveniently with lightweight and terminal-based web browsers. But
there's no way of turning back the tide on that at this point,
unfortunately.
- The Free Thinker, 2021
2022-10-15:
I discovered the strangely-named Woob project, which seems to share
the same goal. Annoyingly, the developers have chosen Python and Qt
as the common base, and I dislike both of those. It also seems to
cover remarkably few of the functions or associated websites that
I personally use, and as it's all Python I'm not interested in
contributing. Still, it's better than nothing, and I'll keep it
in mind for next time I want to view an image from Imgur or track
a parcel sent with DHL.
https://woob.tech/