# Web Scraping Thing
I'm pretty into the folk music of England, Ireland, Scotland,
and Wales. Especially some of the better bits of the '60s
folk revival (Anne Briggs, Planxty, the occasional Pentangle
track--though they can get a little too fusiony for me).
I found out, just this week, about an old blog/podcast called
"A Folk Song a Day", in which a guy records a song each day
for a year and posts it, along with a little bit of text. He
tends toward a more traditional style and, I think, does a
good job of presenting the material. The website is very
reminiscent of an old "blogspot" or "blogger" website. It
does not have any ability to see what song was done on a given
day of the year (this was all done back in the 20-teens). So
you can look at the song list (which is in alphabetical order)
or just go through entries--five to a page. I was not quite
satisfied with this approach. Nor did I want to see all of
the "design", images, and comments on the pages.
So I decided that I would scrape it. I started by loading the
alphabetical index page, viewing source, and copying out the
list of links. A quick find and replace and I had a
comma-separated list of double-quoted (string) URLs. I put this
into an array literal (`[365]string`) in a file to be compiled
by Go. A little research let me write four quick regexes:
one for the song title, one for the month/day, one for the
`src` attribute to the `source` element inside the `audio`
element (basically, the link to the audio file), and one for
all of the text that accompanied the post--but without the
comments, navs, etc.
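
As a rough illustration of the four regexes described above, here
is a minimal sketch. The post does not show the site's markup, so
every tag and class name in these patterns is an assumption, as is
the helper name `firstGroup`. Only the `audio`/`source`/`src`
relationship comes from the description above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns: the class names and the date format are
// assumptions, not the real markup of the scraped site.
var (
	titleRe = regexp.MustCompile(`(?s)<h2 class="entry-title">(.*?)</h2>`)
	dateRe  = regexp.MustCompile(`<span class="date">(\w+ \d{1,2})</span>`)
	// The src attribute of a source element inside an audio element.
	audioRe = regexp.MustCompile(`(?s)<audio[^>]*>.*?<source[^>]*\bsrc="([^"]+)"`)
	textRe  = regexp.MustCompile(`(?s)<div class="entry-content">(.*?)</div>`)
)

// firstGroup returns the first capture group of re in html,
// or the empty string when there is no match.
func firstGroup(re *regexp.Regexp, html string) string {
	m := re.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// A made-up page fragment, just to exercise the patterns.
	sample := `<h2 class="entry-title">Example Song</h2>
<span class="date">June 21</span>
<audio controls><source src="http://example.com/song.mp3" type="audio/mpeg"></audio>
<div class="entry-content">Some notes about the song.</div>`

	fmt.Println(firstGroup(titleRe, sample))
	fmt.Println(firstGroup(dateRe, sample))
	fmt.Println(firstGroup(audioRe, sample))
	fmt.Println(firstGroup(textRe, sample))
}
```

Regexes are fragile against arbitrary HTML, but for a fixed set of
pages generated from one template they tend to be good enough.
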
I then set up a struct that would take those items and gave
it the type name `Page`, then created a `[365]Page` array called
`Pages`. I used goroutines and channels to grab each page,
use the regexes to parse things out, add the items to a `Page`,
and add the page to the `Pages` array.
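
The fan-out described above could look roughly like the sketch
below. The field names on `Page`, the `result` type, and the
`scrapeAll` and `fetch` names are assumptions for illustration. It
also uses a slice where the post uses a `[365]Page` array, and a
fake fetcher stands in for the real "HTTP GET plus regexes" step
so that it runs without touching the network.

```go
package main

import (
	"fmt"
	"sync"
)

// Page holds the four scraped items. Field names are assumptions.
type Page struct {
	Title string
	Date  string
	Audio string
	Text  string
}

// result pairs a parsed page with its position in the URL list.
type result struct {
	index int
	page  Page
}

// scrapeAll runs fetch for every URL in its own goroutine and
// collects the pages over a channel, keeping the URL order.
// A URL whose fetch fails is left as a zero-value Page.
func scrapeAll(urls []string, fetch func(url string) (Page, error)) []Page {
	pages := make([]Page, len(urls))
	results := make(chan result)
	var wg sync.WaitGroup

	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			p, err := fetch(u)
			if err != nil {
				return
			}
			results <- result{index: i, page: p}
		}(i, u)
	}

	// Close the channel once every worker is done so the
	// receive loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		pages[r.index] = r.page
	}
	return pages
}

func main() {
	urls := []string{"http://example.com/a", "http://example.com/b"}
	// Fake fetcher: a real one would download and parse the page.
	fake := func(u string) (Page, error) {
		return Page{Title: "song at " + u}, nil
	}
	for i, p := range scrapeAll(urls, fake) {
		fmt.Println(i, p.Title)
	}
}
```

Only the receiving loop writes to `pages`, so the slice needs no
lock of its own.
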
Once that finished I dumped it out to a JSON file and the
program ended. All in all it took twenty or thirty minutes.
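
Dumping the result takes only a few lines with the standard
library. A minimal sketch, assuming lowercase JSON keys and a
`pages.json` file name (neither is given in the post, and
`encodePages` is a made-up helper name):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Page mirrors the four scraped items. The JSON key names are
// assumptions.
type Page struct {
	Title string `json:"title"`
	Date  string `json:"date"`
	Audio string `json:"audio"`
	Text  string `json:"text"`
}

// encodePages renders the pages as indented JSON.
func encodePages(pages []Page) ([]byte, error) {
	return json.MarshalIndent(pages, "", "  ")
}

func main() {
	pages := []Page{
		{Title: "Example Song", Date: "June 21",
			Audio: "http://example.com/song.mp3", Text: "Notes."},
	}
	data, err := encodePages(pages)
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode:", err)
		os.Exit(1)
	}
	// Hypothetical output path.
	if err := os.WriteFile("pages.json", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write:", err)
		os.Exit(1)
	}
}
```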
Next up was to move the JSON file to a new folder and add an
`index.php` file. In that file I have it check the current day
and month (according to the server it is running on). I then
search the JSON file (after loading it as an associative array
in PHP) for a "Page" with the given date string. I then render
a template with just those four items (plus an h1 and some info
text linking back to the original project). I added a teensy
bit of CSS to make it look nice enough (and uncluttered) and
voila! I can now go to a page on my website and get a song each
day. They still host the audio, I just stream it from them.
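
The real lookup lives in `index.php`, which is not shown here. To
stay in one language, here is the same idea sketched in Go: load
the JSON, build a date string for today, and scan for a match. The
"June 21" date format, the JSON keys, and the `dateKey` and
`findByDate` names are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Page mirrors one entry of the JSON file. Key names are assumptions.
type Page struct {
	Title string `json:"title"`
	Date  string `json:"date"`
	Audio string `json:"audio"`
	Text  string `json:"text"`
}

// dateKey formats t as month name plus day, e.g. "June 21".
// The stored date format is an assumption.
func dateKey(t time.Time) string {
	return t.Format("January 2")
}

// findByDate returns the first page whose Date equals key.
func findByDate(pages []Page, key string) (Page, bool) {
	for _, p := range pages {
		if p.Date == key {
			return p, true
		}
	}
	return Page{}, false
}

func main() {
	// Inline sample standing in for the contents of the JSON file.
	raw := []byte(`[{"title":"Example Song","date":"June 21",` +
		`"audio":"http://example.com/song.mp3","text":"Notes."}]`)

	var pages []Page
	if err := json.Unmarshal(raw, &pages); err != nil {
		fmt.Println("decode:", err)
		return
	}

	// time.Now uses the clock of the machine running the program,
	// matching "according to the server it is running on".
	key := dateKey(time.Now())
	if p, ok := findByDate(pages, key); ok {
		fmt.Println(p.Title, p.Audio)
	} else {
		fmt.Println("no entry for", key)
	}
}
```

A list of 365 entries cannot cover every calendar date in a leap
year, so the not-found branch is worth keeping.
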
What a nice and easy/fun project. It is the type of thing I
would have done in Python with bs4 at some point... but I have
not used Python in so long and I can write Go code in my sleep,
so that is where I landed. Plus, that way I only needed the
standard library and did not have to deal with much else.
If anyone is interested, here is the original website:
http://www.afolksongaday.com/
Here is the page on my website that shows the song for the
current month/day:
https://sloum.colorfield.space/etc/fsad/
If it is up your alley, definitely try and support the original,
but I admit a strong preference--personally--for my presentation
method.