2018-03-25
___G_o_p_h_e_r__c_r_a_w_l_e_r_s__a_n_d__r_o_b_o_t_s_._t_x_t__________

Fellow gopher poster kichimi[0] has started a little online
discussion about open gopher proxies hitting his server hard with
invalid requests.  If I understood the situation correctly, this
happens a lot when a google crawler hits these proxy sites and
generates invalid requests for each selector (favicon, index.html,
...).

I agree with him that these broken proxies are a bad idea. If you
are the admin of an open gopher proxy, I would suggest you add a line
to your robots.txt file to prohibit crawling of the proxy as a whole.
I don't think that crawling as such is harmful if done within (rate)
limits. I have learned that even veronica obeys a robots.txt[1].
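
For illustration, a minimal robots.txt sketch for such a proxy might
look like the lines below; the /gopher/ prefix is only an assumption,
adjust it to wherever your proxy is actually mounted:

  # keep crawlers off the proxied gopher space
  User-agent: *
  Disallow: /gopher/

  # or, if the whole site is the proxy:
  # Disallow: /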

I am not sure I want to disallow crawling on this server. Maybe I
will add rate limiting in the packet filter if problems show up, but
anyone is free to read everything on here. Of course I don't have
such a nice huge vintage software[2] collection as kichimi has.
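
If it ever comes to that, something along these lines is what I have
in mind, assuming an OpenBSD-style pf; the table name and the
thresholds are just placeholders, not tuned values:

  # collect overly chatty clients in a table and block them
  table <gopherabuse> persist
  block in quick from <gopherabuse>
  pass in on egress proto tcp to port 70 keep state \
      (max-src-conn 10, max-src-conn-rate 15/60, \
       overload <gopherabuse> flush global)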

Also I think for gopher, menu crawling is the way to go. There's no
point in ruining a server's bandwidth by downloading all the binary
media files. After all, we build human scale software here[3], and
all the human metadata should already be either in the selector or
in the surrounding info entries.
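
To make concrete what I mean by menu crawling, here is a small
Python sketch of a polite indexer step that fetches a menu and only
follows menus (type 1) and text files (type 0), skipping everything
binary; the function names and limits are mine, not from any real
crawler:

  import socket

  def fetch(host, selector, port=70):
      # send one selector and read the whole reply
      with socket.create_connection((host, port), timeout=10) as s:
          s.sendall(selector.encode() + b"\r\n")
          chunks = []
          while True:
              chunk = s.recv(4096)
              if not chunk:
                  break
              chunks.append(chunk)
      return b"".join(chunks)

  def crawlable(menu_bytes):
      # yield (type, display, selector, host, port) for text and menu
      # items only; binaries (types 9, I, g, s, ...) are skipped
      for line in menu_bytes.decode(errors="replace").splitlines():
          if not line or line == ".":
              continue
          itemtype, rest = line[0], line[1:]
          if itemtype not in ("0", "1"):
              continue
          fields = rest.split("\t")
          if len(fields) >= 4:
              yield itemtype, fields[0], fields[1], fields[2], fields[3]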

___References________________________________________________________

[0]: gopher://kichimi.uk:70/1/
[1]: gopher://gopher.floodgap.com:70/0/v2/help/indexer
[2]: gopher://kichimi.uk:70/1/software
[3]: https://tilde.tinyserver.club/~jkriss/writing/human-scale