Post ApmcNth3oMX6Rn7LLU by [email protected]
Post #Aplkjd7Mj0uZsuKr5M by [email protected]
0 likes, 0 repeats
So someone (likely a researcher) is scraping my parliamentary site at high spee…
Post #AploONN2TeSymTmsFs by [email protected]
0 likes, 0 repeats
@dch @bert_hubert Very likely. The other day I looked into the httpd logs of my…
Post #AplxuhqykzUlA5HaHA by [email protected]
0 likes, 1 repeats
@bert_hubert probably scraping for AI / LLM training data.
Post #Apm0Yd9ppqFs5bGe92 by [email protected]
0 likes, 0 repeats
I have now redirected all scraping traffic from their IP address to an HTML pag…
Post #Apm0YdHzLWmEUt59qy by [email protected]
0 likes, 0 repeats
After 84518 copies of 'please-contact-me.html', they appear to have given up.…
Post #Apm0YdPQtqjQryZ6SO by [email protected]
0 likes, 0 repeats
@bert_hubert which User-Agent was it?
Post #Apm0YdXwODXNIMXtia by [email protected]
0 likes, 0 repeats
@winfried "Scrapy/2.11.2 (+https://scrapy.org)" - and not once did they que…
Post #Apm0YdiZkg2npLWOIK by [email protected]
0 likes, 0 repeats
@winfried This is because out of the box, scrapy does not check robots.txt... h…
Post #Apm0Yds9B5hUJ202DI by [email protected]
0 likes, 0 repeats
@bert_hubert @winfried oh no :(
Post #Apm0YdzajPegg7Tyoi by [email protected]
0 likes, 0 repeats
@Green @winfried my robots.txt is fine with this access btw, but not checking i…
Post #Apm0Ye5yLgl8zuT4lM by [email protected]
0 likes, 0 repeats
@bert_hubert @winfried Yeah, the decision to have this setting disabled by defa…
Post #Apm0YeDPu0iLMzx1Mm by [email protected]
0 likes, 1 repeats
@winfried @bert_hubert @Green it's actually quite logical as there is no sens…
Post #ApmcNth3oMX6Rn7LLU by [email protected]
0 likes, 0 repeats
@XEJKnol @bert_hubert I would say that depends on the underlying goal. As scrap…
Post #ApmcNtqdEmBmvTazGS by [email protected]
0 likes, 1 repeats
@bert_hubert @Green that's also true, perhaps I am looking at this a bit too …
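
The thread turns on whether a crawler checks robots.txt before fetching pages. In Scrapy that behaviour is controlled by the `ROBOTSTXT_OBEY` setting. As a rough illustration of what such a check involves, here is a minimal sketch using only Python's standard library `urllib.robotparser`. The robots.txt content, the `ExampleBot/0.1` user agent, the example.org URLs and the helper names `build_parser` and `allowed` are all hypothetical, and this is not how the scraper in the thread or Scrapy itself implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent string for a polite crawler.
USER_AGENT = "ExampleBot/0.1"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# Check each URL before requesting it, and honour any crawl delay.
for url in ("https://example.org/documents/1",
            "https://example.org/private/x"):
    verdict = "fetch" if allowed(parser, USER_AGENT, url) else "skip"
    print(verdict, url)

print("crawl delay (seconds):", parser.crawl_delay(USER_AGENT))
```

A real crawler would download the site's actual /robots.txt once (for example with `RobotFileParser.set_url` and `read`), cache the result, and wait for the advertised crawl delay between requests.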
You are viewing proxied material from pleroma.anduin.net. The copyright of proxied material belongs to its original authors. Any comments or complaints in relation to proxied material should be directed to the original authors of the content concerned. Please see the disclaimer for more details.