Post AwWp8DkmP2PVzsdZFA by [email protected] | |
Post #AwWc3adEcEMC6JZNHk by [email protected] | |
0 likes, 0 repeats | |
Tried to use /robots.txt to tell bots to stay out. The bots' response: "… | |
Post #AwWcJDCTV9iM3vVBbc by [email protected] | |
0 likes, 0 repeats | |
@nixCraft Fascists don't follow rules either. I'm wondering whether ther… | |
Post #AwWcT7qQGiUsLcul7I by [email protected] | |
0 likes, 0 repeats | |
@nixCraft Maybe feed them a zip bomb, if they go for a Disallowed file? https://… | |
Post #AwWcruItkXwHak3Bke by [email protected] | |
0 likes, 0 repeats | |
@nixCraft if you have a crawler honey trap, including it in robots.txt would en… | |
Post #AwWe47ffyYzcTl5qXA by [email protected] | |
0 likes, 1 repeats | |
@nixCraft If this would work, Stack Overflow would be dead by tomorrow. I wouldn… | |
Post #AwWf2EZVRtZlFts2r2 by [email protected] | |
0 likes, 0 repeats | |
@nixCraft I wonder when they started doing it. Looks like it's a recent thi… | |
Post #AwWghvN3KMc5p0UNVo by [email protected] | |
0 likes, 0 repeats | |
@nixCraft @Matti_Vuori It has always been useless. So, nothing has changed. | |
Post #AwWgj6Ji7EJc3mUEwC by [email protected] | |
0 likes, 0 repeats | |
@nixCraft “The solution to surveillance is pollution”. it’s the uniquene… | |
Post #AwWhGb7oanuvALtZFw by [email protected] | |
0 likes, 0 repeats | |
@nixCraft robots.txt is like asking a bully not to bully you 🙃 | |
Post #AwWijGqVAYYuch27nc by [email protected] | |
0 likes, 0 repeats | |
@nixCraft I had to use Anubis and IPFire to block LLM scrapers. Works for the m… | |
Post #AwWjVfMTK8gOYb418S by [email protected] | |
0 likes, 0 repeats | |
@oe_simon 404 for the better! ☝️🏾😁 @nixCraft | |
Post #AwWnSpigks1ndAMO3M by [email protected] | |
0 likes, 1 repeats | |
@nixCraft Man, if only we could have a far more accurate version of the CAPTCHA… | |
Post #AwWnq9bwxvpFz0KQIy by [email protected] | |
0 likes, 1 repeats | |
@heartshadows @m @nixCraft We can beat 'em at their own game, host the site insi… | |
Post #AwWoFBKvggNkZeIkFM by [email protected] | |
0 likes, 0 repeats | |
@nixCraft put a trap in place like disallow /list_of_politicians_that_received_… | |
Post #AwWp8DkmP2PVzsdZFA by [email protected] | |
0 likes, 0 repeats | |
@nixCraft Obeying your rules is optional? Well, guess what? | |
Post #AwWpaN3DQf0QwNgQkK by [email protected] | |
0 likes, 0 repeats | |
@adipoeserPursch @nixCraft probably not. 404 signals: something here is not in … | |
Post #AwWpauzVEyEqCiCwgC by [email protected] | |
0 likes, 0 repeats | |
@heartshadows @nixCraft The EU text and data mining exemption gives content pro… | |
Post #AwWpd3QKrI8Mw9mEpU by [email protected] | |
0 likes, 0 repeats | |
@nixCraft actually, the lesser-known sitemap directive encourages crawling, fro… | |
Post #AwWr0uQO4K0bvljwQa by [email protected] | |
0 likes, 0 repeats | |
@oe_simon No. There's nothing here, so nothing can be scraped... 😁 @nixCraft | |
Post #AwWsJx3QcEPXXie3mq by [email protected] | |
0 likes, 0 repeats | |
@nixCraft can we just start adding ridiculous terms of service to robots.txt so… | |
Post #AwYBW1n6vGRPzoRczw by [email protected] | |
0 likes, 1 repeats | |
@nixCraft that is how SO’s robots.txt looks today. It was different a few mon… | |
Post #AwYruB8O25Wlzwaw64 by [email protected] | |
0 likes, 0 repeats | |
@nixCraft major search engines have other ways to crawl content. One of them is… |