Introduction
Post Au16M6DKcTrhEHRmQy by [email protected]
Post #Au15BOOoZETYUS9ISu by [email protected]
0 likes, 4 repeats
I noticed that a *lot* of the crawlers/bots we see on www.bbc.co.uk & www.b…
Post #Au15BOY20xqex2Sepc by [email protected]
0 likes, 1 repeats
@tdp_org Fantastic! Are you at liberty to say how your classifier(s) are implem…
Post #Au16GJNWlHi2JwSLz6 by [email protected]
0 likes, 0 repeats
@tdp_org hot damn, that's a big difference
Post #Au16GJTuNYoUdjRRvk by [email protected]
0 likes, 0 repeats
@gsuberland Definitely highlights how spammy the web is eh? Wild West++ 🤣
Post #Au16GusfKTbMdEJF9k by [email protected]
0 likes, 0 repeats
@tdp_org Next step: immediate IP block. I have strong feelings towards the comp…
Post #Au16Guzku7GyzDcuCu by [email protected]
0 likes, 0 repeats
@ross Bad news: Many of these are plain old smartphones where the flashlight ap…
Post #Au16H8vf68JiCZoplo by [email protected]
0 likes, 0 repeats
@tdp_org does the filter just match known crawler user agents against their kno…
Post #Au16H91gjj8aVGdeAC by [email protected]
0 likes, 0 repeats
@kfh Yeah, more or less. It identifies the crawler/bot via the user-agent strin…
Post #Au16HfD94fJeK4FFSq by [email protected]
0 likes, 0 repeats
@piegames six hour IP block with a message to uninstall malware from your devic…
Post #Au16HuSwMDrZdOkZfM by [email protected]
0 likes, 0 repeats
@davidgerard ⬆️
Post #Au16JT6d3GeTMI3crA by [email protected]
0 likes, 0 repeats
@tdp_org i assume you are aware that also meta provides official guidance on ho…
Post #Au16JTF8XdSPmg2Q7M by [email protected]
0 likes, 0 repeats
@slink Yep, that's what I followed for them 👍🏻
Post #Au16K6hrokc0B4mPoW by [email protected]
0 likes, 0 repeats
@tdp_org i won't describe the mitigation we applied on rationalwiki, but we…
Post #Au16Llz7pacMUQe9Vw by [email protected]
0 likes, 0 repeats
@tdp_org do you do anything different based on if a bot is known or not?
Post #Au16Lm6DPEHyqPxoZ6 by [email protected]
0 likes, 0 repeats
@q We (in the distribution team) don't - these data are informational right…
Post #Au16M6DKcTrhEHRmQy by [email protected]
0 likes, 0 repeats
@dngrs yep - same botnet model LWN saw a while ago https://lwn.net/Articles/100…
Post #Au16MztlaPUNvoiwiG by [email protected]
0 likes, 0 repeats
@tdp_org @gsuberland that’s entirely consistent with what I’ve seen recentl…
Post #Au16OC0k2P2PmsYYrI by [email protected]
0 likes, 0 repeats
@tdp_org You could poison the well with https://come-from.mad-scientist.club/@al…
Post #Au19MxvBElXnf63sgK by [email protected]
0 likes, 0 repeats
@tdp_org This is interesting information and certainly something I will conside…
Post #Au19RkwstcNasO8w4W by [email protected]
0 likes, 0 repeats
@tdp_org nice.
Post #Au19T60I6FQCSzNKPA by [email protected]
0 likes, 0 repeats
@tdp_org I can see this escalating, next is that they use semi random user-agent…
Post #Au19UozGL1B0GmiW9o by [email protected]
0 likes, 0 repeats
@kasperd I feel like the definition of “legitimate crawler” is getting hard…
Post #Au1FWQqfefwni2qjjc by [email protected]
0 likes, 0 repeats
@tdp_org take a look at https://datatracker.ietf.org/doc/html/draft-meunier-web…
Post #Au1FWwlSzYnYhAf248 by [email protected]
0 likes, 0 repeats
@tdp_org Is the bottom line zero or is it some other fixed number? Looks crazy h…
Post #Au1FZX4oxHLhtWwkRk by [email protected]
0 likes, 0 repeats
@jalict Yeah the bottom line is zero - I can't bear non zero-based graphs…
Post #Au1FZXAUcBt0B7bHHs by [email protected]
0 likes, 0 repeats
@tdp_org Ah, so you are still serving all the crawlers?
Post #Au1FaHWkFn1BxSMhMG by [email protected]
0 likes, 0 repeats
@paco @tdp_org Where to draw the line is certainly a tricky question. As long as…
Post #Au1FcSKh38ao3AlToe by [email protected]
0 likes, 0 repeats
@paco @kasperd @tdp_org a little OT but FWIW the magic search URL param to prev…
Post #Au1FdVtLIrvlNSTZLc by [email protected]
0 likes, 0 repeats
@SimmerVigor Oh nice! From the title and ToC that looks very much like the sort…
Post #Au1FdqkXkmN04FWL32 by [email protected]
0 likes, 0 repeats
@ross @piegames I don't even bother with the time window. Permablock. I have…
Post #Au1FermVP4NBYZvILw by [email protected]
0 likes, 0 repeats
@jalict Those which aren't "blocked" by robots.txt or blocked bec…
Post #Au1KJjXtKOUXhgpW6a by [email protected]
0 likes, 0 repeats
@piegames @ross From a personal experience, on Android this is trivially easy a…
Post #Au1KKXBGmm6be0d9mq by [email protected]
0 likes, 0 repeats
@tdp_org So crawlers definitely not identifying themselves appropriately, if I …
Post #Au1KKXHeP3D3xncFjU by [email protected]
0 likes, 0 repeats
@gimulnautti Yeah. Someone is spoofing the bot user agents. Not sure why, maybe…
Post #Au1KMtPr5QWVBWZrn6 by [email protected]
0 likes, 0 repeats
@[email protected] i only run my small personal website but i was plannin…
Post #Au1KQjBv2mn6dDt66y by [email protected]
0 likes, 0 repeats
@tdp_org It doesn’t seem odd at all, if the product of your company requires …
Post #Au1KQjJ0cQSizDClA8 by [email protected]
0 likes, 0 repeats
@gimulnautti @tdp_org It might not actually be spoofing. Big Tech has started in…
Post #Au1KQjQSAkPvMIghlY by [email protected]
0 likes, 0 repeats
@ck @tdp_org omg, when you thought it couldn’t get worse… 🤦‍♂️
Post #Au1KRiFa1mxs0F4Nmq by [email protected]
0 likes, 0 repeats
@tdp_org What tools do you use for ASN validation by the way?
Post #Au1KRwQ1LldPwg3v0K by [email protected]
0 likes, 0 repeats
@tdp_org So what happens to requests from something that identify themselves as…
Post #Au1KTB5oHIDhkICMgy by [email protected]
0 likes, 0 repeats
@losttourist No, this is just stats/reporting (at the moment at least)
Post #Au1KVlWFoeRTIdnk7k by [email protected]
0 likes, 0 repeats
@ck @gimulnautti @tdp_org this is the worst -- have you read about this practic…
Post #Au1RGxtGEneH0GS6Ii by [email protected]
0 likes, 0 repeats
@edsu @gimulnautti @tdp_org I believe @jwildeboer has blogged about this.
Post #Au1RHfOySlZ9x4cXCa by [email protected]
0 likes, 0 repeats
@ck @gimulnautti Yeah i have heard of that too - never underestimate how shady …
Post #Au1RJiQ6DEvXtL7j5U by [email protected]
0 likes, 0 repeats
@piegames @ross This is a link I saved, there are probably many similar blogs o…
Post #Au1RJss7LvHUJ8JF9E by [email protected]
0 likes, 0 repeats
@ck @gimulnautti @tdp_org aw Jesus F Christ
Post #Au1RKSurLCrg63rwQ4 by [email protected]
0 likes, 0 repeats
@tdp_org that's a writeup a lot of people would be interested in I think.
Post #Au1RKdblaX81G21Lge by [email protected]
0 likes, 0 repeats
@ross @piegames it's sad, but it's about time people are made accountab…
Post #Au1RKdjZ7XMneDfZqK by [email protected]
0 likes, 0 repeats
@bilboed No. Don't push systemic issues onto individual responsibility. Bei…
Post #Au1RKdpwjoTFy0efmy by [email protected]
0 likes, 0 repeats
@piegames Good luck convincing app stores of that. They won't until there…
Post #Au1RNy286WcNTRwJNY by [email protected]
0 likes, 0 repeats
@ck @gimulnautti @tdp_org @jwildeboer ah thanks for the pointer, found this one…
Post #Au1RQ2Lj1oLdRI5lh2 by [email protected]
0 likes, 0 repeats
@tdp_org ASN = Autonomous System Number in case this helps anyone besides me. A…
Post #Au1RUFH8uYlQgexIHo by [email protected]
0 likes, 0 repeats
@tdp_org What I don't understand is, I had expected the improved detection …
Post #Au1TCmOdEKKW6aG2Sm by [email protected]
0 likes, 1 repeats
@itgrrl yeah. I heard about that. I haven’t used Google to search in years. N…
Post #Au1agA0vCOE9Qm7mEK by [email protected]
0 likes, 0 repeats
@paco OMG that really deserves to be the name of a no-slop search engine. FuckF…
Post #Au1b6PeXfxefSoAtN2 by [email protected]
0 likes, 0 repeats
@tdp_org nice! I never ran the stats like that, but yes, checking forward/rever…
Post #Au1gFPhZMZolzZhZxo by [email protected]
0 likes, 0 repeats
@tdp_org @SimmerVigor Not being well-mannered is definitely a problem. One of m…
Post #Au1gNRWoSL7HyQV1Ky by [email protected]
0 likes, 0 repeats
@ross @piegames It doesn't matter. They only scrape a few files per unique …
Post #Au1gOdrG6FROVGdOTo by [email protected]
0 likes, 0 repeats
@tdp_org ohhhh god yes. I love it when I make bad line on graph go down sharply
Post #Au1gT6GePpkLlwdZNQ by [email protected]
0 likes, 0 repeats
@bertkoor @tdp_org "Known bot" as in "bot of known identity"…
Post #Au1j1pSH7FOXfmlJGi by [email protected]
0 likes, 0 repeats
@ross @piegames a zip bomb is probably more effective.
Post #Au1j4QghelbDj9kbbs by [email protected]
0 likes, 0 repeats
@kasperd @tdp_org Is this really the current state? Pretending to be Googlebot …
Post #Au1j4QmNJg8W0kP8S0 by [email protected]
0 likes, 0 repeats
@feld @tdp_org There is probably a combination of both. I wasn’t the one maki…
Post #Au1jASt2OTgqHiOjlA by [email protected]
0 likes, 0 repeats
@paco @itgrrl @tdp_org There are two reasons that’s not a viable alternative …
Post #Au1o9UjyW0yfW1W2i0 by [email protected]
0 likes, 0 repeats
@tdp_org @gimulnautti Anecdotally, I think special cases allowing or treating g…
Post #Au2B9r9BVR4txDRsie by [email protected]
0 likes, 1 repeats
So what do you use, @kasperd? And what is the mistrust of Microsoft _hosting_? …
Post #Au7Bfwf9VR767LVCHw by [email protected]
0 likes, 0 repeats
@itgrrl @paco @kasperd @tdp_org ooh thanks! I knew about udm=14 but never even …
Post #Au7BfwoMxAUCZvoYee by [email protected]
0 likes, 1 repeats
@DrHyde @paco @kasperd @tdp_org
Post #Au8RcDALzx9IZGIWpM by [email protected]
0 likes, 0 repeats
@paco @kasperd @tdp_org the shitty results and AI slop aren't intended to k…
Post #Au8RcDIrUJxEzeHK5Y by [email protected]
0 likes, 0 repeats
@DrHyde I am inverting the logic to make a point. Let me try to be clearer. Ever…
You are viewing proxied material from pleroma.anduin.net. The copyright of proxied material belongs to its original authors. Any comments or complaints in relation to proxied material should be directed to the original authors of the content concerned. Please see the disclaimer for more details.