New Addon Bot detection #846
Reference: friendica/friendica-addons#846
New addon for bot detection.
Using https://github.com/JayBizzle/Crawler-Detect
Solves https://github.com/friendica/friendica/issues/6948
😁 We are working on the same issue: https://github.com/annando/friendica-addons/blob/blockbot/blockbots/blockbots.php
ah :D .. OK I'm closing it :D
I thought using the suggested library is quite easy. Did you find a cleaner way? Or do you want to avoid a new library?
I don't want to stop you from continuing to work on your addon. If you want to, please continue. The only thing is that you should return error 403, not 404. I'm just testing the bot detection, that's why the branch is so complicated.
Concerning this composer thingy: I really would like to delegate this whole composer stuff for addons to you 😀
Challenge accepted :D
@MrPetovan I did it like you said with the composer.
The advancedcontentfilter includes the vendor directory, so I added it too. Was the include intended? Just to be sure ;-) (there are now 3,000 lines of code included because of the vendor directory ;-) )
I don't really like to include a static vendor folder. Thing is: This is a "living" library that is extended constantly. To reliably detect new bots, the library has to be updated all the time.
Whether it is in the addon composer or the core composer, both would require a composer update run to update the bot list. You would be updating friendica/friendica-addons instead of friendica/friendica.
Is there any way that administrators (not the developers!) could update this library on their own, without provoking a Git problem because of locally changed files?
I added the autoloader require, although it really worked on my local node without it. I faked the "Google Bot" user agent and got a WSOD with 403 :-)
They should not alter the added sources of the addon folder. But afaik there's no possibility to update the patterns without a composer update.
Did you have the library in your core vendor folder still? That could explain why it's working without the addon autoloader.
Not if the library doesn't provide a download feature from a remote server, whether it was in the addon or core composer. It would be up to us to update the definition file on a regular basis, ideally before each release.
Yep, you're right ... Thanks for the hint!
I'm seeing a lot of 403s for "python-opengraph-jaywink/0.2.0 (+https://github.com/jaywink/python-opengraph)" and "Social-Relay/1.6.0-dev - https://github.com/jaywink/social-relay" now. Is this intended, completely unrelated, or does the add-on break the Diaspora relay?
When I disable the add-on the 403s for these two user agents are gone. So, it seems related to the add-on.
And, apparently, it breaks my server/node monitoring. The monitor crawls the nodeinfo every couple of minutes. When it gets a 40x or 50x I'm notified via email about it.
Seems like the crawler is a little bit more restrictive than necessary...
Both are bots, so it makes sense. For me the problem isn't in the library, but in its underlying motivation. You simply can't reliably guess the good/bad intents from any single request, and any attempt to close part or all the incoming traffic based on broad rules will inevitably fail with false positives or false negatives.
Yes, I understand that. But there should be a whitelist for the admin then or at least a fat warning that the add-on may break some communication. When the add-on breaks communication with the relay or some other useful remote services it is kind of dangerous to activate it.
I guess not every admin is looking very closely at the logs. I see a lot of bug reports/support questions coming. :-)
A whitelist sounds like a nice addition.
Whitelist should be easy
blockbot.whitelist
$crawlerDetector->isCrawler() && !inWhitelist()
:-)
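A minimal sketch of how the whitelist check above could look, not the addon's actual code. The helper signature of inWhitelist(), the shouldBlock() wrapper, the case-insensitive substring matching, and the stand-in detector are all assumptions made for illustration. How the blockbot.whitelist entries are stored and loaded is not specified in this thread, so the list is hard-coded here.

```php
<?php
// Hypothetical helper: true if the user agent contains any whitelist entry.
// Case-insensitive substring matching is an assumption, not the addon's rule.
function inWhitelist(string $userAgent, array $whitelist): bool
{
    foreach ($whitelist as $entry) {
        // Skip empty entries so a blank config line never matches everything.
        if ($entry !== '' && stripos($userAgent, $entry) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical wrapper around the check suggested above:
// block only if the detector flags the agent AND it is not whitelisted.
// $isCrawler is any callable that takes a user agent and returns a bool.
function shouldBlock(string $userAgent, callable $isCrawler, array $whitelist): bool
{
    return $isCrawler($userAgent) && !inWhitelist($userAgent, $whitelist);
}

// Stand-in detector so the sketch runs without the Crawler-Detect library.
// With the library installed, [$crawlerDetector, 'isCrawler'] would go here.
$fakeDetector = function (string $userAgent): bool {
    return stripos($userAgent, 'bot') !== false
        || stripos($userAgent, 'relay') !== false
        || stripos($userAgent, 'opengraph') !== false;
};

// Assumed whitelist content, taken from the user agents reported above.
$whitelist = ['Social-Relay', 'python-opengraph-jaywink'];

$googleBlocked = shouldBlock('Googlebot/2.1', $fakeDetector, $whitelist);
$relayBlocked = shouldBlock(
    'Social-Relay/1.6.0-dev - https://github.com/jaywink/social-relay',
    $fakeDetector,
    $whitelist
);
```

With these inputs the fake Google Bot agent would be blocked (and answered with a 403), while the relay agent passes because it matches a whitelist entry.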
Could you add a descriptive README.md file to the addon explaining what it does and what problems it might raise?
I have got a whitelist in my corresponding PR. But I think we should create issues at that repository to clean up this list.
I'd like to do both (README, Whitelist) in the next 2 days - if there's no other distraction in business ;-) .