Blockbot: Avoid false positives #849
Two known false positives are already reported: https://github.com/JayBizzle/Crawler-Detect/issues/326
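(For context: the addon delegates detection to JayBizzle/Crawler-Detect. Below is a minimal sketch of the library's documented usage - `CrawlerDetect` and `isCrawler()` are its real entry points, but whether any particular string is flagged depends on its pattern list, and the example agent here is only illustrative.)

```php
<?php
// Assumes the library is installed via Composer (jaybizzle/crawler-detect).
require 'vendor/autoload.php';

use Jaybizzle\CrawlerDetect\CrawlerDetect;

$detect = new CrawlerDetect();

// isCrawler() matches a user-agent string against the library's pattern
// list. It answers "automated or not" and has no notion of intent, which
// is why legitimate HTTP libraries can be flagged too.
if ($detect->isCrawler('curl/7.64.1')) {
    echo "flagged as a crawler\n";
}
```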
I don't think this library is meant to separate the "good" from the "bad", so I'm not sure you'll get much success reporting "good bots". I believe this library is used to separate browser visits from automated visits, whatever their intentions.
It's good that we have a whitelist locally, but I don't think it should be ported upstream, because that isn't in the spirit of the library.
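(For illustration, a minimal sketch of what such a local whitelist looks like in front of the library's verdict. The function name and the whitelist entries are made up for this example, not the addon's actual code.)

```php
<?php
use Jaybizzle\CrawlerDetect\CrawlerDetect;

// Hypothetical helper: accept known-good agents before asking the library.
function blockbot_is_blocked(string $agent): bool
{
    // Illustrative entries, not the addon's actual list.
    $whitelist = ['Mastodon', 'Friendica', 'Pleroma', 'Twidere'];

    foreach ($whitelist as $good) {
        if (stripos($agent, $good) !== false) {
            return false; // known-good agent: never block
        }
    }

    // Everything else falls through to the library's generic detection.
    return (new CrawlerDetect())->isCrawler($agent);
}
```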
This project reports HTTP libraries as bots that are used, for example, by Twidere. I don't think that blocking some Android client is intended. We will see the reaction to my issue.
This pretty much sums it up 👍
Seems like we should remove that library and instead choose one that explicitly detects crawlers and bots - but not general HTTP libraries, Android clients, communication servers and so on.
I don't have a good feeling about this library. Although this PR contains a whitelist, it could happen that some other HTTP library is added or that some new AP server system uses a flagged (and not whitelisted) agent.
Hm… this really sounds like a serious problem. I have now found a little time to add a small settings page for this addon.
But instead I could rework the whole addon to use a blacklist rather than whitelisting the library's false positives. We could ship a "default" blacklist that is loaded first; after that, every change would be saved to the config table (see the sketch below).
What do you say?
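(A rough sketch of that idea, assuming Friendica's `DI::config()` accessors. The config keys, function names, and default entries are made up for illustration.)

```php
<?php
use Friendica\DI;

// Hypothetical default blacklist, used until the admin saves a change.
const BLOCKBOT_DEFAULT_BLACKLIST = ['SemrushBot', 'AhrefsBot', 'MJ12bot'];

function blockbot_get_blacklist(): array
{
    // Falls back to the shipped default when nothing is stored yet.
    return DI::config()->get('blockbot', 'blacklist', BLOCKBOT_DEFAULT_BLACKLIST);
}

function blockbot_save_blacklist(array $list): void
{
    // Every change is persisted to the config table.
    DI::config()->set('blockbot', 'blacklist', $list);
}
```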
I don't like the idea of having admins enter bots by hand. Best would be to find some library that does the job for us - or we compile our own list that is hardcoded into this addon.
Hm… I see. I'd prefer a library, because it seems like a "standard" job; otherwise we would have to permanently add new crawlers and create PRs for them.
My current idea is to change this addon so that it has an additional debug mode in which the external library is used. Without it, we would use only our own list. Have you had a look at my changes? I currently collect a whitelist and a blacklist. I only have to watch my logfile for newly detected bots to show up there, then I decide which list to add them to.
The changes are good.
Are you sure that we only need to check `HTTP_USER_AGENT`? In the code of the original library, I found other variables too.
I haven't looked that deep into the original library. What is it looking at?
@annando Here are all headers they use: https://github.com/JayBizzle/Crawler-Detect/blob/master/src/Fixtures/Headers.php
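(In short, the library builds the string it matches against from several user-agent-related request headers, not `HTTP_USER_AGENT` alone. A sketch of the same idea follows; the header names below are an illustrative subset of that Headers.php list.)

```php
<?php
// Concatenate the values of the candidate headers, as the library does.
$uaHeaders = [
    'HTTP_USER_AGENT',
    'HTTP_X_OPERAMINI_PHONE_UA',
    'HTTP_X_DEVICE_USER_AGENT',
    'HTTP_FROM',
    'HTTP_VIA',
];

$agent = '';
foreach ($uaHeaders as $header) {
    if (!empty($_SERVER[$header])) {
        $agent .= $_SERVER[$header] . ' ';
    }
}
$agent = trim($agent);
```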
I have now changed the behaviour of this addon. It now has a "training" mode that is only meant for developers. The addon will now only reject access from known bots.
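(Roughly, the new flow looks like this. The function names, config key, and blocklist entry are assumptions for illustration, not the addon's actual code.)

```php
<?php
use Friendica\DI;
use Jaybizzle\CrawlerDetect\CrawlerDetect;

// Hypothetical stand-in for the addon's own hardcoded blocklist check.
function blockbot_on_own_blocklist(string $agent): bool
{
    return stripos($agent, 'SemrushBot') !== false; // illustrative entry
}

// Sketch: normal operation rejects only agents on the addon's own list;
// the external library is consulted solely in the developer-only
// "training" mode, where its verdict is also logged.
function blockbot_reject(string $agent): bool
{
    if (blockbot_on_own_blocklist($agent)) {
        return true; // known bot: always rejected
    }

    if (DI::config()->get('blockbot', 'training')) {
        if ((new CrawlerDetect())->isCrawler($agent)) {
            // Logged so a developer can sort new agents into the lists.
            DI::logger()->notice('Library flagged unknown agent', ['agent' => $agent]);
            return true;
        }
    }

    return false; // unknown agents pass in normal operation
}
```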
I think you should rename the config to something more explicit for the end user. Sure, for you it’s a training mode because you’re watching the logs and editing the files manually, but for most admins it would be something akin to a “safe mode”.
I think that this config mode really is only meant for developers. On a production system - without constantly watching the logfiles - this config mode is disastrous; see the issue where someone had problems connecting to Mastodon. We have to discourage people from even considering that mode. So "training" seemed appropriate to me.
BTW: I'm adding new crawlers whenever they are detected on my systems. So it would be wise to merge this PR shortly before the release. (Or I would create some follow-up PR.)
@MrPetovan I don't have merge permissions, so I hope you aren't waiting on me :D
I'm not, I'm waiting on @annando. He opened this PR way before it was actually ready to merge, since he was still in "learning mode".
@MrPetovan have you forgotten to merge this PR?
Yup. I was saying to myself that it was too easy.
And it's fixed. Addon releases are easier than core releases.