Blockbot: Avoid false positives #849

Merged
annando merged 35 commits from false-positive into 2019.06-rc 2019-06-24 05:28:29 +02:00
annando commented 2019-04-27 13:54:15 +02:00 (Migrated from github.com)

Two known false positives are already reported: https://github.com/JayBizzle/Crawler-Detect/issues/326

MrPetovan commented 2019-04-27 15:35:50 +02:00 (Migrated from github.com)

I don't think this library is meant to separate the "good" from the "bad". I'm not sure you'll get much success reporting "good bot". I believe this library is used to separate browser visits from automated visits, whatever their intentions.

It's good that we have a whitelist locally, but I don't think it should be ported upstream because that isn't in the spirit of the library.
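The local-whitelist idea can be sketched as follows: consult the generic detector only after known-good agents have been excluded. This is a minimal Python illustration of the logic (the addon itself is PHP); the patterns and the stand-in detector below are assumptions for illustration, not the library's actual rules.

```python
import re

# Hypothetical local whitelist: agents a generic crawler detector may flag
# but that we know belong to legitimate clients or federating servers.
WHITELIST_PATTERNS = [
    r"Twidere",   # Android client built on a generic HTTP library
    r"Mastodon",  # federation traffic from other servers
]

def looks_like_crawler(user_agent: str) -> bool:
    """Stand-in for a generic detector such as JayBizzle/Crawler-Detect."""
    return bool(re.search(r"bot|crawler|spider|okhttp", user_agent, re.I))

def is_blocked(user_agent: str) -> bool:
    # The local whitelist always wins over the library's verdict.
    for pattern in WHITELIST_PATTERNS:
        if re.search(pattern, user_agent, re.I):
            return False
    return looks_like_crawler(user_agent)
```

The design choice is simply precedence: a whitelist hit short-circuits the detector, so upstream changes to the detection rules can never block a whitelisted agent.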

MrPetovan (Migrated from github.com) requested changes 2019-04-27 15:37:46 +02:00
annando commented 2019-04-27 15:39:46 +02:00 (Migrated from github.com)

This project reports HTTP libraries as bots, libraries that are used - for example - by Twidere. I don't think that blocking an Android client is intended. We will see the reaction to my issue.

annando (Migrated from github.com) reviewed 2019-04-27 15:41:25 +02:00
MrPetovan (Migrated from github.com) reviewed 2019-04-27 16:11:06 +02:00
annando (Migrated from github.com) reviewed 2019-04-27 17:26:50 +02:00
MrPetovan (Migrated from github.com) reviewed 2019-04-27 22:45:02 +02:00
annando (Migrated from github.com) reviewed 2019-04-27 22:49:09 +02:00
MrPetovan (Migrated from github.com) reviewed 2019-04-28 04:34:36 +02:00
JayBizzle commented 2019-04-29 21:24:03 +02:00 (Migrated from github.com)

> I don't think this library is meant to separate the "good" from the "bad". I'm not sure you'll get much success reporting "good bot". I believe this library is used to separate browser visits from automated visits, whatever their intentions.

This pretty much sums it up 👍

annando commented 2019-04-29 22:13:03 +02:00 (Migrated from github.com)

It seems like we should remove that library and instead choose one that explicitly detects crawlers and bots - but not general HTTP libraries, Android clients, communication servers and so on.

I don't have a good feeling about this library. Although this PR contains a whitelist, it could happen that some other HTTP library is added or that some new AP server system uses a flagged (and not whitelisted) agent.

nupplaphil commented 2019-04-30 21:05:14 +02:00 (Migrated from github.com)

Hm .. Sounds like a serious problem .. I have now found a little time to add a small settings page for this addon.

But instead I could rework the whole addon to use a blacklist rather than whitelisting false positives of the library. We could add a "default" blacklist which is loaded first, and after that every change would be saved to the config table.

What do you say?
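The proposed flow - ship a default blacklist, load it first, then persist admin changes to the config table - could look roughly like this. A minimal Python sketch; all names (the config keys, the `Config` class, the default entries) are illustrative, not Friendica's actual API.

```python
# Hypothetical default blacklist shipped with the addon.
DEFAULT_BLACKLIST = {"AhrefsBot", "SemrushBot", "MJ12bot"}

class Config:
    """Stand-in for the config table: a simple key/value store."""
    def __init__(self):
        self._store = {}
    def get(self, key, default=None):
        return self._store.get(key, default)
    def set(self, key, value):
        self._store[key] = value

def effective_blacklist(config: Config) -> set:
    # Defaults apply until the admin has saved a change; after that,
    # the saved list from the config table takes over.
    saved = config.get("blockbot.blacklist")
    if saved is None:
        return set(DEFAULT_BLACKLIST)
    return set(saved)

def save_blacklist(config: Config, entries: set) -> None:
    config.set("blockbot.blacklist", sorted(entries))
```

A usage example: a fresh install returns the defaults, and once the admin saves an edited list, every later lookup reads the saved version instead.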

annando commented 2019-04-30 22:14:41 +02:00 (Migrated from github.com)

I don't like the idea of having an admin enter bots by hand. The best option is to find some library that does the job for us - or we compile our own list that is hardcoded into this addon.

nupplaphil commented 2019-04-30 22:16:05 +02:00 (Migrated from github.com)

Hm .. I see.. I'd prefer a library, because this seems like a "standard" job; otherwise we would have to permanently add new crawlers and create PRs for them.

annando commented 2019-04-30 22:31:52 +02:00 (Migrated from github.com)

My current idea is to change this addon so that it has an additional debug mode where the external library is used. Without it, we would use only our own lists. Have you had a look at my changes? I currently collect a whitelist and a blacklist. I only have to check my logfile for newly detected bots that arrive there. Then I decide which list to add them to.
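The described behaviour - own whitelist and blacklist decide in normal operation, with the external library consulted only in a developer-facing debug mode - can be sketched like this. A Python illustration under assumed names; the real addon is PHP and its lists and function names differ.

```python
# Illustrative samples of the addon's own lists.
GOOD_AGENTS = ["Mastodon", "Pleroma", "Friendica"]
BAD_AGENTS = ["AhrefsBot", "SemrushBot", "MJ12bot"]

def classify(user_agent: str, debug: bool, library_says_bot) -> str:
    """Decide "allow" or "block" for a request's User-Agent string.

    library_says_bot is a callable standing in for the external
    detector; it is only consulted when debug mode is enabled.
    """
    if any(agent in user_agent for agent in GOOD_AGENTS):
        return "allow"
    if any(agent in user_agent for agent in BAD_AGENTS):
        return "block"
    if debug and library_says_bot(user_agent):
        # Log the unknown agent so the developer can later add it
        # to the whitelist or the blacklist.
        print(f"unclassified agent flagged by library: {user_agent}")
        return "block"
    return "allow"
```

The key property: with debug off, an agent unknown to both lists is allowed, so a false positive in the external library cannot break federation on a production system.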

nupplaphil commented 2019-05-01 23:43:43 +02:00 (Migrated from github.com)

The changes are good.

Are you sure that we only need to check `HTTP_USER_AGENT`? In the code of the original library, I found other variables too.

annando commented 2019-05-02 10:39:31 +02:00 (Migrated from github.com)

I haven't looked that deep into the original library. What is it looking at?

MrPetovan (Migrated from github.com) requested changes 2019-05-03 14:56:57 +02:00
nupplaphil commented 2019-05-03 15:42:30 +02:00 (Migrated from github.com)
@annando Here are all headers they use: https://github.com/JayBizzle/Crawler-Detect/blob/master/src/Fixtures/Headers.php
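In the spirit of that fixture, detection can scan several request headers rather than `HTTP_USER_AGENT` alone. A minimal Python sketch; the linked `Headers.php` has the library's real list, and the header names below are just illustrative examples of such fallbacks.

```python
# Candidate headers that may carry an agent string; illustrative,
# not the library's complete list (see Headers.php for that).
CANDIDATE_HEADERS = [
    "HTTP_USER_AGENT",
    "HTTP_X_OPERAMINI_PHONE_UA",  # Opera Mini proxy traffic
    "HTTP_DEVICE_STOCK_UA",
    "HTTP_FROM",                  # some bots identify themselves here
]

def collect_agent_strings(server: dict) -> str:
    """Concatenate every candidate header present in the request,
    so a single pattern match covers all of them."""
    return " ".join(server[h] for h in CANDIDATE_HEADERS if h in server)
```

A detector would then run its patterns against the concatenated string, catching bots that leave `User-Agent` empty but identify themselves elsewhere.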
annando commented 2019-05-29 21:24:05 +02:00 (Migrated from github.com)

I have now changed the behaviour of this addon. It has a "training" mode that is only meant for developers. The addon will now only reject access from known bots.

MrPetovan commented 2019-05-30 01:30:17 +02:00 (Migrated from github.com)

I think you should rename the config to something more explicit for the end user. Sure, for you it’s a training mode because you’re watching the logs and editing the files manually, but for most admins it would be something akin to a “safe mode”.

annando commented 2019-05-30 06:26:00 +02:00 (Migrated from github.com)

I think that this config mode really is only meant for developers. On a production system - without constantly watching the logfiles - this mode is disastrous; see the issue where someone had problems connecting to Mastodon. We have to discourage people from even considering that mode. So "training" seemed appropriate to me.

MrPetovan (Migrated from github.com) approved these changes 2019-05-30 15:30:08 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-05-31 17:42:18 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-01 07:02:22 +02:00
annando commented 2019-06-01 09:03:04 +02:00 (Migrated from github.com)

BTW: I'm adding new crawlers whenever they are detected on my systems. So it would be wise to merge this PR shortly before the release. (Or I would create some follow-up PR.)

MrPetovan (Migrated from github.com) approved these changes 2019-06-06 22:33:31 +02:00
nupplaphil (Migrated from github.com) approved these changes 2019-06-10 14:38:43 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-10 14:40:16 +02:00
nupplaphil commented 2019-06-10 15:06:44 +02:00 (Migrated from github.com)

@MrPetovan I don't have merge permissions, so I hope you don't wait on me :D

MrPetovan commented 2019-06-10 15:20:29 +02:00 (Migrated from github.com)

I'm not, I'm waiting on @annando. He opened this PR way before it was actually ready to merge, since he is still in "learning mode".

MrPetovan (Migrated from github.com) approved these changes 2019-06-10 16:50:17 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-11 13:56:23 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-12 14:33:42 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-12 18:45:29 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-17 16:29:33 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-20 13:59:06 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-20 19:33:20 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-06-23 14:06:52 +02:00
annando commented 2019-06-24 05:03:38 +02:00 (Migrated from github.com)

@MrPetovan have you forgotten merging this PR?

MrPetovan commented 2019-06-24 05:28:08 +02:00 (Migrated from github.com)

Yup. I was saying to myself that it was too easy.

MrPetovan commented 2019-06-24 05:35:47 +02:00 (Migrated from github.com)

And it's fixed. Addon releases are easier than core releases.
