New Addon Bot detection #846

Merged
nupplaphil merged 4 commits from features/6948-bot_detection into develop 2019-04-22 13:49:18 +02:00
nupplaphil commented 2019-04-20 14:18:24 +02:00 (Migrated from github.com)

New addon for bot detection.
Using https://github.com/JayBizzle/Crawler-Detect

Solves https://github.com/friendica/friendica/issues/6948

  • adding composer.json
  • rename to "blockbot"
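For reference, the library exposes a single check against the user agent. A minimal usage sketch (the `CrawlerDetect` class, `isCrawler()` and `getMatches()` are the library's documented API; the autoload path assumes a `vendor/` directory next to the addon file):

```
<?php
// Minimal sketch of jaybizzle/crawler-detect usage (not the addon's final code).
require_once __DIR__ . '/vendor/autoload.php';

use Jaybizzle\CrawlerDetect\CrawlerDetect;

$crawlerDetect = new CrawlerDetect();

// Without an argument, the current request's User-Agent header is checked.
$isBot = $crawlerDetect->isCrawler();

// A user agent string can also be passed explicitly.
$isBot = $crawlerDetect->isCrawler('Googlebot/2.1 (+http://www.google.com/bot.html)');

// Returns the matched crawler pattern, if any.
$matched = $crawlerDetect->getMatches();
```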
annando commented 2019-04-20 16:41:43 +02:00 (Migrated from github.com)
😁 We are working on the same issue: https://github.com/annando/friendica-addons/blob/blockbot/blockbots/blockbots.php
nupplaphil commented 2019-04-20 16:43:59 +02:00 (Migrated from github.com)

Ah :D ... OK, I'm closing it :D

I thought using the suggested library was quite easy. Did you find a cleaner way? Or do you want to avoid adding a new library?

annando commented 2019-04-20 16:50:08 +02:00 (Migrated from github.com)

I wouldn't want to hinder you from continuing to work on your addon. If you want to, please continue. The only thing is: please return error 403, not 404. I'm just testing the bot detection, that's why the branch is so complicated.

Concerning this composer thingy: I really would like to delegate this whole composer stuff for addons to you 😀
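Regarding the 403 vs. 404 point above: in plain PHP that response could look like the following. The actual addon may well use Friendica's own helpers instead, so this is only an illustrative sketch:

```
<?php
require_once __DIR__ . '/vendor/autoload.php';

use Jaybizzle\CrawlerDetect\CrawlerDetect;

$crawlerDetect = new CrawlerDetect();

if ($crawlerDetect->isCrawler()) {
    // 403 Forbidden tells the client it was deliberately blocked,
    // while 404 would pretend the resource does not exist.
    http_response_code(403);
    exit;
}
```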

nupplaphil commented 2019-04-20 20:22:11 +02:00 (Migrated from github.com)

Challenge accepted :D

nupplaphil commented 2019-04-20 20:41:46 +02:00 (Migrated from github.com)

@MrPetovan I did it with composer like you said.

The advancedcontentfilter addon includes the `vendor` directory, so I added it too. Was that include intended? Just to be sure ;-) (there are now about 3,000 additional lines of code because of the vendor directory ;-) )

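For illustration, the addon-level composer.json could be roughly as follows; the package name `jaybizzle/crawler-detect` is the library's real Packagist name, while the addon name, description and version constraint shown here are only assumptions:

```
{
    "name": "friendica-addons/blockbot",
    "description": "Block bots and crawlers based on their user agent",
    "require": {
        "jaybizzle/crawler-detect": "^1.2"
    }
}
```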
MrPetovan (Migrated from github.com) requested changes 2019-04-20 23:40:37 +02:00
annando commented 2019-04-21 00:00:03 +02:00 (Migrated from github.com)

I don't really like including a static vendor folder. The thing is: this is a "living" library that is extended constantly. To reliably detect new bots, it has to be updated all the time.

MrPetovan commented 2019-04-21 00:13:49 +02:00 (Migrated from github.com)

Whether it is in the addon composer or the core composer, both would require a `composer update` run to update the bot list. You would be updating `friendica/friendica-addons` instead of `friendica/friendica`.

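In practice that would be a command along these lines, run inside the addon's directory (the `addon/blockbot` path assumes a standard Friendica checkout with the addon repository in `addon/`):

```
cd addon/blockbot
composer update jaybizzle/crawler-detect
```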
annando commented 2019-04-21 07:35:05 +02:00 (Migrated from github.com)

Is there any way that administrators (not the developers!) could update this library on their own, without provoking a GIT problem because of locally changed files?

nupplaphil (Migrated from github.com) reviewed 2019-04-21 12:27:19 +02:00
nupplaphil commented 2019-04-21 12:36:34 +02:00 (Migrated from github.com)

I added the autoloader require, although it really worked on my local node without it. I faked the "Google Bot" user agent and got a WSOD with 403 :-)

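The require in question is presumably a single line at the top of the addon (the exact path is an assumption); without it, the class is only found if the library also happens to live in the core `vendor/` directory:

```
// Load the addon's own dependencies instead of relying on the core autoloader.
require_once __DIR__ . '/vendor/autoload.php';
```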
nupplaphil commented 2019-04-21 12:38:02 +02:00 (Migrated from github.com)

> Is there any way that administrators (not the developers!) could update this library on their own, without provoking a GIT problem because of locally changed files?

They shouldn't alter the added sources in the `addon` folder. But AFAIK there's no way to update the patterns without a `composer update`.

MrPetovan commented 2019-04-21 13:48:32 +02:00 (Migrated from github.com)

> I added the autoloader require, although it really worked on my local node without it. I faked the "Google Bot" user agent and got a WSOD with 403 :-)

Did you have the library in your core vendor folder still? That could explain why it's working without the addon autoloader.

MrPetovan commented 2019-04-21 13:49:27 +02:00 (Migrated from github.com)

> Is there any way that administrators (not the developers!) could update this library on their own, without provoking a GIT problem because of locally changed files?

Not if the library doesn't provide a download feature from a remote server, whether it is in the addon or the core composer. It would be up to us to update the definition file on a regular basis, ideally before each release.

nupplaphil commented 2019-04-21 13:51:49 +02:00 (Migrated from github.com)

> > I added the autoloader require, although it really worked on my local node without it. I faked the "Google Bot" user agent and got a WSOD with 403 :-)
>
> Did you have the library in your core vendor folder still? That could explain why it's working without the addon autoloader.

Yep, you're right ... Thanks for the hint!

MrPetovan (Migrated from github.com) requested changes 2019-04-21 14:37:00 +02:00
nupplaphil (Migrated from github.com) reviewed 2019-04-21 17:19:47 +02:00
MrPetovan (Migrated from github.com) reviewed 2019-04-21 18:16:35 +02:00
MrPetovan (Migrated from github.com) approved these changes 2019-04-22 13:49:09 +02:00
AlfredSK commented 2019-04-22 15:14:09 +02:00 (Migrated from github.com)

I'm seeing a lot of 403's for "python-opengraph-jaywink/0.2.0 (+https://github.com/jaywink/python-opengraph)" and "Social-Relay/1.6.0-dev - https://github.com/jaywink/social-relay" now. Is this intended, completely unrelated, or does the add-on break the Diaspora relay?

AlfredSK commented 2019-04-22 15:25:16 +02:00 (Migrated from github.com)

When I disable the add-on, the 403's for these two user agents are gone. So it seems related to the add-on.

AlfredSK commented 2019-04-22 15:34:37 +02:00 (Migrated from github.com)

And, apparently, it breaks my server/node monitoring. The monitor crawls the nodeinfo every couple of minutes. When it gets a 40x or 50x I'm notified via email about it.

```
--SERVICE-ALERT-------------------
-
- Hostaddress: 134.119.20.10
- Hostname: librabox (J125736)
- Service: Friendica Nodeinfo (M9346)
- - - - - - - - - - - - - - - - -
- State: CRITICAL
- Date: 2019-04-22 15:06:03
- Output: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 403 Error 403 - Forbidden
-
----------------------------------
```
nupplaphil commented 2019-04-22 15:37:19 +02:00 (Migrated from github.com)

Seems like the crawler detection is a little more restrictive than necessary...

MrPetovan commented 2019-04-22 15:54:04 +02:00 (Migrated from github.com)

Both are bots, so it makes sense. For me the problem isn't the library, but its underlying motivation: you simply can't reliably guess good or bad intent from any single request, and any attempt to block part or all of the incoming traffic based on broad rules will inevitably fail with false positives or false negatives.

AlfredSK commented 2019-04-22 16:00:46 +02:00 (Migrated from github.com)

Yes, I understand that. But then there should be a whitelist for the admin, or at least a big warning that the add-on may break some communication. When the add-on breaks communication with the relay or other useful remote services, it's kind of dangerous to activate it.
I guess not every admin looks very closely at the logs. I can see a lot of bug reports/support questions coming. :-)

MrPetovan commented 2019-04-22 16:45:58 +02:00 (Migrated from github.com)

A whitelist sounds like a nice addition.

nupplaphil commented 2019-04-22 21:03:49 +02:00 (Migrated from github.com)

A whitelist should be easy (see the sketch after this list):

  • adding a textbox to the add-on admin page
  • saving each line as an array to the config key `blockbot.whitelist`
  • checking `$crawlerDetector->isCrawler() && !inWhitelist()`

:-)
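A rough sketch of how that check could be wired up; `isCrawler()` is the library's API and `blockbot.whitelist` is the config key suggested above, but the helper name and the example whitelist entries are only illustrative:

```
<?php
require_once __DIR__ . '/vendor/autoload.php';

use Jaybizzle\CrawlerDetect\CrawlerDetect;

/**
 * Returns true when the user agent matches one of the admin-configured
 * whitelist entries (one entry per line in the admin textbox, stored as
 * an array under the config key blockbot.whitelist).
 */
function blockbot_in_whitelist(string $userAgent, array $whitelist): bool
{
    foreach ($whitelist as $entry) {
        if ($entry !== '' && stripos($userAgent, $entry) !== false) {
            return true;
        }
    }
    return false;
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Would normally be loaded from the blockbot.whitelist config;
// example entries taken from the false positives reported above.
$whitelist = ['python-opengraph-jaywink', 'Social-Relay'];

$crawlerDetect = new CrawlerDetect();

if ($crawlerDetect->isCrawler($userAgent) && !blockbot_in_whitelist($userAgent, $whitelist)) {
    http_response_code(403);
    exit;
}
```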

tobiasd commented 2019-04-23 08:21:12 +02:00 (Migrated from github.com)

> New Addon Bot detection

Could you add a descriptive README.md file to the addon explaining what it does and what problems it might raise?

annando commented 2019-04-23 10:02:58 +02:00 (Migrated from github.com)

I have got a whitelist in my corresponding PR. But I think we should create issues at that repository to clean up this list.

nupplaphil commented 2019-04-23 10:32:04 +02:00 (Migrated from github.com)

I'd like to do both (README, whitelist) in the next two days - if business doesn't get in the way ;-).
