[retriever] First submission of retriever plugin #898

Closed
mexon wants to merge 1 commits from mat/retriever into develop
mexon commented 2019-10-13 16:53:12 +02:00 (Migrated from github.com)

This is my plugin to allow using Friendica to view entire news feeds, without having to click through to the upstream source. I use it together with mailstream to support offline news reading. It was really handy when I was living in China! There's probably other software that does the same thing, but this is mine :-P

I've actually been using this for 7 years. Sorry about the huge and long-delayed PR, but it'd be really nice if someone familiar with the core took a look at this.

In particular I don't really understand the relationship between the item and item-content tables, and data from the Item class vs what gets passed in to post_remote. I had to make my own retriever_get_body and retriever_set_body function to deal with this, which looks wrong.

The other thing that looks wrong is how I store pictures. This may seem like a redundant feature now that there's image proxying, but I think it's worth keeping since upstream content does sometimes disappear. However the pictures I download never get automatically deleted, so am I doing this wrong? For now I've made this an optional feature, defaulting to switched off.

This is my plugin to allow using Friendica to view entire news feeds, without having to click through to the upstream source. I use it together with mailstream to support offline news reading. It was really handy when I was living in China! There's probably other software that does the same thing, but this is mine :-P I've actually been using this for 7 years. Sorry about the huge and long-delayed PR, but it'd be really nice if someone familiar with the core took a look at this. In particular I don't really understand the relationship between the item and item-content tables, and data from the Item class vs what gets passed in to post_remote. I had to make my own retriever_get_body and retriever_set_body function to deal with this, which looks wrong. The other thing that looks wrong is how I store pictures. This may seem like a redundant feature now that there's image proxying, but I think it's worth keeping since upstream content does sometimes disappear. However the pictures I download never get automatically deleted, so am I doing this wrong? For now I've made this an optional feature, defaulting to switched off.
MrPetovan commented 2019-10-13 17:21:05 +02:00 (Migrated from github.com)

Hey, better later than never!

Hey, better later than never!
MrPetovan (Migrated from github.com) requested changes 2019-10-13 17:35:53 +02:00
MrPetovan (Migrated from github.com) left a comment

Friendica now tries to comply with the PSR-2 coding standards, see https://friendica.mrpetovan.com/help/Developers-Intro#1_5_2

I'll let you look into it and make the necessary changes before I do a coding standard review of your code.

Friendica now tries to comply with the PSR-2 coding standards, see https://friendica.mrpetovan.com/help/Developers-Intro#1_5_2 I'll let you look into it and make the necessary changes before I do a coding standard review of your code.
mexon commented 2019-10-13 18:34:04 +02:00 (Migrated from github.com)

I've got a busy week coming up so it may be a while before I tackle the coding standards, but I'll get there.

I've got a busy week coming up so it may be a while before I tackle the coding standards, but I'll get there.
annando commented 2019-10-13 18:41:27 +02:00 (Migrated from github.com)

Thanks for your PR! But I do have some general things about it. From a design perspective I would suggest a completely different approach. AFAIK you seem to have some kind of "retry" mechanism in it that is triggered by the cron mechanism.

I would suggest doing it completely different (and I desperately hope that this doesn't come rude). From the event when the content is posted, you could call a worker that will then do the update (if this was a feed). You can have a look at the Twitter addon on how worker calls can be performed for addons.

When the fetching of the content does fail, you can simply defer the worker (Worker::defer()). This will then automatically perform periodic retrials and - after some time of unsuccessful retrials - will remove the worker task.

This should reduce the complexity of this addon a lot, since you wouldn't need any finished marker anymore and you wouldn't need to apply any retrial mechanism on your own.

Also please don't do any direct database queries for the item-content table. Please use the Item class instead to fetch and update the content. The whole structure of the item storage is very much encapsulated now, which is done to be able to change the database structure without problems.

Also when creating the database structure file, please have a look at the field sizes for existing comparable fields. So uid should always be mediumint unsigned, the contact-id should be int unsigned and all boolean fields should have the boolean type.

Also please have a look at all calls to the q(...) function and replace them with the appropriate DBA::... functions.

There are also some other things in the code (like using the matching constants instead of using the constant value - or the way that the Logger entries had been built). But I think we should it step by step. It could be a longer process - but in the end the result should be shiny and brilliant :-)

Thanks for your PR! But I do have some general things about it. From a design perspective I would suggest a completely different approach. AFAIK you seem to have some kind of "retry" mechanism in it that is triggered by the cron mechanism. I would suggest doing it completely different (and I desperately hope that this doesn't come rude). From the event when the content is posted, you could call a worker that will then do the update (if this was a feed). You can have a look at the Twitter addon on how worker calls can be performed for addons. When the fetching of the content does fail, you can simply defer the worker (```Worker::defer()```). This will then automatically perform periodic retrials and - after some time of unsuccessful retrials - will remove the worker task. This should reduce the complexity of this addon a lot, since you wouldn't need any ```finished``` marker anymore and you wouldn't need to apply any retrial mechanism on your own. Also please don't do any direct database queries for the ```item-content``` table. Please use the ```Item``` class instead to fetch and update the content. The whole structure of the item storage is very much encapsulated now, which is done to be able to change the database structure without problems. Also when creating the database structure file, please have a look at the field sizes for existing comparable fields. So ```uid``` should always be ```mediumint unsigned```, the ```contact-id``` should be ```int unsigned``` and all boolean fields should have the ```boolean``` type. Also please have a look at all calls to the ```q(...)``` function and replace them with the appropriate ```DBA::...``` functions. There are also some other things in the code (like using the matching constants instead of using the constant value - or the way that the Logger entries had been built). But I think we should it step by step. It could be a longer process - but in the end the result should be shiny and brilliant :-)
mexon commented 2019-10-13 19:02:06 +02:00 (Migrated from github.com)

@annando The worker idea sounds good! I've never really looked into it, and my retry mechanism is one of the most ancient and reliable parts of the code, but yeah it's a great use case for a worker system. It also probably solves a niggle I've wanted to solve for a while: it waits one whole cron cycle before actually fetching anything, whereas it really could be immediate.

I've been tearing my hair out trying to use the Item class to update content - I did try it, with "weird" results. It's particularly tricky since I often don't know if the item has been stored or not: this would be a lot easier if there was a post_remote hook that was called after the item has been stored. Or you could make these items real objects instead of arrays - you could have different classes for pre-store and post-store versions, but with similar APIs. I can give this another go though.

DBA is nicer and the biggest task I've been tackling the last two weeks is moving all my queries to that class. But I can't figure out how to do left outer join with it - any suggestions / examples? Also I find DBA::update counterintuitive, is there a more detailed explanation of how that $old_fields parameter works?

@annando The worker idea sounds good! I've never really looked into it, and my retry mechanism is one of the most ancient and reliable parts of the code, but yeah it's a great use case for a worker system. It also probably solves a niggle I've wanted to solve for a while: it waits one whole cron cycle before actually fetching anything, whereas it really could be immediate. I've been tearing my hair out trying to use the Item class to update content - I did try it, with "weird" results. It's particularly tricky since I often don't know if the item has been stored or not: this would be a lot easier if there was a post_remote hook that was called *after* the item has been stored. Or you could make these items real objects instead of arrays - you could have different classes for pre-store and post-store versions, but with similar APIs. I can give this another go though. DBA is nicer and the biggest task I've been tackling the last two weeks is moving all my queries to that class. But I can't figure out how to do left outer join with it - any suggestions / examples? Also I find DBA::update counterintuitive, is there a more detailed explanation of how that $old_fields parameter works?
annando commented 2019-10-13 19:35:24 +02:00 (Migrated from github.com)

Concerning the DBA calls: When nothing else is possible, you could always perform calls via DBA::p(...). Concerning DBA::update(): Just don't use the $old_fields parameter. This is optional and mostly used for testing if an update is needed at all.

For only being called on already posted items, you could use the post_remote_end and post_local_end hooks.

Concerning the DBA calls: When nothing else is possible, you could always perform calls via ```DBA::p(...)```. Concerning ```DBA::update()```: Just don't use the ```$old_fields``` parameter. This is optional and mostly used for testing if an update is needed at all. For only being called on already posted items, you could use the ```post_remote_end``` and ```post_local_end``` hooks.
mexon commented 2019-10-16 13:25:20 +02:00 (Migrated from github.com)

Will resubmit once I've implemented comments above

Will resubmit once I've implemented comments above
annando commented 2019-10-16 14:45:33 +02:00 (Migrated from github.com)

You can always do a "draft PR" as well. Then people can comment on your work while you are improving the PR - until you consider this okay.

You can always do a "draft PR" as well. Then people can comment on your work while you are improving the PR - until you consider this okay.

Pull request closed

Sign in to join this conversation.
No description provided.