[retriever] First submission of retriever plugin #898
No reviewers
Labels
No Label
2018.09
2019.01
2019.03
2019.06
2019.09
2019.12
2020.03
2020.06
2020.09
2020.12
2021.03
2021.07
2021.09
2022.02
2022.06
2022.09
2022.12
2023.04
2023.05
2023.09
2024.03
2024.06
dependencies
Hackathon 2021
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: friendica/friendica-addons#898
Loading…
Reference in New Issue
No description provided.
Delete Branch "mat/retriever"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This is my plugin to allow using Friendica to view entire news feeds, without having to click through to the upstream source. I use it together with mailstream to support offline news reading. It was really handy when I was living in China! There's probably other software that does the same thing, but this is mine :-P
I've actually been using this for 7 years. Sorry about the huge and long-delayed PR, but it'd be really nice if someone familiar with the core took a look at this.
In particular I don't really understand the relationship between the item and item-content tables, and data from the Item class vs what gets passed in to post_remote. I had to make my own retriever_get_body and retriever_set_body function to deal with this, which looks wrong.
The other thing that looks wrong is how I store pictures. This may seem like a redundant feature now that there's image proxying, but I think it's worth keeping since upstream content does sometimes disappear. However the pictures I download never get automatically deleted, so am I doing this wrong? For now I've made this an optional feature, defaulting to switched off.
Hey, better later than never!
Friendica now tries to comply with the PSR-2 coding standards, see https://friendica.mrpetovan.com/help/Developers-Intro#1_5_2
I'll let you look into it and make the necessary changes before I do a coding standard review of your code.
I've got a busy week coming up so it may be a while before I tackle the coding standards, but I'll get there.
Thanks for your PR! But I do have some general things about it. From a design perspective I would suggest a completely different approach. AFAIK you seem to have some kind of "retry" mechanism in it that is triggered by the cron mechanism.
I would suggest doing it completely different (and I desperately hope that this doesn't come rude). From the event when the content is posted, you could call a worker that will then do the update (if this was a feed). You can have a look at the Twitter addon on how worker calls can be performed for addons.
When the fetching of the content does fail, you can simply defer the worker (
Worker::defer()
). This will then automatically perform periodic retrials and - after some time of unsuccessful retrials - will remove the worker task.This should reduce the complexity of this addon a lot, since you wouldn't need any
finished
marker anymore and you wouldn't need to apply any retrial mechanism on your own.Also please don't do any direct database queries for the
item-content
table. Please use theItem
class instead to fetch and update the content. The whole structure of the item storage is very much encapsulated now, which is done to be able to change the database structure without problems.Also when creating the database structure file, please have a look at the field sizes for existing comparable fields. So
uid
should always bemediumint unsigned
, thecontact-id
should beint unsigned
and all boolean fields should have theboolean
type.Also please have a look at all calls to the
q(...)
function and replace them with the appropriateDBA::...
functions.There are also some other things in the code (like using the matching constants instead of using the constant value - or the way that the Logger entries had been built). But I think we should it step by step. It could be a longer process - but in the end the result should be shiny and brilliant :-)
@annando The worker idea sounds good! I've never really looked into it, and my retry mechanism is one of the most ancient and reliable parts of the code, but yeah it's a great use case for a worker system. It also probably solves a niggle I've wanted to solve for a while: it waits one whole cron cycle before actually fetching anything, whereas it really could be immediate.
I've been tearing my hair out trying to use the Item class to update content - I did try it, with "weird" results. It's particularly tricky since I often don't know if the item has been stored or not: this would be a lot easier if there was a post_remote hook that was called after the item has been stored. Or you could make these items real objects instead of arrays - you could have different classes for pre-store and post-store versions, but with similar APIs. I can give this another go though.
DBA is nicer and the biggest task I've been tackling the last two weeks is moving all my queries to that class. But I can't figure out how to do left outer join with it - any suggestions / examples? Also I find DBA::update counterintuitive, is there a more detailed explanation of how that $old_fields parameter works?
Concerning the DBA calls: When nothing else is possible, you could always perform calls via
DBA::p(...)
. ConcerningDBA::update()
: Just don't use the$old_fields
parameter. This is optional and mostly used for testing if an update is needed at all.For only being called on already posted items, you could use the
post_remote_end
andpost_local_end
hooks.Will resubmit once I've implemented comments above
You can always do a "draft PR" as well. Then people can comment on your work while you are improving the PR - until you consider this okay.
Pull request closed