Fix submit inserts #34
A wrong `sprintf` format string made all new directory submissions fail; this is now fixed. I'll try to manually resubmit all the inserts that failed this way that I can find in the logfile. I found 271 accounts that were never inserted since October 23rd. I can't reliably go back further because the logfile didn't record the worker PID, so I can't associate a particular outcome with a particular request. I can resubmit every URL that ever entered the logfile, though.
Found 5061 total URLs submitted since I set up my directory, resubmitting them all. Most already existed or, worse, don't exist anymore, but I've been able to add 306 profiles back to the directory.
The directory should be able to do some contact discovery at the known servers. We have the structure for that.
This manual pull was out of the ordinary, as the insert process was broken for a while, but yeah, I agree. Is there a feed of discoverable users I can pull from?
Every server (when not forbidden by configuration) is reachable via `/poco`. See for example https://pirati.ca/poco. There, all users who haven't objected to being listed are shown.
Other interesting paths are:

- `/poco/@server` - a list of all servers that this instance knows
- `/poco/@global` - all contacts that this server has recently contacted

The last two paths cover every federated network.
So a simple approach would be to regularly query each server for its own contacts, the servers it knows, and its recently contacted profiles to extend the sync pull list, then run the results through the usual submit process.
I'd rather not dabble in properties mapping from this output, but maybe I should? What do you think?
The optional contact discovery that can be activated in the Friendica core does exactly this: it queries all known servers for a list of their known servers, and then queries those servers for their local contacts and the public contacts they know.
For the directory I could imagine the server doing the first two steps (querying the known servers for a list of their servers, then querying those servers for a list of their local contacts). These queries are really fast, so they don't create any relevant load.
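The two-step discovery described above could be sketched roughly like this. This is only an illustration, not the directory's actual code: it assumes the `/poco` and `/poco/@server` endpoints return PortableContacts-style JSON with an `entry` list, and the function names are made up.

```python
import json
import urllib.request


def fetch_entries(base_url, path):
    """Fetch a /poco-style endpoint and return its "entry" list (or [])."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            return json.load(resp).get("entry", [])
    except Exception:
        return []  # unreachable or misconfigured server: just skip it


def discover(seed_servers):
    """Step 1: expand the server list; step 2: collect local contacts."""
    servers = set(seed_servers)
    for server in seed_servers:
        for entry in fetch_entries(server, "/poco/@server"):
            url = entry.get("url")
            if url:
                servers.add(url.rstrip("/"))
    profiles = set()
    for server in servers:
        for entry in fetch_entries(server, "/poco"):
            url = entry.get("profileUrl") or entry.get("url")
            if url:
                profiles.add(url)
    return profiles  # feed these into the usual submit process
```

Each profile URL collected this way would then go through the normal submit/validation path rather than being inserted directly.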
I don't understand your last sentence; I'm really asking myself what "dabbling" could mean. The translation was misleading.
Sorry, what I meant was that the `poco` output could theoretically be used to fill directory profile fields, but I'd rather leave that to the current (no)scrape process. What is it about the list of known public contacts that makes it slow?
Ah okay. The fields in `poco` are reliable as well; using them would avoid multiple network requests. The list of known public contacts is slower because there is a small difference between returning 40 and 4,000 contacts ;-)
I see the data structure allows for pagination, but I assume we didn't implement it yet?
Before I do that, though, I'd like to find a way to order the directory by last activity instead of last scraped. Maybe fetch the status page to retrieve the most recent post date?
Pagination isn't implemented, how do you know? ;-)
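For when pagination does get implemented, the paging loop could look something like this. I'm assuming the PortableContacts-style response carries `startIndex`, `itemsPerPage` and `totalResults` alongside `entry`, and that the server would honor start/count query parameters; neither side implements this yet, so treat it as a sketch.

```python
def paged_entries(fetch_page, page_size=200):
    """Collect all entries from a paginated /poco-style endpoint.

    fetch_page(start, count) must return a dict with an "entry" list
    and a "totalResults" count, mirroring the PortableContacts fields.
    """
    start = 0
    entries = []
    while True:
        page = fetch_page(start, page_size)
        batch = page.get("entry", [])
        entries.extend(batch)
        start += len(batch)
        # Stop on an empty page or once we've seen everything the
        # server claims to have.
        if not batch or start >= page.get("totalResults", 0):
            break
    return entries
```

The `fetch_page` callback keeps the transport out of the loop, so the same code works against a live server or a test stub.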
I could add a date with the last public activity to `noscrape`. That would be better, I guess.

That would be great indeed. It isn't a top priority, however, so don't get yourself worked up about it.
Have a look at https://pirati.ca/noscrape/heluecht - there is a field `last-activity` that shows the last activity (post or login), coarsened for privacy reasons to the format "year-week number" (for example "2017-44"). Do you think that this is okay?
Oh yeah, definitely! Thanks a lot!
Pull request is done.