Fix submit inserts #34
A wrong `sprintf` format string made all new directory submissions fail; this is now fixed. I'll try to manually resubmit all the inserts that failed this way that I can find in the logfile. I found 271 accounts that were never inserted since October 23rd. I can't reliably go back further because the logfile didn't record the worker PID, so I can't associate a particular outcome with a particular request. I can resubmit every URL that ever entered the logfile, though.
Found 5061 total URLs submitted since I set up my directory, resubmitting them all. Most already existed or, worse, don't exist anymore, but I've been able to add 306 profiles back to the directory.
The directory should be able to do some contact discovery at the known servers. We have the structure for that.
This manual pull was out of the ordinary, as the insert process was broken for a while, but yeah, I agree. Is there a feed of discoverable users I can pull from?
Every server (when not forbidden by configuration) is reachable via `/poco`. See for example https://pirati.ca/poco. There, all users who haven't objected to being listed are shown.
Other interesting paths are:

- `/poco/@server` - a list of all servers that this instance knows
- `/poco/@global` - all contacts that this server has recently contacted

The last two paths cover every federated network.
So a simple approach would be to regularly query each server for its own contacts, the servers it knows, and its recently contacted profiles to extend the sync pull list, then run the results through the usual submit process.
I'd rather not dabble in properties mapping from this output, but maybe I should? What do you think?
The optional contact discovery that can be activated in the Friendica core does exactly this: it queries all known servers for a list of their known servers, and then queries those servers for their local contacts and the public contacts they know.
For the directory I could imagine the server doing the first two steps (querying the known servers for a list of their servers, then querying those servers for a list of their local contacts). These queries are really fast, so they don't create any relevant load.
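The two-step discovery described above could be sketched roughly like this. This is only an illustration, not the directory's actual code: it assumes the `/poco` and `/poco/@server` endpoints return PortableContacts-style JSON with an `entry` list, and the function names are made up.

```python
import json
import urllib.request


def fetch_entries(base_url, path):
    """Fetch a /poco-style endpoint and return its "entry" list (or [])."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            return json.load(resp).get("entry", [])
    except Exception:
        return []  # unreachable or misconfigured server: just skip it


def discover(seed_servers):
    """Step 1: expand the server list; step 2: collect local contacts."""
    servers = set(seed_servers)
    for server in seed_servers:
        for entry in fetch_entries(server, "/poco/@server"):
            url = entry.get("url")
            if url:
                servers.add(url.rstrip("/"))
    profiles = set()
    for server in servers:
        for entry in fetch_entries(server, "/poco"):
            url = entry.get("profileUrl") or entry.get("url")
            if url:
                profiles.add(url)
    return profiles  # feed these into the usual submit process
```

Each profile URL collected this way would then go through the normal submit/validation path rather than being inserted directly.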
I don't understand your last sentence; I'm really asking myself what "dabbling" could mean. The translation was misleading.
Sorry, what I meant was that the `poco` output could theoretically be used to fill directory profile fields, but I'd rather leave that to the current (no)scrape process. What is it about the list of known public contacts that makes it slow?
Ah okay. The fields in `poco` are reliable as well; using them would avoid multiple network requests. The list of known public contacts is slower because there is a small difference between returning 40 and 4,000 contacts ;-)
I see the data structure allows for pagination, but I assume we didn't implement it yet?
Before I do that, though, I'd like to find a way to order the directory by last activity instead of last scraped. Maybe fetch the status page to retrieve the most recent post date?
Pagination isn't implemented, how do you know? ;-)
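For when pagination does get implemented, the paging loop could look something like this. I'm assuming the PortableContacts-style response carries `startIndex`, `itemsPerPage` and `totalResults` alongside `entry`, and that the server would honor start/count query parameters; neither side implements this yet, so treat it as a sketch.

```python
def paged_entries(fetch_page, page_size=200):
    """Collect all entries from a paginated /poco-style endpoint.

    fetch_page(start, count) must return a dict with an "entry" list
    and a "totalResults" count, mirroring the PortableContacts fields.
    """
    start = 0
    entries = []
    while True:
        page = fetch_page(start, page_size)
        batch = page.get("entry", [])
        entries.extend(batch)
        start += len(batch)
        # Stop on an empty page or once we've seen everything the
        # server claims to have.
        if not batch or start >= page.get("totalResults", 0):
            break
    return entries
```

The `fetch_page` callback keeps the transport out of the loop, so the same code works against a live server or a test stub.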
I could add a date with the last public activity to `noscrape`. That would be better, I guess.

That would be great indeed. It isn't a top priority, however, so don't get yourself worked up about it.
Have a look at https://pirati.ca/noscrape/heluecht - there is a field `last-activity` that shows the last activity (post or login), coarsened for privacy reasons to the format "year-week number" (for example "2017-44"). Do you think that this is okay?
Oh yeah, definitely! Thanks a lot!
Pull request is done.