Home
Robin van Boven edited this page 6 years ago

Directory concepts

Below is a breakdown of the concepts behind the directory's workings. Because the directory is part of a federated network, these concepts are relevant both to developers and to admins making configuration choices.

Submitting profiles

Friendica servers submit profiles that have opted in to the global directory. A submission is nothing more than a GET request containing the URL of a profile.

https://dir.example.com/submit?url=[profile_url]

The url parameter is hex encoded.
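As a rough illustration, a submission URL could be built as shown here. This is a minimal sketch, not the directory's own code: the function names are hypothetical, and it assumes that "hex encoded" means two lowercase hex digits per byte of the UTF-8 encoded profile URL.

```python
from urllib.parse import urlencode


def build_submit_url(directory_base: str, profile_url: str) -> str:
    """Return the GET URL that submits profile_url to a directory (hypothetical helper)."""
    # Assumption: the url parameter is the UTF-8 bytes of the profile URL,
    # written as two lowercase hex digits per byte.
    hex_url = profile_url.encode("utf-8").hex()
    return f"{directory_base.rstrip('/')}/submit?{urlencode({'url': hex_url})}"


def decode_submitted_url(hex_url: str) -> str:
    """Reverse the hex encoding, as a receiving directory would need to."""
    return bytes.fromhex(hex_url).decode("utf-8")
```

For example, build_submit_url("https://dir.example.com", "https://example.com/profile/alice") yields a URL whose url parameter starts with 68747470733a2f2f, the hex form of "https://".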

The directory then does its thing: it requests profile data, runs health checks, syncs, and so on, however it sees fit.

Scrape and noscrape

Originally, profile information was obtained by parsing the HTML of a profile page and finding the relevant information through class selectors. However, since this directory only keeps track of Friendica profiles, a JSON endpoint producing the same data, called noscrape, was introduced.

For every profile URL such as https://example.com/profile/alice, there is a counterpart https://example.com/noscrape/alice that produces the required information on later versions of Friendica.
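The mapping from a profile URL to its noscrape counterpart can be sketched as follows. The function name is hypothetical, and keeping any path prefix in front of /profile/ (for a Friendica installed in a subdirectory) is an assumption of this sketch rather than documented behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def noscrape_url(profile_url: str) -> str:
    """Derive the noscrape URL for a profile URL (hypothetical helper)."""
    parts = urlsplit(profile_url)
    # Split the path at the last "/profile/" segment; whatever follows is the nickname.
    prefix, separator, nickname = parts.path.rpartition("/profile/")
    if not separator or not nickname:
        raise ValueError(f"not a profile URL: {profile_url}")
    # Keep scheme and host, swap the segment, and drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, f"{prefix}/noscrape/{nickname}", "", ""))
```

A directory could try this endpoint first and fall back to scraping the HTML when the server is too old to provide it.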

Syncing

The syncing protocol allows anyone to set up a new, fully functional directory without asking for permission and with only limited trust placed in external directories. The only thing it shares is profile URLs. Every directory is then responsible for running its own scrapes, maintenance and health checking logic. However, it does allow admins to cooperate for performance reasons, if they want to, by coordinating their syncing settings.

Sync: Pushing

A directory can be configured to have push targets. This is a form of actively forwarding submissions.

When a directory with pushing enabled receives a submission, the URL is also stored in the push queue. The next time the sync cronjob runs, it takes a batch of configurable size from the push queue and submits each URL in the batch to every configured push target.

This happens in exactly the same way as a submission from a Friendica server: a GET request to /submit?url=[profile_url].
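One push round might look roughly like the sketch below. It only computes the GET URLs that would be requested; the function name, the in-memory queue and the lowercase UTF-8 hex encoding are assumptions made for illustration, not the directory's actual implementation.

```python
from collections import deque
from typing import Deque, List


def next_push_requests(push_queue: Deque[str], push_targets: List[str], batch_size: int) -> List[str]:
    """Take one batch from the queue and return the submit URLs for every target."""
    batch = []
    # Remove at most batch_size profile URLs from the front of the queue.
    while push_queue and len(batch) < batch_size:
        batch.append(push_queue.popleft())
    # Each profile URL in the batch is submitted to each push target.
    return [
        f"{target.rstrip('/')}/submit?url={profile_url.encode('utf-8').hex()}"
        for profile_url in batch
        for target in push_targets
    ]
```

Note that the number of requests per round is the batch size multiplied by the number of push targets, which is worth keeping in mind when tuning either setting.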

Warning! Currently there is no protection against infinite push loops. Take this into consideration when choosing where to push to and when monitoring performance.

The opposite risk also exists. When you set no push targets and your directory receives submissions, these profiles may go unnoticed by other directories unless they pull from your directory.

Sync: Pulling

A directory can be configured to have pull targets. This lets you actively pull submissions.

Every directory by default has syncing URLs available to pull from.

  • https://dir.example.com/sync/pull/all retrieves all current profile URLs.
  • https://dir.example.com/sync/pull/since/[when] retrieves profile URLs that have been modified or deleted since [when], a Unix timestamp.

When you configure a pull target, your directory will pull from these endpoints during the sync cronjob. The first time it will do a "full sync" by requesting /sync/pull/all. From there on it will mark each directory with a timestamp to use with the /sync/pull/since/[when] endpoint.
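The choice between the two endpoints can be sketched like this. The function name and the use of None to mean "never pulled before" are assumptions of this sketch; the response format of the endpoints is not covered here.

```python
from typing import Optional


def pull_endpoint(directory_base: str, last_pull: Optional[int]) -> str:
    """Pick the sync URL to request from a pull target (hypothetical helper).

    last_pull is the Unix timestamp stored for this target, or None when
    the target has never been pulled from.
    """
    base = directory_base.rstrip("/")
    if last_pull is None:
        # First contact: do a full sync.
        return f"{base}/sync/pull/all"
    # Later runs: only ask for changes since the stored timestamp.
    return f"{base}/sync/pull/since/{last_pull}"
```

For a target pulled before at timestamp 1500000000, this returns https://dir.example.com/sync/pull/since/1500000000.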

Because a pull may return a large number of profile URLs, the obtained URLs are stored in a queue in the database. Each time the sync cronjob runs, it processes a batch of configurable size from this queue.

Warning! Setting the batch size and/or number of threads to very high values may have a significant impact on the performance of both your own server and the Friendica servers hosting the profiles. Try to keep these values sane so you don't accidentally run a DoS attack. (The default values are a good reference for "sane".)

Site health

Directories track the health of a Friendica server with a health score, which internally is a value between -100 and +100. When this score drops below a threshold (-20 by default), profiles associated with this site are removed from the directory. Two groups of metrics are collected:

  1. Speeds measured when submissions are processed.
  2. Speeds and server information obtained from "probes".

The first group, submission-related metrics, is quite straightforward. When a profile is submitted, the directory needs to make requests to the Friendica server, and it records the time it takes to download the profile information, download the profile image and scrape the profile page. These timings are only used for display in reports and do not impact the health score (though that could be changed).

The second, health probes, inspects more than just speed. It makes a GET request to https://example.com/friendica/json and checks for speed, SSL issues (like self-signed or expired certificates), Friendica version, plugins used, registration policy and more.
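A very reduced probe could be sketched as below. It only measures the request time and records whether the request failed; the function names, the timeout value and the shape of the returned dictionary are assumptions, and no particular fields of the JSON response are assumed.

```python
import json
import time
import urllib.request


def probe_url(server_base: str) -> str:
    """Return the probe endpoint for a Friendica server."""
    return f"{server_base.rstrip('/')}/friendica/json"


def probe(server_base: str, timeout: float = 10.0) -> dict:
    """Request the probe endpoint and report timing, parsed data and any error."""
    started = time.monotonic()
    data, error = None, None
    try:
        with urllib.request.urlopen(probe_url(server_base), timeout=timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers connection failures, timeouts, HTTP errors and
        # certificate problems; ValueError covers undecodable or invalid JSON.
        error = str(exc)
    return {"elapsed_seconds": time.monotonic() - started, "data": data, "error": error}
```

Because certificate verification is on by default in urllib, a self-signed or expired certificate shows up here as an error rather than as a successful probe.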

Several of these values are then used to update the health score. This uses a somewhat biased and outdated algorithm found here. It may be worthwhile to review with the community what should be considered indicative of healthy servers.

Note! Request speed, and especially timeouts or failed requests, are strong indicators of health. When your directory is itself the bottleneck behind slow or failed requests, it will misrepresent the health score of the Friendica servers you're probing. This will eventually cause profiles to be deleted and "healthy public servers" to be removed too, even if those servers are in perfectly good shape.
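The score range and removal threshold described at the start of this section can be sketched as follows. The names are hypothetical, and treating a score exactly equal to the threshold as still acceptable is an assumption based on the wording "drops below". How the score itself is updated from probe results is not modelled here.

```python
MIN_SCORE = -100
MAX_SCORE = 100
DEFAULT_REMOVAL_THRESHOLD = -20


def clamp_score(score: int) -> int:
    """Keep a health score inside the -100..+100 range."""
    return max(MIN_SCORE, min(MAX_SCORE, score))


def should_remove_profiles(score: int, threshold: int = DEFAULT_REMOVAL_THRESHOLD) -> bool:
    """Return True when a site's profiles should be removed from the directory."""
    # Assumption: only scores strictly below the threshold trigger removal.
    return clamp_score(score) < threshold
```

With the default threshold, a site scoring -21 would have its profiles removed, while a site scoring -20 would not.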

Healthy public servers

As displayed in https://dir.example.com/servers, public servers have slightly higher requirements than the health score alone. They need:

  • An open registration policy.
  • HTTPS support (without serious issues).
  • A decent health score.
  • A minimum number of users.

Again, it may be worthwhile to review with the community what should be considered indicative of healthy servers. The idea, though, is to help new Friendica users find a "good choice" of public server.
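The four requirements for public servers listed above could be combined as in the sketch below. The class, the field names and especially the two numeric cut-offs are hypothetical placeholders, since the actual values used by the directory are not stated here.

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    """What a directory might know about a server (hypothetical structure)."""
    open_registration: bool
    https_ok: bool  # HTTPS available without serious issues
    health_score: int
    user_count: int


def is_healthy_public_server(status: ServerStatus, min_health: int = 50, min_users: int = 10) -> bool:
    """Check all four requirements; min_health and min_users are placeholder values."""
    return (
        status.open_registration
        and status.https_ok
        and status.health_score >= min_health
        and status.user_count >= min_users
    )
```

Keeping the cut-offs as parameters makes it easy to adjust them once the community agrees on what a "good choice" of server looks like.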