Merge pull request #1 from Beanow/develop

Release new features
RedMatrix 2014-08-10 09:06:31 +10:00
commit 1bde8a76c4
29 changed files with 2439 additions and 423 deletions

.htconfig.php Normal file

@ -0,0 +1,78 @@
<?php
//MySQL host.
$db_host = 'localhost';
$db_user = 'friendica-dir';
$db_pass = 'thisisyourpasswordbuddy';
$db_data = 'friendica-dir';
// Choose a valid default timezone. If you are unsure, use "America/Los_Angeles".
// It can be changed later and only applies to timestamps for anonymous viewers.
$default_timezone = 'Europe/Amsterdam';
// What is your site name?
$a->config['sitename'] = "EXPERIMENTAL Friendica public directory";
//Settings related to the syncing feature.
$a->config['syncing'] = array(
//Pulling may be quite intensive at first when it has to do a full sync and your directory is empty.
//This timeout should be shorter than your cronjob interval. Preferably with a little breathing room.
'timeout' => 3*60, //3 minutes
//Push new submits to the `sync-target` entries?
'enable_pushing' => true,
//Maximum number of items per batch per target to push to other sync-targets.
//For example: 3 targets x 20 items = 60 requests.
'max_push_items' => 10,
//Pull updates from the `sync-target` entries?
'enable_pulling' => true,
//This is your normal amount of threads for pulling.
//With regular intervals, there's no need to give this a high value.
//But when your server is brand new, you may want to keep this high for the first day or two.
'pulling_threads' => 25,
//How many items should we crawl per sync?
'max_pull_items' => 250
);
//Things related to site-health monitoring.
$a->config['site-health'] = array(
//Wait for at least ... before probing a site again.
//The longer this value, the more "stable" site health scores will be over time.
//Note: If a site with bad (negative) health submits something, a probe will be performed regardless.
'min_probe_delay' => 3*24*3600, // 3 days
//Probes get a simple /friendica/json file from the server.
//Feel free to set this timeout to a very tight value.
'probe_timeout' => 5, // seconds
//Imports should be fast. Feel free to prioritize healthy sites.
'skip_import_threshold' => -20
);
//Things related to the maintenance cronjob.
$a->config['maintenance'] = array(
//This is to prevent I/O blocking. Will cost you some RAM overhead though.
//A good server should handle much more than this default, so you can tweak this.
'threads' => 10,
//Limit the amount of scrapes per execution of the maintainer.
//This will depend a lot on the frequency with which you call the maintainer.
//If you have 10 threads and 80 max_scrapes, that means each thread will handle 8 scrapes.
'max_scrapes' => 80,
//Wait for at least ... before scraping a profile again.
'min_scrape_delay' => 3*24*3600, // 3 days
//At which health value should we start removing profiles?
'remove_profile_health_threshold' => -60
);


@ -1,4 +1,96 @@
# Friendica Global Directory

Example cronjob:
```
*/30 * * * * www-data cd /var/www/friendica-directory; php include/cron_maintain.php
*/5 * * * * www-data cd /var/www/friendica-directory; php include/cron_sync.php
```
## How syncing works
Syncing consists of two new features: pushing and pulling.
### Pushing
Submissions you receive can be submitted to other directories using a push target.
You do this by creating an entry in the `sync-targets` table with the `push` bit set to `1`.
Pushing must also be enabled in your `.htconfig.php` settings.
The next time `include/cron_sync.php` is run from your cronjob, the queued items will be submitted to your push targets.
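For example, registering another directory as a push target could look like this (the hostname is hypothetical; per the schema the `push` bit already defaults to `1`):

```sql
INSERT INTO `sync-targets` (`base_url`, `push`, `pull`)
VALUES ('https://dir.example.com', b'1', b'0');
```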
### Pulling
For pulling to work, the target server must have pulling enabled.
This makes the `/sync/pull/all` and `/sync/pull/since/[when]` methods work on that server.
Next, you can add an entry in the sync-targets table with the pull bit set to `1`.
Pulling must also be enabled in your `.htconfig.php` settings.
The next time `include/cron_sync.php` is run from your cronjob, the pulling sources will be checked.
New items will be queued in your pull queue.
The queue is gradually cleared based on your `syncing.max_pull_items` setting.
You can check the backlog of this queue at the `/admin` page.
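Registering a pull source is similar, with the `pull` bit set instead (hypothetical hostname; leaving `dt_last_pull` at its `NULL` default should make the first run perform a full pull rather than `/sync/pull/since/[when]`):

```sql
INSERT INTO `sync-targets` (`base_url`, `push`, `pull`)
VALUES ('https://dir.example.com', b'0', b'1');
```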
## How submissions are processed
1. The `/submit` endpoint takes a `?url=` parameter.
This parameter is a base16-encoded URL: the original ASCII is treated as binary and hex encoded.
The URL should point to a profile location, such as `https://fc.oscp.info/profile/admin`.
This URL is checked against the database for existing accounts.
The check normalizes the URL, ignoring http vs. https as well as `www.` prefixes.
2. If noscrape is supported by the site, it will be used instead of a scrape request.
In this case: `https://fc.oscp.info/noscrape/admin`.
If noscrape fails or is not supported, the URL provided (as-is) will be scraped for meta information:
* `<meta name="dfrn-global-visibility" content="true" />`
* `<meta name="friendica.community" content="true" />`
or `<meta name="friendika.community" content="true" />`
* `<meta name="keywords" content="these,are,your,public,tags" />`
* `<link rel="dfrn-*" href="https://fc.oscp.info/*" />`
any `dfrn-*` prefixed link and its `href` attribute.
* `.vcard .fn` as `fn`
* `.vcard .title` as `pdesc`
* `.vcard .photo` as `photo`
* `.vcard .key` as `key`
* `.vcard .locality` as `locality`
* `.vcard .region` as `region`
* `.vcard .postal-code` as `postal-code`
* `.vcard .country-name` as `country-name`
* `.vcard .x-gender` as `gender`
* `.marital-text` as `marital`
3. If the `dfrn-global-visibility` value is set to false, any existing records will be deleted
and the process exits here.
4. A submission is IGNORED when any of the following required data could not be scraped:
* `key` the public key from the hCard.
* `dfrn-request` required for the DFRN protocol.
* `dfrn-confirm` required for the DFRN protocol.
* `dfrn-notify` required for the DFRN protocol.
* `dfrn-poll` required for the DFRN protocol.
5. If the profile existed in the database and is not explicitly set to
public using the `dfrn-global-visibility` meta tag, it will be deleted.
6. If the profile existed in the database and the profile lacks either an `fn` or `photo`
attribute, it will be deleted.
7. The profile is now inserted/updated based on the found information.
Notable database fields are:
* `homepage` the original (decoded) `?url=` parameter.
* `nurl` the normalized URL, which strips http vs. https and `www.` prefix differences.
* `created` the creation date and time in UTC (now if the entry did not exist yet).
* `updated` the current date and time in UTC.
8. If an insert has occurred, the URL will now be used to check for duplicates.
The highest insert ID will be kept; all others are deleted.
9. If provided, your public tags are now split by ` ` (space character) and stored in the tags table.
This uses your normalized URL as unique key for your profile.
10. The `photo` provided will be downloaded and resized to 80x80, regardless of source size.
11. Should an error somehow have occurred at this point, such as no profile ID being known,
everything will be deleted based on the original `?url=` parameter.
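Two of the steps above can be sketched in shell: the base16 encoding of the `?url=` parameter (step 1) and the normalization behind the `nurl` field (step 7). This is an illustrative approximation, not the directory's actual code:

```shell
#!/bin/sh
url='https://fc.oscp.info/profile/admin'

# Step 1: base16-encode the profile URL for the /submit endpoint.
encoded=$(printf '%s' "$url" | xxd -p | tr -d '\n')
echo "/submit?url=$encoded"

# Step 7 (sketch): strip the scheme and a leading www. to get the normalized URL.
nurl=$(printf '%s' "$url" | sed -E 's#^https?://##; s#^www\.##')
echo "$nurl"
```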


@ -227,11 +227,12 @@ function t($s) {
if(! function_exists('fetch_url')) {
function fetch_url($url, $binary = false, $timeout = 20) {
$ch = curl_init($url);
if(! $ch) return false;
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, max(intval($timeout), 1)); //Minimum of 1 second timeout.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 8);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);


@ -144,3 +144,84 @@ CREATE TABLE IF NOT EXISTS `user` (
`password` char(255) NOT NULL,
PRIMARY KEY (`uid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `site-health`
--
CREATE TABLE IF NOT EXISTS `site-health` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`base_url` varchar(255) NOT NULL,
`health_score` int(11) NOT NULL DEFAULT 0,
`no_scrape_url` varchar(255) NULL DEFAULT NULL,
`dt_first_noticed` datetime NOT NULL,
`dt_last_seen` datetime NULL DEFAULT NULL,
`dt_last_probed` datetime NULL DEFAULT NULL,
`dt_last_heartbeat` datetime NULL DEFAULT NULL,
`name` varchar(255) NULL DEFAULT NULL,
`version` varchar(255) NULL DEFAULT NULL,
`plugins` text NULL,
`reg_policy` char(32) NULL DEFAULT NULL,
`info` text NULL,
`admin_name` varchar(255) NULL DEFAULT NULL,
`admin_profile` varchar(255) NULL DEFAULT NULL,
`ssl_state` bit(1) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `base_url` (`base_url`),
KEY `health_score` (`health_score`),
KEY `dt_last_seen` (`dt_last_seen`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `site-probe`
--
CREATE TABLE IF NOT EXISTS `site-probe` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site_health_id` int(10) unsigned NOT NULL,
`dt_performed` datetime NOT NULL,
`request_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `site_health_id` (`site_health_id`),
KEY `dt_performed` (`dt_performed`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `site-scrape`
--
CREATE TABLE IF NOT EXISTS `site-scrape` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site_health_id` int(10) unsigned NOT NULL,
`dt_performed` datetime NOT NULL,
`request_time` int(10) unsigned NOT NULL,
`scrape_time` int(10) unsigned NOT NULL,
`photo_time` int(10) unsigned NOT NULL,
`total_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `site_health_id` (`site_health_id`),
KEY `dt_performed` (`dt_performed`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `sync-targets`
--
CREATE TABLE IF NOT EXISTS `sync-targets` (
`base_url` varchar(255) NOT NULL,
`pull` bit(1) NOT NULL DEFAULT b'0',
`push` bit(1) NOT NULL DEFAULT b'1',
`dt_last_pull` bigint unsigned NULL DEFAULT NULL,
PRIMARY KEY (`base_url`),
KEY `push` (`push`),
KEY `pull` (`pull`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `sync-push-queue`
--
CREATE TABLE IF NOT EXISTS `sync-push-queue` (
`url` varchar(255) NOT NULL,
PRIMARY KEY (`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `sync-pull-queue`
--
CREATE TABLE IF NOT EXISTS `sync-pull-queue` (
`url` varchar(255) NOT NULL,
PRIMARY KEY (`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
-- --------------------------------------------------------
--
-- Table structure for table `sync-timestamps`
--
CREATE TABLE IF NOT EXISTS `sync-timestamps` (
`url` varchar(255) NOT NULL,
`modified` datetime NOT NULL,
PRIMARY KEY (`url`),
KEY `modified` (`modified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
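As an illustration (not part of the schema file), the health data these tables store can be inspected with a simple query, for example to list the unhealthiest sites first:

```sql
SELECT `base_url`, `health_score`, `dt_last_probed`
FROM `site-health`
ORDER BY `health_score` ASC
LIMIT 10;
```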


@ -11,86 +11,161 @@ function attribute_contains($attr,$s) {
}}
if(! function_exists('noscrape_dfrn')) {
function noscrape_dfrn($url) {
$submit_noscrape_start = microtime(true);
$data = fetch_url($url);
$submit_noscrape_request_end = microtime(true);
if(empty($data)) return false;
$parms = json_decode($data, true);
if(!$parms || !count($parms)) return false;
$parms['tags'] = implode(' ', (array)$parms['tags']);
$submit_noscrape_end = microtime(true);
$parms['_timings'] = array(
'fetch' => round(($submit_noscrape_request_end - $submit_noscrape_start) * 1000),
'scrape' => round(($submit_noscrape_end - $submit_noscrape_request_end) * 1000)
);
return $parms;
}}
if(! function_exists('scrape_dfrn')) {
function scrape_dfrn($url, $max_nodes=3500) {
$minNodes = 100; //Let's do at least 100 nodes per type.
$timeout = 10; //Timeout will affect batch processing.
//Try and cheat our way into faster profiles.
if(strpos($url, 'tab=profile') === false){
$url .= (strpos($url, '?') > 0 ? '&' : '?').'tab=profile';
}
$scrape_start = microtime(true);
$ret = array();
$s = fetch_url($url, false, $timeout);
$scrape_fetch_end = microtime(true);
if(! $s)
return $ret;
$dom = HTML5_Parser::parse($s);
if(! $dom)
return $ret;
$items = $dom->getElementsByTagName('meta');
// get meta elements
$nodes_left = max(intval($max_nodes), $minNodes);
$targets = array('hide', 'comm', 'tags');
$targets_left = count($targets);
foreach($items as $item) {
$x = $item->getAttribute('name');
if($x == 'dfrn-global-visibility') {
$z = strtolower(trim($item->getAttribute('content')));
if($z != 'true')
$ret['hide'] = 1;
if($z === 'false')
$ret['explicit-hide'] = 1;
$targets_left = pop_scrape_target($targets, 'hide');
}
if($x == 'friendika.community' || $x == 'friendica.community') {
$z = strtolower(trim($item->getAttribute('content')));
if($z == 'true')
$ret['comm'] = 1;
$targets_left = pop_scrape_target($targets, 'comm');
}
if($x == 'keywords') {
$z = str_replace(',',' ',strtolower(trim($item->getAttribute('content'))));
if(strlen($z))
$ret['tags'] = $z;
$targets_left = pop_scrape_target($targets, 'tags');
}
$nodes_left--;
if($nodes_left <= 0 || $targets_left <= 0) break;
}
$items = $dom->getElementsByTagName('link');
// get DFRN link elements
$nodes_left = max(intval($max_nodes), $minNodes);
foreach($items as $item) {
$x = $item->getAttribute('rel');
if(substr($x,0,5) == "dfrn-")
$ret[$x] = $item->getAttribute('href');
$nodes_left--;
if($nodes_left <= 0) break;
}
// Pull out hCard profile elements
$nodes_left = max(intval($max_nodes), $minNodes);
$items = $dom->getElementsByTagName('*');
$targets = array('fn', 'pdesc', 'photo', 'key', 'locality', 'region', 'postal-code', 'country-name', 'gender', 'marital');
$targets_left = count($targets);
foreach($items as $item) {
if(attribute_contains($item->getAttribute('class'), 'vcard')) {
$level2 = $item->getElementsByTagName('*');
foreach($level2 as $x) {
if(attribute_contains($x->getAttribute('class'),'fn')){
$ret['fn'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'fn');
}
if(attribute_contains($x->getAttribute('class'),'title')){
$ret['pdesc'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'pdesc');
}
if(attribute_contains($x->getAttribute('class'),'photo')){
$ret['photo'] = $x->getAttribute('src');
$targets_left = pop_scrape_target($targets, 'photo');
}
if(attribute_contains($x->getAttribute('class'),'key')){
$ret['key'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'key');
}
if(attribute_contains($x->getAttribute('class'),'locality')){
$ret['locality'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'locality');
}
if(attribute_contains($x->getAttribute('class'),'region')){
$ret['region'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'region');
}
if(attribute_contains($x->getAttribute('class'),'postal-code')){
$ret['postal-code'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'postal-code');
}
if(attribute_contains($x->getAttribute('class'),'country-name')){
$ret['country-name'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'country-name');
}
if(attribute_contains($x->getAttribute('class'),'x-gender')){
$ret['gender'] = $x->textContent;
$targets_left = pop_scrape_target($targets, 'gender');
}
}
}
if(attribute_contains($item->getAttribute('class'),'marital-text')){
$ret['marital'] = $item->textContent;
$targets_left = pop_scrape_target($targets, 'marital');
}
$nodes_left--;
if($nodes_left <= 0 || $targets_left <= 0) break;
}
$scrape_end = microtime(true);
$fetch_time = round(($scrape_fetch_end - $scrape_start) * 1000);
$scrape_time = round(($scrape_end - $scrape_fetch_end) * 1000);
$ret['_timings'] = array(
'fetch' => $fetch_time,
'scrape' => $scrape_time
);
return $ret;
}}
@ -110,3 +185,10 @@ function validate_dfrn($a) {
return $errors;
}}
if(! function_exists('pop_scrape_target')) {
function pop_scrape_target(&$array, $name) {
$at = array_search($name, $array);
if($at !== false) unset($array[$at]); //Guard: array_search() returns false when $name is absent.
return count($array);
}}

include/cron_maintain.php Normal file

@ -0,0 +1,114 @@
<?php
// Debug stuff.
// ini_set('display_errors', 1);
// ini_set('log_errors','0');
error_reporting(E_ALL^E_NOTICE);
$start_maintain = time();
$verbose = isset($argv[1]) && $argv[1] === 'verbose';
//Startup.
require_once('boot.php');
$a = new App;
//Config and DB.
require_once(".htconfig.php");
require_once("dba.php");
$db = new dba($db_host, $db_user, $db_pass, $db_data, $install);
//Get our set of items. Youngest items first, after the threshold.
//This may be counter-intuitive, but is to prevent items that fail to update from blocking the rest.
$res = q(
"SELECT `id`, `homepage`, `censored` FROM `profile` WHERE `updated` < '%s' ORDER BY `updated` DESC LIMIT %u",
dbesc(date('Y-m-d H:i:s', time()-$a->config['maintenance']['min_scrape_delay'])),
intval($a->config['maintenance']['max_scrapes'])
);
//Nothing to do.
if(!$res || !count($res)){
exit;
}
//Close DB here. Threads need their private connection.
$db->getdb()->close();
//We need the scraper.
require_once('include/submit.php');
//POSIX threads only.
if(!function_exists('pcntl_fork')){
logger('Error: no pcntl_fork support. Are you running a different OS? Report an issue please.');
die('Error: no pcntl_fork support. Are you running a different OS? Report an issue please.');
}
//Create the threads we need.
$items = count($res);
$threadc = min($a->config['maintenance']['threads'], $items); //Don't need more threads than items.
$threads = array();
//Debug...
if($verbose) echo("Creating $threadc maintainer threads for $items profiles.".PHP_EOL);
logger("Creating $threadc maintainer threads for $items profiles.");
for($i = 0; $i < $threadc; $i++){
$pid = pcntl_fork();
if($pid === -1){
if($verbose) echo('Error: something went wrong with the fork. '.pcntl_strerror());
logger('Error: something went wrong with the fork. '.pcntl_strerror());
die('Error: something went wrong with the fork. '.pcntl_strerror());
}
//You're a child, go do some labor!
if($pid === 0) break;
//Store the list of PID's.
if($pid > 0) $threads[] = $pid;
}
//The work for child processes.
if($pid === 0){
//Let's be nice, we're only doing maintenance here...
pcntl_setpriority(5);
//Each thread gets its own private DB connection.
$db = new dba($db_host, $db_user, $db_pass, $db_data, $install);
//Get our (round-robin) workload from the DB results.
$myIndex = $i+1;
$workload = array();
while(isset($res[$i])){
$entry = $res[$i];
$workload[] = $entry;
$ids[] = $entry['id'];
$i+=$threadc;
}
while(count($workload)){
$entry = array_pop($workload);
set_time_limit(20); //This should work for 1 submit.
if($verbose) echo "Submitting ".$entry['homepage'].PHP_EOL;
run_submit($entry['homepage']);
}
exit;
}
//The main process.
else{
foreach($threads as $pid){
pcntl_waitpid($pid, $status);
if($status !== 0){
if($verbose) echo "Bad process return value $pid:$status".PHP_EOL;
logger("Bad process return value $pid:$status");
}
}
$time = time() - $start_maintain;
if($verbose) echo("Maintenance completed. Took $time seconds.".PHP_EOL);
logger("Maintenance completed. Took $time seconds.");
}

include/cron_sync.php Normal file

@ -0,0 +1,93 @@
<?php
/*
#TODO:
* First do the pulls then the pushes.
If pull prevents the push, the push queue just creates a backlog until it gets a chance to push.
* When doing a first-pull, there's a safety mechanism for the timeout and detecting duplicate attempts.
1. Perform all JSON pulls on the source servers.
2. Combine the results into one giant pool of URLs.
3. Write this pool to a file (TODO-file).
4. Shuffle the pool in RAM.
5. Start threads for crawling.
6. Every finished crawl attempt (successful or not) should write to a 2nd file (DONE-file).
IF the first-pull times out, don't do anything else.
Otherwise, mark the dates we last performed a pull from each server.
* When resuming a first-pull.
1. Check for the TODO-file and the DONE-file.
2. Remove the entries in the DONE-file from the pool in the TODO-file.
3. Replace the TODO-file with the updated pool.
4. Perform steps 4, 5 and 6 (shuffle, create threads and crawl) from before.
This way you can resume without repeating attempts.
* Write documentation about syncing.
* Create "official" directory policy for my directory.
* Decide if a retry mechanism is desirable for pulling (for the failed attempts).
After all, you did imply trust when you indicated to pull from that source...
This could be done easily by doing a /sync/pull/all again from those sources.
* Decide if cron_sync.php should be split into push, pull and pull-all commands.
*/
// Debug stuff.
ini_set('display_errors', 1);
ini_set('log_errors','0');
error_reporting(E_ALL^E_NOTICE);
$start_syncing = time();
//Startup.
require_once('boot.php');
$a = new App;
//Create a simple log function for CLI use.
global $verbose;
$verbose = isset($argv[1]) && $argv[1] === 'verbose';
function msg($message, $fatal=false){
global $verbose;
if($verbose || $fatal) echo($message.PHP_EOL);
logger($message);
if($fatal) exit(1);
};
//Config.
require_once(".htconfig.php");
//Connect the DB.
require_once("dba.php");
$db = new dba($db_host, $db_user, $db_pass, $db_data, $install);
//Import syncing functions.
require_once('sync.php');
//Get work for pulling.
$pull_batch = get_pulling_job($a);
//Get work for pushing.
list($push_targets, $push_batch) = get_pushing_job($a);
//Close the connection for now. Process forking and DB connections are not the best of friends.
$db->getdb()->close();
if(count($pull_batch))
run_pulling_job($a, $pull_batch, $db_host, $db_user, $db_pass, $db_data, $install);
//Do our multi-fork job, if we have a batch and targets.
if(count($push_targets) && count($push_batch))
run_pushing_job($push_targets, $push_batch, $db_host, $db_user, $db_pass, $db_data, $install);
//Log the time it took.
$time = time() - $start_syncing;
msg("Syncing completed. Took $time seconds.");

include/g.line-min.js vendored Normal file

@ -0,0 +1,15 @@
/*!
* g.Raphael 0.51 - Charting library, based on Raphaël
*
* Copyright (c) 2009-2012 Dmitry Baranovskiy (http://g.raphaeljs.com)
* Licensed under the MIT (http://www.opensource.org/licenses/mit-license.php) license.
*/
(function(){function S(h,o){for(var p=h.length/o,m=0,k=p,b=0,i=[];m<h.length;)k--,0>k?(b+=h[m]*(1+k),i.push(b/p),b=h[m++]*-k,k+=p):b+=1*h[m++];return i}function E(h,o,p,m,k,b,i,c){var F,f,u,w;function J(a){for(var s=[],e=0,G=b.length;e<G;e++)s=s.concat(b[e]);s.sort(function(a,e){return a-e});for(var c=[],g=[],e=0,G=s.length;e<G;e++)s[e]!=s[e-1]&&c.push(s[e])&&g.push(o+d+(s[e]-v)*A);for(var s=c,G=s.length,l=a||h.set(),e=0;e<G;e++){var c=g[e]-(g[e]-(g[e-1]||o))/2,f=((g[e+1]||o+m)-g[e])/2+(g[e]-(g[e-
1]||o))/2,j;a?j={}:l.push(j=h.rect(c-1,p,Math.max(f+1,1),k).attr({stroke:"none",fill:"#000",opacity:0}));j.values=[];j.symbols=h.set();j.y=[];j.x=g[e];j.axis=s[e];for(var f=0,r=i.length;f<r;f++)for(var c=b[f]||b[0],n=0,u=c.length;n<u;n++)c[n]==s[e]&&(j.values.push(i[f][n]),j.y.push(p+k-d-(i[f][n]-y)*H),j.symbols.push(q.symbols[f][n]));a&&a.call(j)}!a&&(t=l)}function N(a){for(var g=a||h.set(),e,c=0,j=i.length;c<j;c++)for(var f=0,m=i[c].length;f<m;f++){var l=o+d+((b[c]||b[0])[f]-v)*A,n=o+d+((b[c]||
b[0])[f?f-1:1]-v)*A,r=p+k-d-(i[c][f]-y)*H;a?e={}:g.push(e=h.circle(l,r,Math.abs(n-l)/2).attr({stroke:"#000",fill:"#000",opacity:1}));e.x=l;e.y=r;e.value=i[c][f];e.line=q.lines[c];e.shade=q.shades[c];e.symbol=q.symbols[c][f];e.symbols=q.symbols[c];e.axis=(b[c]||b[0])[f];a&&a.call(e)}!a&&(C=g)}c=c||{};h.raphael.is(b[0],"array")||(b=[b]);h.raphael.is(i[0],"array")||(i=[i]);for(var d=c.gutter||10,l=Math.max(b[0].length,i[0].length),O=c.symbol||"",P=c.colors||this.colors,t=null,C=null,q=h.set(),g=[],a=
0,n=i.length;a<n;a++)l=Math.max(l,i[a].length);for(var K=h.set(),a=0,n=i.length;a<n;a++)c.shade&&K.push(h.path().attr({stroke:"none",fill:P[a],opacity:c.nostroke?1:0.3})),i[a].length>m-2*d&&(i[a]=S(i[a],m-2*d),l=m-2*d),b[a]&&b[a].length>m-2*d&&(b[a]=S(b[a],m-2*d));var g=Array.prototype.concat.apply([],b),l=Array.prototype.concat.apply([],i),g=this.snapEnds(Math.min.apply(Math,g),Math.max.apply(Math,g),b[0].length-1),v=g.from,g=g.to,l=this.snapEnds(Math.min.apply(Math,l),Math.max.apply(Math,l),i[0].length-
1),y=l.from,a=l.to,A=(m-2*d)/(g-v||1),H=(k-2*d)/(a-y||1),l=h.set();c.axis&&(n=(c.axis+"").split(/[,\s]+/),+n[0]&&l.push(this.axis(o+d,p+d,m-2*d,v,g,c.axisxstep||Math.floor((m-2*d)/20),2,h)),+n[1]&&l.push(this.axis(o+m-d,p+k-d,k-2*d,y,a,c.axisystep||Math.floor((k-2*d)/20),3,h)),+n[2]&&l.push(this.axis(o+d,p+k-d,m-2*d,v,g,c.axisxstep||Math.floor((m-2*d)/20),0,h)),+n[3]&&l.push(this.axis(o+d,p+k-d,k-2*d,y,a,c.axisystep||Math.floor((k-2*d)/20),1,h)));for(var Q=h.set(),R=h.set(),E,a=0,n=i.length;a<n;a++){c.nostroke||
Q.push(E=h.path().attr({stroke:P[a],"stroke-width":c.width||2,"stroke-linejoin":"round","stroke-linecap":"round","stroke-dasharray":c.dash||""}));for(var D=Raphael.is(O,"array")?O[a]:O,I=h.set(),g=[],j=0,T=i[a].length;j<T;j++){var x=o+d+((b[a]||b[0])[j]-v)*A,z=p+k-d-(i[a][j]-y)*H;(Raphael.is(D,"array")?D[j]:D)&&I.push(h[Raphael.is(D,"array")?D[j]:D](x,z,3*(c.width||2)).attr({fill:P[a],stroke:"none"}));if(c.smooth){if(j&&j!=T-1){f=o+d+((b[a]||b[0])[j-1]-v)*A;var L=p+k-d-(i[a][j-1]-y)*H;u=x;w=z;var r=
o+d+((b[a]||b[0])[j+1]-v)*A,B=p+k-d-(i[a][j+1]-y)*H,M=(u-f)/2;F=(r-u)/2;f=Math.atan((u-f)/Math.abs(w-L));r=Math.atan((r-u)/Math.abs(w-B));f=L<w?Math.PI-f:f;r=B<w?Math.PI-r:r;B=Math.PI/2-(f+r)%(2*Math.PI)/2;L=M*Math.sin(B+f);f=M*Math.cos(B+f);M=F*Math.sin(B+r);r=F*Math.cos(B+r);F=u-L;f=w+f;u+=M;w+=r;g=g.concat([F,f,x,z,u,w])}j||(g=["M",x,z,"C",x,z])}else g=g.concat([j?"L":"M",x,z])}c.smooth&&(g=g.concat([x,z,x,z]));R.push(I);c.shade&&K[a].attr({path:g.concat(["L",x,p+k-d,"L",o+d+((b[a]||b[0])[0]-v)*
A,p+k-d,"z"]).join(",")});!c.nostroke&&E.attr({path:g.join(",")})}q.push(Q,K,R,l,t,C);q.lines=Q;q.shades=K;q.symbols=R;q.axis=l;q.hoverColumn=function(a,c){!t&&J();t.mouseover(a).mouseout(c);return this};q.clickColumn=function(a){!t&&J();t.click(a);return this};q.hrefColumn=function(a){var c=h.raphael.is(arguments[0],"array")?arguments[0]:arguments;if(!(arguments.length-1)&&typeof a=="object")for(var e in a)for(var b=0,d=t.length;b<d;b++)t[b].axis==e&&t[b].attr("href",a[e]);!t&&J();b=0;for(d=c.length;b<
d;b++)t[b]&&t[b].attr("href",c[b]);return this};q.hover=function(a,b){!C&&N();C.mouseover(a).mouseout(b);return this};q.click=function(a){!C&&N();C.click(a);return this};q.each=function(a){N(a);return this};q.eachColumn=function(a){J(a);return this};return q}var I=function(){};I.prototype=Raphael.g;E.prototype=new I;Raphael.fn.linechart=function(h,o,p,m,k,b,i){return new E(this,h,o,p,m,k,b,i)}})();

include/g.raphael.js Normal file

File diff suppressed because one or more lines are too long

include/raphael.js Normal file

File diff suppressed because one or more lines are too long

include/site-health.php Normal file

@ -0,0 +1,342 @@
<?php
/*
Based on a submitted URL, take note of the site it mentions.
Ensures that the site health will be tracked if it wasn't already.
If $check_health is set to true, this function may trigger some health checks (CURL requests) when needed.
Do not enable it unless you have enough execution time to do so.
But when you do, it's better to check for health whenever a site submits something.
After all, the highest chance for the server to be online is when it submits activity.
*/
if(! function_exists('notice_site')){
function notice_site($url, $check_health=false)
{
global $a;
//Parse the domain from the URL.
$site = parse_site_from_url($url);
//Search for it in the site-health table.
$result = q(
"SELECT * FROM `site-health` WHERE `base_url`= '%s' ORDER BY `id` ASC LIMIT 1",
dbesc($site)
);
//If it exists, see if we need to update any flags / statuses.
if(!empty($result) && isset($result[0])){
$entry = $result[0];
//If we are allowed to do health checks...
if(!!$check_health){
//And the site is in bad health currently, do a check now.
//This is because you have a high certainty the site may perform better now.
if($entry['health_score'] < -40){
run_site_probe($entry['id'], $entry);
}
//Or if the site has not been probed for longer than the minimum delay.
//This is to make sure not everything is postponed to the batches.
elseif(strtotime($entry['dt_last_probed']) < time()-$a->config['site-health']['min_probe_delay']){
run_site_probe($entry['id'], $entry);
}
}
}
//If it does not exist.
else{
//Add it and make sure it is ready for probing.
q(
"INSERT INTO `site-health` (`base_url`, `dt_first_noticed`) VALUES ('%s', NOW())",
dbesc($site)
);
//And in case we should probe now, do so.
if(!!$check_health){
$result = q(
"SELECT * FROM `site-health` WHERE `base_url`= '%s' ORDER BY `id` ASC LIMIT 1",
dbesc($site)
);
if(!empty($result) && isset($result[0])){
$entry = $result[0];
run_site_probe($result[0]['id'], $entry);
}
}
}
//Give other scripts the site health.
return isset($entry) ? $entry : false;
}}
//Extracts the site from a given URL.
if(! function_exists('parse_site_from_url')){
function parse_site_from_url($url)
{
//Currently a simple implementation, but may improve over time.
#TODO: support subdirectories?
$urlMeta = parse_url($url);
return $urlMeta['scheme'].'://'.$urlMeta['host'];
}}
//Performs a probe on the given site ID.
//You may need to notice the site first before you know its ID.
if(! function_exists('run_site_probe')){
function run_site_probe($id, &$entry_out)
{
global $a;
//Get the site information from the DB, based on the ID.
$result = q(
"SELECT * FROM `site-health` WHERE `id`= %u ORDER BY `id` ASC LIMIT 1",
intval($id)
);
//Abort the probe if site is not known.
if(!$result || !isset($result[0])){
logger('Unknown site-health ID being probed: '.$id);
throw new \Exception('Unknown site-health ID being probed: '.$id);
}
//Shortcut.
$entry = $result[0];
$base_url = $entry['base_url'];
$probe_location = $base_url.'/friendica/json';
//Prepare the CURL call.
$handle = curl_init();
$options = array(
//Timeouts
CURLOPT_TIMEOUT => max($a->config['site-health']['probe_timeout'], 1), //Minimum of 1 second timeout.
CURLOPT_CONNECTTIMEOUT => 1,
//Redirecting
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 8,
//SSL
CURLOPT_SSL_VERIFYPEER => true,
// CURLOPT_VERBOSE => true,
// CURLOPT_CERTINFO => true,
CURLOPT_SSL_VERIFYHOST => 2,
CURLOPT_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
//Basic request
CURLOPT_USERAGENT => 'friendica-directory-probe-0.1',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_URL => $probe_location
);
curl_setopt_array($handle, $options);
//Probe the site.
$probe_start = microtime(true);
$probe_data = curl_exec($handle);
$probe_end = microtime(true);
//Check for SSL problems.
$curl_statuscode = curl_errno($handle);
$sslcert_issues = in_array($curl_statuscode, array(
60, //Could not authenticate certificate with known CA's
83 //Issuer check failed
));
//When it's the certificate that doesn't work.
if($sslcert_issues){
//Probe again, without strict SSL.
$options[CURLOPT_SSL_VERIFYPEER] = false;
//Replace the handler.
curl_close($handle);
$handle = curl_init();
curl_setopt_array($handle, $options);
//Probe.
$probe_start = microtime(true);
$probe_data = curl_exec($handle);
$probe_end = microtime(true);
//Store new status.
$curl_statuscode = curl_errno($handle);
}
//Gather more meta.
$time = round(($probe_end - $probe_start) * 1000);
$status = curl_getinfo($handle, CURLINFO_HTTP_CODE);
$type = curl_getinfo($handle, CURLINFO_CONTENT_TYPE);
$effective_url = curl_getinfo($handle, CURLINFO_EFFECTIVE_URL);
//Done with CURL now.
curl_close($handle);
#TODO: if the site redirects elsewhere, notice this site and record an issue.
$wrong_base_url = parse_site_from_url($effective_url) !== $entry['base_url'];
try{
$data = json_decode($probe_data);
}catch(\Exception $ex){
$data = false;
}
$parse_failed = !$data;
$parsedDataQuery = '';
if(!$parse_failed){
$given_base_url_match = $data->url == $base_url;
//Record the probe speed in a probes table.
q(
"INSERT INTO `site-probe` (`site_health_id`, `dt_performed`, `request_time`)".
"VALUES (%u, NOW(), %u)",
$entry['id'],
$time
);
//Update any health calculations or otherwise processed data.
$parsedDataQuery = sprintf(
"`dt_last_seen` = NOW(),
`name` = '%s',
`version` = '%s',
`plugins` = '%s',
`reg_policy` = '%s',
`info` = '%s',
`admin_name` = '%s',
`admin_profile` = '%s',
",
dbesc($data->site_name),
dbesc($data->version),
dbesc(implode("\r\n", $data->plugins)),
dbesc($data->register_policy),
dbesc($data->info),
dbesc($data->admin->name),
dbesc($data->admin->profile)
);
//Did we use HTTPS?
$urlMeta = parse_url($probe_location);
if($urlMeta['scheme'] == 'https'){
$parsedDataQuery .= sprintf("`ssl_state` = b'%u',", $sslcert_issues ? '0' : '1');
} else {
$parsedDataQuery .= "`ssl_state` = NULL,";
}
//Do we have a no scrape supporting node? :D
if(isset($data->no_scrape_url)){
$parsedDataQuery .= sprintf("`no_scrape_url` = '%s',", dbesc($data->no_scrape_url));
}
}
//Get the new health.
$version = $parse_failed ? '' : $data->version;
$health = health_score_after_probe($entry['health_score'], !$parse_failed, $time, $version, $sslcert_issues);
//Update the health.
q("UPDATE `site-health` SET
`health_score` = '%d',
$parsedDataQuery
`dt_last_probed` = NOW()
WHERE `id` = %d LIMIT 1",
$health,
$entry['id']
);
//Get the site information from the DB, based on the ID.
$result = q(
"SELECT * FROM `site-health` WHERE `id`= %u ORDER BY `id` ASC LIMIT 1",
$entry['id']
);
//Return updated entry data.
if($result && isset($result[0])){
$entry_out = $result[0];
}
}}
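The probe above treats cURL error codes 60 (could not authenticate the certificate against known CAs) and 83 (issuer check failed) as certificate problems, and retries once with peer verification disabled. A minimal Python sketch of that retry decision (the `fetch` callback and its signature are illustrative stand-ins, not part of the directory code):

```python
# cURL error codes the probe treats as certificate problems.
SSL_ERRNOS = {60, 83}  # CURLE_SSL_CACERT, CURLE_SSL_ISSUER_ERROR

def probe(fetch, url):
    """fetch(url, verify) -> (errno, body). Try a strict request first;
    if it fails only because of the certificate, retry unverified and
    report the SSL issue alongside the second result."""
    errno, body = fetch(url, verify=True)
    ssl_issues = errno in SSL_ERRNOS
    if ssl_issues:
        errno, body = fetch(url, verify=False)
    return ssl_issues, errno, body
```

The `ssl_issues` flag plays the role of `$sslcert_issues`, which later lowers the site's health score.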
//Determines the new health score after a probe has been executed.
if(! function_exists('health_score_after_probe')){
function health_score_after_probe($current, $probe_success, $time=null, $version=null, $ssl_issues=null)
{
//Probe failed, costs you 30 points.
if(!$probe_success) return max($current-30, -100);
//A good probe gives you 20 points.
$current += 20;
//Speed scoring.
if(intval($time) > 0){
//Penalty / bonus points.
if ($time > 800) $current -= 10; //Bad speed.
elseif ($time > 400) $current -= 5; //Still not good.
elseif ($time > 250) $current += 0; //This is normal.
elseif ($time > 120) $current += 5; //Good speed.
else $current += 10; //Excellent speed.
//Cap for bad speeds.
if ($time > 800) $current = min(40, $current);
elseif ($time > 400) $current = min(60, $current);
}
//Version check.
if(!empty($version)){
$versionParts = explode('.', $version);
//Older than 3.x.x?
//Your score can not go above 30 health.
if(intval($versionParts[0]) < 3){
$current = min($current, 30);
}
//Older than 3.2.x?
elseif(intval($versionParts[1]) < 2){
$current -= 5; //Somewhat outdated.
}
#TODO: See if this needs to be more dynamic.
#TODO: See if this is a proper indicator of health.
}
//SSL problems? That's a big deal.
if($ssl_issues === true){
$current -= 10;
}
//Don't go beyond +100 or -100.
return max(min(100, $current), -100);
}}
//Changes a score into a name. Used for classes and such.
if(! function_exists('health_score_to_name')){
function health_score_to_name($score)
{
if ($score < -50) return 'very-bad';
elseif ($score < 0) return 'bad';
elseif ($score < 30) return 'neutral';
elseif ($score < 50) return 'ok';
elseif ($score < 80) return 'good';
else return 'perfect';
}}
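As a quick sanity check of the scoring rules above, here is an illustrative Python port of `health_score_after_probe` with the same thresholds and caps (the version comparison follows the commented intent: major below 3 caps the score, minor below 2 costs 5 points):

```python
def health_after_probe(current, ok, time_ms=None, version=None, ssl_issues=None):
    """Illustrative port: a failed probe costs 30 points, a good one
    earns 20, then speed, version and SSL adjust the result."""
    if not ok:
        return max(current - 30, -100)
    current += 20
    if time_ms is not None and int(time_ms) > 0:
        # Penalty / bonus points.
        if time_ms > 800: current -= 10
        elif time_ms > 400: current -= 5
        elif time_ms > 250: current += 0
        elif time_ms > 120: current += 5
        else: current += 10
        # Caps for bad speeds.
        if time_ms > 800: current = min(40, current)
        elif time_ms > 400: current = min(60, current)
    if version:
        parts = version.split('.')
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
        if major < 3: current = min(current, 30)  # Older than 3.x.x.
        elif minor < 2: current -= 5              # Older than 3.2.x.
    if ssl_issues:
        current -= 10
    return max(min(100, current), -100)
```

For example, a site at score 50 that answers a probe in 900ms on a 2.x version with a bad certificate ends up at 20: +20 for the probe, -10 and capped at 40 for speed, capped at 30 for the old version, -10 for SSL.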

include/submit.php Normal file

@ -0,0 +1,272 @@
<?php
require_once('datetime.php');
require_once('site-health.php');
function run_submit($url) {
global $a;
if(! strlen($url))
return false;
logger('Updating: ' . $url);
//First run a notice script for the site it is hosted on.
$site_health = notice_site($url, true);
$submit_start = microtime(true);
$nurl = str_replace(array('https:','//www.'), array('http:','//'), $url);
$profile_exists = false;
$r = q("SELECT * FROM `profile` WHERE ( `homepage` = '%s' OR `nurl` = '%s' )",
dbesc($url),
dbesc($nurl)
);
if(count($r)) {
$profile_exists = true;
$profile_id = $r[0]['id'];
}
//Remove duplicates.
if(count($r) > 1){
for($i=1; $i<count($r); $i++){
logger('Removed duplicate profile '.intval($r[$i]['id']));
q("DELETE FROM `photo` WHERE `profile-id` = %d LIMIT 1",
intval($r[$i]['id'])
);
q("DELETE FROM `profile` WHERE `id` = %d LIMIT 1",
intval($r[$i]['id'])
);
}
}
require_once('Scrape.php');
//Skip the scrape? :D
$noscrape = $site_health && $site_health['no_scrape_url'];
if($noscrape){
//Find out who to look up.
$which = str_replace($site_health['base_url'], '', $url);
$noscrape = preg_match('~/profile/([^/]+)~', $which, $matches) === 1;
//If that did not fail...
if($noscrape){
$parms = noscrape_dfrn($site_health['no_scrape_url'].'/'.$matches[1]);
$noscrape = !!$parms; //If the result was false, do a scrape after all.
}
}
if(!$noscrape){
$parms = scrape_dfrn($url);
}
//Empty result is due to an offline site.
if(!count($parms)){
//For large sites this could lower the health too quickly, so don't track health.
//But for sites that are already in a bad state, do a cleanup now.
if($profile_exists && $site_health && $site_health['health_score'] < $a->config['maintenance']['remove_profile_health_threshold']){
logger('Nuked bad health record.');
nuke_record($url);
}
return false;
}
//We don't care about valid dfrn if the user indicates to be hidden.
elseif($parms['explicit-hide'] && $profile_exists) {
logger('User opted out of the directory.');
nuke_record($url);
return true; //This is a good update.
}
//This is most likely a problem with the site configuration. Ignore.
elseif(validate_dfrn($parms)) {
return false;
}
if((x($parms,'hide')) || (! (x($parms,'fn')) && (x($parms,'photo')))) {
if($profile_exists) {
nuke_record($url);
}
return true; //This is a good update.
}
$photo = $parms['photo'];
dbesc_array($parms);
if(x($parms,'comm'))
$parms['comm'] = intval($parms['comm']);
if($profile_exists) {
$r = q("UPDATE `profile` SET
`name` = '%s',
`pdesc` = '%s',
`locality` = '%s',
`region` = '%s',
`postal-code` = '%s',
`country-name` = '%s',
`gender` = '%s',
`marital` = '%s',
`homepage` = '%s',
`nurl` = '%s',
`comm` = %d,
`tags` = '%s',
`updated` = '%s'
WHERE `id` = %d LIMIT 1",
$parms['fn'],
$parms['pdesc'],
$parms['locality'],
$parms['region'],
$parms['postal-code'],
$parms['country-name'],
$parms['gender'],
$parms['marital'],
dbesc($url),
dbesc($nurl),
intval($parms['comm']),
$parms['tags'],
dbesc(datetime_convert()),
intval($profile_id)
);
logger('Update returns: ' . $r);
}
else {
$r = q("INSERT INTO `profile` ( `name`, `pdesc`, `locality`, `region`, `postal-code`, `country-name`, `gender`, `marital`, `homepage`, `nurl`, `comm`, `tags`, `created`, `updated` )
VALUES ( '%s', '%s', '%s', '%s' , '%s', '%s', '%s', '%s', '%s', '%s', %d, '%s', '%s', '%s' )",
$parms['fn'],
$parms['pdesc'],
$parms['locality'],
$parms['region'],
$parms['postal-code'],
$parms['country-name'],
$parms['gender'],
$parms['marital'],
dbesc($url),
dbesc($nurl),
intval($parms['comm']),
$parms['tags'],
dbesc(datetime_convert()),
dbesc(datetime_convert())
);
logger('Insert returns: ' . $r);
$r = q("SELECT `id` FROM `profile` WHERE ( `homepage` = '%s' or `nurl` = '%s' ) order by id asc",
dbesc($url),
dbesc($nurl)
);
if(count($r))
$profile_id = $r[count($r) - 1]['id'];
if(count($r) > 1) {
q("DELETE FROM `photo` WHERE `profile-id` = %d LIMIT 1",
intval($r[0]['id'])
);
q("DELETE FROM `profile` WHERE `id` = %d LIMIT 1",
intval($r[0]['id'])
);
}
}
if($parms['tags']) {
$arr = explode(' ', $parms['tags']);
if(count($arr)) {
foreach($arr as $t) {
$t = strip_tags(trim($t));
$t = substr($t,0,254);
if(strlen($t)) {
$r = q("SELECT `id` FROM `tag` WHERE `term` = '%s' and `nurl` = '%s' LIMIT 1",
dbesc($t),
dbesc($nurl)
);
if(! count($r)) {
$r = q("INSERT INTO `tag` (`term`, `nurl`) VALUES ('%s', '%s') ",
dbesc($t),
dbesc($nurl)
);
}
}
}
}
}
$submit_photo_start = microtime(true);
require_once("Photo.php");
$photo_failure = false;
$status = false;
if($profile_id) {
$img_str = fetch_url($photo,true);
$img = new Photo($img_str);
if($img) {
$img->scaleImageSquare(80);
$r = $img->store($profile_id);
}
$r = q("UPDATE `profile` SET `photo` = '%s' WHERE `id` = %d LIMIT 1",
dbesc($a->get_baseurl() . '/photo/' . $profile_id . '.jpg'),
intval($profile_id)
);
$status = true;
}
else{
nuke_record($url);
return false;
}
$submit_end = microtime(true);
$photo_time = round(($submit_end - $submit_photo_start) * 1000);
$time = round(($submit_end - $submit_start) * 1000);
//Record the scrape speed in a scrapes table.
if($site_health && $status) q(
"INSERT INTO `site-scrape` (`site_health_id`, `dt_performed`, `request_time`, `scrape_time`, `photo_time`, `total_time`)".
"VALUES (%u, NOW(), %u, %u, %u, %u)",
$site_health['id'],
$parms['_timings']['fetch'],
$parms['_timings']['scrape'],
$photo_time,
$time
);
return $status;
}
function nuke_record($url) {
$nurl = str_replace(array('https:','//www.'), array('http:','//'), $url);
$r = q("SELECT `id` FROM `profile` WHERE ( `homepage` = '%s' OR `nurl` = '%s' ) ",
dbesc($url),
dbesc($nurl)
);
if(count($r)) {
foreach($r as $rr) {
q("DELETE FROM `photo` WHERE `profile-id` = %d LIMIT 1",
intval($rr['id'])
);
q("DELETE FROM `profile` WHERE `id` = %d LIMIT 1",
intval($rr['id'])
);
}
}
return;
}
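Both `run_submit` and `nuke_record` normalize profile URLs the same way: force `http:` and strip a leading `www.` so that variants of one homepage collapse to a single `nurl` key. A sketch of that normalization in Python, mirroring the two literal `str_replace` substitutions:

```python
def normalize_url(url):
    """Mirror of str_replace(array('https:','//www.'), array('http:','//'), $url):
    two sequential, literal substitutions - no URL parsing involved."""
    return url.replace('https:', 'http:').replace('//www.', '//')
```

Because the replacement is purely literal, `https://www.example.com/...` and `http://example.com/...` produce the same normalized form, which is what the duplicate-detection queries rely on.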

include/sync.php Normal file

@ -0,0 +1,438 @@
<?php
/**
* Add this URL to our pulling queue.
* @param string $url
* @return void
*/
function sync_pull($url)
{
global $a;
//If we support it that is.
if($a->config['syncing']['enable_pulling']){
q("INSERT INTO `sync-pull-queue` (`url`) VALUES ('%s')", dbesc($url));
}
}
/**
* Push this URL to our pushing queue as well as mark it as modified using sync_mark.
* @param string $url
* @return void
*/
function sync_push($url)
{
global $a;
//If we support it that is.
if($a->config['syncing']['enable_pushing']){
q("INSERT INTO `sync-push-queue` (`url`) VALUES ('%s')", dbesc($url));
}
sync_mark($url);
}
/**
* Mark a URL as modified in some way or form.
* This will cause anyone that pulls our changes to see this profile listed.
* @param string $url
* @return void
*/
function sync_mark($url)
{
global $a;
//If we support it that is.
if(!$a->config['syncing']['enable_pulling']){
return;
}
$exists = count(q("SELECT * FROM `sync-timestamps` WHERE `url`='%s'", dbesc($url)));
if(!$exists)
q("INSERT INTO `sync-timestamps` (`url`, `modified`) VALUES ('%s', NOW())", dbesc($url));
else
q("UPDATE `sync-timestamps` SET `modified`=NOW() WHERE `url`='%s'", dbesc($url));
}
/**
* Entry point for a single forked worker during the push jobs.
* Takes a lower priority and pushes a batch of items.
* @param string $target A sync-target database row.
* @param array $batch The batch of items to submit.
* @return void
*/
function push_worker($target, $batch)
{
//Let's be nice, we're only doing a background job here...
pcntl_setpriority(5);
//Find our target's submit URL.
$submit = $target['base_url'].'/submit';
foreach($batch as $item){
set_time_limit(30); //This should work for 1 submit.
msg("Submitting {$item['url']} to $submit");
fetch_url($submit.'?url='.bin2hex($item['url']));
}
}
/**
* Gets an array of push targets.
* @return array Push targets.
*/
function get_push_targets(){
return q("SELECT * FROM `sync-targets` WHERE `push`=b'1'");
}
/**
* Gets a batch of URL's to push.
* @param object $a The App instance.
* @return array Batch of URL's.
*/
function get_push_batch($a){
return q("SELECT * FROM `sync-push-queue` LIMIT %u", intval($a->config['syncing']['max_push_items']));
}
/**
* Gets the push targets as well as a batch of URL's for a pushing job.
* @param object $a The App instance.
* @return list($targets, $batch) A list of both the targets array and batch array.
*/
function get_pushing_job($a)
{
//When pushing is requested...
if(!!$a->config['syncing']['enable_pushing']){
//Find our targets.
$targets = get_push_targets();
//No targets?
if(!count($targets)){
msg('Pushing enabled, but no push targets.');
$batch = array();
}
//If we have targets, get our batch.
else{
$batch = get_push_batch($a);
if(!count($batch)) msg('Empty pushing queue.'); //No batch, means no work.
}
}
//No pushing if it's disabled.
else{
$targets = array();
$batch = array();
}
return array($targets, $batch);
}
/**
* Runs a pushing job, creating a thread for each target.
* @param array $targets Pushing targets.
* @param array $batch Batch of URL's to push.
* @param string $db_host DB host to connect to.
* @param string $db_user DB user to connect with.
* @param string $db_pass DB pass to connect with.
* @param string $db_data DB name to connect to.
* @param bool $install Whether the DB connection is in install mode.
* @return void
*/
function run_pushing_job($targets, $batch, $db_host, $db_user, $db_pass, $db_data, $install)
{
//Create a thread for each target we want to serve push messages to.
//Not good creating more, because it would stress their server too much.
$threadc = count($targets);
$threads = array();
//Do we only have 1 target? No need for threads.
if($threadc === 1){
msg('No threads needed. Only one pushing target.');
push_worker($targets[0], $batch);
}
//When we need threads.
elseif($threadc > 1){
//POSIX threads only.
if(!function_exists('pcntl_fork')){
msg('Error: no pcntl_fork support. Are you running a different OS? Report an issue please.', true);
}
//Debug...
$items = count($batch);
msg("Creating $threadc push threads for $items items.");
//Loop while we need more threads.
for($i = 0; $i < $threadc; $i++){
$pid = pcntl_fork();
if($pid === -1) msg('Error: something went wrong with the fork. '.pcntl_strerror(), true);
//You're a child, go do some labor!
if($pid === 0){push_worker($targets[$i], $batch); exit;}
//Store the list of PID's.
if($pid > 0) $threads[] = $pid;
}
}
//Wait for all child processes.
$threading_problems = false;
foreach($threads as $pid){
pcntl_waitpid($pid, $status);
if($status !== 0){
$threading_problems = true;
msg("Bad process return value $pid:$status");
}
}
//If we did not have any "threading" problems.
if(!$threading_problems){
//Reconnect
global $db;
$db = new dba($db_host, $db_user, $db_pass, $db_data, $install);
//Create a query for deleting this queue.
$where = array();
foreach($batch as $item) $where[] = dbesc($item['url']);
$where = "WHERE `url` IN ('".implode("', '", $where)."')";
//Remove the items from queue.
q("DELETE FROM `sync-push-queue` $where LIMIT %u", count($batch));
msg('Removed items from push queue.');
}
}
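After a clean push run, the queue cleanup builds a `WHERE url IN (...)` clause by escaping each URL into the SQL string. The same cleanup can be sketched with placeholders instead of manual escaping (SQLite and the `push_queue` table name are illustrative stand-ins, not the directory's schema):

```python
import sqlite3

def remove_from_queue(conn, urls):
    """Delete processed items with a parameterized IN (...) list."""
    if not urls:
        return  # An empty IN () list would be a syntax error.
    marks = ','.join('?' * len(urls))
    conn.execute("DELETE FROM push_queue WHERE url IN (%s)" % marks, urls)
```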
/**
* Gets a batch of URLs to pull from the queue.
* @param object $a The App instance.
* @return array Batch of URL's.
*/
function get_queued_pull_batch($a){
//Randomize this, to prevent scraping the same servers too much or dead URL's.
$batch = q("SELECT * FROM `sync-pull-queue` ORDER BY RAND() LIMIT %u", intval($a->config['syncing']['max_pull_items']));
msg(sprintf('Pulling %u items from queue.', count($batch)));
return $batch;
}
/**
* Gets an array of pull targets.
* @return array Pull targets.
*/
function get_pull_targets(){
return q("SELECT * FROM `sync-targets` WHERE `pull`=b'1'");
}
/**
* Fills the pull queue from remote targets and returns a batch of URLs to pull.
* @param object $a The App instance.
* @return array Batch of URL's.
*/
function get_remote_pull_batch($a)
{
//Find our targets.
$targets = get_pull_targets();
msg(sprintf('Pulling from %u remote targets.', count($targets)));
//No targets, means no batch.
if(!count($targets))
return array();
//Pull a list of URL's from each target.
$urls = array();
foreach($targets as $target){
//First pull, or an update?
if(!$target['dt_last_pull'])
$url = $target['base_url'].'/sync/pull/all';
else
$url = $target['base_url'].'/sync/pull/since/'.intval($target['dt_last_pull']);
//Go for it :D
$target['pull_data'] = json_decode(fetch_url($url), true);
//If we didn't get any JSON.
if($target['pull_data'] === null){
msg(sprintf('Failed to pull from "%s".', $url));
continue;
}
//Add all entries as keys, to remove duplicates.
foreach($target['pull_data']['results'] as $url)
$urls[$url]=true;
}
//Now that we have our URL's. Store them in the queue.
foreach($urls as $url=>$bool){
if($url) sync_pull($url);
}
//Since this all worked out, mark each source with the timestamp of pulling.
foreach($targets as $target){
if($target['pull_data'] && $target['pull_data']['now'])
q("UPDATE `sync-targets` SET `dt_last_pull`=%u WHERE `base_url`='%s'", $target['pull_data']['now'], dbesc($target['base_url']));
}
//Finally, return a batch of this.
return get_queued_pull_batch($a);
}
/**
* Gathers an array of URL's to scrape from the pulling targets.
* @param object $a The App instance.
* @return array URL's to scrape.
*/
function get_pulling_job($a)
{
//No pulling today...
if(!$a->config['syncing']['enable_pulling'])
return array();
//Firstly, finish the items from our queue.
$batch = get_queued_pull_batch($a);
if(count($batch)) return $batch;
//If that is empty, fill the queue with remote items and return a batch of that.
$batch = get_remote_pull_batch($a);
if(count($batch)) return $batch;
//Nothing to do.
return array();
}
/**
* Entry point for a single forked worker during the pull jobs.
* Takes a lower priority and pulls a batch of items.
* @param int $i The index number of this worker (for round-robin).
* @param int $threadc The amount of workers (for round-robin).
* @param array $pull_batch A batch of URL's to pull.
* @param string $db_host DB host to connect to.
* @param string $db_user DB user to connect with.
* @param string $db_pass DB pass to connect with.
* @param string $db_data DB name to connect to.
* @param bool $install Whether the DB connection is in install mode.
* @return void
*/
function pull_worker($i, $threadc, $pull_batch, $db_host, $db_user, $db_pass, $db_data, $install)
{
//Let's be nice, we're only doing maintenance here...
pcntl_setpriority(5);
//Get personal DBA's.
global $db;
$db = new dba($db_host, $db_user, $db_pass, $db_data, $install);
//Get our (round-robin) workload from the batch.
$workload = array();
while(isset($pull_batch[$i])){
$entry = $pull_batch[$i];
$workload[] = $entry;
$i+=$threadc;
}
//While we've got work to do.
while(count($workload)){
$entry = array_pop($workload);
set_time_limit(20); //This should work for 1 submit.
msg("Submitting ".$entry['url']);
run_submit($entry['url']);
}
}
/**
* Runs a pulling job, creating several threads to do so.
* @param object $a The App instance.
* @param array $pull_batch A batch of URL's to pull.
* @param string $db_host DB host to connect to.
* @param string $db_user DB user to connect with.
* @param string $db_pass DB pass to connect with.
* @param string $db_data DB name to connect to.
* @param bool $install Whether the DB connection is in install mode.
* @return void
*/
function run_pulling_job($a, $pull_batch, $db_host, $db_user, $db_pass, $db_data, $install)
{
//We need the scraper.
require_once('include/submit.php');
//POSIX threads only.
if(!function_exists('pcntl_fork')){
msg('Error: no pcntl_fork support. Are you running a different OS? Report an issue please.', true);
}
//Create the threads we need.
$items = count($pull_batch);
$threadc = min($a->config['syncing']['pulling_threads'], $items); //Don't need more threads than items.
$threads = array();
msg("Creating $threadc pulling threads for $items profiles.");
//Build the threads.
for($i = 0; $i < $threadc; $i++){
$pid = pcntl_fork();
if($pid === -1) msg('Error: something went wrong with the fork. '.pcntl_strerror(), true);
//You're a child, go do some labor!
if($pid === 0){pull_worker($i, $threadc, $pull_batch, $db_host, $db_user, $db_pass, $db_data, $install); exit;}
//Store the list of PID's.
if($pid > 0) $threads[] = $pid;
}
//Wait for all child processes.
$threading_problems = false;
foreach($threads as $pid){
pcntl_waitpid($pid, $status);
if($status !== 0){
$threading_problems = true;
msg("Bad process return value $pid:$status");
}
}
//If we did not have any "threading" problems.
if(!$threading_problems){
//Reconnect
global $db;
$db = new dba($db_host, $db_user, $db_pass, $db_data, $install);
//Create a query for deleting this queue.
$where = array();
foreach($pull_batch as $item) $where[] = dbesc($item['url']);
$where = "WHERE `url` IN ('".implode("', '", $where)."')";
//Remove the items from queue.
q("DELETE FROM `sync-pull-queue` $where LIMIT %u", count($pull_batch));
msg('Removed items from pull queue.');
}
}
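`pull_worker` claims its share of the batch round-robin: worker `i` takes items `i`, `i + threadc`, `i + 2*threadc`, and so on, so the forked workers partition the batch without any coordination. The same selection, sketched in Python:

```python
def workload(i, threadc, batch):
    """Items assigned to worker i of threadc, matching pull_worker's loop."""
    out = []
    while i < len(batch):
        out.append(batch[i])
        i += threadc
    return out
```

Every item lands in exactly one worker's list, which is why no locking is needed between the forked processes.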


@ -158,8 +158,9 @@ class HTML5_TreeBuilder {
if ($this->ignore_lf_token) $this->ignore_lf_token--;
$this->ignored = false;
if(isset($token['name']))
$token['name'] = str_replace(':', '-', $token['name']);
// indenting is a little wonky, this can be changed later on
switch ($mode) {


@ -2,36 +2,210 @@
function admin_content(&$a) {
if(! $_SESSION['uid']) {
notice("Permission denied.");
goaway($a->get_baseurl());
}
//Get 100 flagged entries.
$r = q("SELECT `flag`.*, `profile`.`name`, `profile`.`homepage`
FROM `flag` JOIN `profile` ON `flag`.`pid`=`profile`.`id`
ORDER BY `total` DESC LIMIT 100"
);
if(count($r)) {
$flagged = '';
foreach($r as $rr) {
if($rr['reason'] == 1)
$str = 'Adult';
if($rr['reason'] == 2)
$str = 'Dead';
$flagged .= '<a href="' . 'moderate/' . $rr['pid'] . '/' . $rr['reason'] . '">'.
"{$rr['total']}x $str - [#{$rr['pid']}] {$rr['name']} ({$rr['homepage']})</a><br />";
}
} else {
$flagged = 'No entries.';
}
//Get the maintenance backlog size.
$res = q("SELECT count(*) as `count` FROM `profile` WHERE `updated` < '%s'",
dbesc(date('Y-m-d H:i:s', time()-$a->config['maintenance']['min_scrape_delay'])));
$maintenance_backlog = 'unknown';
if(count($res)){ $maintenance_backlog = $res[0]['count'].' entries'; }
//Get the pulling backlog size.
$res = q("SELECT count(*) as `count` FROM `sync-pull-queue`");
$pulling_backlog = 'unknown';
if(count($res)){ $pulling_backlog = $res[0]['count'].' entries'; }
$tpl = file_get_contents('view/admin.tpl');
return replace_macros($tpl, array(
'$present' => is_file('.htimport') ? ' (present)' : '',
'$flagged' => $flagged,
'$maintenance_backlog' => $maintenance_backlog,
'$pulling_backlog' => $pulling_backlog,
'$maintenance_size' => $a->config['maintenance']['max_scrapes'].' items per maintenance call.'
));
}
function admin_post(&$a)
{
//Submit a profile URL.
if($_POST['submit_url']){
goaway($a->get_baseurl().'/submit?url='.bin2hex($_POST['submit_url']));
}
//Get our input.
$url = $_POST['dir_import_url'];
$page = intval($_POST['dir_page']);
$batch = $_POST['batch_submit'];
//Directory
$file = realpath(__DIR__.'/..').'/.htimport';
//Per batch setting.
$perPage = 200;
$perBatch = 2;
if($batch){
require_once('include/submit.php');
require_once('include/site-health.php');
//First get all data from file.
$data = file_get_contents($file);
$list = explode("\r\n", $data);
//Fresh batch?
if(!isset($_SESSION['import_progress'])){
$_SESSION['import_progress'] = true;
$_SESSION['import_success'] = 0;
$_SESSION['import_failed'] = 0;
$_SESSION['import_total'] = 0;
notice("Started new batch. ");
}
//Make sure we can use try catch for all sorts of errors.
set_error_handler(function($errno, $errstr='', $errfile='', $errline='', $context=array()){
if((error_reporting() & $errno) == 0){ return; }
throw new \Exception($errstr, $errno);
});
for($i=0; $i<$perBatch; $i++){
if($url = array_shift($list)){
set_time_limit(15);
$_SESSION['import_total']++;
$_SESSION['import_failed']++;
try{
//A site may well turn 'sour' during the import.
//Check the health again for this reason.
$site = parse_site_from_url($url);
$r = q("SELECT * FROM `site-health` WHERE `base_url`= '%s' ORDER BY `id` ASC LIMIT 1", $site);
if(count($r) && intval($r[0]['health_score']) < $a->config['site-health']['skip_import_threshold']){
continue;
}
//Do the submit if health is ok.
if(run_submit($url)){
$_SESSION['import_failed']--;
$_SESSION['import_success']++;
}
}catch(\Exception $ex){/* We tried... */}
}
else break;
}
$left = count($list);
$success = $_SESSION['import_success'];
$total = $_SESSION['import_total'];
$errors = $_SESSION['import_failed'];
if($left > 0){
notice("$left items left in batch...<br>$success updated profiles.<br>$errors import errors.");
file_put_contents($file, implode("\r\n", $list));
$fid = uniqid('autosubmit_');
echo '<form method="POST" id="'.$fid.'"><input type="hidden" name="batch_submit" value="1"></form>'.
'<script type="text/javascript">setTimeout(function(){ document.getElementById("'.$fid.'").submit(); }, 300);</script>';
} else {
notice("Completed batch! $success updated. $errors errors.");
unlink($file);
unset($_SESSION['import_progress']);
}
return;
}
//Doing a poll from the directory?
elseif($url){
require_once('include/site-health.php');
$result = fetch_url($url."/lsearch?p=$page&n=$perPage&search=.*");
if($result)
$data = json_decode($result);
else
$data = false;
if($data){
$rows = '';
foreach($data->results as $profile){
//Skip known profiles.
$purl = $profile->url;
$nurl = str_replace(array('https:','//www.'), array('http:','//'), $purl);
$r = q("SELECT count(*) as `matched` FROM `profile` WHERE (`homepage` = '%s' OR `nurl` = '%s') LIMIT 1",
dbesc($purl),
dbesc($nurl)
);
if(count($r) && $r[0]['matched']){
continue;
}
//Find out site health.
else{
$site = parse_site_from_url($purl);
$r = q("SELECT * FROM `site-health` WHERE `base_url`= '%s' ORDER BY `id` ASC LIMIT 1", $site);
if(count($r) && intval($r[0]['health_score']) < $a->config['site-health']['skip_import_threshold']){
continue;
}
}
$rows .= $profile->url."\r\n";
}
file_put_contents($file, $rows, $page > 0 ? FILE_APPEND : 0);
$progress = min((($page+1) * $perPage), $data->total);
notice("Imported ".$progress."/".$data->total." URLs.");
if($progress !== $data->total){
$fid = uniqid('autosubmit_');
echo
'<form method="POST" id="'.$fid.'">'.
'<input type="hidden" name="dir_import_url" value="'.$url.'">'.
'<input type="hidden" name="dir_page" value="'.($page+1).'">'.
'</form>'.
'<script type="text/javascript">setTimeout(function(){ document.getElementById("'.$fid.'").submit(); }, 500);</script>';
} else {
goaway($a->get_baseurl().'/admin');
}
}
}
}
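The admin form redirects to `/submit?url=...` with the profile URL hex-encoded via PHP's `bin2hex`, which sidesteps URL-escaping of the embedded address entirely. The round-trip, sketched in Python (the directory base URL is an illustrative stand-in):

```python
def encode_submit_url(base, url):
    """Client side: hex-encode the URL (PHP bin2hex) into the query string."""
    return base + '/submit?url=' + url.encode('utf-8').hex()

def decode_submit_param(param):
    """Server side: the hex2bin counterpart."""
    return bytes.fromhex(param).decode('utf-8')
```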

mod/health.php Normal file

@ -0,0 +1,324 @@
<?php
require_once('include/site-health.php');
function health_content(&$a) {
if($a->argc > 1){
return health_details($a, $a->argv[1]);
}
if(!empty($_GET['s'])){
return health_search($a, $_GET['s']);
}
return health_summary($a);
}
function health_search(&$a, $search)
{
if(strlen($search) < 3){
$result = 'Please use at least 3 characters in your search';
}
else {
$r = q("SELECT * FROM `site-health` WHERE `base_url` LIKE '%%%s%%' ORDER BY `health_score` DESC LIMIT 100", dbesc($search));
if(count($r)){
$result = '';
foreach($r as $site){
//Get user count.
$site['users'] = 0;
$r = q(
"SELECT COUNT(*) as `users` FROM `profile`
WHERE `homepage` LIKE '%s%%'",
dbesc($site['base_url'])
);
if(count($r)){
$site['users'] = $r[0]['users'];
}
$result .=
'<span class="health '.health_score_to_name($site['health_score']).'">&hearts;</span> '.
'<a href="/health/'.$site['id'].'">' . $site['base_url'] . '</a> '.
'(' . $site['users'] . ')'.
"<br />\r\n";
}
}
else {
$result = 'No results';
}
}
$tpl = file_get_contents('view/health_search.tpl');
return replace_macros($tpl, array(
'$searched' => $search,
'$result' => $result
));
}
function health_summary(&$a){
$sites = array();
//Find the user count per site.
$r = q("SELECT `homepage` FROM `profile` WHERE 1");
if(count($r)) {
foreach($r as $rr) {
$site = parse_site_from_url($rr['homepage']);
if($site) {
if(!isset($sites[$site]))
$sites[$site] = 0;
$sites[$site] ++;
}
}
}
//See if we have a health for them.
$sites_with_health = array();
$site_healths = array();
$r = q("SELECT * FROM `site-health` WHERE `reg_policy`='REGISTER_OPEN'");
if(count($r)) {
foreach($r as $rr) {
$users = isset($sites[$rr['base_url']]) ? $sites[$rr['base_url']] : 0;
$sites_with_health[$rr['base_url']] = (($users / 100) + 10) * intval($rr['health_score']);
$site_healths[$rr['base_url']] = $rr;
}
}
arsort($sites_with_health);
$total = 0;
$public_sites = '';
foreach($sites_with_health as $k => $v)
{
//Stop at unhealthy sites.
$site = $site_healths[$k];
if($site['health_score'] <= 20) break;
//Skip small sites.
$users = $sites[$k];
if($users < 10) continue;
$public_sites .=
'<span class="health '.health_score_to_name($site['health_score']).'">&hearts;</span> '.
'<a href="/health/'.$site['id'].'">' . $k . '</a> '.
'(' . $users . ')'.
"<br />\r\n";
$total ++;
}
$public_sites .= "<br>Total: $total<br />\r\n";
$tpl = file_get_contents('view/health_summary.tpl');
return replace_macros($tpl, array(
'$versions' => $versions,
'$public_sites' => $public_sites
));
}
function health_details($a, $id)
{
//The overall health status.
$r = q(
"SELECT * FROM `site-health`
WHERE `id`=%u",
intval($id)
);
if(!count($r)){
$a->error = 404;
return;
}
$site = $r[0];
//Figure out SSL state.
$urlMeta = parse_url($site['base_url']);
if($urlMeta['scheme'] !== 'https'){
$ssl_state = 'No';
}else{
switch ($site['ssl_state']) {
case null: $ssl_state = 'Yes, but not yet verified.'; break;
case '0': $ssl_state = 'Certificate error!'; break;
case '1': $ssl_state = '&radic; Yes, verified.'; break;
}
$ssl_state .= ' <a href="https://www.ssllabs.com/ssltest/analyze.html?d='.$urlMeta['host'].'" target="_blank">Detailed test</a>';
}
//Get user count.
$site['users'] = 0;
$r = q(
"SELECT COUNT(*) as `users` FROM `profile`
WHERE `homepage` LIKE '%s%%'",
dbesc($site['base_url'])
);
if(count($r)){
$site['users'] = $r[0]['users'];
}
//Get avg probe speed.
$r = q(
"SELECT AVG(`request_time`) as `avg_probe_time` FROM `site-probe`
WHERE `site_health_id` = %u",
intval($site['id'])
);
if(count($r)){
$site['avg_probe_time'] = $r[0]['avg_probe_time'];
}
//Get scraping / submit speeds.
$r = q(
"SELECT
AVG(`request_time`) as `avg_profile_time`,
AVG(`scrape_time`) as `avg_scrape_time`,
AVG(`photo_time`) as `avg_photo_time`,
AVG(`total_time`) as `avg_submit_time`
FROM `site-scrape`
WHERE `site_health_id` = %u",
intval($site['id'])
);
if(count($r)){
$site['avg_profile_time'] = $r[0]['avg_profile_time'];
$site['avg_scrape_time'] = $r[0]['avg_scrape_time'];
$site['avg_photo_time'] = $r[0]['avg_photo_time'];
$site['avg_submit_time'] = $r[0]['avg_submit_time'];
}
//Get probe speed data.
$r = q(
"SELECT `request_time`, `dt_performed` FROM `site-probe`
WHERE `site_health_id` = %u",
intval($site['id'])
);
if(count($r)){
//Include graphael line charts.
$a->page['htmlhead'] .= '<script type="text/javascript" src="'.$a->get_baseurl().'/include/raphael.js"></script>'.PHP_EOL;
$a->page['htmlhead'] .= '<script type="text/javascript" src="'.$a->get_baseurl().'/include/g.raphael.js"></script>'.PHP_EOL;
$a->page['htmlhead'] .= '<script type="text/javascript" src="'.$a->get_baseurl().'/include/g.line-min.js"></script>';
$speeds = array();
$times = array();
$mintime = time();
foreach($r as $row){
$speeds[] = $row['request_time'];
$time = strtotime($row['dt_performed']);
$times[] = $time;
if($mintime > $time) $mintime = $time;
}
for($i=0; $i < count($times); $i++){
$times[$i] -= $mintime;
$times[$i] = floor($times[$i] / (24*3600));
}
$a->page['htmlhead'] .=
'<script type="text/javascript">
jQuery(function($){
var r = Raphael("probe-chart")
, x = ['.implode(',', $times).']
, y = ['.implode(',', $speeds).']
;
r.linechart(30, 15, 400, 300, x, [y], {symbol:"circle", axis:"0 0 0 1", shade:true, width:1.5}).hoverColumn(function () {
this.tags = r.set();
for (var i = 0, ii = this.y.length; i < ii; i++) {
this.tags.push(r.popup(this.x, this.y[i], this.values[i]+"ms", "right", 5).insertBefore(this).attr([{ fill: "#eee" }, { fill: this.symbols[i].attr("fill") }]));
}
}, function () {
this.tags && this.tags.remove();
});
});
</script>';
}
//Get scrape speed data.
$r = q(
"SELECT AVG(`total_time`) as `avg_time`, date(`dt_performed`) as `date` FROM `site-scrape`
WHERE `site_health_id` = %u GROUP BY `date`",
intval($site['id'])
// date('Y-m-d H:i:s', time()-(3*24*3600)) //Max 3 days old.
);
if($r && count($r)){
//Include graphael line charts.
$a->page['htmlhead'] .= '<script type="text/javascript" src="'.$a->get_baseurl().'/include/raphael.js"></script>'.PHP_EOL;
$a->page['htmlhead'] .= '<script type="text/javascript" src="'.$a->get_baseurl().'/include/g.raphael.js"></script>'.PHP_EOL;
$a->page['htmlhead'] .= '<script type="text/javascript" src="'.$a->get_baseurl().'/include/g.line-min.js"></script>';
$speeds = array();
$times = array();
$mintime = time();
foreach($r as $row){
$speeds[] = $row['avg_time'];
$time = strtotime($row['date']);
$times[] = $time;
if($mintime > $time) $mintime = $time;
}
for($i=0; $i < count($times); $i++){
$times[$i] -= $mintime;
$times[$i] = floor($times[$i] / (24*3600));
}
$a->page['htmlhead'] .=
'<script type="text/javascript">
jQuery(function($){
var r = Raphael("scrape-chart")
, x = ['.implode(',', $times).']
, y = ['.implode(',', $speeds).']
;
r.linechart(30, 15, 400, 300, x, [y], {shade:true, axis:"0 0 0 1", width:1}).hoverColumn(function () {
this.tags = r.set();
for (var i = 0, ii = this.y.length; i < ii; i++) {
this.tags.push(r.popup(this.x, this.y[i], Math.round(this.values[i])+"ms", "right", 5).insertBefore(this));
}
}, function () {
this.tags && this.tags.remove();
});
});
</script>';
}
//Nice name for registration policy.
switch ($site['reg_policy']) {
case 'REGISTER_OPEN': $policy = "Open"; break;
case 'REGISTER_APPROVE': $policy = "Admin approved"; break;
case 'REGISTER_CLOSED': $policy = "Closed"; break;
default: $policy = $site['reg_policy']; break;
}
$tpl = file_get_contents('view/health_details.tpl');
return replace_macros($tpl, array(
'$name' => $site['name'],
'$policy' => $policy,
'$site_info' => $site['info'],
'$base_url' => $site['base_url'],
'$health_score' => $site['health_score'],
'$health_name' => health_score_to_name($site['health_score']),
'$no_scrape_support' => !empty($site['no_scrape_url']) ? '&radic; Supports noscrape' : '',
'$dt_first_noticed' => $site['dt_first_noticed'],
'$dt_last_seen' => $site['dt_last_seen'],
'$version' => $site['version'],
'$plugins' => $site['plugins'],
'$reg_policy' => $site['reg_policy'],
'$info' => $site['info'],
'$admin_name' => $site['admin_name'],
'$admin_profile' => $site['admin_profile'],
'$users' => $site['users'],
'$ssl_state' => $ssl_state,
'$avg_probe_time' => round($site['avg_probe_time']),
'$avg_profile_time' => round($site['avg_profile_time']),
'$avg_scrape_time' => round($site['avg_scrape_time']),
'$avg_photo_time' => round($site['avg_photo_time']),
'$avg_submit_time' => round($site['avg_submit_time'])
));
}


@ -68,7 +68,7 @@ function moderate_content(&$a) {
$id = intval($a->argv[1]);
if($a->argc > 2)
$reason = $a->argv[2];
if($id) {
$r = q("SELECT * FROM `profile` WHERE `id` = %d LIMIT 1",
intval($id)
@ -80,6 +80,8 @@ function moderate_content(&$a) {
);
goaway($a->get_baseurl() . '/admin');
}
}else{
goaway($a->get_baseurl() . '/admin');
}
$c .= "<h1>Moderate/delete profile</h1>";
@ -129,16 +131,16 @@ function moderate_content(&$a) {
$o .= "<div class=\"directory-end\" ></div>\r\n";
$c .= '<br /><br /><iframe height="400" width="800" src="' . $rr['homepage'] . '" ></iframe>';
$c .= '<br /><br /><iframe height="400" width="800" src="' . $rr['homepage'] . '" class="profile-moderate-preview"></iframe>';
$c .= '<br />' . $rr['homepage'] . '<br />';
$o .= '<form action="moderate" method="post" >';
$checked = (($reason === 'censor') ? 'checked="checked" ' : '');
$o .= '<input type="radio" name="action" value="censor"' . $checked . '>Censor Profile<br /><br />';
$checked = (($reason === 'dead') ? 'checked="checked" ' : '');
$o .= '<input type="radio" name="action" value="dead"' . $checked . '" >Dead Account<br /><br />';
$checked = (($reason === '1') ? 'checked="checked" ' : '');
$o .= '<label><input type="radio" name="action" value="censor" ' . $checked . '>Censor Profile</label><br /><br />';
$checked = (($reason === '2') ? 'checked="checked" ' : '');
$o .= '<label><input type="radio" name="action" value="dead" ' . $checked . '>Dead Account</label><br /><br />';
$o .= '<input type="radio" name="action" value="bogus" >Bogus request<br /><br />';
$o .= '<label><input type="radio" name="action" value="bogus" >Bogus request</label><br /><br />';
$o .= '<input type="hidden" name="id" value="' . $id . '" ><br /><br />';


@ -2,9 +2,11 @@
function opensearch_init(&$a) {
$r = file_get_contents('view/osearch.tpl');
$tpl = file_get_contents('view/osearch.tpl');
header("Content-type: application/opensearchdescription+xml");
echo $r;
echo replace_macros($tpl, array(
'$base' => $a->get_baseurl()
));
killme();
}


@ -1,169 +0,0 @@
<?php
require_once("Photo.php");
function profile_photo_init(&$a) {
if(! local_user()) {
return;
}
require_once("mod/profile.php");
profile_load($a,$a->user['nickname']);
}
function profile_photo_post(&$a) {
if(! local_user()) {
notice ( t('Permission denied.') . EOL );
return;
}
if((x($_POST,'cropfinal')) && ($_POST['cropfinal'] == 1)) {
// phase 2 - we have finished cropping
if($a->argc != 2) {
notice( t('Image uploaded but image cropping failed.') . EOL );
return;
}
$image_id = $a->argv[1];
if(substr($image_id,-2,1) == '-') {
$scale = substr($image_id,-1,1);
$image_id = substr($image_id,0,-2);
}
$srcX = $_POST['xstart'];
$srcY = $_POST['ystart'];
$srcW = $_POST['xfinal'] - $srcX;
$srcH = $_POST['yfinal'] - $srcY;
$r = q("SELECT * FROM `photo` WHERE `resource-id` = '%s' AND `scale` = %d LIMIT 1",
dbesc($image_id),
intval($scale));
if(count($r)) {
$base_image = $r[0];
$im = new Photo($base_image['data']);
$im->cropImage(175,$srcX,$srcY,$srcW,$srcH);
$r = $im->store(0, $base_image['resource-id'],$base_image['filename'], t('Profile Photos'), 4, 1);
if($r === false)
notice ( t('Image size reduction (175) failed.') . EOL );
$im->scaleImage(80);
$r = $im->store(0, $base_image['resource-id'],$base_image['filename'], t('Profile Photos'), 5, 1);
if($r === false)
notice( t('Image size reduction (80) failed.') . EOL );
// Unset the profile photo flag from any other photos I own
$r = q("UPDATE `photo` SET `profile` = 0 WHERE `profile` = 1 AND `resource-id` != '%s' ",
dbesc($base_image['resource-id'])
);
$r = q("UPDATE `contact` SET `avatar-date` = '%s' WHERE `self` = 1 LIMIT 1",
dbesc(datetime_convert())
);
}
goaway($a->get_baseurl() . '/profiles');
return; // NOTREACHED
}
$src = $_FILES['userfile']['tmp_name'];
$filename = basename($_FILES['userfile']['name']);
$filesize = intval($_FILES['userfile']['size']);
$imagedata = @file_get_contents($src);
$ph = new Photo($imagedata);
if(! ($image = $ph->getImage())) {
notice( t('Unable to process image.') . EOL );
@unlink($src);
return;
}
@unlink($src);
$width = $ph->getWidth();
$height = $ph->getHeight();
if($width < 175 || $height < 175) {
$ph->scaleImageUp(200);
$width = $ph->getWidth();
$height = $ph->getHeight();
}
$hash = hash('md5',uniqid(mt_rand(),true));
$smallest = 0;
$r = $ph->store(0 , $hash, $filename, t('Profile Photos'), 0 );
if($r)
notice( t('Image uploaded successfully.') . EOL );
else
notice( t('Image upload failed.') . EOL );
if($width > 640 || $height > 640) {
$ph->scaleImage(640);
$r = $ph->store(0 , $hash, $filename, t('Profile Photos'), 1 );
if($r === false)
notice( t('Image size reduction (640) failed.') . EOL );
else
$smallest = 1;
}
$a->config['imagecrop'] = $hash;
$a->config['imagecrop_resolution'] = $smallest;
$a->page['htmlhead'] .= file_get_contents("view/crophead.tpl");
return;
}
if(! function_exists('profile_photo_content')) {
function profile_photo_content(&$a) {
if(! local_user()) {
notice( t('Permission denied.') . EOL );
return;
}
if(! x($a->config,'imagecrop')) {
$tpl = file_get_contents('view/profile_photo.tpl');
$o .= replace_macros($tpl,array(
));
return $o;
}
else {
$filename = $a->config['imagecrop'] . '-' . $a->config['imagecrop_resolution'] . '.jpg';
$resolution = $a->config['imagecrop_resolution'];
$tpl = file_get_contents("view/cropbody.tpl");
$o .= replace_macros($tpl,array(
'$filename' => $filename,
'$resource' => $a->config['imagecrop'] . '-' . $a->config['imagecrop_resolution'],
'$image_url' => $a->get_baseurl() . '/photo/' . $filename
));
return $o;
}
return; // NOTREACHED
}}


@ -1,195 +1,18 @@
<?php
require_once('include/datetime.php');
require_once('include/submit.php');
require_once('include/sync.php');
function submit_content(&$a) {
//Decode the URL.
$url = hex2bin(notags(trim($_GET['url'])));
if(! strlen($url))
exit;
logger('Updating: ' . $url);
$nurl = str_replace(array('https:','//www.'), array('http:','//'), $url);
$profile_exists = false;
$r = q("SELECT * FROM `profile` WHERE ( `homepage` = '%s' OR `nurl` = '%s' ) LIMIT 1",
dbesc($url),
dbesc($nurl)
);
if(count($r)) {
$profile_exists = true;
$profile_id = $r[0]['id'];
}
require_once('Scrape.php');
$parms = scrape_dfrn($url);
// logger('dir_submit: ' . print_r($parms,true));
if((! count($parms)) || (validate_dfrn($parms))) {
exit;
}
if((x($parms,'hide')) || (! (x($parms,'fn')) && (x($parms,'photo')))) {
if($profile_exists) {
nuke_record($url);
}
exit;
}
$photo = $parms['photo'];
dbesc_array($parms);
if(x($parms,'comm'))
$parms['comm'] = intval($parms['comm']);
if($profile_exists) {
$r = q("UPDATE `profile` SET
`name` = '%s',
`pdesc` = '%s',
`locality` = '%s',
`region` = '%s',
`postal-code` = '%s',
`country-name` = '%s',
`gender` = '%s',
`marital` = '%s',
`homepage` = '%s',
`nurl` = '%s',
`comm` = %d,
`tags` = '%s',
`updated` = '%s'
WHERE `id` = %d LIMIT 1",
$parms['fn'],
$parms['pdesc'],
$parms['locality'],
$parms['region'],
$parms['postal-code'],
$parms['country-name'],
$parms['gender'],
$parms['marital'],
dbesc($url),
dbesc($nurl),
intval($parms['comm']),
$parms['tags'],
dbesc(datetime_convert()),
intval($profile_id)
);
logger('Update returns: ' . $r);
}
else {
$r = q("INSERT INTO `profile` ( `name`, `pdesc`, `locality`, `region`, `postal-code`, `country-name`, `gender`, `marital`, `homepage`, `nurl`, `comm`, `tags`, `created`, `updated` )
VALUES ( '%s', '%s', '%s', '%s' , '%s', '%s', '%s', '%s', '%s', '%s', %d, '%s', '%s', '%s' )",
$parms['fn'],
$parms['pdesc'],
$parms['locality'],
$parms['region'],
$parms['postal-code'],
$parms['country-name'],
$parms['gender'],
$parms['marital'],
dbesc($url),
dbesc($nurl),
intval($parms['comm']),
$parms['tags'],
dbesc(datetime_convert()),
dbesc(datetime_convert())
);
logger('Insert returns: ' . $r);
$r = q("SELECT `id` FROM `profile` WHERE ( `homepage` = '%s' or `nurl` = '%s' ) order by id asc",
dbesc($url),
dbesc($nurl)
);
if(count($r))
$profile_id = $r[count($r) - 1]['id'];
if(count($r) > 1) {
q("DELETE FROM `photo` WHERE `profile-id` = %d LIMIT 1",
intval($r[0]['id'])
);
q("DELETE FROM `profile` WHERE `id` = %d LIMIT 1",
intval($r[0]['id'])
);
}
}
if($parms['tags']) {
$arr = explode(' ', $parms['tags']);
if(count($arr)) {
foreach($arr as $t) {
$t = strip_tags(trim($t));
$t = substr($t,0,254);
if(strlen($t)) {
$r = q("SELECT `id` FROM `tag` WHERE `term` = '%s' and `nurl` = '%s' LIMIT 1",
dbesc($t),
dbesc($nurl)
);
if(! count($r)) {
$r = q("INSERT INTO `tag` (`term`, `nurl`) VALUES ('%s', '%s') ",
dbesc($t),
dbesc($nurl)
);
}
}
}
}
}
require_once("Photo.php");
$photo_failure = false;
$img_str = fetch_url($photo,true);
$img = new Photo($img_str);
if($img) {
$img->scaleImageSquare(80);
$r = $img->store($profile_id);
}
if($profile_id) {
$r = q("UPDATE `profile` SET `photo` = '%s' WHERE `id` = %d LIMIT 1",
dbesc($a->get_baseurl() . '/photo/' . $profile_id . '.jpg'),
intval($profile_id)
);
}
else
nuke_record($url);
//Currently we simply push RAW URL's to our targets.
sync_push($url);
//Run the submit sequence.
run_submit($url);
exit;
}
function nuke_record($url) {
$nurl = str_replace(array('https:','//www.'), array('http:','//'), $url);
$r = q("SELECT `id` FROM `profile` WHERE ( `homepage` = '%s' OR `nurl` = '%s' ) ",
dbesc($url),
dbesc($nurl)
);
if(count($r)) {
foreach($r as $rr) {
q("DELETE FROM `photo` WHERE `profile-id` = %d LIMIT 1",
intval($rr['id'])
);
q("DELETE FROM `profile` WHERE `id` = %d LIMIT 1",
intval($rr['id'])
);
}
}
return;
}

mod/sync.php Normal file

@ -0,0 +1,80 @@
<?php
function sync_content(&$a)
{
header('Content-type: application/json; charset=utf-8');
//When no arguments were given, return a json token to show we support this method.
if($a->argc < 2){
echo json_encode(array(
'pulling_enabled'=>!!$a->config['syncing']['enable_pulling'],
'pushing_enabled'=>!!$a->config['syncing']['enable_pushing']
));
exit;
}
//Method switcher here.
else{
switch($a->argv[1]){
case 'pull':
if(!$a->config['syncing']['enable_pulling']){
echo json_encode(array('error'=>'Pulling disabled.')); exit;
}
switch ($a->argv[2]) {
case 'all': echo json_encode(do_pull_all()); exit;
case 'since': echo json_encode(do_pull($a->argv[3])); exit;
}
default: echo json_encode(array('error'=>'Unknown method.')); exit;
}
}
}
function do_pull($since)
{
if(!intval($since)){
return array('error' => 'Must set a since timestamp.');
}
//Recently modified items.
$r = q("SELECT * FROM `sync-timestamps` WHERE `modified` > '%s'", date('Y-m-d H:i:s', intval($since)));
//This removes all duplicates.
$profiles = array();
foreach($r as $row) $profiles[$row['url']] = $row['url'];
//This removes the keys, so it's a flat array.
$results = array_values($profiles);
//Format it nicely.
return array(
'now' => time(),
'count' => count($results),
'results' => $results
);
}
function do_pull_all()
{
//Find all the profiles.
$r = q("SELECT `homepage` FROM `profile`");
//This removes all duplicates.
$profiles = array();
foreach($r as $row) $profiles[$row['homepage']] = $row['homepage'];
//This removes the keys, so it's a flat array.
$results = array_values($profiles);
//Format it nicely.
return array(
'now' => time(),
'count' => count($results),
'results' => $results
);
}
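Both pull endpoints above return the same `{now, count, results}` JSON shape. A minimal JavaScript sketch of the dedupe-and-flatten idiom used in `do_pull()` and `do_pull_all()` (the helper name and sample URLs are hypothetical, for illustration only):

```javascript
// Hypothetical helper mirroring the keyed-array idiom in do_pull():
// duplicate URLs collapse onto the same object key, then the keys
// are dropped to yield a flat array.
function formatPullResults(rows) {
  const profiles = {};
  for (const row of rows) profiles[row.url] = row.url; // removes duplicates
  const results = Object.values(profiles);             // flat array, no keys
  return {
    now: Math.floor(Date.now() / 1000), // UNIX timestamp, like PHP time()
    count: results.length,
    results: results
  };
}

const out = formatPullResults([
  { url: 'http://a.example' },
  { url: 'http://b.example' },
  { url: 'http://a.example' } // duplicate, collapsed
]);
// out.count === 2; out.results lists each URL once
```

A client polling `pull/since/<timestamp>` would store the `now` value from one response and pass it as the `since` argument of the next request, so no modifications are missed between polls.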

mod/versions.php Normal file

@ -0,0 +1,24 @@
<?php
function versions_content(&$a){
$sites = array();
//Grab a version list.
$versions = '';
$r = q("SELECT count(*) as `count`, `version` FROM `site-health` WHERE `version` IS NOT NULL GROUP BY `version` ORDER BY `version` DESC");
if(count($r)){
foreach($r as $version){
$versions .=
($version['count'] >= 10 ? '<b>' : '').
$version['version'] . ' ('.$version['count'].')'.
($version['count'] >= 10 ? '</b>' : '').
'<br>'."\r\n";
}
}
$tpl = file_get_contents('view/versions.tpl');
return replace_macros($tpl, array(
'$versions' => $versions
));
}

view/admin.tpl Normal file

@ -0,0 +1,41 @@
<div class="flagged-wrapper">
<h1>Flagged entries</h1>
<div class="flagged-entries">$flagged</div>
</div>
<div class="maintenance-wrapper">
<h1>Maintenance</h1>
<p>
<strong>Current maintenance backlog: $maintenance_backlog</strong><br>
<i>$maintenance_size</i>
</p>
</div>
<div class="pulling-wrapper">
<h1>Pulling</h1>
<p>
<strong>Current pulling backlog: $pulling_backlog</strong><br>
</p>
</div>
<div class="import-wrapper">
<h1>Import tools</h1>
<h2>Mirror a directory</h2>
<p>This is very slow; using pull targets is faster, since pulling is multi-threaded.</p>
<form method="POST">
<label>Extract URLs:</label>
<input type="text" name="dir_import_url" value="http://dir.friendica.com">
<input type="hidden" name="dir_page" value="0">
<input type="submit" value="Execute">
</form>
<br>
<form method="POST">
<label>Batch submit from file: $present</label>
<input type="submit" name="batch_submit" value="Run batch">
</form>
<h2>Manual submit</h2>
<form method="POST">
<input type="text" name="submit_url" placeholder="Profile url" size="35" />
<input type="submit" value="Submit">
</form>
</div>

view/health_details.tpl Normal file

@ -0,0 +1,35 @@
<h1>
<span class="health $health_name">&hearts;</span> $name<br>
<sup><a href="$base_url">$base_url</a></sup>
</h1>
<p><a href="/health">&laquo; Back to index</a></p>
<div class="meta">
<h3>General information</h3>
<div class="users">$users users</div>
<div class="policy">$policy registration policy</div>
<div class="version">Friendica $version</div>
<div class="first_noticed">First noticed: $dt_first_noticed</div>
<div class="last_seen">Last update: $dt_last_seen</div>
<pre class="site-info">$site_info</pre>
</div>
<div class="security">
<h3>Security</h3>
<div class="ssl_state">HTTPS: $ssl_state</div>
</div>
<div class="performance">
<h3>Performance information</h3>
<div style="float:left;margin-right:30px;padding-top:20px;">
<div class="probe_speed">Probe speed: $avg_probe_timems</div>
<div class="photo_speed">Photo speed: $avg_photo_timems</div>
<div class="profile_speed">Profile speed: $avg_profile_timems</div>
<div class="scrape_speed">Scrape speed: $avg_scrape_timems</div>
<div class="submit_speed">Submit speed: $avg_submit_timems</div>
<span class="health perfect">$no_scrape_support</span>
</div>
<div id="probe-chart" class="speed-chart">Probe speed</div>
<div id="scrape-chart" class="speed-chart">Scrape speed</div>
</div>

view/health_search.tpl Normal file

@ -0,0 +1,10 @@
<h1>Search your site</h1>
<form method="GET">
<label>Your site URL:</label>
<input type="text" name="s" placeholder="example.com" value="$searched" />
<input type="submit" value="Search" />
</form>
<p><a href="/health">&laquo; Back to index</a></p>
<h1>Search results</h1>
<div class="result-sites">$result</div>

view/health_summary.tpl Normal file

@ -0,0 +1,14 @@
<h1>Search your site</h1>
<form method="GET">
<label>Your site URL:</label>
<input type="text" name="s" placeholder="example.com" />
<input type="submit" value="Search" />
</form>
<h1>Healthy public sites</h1>
<p>
These are sites with an open registration policy and a decent health score.<br>
Not on the list? Try searching.<br>
More info: ask <a href="https://fc.oscp.info/profile/beanow">Beanow</a>.
</p>
<div class="public-sites">$public_sites</div>


@ -4,11 +4,11 @@
<ShortName>Friendica Global Directory</ShortName>
<Description>Search Friendica Global Directory</Description>
<InputEncoding>UTF-8</InputEncoding>
<Image width="16" height="16" type="image/x-icon">http://dir.friendica.com/images/friendica-16.ico</Image>
<Image width="64" height="64" type="image/png">http://dir.friendica.com/images/friendica-64.png</Image>
<Url type="text/html" method="GET" template="http://dir.friendica.com/directory">
<Image width="16" height="16" type="image/x-icon">$base/images/friendica-16.ico</Image>
<Image width="64" height="64" type="image/png">$base/images/friendica-64.png</Image>
<Url type="text/html" method="GET" template="$base/directory">
<Param name="search" value="{searchTerms}"/>
</Url>
<moz:SearchForm>http://dir.friendica.com</moz:SearchForm>
<moz:SearchForm>$base</moz:SearchForm>
</OpenSearchDescription>


@ -1587,3 +1587,30 @@ input#dfrn-url {
margin-left: 20px;
}
.health{font-size:120%; vertical-align:bottom;}
.health.very-bad{ color:#f99; }
.health.bad{ color:#f1ba7a; }
.health.neutral{ color:#e6e782; }
.health.ok{ color:#bef273; }
.health.good{ color:#7cf273; }
.health.perfect{ color:#33ff80; }
.speed-chart{
float:left;
width:480px;
height:320px;
text-align:center;
}
.flagged-entries{
max-height:315px;
overflow:auto;
border:1px solid #ccc;
padding:5px;
line-height:1.3em;
}
.profile-moderate-preview{
width:80%;
min-height:700px;
}

view/versions.tpl Normal file

@ -0,0 +1,2 @@
<h1>Used versions</h1>
<div class="version-list">$versions</div>