call it v 1.1.7

moved BParse class back into main script from submodule
improved CSV output and error handling
3 changed files with 136 additions and 27 deletions
--- a/README.md
+++ b/README.md
@@ -9,8 +9,11 @@ To make certain that you don't block just any instance in the Fediverse
 because $somebody has it on their blocklist you assign _trust levels_ to
 the correctness of the blocklists of the other servers. Only when a server
 is blocked with a total trust level that is above a confidence level, it
-will be added to resulting blocklist automatically. Otherwise the user
-will be ask if they want to add a node to their blocklist or not.
+will be added to the resulting blocklist automatically. Otherwise you will be
+asked whether you want to add a node to the blocklist.
+
+And just to state the obvious: You should never blindly trust the blocklists
+of your peers; when in doubt, do your own investigation into a block.

 ## Config file

@@ -80,6 +83,26 @@ trust = 30
 trust = -50
 ```

+You can also add a list of protected nodes to the config file. To do so add a
+section `[safe harbor]` to the config file. This section has only one entry
+called `domains` and the value of this entry is a comma separated list of domains
+that should never get on your blocklist.
+
+For example
+
+```
+[safe harbor]
+domains = friendica.example.com
+```
+
+You can also add Mastodon instances you trust. In addition to the configuration
+needed for Friendica nodes you have to add the `type = mastodon` entry to the
+config section. *Please note* that the used API endpoint is not available on
+all Mastodon instances.
+
+Please note that only suspended entries from the Mastodon blocklist will be
+added to the blocklist. Silenced entries will be ignored.
+
 ### Running the script

 You have to supply the file name of the configuration file on the command
@@ -134,6 +157,8 @@ combined trust level for automatically blocking it would be `50`.
     
 ### REUSE compliance

+[![REUSE status](https://api.reuse.software/badge/git.friendi.ca/tobias/brewserverblocklist)](https://api.reuse.software/info/git.friendi.ca/tobias/brewserverblocklist)
+
 This project uses [REUSE](https://reuse.software/) to ensure that all components
 are released under a FLOSS compatible license. If you contribute to the project,
 please ensure that your contribution is REUSE compliant.
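
A minimal sketch of how a config with a Mastodon source and a `[safe harbor]` section might be read, mirroring the parsing added in this change. The sample hostnames and the `load_config` helper are assumptions for illustration only. Unlike the committed code, it strips whitespace around each domain, so `a, b` and `a,b` are treated the same.

```python
import configparser

# Hypothetical sample config; the hostnames are placeholders.
SAMPLE_CONFIG = """
[friendica.example.com]
trust = 30

[mastodon.example.org]
type = mastodon
trust = 20

[safe harbor]
domains = friendica.example.com, friends.example.net
"""


def load_config(text):
    """Return (sources, safe_harbor) parsed from the config text."""
    config = configparser.RawConfigParser()
    config.read_string(text)
    sources, safe_harbor = [], []
    for section in config.sections():
        values = dict(config.items(section))
        if section == 'safe harbor':
            # Strip each entry so "a, b" does not yield " b".
            safe_harbor.extend(
                domain.strip()
                for domain in values['domains'].split(',')
                if domain.strip()
            )
        else:
            sources.append({
                'url': section,
                'trust': int(values['trust']),
                # Sources without a type are treated as Friendica nodes.
                'type': values.get('type', 'friendica'),
            })
    return sources, safe_harbor


sources, safe_harbor = load_config(SAMPLE_CONFIG)
```
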
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,10 +1,16 @@
 [tool.poetry]
 name = "brewserverblocklist"
-version = "1.0.0"
+version = "1.1.7"
 description = "A python script to collect the server-wide blocklists from Friendica nodes to build a collection from trusted admin choice"
 authors = ["Tobias Diekershoff"]
 license = "GNU General Public License v3.0"
 readme = "README.md"
+include = [
+  { path = "src/brewserverblocklist" },
+]
+packages = [
+  { include = "brewserverblocklist", from = "src"},
+]

 [tool.poetry.scripts]
 brewserverblocklist = "brewserverblocklist.brewserverblocklist:main"
--- a/src/brewserverblocklist/brewserverblocklist.py
+++ b/src/brewserverblocklist/brewserverblocklist.py
@@ -5,6 +5,8 @@
 """
 This script can be used to create server block lists by combining the blocklists
 of any number of other Friendica instances.
+
+See https://git.friendi.ca/tobias/brewserverblocklist
 """
 import argparse
 import configparser
@@ -12,6 +14,21 @@ import sys
 from os.path import exists
 import requests

+class BParser(argparse.ArgumentParser):
+    """
+    This expansion of the ArgParser class will display the --help results by
+    default if an error occurs (e.g. no arguments are passed to the script).
+
+    It is based on a StackOverflow answer from 2010 by unutbu, who referred to
+    a reply from Steven Bethard as the source of the code.
+
+    https://stackoverflow.com/questions/4042452/display-help-message-with-python-argparse-when-script-is-called-without-any-argu/4042861#4042861
+    """
+    def error(self, message):
+        sys.stderr.write('error: %s\n' % message)
+        self.print_help()
+        sys.exit(2)
+
 class BrewBlocklist():
    """
    This is the cauldron that is used to
@@ -29,11 +46,26 @@ class BrewBlocklist():
        self.sources = []
        self.auto_accept = auto_accept
        self.auto_accept_direction = auto_accept_direction
+        self.safe_harbor = []
+        self.error = []
        config = configparser.RawConfigParser()
        config.read(configfile)
        for section in config.sections():
            section_values = dict(config.items(section))
-            self.sources.append({'url': section, 'trust': int(section_values['trust'])})
+            if (section.find('http://') > -1) or (section.find('https://') > -1):
+                print('The section name in the config file must not contain the protocol ({})'.format(section))
+                sys.exit(1)
+            if not section == 'safe harbor':
+                if not 'type' in section_values.keys():
+                    section_values['type'] = 'friendica'
+                self.sources.append({
+                    'url': section,
+                    'trust': int(section_values['trust']),
+                    'type': section_values['type']
+                })
+            else:
+                for item in section_values['domains'].split(','):
+                    self.safe_harbor.append(item)
        self.outputfile = outputfile
        self.confidence = confidence
        self.blocklist = {}
@@ -46,14 +78,36 @@ class BrewBlocklist():
        mention of the server wins) and sum the trust levels of the blocks.
        """
        for source in self.sources:
-            requ = requests.get('https://{}/blocklist/domain/download'.format(source['url']))
-            for line in requ.text.split('\n'):
-                try:
-                    pattern, reason = line.split(',')
-                except ValueError:
+            if source['type'] == 'friendica':
+                # Friendica publishes the blocklist as CSV file
+                requ = requests.get('https://{}/blocklist/domain/download'.format(source['url']))
+                if not requ.status_code == requests.codes.ok:
+                    self.error.append('The request to {} failed'.format(source['url']))
                    break
-                self.blocklist[pattern] = self.blocklist.get(pattern, 0) + source['trust']
-                self.reasons[pattern] = self.reasons.get(pattern, reason)
+                for line in requ.text.split('\n'):
+                    try:
+                        pattern, reason = line.split(',')
+                    except ValueError:
+                        # happens in an empty line in the source CSV file, which seems
+                        # to be the last line of the file so we can just break the loop
+                        # one step early and ignore the exception silently.
+                        break
+                    self.blocklist[pattern] = self.blocklist.get(pattern, 0) + source['trust']
+                    self.reasons[pattern] = self.reasons.get(pattern, reason)
+            elif source['type'] == 'mastodon':
+                # Mastodon has an API endpoint that contains the information
+                requ = requests.get('https://{}/api/v1/instance/domain_blocks'.format(source['url']))
+                if not requ.status_code == requests.codes.ok:
+                    self.error.append('The request to {} failed'.format(source['url']))
+                    break
+                try:
+                    for item in requ.json():
+                        self.blocklist[item['domain']] = self.blocklist.get(item['domain'], 0) + source['trust']
+                        self.reasons[item['domain']] = self.reasons.get(item['domain'], item['comment'])
+                except (ValueError, KeyError, TypeError):
+                    self.error.append('{} returned no valid json to the API call'.format(source['url']))
+            else:
+                raise ValueError('{} is not a supported node type, check your config file'.format(source['type']))

    def clean_list(self):
        """
@@ -66,21 +120,22 @@ class BrewBlocklist():
        c_blocklist = {}
        c_reasons = {}
        for key, value in self.blocklist.items():
-            if value < self.confidence:
-                if not self.auto_accept:
-                    print('Domain: {} [total trust {}]'.format(key, value))
-                    print('Reason: {}'.format(self.reasons[key]))
-                    keep = input('Keep that entry? [Y/n] > ')
-                    if keep not in ['n', 'N']:
-                        c_blocklist[key] = value
-                        c_reasons[key] = self.reasons[key]
+            if not key in self.safe_harbor:
+                if value < self.confidence:
+                    if not self.auto_accept:
+                        print('Domain: {} [total trust {}]'.format(key, value))
+                        print('Reason: {}'.format(self.reasons[key]))
+                        keep = input('Keep that entry? [Y/n] > ')
+                        if keep not in ['n', 'N']:
+                            c_blocklist[key] = value
+                            c_reasons[key] = self.reasons[key]
+                    else:
+                        if self.auto_accept_direction:
+                            c_blocklist[key] = value
+                            c_reasons[key] = self.reasons[key]
                else:
-                    if self.auto_accept_direction:
-                        c_blocklist[key] = value
-                        c_reasons[key] = self.reasons[key]
-            else:
-                c_blocklist[key] = value
-                c_reasons[key] = self.reasons[key]
+                    c_blocklist[key] = value
+                    c_reasons[key] = self.reasons[key]
        self.blocklist = c_blocklist
        self.reasons = c_reasons

@@ -88,18 +143,41 @@ class BrewBlocklist():
        """
        Print the CSV list of the collected blocklist into either STDOUT or
        the output file that was defined as command line parameter.
+
+        You can upload the resulting CSV file into Friendica from the admin
+        panel of your node. Only the 1st and 2nd columns are important. The 3rd
+        column contains the total trust value for the blocklist entry.
        """
        if self.outputfile:
            out_file = open(self.outputfile, 'w')
            orig_stdout = sys.stdout
            sys.stdout = out_file
        for key, value in self.blocklist.items():
+            try:
+                if ("," in self.reasons[key] or " " in self.reasons[key]) and not self.reasons[key].startswith('"'):
+                    self.reasons[key] = '"{}"'.format(self.reasons[key])
+            except TypeError:
+                self.reasons[key] = '"no reason given"'
+                self.error.append("for {} no blocking reason was provided".format(key))
            print('{}, {}, {}'.format(key, self.reasons[key], value))
        if self.outputfile:
            sys.stdout = orig_stdout
            out_file.close()
+        if len(self.error):
+            print("\n\nWhile creating the blocklist the following problems occurred:")
+            print("\n".join(self.error))
+
 def main():
-    parser = argparse.ArgumentParser()
+    """
+    This will run the script.
+
+    * parse the command line arguments
+    * check the config file is actually there
+    * put the cauldron on the fireplace
+    * collect the ingredients
+    * serve the result
+    """
+    parser = BParser()
    parser.add_argument('-c', '--config',
                        dest='configfile',
                        required=True,
@@ -127,7 +205,7 @@ def main():
    arg_auto_accept = not args.auto_accept_direction is None
    if not exists(args.configfile):
        print('The config file {} was not found.'.format(args.configfile))
-        sys.exit()
+        sys.exit(1)
    brew = BrewBlocklist(args.configfile, args.outputfile, arg_auto_accept,
                         args.auto_accept_direction, args.confidence)
    brew.collect_ingrediens()
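
The manual quoting added to `print_list` could also be delegated to Python's `csv` module. This is a minimal sketch, assuming the consumer accepts standard CSV quoting, which has not been verified against Friendica's importer. The `write_blocklist` helper and the sample data are hypothetical. Note that it writes no space after the comma, unlike the `'{}, {}, {}'` format used in the diff.

```python
import csv
import io


def write_blocklist(blocklist, reasons, stream):
    """Write domain, reason and total trust as CSV rows to the stream."""
    writer = csv.writer(stream, lineterminator='\n')
    for domain, trust in blocklist.items():
        # Fall back to a placeholder when a source gave no reason (None or '').
        reason = reasons.get(domain) or 'no reason given'
        # csv.writer quotes a field only when it contains a comma,
        # a quote character or a line break.
        writer.writerow([domain, reason, trust])


buffer = io.StringIO()
write_blocklist(
    {'bad.example': 80, 'worse.example': 120},
    {'bad.example': 'spam, harassment', 'worse.example': None},
    buffer,
)
output = buffer.getvalue()
```

Passing an open file object instead of the `io.StringIO` buffer would write the same rows to disk without redirecting `sys.stdout`.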
Author	SHA1	Message	Date
Tobias Diekershoff	cd40e283ac	call it v 1.1.7	2023-01-14 12:38:47 +01:00
Tobias Diekershoff	8f3ba1e55d	moved BParse class back into main script from submodule	2023-01-14 12:37:49 +01:00
Tobias Diekershoff	960b18a461	improved CSV output and error handling	2023-01-14 12:37:04 +01:00
Tobias Diekershoff	db7565dcd0	improved CSV output and error handling	2023-01-14 12:36:24 +01:00
Tobias Diekershoff	1fe4f2df10	removed unused LICENSE file	2023-01-14 12:15:45 +01:00
Tobias Diekershoff	f361b9fd50	call it v 1.1.5	2023-01-14 11:45:18 +01:00
Tobias Diekershoff	423897a9a3	added module bargparse to the project	2023-01-14 11:44:06 +01:00
Tobias Diekershoff	c230c64680	call it v 1.1.4	2023-01-14 08:57:57 +01:00
Tobias Diekershoff	92e9e9d4bb	added the CC-BY-SA-2.5 license for the bargparse module	2023-01-14 08:56:19 +01:00
Tobias Diekershoff	bc4c514cbb	added argparse class by Steven Bethard	2023-01-14 08:55:58 +01:00
Tobias Diekershoff	350f6d6f66	ensure that section names do not include the protocol	2023-01-08 11:26:02 +01:00
Tobias Diekershoff	7c8294bf49	Updated README	2023-01-08 08:38:36 +01:00
Tobias Diekershoff	4df3f1b969	Updated README	2023-01-08 08:06:15 +01:00
Tobias Diekershoff	f2a3635ea8	call it v 1.1.2	2023-01-07 23:47:05 +01:00
Tobias Diekershoff	f71425528e	add some exception handling for problems	2023-01-07 23:46:28 +01:00
Tobias Diekershoff	e0b30df10a	remove debug information	2023-01-07 23:20:13 +01:00
Tobias Diekershoff	da26f594a6	call it v 1.1.0	2023-01-07 22:46:54 +01:00
Tobias Diekershoff	4fa20216f9	support blocklists from Mastodon API compatible nodes	2023-01-07 22:46:20 +01:00
Tobias Diekershoff	a30b809cbf	typo in the README	2023-01-07 22:23:00 +01:00
Tobias Diekershoff	bca141a120	added safe harbor for domains One can now add a [safe harbor] section to the config file with an entry "domains" that has a comma separated list of domains that shall never end on the block list.	2023-01-07 22:20:50 +01:00
Tobias Diekershoff	b333d0c989	added some docstrings and comments	2023-01-07 21:27:16 +01:00
Tobias Diekershoff	22c275391d	added REUSE badge to the README	2023-01-07 19:45:54 +01:00