B8 for Friendica
B8 is an excellent bayesian spam implementation for PHP. However when evaluating it for use in Friendica there were a few shortcomings. B8's primary audience is guestbooks and blogs - single user situations.
Friendica is a multi-user distributed social environment. So the first thing we need to add to b8 is a concept of user ID.
Second we don't want to use a second stored set of DB login credentials so we're going to implemetn Friendica's MySQL driver and use our existing connection and credentials.
The third requirement is that the B8 processing model is to load a set of word/data sets from the DB, perform processing (which may change the value of the data) and then store the results back to the DB. We're in a highly dynamic environment with lots of sometimes concurrent message processing. So the plan is to alter the storage architecture to read data in, do processing, and then apply a somewhat atomic change operation where the changes are performed in a single query using the current data in storage rather than something passed through outside processing and where the data may be outdated come time to store it.
In accordance with the LGPL of the B8 package these changes are available in source form at http://github.com/friendica/friendica in the directory library/spam