1
0
Fork 0
friendica_2020-09-1_sharedH.../library/spam/doc/ChangeLog
2012-01-31 15:54:41 -08:00

179 lines
8.9 KiB
Text
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

2010-12-30 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.5.1
* Bigger changes:
- Fixed some issues with the scope of variables leading to problems when multiple instances of b8 are created. Thanks to Mike Creuzer for the bug report :-)
- Centralized the loading of class definition files in the b8 constructor and created a function to handle the inclusion.
* b8.php: Return a lexer error code instead of a rating if the lexer failed. The lexer never returned FALSE but b8 checked only for this value to validate the lexer didn't fail. Thanks to Matt Friedman for the bug report :-)
* lexer/lexer_default.php: A bit of code cleanup: less useless nesting.
* doc/readme.*: Updated the documentation, added a FAQ.
2010-06-27 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.5-r1
* doc/readme.*: Updated the documentation; forgot the newly introduced b8::HAM and b8::SPAM variables. Added some additional information about the storage model.
2010-06-02 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.5
* 100.000 Changes (new major release!), at a glance:
- No PHP 4 compatibility anymore. Much cleaner code base with less hacks.
- Completely reworked storage model. The SQL performance increased dramatically, the Berkeley DB performance remains as fast as it always has been.
- Better lexer which can also handle non-latin1 texts in a nice way, so that e.g. Cyrillic or Chinese texts can be classified more performant.
- No config files anymore, multiple instances of b8 can be now created in the same script with different configuration, databases and no problems.
- No spooky administration interface anymore that needs an SQL database, even if Berkeley DB is used (anybody who actually used this?! I never did ;-).
- No "install" scripts and routines and a less end-user compatible documentation. Anybody integrating b8 in his homepage won't be an end-user, will he?
2009-02-03 Oliver Lillie (aka buggedcom)
* Revision: 221 (the original PHP 5 port)
* Rewrote Tobias' original class for optimisation and PHP 5 functionality.
* Improved database mysql query useage by over ~820%
* Class is faster, ~20%.
* Slight increase in memory usage, but it's small and given the advantages of the speed increase and query reduction it's worth it.
* Removed install code from mysql class and added a sql file. Anyone who wants to use this is generally going to be more advanced anyway and see the sql to install.
2009-02-03 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.4.4 -- changed the license type from GPL to LGPL
2008-06-27 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.4.3 -- no bugs found ... so let's make a release with only small changes ;-)
* b8.php: Removed debugging messages that were commented out anyway
* storage/storage_mysql.php: Made it possible to pass both a MySQL-link resource and a table name to b8. This makes b8 useable in the Redaxo CMS (and probably others)
* doc/readme.htm: Updated documentation accordingly
2008-02-17 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.4.2
* interface/backup.php: the bayes*dbversion tag is now written to a database emptied by drop(), so that it will be useable without an error message even if no backup is recovered afterwards.
* doc/readme.htm: added a security note to the configuration section (htaccess should be used to avoid everybody to be able to see the configuration)
2007-09-17 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.4.1
* storage/storage_mysql.php: fixed b8 crashing when getting passed a persistent MySQL resource link. Thanks to Paul Chapman for the bug report :-)
2007-06-08 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.4
* Let's go the whole hog. b8's class is now "b8" and no more "bayes", and all internal variables have now according names.
* Reworked the whole (surprisingly crappy) implementation of b8. No more global() calls, everything happens inside the classes now. Made that whole stuff really object oriented (as good as possible with PHP's poor OOP model ;-).
* No more PHP code in the configuration files.
* Created an extra lexer class. This is now also configurable.
* Storage classes now can create their own databases when this is requested by the configuration.
* MySQL calls are no random shots anymore: either, a MySQL-link resource is passed to b8 on startup which will be used for the queries, or the class sets up it's own link. Same for SQLite.
* The interface now uses a separate storage backend capable of SQL. In this way, we _really_ can query the database for e. g. an ordered list of tokens. After doing what we wanted with this work database, the b8 database can be synced with it.
* Added a lot of verbose error handling.
* Fixed a dumb error: all tokens from a text were used for the spamminess calculation, because two for() loops both used $i as their counter. D'oh!!! Now, the filter's performance is way better.
* Catched on the way how that whole math stuff works a little more ;-) Now, the calculation of the single probabilities proposed by Mr. Robinson does a little more the stuff it was intended to do, because ...
* Made some calculation constants parameters: the number of tokens to use, the default rating for unknown tokens and Gary Robinson's s constant.
* Introduced an optional minimum deviation that a token's rating must have to be considered in the spamminess calculation.
* The default extreme ratings for tokens only in ham or spam are now optional. One can also choose to calculate all ratings by Mr. Robinson's method.
* Noticed that text primary keys are not case sensitive by default in MySQL, which has a noticeable impact on the filter's performance. Informed the MySQL users about that.
* The whole code sucks much less ;-) b8 should be way more user friendly now.
* Re-wrote the whole documentation.
* Fixed the ChangeLog :-)
2007-02-08 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.3.3 again ;-)
* bayes-php is now b8. See http://www.nasauber.de/blog/text.php?text=58 for details :-) Thanks to Tobias Lang (http://langt.net/) for this cool new name!
2007-01-05 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.3.3
* Renamed the internal BerkeleyDB handle from "$db" to the less general name "$bayes_php_db" due to an collision with phpwcms's (http://www.phpwcms.de/) global $db variable and potentially other php programs.
* Commented out Laurent Goussard's SQLite storage class by default, as it's try { } catch { } calls break PHP 4
2006-09-03 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.3.2
* Laurent Goussard (loranger@free.fr) contributed an SQLite storage class(which needs PHP 5).
* I finally added my eMail address to the sources ;-)
2006-07-24 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.3.1
* Fixed a problem in the unlearn() function: If a text was unlearned that wasn't learned before (accidentaly), it could happen that the count parameter for this text was smaller than 0, breaking the spamminess calulation
2006-07-02 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.3
* Improved the get_tokens() function; the filter should now be a lot more performant, especially with short texts
* Added the "lastseen" parameter for each token to make the database maintainable (outdated tokens can be deleted)
* Added a real database maintainance interface
2006-06-12 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.2.1
* Fixed a problem in get_tokens() (if it was called more than once, tokens were counted more often than they appeared in the text)
* Slightly enhanced the default index.php interface: after learning a text as Ham or Spam, the rating before and after it is displayed to inform the user about it
2006-05-21 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.2
* Comments now in English (to pretend international success of bayes-php ;-)
* Recommendations of Paul Graham's article "Better Bayesian Filtering" ( http://www.paulgraham.com/better.html ) are now considered: Tokens that only appear in Ham or Spam and not in the other category are rated with 0.9998 or 0.0002 if they were less than 10 times in Ham or Spam and with 0.9999 or 0.0001 if they appeared more that 10 times. This should allow the filter to differentiate spam texts more sharp from ham texts. Also, token "degeneration" as described in the article is performed for unknown tokens to estimate their spamminess.
* The database connect is now swapped in a separate configuration file, so only this file has to be preserved if bayes-php is updated and only this file has to be changed to configure the script.
2006-03-29 Tobias Leupold <tobias.leupold@web.de>
* Release: Version 0.1.1
* get_tokens() beachtet jetzt auch HTML-Tags und Wörter mit Akzenten und Apostrophen
* Verschiedene Kleinigkeiten "sauber" gemacht :-)
2006-03-05 Tobias Leupold <tobias.leupold@web.de>
* Added 2007-06-08: Initial release (Version 0.1)