bmf (Bayesian Mail Filter) 0.9.4 fork + patches
git clone git://git.codemadness.org/bmf

bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayes algorithm as explained in Paul
Graham's article "A Plan for Spam". It aims to be faster, smaller, and …
versatile than similar applications. Implementation is ANSI C and uses …
functions. Supported platforms are (in theory) all POSIX systems.
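
As a rough illustration of the approach in Graham's article (not bmf's
actual code, which may differ in detail), the per-token spam probabilities
p1..pn of a message are combined as

  P = (p1 * ... * pn) / (p1 * ... * pn + (1 - p1) * ... * (1 - pn))

The shell sketch below computes that value with awk. The function name
combine_probs is hypothetical and is not something bmf provides.

```shell
#!/bin/sh
# combine_probs P1 P2 ...: combine per-token spam probabilities using
# the formula from "A Plan for Spam":
#   P = prod(p) / (prod(p) + prod(1 - p))
# Hypothetical helper for illustration only; not part of bmf.
# Inputs are assumed to lie strictly between 0 and 1 (Graham clamps
# token probabilities to the range 0.01 to 0.99), which keeps the
# denominator non-zero.
combine_probs() {
    printf '%s\n' "$@" | awk '
        BEGIN { spam = 1; ham = 1 }
        { spam *= $1; ham *= 1 - $1 }
        END { printf "%.4f\n", spam / (spam + ham) }'
}
```

For example, "combine_probs 0.99 0.95 0.2" prints 0.9979: two strongly
spammy tokens outweigh one innocent-looking token. Graham's article treats
a message as spam when the combined value exceeds 0.9.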

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries. Tokens are store…
memory using simple vectors which require no heavyweight external data
structure libraries. The tokens are stored in plain-text "flat" files.

(2) Efficient processing. Input data is parsed by a handcrafted parser
which weighs in under 3% of the equivalent code generated by flex. No
portion of the input is ever copied and all i/o and memory allocation are
done in large chunks. Updated token lists are merged and written in one
step. Hashing is being considered for the next version to improve lookup
speed.

(3) Simple and elegant implementation. No heavyweight, copy-intensive m…
decoding routines are used. Decoding of quoted-printable text for selec…
mime types is being considered for the next version.

Note: the core filter function is from esr's bogofilter v0.6 (available …
http://sourceforge.net/projects/bogofilter/) with bugfix updates.

For the most recent version of this software, see:

  http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes will invoke bmf for each incoming email a…
place spam into $MAILDIR/spam. The first sample invokes bmf in its norm…
mode of operation and the second invokes bmf as a filter.

### begin sample one ###
# Invoke bmf and use return code to filter spam in one step
:0HB
* ? bmf
| formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

### begin sample two ###
# Invoke bmf as a filter
:0 fw
| bmf -p

# Filter spam
:0:
* ^X-Spam-Status: Yes
$MAILDIR/spam

The following maildrop equivalents are suggested by Christian Kurz.

### begin sample one ###
# Invoke bmf and use return code to filter spam in one step
exception {
    `bmf`
    if ( $RETURNCODE == 0 )
        to $MAILDIR/spam
}

### begin sample two ###
# Invoke bmf as a filter
exception {
    xfilter "bmf -p"
    if (/^X-Spam-Status: Yes/)
        to $MAILDIR/spam
}
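
For a delivery setup that uses neither procmail nor maildrop, the same
return-code check can be sketched in plain shell. This assumes, as the
recipes above imply, that bmf exits with status 0 when it judges the
message to be spam. The function name classify_file and the BMF variable
are hypothetical.

```shell
#!/bin/sh
# BMF names the bmf executable. It is overridable so the sketch can be
# tried with a stub. (Hypothetical variable; bmf itself does not read it.)
BMF=${BMF:-bmf}

# classify_file FILE: print "spam" or "ham" for the one message in FILE.
# Assumes exit status 0 means spam, as in the procmail/maildrop samples.
# Any other status, including an error, is reported as "ham" so that a
# failure never hides legitimate mail in the spam folder.
classify_file() {
    if "$BMF" < "$1"; then
        echo spam
    else
        echo ham
    fi
}
```

A caller could then branch on the printed word, for example
"classify_file /path/to/message" with a real message file in place of the
placeholder path.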

If you put bmf in your procmail or maildrop scripts as suggested above, …
will always register an email as either spam or non-spam. To reverse th…
registration and train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<e…
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<e…
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<e…

These macros rebind the following keys:

  <Esc>d = de-register as non-spam, register as spam, and move to spam f…
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox …
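
Outside mutt, the same correction can be made from a shell. The sketch
below only wraps the -S and -N options used by the macros above. The
function name reclassify and the BMF variable are hypothetical, and moving
the message between folders is left to the mail client.

```shell
#!/bin/sh
# BMF names the bmf executable (hypothetical variable, overridable for
# testing with a stub).
BMF=${BMF:-bmf}

# reclassify KIND FILE: correct the registration of the message in FILE.
#   spam: de-register as non-spam, register as spam  (bmf -S)
#   ham:  de-register as spam, register as non-spam  (bmf -N)
reclassify() {
    case $1 in
        spam) "$BMF" -S < "$2" ;;
        ham)  "$BMF" -N < "$2" ;;
        *)    echo "usage: reclassify spam|ham FILE" >&2
              return 2 ;;
    esac
}
```

As with the macros, this is meant for a message that bmf has already
registered the wrong way, not for first-time training.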

Alternatively, if you use gnus you could add the following lines to your
.gnus to accomplish a similar result:

  (defun spam ()
    (interactive)
    (pipe-message "/usr/local/bin/bmf -S")
    (gnus-summary-move-article 1 "nnml:Spam"))

  (defun notspam ()
    (interactive)
    (pipe-message "/usr/local/bin/bmf -N")
    (gnus-summary-move-article 1 "nnml:inbox"))

  (add-hook
    'gnus-sum-load-hook
    (lambda nil
      (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-o") 'spam)
      (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-p") 'notspa…

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from …
input that you give it. It works best if you give it exactly the email …
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your curr…
    and/or saved emails. It is fairly easy to create a small shell scri…
    that will pass all of your normal email to "bmf -n" and all of your …
    to "bmf -s". Note that if you do not use the mbox storage format, y…
    MUST invoke bmf exactly once per email. Using "cat * | bmf -n" will…
    work properly because bmf sees the entire input as one big email.

  - If you already use spamassassin, you can use it to train bmf for a
    couple of days or weeks. If spamassassin tags a message as spam, run
    it through "bmf -s". If not, run it through "bmf -n". This can be
    automated with procmail or maildrop recipes.
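
A minimal sketch of both techniques in shell follows. It assumes mail is
stored one message per file (for example in a maildir), so bmf is invoked
exactly once per email, and it assumes spamassassin marks spam with its
usual "X-Spam-Flag: YES" header. The function names and the BMF variable
are hypothetical.

```shell
#!/bin/sh
# BMF names the bmf executable (hypothetical variable, overridable for
# testing with a stub).
BMF=${BMF:-bmf}

# train_dir FLAG DIR: run bmf once per message file in DIR.
# FLAG is -n for normal mail or -s for spam.
train_dir() {
    for msg in "$2"/*; do
        [ -f "$msg" ] || continue   # skip subdirectories and empty globs
        "$BMF" "$1" < "$msg"
    done
}

# train_tagged FILE: register one message according to spamassassin's
# verdict. Only the header block (everything up to the first empty
# line) is searched, so a quoted header in the body is ignored.
train_tagged() {
    if sed '/^$/q' "$1" | grep -qi '^X-Spam-Flag: *YES'; then
        "$BMF" -s < "$1"
    else
        "$BMF" -n < "$1"
    fi
}
```

With saved mail sorted into two directories, training would be two calls
such as "train_dir -n DIR_WITH_NORMAL_MAIL" and "train_dir -s
DIR_WITH_SPAM", where both directory names are placeholders.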

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word coun…

Final words
===========

Thanks for trying bmf. If you have any problems, comments, or suggestio…
please direct them to the bmf mailing list, [email protected]…

Tom Marshall
20 Oct 2002