imapspamfilter.py - deletes spam in-place over IMAP
imapspamfilter.py is an IMAP client that deletes
spam without downloading your mailbox. It does this by
fetching only the header and the first 1024 bytes of each
message body, and applying a set of heuristics to decide
whether the message is spam. It prints a list of suspect
messages, asks for confirmation, and then deletes them
all in place on the server.
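The list, confirm, delete cycle described above can be sketched as follows. This is a minimal illustration in current Python 3, not the script's own code: the function name, the shape of the suspects list, and the prompt wording are all assumptions. Only the store() and expunge() calls are real imaplib methods.

```python
def delete_after_confirmation(conn, suspects, ask=input):
    """Flag each suspect message as deleted and expunge, if the user agrees.

    conn     -- imaplib.IMAP4-like object with a mailbox selected read-write
    suspects -- list of (message_number, subject) pairs (hypothetical shape)
    ask      -- prompt function, replaceable for testing
    Returns the number of messages deleted.
    """
    if not suspects:
        return 0
    # Show the user what is about to be removed.
    for num, subject in suspects:
        print(f"{num}: {subject}")
    answer = ask(f"Delete these {len(suspects)} messages? [y/N] ")
    if answer.strip().lower() != "y":
        return 0
    for num, _subject in suspects:
        # STORE only marks the message; nothing is removed yet.
        conn.store(num, "+FLAGS", "\\Deleted")
    # EXPUNGE permanently removes every message flagged as deleted.
    conn.expunge()
    return len(suspects)
```

Flagging every message first and expunging once at the end keeps the message sequence numbers stable while the loop runs.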
After running imapspamfilter.py, you can
view or download your clean mailbox with your regular
email client.
imapspamfilter.py is a command-line Python script for
Unix systems such as GNU/Linux and *BSD that have a full
Python distribution installed.
Just edit the top few lines of the
script to configure it for your email account. Then run
it from a terminal by typing:
python ./imapspamfilter.py
You will need Python
version 2.2.1 or later.
If you have the following kind of email account, then
imapspamfilter.py is for you:
- You are English speaking and do not receive
legitimate email in any other language.
- Your domain is shared with a lot of other users whom
you don't know, and who also receive a lot of spam.
This applies to anyone who has an account with a popular
ISP, such as john@aol.com.
- You do not mind losing the very small percentage of
legitimate emails that may get deleted because of
their content. For instance, anyone who sends an email
with certain well-known words in it is being
quite silly these days, so you don't mind never getting
these emails.
- (Not strictly necessary.) You are pretty sure of a
few key words that mean the email is definitely of
interest. For example, if you are an electrician, you can
add 'amps', 'volts', 'light', 'cable', your name (and
whatever else) to your word list just to be sure.
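As a rough sketch of how such a word list might be checked (the names KEYWORDS and is_of_interest are invented for illustration and are not taken from the script), a whole-word match against the electrician example could look like this in current Python 3:

```python
import re

# Hypothetical word list, using the electrician example from the text.
KEYWORDS = {"amps", "volts", "light", "cable"}


def is_of_interest(text, keywords=KEYWORDS):
    """Return True if any listed word appears as a whole word in the text."""
    # Lowercase and split into words so that 'AMPS' matches 'amps',
    # while 'lighting' does not match 'light'.
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(keywords)
```

Matching whole words rather than substrings is a design choice made here, on the assumption that a short word like 'light' would otherwise match too much.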
How imapspamfilter.py works
imapspamfilter.py uses the regular IMAP protocol to
retrieve an index of unseen messages. However, where
most IMAP clients only retrieve the header information
(like From: and Subject:), imapspamfilter.py also uses
the BODY.PEEK[TEXT]<0.1024> data item of IMAP FETCH
to retrieve the first 1024 bytes of the message body.
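The partial fetch can be sketched with the standard imaplib interface. This is an illustration in current Python 3 under stated assumptions, not the script's actual code: the function name fetch_previews and the use of BODY.PEEK[HEADER] for the header part are choices made here. Only the BODY.PEEK[TEXT]<0.1024> item comes from the description above.

```python
# Header plus the first 1024 bytes of the body. PEEK leaves the
# message unseen on the server.
FETCH_ITEMS = "(BODY.PEEK[HEADER] BODY.PEEK[TEXT]<0.1024>)"


def fetch_previews(conn):
    """Return {message_number: (header, body_prefix)} for unseen messages.

    conn is an imaplib.IMAP4-like object with a mailbox already selected.
    """
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("SEARCH failed: %r" % (data,))
    previews = {}
    for raw_num in data[0].split():
        num = raw_num.decode("ascii")
        status, parts = conn.fetch(num, FETCH_ITEMS)
        if status != "OK":
            continue
        header, body = b"", b""
        for part in parts:
            # Literals arrive as (descriptor, payload) tuples; any other
            # element is protocol framing and is skipped.
            if not isinstance(part, tuple):
                continue
            descriptor, payload = part
            if b"BODY[HEADER]" in descriptor:
                header = payload
            elif b"BODY[TEXT]" in descriptor:
                body = payload
        previews[num] = (header, body)
    return previews
```

Against a real server, conn would typically be an imaplib.IMAP4_SSL instance on which login() and select() have already been called. Each message costs at most its header plus 1024 bytes of body, however large the message is.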
A set of complex heuristics is then applied. Resources
include a complete list of freemail domains, and the
entire English dictionary. This is very much like the
heuristic approach of a filter like SpamAssassin,
except that here messages are not graded on a scale: each
message is simply considered to be spam or not.
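To illustrate the difference between a yes/no verdict and a graded score, here is a small sketch in current Python 3. How the script actually uses the freemail list and the dictionary is not stated here, so both rules, the 0.5 threshold, and every name below are invented for illustration, and the two sets are tiny stand-ins for the real resources.

```python
import re

# Tiny stand-ins for the resources named above (hypothetical contents).
FREEMAIL_DOMAINS = {"freemail.example", "webmail.example"}
DICTIONARY = {"the", "meeting", "is", "at", "noon", "see", "you",
              "there", "please", "bring", "report"}


def sender_domain(header):
    """Extract the domain of the From: address, or '' if there is none."""
    match = re.search(r"^From:.*?@([\w.-]+)", header,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1).lower() if match else ""


def english_ratio(body):
    """Fraction of the words in body that appear in DICTIONARY."""
    words = re.findall(r"[a-z]+", body.lower())
    if not words:
        return 0.0
    return sum(w in DICTIONARY for w in words) / len(words)


def rule_freemail_sender(header, body):
    return sender_domain(header) in FREEMAIL_DOMAINS


def rule_not_english(header, body):
    # Assumed threshold: fewer than half the words are dictionary words.
    return english_ratio(body) < 0.5


RULES = [rule_freemail_sender, rule_not_english]


def is_spam(header, body):
    """Binary verdict: spam as soon as any one rule fires, with no scoring."""
    return any(rule(header, body) for rule in RULES)
```

A scoring filter would instead add up a weight for each rule that fires and compare the total with a threshold. Here a single match is enough.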
To see the heuristics, simply browse the well-commented
Python source.
Modifying imapspamfilter.py for new spam and viruses
If a new virus comes out that starts dumping tonnes of
messages into your IMAP account, and if the contents make
these emails identifiable, it's relatively easy to code
new rules for them into the Python script.
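A new rule of this kind might look like the sketch below, written in current Python 3. The worm signature, the rule function, and the RULES list are all hypothetical. The real script's structure may differ, so treat this only as an indication of how little code such a rule needs.

```python
import re

# Hypothetical signature: a worm that always sends one of two fixed
# subject lines, optionally prefixed with "Re: ".
WORM_SUBJECT = re.compile(
    r"^Subject:\s*(Re: )?Your (document|details)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def rule_example_worm(header, body):
    """Fire when the header carries the worm's fixed subject line."""
    return WORM_SUBJECT.search(header) is not None


# Stand-in for whatever collection of rules the script keeps.
RULES = []
RULES.append(rule_example_worm)
```

Anchoring the pattern to the whole Subject: line keeps the rule from firing on ordinary mail that merely contains the same words.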
Standards
See RFC 2060.
Probably not worth the read. Just open up your favorite
IMAP client, and run Ethereal to watch the protocol
traffic.