Trinity University Computer Science

Use of Spamassassin on Computer Science Unix Systems

The overall philosophy behind how we run Spamassassin is that your mail is filtered on the server Mail.CS.Trinity.Edu. Spamassassin divides your mail into two mailboxes, identifying the mail in them as ham (non-spam mail) and spam. The names of the two mailboxes are arbitrary. The simplest approach is to have the ham placed in a mailbox called inbox in your default mail directory, and the spam placed in a mailbox called something like spamfound. In addition, you might keep two mailboxes called ham and spam that are used for training, as we will see shortly. The reason for using inbox as the mail file that your mail client (such as mutt, pine, kmail, etc.) reads is that, without something like POP3, your mail client can't access your real mail spool file on the server, which is:

/var/spool/mail/<your_login_name>
If you want to use a mail client and have it pop mail, then you would leave your mail in the normal Linux spool file. As you go through your mail, you may find mail in your spool file that is actually spam; it should be transferred to the spam mailbox. If you find mail in your spamfound mailbox that is really ham, transfer it to your ham mailbox. Any spam found in your inbox mailbox should likewise be transferred to the spam mailbox.

Initially you might find that Spamassassin makes a large number of mistakes, but with time and training the mistakes should shrink to the point where your spool file is nearly or even totally spam free. Training should be done with about equal amounts of spam and ham, so you might purposely transfer some messages from your inbox (your good mail file) to the ham folder to even out the number of spam and ham messages you use in training. More about this later. The data that Spamassassin gathers during training is kept in databases in the .spamassassin subdirectory of your home directory.
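The advice about training on roughly equal amounts of ham and spam can be checked before training by counting the messages in each mailbox. The following is a minimal sketch, assuming the mailboxes are in mbox format and live in $HOME/Mail under the names ham and spam; the function names count_messages and training_balance are made up for this illustration.

```shell
#!/bin/sh
# Count messages in an mbox file by counting "From " separator lines.
# Assumes the usual mbox convention that body lines starting with
# "From " are escaped as ">From ".
count_messages() {
    if [ -f "$1" ]; then
        # grep -c prints 0 but exits non-zero when nothing matches,
        # so the exit status is deliberately ignored.
        grep -c '^From ' "$1" || true
    else
        echo 0
    fi
}

# Print the ham and spam counts for a mail directory (default: $HOME/Mail).
training_balance() {
    dir=${1:-$HOME/Mail}
    echo "ham:  $(count_messages "$dir/ham")"
    echo "spam: $(count_messages "$dir/spam")"
}
```

Running training_balance with no arguments prints the two counts; if one is much larger than the other, move a few more messages into the smaller mailbox before training.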

The filtering is driven by a procmail recipe file called spamassassin-default.rc and, as you will see, it is included by a .procmailrc script that you place in your home directory.

Procmail

The .procmailrc file that you place in your home directory is actually run by the server Mail.CS.Trinity.Edu. Because the client machines and the server share your home directory, you can do all of the procmail configuration, and train the Spamassassin databases, on a client; the same files are then used for filtering on the server.

You should perform the following steps to set up your Spamassassin configuration.

1. Adding the following recipes to the top of your .procmailrc will get the spam out of the way, allowing everything else to be filtered as per your normal procmail recipes. The example .procmailrc script shown below presumes that you have subscribed to the fedora mail list and that you wish to have all fedora messages written into a mailbox file called fedora.

PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin
SHELL=/bin/sh
MAILDIR =       $HOME/Mail      # You'd better make sure it exists
LOGFILE =       $MAILDIR/procmail.log
LOCKFILE=       $HOME/.lockmail

INCLUDERC=/etc/mail/spamassassin/spamassassin-default.rc

:0
*^Subject:.*\[SPAM\] 
spam

:0
*^To:.*fedora-list@redhat.com 
fedora

:0
inbox

The recipe that references fedora puts all mail addressed to fedora-list@redhat.com into a mailbox called fedora.
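To see which mailbox a given message would land in, the conditions from the recipes above can be tried against sample headers without sending any mail. This is only a rough sketch: it uses grep -E as a stand-in for procmail's egrep-style matching (with -i because procmail conditions are case-insensitive by default), and the function names header_matches and classify_headers are hypothetical. It does not run procmail itself, so procmail-specific regex extensions are not covered.

```shell
#!/bin/sh
# Return success if any header line matches a procmail-style condition.
# $1 = regex (the condition without the leading "*")
# $2 = header text, one header per line
header_matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# Mimic the order of the three recipes: the first match wins.
classify_headers() {
    if header_matches '^Subject:.*\[SPAM\]' "$1"; then
        echo spam
    elif header_matches '^To:.*fedora-list@redhat.com' "$1"; then
        echo fedora
    else
        echo inbox
    fi
}
```

For example, classify_headers 'To: fedora-list@redhat.com' prints fedora, while a header block that also carries a [SPAM] subject is reported as spam because that recipe comes first.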

2. Train the database using the following commands, assuming that you have collected ham in a file called ham and spam in a file called spam.

For ham:

sa-learn --mbox --ham Mail/ham
For spam:

sa-learn --mbox --spam Mail/spam
It turns out that it is productive to train spamassassin on the ham and spam that it has already identified.
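The two training commands can be wrapped in a small function so that both mailboxes are processed in one step. This is a sketch under assumptions, not a site-provided script: the function name train_bayes and the SA_LEARN variable are invented here, and the mailboxes are assumed to be Mail/ham and Mail/spam as in the commands above.

```shell
#!/bin/sh
# Train the Bayes database from the ham and spam mailboxes.
# SA_LEARN is a hypothetical override so the function can be dry-run
# (for example SA_LEARN=echo); it defaults to the real sa-learn.
train_bayes() {
    dir=${1:-$HOME/Mail}
    learn=${SA_LEARN:-sa-learn}
    for kind in ham spam; do
        if [ -s "$dir/$kind" ]; then
            # Same invocation as the manual commands: --mbox plus
            # --ham or --spam, followed by the mailbox file.
            "$learn" --mbox "--$kind" "$dir/$kind"
        else
            echo "skipping $kind: $dir/$kind is missing or empty" >&2
        fi
    done
}
```

Calling train_bayes with no arguments trains from $HOME/Mail; setting SA_LEARN=echo first prints the commands that would be run instead of running them.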

Spamassassin configuration file

Your Spamassassin configuration directory is called .spamassassin and is located in your home directory. If it does not already exist, create it using the

mkdir .spamassassin
command while in your home directory.

In that directory you will find the Spamassassin databases as well as a configuration file called user_prefs. This file can be generated by using the web interface at http://www.yrex.com/spam/spamconfig.php . The default configuration file produced by this web page is:

# SpamAssassin config file for version 2.5x
# generated by http://www.yrex.com/spam/spamconfig.php (version 1.01)

# How many hits before a message is considered spam.
required_hits           5.0

# Whether to change the subject of suspected spam
rewrite_subject         0

# Text to prepend to subject if rewrite_subject is used
subject_tag             [SPAM]

# Encapsulate spam in an attachment
report_safe             1

# Use terse version of the spam report
use_terse_report        0

# Enable the Bayes system
use_bayes               1

# Enable Bayes auto-learning
auto_learn              1

# Enable or disable network checks
skip_rbl_checks         0
use_razor2              1
use_dcc                 1
use_pyzor               1

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_languages            all

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales              all
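
In this file, the required_hits option tells the system what score a message must reach before it is treated as spam. The subject_tag you choose should match the one being searched for in your .procmailrc file. Note that the subject is only tagged when rewrite_subject is set to 1; with the value of 0 shown above, the [SPAM] recipe in .procmailrc has nothing to match.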
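Because the tag in user_prefs and the pattern in .procmailrc are kept in two separate files, it is easy for them to drift apart. The sketch below shows one way to compare them; the function names pref_value and tag_in_procmailrc are hypothetical, and the check is a plain text comparison rather than anything Spamassassin or procmail provides.

```shell
#!/bin/sh
# Print the value of a setting from a user_prefs file.
# $1 = file, $2 = option name. If the option appears more than once,
# the last occurrence is printed; comment lines are ignored because
# their first field is "#".
pref_value() {
    awk -v key="$2" '
        $1 == key { $1 = ""; sub(/^[ \t]+/, ""); val = $0 }
        END { print val }
    ' "$1"
}

# Succeed if the subject_tag from user_prefs appears in the procmailrc.
# Backslashes are stripped first because the recipe escapes the
# brackets, e.g. \[SPAM\] for the tag [SPAM].
# $1 = user_prefs, $2 = procmailrc
tag_in_procmailrc() {
    tag=$(pref_value "$1" subject_tag)
    [ -n "$tag" ] && tr -d '\\' < "$2" | grep -qF -- "$tag"
}
```

A typical call would be tag_in_procmailrc "$HOME/.spamassassin/user_prefs" "$HOME/.procmailrc", which returns a non-zero status when the tag is not found. The same pref_value helper can be used to read other settings, such as rewrite_subject.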

You can execute the command:

perldoc Mail::SpamAssassin::Conf
for details of the options that can be used in this file.

Further information on Spamassassin can be obtained at http://www.spamassassin.org .


Computer Science Department
Trinity University
One Trinity Place
San Antonio, Texas 78212-7200
voice: (210) 999-7480
fax: (210) 999-7477