SpamAssassin
Essay by review • December 24, 2010 • Essay • 1,575 Words (7 Pages) • 1,408 Views
by...
Electronic mail has become one of the primary services of the Information Age. Virtually all Internet users have at least some form of email account, from the corporate CEO to the average grade schooler. With a service so prolific and inescapable, it was only a matter of time before advertisers began to target digital mailboxes along with their "customers'" postal mailboxes.
Spam is hardly a new phenomenon. Basically a form of unsolicited commercial advertisement, spam is everywhere these days. You can casually flick on the television and find a number of "ANTI-SPAM!!!!" advertisements from ISPs like NetZero or AOL. Transmitted instantly to thousands of unwilling victims, spam is an incredibly cheap form of advertising. It is also a powerful method for hackers to mass-distribute a virus. Consequently, there is a LOT of spam on the Internet. Filtering out the "junk mail" of the 21st century has become a full-time job in some companies.
"Third-party" mail filters are not a new concept either. Developed by any one of a thousand different companies or open source projects, mail filters are plentiful. Customers have quite a number of products to choose from, and many of the better applications can be tailored to a customer's specific needs.
We will focus on a single product, however: SpamAssassin. SpamAssassin is an open source mail filter, maintained as a project of the Apache Software Foundation and released under the Apache License. Its methods are many and varied. From granular text filtering and DNS block-lists, to Bayesian filtering and header/footer deconstruction, to a thousand configurable filters and parameters, SpamAssassin has a gigantic arsenal of tools to use in the "War against Spam."
We'll cover filtering in more detail a little later on...
A man by the name of Justin Mason is behind SpamAssassin, but the groundwork was laid in 1997 by an earlier Perl script, dubbed "filter.plx," written by another programmer. Mason began to slowly chip away at filter.plx and eventually rewrote it from the ground up. In 2001 the first build of SpamAssassin was released on SourceForge, naturally as an open source application. Mason had little problem assembling a development team from a roster of veteran open source programmers impressed with his work. Soon after, SpamAssassin was truly born.
Now that we have a basic idea of what SpamAssassin is and how it came to be, let's take a glimpse into how it works. An excellent, although brief, description comes from http://en.wikipedia.org/wiki/Spamassassin and is reprinted below.
Methods of usage
"SpamAssassin is a Perl-based application (Mail::SpamAssassin in CPAN) which is usually used to filter all incoming mail for one or several users. It can be run as a standalone application or as a client (spamc) that communicates with a daemon (spamd). The latter mode of operation has performance benefits, but under certain circumstances may introduce additional security risks.
Typically either variant of the application is set up in a generic mail filter program, or it is called directly from a mail user agent that supports this, whenever new mail arrives. Mail filter programs such as procmail can be made to pipe all incoming mail through SpamAssassin with an adjustment to user's .procmailrc file."
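To make the "pipe all incoming mail through SpamAssassin" idea concrete, here is a minimal sketch of what a mail filter program does at that step: it hands the raw message to a command on standard input and keeps whatever comes back on standard output. The function name `filter_message` is made up for illustration, and the sketch assumes a `spamassassin` executable is installed and on the PATH. It is not how procmail itself is implemented.

```python
import subprocess


def filter_message(raw_message: bytes, command=("spamassassin",)) -> bytes:
    """Pipe one raw email through a filter command and return its output.

    `command` defaults to the standalone `spamassassin` script (assumed to
    be installed); any program that reads a message on stdin and writes
    the result to stdout can be substituted.
    """
    result = subprocess.run(
        list(command),
        input=raw_message,       # the message goes in on stdin
        stdout=subprocess.PIPE,  # the (possibly rewritten) message comes out
        check=True,              # raise if the filter exits with an error
    )
    return result.stdout
```

Swapping the command for `("spamc",)` would, under the same assumptions, send the message through the client-daemon pair instead of the standalone script.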
Well, that's a decent start. SpamAssassin has two methods of operation: a standalone system, and a daemonized version with many extra functions available. Standalone has been mostly replaced by the upgraded client-daemon package. Although SA standalone is more compatible with other third-party mail filters, it may soon become a discontinued feature. With that in mind, let's take a brief look at two of the core processes in the Wiki description, spamd and spamc...
Spamc, short for "spam client," handles some basic processing for an incoming piece of email. Unlike the rest of SpamAssassin, it is a small C program rather than a Perl script, so it starts quickly. It takes the email as text via STDIN and sends it to the spamd daemon. It then simply waits for the daemon's reply and feeds the output to STDOUT. A fairly simple program, eh?
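The exchange between client and daemon can be sketched in a few lines. The request and response framing below (a `PROCESS SPAMC/1.5` command line, a `Content-length` header, a blank line, then the message) reflects my understanding of the spamd protocol and should be checked against the protocol notes shipped with SpamAssassin. The function names are hypothetical, and the real spamc does considerably more (timeouts, size limits, fallbacks).

```python
import socket


def build_request(message: bytes, command: str = "PROCESS") -> bytes:
    """Frame a message the way a spamc-style client would (assumed format)."""
    header = f"{command} SPAMC/1.5\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_response(raw: bytes):
    """Split a daemon reply into (status line, headers dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii", "replace").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def ask_spamd(message: bytes, host: str = "127.0.0.1", port: int = 783) -> bytes:
    """Send one message to a running spamd and return the processed body."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(message))
        sock.shutdown(socket.SHUT_WR)  # tell the daemon we are done writing
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # daemon closed the connection
                break
            chunks.append(chunk)
    _status, _headers, body = parse_response(b"".join(chunks))
    return body
```

A command-line wrapper would read the message from standard input, call `ask_spamd`, and write the returned body to standard output, which is the whole of the behaviour described above.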
Spamd, on the other hand, is the daemonized version of SpamAssassin, and is certainly more complicated than its little brother, spamc. Spamd loads the list of SpamAssassin filters once at startup and goes to work. And work, for the most part, is WAITING. Spamd listens on port 783 by default, but can be configured to use just about any port. When a request arrives on the listening port, spamd uses a standard Unix technique called forking. Forking is a process-creation technique, not a memory allocation one: the parent process "spawns" a copy of itself, a child, to handle the request, while the parent goes back to listening. The child reads an email message from the network socket, which should then be closed for writing on the other end. The child then uses SpamAssassin to rewrite the message and dumps the processed message back to the socket before closing the connection. The child process then dies. In theory, forking cuts down on overhead: the filters are loaded once by the parent, and each child inherits them, with memory copied only as needed, instead of re-reading them for every message. Now armed with a basic snapshot of what makes SpamAssassin tick, let's move on to filters.
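Before moving on, the fork-per-connection behaviour described above can be sketched as follows. This is a simplified illustration of the pattern, not spamd's actual code: the names `read_until_eof`, `handle_connection` and `serve_forever` are made up, and `process` stands in for whatever rewrites the message. It relies on `os.fork`, so it assumes a Unix-like system.

```python
import os
import socket


def read_until_eof(conn: socket.socket) -> bytes:
    """Read until the peer closes its writing side of the connection."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def handle_connection(conn: socket.socket, process) -> None:
    """The child's job: read one message, rewrite it, send it back, close."""
    with conn:
        message = read_until_eof(conn)
        conn.sendall(process(message))


def serve_forever(listener: socket.socket, process) -> None:
    """The parent's job: wait, and fork one child per incoming connection."""
    while True:
        conn, _addr = listener.accept()
        if os.fork() == 0:
            # Child: it does not need the listening socket.
            listener.close()
            try:
                handle_connection(conn, process)
            finally:
                os._exit(0)      # the child dies after one request
        # Parent: the child owns the connection now.
        conn.close()
        try:
            # Reap any children that have already finished.
            while os.waitpid(-1, os.WNOHANG)[0] > 0:
                pass
        except ChildProcessError:
            pass                 # no children left to reap
```

Anything the parent loaded before calling `serve_forever` is already in memory for every child, which is where the saving comes from.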
As mentioned previously, SpamAssassin has an ever-increasing array of configurable mail filters. According to the SpamAssassin website, SA uses a point-based scoring system: "...the scores are assigned using a neural network trained with error back propagation (Perceptron)." Don't get scared yet: the "neural network" they are referring to is a lattice of Regular Expressions similar
...
...
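The point-based scoring mentioned above can be illustrated with a toy example: each rule is a regular expression with a score attached, the scores of all matching rules are added up, and the total is compared against a threshold. The rule names, patterns and scores here are invented for illustration and are not real SpamAssassin rules. Real scores are, per the quote above, assigned by training rather than picked by hand. The threshold of 5.0 matches SpamAssassin's default required score.

```python
import re

# Hypothetical rules: (name, pattern, score). None of these are real
# SpamAssassin rules; they only show the shape of the idea.
RULES = [
    ("HYPOTHETICAL_SHOUTY_SUBJECT",
     re.compile(r"^Subject: [^a-z\n]{10,}$", re.MULTILINE), 1.5),
    ("HYPOTHETICAL_FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("HYPOTHETICAL_CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.5),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold


def score_message(text: str, rules=RULES):
    """Return (total score, names of the rules that matched)."""
    hits = [(name, points) for name, pattern, points in rules
            if pattern.search(text)]
    total = sum(points for _name, points in hits)
    return total, [name for name, _points in hits]


def is_spam(text: str, required: float = REQUIRED_SCORE) -> bool:
    """A message is flagged once its total reaches the threshold."""
    total, _hits = score_message(text)
    return total >= required
```

With these made-up numbers, a message has to trip all three rules to be flagged; a single match is not enough, which is the point of scoring rather than blocking on any one test.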