Spam Filtering Methodology
The spam filtering mechanism relies on the SMTP protocol for accepting unfiltered Internet Email and delivering filtered mail to a recipient mail server.
Without filtering, Internet email works by having the source mail server identify the correct mail server for a domain and deliver messages to that mail server via SMTP. The domain's DNS MX record identifies the correct mail server.
Figure 1 - How Email works without Spam filtering
With Spam and Virus filtering through VirusGator, the basic model remains the same except that a cluster of servers dedicated to spam and virus filtering is placed between the source mail server and your normal mail server.
Figure 2 - Simplified Spam Filter diagram
The spam filter servers communicate with both the source mail server and your normal mail server via SMTP.
Nothing changes on your normal mail server except that it should accept incoming mail only from servers in the VirusGator cluster. In reality, the topology used by our spam filter is much more robust than this simplified diagram suggests. Below is a more representative diagram of our spam filtering service systems.
Figure 3 - Spam Filtering Server Cluster diagram
The spam filtering cluster servers are placed between the source mail server and your normal mail server by means of DNS MX records, which may be updated through your own DNS servers, our DNS servers, or your domain registrar.
Normally, a domain's own mail server is listed as its mail exchanger (MX).
Here is a sample MX record:
xyz.com. IN MX 10 mail.xyz.com.
This record says that for xyz.com, all mail should be sent to the host mail.xyz.com.
All you need to do to enable spam filtering for the domain is replace the host mail.xyz.com with the hostname of our spam filtering server cluster.
Here is a sample MX record configured for spam filtering:
xyz.com. IN MX 10 chomp.virusgator.com.
Once this is done, chomp.virusgator.com is the valid mail server for this domain and all mail for xyz.com should go to this logical server. Once the cluster servers have completed their work, mail is delivered to your normal mail server via SMTP.
Important: The above is a sample configuration. Our support staff will inform you regarding the host names for your new MX record. Do not use the above sample settings as they will not work.
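To make the routing change concrete, here is a minimal Python sketch of what a sending server does with an MX record: it reads the preference value and host, then tries hosts from the lowest preference value upward (the ordering RFC 5321 specifies). It accepts only the simplified five-field form used in the samples above (no TTL, no multi-line records), and `MXRecord`, `parse_mx`, and `delivery_order` are hypothetical names; a real sender obtains the records from a DNS query, not a text line. The hostnames are this section's samples, not working settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MXRecord:
    owner: str       # domain the record applies to, e.g. "xyz.com."
    preference: int  # lower values are tried first
    exchange: str    # host that accepts mail for the domain


def parse_mx(line: str) -> MXRecord:
    """Parse the simple five-field form '<owner> IN MX <preference> <host>'."""
    fields = line.split()
    if len(fields) != 5 or fields[1].upper() != "IN" or fields[2].upper() != "MX":
        raise ValueError(f"not a simple MX record: {line!r}")
    return MXRecord(owner=fields[0], preference=int(fields[3]), exchange=fields[4])


def delivery_order(records: list[MXRecord]) -> list[str]:
    """Hosts in the order a sending server tries them (lowest preference first)."""
    ordered = sorted(records, key=lambda r: r.preference)
    return [r.exchange.rstrip(".") for r in ordered]


# The two sample records from this section (samples only; not real settings).
before = parse_mx("xyz.com. IN MX 10 mail.xyz.com.")
after = parse_mx("xyz.com. IN MX 10 chomp.virusgator.com.")
print(delivery_order([before]))  # ['mail.xyz.com']
print(delivery_order([after]))   # ['chomp.virusgator.com']
```

The sender's logic is identical in both cases; only the record changes, so the mail simply arrives at the filtering cluster instead of your server.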
Spam Filtering Logic
The precise methodology and logic used to identify viruses and spam are very complicated and would require many pages of illustrations and explanations to describe in detail. Below is a simplified diagram of the basic logic used to identify spam; this logic is the basis of the spam filters in VirusGator.
Spam Filtering Logic Diagram
False Positives
Although the spam filtering rules used by VirusGator are very accurate, we realize that no automated process can ever achieve 100% accuracy. With this in mind, VirusGator ultimately relies on human intervention to guarantee that no legitimate email message will ever be rejected and that you will never lose a customer or prospect due to automated scanning. This might at first seem like a step in the wrong direction, but having a human administrator make the final decision on questionable mail lets the administrator identify the types of legitimate mail that a domain receives as false positives and adjust the rules so that such mail passes without further intervention. Over time, the human intervention required drops drastically. If you are willing to accept the risk of automatically rejecting high-scoring messages, you may configure VirusGator to do so.
VirusGator SME and Streams
Each domain, and optionally each mail user, can join an individual "Stream". Individual mail streams allow precise control of acceptable messages and spam levels on a per-domain and per-account basis. It is also possible for an individual user to "opt out" of spam and virus filtering completely if desired (though this is not recommended). For such users, it is preferable to set the spam scoring very permissively (see the explanation below) and keep virus filtering in place.
Spam and Virus Tests
The spam filter servers perform a variety of tests to determine whether a message is a virus, spam, or valid email. The tests are performed in real time and delay the delivery of email by only a few milliseconds.
Connection validation. Connecting mail servers are checked to determine whether they are valid mail servers. These checks include such things as mail host spoofing (improper HELO responses), relay authorization, SPF authorization, and return address validation.
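As one illustration of the HELO checks mentioned above, the sketch below applies a few plausible sanity rules. RFC 5321 expects the HELO/EHLO argument to be the client's fully qualified domain name or, failing that, an address literal; the rule that a remote client must not claim one of your own domains is an assumption added for illustration. `helo_looks_valid` is a hypothetical helper, these are not VirusGator's actual rules, and the SPF, relay, and return-address checks are not shown.

```python
import ipaddress


def helo_looks_valid(helo: str, client_ip: str, local_domains: set[str]) -> bool:
    """Hypothetical HELO sanity rules, not VirusGator's actual checks."""
    name = helo.strip().rstrip(".").lower()
    if name.startswith("[") and name.endswith("]"):
        # Address literal such as "[192.0.2.10]" or "[IPv6:2001:db8::1]":
        # it should match the address the client actually connected from.
        literal = name[1:-1]
        if literal.startswith("ipv6:"):
            literal = literal[len("ipv6:"):]
        try:
            return ipaddress.ip_address(literal) == ipaddress.ip_address(client_ip)
        except ValueError:
            return False
    try:
        ipaddress.ip_address(name)
        return False  # a bare IP must be written as a bracketed address literal
    except ValueError:
        pass
    if "." not in name:
        return False  # "localhost", "mypc" and similar are not fully qualified
    # A remote client claiming to be one of our own domains suggests spoofing.
    for domain in local_domains:
        d = domain.lower().rstrip(".")
        if name == d or name.endswith("." + d):
            return False
    return True


print(helo_looks_valid("localhost", "192.0.2.10", {"xyz.com"}))  # False
```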
Reject known spammers. We examine the IP address and sender domain of the connecting mail server to determine whether the sender is a known spammer, an open relay, or another recognized source of spam and viruses. This step reduces load on the recipient mail server and minimizes the impact of dictionary attacks.
Virus Scan. The message content is scanned for viruses. Virus definitions are updated hourly, so new viruses can be identified quickly without user intervention. If a message is found to contain a virus, it is handled as the per-user or per-domain stream settings specify. See Stream Settings below for the options.
Spam Filtering
Check whitelists and blacklists
With VirusGator SME, each user may create whitelists and blacklists that bypass spam filtering without affecting any other users on the domain or system.
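Per-user lists can be pictured as a lookup keyed by recipient that runs before any scoring. In this sketch an entry may be a full address or a bare domain, and a whitelist match wins over a blacklist match; both choices, and the names `UserLists` and `list_verdict`, are assumptions rather than documented VirusGator behavior.

```python
from dataclasses import dataclass, field


@dataclass
class UserLists:
    """Hypothetical per-user lists; entries are full addresses or bare domains."""
    whitelist: set[str] = field(default_factory=set)
    blacklist: set[str] = field(default_factory=set)


def _matches(sender: str, entries: set[str]) -> bool:
    sender = sender.lower()
    domain = sender.rpartition("@")[2]
    normalized = {e.lower() for e in entries}
    return sender in normalized or domain in normalized


def list_verdict(sender: str, lists: UserLists) -> str:
    """'allow' skips spam tests, 'block' stops the message, 'score' runs the tests."""
    if _matches(sender, lists.whitelist):  # assumption: whitelist wins over blacklist
        return "allow"
    if _matches(sender, lists.blacklist):
        return "block"
    return "score"


# Each recipient has independent lists, so one user's entries never affect another's.
lists_by_user = {
    "bill@im1.net": UserLists(whitelist={"partner.example"}),
    "ann@im1.net": UserLists(blacklist={"partner.example"}),
}
```

The same sender can therefore be allowed for one user and blocked for another on the same domain.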
Heuristics
Heuristic checks look for characteristics commonly found in spam messages, such as words obfuscated by "chickenpox" punctuation (for example, "d*e*b*t*"), heavy use of HTML, or remotely loaded images. The presence of these attributes indicates an increased likelihood that a message is spam. The heuristic score is then added to the message's spam score.
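The heuristic checks described above can be sketched as a handful of pattern tests that each add points. The regular expressions, rule names, and weights below are hypothetical and chosen only to illustrate the idea; they are not VirusGator's rule set. The obfuscation pattern requires at least four single letters joined by punctuation, so ordinary abbreviations such as "e.g." do not trigger it.

```python
import re

# Hypothetical rules and weights, for illustration only.
CHICKENPOX = re.compile(r"[A-Za-z](?:[*.\-_][A-Za-z]){3,}")  # e.g. d*e*b*t
REMOTE_IMAGE = re.compile(r"<img[^>]+src=[\"']?https?://", re.IGNORECASE)
HTML_TAG = re.compile(r"<[a-z][^>]*>", re.IGNORECASE)


def heuristic_score(body: str) -> tuple[float, list[str]]:
    """Return (points to add to the spam score, names of the rules that fired)."""
    hits: list[str] = []
    score = 0.0
    if CHICKENPOX.search(body):
        hits.append("OBFUSCATED_WORD")
        score += 1.5
    if REMOTE_IMAGE.search(body):
        hits.append("REMOTE_IMAGE")
        score += 0.8
    if len(HTML_TAG.findall(body)) > 20:  # the "heavy HTML" cutoff is arbitrary
        hits.append("HEAVY_HTML")
        score += 0.5
    return score, hits


print(heuristic_score("Clear your d*e*b*t* today"))  # (1.5, ['OBFUSCATED_WORD'])
```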
Bayesian Analysis
Bayesian filtering uses probability analysis to determine whether a message is likely to be spam.
The filter is trained on a body of messages identified as spam and a body identified as good mail, and it keeps track of the identifiers, or "tokens", found in each message used to train it.
The presence of these tokens (good and bad) in a new message is then used to calculate the probability that the message is or is not spam.
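A small naive Bayes sketch in Python shows how token counts from voted spam and good messages turn into a probability. It uses the fraction of spam and good messages containing each token, add-one smoothing, and equal prior odds, and it ignores tokens never seen in training; these are common simplifications, not necessarily the formula VirusGator uses, and `BayesFilter` is a hypothetical class.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9$'-]+")


def tokens(text: str) -> set[str]:
    """Distinct lower-case word tokens in a message."""
    return set(TOKEN.findall(text.lower()))


class BayesFilter:
    def __init__(self) -> None:
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of good messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Record one message that a user has voted spam or not spam."""
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens(text))
            self.n_ham += 1

    def spam_probability(self, text: str) -> float:
        """Naive Bayes with add-one smoothing and equal class priors."""
        log_odds = 0.0
        for tok in tokens(text):
            if tok not in self.spam_counts and tok not in self.ham_counts:
                continue  # never seen in training: no evidence either way
            p_tok_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_tok_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_tok_spam / p_tok_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))


f = BayesFilter()
for text in ["cheap mortgage offer now", "win cash offer now"]:
    f.train(text, is_spam=True)
for text in ["meeting agenda for monday", "project agenda attached"]:
    f.train(text, is_spam=False)
print(f.spam_probability("cheap offer") > 0.5)  # True
```

With only a handful of training messages the probabilities are crude, which is why training on your own mail comes before enabling the filter.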
Before enabling Bayesian filtering, you must train the system on your own mail. For example, mortgage brokers will certainly receive legitimate mail containing mortgage offers, but a pet store owner would likely not receive such offers as legitimate mail. Each business is different, and Bayesian analysis allows accuracy to increase beyond the default 98% spam catch rate of the VirusGator system.
RBL DNS Tests
RBL stands for Real-time Blacklist. RBLs are databases maintained by various organizations that list the IP addresses of known spammers, open relays, open proxies, compromised systems, and other sources of spam. Because some of these lists can be overzealous in their attempts to block spam, you can configure your stream so that messages are not blocked outright when the source server appears on one of them. In that case, the messages are scored for spam as normal: if they score above your spam threshold they are placed in the pending queue, and if they score above your auto-reject threshold (20 by default) they are rejected outright.
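DNS-based blacklists are queried by reversing the octets of the connecting IPv4 address and looking the result up under the list's zone (the convention described in RFC 5782); an answer means the address is listed. The sketch below builds that name and, in keeping with the scoring approach described above, turns listings into score points instead of an outright rejection. The zone `bl.example.org`, the one-point weight, and the helper names are placeholders, and IPv6 lookups (which use a nibble-reversed format) are not handled.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the blacklist zone (RFC 5782 style)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """A listed address resolves to an A record; an unlisted one does not."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror (NXDOMAIN, lookup failure) subclasses OSError
        return False


def rbl_points(ip: str, zones: list[str], points_per_listing: float = 1.0) -> float:
    """Add points to the spam score instead of rejecting outright (hypothetical weight)."""
    return sum((points_per_listing for zone in zones if is_listed(ip, zone)), 0.0)


print(dnsbl_query_name("192.0.2.99", "bl.example.org"))  # 99.2.0.192.bl.example.org
```

Treating a failed lookup as "not listed" is a deliberate choice here: a blacklist outage then lowers protection slightly instead of blocking legitimate mail.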
Calculate spam score
The results of the tests outlined above are used to determine an overall spam score for a message.
If a message exceeds the threshold score for spam, it is determined to be spam. Messages that score below the stream threshold are determined to be good and passed on immediately.
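The scoring step amounts to adding up each test's contribution and comparing the total with the stream threshold. In this sketch the test names and point values are made up for illustration, and `overall_score` and `is_spam` are hypothetical helpers; a score exactly equal to the threshold is treated as not spam, since the text says a message must exceed the threshold.

```python
def overall_score(contributions: dict[str, float]) -> float:
    """Add up the points each test contributed (a test may also subtract points)."""
    return round(sum(contributions.values()), 1)


def is_spam(score: float, threshold: float = 5.0) -> bool:
    """A message is spam only when its score exceeds the stream threshold."""
    return score > threshold


# Hypothetical per-test results for one message.
results = {"heuristics": 3.9, "bayes": 2.0, "rbl": 0.0}
score = overall_score(results)
print(score, is_spam(score))  # 5.9 True
```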
Tagging Messages
All filtered messages are tagged to help mail administrators determine whether a message has been filtered. This tagging is done by inserting special headers into the source of the email. These headers are not normally visible and can only be seen by viewing the message source. Tagging filtered messages gives mail administrators a great deal of flexibility in managing spam-filtered mail.
All filtered messages receive the following header:
X-Scanned-By: VirusGator (www . virusgator . com)
This header confirms that the message has been filtered by the VirusGator Spam Filter.
In some cases, spammers will try to bypass the filtering servers and send mail directly to the normal mail server. The mail administrator can guard against this by setting a rule to reject all messages without this header, so that unfiltered mail is rejected immediately. A legitimate mail server should never bypass DNS to send mail directly to a mail server or IP address that is not the listed MX, so the likelihood of blocking real mail with this rule is low.
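A check for the header can be sketched with Python's standard `email` package. In practice such a rule would be configured in your mail server rather than written in Python, and `scanned_by_filter` is a hypothetical helper. Because any sender can add an `X-Scanned-By` header, this check works best alongside the restriction that your mail server accept incoming mail only from the VirusGator cluster.

```python
from email import message_from_string


def scanned_by_filter(raw_message: str, marker: str = "VirusGator") -> bool:
    """True if any X-Scanned-By header mentions the filter."""
    msg = message_from_string(raw_message)
    return any(marker in value for value in msg.get_all("X-Scanned-By", []))


filtered = (
    "X-Scanned-By: VirusGator (www . virusgator . com) on 65.111.44.16\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
direct = "Subject: hello\n\nbody\n"
print(scanned_by_filter(filtered), scanned_by_filter(direct))  # True False
```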
Identified spam messages receive additional headers. Here's an example of the headers inserted into a message:
MIME-Version: 1.0
X-Accept-Language: en
X-Priority: Normal
From: Wireless Special
To: bill@im1.net
Subject: Motorola RAZR at no cost
Date: Mon, 5 Sep 2005 10:12:54 EST
Message-ID:
X-Mailer: 3.2.5-30 [Aug 10 2005, 12:37:19]
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-VirusGatorSME-Stream: im1.net
X-Spam-Score: 5.9 (***) COMBINED_FROM,HTML_FONT_INVISIBLE,HTML_MESSAGE,MIME_HTML_ONLY,REMOVE_PAGE
X-Bayes-Prob: 0.9999 (Score 2)
X-VirusGator-Stats-ID: 16716382 - 37fa4a78c42c
X-Scanned-By: VirusGator (www . virusgator . com) on 65.111.44.16
X-UIDL: $P!"!og!7C<"!!X]!!
Status: U
The "X-Spam-Score" header shows how the message scored and lists the various attributes contributing to the spam score.
By default, messages scoring higher than 5.0 are determined to be spam and held in the pending queue for review. Messages scoring higher than your stream's auto-reject threshold can be rejected automatically.
As administrator or controller of your own stream, you may set the thresholds for these actions.
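Administrators who want to act on the score programmatically (for example, in a mail client rule) could parse the header as sketched here. The layout `<score> (<stars>) <RULE,RULE,...>` is inferred from the single sample above and is an assumption, as are the names `SpamScore` and `parse_spam_score`.

```python
import re
from typing import NamedTuple


class SpamScore(NamedTuple):
    score: float
    stars: str
    rules: list[str]


# Assumed layout, inferred from one sample: "<score> (<stars>) <RULE,RULE,...>"
_LAYOUT = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+\((\**)\)\s*(.*)$")


def parse_spam_score(value: str) -> SpamScore:
    """Split an X-Spam-Score header value into score, star rating, and rule names."""
    m = _LAYOUT.match(value)
    if m is None:
        raise ValueError(f"unrecognized X-Spam-Score value: {value!r}")
    score, stars, rules = m.groups()
    return SpamScore(float(score), stars, [r.strip() for r in rules.split(",") if r.strip()])


sample = "5.9 (***) COMBINED_FROM,HTML_FONT_INVISIBLE,HTML_MESSAGE,MIME_HTML_ONLY,REMOVE_PAGE"
parsed = parse_spam_score(sample)
print(parsed.score, parsed.stars, parsed.rules[0])  # 5.9 *** COMBINED_FROM
```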
Stream Settings
Your stream can safely be left at its default settings, or you can fine-tune the settings manually using the convenient web interface.
The following is a list of the options for settings on a stream:
- Automatically reject messages scoring more than this amount (5.0 to 2000): the default setting is 20, and you should leave it alone until you have had some experience using the system; then lower it gradually to reduce the amount of spam held for review. For many streams, a setting of 8.8 works well after tuning.
- Auto-reject messages scoring more than this amount without creating an incident (10.0 to 1000000): the default setting is 1000000. Every message handled by the system has an individual incident number assigned to it. You can set this option so that any message scoring over, say, 20.0 is simply discarded without creating an incident. A message scoring over 20 is very likely to be spam, with over a 99.9% chance of being spam.
- Spam threshold (1.0 to 100): the default setting is 5.0. Any message scoring higher than your threshold is held in the pending queue, which ensures you never lose legitimate mail due to filtering. For most streams, 98% of messages scoring over 5.0 are spam.
- Real-time blacklist settings: you can choose to hold or reject messages from hosts listed in the administrator's real-time blacklists, using the two options below.
- Hold messages from hosts in the administrator's real-time 'Hold' blacklists: Yes or No.
- Reject messages from hosts in the administrator's real-time 'Reject' blacklists: Yes or No.
- Only accept mail for accounts in the Valid Recipients table: Yes or No. If you have only a few valid recipients for your domain, you can add them to the Valid Recipients table and accept mail only for the listed users, rejecting mail addressed to anyone else. This helps protect against dictionary-style spam attacks.
- Handling for messages containing viruses: the default setting is Discard. You may choose to accept, hold, discard, reject, quarantine, or reject and quarantine.
- Handling for messages containing Windows executables: the default setting is Accept. You may choose to accept, hold, discard, reject, quarantine, or reject and quarantine.
- Only tag spam; do not hold any messages: Yes or No. The default setting is No. At the beginning, you may wish simply to tag messages to get an idea of the scores your users' mail will receive. This lets you determine in advance the settings you wish to apply to a stream and avoids delaying mail in the pending queue. In most cases this feature is not used, but it is available if you need it.
- Convert any 'reject' settings to 'hold' in tag-only mode: Yes or No. When your stream is in tag-only mode, you can use this setting to hold mail that would otherwise be rejected.
- String to put in tagged subjects: the default is [Spam:%*], in which %* is replaced by a number of stars (*) corresponding to the spam score. You can add to this string or simply leave it as is.
- String to put in subjects of approved messages: blank by default. You may put anything you like here.
- Inherit rules from 'default' stream: Yes or No. The default is Yes. The default stream is the master stream of the VirusGator spam filter system, and you may want your stream to inherit its rules.
- Permit use of auto-whitelisting: Yes or No. The default is Yes. Recipients of messages sent from a known network can be auto-whitelisted. A known network is normally a network under your control, e.g., your own netblock. To use this feature, you must first add a network to the known networks list.
- Send e-mail notification of pending messages: Yes or No. The default is Yes. You will receive a daily message indicating the number of messages in the pending queue.
- E-mail address for notification of pending messages: this should be set to the mail system administrator's e-mail address.
- Plain-text boilerplate to append to messages: blank by default. You could add a legal notice or any other text you wish here.
- HTML boilerplate to append to messages: blank by default.
- Enable Bayesian analysis: Yes or No. Bayesian analysis allows you to train your VirusGator stream for the particular content of the e-mail you receive. You must have a corpus of at least 100 messages that have been voted on before enabling Bayesian analysis.
- Inherit Bayesian training history from these streams: you may choose to inherit the Bayesian training of existing streams, which lets you enable Bayesian analysis quickly without training your own stream. However, it is best to train the stream on your own unique mail content rather than inheriting another stream's training.
- Add links to messages to train Bayesian analyzer: to train your stream for Bayesian analysis, you must enable this option so that users can click links in each message to vote on whether it is spam. Once you have accumulated a corpus of at least 100 messages, you may disable the training links. The larger the training corpus, the more accurate Bayesian analysis will be; we recommend training on at least 200 messages before enabling Bayesian filtering.
- Add training links to messages even if whitelisted: Yes or No. You may choose to add Bayesian training links even to messages whose sender or domain is whitelisted, which can further improve the accuracy of Bayesian filtering.
- Add training links in message headers: Yes or No. You may choose to place the training links in the message headers rather than the body.
- Remove pre-existing Bayesian training links from incoming mail: Yes or No. You may choose to strip pre-existing Bayesian training links from incoming mail. This might be needed, for example, when mail from a different VirusGator stream is forwarded to your stream.
- Only train on error when spam corpus reaches this size (0 to 10000000): once your Bayesian filter has been trained on a spam corpus of this size, you may choose to train it only on messages it has misidentified.
- Score below which to auto-learn as non-spam (-1000.0 to 5): the default setting is 5.0. You may choose to have the Bayesian filter automatically learn messages scoring below this setting as non-spam.
- Score above which to auto-learn as spam (0 to 10000): the default setting is 20.0, the same value as the default auto-reject threshold. You may choose to have the Bayesian filter automatically learn messages scoring above this setting as spam.
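The interaction between the thresholds and modes in the list above can be summarized as a small decision function. This Python sketch uses the documented defaults; the default for converting rejects to holds is not stated and is assumed to be No, tag-only mode is assumed to affect only the hold step, and the mapping from score to number of stars is left as an input because it is not documented. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamSettings:
    """Stream options from the list above, with the documented defaults."""
    auto_reject: float = 20.0                    # reject above this score
    discard_without_incident: float = 1000000.0  # discard above this, no incident
    spam_threshold: float = 5.0                  # hold above this score
    tag_only: bool = False                       # tag spam, never hold
    reject_to_hold_in_tag_only: bool = False     # default not documented; assumed No
    subject_tag: str = "[Spam:%*]"
    learn_ham_below: float = 5.0
    learn_spam_above: float = 20.0


def action_for(score: float, s: StreamSettings) -> str:
    """Map a spam score to 'discard', 'reject', 'hold', 'tag' or 'deliver'."""
    if score > s.discard_without_incident:
        return "discard"
    if score > s.auto_reject:
        if s.tag_only and s.reject_to_hold_in_tag_only:
            return "hold"
        return "reject"
    if score > s.spam_threshold:
        return "tag" if s.tag_only else "hold"
    return "deliver"


def tagged_subject(subject: str, stars: str, s: StreamSettings) -> str:
    """Prefix the subject with the stream's tag string, '%*' replaced by stars."""
    return f"{s.subject_tag.replace('%*', stars)} {subject}"


def autolearn_label(score: float, s: StreamSettings) -> Optional[str]:
    """'ham' or 'spam' for messages the Bayesian filter may learn automatically."""
    if score < s.learn_ham_below:
        return "ham"
    if score > s.learn_spam_above:
        return "spam"
    return None


settings = StreamSettings()
print(action_for(5.9, settings))  # hold
print(tagged_subject("Motorola RAZR at no cost", "***", settings))  # [Spam:***] Motorola RAZR at no cost
```

Lowering `auto_reject` toward 8.8, as suggested for tuned streams, simply moves more of the "hold" range into "reject".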
Security Considerations
Our spam filter has been designed to be a secure extension of a company's network. Email data represents confidential company information and should be managed with the same care as any other company data asset.
VirusGator's privacy policy also ensures that customers' data always remains theirs and will never be subject to review by any third parties without their explicit permission. The VirusGator spam filter system does not store the contents of email. Our filter analyzes the first 8 kilobytes of incoming mail to determine whether it is valid mail, while attached files are analyzed in their entirety to determine whether they may be malicious. You can set which types of attachments you wish to allow and filter in your stream settings, as explained above.
The spam and virus filter is a pass-through mechanism designed to maintain the integrity of your communications infrastructure.
VirusGator logs SMTP transaction information to help troubleshoot problems in mail delivery should they arise. This information is typically kept for 7 days and is not archived. VirusGator's log files are also subject to our privacy policy and will not be used for any purpose other than to provide service to our customers.
Redundancy and Failover
Our spam filter is designed to maximize performance, redundancy, and failover protection. The filtering cluster is mirrored across multiple servers for optimal performance and load balancing. If one server goes down, another quickly takes its place.
This provides uninterrupted mail service for all customer domains. If a customer's normal mail server goes down, we provide additional failover protection for the company's email infrastructure by queuing mail until the server is back online. We also maintain backup mail relays on separate, distinct networks in the unlikely event that our primary network goes offline. We have servers located at our web hosting facilities in Fort Myers, FL.
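The queue-until-back-online behavior can be sketched as a deferred queue that retries delivery with exponential backoff. The retry intervals, the doubling policy, and the names `DeferredQueue` and `Deferred` are assumptions for illustration; the document does not say how often VirusGator retries or how long it keeps queued mail.

```python
import heapq
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass(order=True)
class Deferred:
    next_attempt: float                              # the only field used for ordering
    message_id: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class DeferredQueue:
    """Holds mail while the customer's server is unreachable and retries later."""

    def __init__(self, base_delay: float = 60.0, max_delay: float = 3600.0) -> None:
        self.base_delay = base_delay  # seconds before the first retry (assumed)
        self.max_delay = max_delay    # cap on the backoff interval (assumed)
        self._heap: list[Deferred] = []

    def defer(self, message_id: str, now: float, attempts: int = 0) -> None:
        """Schedule a retry, doubling the wait after each failed attempt."""
        delay = min(self.base_delay * 2 ** attempts, self.max_delay)
        heapq.heappush(self._heap, Deferred(now + delay, message_id, attempts))

    def retry_due(self, now: float, deliver: Callable[[str], bool]) -> list[str]:
        """Attempt every message whose retry time has arrived; requeue failures."""
        delivered = []
        while self._heap and self._heap[0].next_attempt <= now:
            item = heapq.heappop(self._heap)
            if deliver(item.message_id):
                delivered.append(item.message_id)
            else:
                self.defer(item.message_id, now, item.attempts + 1)
        return delivered
```

Capping the interval keeps retries reasonably frequent during a long outage, so queued mail reaches the customer's server soon after it comes back online.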
If all this looks complicated, don't despair: we're here for you. For most groups, our default settings provide sufficient protection. Just be aware that these extra settings are available if you need them, and our staff are here to help with any questions you may have about our spam filtering services. The VirusGator system has filtered millions of messages and is a mature, reliable system. We'll help you with all your initial settings and questions, and we guarantee you'll be happy with our service. With a FREE 30-day trial, who could ask for more? Give us a try; you'll love our service.