scientific and technical website design projects news

An anatomy of spam

Spam is one of the really irritating things about modern life, but like its namesake, there is not usually a definable anatomy, never mind an easily identified origin! In this study, however, we get a sneak peak at what drives a spam…

For a number of years we have employed email forms on our websites to combat the worst excesses of spam. This simple system hides our email addresses from being harvested, and allows us both to follow up new enquiries, and to re-direct email easily dependent on who is available to answer it. What it prevents is our addresses being parcelled up and flogged mercilessly by spam robots!

Recently, however, we started getting very similar emails arriving in our in-box, all with random subject lines, and containing 20-30 links to pages within a domain, with a small added random component to each address*. A quick look round the web showed us that the web addresses often appeared in web-blog comments posts. Comments on blogs are intended to permit interested readers to feedback into a story, but have long been used as a means for the unscrupulous to advertise a load of web addresses. These addresses will usually be providing services that cannot be advertised more conventionally because they are illegal, immoral, or offer downloads that might damage or enslave your computer. This is spam; it exists is because there are some things that you simply cannot mount an ad campaign for, after all, it really isn’t that difficult or expensive to get a site noticed, so anyone using this technique for a legitimate product would be just too dumb to breath!

A quick bit of modification to our email form allowed us to pick up the IP address of the robot(s) sending the junk, and this was identified as, which appears to be a Swedish IP, advertising a site in the Netherlands (I don’t see any reason to advertise them here! Thanks to ip-lookup for tracing the IP information). Clearly one of the more stupid spam machines had picked up on our contact form, and every two hours was posting us a new set of 20-30 links, without realising that there was no blog behind the form for their advertising to have any effect on…

Hey-ho, at present this is only a minor annoyance, and has even provided some interest in that the activity is regular, and possesses a degree of anatomy that is subject to some analysis. As ever, however, we have to be alert to how efficiently we might block this kind of activity if it were to get more significant.

An obvious contender would be to block the IP address, which has remained constant for the last few days, this can be coupled to an apology and opportunity to contact through snail mail or phone if the IP address later passes on to a genuine person. Alternatively, the industry standard approach has been to deploy Captcha, which asks anyone submitting the form to type in a set of letters and numbers from an image that is doctored to make it difficult for a machine to read it.

Both systems have their advantages and disadvantages: Unfortunately a lot of early Captcha systems have now been broken; on the other hand, IP addresses are pretty easy to spoof or change…

As a final comment, on the mutability of IP addresses, it is quite interesting to probe an access log and watch a more sophisticated hacker ‘at work’. This is characterised by a number of related probes coming in, but with the IP address jumping over a broad range of values. This kind of thing can force you to get quite brutal with IP blocking – but more on that some other time!

*The random components to subject lines and addresses make it more difficult for spam assassin programs to pick out these bad emails.