Dave talks about using .htaccess to block referrer spam.

Dave over at Dave’s Chalkboard has a whole category on referrer spamming and how to put an end to it using .htaccess. Dave uses ExpressionEngine, as we do here, and EE has a built-in referrer spam blacklist, so I was curious why he’d go to the trouble of using .htaccess instead. Turns out there’s a good reason: bandwidth.

EE’s blacklist works pretty well at keeping spam sites out of the referrer list that EE maintains itself, but it doesn’t stop them from showing up in any other tracking services you might use, nor does it stop the bandwidth from being used up when you’re hit constantly by these assholes. Dave estimates that the referrer spam attempts were eating up somewhere between 3.6GB and 7GB of bandwidth a month. That isn’t a big deal when you have 50GB of bandwidth to play with, but for a site like SEB, which has a mere 19GB a month, the bandwidth lost to referrer spamming can quickly add up. So I’m thinking of following Dave’s lead and seeing if I can’t figure out how to massage my .htaccess file a bit to cut back on some of the wasted bandwidth. It’ll probably be a little trickier for me considering that I have two domains to protect, but I’ll let you know how it goes.
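To put Dave’s estimate in perspective, here is a quick back-of-the-envelope calculation in Python. It is only a sketch that assumes the figures quoted above (3.6GB to 7GB of spam traffic a month, against a 50GB or a 19GB monthly allowance) and works out what share of each allowance that traffic would eat.

```python
# Dave's estimate of monthly bandwidth lost to referrer spam, in GB.
SPAM_LOW_GB = 3.6
SPAM_HIGH_GB = 7.0


def share(spam_gb: float, quota_gb: float) -> float:
    """Percentage of a monthly bandwidth quota consumed by spam traffic."""
    return 100 * spam_gb / quota_gb


# 50GB: the roomier allowance; 19GB: SEB's monthly allowance.
for quota_gb in (50, 19):
    low = share(SPAM_LOW_GB, quota_gb)
    high = share(SPAM_HIGH_GB, quota_gb)
    print(f"{quota_gb}GB plan: {low:.1f}% to {high:.1f}% lost to referrer spam")
```

On a 50GB plan that works out to roughly 7.2% to 14.0% of the month’s allowance, but on a 19GB plan it is roughly 18.9% to 36.8%, which is why the same spam volume matters so much more for a smaller site.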

Update: After studying Dave’s .htaccess file and reading up on regular expressions, I’ve gone ahead and implemented my own attempt at referrer spam blocking. Dave had a pretty good-sized file full of URLs, but I wanted to keep mine as simple as possible. So rather than listing full URLs, I’ve put together one that uses pattern matching to filter out most of the crap that comes along. There are a few sites that managed to keep their domain names free of the popular keywords, so I had to add lines just for them, but overall I think this is going to catch a lot of it with a minimal amount of work.
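Here is a rough sketch of that keyword-plus-exceptions approach, written in Python rather than .htaccess syntax so it can be run and tested on its own. The keyword list and the explicit domain list are hypothetical examples, not the actual entries in my file. A referrer that contains any keyword, or that belongs to one of the listed domains, gets refused; in .htaccess the same idea would be expressed with mod_rewrite conditions on the referrer, matched case-insensitively.

```python
import re

# HYPOTHETICAL keywords: spam referrers often carry words like these in their
# domain names. Illustrative only, not the actual list used on this site.
SPAM_KEYWORDS = ["casino", "poker", "viagra", "pharmacy", "loan"]

# HYPOTHETICAL domains that avoid the keywords and so need their own entries.
SPAM_DOMAINS = ["example-spammer.com", "another-spammer.net"]

# One case-insensitive alternation over the keywords, similar in spirit to a
# mod_rewrite RewriteCond on %{HTTP_REFERER} with the [NC] flag.
KEYWORD_RE = re.compile("|".join(map(re.escape, SPAM_KEYWORDS)), re.IGNORECASE)

# Matches a listed domain (or any subdomain of it) as the referrer's host.
DOMAIN_RE = re.compile(
    r"^https?://([^/]*\.)?(" + "|".join(map(re.escape, SPAM_DOMAINS)) + r")([:/]|$)",
    re.IGNORECASE,
)


def is_spam_referrer(referrer: str) -> bool:
    """Return True if a request with this Referer header should be refused (403)."""
    if not referrer:
        # No referrer (typed URL, bookmark, privacy settings): let it through.
        return False
    return bool(KEYWORD_RE.search(referrer) or DOMAIN_RE.match(referrer))


for ref in (
    "http://cheap-poker-online.example/",
    "http://www.example-spammer.com/index.html",
    "https://www.google.com/search?q=seb",
    "",
):
    print(repr(ref), "->", "Forbidden" if is_spam_referrer(ref) else "allowed")
```

The trade-off is the one that comes with any broad pattern: a short keyword such as "loan" also matches a legitimate referrer like a hypothetical http://www.sloan.example/, so a real visitor following that link would get a Forbidden error.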

Testing things out with the helpful wannaBrowser appears to confirm that I’m in good shape. Still, there’s a chance my patterns are overly broad so if you find you’re getting a Forbidden error when trying to follow a link from someplace to SEB, please be sure to let me know about it.

14 thoughts on “Dave talks about using .htaccess to block referrer spam.”

  1. If your control panel supports it, try the IP Deny function. It lets you enter the IP addresses to deny access to. You can even use partial IP addresses, and it will look up IPs based on a URL.

  2. The only problem with denying IP addresses is that most spammers use dynamic IPs these days to get around that very tactic. With the .htaccess method I can define partial URL names to block using pattern matching and that is much more effective as it’ll catch it regardless of IP address.

  3. I did try IP banning. That was working, but it was also banning normal users. I wasn’t just banning a single IP, since spammers use multiple or dynamic IPs; I would ban an entire ISP. After getting an e-mail from someone who was banned, I decided referrer banning was the way to go.

  4. I really could use this file too please!!!

    I have never used all my bandwidth and I got slammed yesterday.

  5. I gave up on using .htaccess a long time ago. It took way too much time to keep adding entries to the list.

    A tool that comes pretty close to working great is Referrer Karma (http://unknowngenius.com/blog/wordpress/ref-karma/).

    It does a pretty good job without having to enter data into the .htaccess file.

    I have all but given up on weblogging due to scumbags that comment spam and refer spam. There was a time when weblogging used to be fun, now it just seems like a lot of work. I already have a job and don’t really feel like “working” on my blog too.

    Good luck with your referral spam problem…

  6. Heh, I stopped using that method long ago. I found I was adding many entries to the file every day. I finally just gave up on the whole idea.

    There are probably many people who still do this. One person uses .htaccess exclusively to stop comment and referral spam.

    John Dvorak talks about comment spam and a simple .htaccess modification that will prevent 99% of it. Heard that on the latest TWiT podcast.

  7. Dave, is there a way to use .htaccess to block users who hide their IP addresses from my server log (these are currently logged as “no entry”)? If you can also send me a copy of your .htaccess file, I would really appreciate it.

    Ed

  8. I have long since given up on the idea of using .htaccess. It just took way too much effort to deal with it.

    Mostly the problem with blocking an IP address is that if the user is on a dialup connection or a proxy system like AOL, the IP address will change pretty often. Blocking a range of IPs will block people who are not doing bad things too.

    I finally just gave up on hosting a site myself and am using WordPress.com as my blog provider now.

  9. I don’t know enough about EE to answer Ed’s question. Unless the web server is configured very strangely, the raw server logs must contain the IP numbers or resolved names of whoever queries the server. EE’s internal logs are a different matter, but if both are available for correlation, a solution may present itself.

    As to the effectiveness of spam countermeasures like mod_access and mod_rewrite: I run an EE site for somebody, and literally a dozen lines of rewrite code let the server survive 10,000 to 20,000 referrer spams a day, more than enough to get the site kicked off a shared server. The long and short of it is that you have to resign yourself to an arms race and carefully craft and adapt your countermeasures, all of which can be automated to a large degree.
