Dave talks about using .htaccess to block referrer spam.

Posted by Les on Monday, July 26, 2004 at 10:34 AM.

Dave over at Dave’s Chalkboard has a whole category on referrer spamming and how to put an end to it using .htaccess. Dave uses ExpressionEngine, much as we do here, and EE has a built-in referrer spam blacklist so I was curious why he’d go through the trouble of using .htaccess instead. Turns out there’s a good reason: Bandwidth.

EE’s blacklist works pretty well at keeping spam sites out of the referrer list that EE maintains itself, but it doesn’t keep them out of any other tracking services you might use, nor does it stop the bandwidth from being used when you’re hit constantly by these assholes. Dave estimates that the referrer spam attempts were eating up around 3.6GB to 7GB of bandwidth a month, which isn’t a big deal when you have 50GB of bandwidth to play with, but for a site like SEB, which has a mere 19GB a month, the bandwidth lost to referrer spamming can quickly add up. So I’m thinking of following Dave’s lead and seeing if I can’t figure out how to massage my .htaccess file a bit and cut back on some of the wasted bandwidth. It’ll probably be a little trickier for me considering that I have two domains to protect, but I’ll let you know how it goes.

Update: After studying Dave’s .htaccess file and reading up on regular expressions I’ve gone ahead and implemented my own attempt at referrer spam blocking. Dave had a pretty good-sized file full of URLs, but I wanted to make it as simple as possible. So rather than using full URLs I’ve put together a file that makes use of pattern matching to filter out most of the crap that comes along. There are a few sites that managed to keep their domain names free of the popular keywords so I had to add in lines just for them, but overall I think this is going to catch a lot of it with a minimal amount of work.

Testing things out with the helpful wannaBrowser appears to confirm that I’m in good shape. Still, there’s a chance my patterns are overly broad so if you find you’re getting a Forbidden error when trying to follow a link from someplace to SEB, please be sure to let me know about it.
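
For anyone who hasn’t seen this kind of rule before, here is a minimal sketch of the pattern-matching approach described in the update. It is not the actual file in use on SEB. It assumes Apache with mod_rewrite enabled and .htaccess overrides allowed, and the keywords and the domain name are made-up placeholders.

```apache
# Sketch only: keywords and domain below are hypothetical placeholders.
RewriteEngine On

# Match any referrer whose URL contains one of a few spammy keywords.
# NC = case-insensitive, OR = combine with the next condition.
RewriteCond %{HTTP_REFERER} (poker|casino|pills) [NC,OR]

# A separate line for a spammer whose domain avoids the common keywords.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?spammer-example\.invalid [NC]

# If either condition matched, answer with 403 Forbidden.
RewriteRule ^ - [F]
```

Because the first condition matches a keyword anywhere in the referring URL, a legitimate page that happens to have one of those words in its address would be turned away as well, which is exactly the overly-broad-pattern risk mentioned above.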

Comments:

Dave M. United States Posted on 07/26/2004 at 11:55 AM

If you want to see my .htaccess file, just let me know. I’ll be happy to send it to you.

 Signature 

Dave Metzener
Dave’s Chalkboard

Les United States Posted on 07/26/2004 at 12:17 PM

Sounds good to me. Drop me an email with it and thanks big time!

 Signature 

All I know is the wine lasts longer when you don’t gotta share it with someone
All I know is my steak tastes better when I take my steak tastes better pill
-- I Feel Fantastic, Jonathan Coulton

Sue United States Posted on 07/26/2004 at 12:27 PM

If your control panel supports it, try the IP Deny function. It lets you enter the IP addresses to deny access to. You can even use partial IP addresses, and it will look up IPs based on URL.

Les United States Posted on 07/26/2004 at 01:15 PM

The only problem with denying IP addresses is that most spammers use dynamic IPs these days to get around that very tactic. With the .htaccess method I can define partial URL names to block using pattern matching and that is much more effective as it’ll catch it regardless of IP address.

Dave M. United States Posted on 07/26/2004 at 01:17 PM

I did try IP banning. That was working, but it was also banning normal users too. I wasn’t just banning a single IP since spammers use multiple IPs or dynamic IPs. I would ban an entire ISP. After getting an e-mail from someone who was banned, I decided the referral banning was the way to go.

Tank863 United States Posted on 02/07/2005 at 10:49 PM

Dave,

Could you send your .htaccess file to me? I am going through this bandwidth mess now and would like some help.

Tank863

The Linguist Korea (South) Posted on 02/12/2005 at 07:44 AM

I too would like a copy of your .htaccess file if possible.

Thanks a bunch.

Stacy United States Posted on 07/22/2005 at 03:29 PM

I really could use this file too please!!!

I have never used all my bandwidth and I got slammed yesterday.

Dave M. United States Posted on 07/22/2005 at 03:36 PM

I gave up on using .htaccess a long time ago. It took way too much time to keep adding entries to the list.

A tool that comes pretty close to working great is Referrer Karma (http://unknowngenius.com/blog/wordpress/ref-karma/).

It does a pretty good job without having to enter data into the .htaccess file.

I have all but given up on weblogging due to scumbags that comment spam and refer spam. There was a time when weblogging used to be fun, now it just seems like a lot of work. I already have a job and don’t really feel like “working” on my blog too.

Good luck with your referral spam problem…

Ben United States Posted on 09/29/2005 at 01:13 PM

I would love to take a gander at that .htaccess file. Thanks!

Dave M. United States Posted on 09/29/2005 at 01:20 PM

Heh, I stopped using that method long ago. I found I was adding many entries to the file every day. I finally just gave up on the whole idea.

There are probably many people that still do this. One person uses .htaccess exclusively to stop comment and referral spam.

John Dvorak talks about comment spam and a simple .htaccess modification that will prevent 99% of it. Heard that on the latest TWiT podcast.

Ed Carvalho United States Posted on 01/14/2006 at 03:55 PM

Dave - Is there a way to block users with .htaccess who are in turn blocking their IP addresses from appearing in my server log (currently these are logged as “no entry”)? If you can also send me a copy of your .htaccess file, I would really appreciate it.

Ed

Dave M. United States Posted on 01/14/2006 at 05:00 PM

I have long since given up on the idea of using .htaccess. It just took way too much effort to deal with it.

Mostly the problem with blocking an IP address is that if the user is using a dialup connection or a proxy system like AOL, the IP address will change pretty often. Blocking a range of IPs will block people that are not doing bad things too.

I finally just gave up on hosting a site myself and am using WordPress.com as my blog provider now.

elwedriddsche United States Posted on 01/14/2006 at 08:46 PM

I don’t know enough about EE to answer Ed’s question. Unless the webserver is configured very strangely, the raw server logs must contain IP number or resolved names of whoever queries the server. EE’s internal logs are a different matter, but if both are available for correlation, a solution may present itself.

As to the effectiveness of spam countermeasures like mod_access and mod_rewrite, I run an EE site for somebody and literally a dozen lines of rewrite code make the server survive between 10k and 20k referrer spams a day - more than enough to get the site kicked off a shared server. The long and short of it is that you have to resign yourself to an arms race and carefully craft and adapt your countermeasures, all of which can be automated to a large degree.

 Signature 

Science is answers that must always be questioned.
Philosophy is questions that may never be answered.
Religion is answers that must never be questioned.
Politics is answers that lobbyists pay for.
