Webserver status update.

Posted by Les on Wednesday, March 01, 2006 at 03:22 PM. Read 846 times.

It’s clear that our current hosting solution is like trying to fit 10 pounds of shit into a 5 pound sack, as my mother is often wont to say, but we’re managing to limp along. Some days are better than others, but the performance is clearly not what I’d like it to be, mainly because we’re running in an environment that falls short of the minimum recommended for the package we’re using.

In blocking some of the more aggressive search engine crawlers out there, we’ve also managed to block Bloglines from scanning our RSS feeds, so folks who were keeping up with us via that conduit are missing out. It’s not yet clear which block we need to remove to restore that functionality.

Occasionally when trying to view the site you may get an error message indicating that the script has used up all the available memory. When that happens, simply hitting reload should be enough to get the site to render. We’re just bumping against our limits, which causes the script to crap out on occasion.

In addition, I need to have our new hosting provider set up a reverse DNS map so that email notifications can be sent to AOL and Roadrunner, as those are currently failing.

I’ve held off on putting the ticket in because I’m kicking around whether to tough it out here by stepping up to the next available package or to restart the search for a better hosting solution. I’m also debating whether to stick with ExpressionEngine or see if I can make use of one of the other packages out there that aren’t as resource intensive. Of course, not being overly familiar with the minimum requirements of the other packages doesn’t make that choice any easier. I could end up switching to something else only to find it’s just as bad under the load SEB generates as EE is.

Anyway, just wanted to let you guys know that while we may be limping along at the moment, we’re not ignoring the situation in the least. Options are still being considered and alternatives are being explored. Bear with us, this could take a while.

Comments:

elwedriddsche United States Posted on 03/01/2006 at 04:17 PM

I was messing around behind the scenes a bit, but I haven’t had time to do something decisive. From what I can tell, today we’ve seen another instance of overabundant search engines.

My personal headache is that even instructing the webserver to tell them to bugger off doesn’t do squat against the sheer number of requests. Something I wanted to set up and held off on is a throttling module. I suppose it’s time to go and do it.
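As a rough illustration of what a throttling module does, here is a minimal token-bucket sketch in Python. It is not the Apache module referred to above; the class name, the `rate` and `burst` parameters, and the idea of keying on a client string are all assumptions made for the example.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Per-client token bucket: each request costs one token."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate      # tokens refilled per second (assumed value)
        self.burst = burst    # maximum number of tokens a client can hold
        self.clock = clock    # injectable clock, handy for testing
        self.tokens = defaultdict(lambda: float(burst))
        self.last_seen = {}

    def allow(self, client):
        """Return True if this client's request may proceed right now."""
        now = self.clock()
        elapsed = now - self.last_seen.get(client, now)
        self.last_seen[client] = now
        # Refill in proportion to the time since the client's last request.
        self.tokens[client] = min(self.burst, self.tokens[client] + elapsed * self.rate)
        if self.tokens[client] >= 1:
            self.tokens[client] -= 1
            return True
        return False
```

A crawler that fires requests faster than `rate` per second drains its bucket and gets refused cheaply, while a normal reader never notices the limit.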

At the risk of being repetitive, I see but two long-term solutions: reduce SEB’s memory and CPU footprint, which almost certainly involves a painful migration to another script, and/or shunt search engines into a search-engine optimized clone of SEB. If somebody has any ideas, don’t be shy.

 Signature 

Science is answers that must always be questioned.
Philosophy is questions that may never be answered.
Religion is answers that must never be questioned.
Politics is answers that lobbyists pay for.

Richy C. Great Britain (UK) Posted on 03/01/2006 at 04:47 PM

You could try changing Apache to run on, say, port 81 and then set up Squid as a proxy server on port 80 to retrieve the pages from Apache. That way you should drastically reduce the CPU and memory usage: instead of Apache, PHP and MySQL having to regenerate each page, Squid can just pull its cached copy straight off the hard drive.

Other optimisation techniques include having semi-dynamic content (such as the “SE Comments” on the left) generated every 10 minutes to a plain HTML file and including that file on the page (again instead of generating it each time), tweaking the MySQL configuration, stripping PHP “to the bare bones” (do you really need all the modules that are compiled in?), tweaking Apache’s MaxClients settings, disabling KeepAlive in Apache and a few other speed tweaks.
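The "regenerate every 10 minutes" idea can be sketched in a few lines. This is only an illustration in Python (the site itself runs PHP), and `cached_fragment`, `render` and the file path are hypothetical names, not anything EE provides.

```python
import os
import time

MAX_AGE_SECONDS = 10 * 60  # the 10-minute interval suggested above


def cached_fragment(path, render, max_age=MAX_AGE_SECONDS, now=time.time):
    """Return the HTML fragment at `path`, re-rendering it only when stale."""
    try:
        fresh = now() - os.path.getmtime(path) < max_age
    except OSError:  # the file has not been generated yet
        fresh = False
    if not fresh:
        html = render()  # the expensive part: queries, templating, ...
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(html)
        os.replace(tmp, path)  # atomic swap so readers never see a partial file
        return html
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

Within any 10-minute window only the first request pays for `render()`; every other request is a plain file read.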

elwedriddsche United States Posted on 03/01/2006 at 06:21 PM

Thanks for the suggestions, Richy.

You could try changing Apache to run on, say, port 81 and then set up Squid as a proxy server on port 80 to retrieve the pages from Apache. That way you should drastically reduce the CPU and memory usage: instead of Apache, PHP and MySQL having to regenerate each page, Squid can just pull its cached copy straight off the hard drive.

I’m receptive to that idea.

EE does its own caching, but I have no idea how effective it is. On top of that, I can see that it’s vastly more expensive to use EE instead of squid, which is optimized for that job.

Depending on the kind of access control squid is capable of, it may also be possible to punt on unwanted spiders well before they hit apache.

Other optimisation techniques include having semi-dynamic content (such as the “SE Comments” on the left) generated every 10 minutes to a plain HTML file and including that file on the page (again instead of generating it each time), tweaking the MySQL configuration, stripping PHP “to the bare bones” (do you really need all the modules that are compiled in?), tweaking Apache’s MaxClients settings, disabling KeepAlive in Apache and a few other speed tweaks.

Quite frankly, I do not understand why EE is such a memory hog. You are absolutely right, though, that there are a number of page elements that do not need to be dynamically generated. If a static copy is updated once in a while, a number of elements that Les yanked could be brought back.

Tweaking MySQL is admittedly something I don’t have a lot of experience with. Having said that, it’s apache that chews RAM and CPU like there’s no tomorrow. Tweaking MySQL simply isn’t worth the bother right now.

I haven’t looked at phpinfo() in a while. It’s a standard-issue build, which means that there are a few modules that could be culled. Having said that, to get EE to run I can’t drop php’s memory_limit below 24M. For a VPS with 256M guaranteed, the math is straightforward - even losing a meg or so yields but limited returns.
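The arithmetic behind that remark, as a quick Python sketch. It treats memory_limit as a rough per-worker budget, which is a simplification (the limit caps a script's allocations and is not a measured process size), and the 64M reserve in the last line is a made-up figure for illustration.

```python
def max_php_workers(total_mb, per_worker_mb, reserved_mb=0):
    """Upper bound on concurrent PHP-enabled Apache workers that fit in RAM."""
    return (total_mb - reserved_mb) // per_worker_mb


print(max_php_workers(256, 24))      # 10
print(max_php_workers(256, 23))      # 11
print(max_php_workers(256, 24, 64))  # 8
```

Shaving a meg per worker buys at most one extra worker, and anything set aside for MySQL and the OS takes workers away much faster than that.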

Apache’s maxclients and friends is something I’ve toyed with extensively. Given the amount of memory PHP needs, there’s a delicate balance between the site’s responsiveness and resource usage.

It’s worth having a closer look at keepalives, though. I need a better understanding of what apache spends its time on…

ShawnC United States Posted on 03/02/2006 at 10:44 AM

Richy said:

You could try changing Apache to run on say port 81

Just a technicality, but I beg to differ.

IANA reserves ports 0-1023 as “well-known” ports that cannot be reassigned. Port 81 is reserved for “HOSTS2 Name Server.” Here is the list of well-known ports.

I’m not too familiar with Squid, but for the same reason I listed above, you probably cannot set up Squid on port 80.

HTTP Proxy servers are commonly set up on 8080. 

Probably the most difficult thing about the setup you suggested is actually getting your traffic to route through the alternate port.

Elwed said:

to tell them to bugger off doesn’t do squat against the sheer number of requests.

Not much you can do about that, except to make sure that those requests take as little time to execute as possible.  As I suggested before, and you have re-iterated, the dynamic content is ultimately what is slowing you down and causing your resource usage to explode:

SEB said:

Page rendered in 8.5140 with 66 SQL queries.

Sorry to rain on your parade… I really don’t have any ideas to suggest other than rendering more content statically.

Shawn.

elwedriddsche United States Posted on 03/02/2006 at 12:06 PM

Just a technicality, but I beg to differ.

IANA reserves ports 0-1023 as “well-known” ports that cannot be reassigned. Port 81 is reserved for “HOSTS2 Name Server.” Here is the list of well-known ports.

I’m not too familiar with Squid, but for the same reason I listed above, you probably cannot set up Squid on port 80.

HTTP Proxy servers are commonly set up on 8080.

Probably the most difficult thing about the setup you suggested is actually getting your traffic to route through the alternate port.

IANA is a fiduciary (thanks for the education, Consi) that holds in trust a variety of assignments required for interoperability. Assignments by IANA do not carry the force of law, though. Everybody is free to run whatever they want on whatever port they choose, provided they don’t cause problems elsewhere and they don’t ask others to cater to non-standard usage.

Richy doesn’t propose to run squid as a proxy server, but as an http accelerator, a subtle difference. Plugging squid into port 80 is the right thing to do in this case and if the real server is visible at port 81, so what?

The setup he suggests is very straightforward. It’s a few lines of configuration change for apache and squid each; it only gets tedious if you want to massage the URLs in SEB’s database and the skin/theme/template to route static content to either squid or apache or even elsewhere.

I have toyed with the idea of making more content static and routing any static content to a lightweight webserver like thttpd or mathopd. They seem very fast, use next to no memory, and run in a single process.

Not much you can do about that, except to make sure that those requests take as little time to execute as possible.  As I suggested before, and you have re-iterated, the dynamic content is ultimately what is slowing you down and causing your resource usage to explode:

Indeed. Deadscot’s site, which also runs EE, tends to drive mysql nuts if spammers hawk a new product. For SEB, the buck stops at apache.

By the way, if it weren’t for firewalling the worst offenders, you might not be able to read this site today. The reason I like the idea of using something like squid as an accelerator is that anything that can drop unwanted traffic before it hits apache is very good news. Squid uses a lot of memory, though, and I have to experiment in a lab first before messing with a “production” site.

I really don’t have any ideas to suggest other than rendering more content statically.

The execution time is bad. The number of SQL queries you have to take with a grain of salt, because their impact depends on the nature of the queries. Drupal has a module that traces SQL queries and displays them at the page bottom. I’m not aware of a similar feature for EE.
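To illustrate why the raw count can mislead, here is a minimal sketch of per-query tracing, in the spirit of the Drupal module mentioned above. It uses Python's built-in sqlite3 purely as a stand-in (EE runs on MySQL), and `traced_execute` and `slowest` are hypothetical helpers, not an EE or Drupal API.

```python
import sqlite3
import time


def traced_execute(conn, sql, params=(), trace=None):
    """Run a query, fetch its rows, and append (sql, seconds) to `trace`."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    if trace is not None:
        trace.append((sql, time.perf_counter() - start))
    return rows


def slowest(trace, n=5):
    """Return the n most expensive traced queries, slowest first."""
    return sorted(trace, key=lambda item: item[1], reverse=True)[:n]
```

Summing the second field of the trace shows how much of a page's render time is spent in SQL at all; 66 cheap indexed lookups can cost less than one bad join.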

On the other hand, the number of queries either means that EE’s caching isn’t very effective or that the sidebar does too much. I believe the queries are closely correlated to the number of comments (i.e. pulling up the list of recent commenters for each article). Among other things, EE caches SQL queries and I have no clue how to evaluate this further.

Anyway, keep the suggestions coming.

ben United States Posted on 03/02/2006 at 08:45 PM

Are you still hosted at Blogomania?

Les United States Posted on 03/02/2006 at 08:48 PM

Not at this time, no. We made the jump to a Virtual Private Server solution elsewhere. It’s just not quite as robust as we really need.

 Signature 

Gods dont kill people. People with Gods kill people. - David Viaene

elwedriddsche United States Posted on 03/02/2006 at 10:46 PM

Pound. I keep forgetting about that one.

elwedriddsche United States Posted on 03/03/2006 at 11:48 PM

It’s time to get more serious about performance testing. The primary objective is to harden against traffic spikes, a secondary objective is to improve performance under normal load.

Since I can’t recreate the exact environment, a few simplifying assumptions are necessary. Specifically, using an elderly PC with RAM limited to the guaranteed allotment to the VPS and no swap should be close enough - just as long as performance differences scale proportionally to the live VPS.

To properly regress and benchmark sample configurations, it looks like feeding siege URL lists culled from the server logs will do nicely.
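Culling a URL list from the logs could look roughly like this. The sketch assumes Apache's Common or Combined Log Format and keeps only GET requests that returned 200; the base URL of the test box is a placeholder.

```python
import re

# Matches the request and status part of a Common/Combined Log Format line,
# e.g.  "GET /index.php HTTP/1.1" 200
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')


def urls_from_log(lines, base_url):
    """Yield replayable URLs for successful GET requests, in log order."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            yield base_url.rstrip("/") + match.group(1)
```

Writing the result one URL per line gives a file that siege can read with its `-f` option. Keeping the log order and the duplicates preserves the real traffic mix, which matters more here than covering every page once.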

Now the trick will be finding the time to do all of that…

manofsteel United States Posted on 03/05/2006 at 01:43 PM

Les, have you talked to Nevin at Pmachine hosting?  I bet he would have some great advice for you.  Maybe he would even give you a screaming deal for hosting.

Les United States Posted on 03/05/2006 at 08:17 PM

Not as of yet, but only because I’ve looked at the packages they offer and my needs are way beyond that. I have talked about what requirements are needed with Paul, though, and that’s how I learned I had undershot a bit.
