301 redirects should be handled in the application, not .htaccess

I see many tips and questions about how to redirect a URL with .htaccess rules. On the face of it, it makes total sense: why should you waste time bootstrapping your website code (which might include DB access and what not) just to send a redirect response, when the webserver can do it much faster for you?

There are several reasons not to do it in .htaccess (a sketch of the application-side alternative follows the list):

  1. Unless you are redirecting most of the site, the rate of hits on a 301 should be low, but the lines containing those rules in the .htaccess file still need to be read and parsed for every URL of the site, even those that serve javascript and CSS, if you use the naive approach to writing the rules. In contrast, your application can check whether a redirect is needed only after checking all other possibilities. Each check is slower, but the accumulated CPU time spent on it will be lower. This of course depends on your rules, on how fast your application determines that there is no match for a URL, and on how likely a URL is to require a redirect.
  2. Statistics gathering. If you do it in .htaccess, the only statistical tool you can employ to analyze the redirects is the log files, and they rotate and are a pain to parse and collect into some better storage system. In the application you can simply write the data to a DB, or send an event to Google Analytics.
  3. Your site should be managed from one console, and redirects are more an application-level configuration than a webserver configuration. It can be very annoying to write a new post and give it a nice URL, just to discover that for some reason it is always redirected to some other place, without understanding why, because your administration software does not know about the .htaccess rule and you probably forgot that it is there (or maybe someone else put it there).
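
A minimal sketch of the application-side approach in WordPress might look like the following. The redirect map and the function name are hypothetical, and the check runs only on 404s, so regular pages, javascript and CSS are never affected:

add_action('template_redirect', 'my_301_redirects'); // hypothetical function name

function my_301_redirects() {
    if (!is_404()) { // WordPress found real content for this URL, nothing to redirect
        return;
    }
    $redirects = array( // hypothetical map of old paths to new URLs
        '/old-post/' => home_url('/new-post/'),
    );
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    if (isset($redirects[$path])) {
        // this is also the natural place to log the hit to a DB table or send an analytics event
        wp_redirect($redirects[$path], 301);
        exit;
    }
}

Because nothing happens unless WordPress already failed to find content, the per-request cost on normal URLs is essentially zero, which is exactly the tradeoff described in point 1.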

Will using HipHop VM (HHVM) make your wordpress site faster? Unlikely

It has been a while since I last heard about facebook’s HipHop PHP optimizer project. The first time I heard of it, it was a compiler from PHP to C++, something I had already run into with another interpreted language – TCL/TK – and that approach is mainly beneficial for projects where, once the interpreted code (i.e. the PHP code) is stable and shipped, there is no need to modify it. In other words, you lose the ability to modify your code on a whim, which is the very reason most sites today use interpreted languages.

I was actually surprised to learn that the main reason facebook was unhappy with the compiler was that deploying the compiled code was resource intensive, and since facebook pushes a new update about once a day, they started to look into alternatives to compiling their code into machine code.

The approach they are trying now is to write their own PHP interpreter (and a web server dedicated to running it) which uses JIT (Just In Time) technology to compile PHP code into native code and execute it. As JIT proved to be a very efficient technology when applied to optimizing javascript, which like PHP is an interpreted language, I find it easy to believe that it executes PHP code faster than the conventional interpreter.

But if it is faster, how come it will not make your site faster? To understand this you need to keep in mind facebook’s scale and how facebook works.

Facebook had at some point 180k servers. A 1% optimization will allow them to save 1800 servers and the cost of their electricity and maintenance. My estimate, based on pricing by web hosting companies, is that this might amount to saving 100k$ each month. So facebook is more likely doing this optimization to reduce cost, not to improve site speed. For lesser sites a 1% optimization will not be enough to avoid the need to upgrade your hosting plan, and even if there were a cost benefit, it is unlikely that for most sites the savings would be worth the time that needs to be invested in switching to HHVM and testing your site on it, especially since it is not a fully mature product yet (just because it works for facebook doesn’t mean it works everywhere).

The other thing to take into account is that, by its nature, facebook can do only very limited caching, as essentially all of its visitors are logged-in users. They can still keep information in memory in a similar way to how object caching in wordpress works, but they still need PHP logic to bring it all together. Wordpress sites, on the other hand, can use full page caching plugins like W3TC, which produce HTML pages whose serving bypasses entirely the need to interpret PHP code, and therefore improvements in PHP interpretation matter very little to those sites.

It is not that HHVM is totally useless outside of facebook, just that its impact will be much bigger on bigger and more complex sites than most wordpress sites tend to be. The nice thing about it is that it is open source, and therefore the PHP core developers can adopt the JIT techniques from HHVM into the core PHP interpreter.

Should you optimize your wordpress MySQL tables? (probably not)

While it looks like a no-brainer (you only need to press one button to optimize, so why not?), the consensus among MySQL experts tends to dismiss the usefulness of table optimization as a way to improve your wordpress performance.

The real question here is not whether optimizing is good or bad, but whether you should dedicate time in advance to perform it. Since the site should be offline while the table optimization is done, are the benefits high enough to justify it?

What the optimization does is defrag the files used for the table and rebuild the index. Defragging might save some space on your hard disk, but will not impact your site’s performance. The index rebuild can potentially improve performance, but in practice it rarely does, especially for the small sites which are probably 99.9% of the standalone wordpress sites in the world.

For people managing wordpress networks it might be more complicated, as the defrag benefits might accumulate to something substantial, but I have a feeling that whatever the benefit, the time and effort needed to communicate to your users that their sites will be down will outweigh it.

Maybe this is something that you should do only when you are already performing site maintenance for another reason, like a version upgrade.
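
For completeness, if you do run it as part of such a maintenance window, a minimal sketch using $wpdb (assuming the core tables and that the site is already offline for maintenance) could be:

global $wpdb;
$tables = array($wpdb->posts, $wpdb->postmeta, $wpdb->comments, $wpdb->options);
foreach ($tables as $table) {
    $wpdb->query("OPTIMIZE TABLE $table"); // defrags the table file and rebuilds the index
}

OPTIMIZE TABLE locks the table while it runs, which is exactly why it belongs in a maintenance window and not on a live site.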


Caching with transient options and the transients API in wordpress

From the Transients API Codex page:

… offers a simple and standardized way of storing cached data in the database temporarily by giving it a custom name and a timeframe after which it will expire and be deleted

One usage pattern for the transients API is to cache values you retrieve from a remote server. The overhead of establishing a connection to the remote server, sending a query and waiting for a reply is too big, so we make a concession: instead of being totally up to date with our info but having a site that takes ages to load, we prefer to be 5 minutes late with the info but have a usable site, and we do it by caching the last result for 5 minutes in a transient option.

So instead of having

echo get_my_latest_video_from_youtube();

We can use

$video = get_transient('latestvideo'); // we might have the value already in our cache, let's retrieve it
if ($video) { // it is there
    echo $video;
} else { // nothing in the cache, or the cache has expired
    $video = get_my_latest_video_from_youtube(); // get the video code
    set_transient('latestvideo', $video, 5*60); // set the cache with expiry set to 5 minutes in the future
    echo $video;
}

The nice thing about this code is its robustness: it will recover from any event that hurts the cache and simply regenerate the info.

It is important to note that this solution does not entirely eliminate the delay in site load caused by accessing the remote data, it just makes it less frequent. For a site which has only one visitor every 5 minutes or more we have basically changed nothing, as the cache will expire before the next visitor arrives. A longer expiration interval gives you more performance value, so you should set it as long as possible without letting the displayed data become stupidly out of date.

But what can be done if we want all of our users to have a great experience, not only 99% of the time but 100% of the time? If our interval is long enough, we can pre-populate the cache with a scheduled backend operation.

The next code assumes your interval is 5 hours

// since we want to avoid the situation in which the cache expires, we have to use a schedule which is shorter than 5 hours.
// This should really be done only on plugin or theme activation; the wp_next_scheduled() check at least prevents scheduling it more than once
if (!wp_next_scheduled('regenerate_video')) {
    wp_schedule_event(time(), 'hourly', 'regenerate_video');
}

add_action('regenerate_video', 'regenerate_video');

function regenerate_video() {
    $video = get_my_latest_video_from_youtube(); // get the video code
    set_transient('latestvideo', $video, 5*60*60); // cache for 5 hours
}

This way we prime the cache every hour and therefore every user gets current enough info. But if the cache practically never expires, couldn’t we get the same result by using the usual options API and storing the cached value as an option? Our code will then look like

// on the front end
$video = get_option('latestvideo');
if (!$video) { // the option was never generated (or was deleted), generate it now
    regenerate_video();
    $video = get_option('latestvideo');
}
echo $video;

// on the backend
// since we want to avoid the situation in which the cache expires, we have to use a schedule which is shorter than 5 hours.
// This should really be done only on plugin or theme activation; the wp_next_scheduled() check at least prevents scheduling it more than once
if (!wp_next_scheduled('regenerate_video')) {
    wp_schedule_event(time(), 'hourly', 'regenerate_video');
}

add_action('regenerate_video', 'regenerate_video');

function regenerate_video() {
    $video = get_my_latest_video_from_youtube(); // get the video code
    update_option('latestvideo', $video);
}

What we have done here is essentially replace the expiry mechanism; now we have better control over when the data expires.

So which pattern is better, transients or straight options? There is another factor you need to take into account before deciding: whether object caching is active on the site.

Unlike options, transients do not autoload into memory when WordPress starts up. This means that if there is no active object cache on the site, get_transient actually performs an extra DB query, and there is nothing worse for performance than a DB query that can be avoided, especially when that query happens on the front end.

On the other hand, when object caching is active, transients are not stored in the DB at all, only in the cache. This eliminates the cost of using get_transient and makes the options table smaller, and therefore every operation on it (add, change, delete, query) faster.
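
If you want one piece of code that behaves reasonably in both situations, a possible sketch (using the same 'latestvideo' key as above and a hypothetical get_cached_video() helper) is to ask WordPress whether a persistent object cache is in use with wp_using_ext_object_cache() and pick the storage accordingly:

function get_cached_video() { // hypothetical helper, reading side only
    if (wp_using_ext_object_cache()) {
        // a persistent object cache is active, the transient lives in memory and is cheap to read
        return get_transient('latestvideo');
    }
    // no object cache, fall back to a regular (autoloaded) option and avoid the extra DB query
    return get_option('latestvideo');
}

The scheduled regenerate_video() would of course have to mirror the same check when writing, so treat this as an illustration of the decision point rather than a drop-in replacement.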