WordPress is in a place where it is “good enough” and doesn’t have much incentive to become “great”

While still being very far from monopolizing the web, WordPress is a de-facto monopoly in independent web publishing, especially for sites that do not require a complicated editorial process – blogs, promotional sites, small shops and such. It is just so much better than any competition in the area in ease of installation, ease of use, documentation, free support forums and the low cost of paid support professionals, that it just does not make much sense to use any other tool when you need anything that looks like a CMS on the web.

The problem with being in such a position is that you become sure you got there because of all the things you have done, and it is much harder to pinpoint the problematic areas that need not just incremental improvement but an actual rethinking. We saw this happen to Apple (think the original Macintosh days, not the current appliances company), Microsoft, IBM and Sun; each in turn reached a pinnacle of its market segment and just stopped innovating. Microsoft’s Internet Explorer 6 is probably the best example of what happens: once it dominated the market and its bugs became a de-facto standard, Microsoft dismantled the team working on it, and for years there was no improvement in it, as it was just good enough. Most of us probably know how it ended; with time it became apparent that there is actually much more browsers can be used for than what Microsoft assumed.

Is WordPress the equivalent of IE6 in the niche of web publishing? It is very hard to judge such things in real time (it is much easier in hindsight 😉), but it does seem to me like innovation is on the decline, and if you read the tech sites you get a very negative drift against anyone admitting they use WordPress.

So, what is my point? Hey, this is clearly marked as a “rant”, and rants do not have to have a specific point :)

wordpress 4.4 is a meh release for most users

WordPress 4.4 has gone RC, and while I am far from being a typical wordpress user (so maybe my opinion as a user doesn’t count for much), it seems like it has no improvement in anything related to content production and consumption.

The REST API infrastructure is something that belongs in the realm of big organizations. I just can’t see anyone fully separating the client and server sides and connecting them with only API requests. This is not efficient and will hurt SEO. Sites that are more application than content (think google docs) do not care about SEO, and for them the performance hit is nothing to worry about when considering the advantages in software development practices that the separation gives (good JS developers are easier to find than good PHP ones, as almost by definition there are more of them). But for anyone else? The infrastructure can be useful in cutting the development time of similar features, but even that is done by very few people.
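
To be clear about what the 4.4 infrastructure actually gives developers, here is a minimal sketch of registering a custom endpoint (the namespace, route and callback are made up for illustration):

add_action('rest_api_init', function() {
  register_rest_route('mk/v1', '/latest-title', array(
    'methods' => 'GET',
    'callback' => function() {
      // return the title of the most recent post as a JSON string
      $posts = get_posts(array('numberposts' => 1));
      return $posts ? $posts[0]->post_title : '';
    },
  ));
});

The endpoint is then served at /wp-json/mk/v1/latest-title, with the boilerplate of routing, JSON encoding and error handling done by core.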

WordPress as an oEmbed provider is a nice feature but not a practical one. The problem with oEmbed in general is that you need to trust the source of the embeds, and there is just no reason for anyone to trust someone they don’t know. The feature can be useful for people that have several sites, or in a network, but for people that have one site there is no real advantage of embedding over using some sort of shortcode that would justify the performance hit of embedding content in an iframe.

Responsive images: in a world in which mobile bandwidth becomes cheaper and mobile CPUs become stronger every day, serving an image which is too big is just not a very significant problem, and again, people that just post an image from time to time and are not photo bloggers are unlikely to feel the difference. And this is actually the small problem with the feature; the bigger one is that there is no user control over which images are used as alternatives, and in theory you might end up serving some cropped images and some resized images as alternatives to each other, something that they are obviously not.
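
There is at least a partial escape hatch: the candidate list passes through the wp_calculate_image_srcset filter before the srcset attribute is generated, so you can prune it yourself. A hedged sketch (the 300px cutoff is an arbitrary example, not a recommendation):

// drop very small candidates from the srcset list; $sources is keyed by width
add_filter('wp_calculate_image_srcset', function($sources) {
  foreach (array_keys($sources) as $width) {
    if ($width < 300)
      unset($sources[$width]);
  }
  return $sources;
});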

The only feature that does make me excited is the taxonomy term metadata, which will let developers write code that does not feel like a total hack when trying to add features to taxonomies.
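
For those who haven’t looked at it yet, the new API mirrors post meta. A minimal sketch (the ‘news’ category and the meta key are invented for the example):

// attach and read arbitrary metadata on a term
$term = get_term_by('slug', 'news', 'category'); // assumes a 'news' category exists
if ($term) {
  update_term_meta($term->term_id, 'mk_color', 'blue');
  $color = get_term_meta($term->term_id, 'mk_color', true); // 'blue'
}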

Overall, almost nothing to get very excited about, but also nothing to really hate. I think that is the exact definition of the word “meh” :).

taming wp_upload_dir to create a directory with the name you want instead of a date

First I guess I need to answer “why”, and there are probably two of them:

  • Why do I care that it creates a date based directory even if not needed?
    Because I hate to waste CPU cycles when it can be avoided with almost no effort, and I hate to see empty directories when I need to traverse the uploads tree with FTP software
  • Why not take the base directory from the value being returned and create the directory by myself?
    Because I prefer to trust the core APIs over my own code whenever possible. Core code is tested in real life by millions every day on different platforms, and I will probably never test mine on more than one. So while creating a directory seems like a very easy thing to do, I still prefer to avoid thinking about the possible caveats that might be specific to a particular OS.

The code is actually easy; this is a snippet of something I am working on right now


// returns the directory into which the files are stored
function mk_flf_dir() {
  // hook the filter just for the duration of this one call
  add_filter('upload_dir','mk_flf_upload_dir',10,1);
  $dirinfo = wp_upload_dir(); // also creates the directory if it does not exist
  remove_filter('upload_dir','mk_flf_upload_dir',10);

  return $dirinfo['path'];
}

// override the default date-based directory about to be created by wp_upload_dir
function mk_flf_upload_dir($info) {
  $info['path'] = $info['basedir'].'/fast_logins';
  return $info;
}

The fine point here is to remove the filter after it has done its thing, just because there is a slight chance some other code will want to call wp_upload_dir after your code has run.

 

Brute force attack on wordpress might bring it down because password validation is hard

There were several discussions about how brute force attacks against WordPress can bring sites down. I have to admit that I didn’t believe it, as I had never seen anything like that happen and I could not think of any reason for it.

On the face of it, handling a login request is very similar to handling any other page request, with the small additional cost of one query to the DB that gets the password to authenticate against. Oh boy, how wrong I was.

First, it turned out that because of the way the internal data structures are organized, WordPress will try to get all the data associated with the user being authenticated, which means that there will be at least two queries instead of one, and it seems like the total time spent querying the DB doubles. Then you get into the password validation process, which according to security principles is designed to be a slow mathematical computation. That computation is what demands the CPU to work harder, and it is what might bring sites down in the end.

I still find it hard to believe that a properly administered site will be brought down that way but at least now it makes theoretical sense.

There is one important thing that I observed while investigating the issue. When trying to log in with a user that does not exist, the CPU cost is about 20 times lower than the CPU cost when trying to log in with a proper user (a very non-scientific measurement), but the interesting thing is that when trying to log in with a valid user, it costs the same whether the password was authenticated or not.
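
If you want to reproduce the measurement, here is a rough sketch of the kind of micro-benchmark I mean (the ‘admin’ user name is an assumption about the site, and the numbers will vary wildly):

// very non-scientific timing of the authentication cost
$start = microtime(true);
for ($i = 0; $i < 100; $i++)
  wp_authenticate('no_such_user', 'whatever'); // user does not exist
echo 'nonexistent user: '.(microtime(true) - $start)."s\n";

$start = microtime(true);
for ($i = 0; $i < 100; $i++)
  wp_authenticate('admin', 'wrong-password'); // valid user, wrong password
echo 'valid user, wrong password: '.(microtime(true) - $start)."s\n";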

Which brings us to the point of user enumeration attacks against wordpress, and why the core developers are making a mistake by not addressing them. If it is hard to guess the valid users, an attacker will try all kinds of user/password combinations, and there is a very big chance that most attempts will be against non-existing users, which are “cheaper” to handle; but if there is an easy way to find out who the valid users are, the attacker will direct all attempts at those users, and even when the attempts fail they cost relatively a lot of CPU to handle.

Sounds like until the core developers get a grip, a user enumeration prevention plugin is a good thing to have.
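
One classic vector such plugins close is the /?author=N query, which wordpress redirects to an archive URL containing the login name. A hedged sketch of blocking it (note that this kills author archives entirely, which is not acceptable for every site):

// send author-archive requests home instead of leaking login names in the URL
add_action('template_redirect', function() {
  if (is_author()) {
    wp_safe_redirect(home_url(), 301);
    exit;
  }
});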

 

update_option will not always save the value to the DB

Yes, I am getting to the point where I start to call the WordPress core team members “idiots”, at least between me and myself.

Case in point is https://core.trac.wordpress.org/ticket/34689, which is about update_option not always saving values to the DB, because it checks the value returned by get_option and performs the write to the DB only if that value is different from the one update_option is requested to save.

Actually sounds very logical, right? If the values are the same, what is the point of wasting resources on a write to the DB? The problem is that the value get_option returns is not the value stored in the DB, as several filters might be applied to it; therefore in some situations the value returned by get_option might be the same as the one passed to update_option but still different from the one in the DB.
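
A minimal sketch of the failure mode (the option name and the filter are made up; any filter on option_{name} will do):

add_filter('option_my_option', 'strtoupper'); // some plugin normalizes the value on read

update_option('my_option', 'abc'); // first write, the DB row now holds 'abc'
update_option('my_option', 'ABC'); // get_option returns the filtered 'ABC',
                                   // which equals the new value, so nothing is
                                   // written - the DB row still holds 'abc'

If the filter later goes away (the plugin gets deactivated, for example), get_option will return ‘abc’, not the ‘ABC’ the caller asked to save.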

So why has no one noticed it so far? I think that most people are not aware that you can filter the result of get_option on the one side, and on the other side, most update_option calls are made in the admin, where the filters mentioned before will not be set, as they are useless on the admin side.

It is surprising to discover such a bug in one of the lowest level functions WordPress has, a function being used by almost every plugin, but it just shows that in software, not knowing about bugs doesn’t mean there aren’t any, no matter how battle-tested the software is.

What is annoying is the refusal of the core team to admit that it is a bug. In software development there are all kinds of situations in which bugs result from bad design, but once a bug becomes old enough it is hard to fix, because by then everybody expects it and therefore it becomes a feature. But when a = b does not make a == b true, there is just no way to pretend it is not a bug.

wp_is_mobile is a trap waiting to bite you, just avoid it

What can be wrong with a function that just checks the user agent to determine if the user is on a mobile device, like wp_is_mobile? Even if the function works as promised, the whole idea of using the user agent to detect the type of device on the server side is wrong.

Using that function (or any server side detection really, but I focus on wordpress here) violates the core principle of responsive design: that you serve the same HTML to all users.

In practice you will run into trouble once you want to cache your HTML, and then you will start to sometimes get the mobile version of the site on desktop and vice versa. The “nice” thing here is that by that time the original developer has moved on, and there will be someone new that the site owner will have to recruit in order to fix the resulting mess. Pros just don’t do that to clients.

What is the alternative? Detect whatever needs to be detected using javascript on the client side and set a class on the body element. What about people that turn off JS? I say fuck the luddites, let them have a desktop version on their mobile. OK, strike that: make your CSS as mobile friendly as possible, just don’t worry about the UX of the luddites.
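
A minimal sketch of that approach from the PHP side (the 782px breakpoint and the class name are arbitrary examples):

// print a tiny script in the footer that tags the body element,
// so the CSS can target .mk-small-screen instead of server-side sniffing
add_action('wp_footer', function() {
  ?>
  <script>
  if (window.matchMedia('(max-width: 782px)').matches)
    document.body.className += ' mk-small-screen';
  </script>
  <?php
});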

Will the use of HipHop VM (HHVM) help with making your wordpress site faster? Unlikely

It has been a while since I last heard about facebook’s HipHop PHP optimizer project. When I first heard of it, it was a compiler from PHP to C++, something I had already run into with another interpreted language – TCL/TK – and it is mainly beneficial for projects in which, once the interpreted code (i.e. the PHP code) is stable and shipped, there is no need to modify it. In other words you lose the ability to modify your code on a whim, which is the reason why most sites today use interpreted languages.

I was actually surprised to learn that the main reason facebook was unhappy with the compiler is that deploying compiled code was resource intensive, and since facebook pushes a new update once a day, they started to look into alternatives to compiling their code into machine code.

The approach they are trying now is to write their own PHP interpreter (and a web server dedicated to running it) which uses JIT (Just In Time) technology to compile PHP code into native code and execute it. As JIT proved to be a very efficient technology when applied to optimizing javascript, which like PHP is an interpreted language, I find it easy to believe that it executes PHP code faster than the conventional interpreter.

But if it is faster, how come it will not make your site faster? To understand this you need to keep in mind facebook’s scale and how it works.

Facebook had at some point 180k servers. A 1% optimization will allow them to save 1800 servers and the cost of their electricity and maintenance; my estimate, based on pricing by web hosting companies, is that this might amount to saving $100k each month. So facebook is most likely doing this optimization to reduce cost and not to improve site speed. But for lesser sites a 1% optimization will not be enough to avoid upgrading your hosting plan, and even if there were a cost benefit, it is unlikely that for most sites the savings would be worth the amount of time that would need to be invested in switching to HHVM and testing your site on it, especially since it is not a fully mature product yet (just because it works for facebook doesn’t mean it works everywhere).

The other thing to take into account is that by its nature facebook can do only very limited caching, as essentially all its visitors are logged-in users. They can still keep information in memory in a way similar to how object caching in wordpress works, but they still need PHP logic to bring it all together, while wordpress sites can use full page caching plugins like W3TC, which produce static HTML pages; serving those bypasses the PHP interpreter entirely, and therefore improvements in PHP interpretation are of very little importance to those sites.

It is not that HHVM is totally useless outside of facebook, just that its impact will be much bigger on bigger and more complex sites than most wordpress sites tend to be. The nice thing about it is that it is open source, and therefore the PHP project can adopt the JIT techniques from HHVM into the core PHP interpreter.

Every user that has loaded any page of your site is your user

I find that I am annoyed with the way wordpress classifies users: there are administrators, editors, authors, contributors and subscribers. This classification is based entirely on what the user can access in the wordpress admin, but most people that use your site don’t have an account and therefore are not classified at all, which is a big mental mistake.

Users without an account can be

  • casual readers – access your site at random intervals
  • follower – reads every new post or checks your site every week
  • commenter – leaves a comment
  • rss subscriber – follows update in rss
  • email notification subscriber
  • newsletter subscriber
  • discussion follower – following comment updates via RSS or email.

And maybe there are more types. This kind of profiling of your users should help you in monetizing your site while keeping all your users as happy as possible.

For example, some sites don’t show ads to logged-in users, treating them more as partners than a source of income; maybe it would be wise to treat commenters the same way?
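
A hedged sketch of what that might look like, relying on the comment_author cookie wordpress sets for returning commenters (the ad-rendering function is hypothetical):

// returns true for visitors who previously commented on this site
function mk_visitor_has_commented() {
  return isset($_COOKIE['comment_author_'.COOKIEHASH]);
}

// somewhere in the theme: skip the ad for logged-in users and commenters
if (!is_user_logged_in() && !mk_visitor_has_commented())
  mk_render_ad(); // hypothetical ad-rendering function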

WordPress’s fetch_feed API supports fetching more than one feed at a time

http://core.trac.wordpress.org/ticket/22140

Apparently several RSS feeds can be fetched “at once” by using the fetch_feed API, but the core developers are not excited about advertising this possibility (and I agree, because I have never heard anyone complain about the lack of such a feature).
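
For the record, a minimal sketch of what it looks like (the feed URLs are placeholders); fetch_feed just hands the array to SimplePie, which merges the items:

$feed = fetch_feed(array(
  'https://example.com/feed/',
  'https://example.org/feed/',
));
if (!is_wp_error($feed)) {
  foreach ($feed->get_items(0, 10) as $item)
    echo esc_html($item->get_title())."\n";
}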

How to import a big wordpress export file

For an active wordpress site the content gets bigger with time. Usually you don’t even notice that, until the site becomes slow and you install some caching plugin which makes the site run fast again, and you forget about the whole thing.

The problem arises when you want to move your content with wordpress’s export and import tools. If you have a lot of content, the exported file might be too big to be uploaded to the new server.

The easiest way to solve the problem is to split the exported file into smaller pieces using this splitter tool, and import each one of the generated files.