The decision to enable XML-RPC remote publishing support by default in WordPress 3.5 is good, but the execution is lacking

The XML-RPC protocol is basically used to expose a set of APIs implemented by a site/server, which lets other software interact with the site/server in a way that is easier to program than trying to mimic user interaction. WordPress currently (version 3.5) exposes APIs for pingbacks and for publishing content. Software like Windows Live Writer by Microsoft uses the content publishing API supplied by WordPress to create a non-browser editing environment, but the main users of the protocol right now are smartphones, because the small screen size makes the WordPress web interface almost unusable.

Up to WordPress version 3.4, remote publishing by XML-RPC was disabled by default, and the text explaining the option in the admin was technical and said nothing about smartphones.

With the rise of smartphone use, and the number of smartphone apps that use XML-RPC to publish content to WordPress, enabling XML-RPC by default is only a logical move. But the development motto of “decisions not options” was taken too far, because in this case the option is important enough to justify keeping it.

The reasons are mainly security related:

  1. No matter how robust the WordPress code is, there is always a chance of a security bug that affects only the XML-RPC code.
  2. Plugin authors will probably start supporting XML-RPC, opening more attack vectors, and users will not even know about it because there is no GUI indication; you will not know unless you read the plugin’s documentation.
  3. There is no knowledge base on how to defend against brute force/dictionary attacks through XML-RPC. Current plugins might work, but will they give you a notice like “You failed to login 3 times, please wait 5 minutes till the next attempt” on the XML-RPC layer, and how will the app display that notice?

It might be that the core developers are right and there is no added risk in having XML-RPC on all the time, but I think a more conservative two-step approach would have been better:

  1. Make it default to on, but leave the option in the admin.
  2. In two releases, look at the experience of running WordPress that way and decide whether to eliminate the option as well.

The reason it should work that way is that most users just leave the default setting on, so there will be a big enough user base to field test the feature even when the option to turn it off exists.
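For anyone who wants the old behaviour back without an admin option, here is a minimal sketch of a tiny plugin. It assumes the `xmlrpc_enabled` filter that ships with WordPress 3.5; the plugin name is made up. As far as I know the filter only switches off the XML-RPC methods that require a login, so pingbacks keep working.

```php
<?php
/*
Plugin Name: Disable XML-RPC publishing (sketch)
*/

// Assumption: WordPress 3.5 or later, where the 'xmlrpc_enabled' filter exists.
// __return_false is a core helper that simply returns false.
add_filter('xmlrpc_enabled', '__return_false');
```

This is exactly the kind of thing a checkbox in the admin used to do, which is why dropping the checkbox feels premature.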

The WordPress settings API is a PITA

The WordPress settings API is there to “help authors manage their plugin custom options“, but does it? The many lengthy tutorials linked from the codex hint that the answer is probably “not really”. To quote Olly Benson from yet another settings API tutorial/example (emphasis mine):

WordPress does make a habit of creating mountains out of molehills sometimes, and the Settings API seems to be a fantastic example of this.  Trying to follow the instructions on the initial page got me hopelessly lost, and it was only when I went through Otto’s WordPress Settings API tutorial that I begin to understand how to implement it.

And the problem is not with the codex; the problem is in the structure of the API itself. Without it you can write simple, quick and dirty code like

add_action('admin_menu', 'my_init');

// Register the options page under the Settings menu.
function my_init() {
    add_options_page('title', 'title', 'manage_options', 'my-options', 'my_options_page');
}

// Handle the submitted value and print the form in one place.
function my_options_page() {
    if (isset($_POST['my_value'])) {
        validate_value(); // placeholder for your own validation
        update_option('my_option', $_POST['my_value']);
    }
    ?>
    <form method="post">
    Enter value <input type="text" name="my_value" value="<?php echo esc_attr(get_option('my_option')); ?>">
    <input type="submit" value="Save">
    </form>
    <?php
}

where all the logic of handling the change of values is placed in the same place as the presentation, the my_options_page function, which makes it much easier to understand and debug.

The settings API basically moves your options handling away from your presentation code. To use it you need to call at least 3 initialization functions, to which you have to supply 3 callback functions, and all the handling is done in a “black box” that gives you no hint when something is misconfigured and is hard to debug.
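To make the comparison concrete, here is a rough sketch of the same single option done with the settings API. The three initialization functions are the real ones (`register_setting`, `add_settings_section`, `add_settings_field`); every name starting with `my` is made up for the example, and this is only one way to wire things up.

```php
<?php
// Hypothetical names throughout: my_group, my_option, my-options, my_section, my_field.
add_action('admin_init', 'my_settings_init');
add_action('admin_menu', 'my_settings_menu');

function my_settings_init() {
    // 1. Tie the option to a group and to a sanitizing callback.
    register_setting('my_group', 'my_option', 'my_sanitize');
    // 2. Declare a section on the page, with a callback that prints its intro text.
    add_settings_section('my_section', 'My section', 'my_section_text', 'my-options');
    // 3. Declare a field in that section, with a callback that prints the input.
    add_settings_field('my_field', 'Enter value', 'my_field_input', 'my-options', 'my_section');
}

function my_settings_menu() {
    add_options_page('title', 'title', 'manage_options', 'my-options', 'my_options_page');
}

// Callback 1: clean the submitted value before it is stored.
function my_sanitize($input) {
    return sanitize_text_field($input);
}

// Callback 2: text shown under the section title.
function my_section_text() {
    echo '<p>Section description.</p>';
}

// Callback 3: the input itself; its name must match the registered option.
function my_field_input() {
    echo '<input type="text" name="my_option" value="' . esc_attr(get_option('my_option')) . '">';
}

function my_options_page() {
    echo '<form method="post" action="options.php">';
    settings_fields('my_group');        // hidden fields and nonce for the group
    do_settings_sections('my-options'); // prints the sections and fields registered above
    submit_button();
    echo '</form>';
}
```

Note that nothing in your own code ever calls update_option; the saving happens inside options.php, and a typo in any of the group, page or section strings just makes the field silently disappear or the value silently not save.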

When trying to use the API I end up spending more time making myself feel good about following coding best practices than I would need to code the same functionality in an equally accessible and secure way.

The time has come to clean the net of the prehistoric junk called breadcrumbs

One guy on the WordPress Stack Exchange asked how to have breadcrumbs that show a different “path” than the one he currently gets, and this reminded me that I hate breadcrumbs.

Breadcrumbs were created by Jakob Nielsen, and he describes the motivation in the article “Breadcrumb Navigation Increasingly Useful“. I was using the net in 2007 and before, so I totally agree that at that point in time adding breadcrumbs to sites improved their usability. But times have changed, we understand better how users use the internet and the importance of an “in site” navigation system, and a catch-all default like breadcrumbs has no place on the net anymore.

The problem with breadcrumbs is that they are based on the assumption that content is hierarchical and that there is a hierarchical path from the home page to any page in the site. There are several problems with that assumption:

  • A lot of the content on the internet is not hierarchical. This post is categorized as “Rant”, but mainly because I don’t like to have an “uncategorized” category. Many owners of blogs and small sites categorize because the CMS tools give them the option to categorize (and sometimes force them to), and the end result is that totally unrelated content gets categorized the same way.
    Take a look at Jakob Nielsen’s article itself. What are its breadcrumbs? “useit.com > Alertbox > April 2007 Breadcrumbs “. Right now the Alertbox page contains 457 links to articles. If I want to find more information on an article I read there, it is better to leave the site and search on Google than to try to guess which of the links on that page might actually contain the information I’m looking for.
  • For many pages (like in the question) there might be multiple hierarchical paths, so which one should you use?
    For this post there is the path of the category “rant”, the tag “Breadcrumbs”, and the date it was published, “2012” > “12” > “12”. For a personal blog the date hierarchy is the most appropriate, but for a technical blog a hierarchy by categories makes more sense.
  • Site owners don’t want users to navigate by hierarchy. They prefer to direct user traffic to more profitable pages, or to pages that they think better serve the user’s needs, instead of overwhelming the user with information at a higher hierarchy level.
  • Most sites that care have recognized that navigating by hierarchy is frustrating to the user, as it requires at least 2 clicks to reach a destination (one up and one down), and have improved their in-site search so the user will get to the destination in at most 2 clicks.

In 1995, when the idea of breadcrumbs was created, web sites followed the structure of file systems. The breadcrumbs solution is actually used in Windows Explorer in Windows 7, but it is just not good enough: yes, it takes me 5 levels up in the hierarchy in one click if I need to, but it doesn’t help me get down the hierarchy to where I need to be.
But modern OSs are trying to get away from the hierarchical organization of data. The iPhone hides the file system from its users, and other OSs try to analyze user behavior and provide built-in search facilities. Sites, being more flexible, should lead this trend instead of sticking with the old ways.

But wait, what about SEO, aren’t breadcrumbs good for SEO? Maybe that was true in 2007, before search engines standardized on using site maps. Without site maps, breadcrumbs were a relatively easy way to provide the internal links that were required to help search engines discover all of the content in the site. Today, if your site doesn’t have a site map, you either don’t care much about SEO or you are sure that your site is well interconnected.

The only SEO related reason to use breadcrumbs today is that Google might use them instead of the full URL in the search results when the URL is very long. This for sure improves the usability of the Google results, but does it add value for the site owner? Even if it does, I would add breadcrumbs either as RDF (ugly) or the microdata way, and just hide them with CSS. This way Google has the data, and you have more free screen space on your site.

The possible impact of changing WordPress (and PHP) max memory settings on site performance

In the last several days there were several questions on WordPress Answers on Stack Exchange related to out of memory errors. Mostly they were about some plugin that required more memory to function, and people asked how to change or overcome the default memory limit of a PHP process.

My impression from the questions and answers was that people fail to understand why there is a limit at all, and treat the limit as some bizarre PHP thing that you need to overcome instead of trying to understand it. There is even a plugin, “Change memory limit”, whose description says

Update the WordPress default memory limit. Never run into the dreaded “allowed memory size of 33554432 bytes exhausted” error again!

To understand why there is a limit you need to understand one of the best hidden secrets of Linux and Windows, one that will surprise most developers: after an application has allocated memory from the OS, in practice it cannot give it back. Yes, when a program calls the free() function, an object destructor or any other deallocation method, the memory is returned to the application’s own free memory pool, from which it might satisfy its next allocation, but it will generally not be returned to the OS as long as the software is running*.

Since software doesn’t really deallocate, server software, which is supposed to run all the time, will stay at its peak memory usage once it has reached it. This has to be taken into account when you want to ensure specific performance, given the way Apache works.

Apache in prefork mode basically runs itself several times, where each instance can handle one request at a time. If no instance is free to handle a request, the request has to wait in a queue. The maximal number of concurrent requests the server can process is the number of instances we can run at the same time. Assuming we don’t do any heavily CPU bound processing, our limitation is the memory that can be allocated to each instance.

And how can we calculate the amount of memory an Apache instance needs? The naive approach is to use the average memory consumption, but once a piece of software has passed its “average” allocation, the memory will not be released. Potentially, an Apache instance running one memory hungry process can take control of all the available memory, leaving none for the other instances, which will probably lead to them failing to handle requests. You might think that you configured your server to handle 10 requests at a time, but 9 of them fail.

It is important to understand that once the memory has been allocated, it makes no difference that the instance never again needs all of that memory and handles only small requests. The memory stays attached to the instance for as long as it runs.

And this is why the memory limit exists: to protect the whole server from one faulty piece of code. If you set the limit to 128MB, then you can be assured that at least the rest of the memory is available to the other instances.

So basically the number of Apache instances we can run safely, without the fear of the server suddenly breaking down for no apparent reason, is (amount of memory available on the server) / (max memory limit). The higher the limit, the fewer requests your server can process at the same time, which potentially leads to a less responsive server.
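To put numbers on that formula, here is a small sketch. The helper name and the 2048 MB figure are made up for illustration, and the result is an optimistic upper bound, since each Apache instance also uses some memory that the PHP limit doesn’t count.

```php
<?php
// Hypothetical helper: how many prefork instances fit if each one may grow to the limit.
function max_safe_instances($available_mb, $memory_limit_mb) {
    return (int) floor($available_mb / $memory_limit_mb);
}

// Assumed for illustration: 2048 MB of the server's memory is left for Apache.
echo max_safe_instances(2048, 32), "\n";  // 64
echo max_safe_instances(2048, 128), "\n"; // 16
echo max_safe_instances(2048, 256), "\n"; // 8
```

Raising the limit from 32MB to 256MB for the sake of one plugin cuts the number of requests this imaginary server can safely handle at once from 64 to 8.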

Apache can actually be configured to kill instances after they have served a certain number of requests (the MaxRequestsPerChild directive), and by that actually free memory. This will improve the server performance on average, but it also has a cost, the cost of starting a new instance. You should probably always plan for the worst case scenario and experiment very carefully with relaxing the memory restriction.

Prefork is not the only way to run Apache. There are also the worker and event configurations, but they require the PHP library you use to be thread safe. Some people claim it actually works for them, but the PHP developers don’t recommend running that way.

And then, if you use FastCGI to execute PHP instead of mod_php, you basically change it from being an Apache problem to a FastCGI problem. That might actually be better: while FastCGI might hurt the performance of pages generated with PHP, Apache itself will still be able to serve static files.

* Mainly because memory is allocated from the OS in big chunks, and when you dynamically allocate and free memory from those chunks it is very likely that every chunk will still contain some allocated “live” memory.

Tip to Michael Arrington – the only way to control your data is to host it on your site

Michael Arrington’s brilliant rant against Instagram’s move that cripples photos shared from it to Twitter:

http://techcrunch.com/2012/12/06/they-screwed-us-right-before-they-screwed-us-again-poohead/

Off topic: the MG disclaimer is hilarious