How to import a big WordPress export file

On an active WordPress site the content keeps growing over time. Usually you don’t even notice it until the site becomes slow; then you install some caching plugin, the site runs fast again, and you forget about the whole thing.

The problem arises when you want to move your content with WordPress’s export and import tools. If you have a lot of content, the generated export file might be too big to upload to the new server.

The easiest way to solve the problem is to split the exported file into smaller pieces using this splitter tool, and import each one of the generated files.
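
If you would rather see what such a split involves, here is a minimal sketch in Python. It is not the splitter tool mentioned above. It assumes the export is a WXR file in which every post, page or attachment sits in its own <item> element, and that the literal text "<item>" never shows up inside the content itself. The names split_wxr and items_per_file are made up for the example.

```python
def split_wxr(xml_text, items_per_file):
    """Split a WordPress export (WXR) string into smaller, complete documents.

    Everything before the first <item> (site info, authors, categories, tags)
    and everything after the last </item> is copied into every piece, so each
    piece should be importable on its own.
    """
    if items_per_file < 1:
        raise ValueError("items_per_file must be at least 1")

    open_tag, close_tag = "<item>", "</item>"
    first = xml_text.find(open_tag)
    last = xml_text.rfind(close_tag)
    if first == -1 or last == -1:
        return [xml_text]  # nothing to split

    last_end = last + len(close_tag)
    header = xml_text[:first]
    footer = xml_text[last_end:]
    body = xml_text[first:last_end]

    # Collect the individual <item>...</item> blocks.
    items = []
    pos = 0
    while True:
        start = body.find(open_tag, pos)
        if start == -1:
            break
        end = body.find(close_tag, start)
        if end == -1:
            break
        end += len(close_tag)
        items.append(body[start:end])
        pos = end

    # Re-wrap every group of items with the shared header and footer.
    pieces = []
    for i in range(0, len(items), items_per_file):
        pieces.append(header + "\n".join(items[i:i + items_per_file]) + footer)
    return pieces
```

Each returned string can be written to its own file and imported separately. A real tool would more likely split by file size than by item count, but the principle is the same.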

How to make taxonomy pages appear as results in WordPress search

In addition to its many other drawbacks, WordPress search just can’t search the description associated with a taxonomy term (category, tag) or an author, so even if the most obvious search result is the category page, the internal search will never show it.

But there is a way to hack around it if you really have to. All that needs to be done is to have a page with the exact same URL as a taxonomy.

If you wish for the category “events” to be searchable, assuming its URL is /category/events, all you have to do is create two pages, one with the slug “category” and a sub page of it with the slug “events”, and put the text associated with the category in the “events” page.

The only problem is that the search result will be styled like a page, but this is a small price to pay.

In WordPress, pages can have whatever URL you want them to have

For all content types except pages, WordPress uses a system of patterns to identify, from the structure of the URL itself, which type of content is being accessed. Once the type is identified, it knows in which part of the DB to look for the content associated with the URL.

This is the reason why you usually should have a prefix “directory” in the URL which uniquely identifies the type of your content. If there are two possible interpretations, WordPress will go with the first match it finds.

Pages are different. WordPress kind of assumes that by default all content in the site is pages and the parsing rule for page URLs is “if it is not something else it might be a page”.

This lets you place pages anywhere in the URL structure. Here the question was about having an Events/post_slug URL for posts and also an Events/Contact URL for a page. To do that you just need a page with the slug Events and a page with the slug Contact as its sub page.

As long as there is no post with the slug Contact, when WordPress gets an Events/Contact URL it tries to find a post in the Events category with the slug Contact, and if there is none it tries to find a page with the slug Contact under a page with the slug Events, and BINGO.

There are two problems with this approach, though neither of them is probably major enough to prevent the use of this technique:

  1. For every URL of the structure Events/xxxx where there is no post with the slug xxxx, WordPress will have to make another DB query to check whether there is a page with the slug xxxx under the page with the slug “Events”.
  2. You always have to remember not to create a post or a subcategory of the category “Events” with the slug “Contact”. If you do, your page will no longer be accessible and you will not get any warning about it.
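
The lookup order described above can be modelled with a toy function. This is only a sketch of the behaviour as I described it, not WordPress’s actual rewrite code; the function resolve and the two in-memory structures standing in for the DB are hypothetical.

```python
def resolve(path, posts_by_category, pages):
    """Toy model: try a post in the category first, then fall back to a page.

    posts_by_category maps a category slug to a set of post slugs.
    pages is a set of slug tuples, e.g. ("events", "contact") for a sub page.
    """
    segments = tuple(s.lower() for s in path.strip("/").split("/"))

    # First interpretation: category/post_slug.
    if len(segments) == 2:
        category, slug = segments
        if slug in posts_by_category.get(category, set()):
            return ("post", slug)

    # Fallback: "if it is not something else it might be a page".
    # In the real thing this is the extra DB query from problem 1.
    if segments in pages:
        return ("page", "/".join(segments))

    return ("404", None)


posts = {"events": {"summer-party"}}
pages = {("events",), ("events", "contact")}

print(resolve("/Events/Contact", posts, pages))       # the page is found
posts["events"].add("contact")                        # problem 2: a post takes the slug
print(resolve("/Events/Contact", posts, pages))       # now the post wins, silently
```

The second call shows why problem 2 is nasty: nothing fails, the page just stops being reachable.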

If you edit your config files via SSH then you should keep them in an inaccessible place

There is an exploit that can hit only those who know how to use Linux to manage their site. Apparently some Linux editors store a backup copy of the file being edited in the same directory as the file. Therefore, when a config file is being edited a copy of it is created, and since the names the editor gives to backup files are predictable, and the copy is usually served as plain text over the web, all the data in the config is exposed while you edit. It is even worse if the editor fails to erase the backup after editing is finished.

Therefore you should either not edit config files in place on the server but edit them elsewhere and transfer them over FTP, or put them in an inaccessible location (if WordPress is installed in the web root directory you can put the config file in the directory above it, which is usually inaccessible from the web), or replace the config script with a script that reads the config, or the secret parts of it, from another location.
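
To check whether an editor already left such a copy next to your config, something like the following Python sketch can be run on the server. The list of name patterns is an assumption and certainly not exhaustive: a trailing ~ is the usual backup name for Emacs and for Vim or nano when backups are enabled, .name.swp is Vim’s swap file, and #name# is Emacs’s auto-save file. The function names are made up for the example.

```python
import os


def backup_name_candidates(filename):
    """Return the predictable names an editor might give a copy of filename."""
    directory, base = os.path.split(filename)
    names = [
        base + "~",           # backup copy (Emacs; Vim or nano with backups enabled)
        "." + base + ".swp",  # Vim swap file, exists while the file is open
        "#" + base + "#",     # Emacs auto-save file
        base + ".bak",        # common suffix for hand-made backups
    ]
    return [os.path.join(directory, name) for name in names]


def find_leftovers(filename):
    """Return the candidate backup files that actually exist on disk."""
    return [path for path in backup_name_candidates(filename) if os.path.exists(path)]


if __name__ == "__main__":
    # "wp-config.php" is relative to the current directory; adjust the path as needed.
    for path in find_leftovers("wp-config.php"):
        print("exposed copy:", path)
```

Anything it prints is a file that is probably readable as plain text from the web and should be deleted.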

WordPress plugins and themes do not have to be GPL since the court ruled that APIs are not copyrightable

preface:

  1. I don’t like the GPL. I think that in most places where it is used, especially in WordPress, the BSD license would have served just as well and would have removed the illusion that just by selecting a restrictive license your code becomes less prone to IP theft.
  2. Matt Mullenweg created a great product and has succeeded in maintaining a great community around it. So far his insistence on GPL everywhere hasn’t really hurt either of them, and maybe has actually strengthened them.

It seems like once a year there is some form of debate about WordPress, the GPL, and whether people may develop WordPress-related software which does not use a GPL-compatible license. This time it is about whether people selling themes/plugins under a split license (one for code and another for styling) should participate in WordCamps.

Maybe it is time to take one step back and ask again whether themes and plugins have to be GPL compatible.

The legal basis for the claim is this legal opinion from James Vasile of the Software Freedom Law Center. There are two things to notice:

  • By law a lawyer has to help his client present the best legal case for the client’s objective. If I had the money I could find 10 lawyers who would contradict every second word in that post.
  • After many bold claims, the last paragraph backtracks from it all:

    Finally, we note that it might be possible to design a valid WordPress theme that avoids the factors that subject it to WordPress’s copyright, but such a theme would have to forgo almost all the WordPress functionality that makes the software useful.

But the most important thing to know about lawyers’ opinions is that they are never much better than a guess, and only a court can actually decide what the correct legal interpretation of a situation is. Lawyers can “guess” much better when there are precedents, but there were no court cases that I know of revolving around the nature of derivative works similar to WordPress plugins and themes (and I’m sure the lawyer would have cited them if he was aware of any).

Well, in May 2012 there was a ruling on a similar issue, whether an API is copyrightable, in the legal battle between Oracle and Google over the use of a Java derivative in the Android OS. Oracle claimed that just because Google implemented the same API that Java has, without a license from Oracle, it infringed on Oracle’s copyright. This claim was dismissed in court, but there is more to it than that: according to the report, the judge set a limit on when a derivative work does not inherit the license of its origin.

Ninety-seven percent of the source code in the API packages is different; it’s only the three percent that overlaps that formed the heart of Oracle’s copyright claim. That three percent included packages, methods, and class names. But those declarations—like starting a function with package java.lang—can only be used in certain ways. “In order to declare a particular functionality, the language demands that the method declaration take a particular form

Therefore, claiming that themes must be GPL just because some lines of code in all themes are similar to the GPLed themes provided as part of the WordPress distribution, as Mark Jaquith says

If that argument doesn’t convince you, then note that the vast majority of themes derive from the original WordPress core themes. How they load different PHP subfiles, loop through posts, and get and interact with WordPress data is all covered by the original WordPress core themes, which are explicitly GPL

doesn’t hold water unless there is some different way to use the WordPress API, which there isn’t. A big part of the PHP code in many themes is identical because either there is no other way to perform a specific function, or it is the best practice.

In my opinion the WordPress Foundation (or wordpress.org, or whoever is talking for WordPress) might have a just cause, but they win the fights because they have bigger sticks, not because the law is on their side.

Twitter got hacked, guess it is a bad idea to trust it with managing my online identity

The news is that Twitter got hacked and up to 250k user accounts were compromised. I’m not a real user of Twitter, although I have an account, so I might be wrong, but in my opinion no one will feel extremely sad if some of his mental farts get deleted or changed. Content on Twitter, by the nature of a service that focuses on real-time updates, is just not important enough in the long run.

But… Twitter is also a leading identity authentication provider on the web. If my Twitter account was compromised, it means that for a while the hacker had access to all the sites on which I have registered with my Twitter account. It is hard to generalize how much cascading damage can follow from the hacker using my account, but it is not nice to even think about it. Twitter didn’t disclose the nature of the information to which the hacker got access, but I truly hope they don’t keep a log of the sites to which I authenticate myself using my Twitter account.

What about letting the sender know that his message ended up in the spam queue?

The really annoying thing about spam is not that we waste our bandwidth processing it, but false positives: messages which our anti-spam software decided were spam while they were totally legit. This hurts us both as receivers and as senders. We can never be sure that we haven’t missed a great business offer because it was marked as spam, or whether the message we sent asking for urgent help, to someone who should be inclined to help us, was ignored or was simply lost because it looked like spam.

In the email world you can at least ask for an automatic read receipt when the email is read. It is not a great indication, as it is impossible to know whether the email was marked as spam, or the recipient is impolite or just can’t be bothered to click the button that sends the receipt.

In the world of blog comments and contact pages we don’t even have that. You can’t even ask to be notified whether your comment or contact message got into the read queue instead of the spam queue.

In the email world it is impractical to let the sender know that the email was declared spam, because the sender address of spam is usually spoofed by the spammer. If you send the “sender” an automatic message telling him that his message was declared spam, you will bombard unsuspecting people, who don’t even know that you exist, with these messages.

Websites are in a better position, as the HTTP protocol forces you to send a reply, so why not send something like “sorry but my stupid and out of date anti spam software decided your comment is spam” when a comment is declared spam? Spam bots will probably ignore it, but legit commenters will know that they should not expect the comment to be published, and if they have to, they can try contacting the site owner by other means.

At the rate I get them, Akismet might as well delete all spammy comments

About 4 hours ago I deleted 90 comments which Akismet declared as spam, and now I have 5 more spammy comments. At this rate of spam there is no chance that I will be able to detect any false positive in the spam queue, so I wonder what exactly is the point of having a spam queue. And this is on a new blog which should rank low for any interesting search term. Am I just an anomaly, or do more popular blogs get even more spam?

This isn’t only a blog comments problem. Most of the time, if your mail gets into the spam queue in someone’s mailbox, the chances that he will notice it are very slim. I am lucky, since I don’t receive much mail in English and I can quickly scan my spam for Hebrew subject lines and find false positives.

Over the last years we were conditioned to trust our spam filters. Maybe it is time to take a step further and just configure them to delete anything that looks like spam, and save us the need to delete it manually.

Contact forms have to store the submitted information in the database, as sending it by e-mail is not reliable enough

I ran into a contact form problem on a client’s site. Sending a contact request via the form resulted in an error message displayed to the submitter, and no email was sent to the admin of the form.
It turned out that the admin email was misconfigured: a totally invalid address was used, and the software that handles the actual transmission first tried to validate the address and failed because it was not valid.

This can be looked upon as one more case of a dumb user failing to copy&paste his own email address, but once you start thinking about it you realize that the main problem is not whether the address was correct, but that the error was displayed to the submitter while the site admin didn’t know anything about it.

And even if everything was configured correctly, is the admin guaranteed to receive the e-mail? Not at all. Between problems in his own POP/IMAP server, spam filters that identify the mail as spam, a server upgrade that results in a misconfigured mailing component, and an SMTP server configuration change (a password, for example) about which the admin was not notified, the chances that right now someone is not getting a contact request that was submitted successfully are pretty high.

Yes, at least on the server side it is possible to log whatever errors you can when sending the mail fails, but who really bothers to read the error log if he doesn’t know that something is wrong? And how will you know, if even when a problem is detected it is displayed to the submitter instead of the admin?

Isn’t it much better to keep all the contact requests in the server’s DB, and notify the admin that he has new contact requests waiting for him? This way, even if the notification fails he will still be able to access the data, and as a bonus it will be accessible even when he can’t access his mailbox.
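
The “store first, notify second” idea could look roughly like this. It is a minimal Python sketch using SQLite, not the code of any existing contact form plugin. The table layout, the function save_contact_request and the notify callback (standing in for whatever actually sends the e-mail) are all assumptions made for the example.

```python
import sqlite3


def save_contact_request(conn, name, email, message, notify):
    """Store the request in the DB, then try to notify the admin.

    A failing notification never loses the request and never becomes an
    error shown to the submitter; it only leaves notified = 0 on the row.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contact_requests ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT, message TEXT, "
        "notified INTEGER NOT NULL DEFAULT 0)"
    )
    cursor = conn.execute(
        "INSERT INTO contact_requests (name, email, message) VALUES (?, ?, ?)",
        (name, email, message),
    )
    conn.commit()  # the data is safe from this point on
    request_id = cursor.lastrowid

    try:
        notify(request_id)  # e.g. send the admin an e-mail
    except Exception:
        return request_id  # leave notified = 0 so the admin screen can flag it

    conn.execute(
        "UPDATE contact_requests SET notified = 1 WHERE id = ?", (request_id,)
    )
    conn.commit()
    return request_id
```

An admin page can then list the rows with notified = 0, which is exactly the information that used to end up only in front of the submitter.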

It is probably not as simple for commercial organizations, in which the contact form submission is just the first step in a sale and it is convenient to track it in your mailbox, but hopefully organizations of this kind have better CRM solutions than just using email.