Robots.txt for a WordPress Site

When you install WordPress (WP), it comes with an autogenerated robots.txt file that is rather incomplete.

From experience, I find the following to be the most appropriate robots.txt file (at a minimum) for a WP site installed in the root folder:

User-agent: *
Disallow: /feed
Disallow: /*/feed
Disallow: /xmlrpc
Disallow: /wp-
Disallow: /?p=
Disallow: /*trackback
Allow: /wp-content/uploads/
Allow: /wp-includes/js/
Allow: /wp-includes/css/
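To sanity-check what these rules actually block, here is a minimal Python sketch of the matching logic described by Google and by RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. That precedence is what lets the Allow lines re-open parts of /wp- even though they come after the Disallow. The names `RULES`, `pattern_to_regex` and `is_allowed` are illustrative, and the sketch ignores user-agent groups and percent-encoding. Python's own urllib.robotparser is not a good fit for this check, since it applies the first matching rule in file order and does not support `*` wildcards inside paths.

```python
import re
from typing import List, Tuple

# The root-install rules above as (directive, path pattern) pairs.
RULES: List[Tuple[str, str]] = [
    ("disallow", "/feed"),
    ("disallow", "/*/feed"),
    ("disallow", "/xmlrpc"),
    ("disallow", "/wp-"),
    ("disallow", "/?p="),
    ("disallow", "/*trackback"),
    ("allow", "/wp-content/uploads/"),
    ("allow", "/wp-includes/js/"),
    ("allow", "/wp-includes/css/"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a robots.txt path pattern into a regex matched from the start of the path.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]] = RULES) -> bool:
    """Return True if crawling `path` (path plus any query string) is allowed.

    The longest matching pattern wins, Allow wins a tie, and no match means allowed.
    """
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


# Usage: print a verdict for a few typical WP paths.
for url_path in ["/my-post/", "/wp-admin/", "/wp-content/uploads/photo.jpg", "/my-post/feed/"]:
    print(url_path, "allowed" if is_allowed(url_path) else "blocked")
```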


If WP is installed in a subfolder (for example /blog/), the site still has only one robots.txt file, in the domain root, so that file must carry WP-specific directives in addition to those for the rest of the site.

Under the general user-agent line (User-agent: *), add the following directives alongside those that apply to the rest of the site:


Disallow: /blog/feed
Disallow: /blog/*/feed
Disallow: /blog/xmlrpc
Disallow: /blog/wp-
Disallow: /blog/?p=
Disallow: /blog/*trackback
Allow: /blog/wp-content/uploads/
Allow: /blog/wp-includes/js/
Allow: /blog/wp-includes/css/
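Because the subfolder rules are simply the root rules with the folder name prepended, they can be generated instead of typed by hand, which avoids a missed /blog/ prefix. A minimal Python sketch; `BASE_RULES` and `prefix_rules` are illustrative names, and the folder is assumed to be a non-empty path such as /blog/.

```python
from typing import List

# Root-install rules from the first example, one directive per line.
BASE_RULES: List[str] = [
    "Disallow: /feed",
    "Disallow: /*/feed",
    "Disallow: /xmlrpc",
    "Disallow: /wp-",
    "Disallow: /?p=",
    "Disallow: /*trackback",
    "Allow: /wp-content/uploads/",
    "Allow: /wp-includes/js/",
    "Allow: /wp-includes/css/",
]


def prefix_rules(rules: List[str], folder: str) -> List[str]:
    """Prepend a subfolder ("/blog/" or "blog") to every rule's path.

    Assumes a non-empty folder; an empty one would produce paths like "//feed".
    """
    folder = "/" + folder.strip("/")           # normalise to "/blog"
    prefixed = []
    for rule in rules:
        directive, path = rule.split(":", 1)   # e.g. "Disallow", " /feed"
        prefixed.append(f"{directive}: {folder}{path.strip()}")
    return prefixed


# Directives for the rest of the site would also go under this User-agent line.
print("\n".join(["User-agent: *"] + prefix_rules(BASE_RULES, "/blog/")))
```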


In addition, tag, category, archive and author pages require a robots "noindex" meta tag: at best they are lists of links to posts, and at worst they are duplicate content when all or part of each post is listed as well.

This is easily achieved with the All in One SEO Pack plugin (or a similar one), configured to add the robots "noindex" meta tag to those types of URLs.
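After configuring the plugin, it is worth confirming that the tag really appears in the rendered HTML of, say, a tag or category page. A minimal Python sketch using the standard library's html.parser; `RobotsMetaFinder` and `has_noindex` are illustrative names, and fetching the page is left out so the check works on any saved page source.

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content="noindex,follow" -> ["noindex", "follow"]
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """True if any robots meta tag in the page contains the noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```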

Also configure the sitemap generator to exclude those types of URLs.
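If the sitemap generator cannot exclude these URLs itself, or you want to double-check its output, the list can be filtered by path. A minimal Python sketch, assuming WordPress's default archive bases (/tag/, /category/, /author/) and date archives such as /2011/05/; sites with custom bases need different patterns. `filter_sitemap_urls` is an illustrative name, and its `keep_categories` flag covers the rare case where category pages carry real stand-alone content.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

# Assumed default WordPress archive paths; change these if your site uses custom bases.
CATEGORY_ARCHIVE = re.compile(r"^/category/")
OTHER_ARCHIVES = [
    re.compile(r"^/tag/"),
    re.compile(r"^/author/"),
    # Year, month or day archives such as /2011/ or /2011/05/, optionally paginated.
    # Posts under date-based permalinks (/2011/05/my-post/) do not match.
    re.compile(r"^/\d{4}(/\d{2}){0,2}(/page/\d+)?/?$"),
]


def filter_sitemap_urls(urls: Iterable[str], keep_categories: bool = False) -> List[str]:
    """Drop tag, author, date (and, by default, category) archive URLs."""
    patterns = OTHER_ARCHIVES if keep_categories else OTHER_ARCHIVES + [CATEGORY_ARCHIVE]
    kept = []
    for url in urls:
        path = urlsplit(url).path or "/"
        if not any(pattern.match(path) for pattern in patterns):
            kept.append(url)
    return kept
```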

Note: In rare cases, WP webmasters actually provide good stand-alone content on each category page. In that case, don't add a robots "noindex" meta tag to category pages.

Other things to watch out for when using WordPress:
  1. don't forget to change the Privacy setting (under Settings) to allow search engines to crawl the site.
  2. ensure the WP site is on the same canonical domain as the rest of the site (all www or all non-www). For the WP site, this is managed under Settings.
  3. ensure the server returns the correct response for a non-existent URL: a 404 status, with or without a custom error page.
  4. decide on your permalink structure from the start, before you allow the site to be indexed. It's much harder to change later, and changing it requires 301 redirects from the old URLs to the new ones.
  5. either disable comments or moderate them seriously.
  6. keep software updated to the latest version.
  7. keep the number of both tags and categories small; don't pepper a page or post with umpteen tags and category labels.
  8. don't use tag or category clouds; they do look spammy.
  9. disable feed generation if you can; feeds are the main vehicle for getting your site content scraped by others.
  10. modify the theme to get rid of the login URL.
  11. watch out for footer or other site-wide links when using any theme, especially one you got from a source other than WordPress itself. Ensure they all use rel="nofollow".
  12. avoid having a blogroll, or if you do have one, use a plugin to add rel="nofollow" to its links (on every page, except possibly the homepage). Of course, exercise common sense when keeping links in a blogroll: they should not be link-exchange or paid links, and they should all be relevant to your site.
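Items 2, 3 and 4 come down to the status codes the server sends, which are easy to check without a browser silently following redirects. A minimal Python sketch using the standard library's http.client, which does not follow redirects; `head_status`, `missing_page_returns_404` and `permanently_redirects` are illustrative names. It sends HEAD requests, and since a few servers answer HEAD differently from GET, switch the method if the results look odd.

```python
import http.client
import uuid
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_status(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request (no redirect following); return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def missing_page_returns_404(site: str) -> bool:
    """Item 3: a random, surely non-existent URL should answer 404, not 200 or a redirect."""
    probe = f"{site.rstrip('/')}/no-such-page-{uuid.uuid4().hex}"
    status, _ = head_status(probe)
    return status == 404


def permanently_redirects(url: str, expected_target: str) -> bool:
    """Items 2 and 4: a non-canonical or old URL should 301 straight to the expected URL."""
    status, location = head_status(url)
    return status == 301 and location == expected_target
```

For example, with a hypothetical www-canonical site, `missing_page_returns_404("https://www.example.com")` and `permanently_redirects("http://example.com/", "https://www.example.com/")` should both return True once the server is configured correctly.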

