Using .htaccess file for 301 redirects and URL rewrites

When addressing duplicate content for SEO, the subject inevitably turns to 301 redirects and URL rewrites. But what do you do when a client owns multiple websites and decides to combine them into one domain, or to eliminate a domain altogether?  If content is left live on the old domain, it will duplicate the content on the new domain.  Google may decide the new domain is scraping content from the old one and apply the Panda filter to penalize the entire new domain.

The usual answer is… 301 redirect the old website to the new website.

But How Exactly?

Here’s the problem that always happens.  People end up redirecting the homepage of the old website to the homepage of the new website.  But what about all the interior pages?  There may be hundreds of backlinks to interior pages that get completely wasted as soon as the old website is deleted.  If you are hosting on a Microsoft IIS server, you may be limited unless you install an ISAPI filter or some other rewrite module. However, if you are hosting on an Apache server, the .htaccess file gives you an extremely flexible means of handling redirects.


There are a few ways to handle the redirects, depending on how the old domain maps to the new one:

  1. If the two sites are different, ideally you would redirect each interior page of the old site to a relevant interior page of the new site.  In the .htaccess file of the old site, you specify the old page (as a path relative to the site root) and the new page (as an absolute URL).  Here is the code to redirect one page to another individual page.
     Redirect 301 /old-page http://new-domain.com/new-page
  2. If you are just changing the domain name and keeping the same folder structure, you can redirect every page of the old domain to the same path on the new domain.  Here’s the .htaccess code for that.
     Redirect 301 / http://newdomain.com/
  3. If the sites are different and there is no way to identify relevant pages for a one-to-one comparison, or if comparing every page of each site is simply not feasible or cost effective, you can use a wildcard in the .htaccess file to redirect all the interior pages of the old site to the homepage of the new domain, plus a separate line to 301 redirect the old homepage to the new domain.  Here is the code for that.
     RewriteEngine On
     RewriteRule . http://www.newdomain.com/ [R=301,L,NC]
     Redirect 301 / http://www.newdomain.com/
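
Between options 1 and 3 there is a middle ground: map the handful of pages that have an obvious counterpart individually, and map a whole renamed folder with a single pattern. The sketch below assumes Apache's mod_alias is available; the paths /about-us, /company, /blog/ and /articles/ are made-up examples, not taken from any real site.

```apache
# Goes in the .htaccess file of the OLD site.
# A single page with a clear counterpart on the new site (hypothetical paths)
Redirect 301 /about-us http://www.newdomain.com/company

# A whole section that moved to a differently named folder (hypothetical).
# The (.*) group captures the rest of the path and $1 re-inserts it.
RedirectMatch 301 ^/blog/(.*)$ http://www.newdomain.com/articles/$1
```

With these lines, a request for /blog/2012/some-post on the old site would be sent to /articles/2012/some-post on the new one, so backlinks to that section keep pointing at relevant content.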

.htaccess Redirect Best Practices

301 Redirect non-www to www

Generally, any page of a website is accessible with or without “www” in the URL.  For example, http://website.com and http://www.website.com will serve the same page.  Any link to a non-www URL that a spider finds will lead that spider to crawl the entire website using non-www for every page.  This assumes you are using relative paths in your navigation, which most CMSs and web developers do.  The search engine thus crawls your site twice, doubling the task for the crawler.  Search engine spiders have a limited crawl budget for each website, and Google uses PageRank to determine how deep to crawl your website.  If you look at your crawl stats, you will see that your entire site does not get crawled each time the bot visits.  If the bot must crawl through two versions of your website, you have basically doubled its work.  Later, the search engine algorithm will analyze the content, identify duplicate pages and determine which ones take priority over the others.

But why make extra work for the search engine?

If you want to optimize your website for search engines, then cut out this extra work for the crawler and the algorithm.  Don’t let the crawler find the non-www version of your website.  A server-level URL rewrite that returns a 301 status code sends search engine crawlers (and humans) from any non-www URL to its www equivalent, so the non-www pages are never served. This is a long-standing SEO best practice for reducing duplicate content issues.

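RewriteEngine On
RewriteBase /
# Match the bare domain; the dot is escaped so it matches a literal "."
RewriteCond %{HTTP_HOST} ^yourdomain\.com [NC]
# Send the same path to the www host with a permanent (301) redirect
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [L,R=301]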
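
If you maintain several sites and would rather not hard-code the domain name, a variation on the same idea can reuse whatever host name was requested. This is only a sketch and assumes that every host served by this .htaccess file should gain a www prefix; a subdomain such as blog.yourdomain.com would be redirected to www.blog.yourdomain.com, which may not be what you want.

```apache
RewriteEngine On
# Skip requests that carry no Host header at all
RewriteCond %{HTTP_HOST} !^$
# Act only when the host does not already start with "www."
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Reuse the requested host name and path in the redirect target
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```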

.htaccess Language – Apache Server Configuration Directives

There is a lot more you can do with .htaccess code that will affect SEO.  The SEO .htaccess tips below, along with more in-depth background for creating your own Apache server directives, can be found at http://www.corz.org.  Many of these use regular expressions (regexes), which are explained very well at http://www.seomoz.org/learn-seo/redirection.

Escaping:

\char      Escape that particular char

For instance, to specify special characters such as [].()\ etc.
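
To see why escaping matters, compare an escaped and an unescaped dot in a rule. The sketch below uses made-up paths (about.html and /about/) purely for illustration.

```apache
RewriteEngine On
# Escaped dot: matches only the literal path "about.html"
RewriteRule ^about\.html$ /about/ [R=301,L]

# Without the backslash the dot means "any single character", so the
# pattern ^about.html$ would also match paths such as "about-html"
# or "aboutXhtml".
```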

Text:

.              Any single character (on its own, it matches any URI that is not empty)
[chars]        Character class: any one of the listed chars
[^chars]       Character class: any one char that is not listed
text1|text2    Alternative: text1 or text2 (i.e. “or”)

e.g. [^/] matches any character except /
(foo|bar)\.html matches foo.html and bar.html
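
Here is how these pieces might look inside real rules. This is a sketch only: /pages/, products/ and product.php are hypothetical names chosen for the example.

```apache
RewriteEngine On
# Alternation: serve foo.html and bar.html from /pages/foo.php and /pages/bar.php
RewriteRule ^(foo|bar)\.html$ /pages/$1.php [L]

# Negated class: [^/]+ matches exactly one path segment (no slashes),
# so products/red-shoes matches but products/red/shoes does not
RewriteRule ^products/([^/]+)$ /product.php?name=$1 [L]
```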

Quantifiers:

?      0 or 1 of the preceding text
*      0 or more of the preceding text  (greedy)
+      1 or more of the preceding text

e.g. (.+)\.html? matches foo.htm and foo.html
(foo)?bar\.html matches bar.html and foobar.html
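
As a sketch of quantifiers in a redirect, suppose (hypothetically) that a site has converted its static .htm and .html pages to .php files with the same names. One rule can cover both old extensions:

```apache
RewriteEngine On
# (.+) needs at least one character before the extension;
# "l?" makes the final "l" optional, so .htm and .html both match
RewriteRule ^(.+)\.html?$ /$1.php [R=301,L]
```

A request for services.htm or services.html would then be redirected to /services.php.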

Grouping:

(text)     Grouping of text

Used either to set the borders of an alternative or to create back references, where the nth group can be used in the target of a RewriteRule with $n

e.g.  ^(.*)\.html foo.php?bar=$1
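
A rule with two groups shows how the numbering works. In this sketch view.php and its section and page parameters are hypothetical; substitute whatever script your site actually uses.

```apache
RewriteEngine On
# $1 = first group (the folder), $2 = second group (the file name without .html)
RewriteRule ^([^/]+)/([^/]+)\.html$ /view.php?section=$1&page=$2 [L]
```

With this rule, a request for articles/intro.html is handled internally by /view.php?section=articles&page=intro.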

Anchors:

^      Start of line anchor
$      End   of line anchor

An anchor explicitly states that the character right next to it MUST be either the very first character (“^”) or the very last character (“$”) of the URI string being matched against the pattern, e.g.

^foo(.*)   matches foo and foobar but not eggfoo
(.*)l$   matches fool and cool, but not foo
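
Anchors matter most when a redirect should apply to exactly one URL. The following sketch uses the made-up paths old-page and /new-page.

```apache
RewriteEngine On
# Anchored at both ends: matches only the path "old-page"
RewriteRule ^old-page$ /new-page [R=301,L]

# Without anchors, the pattern old-page would also match paths such as
# "my-old-page" or "old-page-archive/2012", which is rarely intended.
```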
