Canonicalization can be a complex area for webmasters, and Internet marketers who design their own websites can be bewildered when they encounter it for the first time. Let’s take a look at what canonicalization is and how you can use it to solve URL problems.
Canonicalization is the process of standardizing URLs. A single page can often be reached through several URL variations, which makes it hard for search engines to determine which version you are actually using. For instance, Google may treat the following URLs as different pages when in fact all of them lead to the same page.
http://www.yourwebsitename.com
http://yourwebsitename.com
http://www.yourwebsitename.com/index.html
http://yourwebsitename.com/index.html
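To make the idea of standardizing URLs concrete, here is a minimal Python sketch that maps the four variants above onto a single form. The preferred host, the list of index file names and the function name canonicalize are assumptions chosen for illustration; this is not how Google or any search engine actually normalizes URLs.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the preferred host and the index file names.
PREFERRED_HOST = "www.yourwebsitename.com"
INDEX_FILES = {"index.html", "index.htm", "index.php"}


def canonicalize(url):
    """Map the known variants of a URL onto one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host == PREFERRED_HOST[len("www."):]:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Drop a trailing index file so /index.html becomes /.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://www.yourwebsitename.com",
    "http://yourwebsitename.com",
    "http://www.yourwebsitename.com/index.html",
    "http://yourwebsitename.com/index.html",
]
canonical_forms = {canonicalize(u) for u in variants}
print(canonical_forms)  # a set with one entry: all four variants collapse
```

All four variants end up as the same string, which is the outcome you want a search engine to reach as well.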
Here are two reasons why these URL variations can arise:
(1) Inconsistent Internal Linking Pattern
This happens when you don’t maintain a consistent internal linking pattern within your own site. If different versions of the URL link to your homepage, as in the example above, you can run into the canonicalization issue. Always decide beforehand which URL to use and be consistent. Also, use an absolute link instead of a relative link for your homepage. An absolute link uses the full URL, including the domain name. If your homepage link is called “Internet Business Home”, hyperlink it directly to http://www.yourwebsitename.com instead of index.html, which is a relative link.
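If you want to audit a page for inconsistent homepage links, a rough Python sketch could look like the following. The set of variants to flag, the class name HomeLinkChecker and the sample HTML are all hypothetical; adapt them to the URL you decided on.

```python
from html.parser import HTMLParser

# Hypothetical variants that reach the homepage but are not the preferred
# http://www.yourwebsitename.com form.
HOME_VARIANTS = {
    "http://yourwebsitename.com",
    "http://www.yourwebsitename.com/index.html",
    "http://yourwebsitename.com/index.html",
    "index.html",
}


class HomeLinkChecker(HTMLParser):
    """Collect href values that use a non-preferred homepage URL."""

    def __init__(self):
        super().__init__()
        self.inconsistent = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore a trailing slash when comparing against the variants.
        if href and href.rstrip("/") in HOME_VARIANTS:
            self.inconsistent.append(href)


# Made-up page fragment for the demonstration.
sample_page = """
<a href="http://www.yourwebsitename.com">Internet Business Home</a>
<a href="index.html">Home</a>
<a href="http://yourwebsitename.com/">Home</a>
"""
checker = HomeLinkChecker()
checker.feed(sample_page)
print(checker.inconsistent)
```

The first link uses the preferred absolute URL and is left alone; the other two are reported so you can fix them.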
(2) Link Building With Different URLs To Same Domain
Sometimes, even though your internal linking pattern is consistent, you can still run into the canonicalization issue, for instance by building inbound links to your domain using different URL versions. Google can get confused and treat these different URLs as separate pages even though they all lead to the same site. Sometimes this is beyond your control, as you can’t dictate how other people link to you.
Pitfalls of Using Different URLs To Same Domain
Linking inconsistently, whether on-site or off-site, can have a negative impact on search engine rankings, PageRank and link juice distribution. Consider this scenario. If you’re building links to http://www.yourwebsitename.com and also to http://yourwebsitename.com or http://www.yourwebsitename.com/index.html, thinking they’re the same thing, you’re wasting part of your effort. You’re in fact diluting the link popularity of your main URL, which is http://www.yourwebsitename.com.
You won’t get maximum PageRank and link juice either. If you get 30 links for each of the three URLs, don’t assume you have 90 links to http://www.yourwebsitename.com. It’s only 30, because the URL variations have split your link building efforts. So don’t expect the boost in rankings for http://www.yourwebsitename.com that 90 inbound links would bring.
Another downside is that your main domain can lose PageRank partially or totally. This is what I encountered myself. I had a domain with PageRank 3 and, out of nowhere, it dropped to PageRank 0. I knew it still had solid backlinks and I hadn’t done anything to get it penalized. I was perplexed.
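The arithmetic of the 30-versus-90 example can be shown with a toy Python tally. The backlink list is made up to match the scenario, and counting by exact URL string is only a simplified model of the point being made, not a description of how any search engine really credits links.

```python
from collections import Counter

# Hypothetical backlink data: 30 inbound links per URL variant.
inbound_links = (
    ["http://www.yourwebsitename.com"] * 30
    + ["http://yourwebsitename.com"] * 30
    + ["http://www.yourwebsitename.com/index.html"] * 30
)

# Without canonicalization, each distinct URL string is credited separately.
links_per_url = Counter(inbound_links)
main_url_links = links_per_url["http://www.yourwebsitename.com"]
total_links = sum(links_per_url.values())

print(main_url_links)  # 30
print(total_links)     # 90
```

You built 90 links in total, but the URL you actually care about is credited with only 30 of them.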
How Did I Know I Was Affected?
The first thing I did was to check whether my domain was still indexed. After a check on Google with the site command “site:www.yourwebsitename.com”, I found my domain still listed, but without the “www” in front: just yourwebsitename.com.
I found that weird because it had always been indexed with “www”. I checked my other indexed pages and all of them had “www”. I researched the issue thoroughly and found that I had a canonicalization problem. My internal linking was consistent and I hadn’t built inbound links to my domain using different URL versions, so the cause had to be something else. I suspect a few links with a different URL version, which I didn’t build, pointed to my site, and Google mistook http://yourwebsitename.com for my main domain instead of http://www.yourwebsitename.com, resulting in my PageRank being dropped.
What Did I Do To Solve The Problem?
I think that if I hadn’t found a solution in time, http://yourwebsitename.com would have received the PR3 instead in the next update. Fortunately, I found a quick solution after some research. This is what I did.
I set up a 301 Redirect from “non-www” to “www” to tell Google that http://yourwebsitename.com and http://www.yourwebsitename.com are the same website. This 301 Redirect forces resolution to only one URL. Here is what the 301 Redirect looks like. It’s a short piece of server configuration code.
============================================
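# Many hosts require FollowSymLinks for rewrite rules in .htaccess
Options +FollowSymLinks
RewriteEngine On
# Match requests whose host is the bare (non-www) domain, ignoring case
RewriteCond %{HTTP_HOST} ^yourwebsitename\.com$ [NC]
# Permanently (301) redirect to the same path on the www host
RewriteRule ^(.*)$ http://www.yourwebsitename.com/$1 [R=301,L]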
============================================
You need to put this code in the .htaccess file on your server so that the server issues the 301 Redirect, which Google then follows. The rules rely on Apache’s mod_rewrite module. Your .htaccess file is located in the same folder as your index file. Just download it and open it in a text editor like Notepad or Metapad. Next, add the 301 Redirect code and save. Now, just upload .htaccess to your server. That’s what I did and, after some time, my domain was indexed again, this time with “www”. I also regained my initial PageRank.
URL canonicalization is a pretty technical issue, and many people get confused about what’s happening to their website or blog. I hope this article has shed some light on the subject.
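If the rewrite rules look opaque, the following Python sketch mimics what the condition and the rule do to an incoming request. It is only a simulation for reasoning about the logic: the function name simulate_redirect is made up, query strings are ignored, and the real work is done by the web server, not by Python.

```python
import re

# Python stand-ins for the two patterns in the .htaccess snippet.
HOST_PATTERN = re.compile(r"^yourwebsitename\.com$", re.IGNORECASE)  # [NC]
PATH_PATTERN = re.compile(r"^(.*)$")


def simulate_redirect(host, path):
    """Return (status, location) if the rule would fire, else None.

    `path` is the request path without its leading slash, which is how
    the rule sees it inside a per-directory .htaccess file.
    """
    if not HOST_PATTERN.match(host):
        return None  # condition not met, e.g. already on the www host
    match = PATH_PATTERN.match(path)
    return 301, "http://www.yourwebsitename.com/" + match.group(1)


print(simulate_redirect("yourwebsitename.com", "index.html"))
print(simulate_redirect("www.yourwebsitename.com", "index.html"))
```

A request to the bare domain is answered with a 301 pointing at the same path on the “www” host, while a request that already uses “www” is left untouched, so there is no redirect loop.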
This is a guest post by Jean Lam.
Image Credit: bull3t