Sunday, August 27, 2006

Extend the canonicalization feature in Google Sitemaps

Canonicalization, for those readers who don't know, refers to choosing which URL is the definitive URL of a site, the one that the site most wants to be known by. For example, some websites may link to your home page address without the www. Inevitably this happens, even though you really would prefer they link to it with the www.

It is very nice that Google Sitemaps (now renamed Google Webmaster Tools) has a feature that lets you specify which version of your site you want indexed: the one with the www or the one without. All inbound links to either version should then aggregate to the version that you specify through Sitemaps.

However, this doesn't go nearly far enough, because many sites own multiple domain names, for example typo versions of their brand names, registered to protect them from cybersquatters or to bring in visitors who don't know how to spell. And there are plenty of blogs hosted under domains like typepad.com and blogs.com whose bloggers have also signed up for TypePad's Pro service and have the blog under their own domain name too. For example, divamarketingblog.com and bloombergmarketing.blogs.com are the same blog, but the blogger (Toby Bloomberg) had to redirect www.divamarketingblog.com to bloombergmarketing.blogs.com to resolve the canonicalization issue, even though the redirect should have gone the other way.

Unfortunately, the folks at TypePad (who control blogs.com) do not allow their bloggers to issue a 301 or 302 redirect from blogs.com or from typepad.com. Therefore the only way to properly solve this canonicalization issue for bloggers on TypePad is for Google to extend the Google Sitemaps canonicalization feature to allow for other URLs too, not just the www or no-www switch. So in the case of Diva Marketing Blog, Toby should be allowed to specify that her preferred canonical URL is www.divamarketingblog.com, not bloombergmarketing.blogs.com.
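To make the idea concrete, here is a rough Python sketch of what such an extended preference might amount to: a table that maps every hostname a blog answers to onto the one the owner prefers. The table name, the function name, and the idea that the mapping would look like this are my own assumptions for illustration; this is not how Google Sitemaps actually works internally.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table (an assumption for illustration): each hostname
# the blog can be reached at, mapped to the owner's preferred hostname.
CANONICAL_HOST = {
    "divamarketingblog.com": "www.divamarketingblog.com",
    "bloombergmarketing.blogs.com": "www.divamarketingblog.com",
}

def canonicalize(url):
    """Return the URL rewritten onto its preferred hostname, if one is listed."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    preferred = CANONICAL_HOST.get(host, host)
    # Keep an explicit port if the original URL had one.
    netloc = preferred if parts.port is None else f"{preferred}:{parts.port}"
    # Treat an empty path as "/" so both spellings of the home page match.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))

if __name__ == "__main__":
    for link in (
        "http://bloombergmarketing.blogs.com/diva/",
        "http://divamarketingblog.com",
        "http://www.divamarketingblog.com/diva/",
    ):
        print(link, "->", canonicalize(link))
```

With a mapping like this, links pointing at any of the three hostnames would be counted toward the same page on www.divamarketingblog.com, which is the aggregation the www or no-www switch already provides for a single domain.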

Think of the huge number of TypePad blogs out there whose authors have signed up for the Pro service and have their own domain name associated with the blog. That adds up to a heck of a lot of duplicate pages in the Google index that could be eliminated with this Sitemaps feature.
