Creating an XML (Google) Sitemap for Umbraco using LINQ to XML
It's becoming more and more commonplace to add an XML Sitemap to your website. In a nutshell, a sitemap lets search engines easily discover every page in your site hierarchy so that they can crawl it. A lot of people think sitemaps are a Google thing, but they are actually a standard defined at http://www.sitemaps.org/. Google uses them, of course, but so do many other search engines.
So what does a sitemap look like? In essence it's a very simple XML file that contains a list of all the pages in your site. The sitemap protocol defines the structure of the XML.
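As a rough illustration, a minimal sitemap containing a single page might look like the following. The element names and namespace come from the sitemaps.org protocol; the URL and date are placeholder values. Only the loc element is required for each url entry; lastmod, changefreq and priority are optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2014-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```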
Creating a Sitemap for Umbraco CMS
In this post I want to concentrate on automatically generating a sitemap for an Umbraco site (Umbraco is a popular .NET CMS). There are a couple of Umbraco packages that do this, but I find my approach simpler and quicker to deploy: copy a single file into the root of your website and that is it, with no packages needed.
I also aim to show you how you can use the Umbraco PublishedContent API and LINQ to XML (introduced in .NET 3.5) to do this. The code is in C#, but should be easily convertible to other .NET languages, and is implemented as a standard generic handler (.ashx) file.
The basic principle is to create an XDocument and then iterate over every published node (using IPublishedContent) creating an XElement for each node (page). The handler then outputs the resulting XML document, setting the correct content-type so that the output is seen as XML by search engines.
Generic Handler Code (C#)
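What follows is a minimal sketch of the XML-building step only, not a drop-in handler. To keep it compilable without Umbraco, it uses a hypothetical PageNode class as a stand-in for IPublishedContent (modelling just a URL, an update date and child nodes), and the SitemapBuilder name is likewise an assumption made for illustration. It also assumes each node already exposes an absolute URL, which the sitemap protocol requires.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for Umbraco's IPublishedContent so the sketch
// compiles on its own. Only the members used below are modelled.
public class PageNode
{
    public string Url { get; set; }            // assumed to be an absolute URL
    public DateTime UpdateDate { get; set; }
    public List<PageNode> Children { get; } = new List<PageNode>();
}

public static class SitemapBuilder
{
    // XML namespace defined by the sitemap protocol (sitemaps.org).
    private static readonly XNamespace Ns =
        "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Depth-first walk: yields the node itself, then all of its descendants.
    private static IEnumerable<PageNode> SelfAndDescendants(PageNode node)
    {
        yield return node;
        foreach (var child in node.Children)
            foreach (var descendant in SelfAndDescendants(child))
                yield return descendant;
    }

    // Builds one <url> element per node and wraps them all in <urlset>.
    public static XDocument Build(IEnumerable<PageNode> roots)
    {
        var urls = roots
            .SelectMany(root => SelfAndDescendants(root))
            .Select(page => new XElement(Ns + "url",
                new XElement(Ns + "loc", page.Url),
                new XElement(Ns + "lastmod",
                    page.UpdateDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))));

        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement(Ns + "urlset", urls));
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder content tree: a home page with one child page.
        var home = new PageNode
        {
            Url = "http://www.example.com/",
            UpdateDate = new DateTime(2014, 1, 15)
        };
        home.Children.Add(new PageNode
        {
            Url = "http://www.example.com/about/",
            UpdateDate = new DateTime(2014, 2, 1)
        });

        // XDocument.ToString() omits the XML declaration; Save() writes it.
        Console.WriteLine(SitemapBuilder.Build(new[] { home }));
    }
}
```

In an actual .ashx handler, one way to emit the result would be to set context.Response.ContentType to "text/xml" inside ProcessRequest and then call Save(context.Response.Output) on the document. The walk itself would run over the site's published IPublishedContent nodes instead of the PageNode stand-in.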
You can then reference the handler in your robots.txt file like this:
User-Agent: *
Sitemap: http://www.example.com/sitemap.ashx