It's becoming increasingly common to add an XML sitemap to your website. In a nutshell, a sitemap lets search engines easily discover every page in your site hierarchy and therefore crawl them. A lot of people think sitemaps are a Google thing, but they are actually a standard defined at http://www.sitemaps.org/. Google does use them, of course, but so do many other search engines.
So what does a sitemap look like? In essence it's a very simple XML file that contains a list of all the pages in your site. The sitemap protocol defines the structure of the XML.
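For illustration, a minimal sitemap with two pages might look something like this. The example.com URLs and the dates are placeholders I made up for the sample, not output from a real site:

```xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-01-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about.aspx</loc>
    <lastmod>2010-01-15</lastmod>
  </url>
</urlset>
```

Only the loc element is required for each url entry; lastmod, changefreq and priority are optional in the protocol.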
Creating a Sitemap for Umbraco CMS
In this post I want to concentrate on automatically generating a sitemap for an Umbraco site (Umbraco is a popular .NET CMS). There are a couple of Umbraco packages that do this, but I find my way simpler and quicker to deploy: simply copy the file into the root of your website and that's it. No packages needed.
I also aim to show how you can use the Umbraco API and LINQ to XML (introduced in .NET 3.5) to do this. The code is in C#, but should be easy to convert to other .NET languages, and is written as a standard generic handler (.ashx) file.
The basic principle is to create an XDocument and then iterate over every published node using the Umbraco nodeFactory, creating an XElement for each page. The handler then outputs the resulting XML document with the correct content type.
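To see the LINQ to XML part on its own, here is a rough sketch that builds the same kind of document from a hard-coded list of paths, with no Umbraco dependency. The class name SitemapSketch, the Build method, and the example.com URLs are hypothetical names chosen for this illustration; it is not the handler itself, just the XDocument/XElement pattern under those assumptions.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class SitemapSketch
{
    // Every element must be created in the sitemap namespace,
    // otherwise LINQ to XML emits empty xmlns="" attributes on children.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a sitemap document with one <url> entry per path (hypothetical helper).
    public static XDocument Build(string baseUrl, string[] paths, DateTime lastModifiedUtc)
    {
        XElement urlset = new XElement(Ns + "urlset");

        foreach (string path in paths)
        {
            urlset.Add(new XElement(Ns + "url",
                new XElement(Ns + "loc", baseUrl + path),
                // Date-only W3C format, formatted culture-independently
                new XElement(Ns + "lastmod",
                    lastModifiedUtc.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))));
        }

        return new XDocument(new XDeclaration("1.0", "utf-8", "yes"), urlset);
    }

    public static void Main()
    {
        XDocument doc = Build(
            "http://www.example.com",
            new[] { "/", "/about.aspx" },
            new DateTime(2010, 1, 1, 0, 0, 0, DateTimeKind.Utc));

        // XDocument.ToString() omits the XML declaration, so write it separately
        Console.WriteLine(doc.Declaration);
        Console.WriteLine(doc);
    }
}
```

The one design point worth noting is the namespace: adding Ns to every element name is what keeps the whole tree inside the sitemap namespace, which is why the handler below does the same thing with its xmlns field.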
The C# Code for the Handler
<%@ WebHandler Language="C#" Class="SiteMap" %>

using System;
using System.Web;
using System.Xml.Linq;
using umbraco.presentation.nodeFactory;

public class SiteMap : IHttpHandler
{
    /* Generates an XML Sitemap for Umbraco using LinqToXml */
    private static readonly XNamespace xmlns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public void ProcessRequest(HttpContext context)
    {
        // Set correct headers for XML
        context.Response.ContentType = "text/xml";
        context.Response.Charset = "utf-8";

        // Get the absolute base URL for this website (port is included only if non-default)
        Uri url = context.Request.Url;
        string baseUrl = String.Format("{0}://{1}{2}", url.Scheme, url.Host, url.IsDefaultPort ? "" : ":" + url.Port);

        // Create a new XDocument and the namespaced root element
        XDocument doc = new XDocument(new XDeclaration("1.0", "utf-8", "yes"));
        XElement urlset = new XElement(xmlns + "urlset");

        // Get the root node (-1 is the parent of all top-level content nodes)
        Node root = new Node(-1);

        // Iterate all nodes in site and add them to document
        RecurseNodes(urlset, root, baseUrl);
        doc.Add(urlset);

        // Write XML document to response stream
        // (XDocument.ToString() omits the declaration, so write it first)
        context.Response.Write(doc.Declaration + "\n");
        context.Response.Write(doc.ToString());
    }

    // Method to recurse all nodes and create each element
    private static void RecurseNodes(XElement urlset, Node node, string baseUrl)
    {
        foreach (Node n in node.Children)
        {
            // If the document has a property called "hidePage" set to true then ignore this node
            if (n.GetProperty("hidePage") == null || n.GetProperty("hidePage").Value != "1")
            {
                string url = umbraco.library.NiceUrl(n.Id);

                // Tidy up home page so it's more canonical
                if (url.EndsWith("/home.aspx"))
                    url = url.Replace("/home.aspx", "/");

                // Create the XML node
                XElement urlNode = new XElement(xmlns + "url",
                    new XElement(xmlns + "loc", baseUrl + url),
                    new XElement(xmlns + "lastmod", n.UpdateDate.ToUniversalTime()));
                urlset.Add(urlNode);
            }

            // Check if the current node has any child nodes and, if it has, recurse them
            if (n.Children != null && n.Children.Count > 0)
                RecurseNodes(urlset, n, baseUrl);
        }
    }

    public bool IsReusable
    {
        get
        {
            return false;
        }
    }
}
Thanks, looking forward to reading more LINQ to Umbraco posts! As I understand it, this will create a URL like www.mysite.com/sitemap.ashx, and I can then tell Google that my sitemap is located at this address? Thanks.