Indexing uCommerce Products in Umbraco with Lucene.NET


This blog post deals with indexing and searching products in uCommerce using Lucene.NET (a .NET port of the Java Lucene text search engine library). This first post covers how to create an index using Lucene; Part Two will follow up with how to use that index for searching.

Introduction to uCommerce and Database-driven Searching

uCommerce is a popular e-commerce platform built upon the Umbraco CMS. It is a powerful and flexible platform for building online shops, but (rather like Umbraco itself) it is more of a framework than an "out-of-the-box" product. In other words, it comes with all the functionality you need to develop your shopping site, but leaves it up to you how to "assemble" the parts. Initially uCommerce was built around XSLT templating, but the latest version has been built with Razor templating in mind, too. To get started using Razor, it is well worth checking out the uCommerce Razor store.

However, one of the weaknesses of uCommerce is the search. Because uCommerce is database driven (and built on top of the NHibernate ORM), it is assumed you will use the API to perform any product searches. This works OK for basic single-keyword searches, as you can formulate queries fairly easily. For example, a simple search might look like:

Example

var keyword = HttpContext.Current.Request.QueryString["search"];

if (!string.IsNullOrWhiteSpace(keyword))
{
    var products = Product.Find(p =>
        p.VariantSku == null
        && p.DisplayOnSite
        && (
            p.Sku.Contains(keyword)
            || p.Name.Contains(keyword)
            || p.ProductDescriptions.Any(d =>
                d.DisplayName.Contains(keyword)
                || d.ShortDescription.Contains(keyword)
                || d.LongDescription.Contains(keyword))
        ));
}

However, you are limited to basic keyword matches where you are looking for one string inside another using Contains(). This works fine if you type in a single keyword, but will probably fail to return anything if you enter an entire phrase (since that phrase has to be matched in its entirety in one of your fields). To put it simply, potential customers will expect a lot more than this.
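To make the limitation concrete, here is a minimal sketch using plain in-memory strings rather than the uCommerce API. The product description is made up purely for illustration, and the word-splitting workaround at the end is my own assumption about what you might try next, not something uCommerce provides.

```csharp
using System;
using System.Linq;

public static class ContainsDemo
{
    public static void Main()
    {
        // Hypothetical product description, purely for illustration
        const string description = "Waterproof hiking boots in brown leather";

        // A single keyword is found as a substring
        Console.WriteLine(description.Contains("boots"));        // True

        // A phrase only matches if it appears verbatim, in that exact order
        Console.WriteLine(description.Contains("brown boots"));  // False

        // Checking each word separately does find the product...
        var words = "brown boots".Split(' ');
        Console.WriteLine(words.All(w => description.Contains(w)));  // True
    }
}
```

Splitting the phrase into words gets you a little further, but you would still have to build the query by hand for every field, and you get no relevance ranking at all. That is the gap a text search engine fills.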

A Better Way of Searching - Using the Lucene Text Search Engine

Lucene is "a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform." Lucene.NET is the .NET port of this library. If you are familiar with Umbraco you will have heard of Examine. Well, Examine is just Umbraco's implementation of Lucene (used for searching the backend). So the good news is that if you have installed Umbraco, you have also installed Lucene.NET.

This means that we can easily use the power of Lucene to create a searchable index of products in our uCommerce database. The advantage of this is that Lucene supports much more powerful queries and is also lightning fast, too.

Creating a Product Indexer

It is actually very simple to create an index of all your products using Lucene. There are 4 basic steps:

  1. Identify and create a folder in which to store your index files
  2. Use the uCommerce API to create a query to return the products you want to index
  3. For every product, create a Lucene Document containing the fields you wish to store in your index
  4. Insert these documents into your index and write it to your index folder

There are many ways you can create an index - such as in an ASP.NET user control, a C# class library or a Razor macroscript. However, the way I'm going to do it is using a generic ASP.NET handler file (ASHX file). The advantage of a handler is that it is lightweight and can easily be called by accessing a URL in your site. The code I'll show below is just a starting point, of course - but it should be enough to get you started.

Show Me teh Codes!!!

OK, enough waffle - just show me some code for how to do this! Fair enough, a working example is the easiest way to learn (this is based on uCommerce v3). To create a generic handler (.ashx) file you can use Visual Studio - choose "Add New Item" and then select "Generic Handler" from the list. You can place this file anywhere within your website, but I've called mine "IndexProducts" which will give you a file called IndexProducts.ashx. In the code-behind you can then replace the example code with the C# code below:

Code Example

<%@ WebHandler Language="C#" Class="IndexProducts" %>

using System;
using System.Web;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using UCommerce.EntitiesV2;
using UCommerce.Extensions;
using umbraco.cms.businesslogic.web;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;

/// <summary>
/// Handler to create Lucene indexes for uCommerce products
/// </summary>
public class IndexProducts : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";

        // set script timeout to 600 seconds in case indexing takes a while
        context.Server.ScriptTimeout = 600;

        // define the folder where indexes will be created
        const string indexPath = "~/App_Data/TEMP/ExamineIndexes/ProductsIndex";
        
        CreateLuceneIndex(indexPath, context);
    }

    private void CreateLuceneIndex(string basePath, HttpContext context)
    {
        // purely used for diagnostics
        var stopwatch = new System.Diagnostics.Stopwatch();
        
        /* get the absolute path to the directory where the indexes will be created (and if it doesn't exist, create it) */

        string dirPath = context.Server.MapPath(basePath);

        if (!Directory.Exists(dirPath))
        {
            Directory.CreateDirectory(dirPath);
        }

        DirectoryInfo di = new DirectoryInfo(dirPath);
        Lucene.Net.Store.FSDirectory directory = Lucene.Net.Store.FSDirectory.Open(di);
        
        /* Select the standard Lucene analyser */
        
        var analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        
        stopwatch.Start();
        int count = 0;
        
        /* Open the index writer using the selected analyser */

        using (Lucene.Net.Index.IndexWriter writer = new Lucene.Net.Index.IndexWriter(directory, analyzer, true, Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Get all the visible products from uCommerce we wish to index
            var products = Product.All().Where(p => p.DisplayOnSite);

            // Loop through the products
            foreach(var product in products)
            {
                /* For every product, we create a new document and add the fields we want to index to it */
                
                var doc = new Lucene.Net.Documents.Document();
                
                var url = UCommerce.Api.CatalogLibrary.GetNiceUrlForProduct(product);
                
                /* Note: the field "ManufacturerCode" is an example custom field which you probably won't have - so remove */
                
                doc.Add(new Lucene.Net.Documents.Field("ID", product.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));
                doc.Add(new Lucene.Net.Documents.Field("Url", url, Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));
                doc.Add(new Field("Sku", product.Sku, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
                doc.Add(new Field("DisplayName", product.DisplayName() ?? product.Name, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
                doc.Add(new Field("Description", product.LongDescription() ?? "", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
                doc.Add(new Field("ManufacturerCode", product.GetPropertyValue("ManufacturerCode"), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
                writer.AddDocument(doc);
                count++;
            }

            /* We optimise the index and close the writer */
            
            writer.Optimize();
            writer.Close();
        }
        
        stopwatch.Stop();

        context.Response.Write(String.Format("Indexed {0} products in {1}.\n\n", count, stopwatch.Elapsed.ToString()));
    }

    public bool IsReusable
    {
        get
        {
            return false;
        }
    }

}
    

How it Works

Note: This is an old post and you'd use a SurfaceController or similar in more recent versions of Umbraco MVC to perform the indexing.

The main part of the script is contained in the foreach loop that iterates over the products fetched by the uCommerce API query. In this loop we create a new document and then add the fields we want to index to it. In my example I include a custom field called ManufacturerCode - this is just an example of how to access a custom property, and will probably need changing depending on what custom fields you may have. You can store as many (or as few) fields as you like, though the more text you store the bigger your index will be.

Each value is stored as a Field in Lucene - check the docs for more detail on what the different options mean. Generally you will want to store the value in the field as analysed and searchable.
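As a rough guide to those options, here is a small sketch of the combinations you are most likely to need. It assumes the Lucene.NET 2.9/3.0-era API that ships with Umbraco (later Lucene.NET versions changed the Field classes), it needs a reference to the Lucene.Net assembly to compile, and the class name and product values are hypothetical.

```csharp
using Lucene.Net.Documents;

public static class FieldOptionsSketch
{
    // Builds a document for a made-up product to show the common option combinations
    public static Document BuildExampleDocument()
    {
        var doc = new Document();

        // Stored and indexed as one exact token: suits IDs and URLs,
        // which you want to retrieve or match exactly, not as free text
        doc.Add(new Field("ID", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Stored and run through the analyzer: suits text that users search by word
        doc.Add(new Field("DisplayName", "Brown leather hiking boots", Field.Store.YES, Field.Index.ANALYZED));

        // Analyzed but not stored: still searchable, but the original text
        // cannot be read back from the index, which keeps the index smaller
        doc.Add(new Field("Description", "Waterproof boots for long walks", Field.Store.NO, Field.Index.ANALYZED));

        return doc;
    }
}
```

In short, Field.Store controls whether you can read the value back out of a search result, while Field.Index controls whether (and how) the value can be searched.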

How to Run the Script

An .ashx file is just like an .aspx page and can be run from a browser. So simply navigate to your script in a browser to run it. If all goes well you will see a message similar to the one below:

Indexed 1015 products in 00:00:01.6257507.

As you will notice, Lucene is very fast!

Scheduled Indexing

In reality you won't want to index your products manually by navigating to a script in your browser. What you will probably want is a scheduled task to trigger indexing. Luckily it's easy to wire this up using Umbraco's own Task Scheduler. Just open the umbracoSettings.config file in /config/ and navigate to the scheduledTasks element. Here you can add your own task that is called on a regular schedule (in my example every hour - 3600 seconds). Just enter the full URL to your .ashx file, something like:

<scheduledTasks>
   
<!-- add tasks that should be called with an interval (seconds) -->
   
<task log="true" alias="productsLuceneIndex" interval="3600" url="http://localhost/IndexProducts.ashx"/>
</scheduledTasks>

And there you have it, your products will now be indexed on a regular basis!

Remember to tune in for Part Two where I show you how to search your sparkly new products index...


1 Comment


Hamaramall

How much i got to know about Lucene.Net is a line-by-line port of popular Apache Lucene, which is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search. Thanks to you for providing database for u commerce using lucene.net by code XSLT template.This code has help to implement me a new code research related to XML documents. And this online code docs have help me to get through implementing new structures codes.
