Part Two: How to index products in uCommerce using Lucene .NET
This blog post is a follow-up to a previous post explaining how to index products in uCommerce using Lucene .NET. If you haven't read that post, please do so first.
Introduction to Searching with Lucene
In my previous post I showed how you can easily index uCommerce products (in Umbraco) using the Lucene text search engine. But indexing products is only half the story - you also need to be able to search them! Luckily, searching a Lucene index is fairly straightforward once you get past some of the jargon - though more advanced searches do require an understanding of the Lucene query language.
In this post I'll show you a simple example of an Umbraco Razor macroscript (.cshtml) that can be used to search your index and return the results in order of relevance. If you deal exclusively in XSLT then I'm afraid you'll have to work out how to create your own XSLT extension to execute this. Or if you prefer to work with ASP.NET user controls then that will work, too.
The Basics of a Lucene Search
To search a Lucene index you need to do a few things:
- Know the path to the directory where your index is stored
- Create your search query - normally by parsing a search phrase
- Execute that query and return some results (in order of relevance)
- Iterate over the results, extracting the relevant field data from each document returned by the search
- Display those results (normally with a link back to the item that was indexed)
Code Example
So how would some code that does this look? Well, below I'll show you a simple example. It's simple in that it performs the basics, but if you want things like pagination or highlighting then that is something you'll need to implement yourself :p Note that in this example the searchPhrase is 'hard-coded' but in reality you'd probably pass this in from a query string parameter.
@using System.IO
@{
    /* The path to where your index folder is */
    string dirPath = Server.MapPath("~/App_Data/TEMP/ExamineIndexes/ProductsIndex/");
    DirectoryInfo di = new DirectoryInfo(dirPath);

    /* The maximum number of results to show */
    const int maxResults = 100;

    /* The phrase you are searching for */
    string searchPhrase = "google nexus 7";

    var analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

    /* Create a new boolean query using the fields you want to search */
    Lucene.Net.Search.BooleanQuery bq = new Lucene.Net.Search.BooleanQuery();
    Lucene.Net.Search.Query query;

    var parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "DisplayName", analyzer);
    query = parser.Parse(searchPhrase);
    query.SetBoost(20); // boost score to make this field more relevant
    bq.Add(query, Lucene.Net.Search.BooleanClause.Occur.SHOULD);

    parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Sku", analyzer);
    query = parser.Parse(searchPhrase);
    query.SetBoost(50); // a match on the SKU is the most relevant of all
    bq.Add(query, Lucene.Net.Search.BooleanClause.Occur.SHOULD);

    parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Description", analyzer);
    query = parser.Parse(searchPhrase); // no boost, so the default weighting applies
    bq.Add(query, Lucene.Net.Search.BooleanClause.Occur.SHOULD);

    /* Open the directory to be searched... */
    using (var directory = Lucene.Net.Store.FSDirectory.Open(di))
    {
        using (var searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Index.IndexReader.Open(directory, true)))
        {
            /* execute the query and return the top hits */
            var collector = Lucene.Net.Search.TopScoreDocCollector.create(maxResults, true);
            searcher.Search(bq, collector);
            Lucene.Net.Search.ScoreDoc[] hits = collector.TopDocs().ScoreDocs;

            <h3>Results for "@searchPhrase"</h3>

            /* loop over the results and extract the fields we want from the index and display them */
            <ul>
                @for (int i = 0; i < hits.Length; i++)
                {
                    int docId = hits[i].doc;
                    float score = hits[i].score; // this is an indicator of relevance
                    Lucene.Net.Documents.Document doc = searcher.Doc(docId);

                    /* retrieve values from our fields */
                    int productId = int.Parse(doc.Get("ID"));
                    string productUrl = doc.Get("Url");
                    string productName = doc.Get("DisplayName");

                    <li><a href="@productUrl">@productName</a> (@score)</li>
                }
            </ul>
        }
    }
}
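As noted above, in a real page the search phrase would come from the query string rather than being hard-coded. User input can contain characters that mean something in the Lucene query language (quotes, brackets, colons and so on), and the parser can throw on malformed input. Here is a minimal sketch of one way to guard against that, assuming a helper class called SearchInput (my own name, not part of Lucene or uCommerce) that uses the static QueryParser.Escape() method:

```csharp
using Lucene.Net.QueryParsers;

// Hypothetical helper (not part of Lucene or uCommerce): cleans up a raw
// search phrase before it is handed to parser.Parse().
public static class SearchInput
{
    // Returns null when there is nothing to search for, otherwise the
    // trimmed phrase with Lucene's special query characters escaped.
    public static string ToSafePhrase(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
        {
            return null;
        }

        return QueryParser.Escape(raw.Trim());
    }
}
```

In the macroscript you could then write something like string searchPhrase = SearchInput.ToSafePhrase(Request.QueryString["q"]); and skip the search when it returns null. The "q" parameter name is just an assumption. Bear in mind that escaping everything also switches off the query language for your users, so this only suits a plain keyword search box.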
How It Works
The main thrust of the code is parsing the search phrase to create your query, which is then passed to the searcher object that performs the actual search. There are various ways to build a query in Lucene, but in this example I parse the phrase once per field with a QueryParser and combine the resulting queries using the BooleanQuery class. This class allows us to search across multiple fields by combining queries - you can basically define whether each one SHOULD occur, MUST occur or MUST_NOT occur. In our query I use "SHOULD" on every clause to indicate that our search phrase should occur in at least one of the fields for a match to occur. I also use the query.SetBoost() method on some fields to indicate they are more important than others - so, in my example, a match in the DisplayName field of the product is weighted higher than one in the Description, and a match in the Sku field is weighted highest of all.
After we have built the query we can then execute it to get an array of ScoreDoc objects back. We can then loop over these and extract the values from the fields we indexed. So, for instance, to get back the ID of the product you can simply use the Get("ID") method of the doc to return the field value. Once you have this you can always use the uCommerce API to get a reference to the actual Product object:
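The example only uses SHOULD, so here is a rough sketch of how the other two options might be combined. It assumes you want the phrase to be required, while excluding any product whose Description contains a particular word. The ProductQueries class and the excluded word are purely illustrative and not something from my index:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper showing MUST and MUST_NOT clauses on a BooleanQuery.
public static class ProductQueries
{
    // phraseQuery: a query that every result has to match (MUST).
    // excludedWord: a single lower-case word; documents with this term
    // in their Description field are dropped (MUST_NOT).
    public static BooleanQuery RequireAndExclude(Query phraseQuery, string excludedWord)
    {
        var bq = new BooleanQuery();
        bq.Add(phraseQuery, BooleanClause.Occur.MUST);

        // A TermQuery is not run through the analyzer, so the word must
        // already look like an indexed term (lower-case for StandardAnalyzer).
        var excluded = new TermQuery(new Term("Description", excludedWord));
        bq.Add(excluded, BooleanClause.Occur.MUST_NOT);

        return bq;
    }
}
```

You would call it with a query produced by parser.Parse(searchPhrase), for example ProductQueries.RequireAndExclude(query, "refurbished"). Note that a BooleanQuery made up only of MUST_NOT clauses matches nothing, which is why the sketch always pairs the exclusion with a MUST clause.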
int productId = int.Parse(doc.Get("ID"));
var product = UCommerce.EntitiesV2.Product.Get(productId);
And that is basically it. Of course, you can make this code much neater, but I hope it will be a good starting point for you all.
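I mentioned earlier that pagination is something you'd need to implement yourself. One simple approach, sketched below, is to slice the array of hits before looping over it. The Paging class is my own illustrative helper (nothing to do with Lucene), and it assumes 1-based page numbers:

```csharp
using System;
using System.Linq;

// Hypothetical paging helper: works on any array, including ScoreDoc[].
public static class Paging
{
    // Returns the items that belong on the given 1-based page.
    // Pages past the end of the array come back empty.
    public static T[] GetPage<T>(T[] items, int page, int pageSize)
    {
        if (pageSize < 1)
        {
            throw new ArgumentOutOfRangeException("pageSize");
        }

        if (page < 1)
        {
            page = 1; // treat nonsense page numbers as the first page
        }

        return items.Skip((page - 1) * pageSize).Take(pageSize).ToArray();
    }

    // Total number of pages needed to show totalItems (rounds up).
    public static int PageCount(int totalItems, int pageSize)
    {
        return (totalItems + pageSize - 1) / pageSize;
    }
}
```

In the search code you would replace the loop over hits with a loop over Paging.GetPage(hits, page, 10), where page comes from the query string. Remember that the collector only ever returns maxResults hits, so that value caps how many pages you can show.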
1 Comment
Daniel
Thanks for taking the time to write these articles! I have always found the Lucene very confusing, so I was not looking forward to indexing uCommerce products, but your code made it very easy to understand. Cheers!