Lucene.Net Object Mapping

Today I finally took some time to turn a little library I’ve used for a while now into a NuGet package, called Lucene.Net.ObjectMapping. At the same time, I also uploaded the code to GitHub. Let’s look at Lucene.Net Object Mapping in more detail.

How To Install

Since this is a NuGet package, installation is as simple as running the following command in the Package Manager Console:

Install-Package Lucene.Net.ObjectMapping

Alternatively, you can just search for Lucene.Net.ObjectMapping in the package manager and you should find it.

How To Use It?

Using object mapping is as simple as calling two extension methods: ToDocument, which converts an object into a Document, and ToObject, which converts a Document created by ToDocument back into the original object.

MyObject obj = ...;
Document doc = obj.ToDocument();
// Save the document to your Lucene.Net Index

// Later, load the document from the index again
Document docFromIndex = ...;
MyObject objFromDoc = docFromIndex.ToObject<MyObject>();

How Does It Work?

Under the covers, the library JSON-serializes the object and stores the JSON in the actual Lucene.Net document. In addition, it stores some metadata: the actual and the static type of the object, as well as the timestamp (in ticks) of when the document was created. The static type is the type you pass in as the type parameter to ToDocument, whereas the actual type is the runtime (dynamic) type of the object you pass in. The type information is used when you search for documents that were created for a specific type. Since all this information is stored in the document, re-creating objects from a class hierarchy works without issues.
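
To make the storage approach more concrete, here is a minimal C# sketch of the idea, not the library’s actual code. It assumes a Dictionary<string, string> as a stand-in for a Lucene.Net Document, uses System.Text.Json (the library may well use a different serializer), and the internal field names such as "__json" are made up for illustration. Serializing and deserializing with the actual type is what lets an object stored through a base-class type come back as its subclass.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch: a Dictionary stands in for a Lucene.Net Document,
// and these internal field names are invented for illustration.
public static class ObjectStoreSketch
{
    public const string JsonField = "__json";
    public const string StaticTypeField = "__staticType";
    public const string ActualTypeField = "__actualType";
    public const string TimestampField = "__timestamp";

    public static Dictionary<string, string> ToDocumentSketch<T>(T obj)
    {
        if (obj == null) throw new ArgumentNullException(nameof(obj));

        // The actual (runtime) type may be a subclass of the static type T.
        Type actualType = obj.GetType();
        return new Dictionary<string, string>
        {
            // Serialize with the actual type so subclass properties are kept.
            [JsonField] = JsonSerializer.Serialize(obj, actualType),
            [StaticTypeField] = typeof(T).AssemblyQualifiedName,
            [ActualTypeField] = actualType.AssemblyQualifiedName,
            [TimestampField] = DateTime.UtcNow.Ticks.ToString(),
        };
    }

    public static T ToObjectSketch<T>(Dictionary<string, string> doc)
    {
        // Re-create the object as its actual type, then cast to the requested type.
        Type actualType = Type.GetType(doc[ActualTypeField], throwOnError: true);
        return (T)JsonSerializer.Deserialize(doc[JsonField], actualType);
    }
}

// Hypothetical example hierarchy.
public class Animal { public string Name { get; set; } = ""; }
public class Dog : Animal { public bool GoodBoy { get; set; } }
```

For example, calling ToDocumentSketch<Animal> with a Dog records Animal as the static type and Dog as the actual type, and reading the entry back with ToObjectSketch<Animal> yields a Dog instance.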

In addition to storing the object itself, the library also indexes the individual properties of the object, including nested properties. By default, it uses a mapper that works as follows:

  • Public properties/fields of objects are mapped to Lucene.Net fields with the same name; e.g. a property called “Id” is mapped to a field called “Id”.
  • Properties/fields that are arrays are mapped to multiple Lucene.Net fields, all with the same name (the name of the property that holds the array).
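  • Nested properties/fields, i.e. properties/fields that hold objects, are mapped with the name of the property as a prefix, separated by a dot; e.g. the LastModified property of an object stored in a property called “Meta” is mapped to a field called “Meta.LastModified”.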
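
The naming rules above can be sketched with reflection. This is a hypothetical C# illustration of the rules as described, not the library’s mapper: it only produces (field name, value) pairs, and the set of types treated as single values is an assumption based on the type list that follows.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch of the default naming rules; NOT the library's actual mapper.
public static class FieldNameSketch
{
    // Returns (field name, value) pairs for all public properties/fields.
    public static List<KeyValuePair<string, object>> Flatten(object obj)
    {
        var result = new List<KeyValuePair<string, object>>();
        AddMembers(obj, "", result);
        return result;
    }

    private static void AddMembers(object obj, string prefix, List<KeyValuePair<string, object>> result)
    {
        Type type = obj.GetType();
        foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (p.GetIndexParameters().Length == 0) // skip indexers such as this[int]
            {
                AddValue(prefix + p.Name, p.GetValue(obj), result);
            }
        }
        foreach (FieldInfo f in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            AddValue(prefix + f.Name, f.GetValue(obj), result);
        }
    }

    private static void AddValue(string name, object value, List<KeyValuePair<string, object>> result)
    {
        if (value == null)
        {
            return; // null values produce no field at all
        }
        if (IsLeaf(value.GetType()))
        {
            result.Add(new KeyValuePair<string, object>(name, value));
        }
        else if (value is Array array)
        {
            foreach (object item in array) // every element reuses the array property's name
            {
                AddValue(name, item, result);
            }
        }
        else
        {
            AddMembers(value, name + ".", result); // nested object: "Meta." prefix etc.
        }
    }

    // Types treated as single values rather than objects to descend into (assumption).
    private static bool IsLeaf(Type t) =>
        t.IsPrimitive || t.IsEnum || t == typeof(string) || t == typeof(decimal) ||
        t == typeof(DateTime) || t == typeof(TimeSpan) || t == typeof(Guid) || t == typeof(Uri);
}
```

Applied to the MyObject instance in the example further down, this sketch yields the same field names as the table there: Id, Name, Meta.LastModified, Meta.ModifiedBy, and Meta.Modifications three times.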

Each field is created with the following mapping of field types:

  • Boolean properties are mapped to a numeric field (Int) with a value of 1 for true and 0 for false.
  • DateTime properties are mapped to a numeric field (Long) with the value being the Ticks property of the DateTime.
  • Float properties are mapped to a numeric field (Float) with the value being the float value.
  • Double and Decimal properties are mapped to a numeric field (Double) with the value being the double value.
  • Guid properties are mapped to string fields which are NOT_ANALYZED, i.e. you can search for the GUID as is.
  • Integer properties (including Long, Short, and Byte, as well as their signed/unsigned counterparts) are mapped to a numeric field (Long) with the value being the integer value.
  • Null values are not mapped at all; thus, the absence of a field implies the corresponding property is null.
  • String properties are mapped to string fields which are ANALYZED.
  • TimeSpan properties are mapped to a numeric field (Long) with the value being the Ticks property of the TimeSpan.
  • Uri properties are mapped to string fields which are ANALYZED.
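
The type rules above amount to a lookup from a property value to a field kind plus the value to store. The following C# sketch encodes that lookup as listed; FieldKind and FieldTypeSketch are hypothetical names, and the string format used for Guid and Uri values is an assumption.

```csharp
using System;

// Hypothetical names: FieldKind and FieldTypeSketch are illustrations, not library types.
public enum FieldKind { Int, Long, Float, Double, StringAnalyzed, StringNotAnalyzed }

public static class FieldTypeSketch
{
    // Returns false for null: null values are not mapped to any field.
    public static bool TryMap(object value, out FieldKind kind, out object fieldValue)
    {
        kind = default;
        fieldValue = null;
        switch (value)
        {
            case null:
                return false;
            case bool b:
                kind = FieldKind.Int; fieldValue = b ? 1 : 0; break;
            case DateTime dt:
                kind = FieldKind.Long; fieldValue = dt.Ticks; break;
            case TimeSpan ts:
                kind = FieldKind.Long; fieldValue = ts.Ticks; break;
            case float f:
                kind = FieldKind.Float; fieldValue = f; break;
            case double d:
                kind = FieldKind.Double; fieldValue = d; break;
            case decimal m:
                kind = FieldKind.Double; fieldValue = (double)m; break;
            case Guid g:
                // Assumed string format ("D", e.g. 00000000-0000-0000-0000-000000000000).
                kind = FieldKind.StringNotAnalyzed; fieldValue = g.ToString(); break;
            case string s:
                kind = FieldKind.StringAnalyzed; fieldValue = s; break;
            case Uri u:
                kind = FieldKind.StringAnalyzed; fieldValue = u.ToString(); break;
            case sbyte _: case byte _: case short _: case ushort _:
            case int _: case uint _: case long _: case ulong _:
                // Convert.ToInt64 throws OverflowException for ulong values above long.MaxValue.
                kind = FieldKind.Long; fieldValue = Convert.ToInt64(value); break;
            default:
                throw new NotSupportedException("No mapping rule for " + value.GetType());
        }
        return true;
    }
}
```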

Example Mapping

Let’s look at a simple example of an object and its mapping to a Lucene.Net Document. Consider the following object model.

public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ObjectMeta Meta { get; set; }
}

public class ObjectMeta
{
    public DateTime LastModified { get; set; }
    public string ModifiedBy { get; set; }
    public string[] Modifications { get; set; }
}

// Create an instance of MyObject
MyObject obj = new MyObject()
{
    Id = 1234,
    Name = "My Lucene.Net mapped Object",
    Meta = new ObjectMeta()
    {
        LastModified = DateTime.UtcNow,
        ModifiedBy = "the dude",
        Modifications = new string[] { "changed a", "removed b", "added c" },
    },
};

Document doc = obj.ToDocument();

The mapping rules described above add the following searchable fields to the document. Note that the fields the Lucene.Net.ObjectMapping library needs for its internal workings are not listed.

Field Name          | Type               | Value
--------------------|--------------------|------------------------------------------
Id                  | Numeric / Long     | 1234
Name                | String / ANALYZED  | My Lucene.Net mapped Object
Meta.LastModified   | Numeric / Long     | <the number of ticks at the current time>
Meta.ModifiedBy     | String / ANALYZED  | the dude
Meta.Modifications  | String / ANALYZED  | changed a
Meta.Modifications  | String / ANALYZED  | removed b
Meta.Modifications  | String / ANALYZED  | added c
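
Because DateTime values such as Meta.LastModified end up as plain tick counts, searching by date means translating dates into tick ranges first. The C# sketch below (hypothetical helper names) computes the inclusive tick bounds for one UTC day and turns a stored value back into a DateTime. With Lucene.Net 3.0.x, bounds like these would typically go into a numeric range query such as NumericRangeQuery.NewLongRange; whether the library’s search extensions do this for you is not covered here.

```csharp
using System;

// Hypothetical helpers; names are illustrative and not part of Lucene.Net.ObjectMapping.
public static class TickRangeSketch
{
    // Inclusive [Min, Max] tick bounds covering one whole day.
    // Assumes `day` is already expressed in UTC, like DateTime.UtcNow in the example.
    public static (long Min, long Max) DayRange(DateTime day)
    {
        long min = day.Date.Ticks;                 // midnight at the start of the day
        long max = min + TimeSpan.TicksPerDay - 1; // last tick before the next midnight
        return (min, max);
    }

    // Turns a stored tick value back into a DateTime, interpreted as UTC.
    public static DateTime FromTicks(long ticks) => new DateTime(ticks, DateTimeKind.Utc);
}
```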

The mapper is by no means complete. I have some ideas for extending it in the future, including:

  • attributes on string properties (or properties mapped to string fields) that specify how the string is indexed (NO, ANALYZED, NOT_ANALYZED, NOT_ANALYZED_NO_NORMS, or ANALYZED_NO_NORMS).
  • attributes on any property that define how it is mapped to a field, e.g. by specifying a class that performs the mapping.
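
One way the string-indexing attribute idea could look, as a purely hypothetical C# sketch: none of these types exist in the library, and the enum simply mirrors the five Lucene.Net indexing options named above. A mapper would read the attribute via reflection and fall back to ANALYZED, the current default for string properties, when no attribute is present.

```csharp
using System;
using System.Reflection;

// Hypothetical: mirrors the five Lucene.Net indexing options; NOT part of the library.
public enum StringIndexMode { No, Analyzed, NotAnalyzed, NotAnalyzedNoNorms, AnalyzedNoNorms }

// Hypothetical attribute for properties/fields that should be indexed differently.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class IndexStringAttribute : Attribute
{
    public IndexStringAttribute(StringIndexMode mode) { Mode = mode; }
    public StringIndexMode Mode { get; }
}

public static class IndexModeSketch
{
    // Falls back to Analyzed, the current default for strings, when no attribute is present.
    public static StringIndexMode GetMode(MemberInfo member) =>
        member.GetCustomAttribute<IndexStringAttribute>()?.Mode ?? StringIndexMode.Analyzed;
}

// Hypothetical example class: Email would be searchable as-is, Notes stays analyzed.
public class Customer
{
    [IndexString(StringIndexMode.NotAnalyzed)]
    public string Email { get; set; }

    public string Notes { get; set; }
}
```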

I’ll talk more about how to use all this when searching for documents in your Lucene.Net index later. As a sneak preview: the library also provides extension methods for the Searcher class from Lucene.Net that you can use to filter your documents by object type.