Lucene.Net Object Mapping

Today I finally took some time to turn a little library I’ve used for a while now into a NuGet package, called Lucene.Net.ObjectMapping. At the same time, I also uploaded the code to GitHub. But let’s look at Lucene.Net Object Mapping in more detail.

How To Install

Since this is a NuGet package, installation is as simple as running the following command in the Package Manager Console:

Install-Package Lucene.Net.ObjectMapping

Alternatively, you can just search for Lucene.Net.ObjectMapping in the package manager and you should find it.

How To Use It?

Using object mapping is as simple as calling two methods: ToDocument to convert an object into a document and ToObject to convert a Document (that was created with the ToDocument method) into the original object.

MyObject obj = ...;
Document doc = obj.ToDocument();
// Save the document to your Lucene.Net Index

// Later, load the document from the index again
Document docFromIndex = ...;
MyObject objFromDoc = docFromIndex.ToObject<MyObject>();

How does it work?

Under the covers, the library JSON-serializes the object and stores the JSON in the actual Lucene.Net document. In addition, it stores some metadata like the actual and the static types of the object you stored, as well as the timestamp (ticks) of when the document was created. The type information is used when you search for documents that were created for a specific type. The static type is the type you pass in as the type parameter to ToDocument, whereas the actual type is the actual (dynamic) type of the object you’re passing in. Since all this information is stored in the document too, there are no issues re-creating objects from a class hierarchy.
In addition to storing the object information itself, the library also indexes the individual properties of the object you’re storing, including nested properties. By default, it uses a mapper which works as follows.

  • Public properties/fields of objects are mapped to Lucene.Net fields with the same name; e.g. a property called “Id” is mapped to a field called “Id”.
  • Properties/fields that are arrays are mapped to multiple Lucene.Net fields, all with the same name (the name of the property that holds the array).
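  • Nested properties/fields, i.e. properties/fields whose values are themselves objects, use the name of the containing property as a prefix for the properties/fields of the nested object; e.g. the “LastModified” property of an object held in a property called “Meta” is mapped to a field called “Meta.LastModified”.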

Each field is created with the following mapping of field types:

  • Boolean properties are mapped to a numeric field (Int) with a value of 1 for true and 0 for false.
  • DateTime properties are mapped to a numeric field (Long) with the value being the Ticks property of the DateTime.
  • Float properties are mapped to a numeric field (Float) with the value being the float value.
  • Double and Decimal properties are mapped to a numeric field (Double) with the value being the double value.
  • Guid properties are mapped to string fields which are NOT_ANALYZED, i.e. you can search for the GUID as is.
  • Integer (also Long, Short, and Byte as well as their unsigned/signed counterparts) properties are mapped to a numeric field (Long) with the value being the integer value.
  • Null values are not mapped at all; thus, the absence of a field implies the corresponding property is null.
  • String properties are mapped to string fields which are ANALYZED.
  • TimeSpan properties are mapped to a numeric field (Long) with the value being the Ticks property of the TimeSpan.
  • Uri properties are mapped to string fields which are ANALYZED.
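
To make the rules above a bit more tangible, here’s a rough sketch of how such a mapper could walk an object. This is not the library’s code: Flatten and MapValue are names I made up for illustration, the sketch only looks at public properties (not fields), and it describes each field with a plain string instead of creating real Lucene.Net fields.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helpers, not part of Lucene.Net.ObjectMapping. They only mirror
// the naming and type rules listed above so the rules can be tried in isolation.
static IEnumerable<(string Name, string Kind, object Value)> Flatten(object source, string prefix = "")
{
    foreach (PropertyInfo property in source.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
    {
        // Nested objects pass their own name plus "." as the prefix.
        foreach (var field in MapValue(prefix + property.Name, property.GetValue(source)))
        {
            yield return field;
        }
    }
}

static IEnumerable<(string Name, string Kind, object Value)> MapValue(string name, object value)
{
    switch (value)
    {
        case null:
            yield break; // Null values are not mapped at all.
        case bool b:
            yield return (name, "Numeric/Int", b ? 1 : 0);
            break;
        case DateTime d:
            yield return (name, "Numeric/Long", d.Ticks);
            break;
        case TimeSpan t:
            yield return (name, "Numeric/Long", t.Ticks);
            break;
        case float f:
            yield return (name, "Numeric/Float", f);
            break;
        case double or decimal:
            yield return (name, "Numeric/Double", Convert.ToDouble(value));
            break;
        case Guid g:
            yield return (name, "String/NOT_ANALYZED", g.ToString());
            break;
        case string s:
            yield return (name, "String/ANALYZED", s);
            break;
        case Uri u:
            yield return (name, "String/ANALYZED", u.ToString());
            break;
        case sbyte or byte or short or ushort or int or uint or long or ulong:
            // Note: a ulong above long.MaxValue would overflow here; the sketch ignores that.
            yield return (name, "Numeric/Long", Convert.ToInt64(value));
            break;
        case IEnumerable items:
            // Arrays become several fields which all share the same name.
            foreach (object item in items)
            {
                foreach (var field in MapValue(name, item))
                {
                    yield return field;
                }
            }
            break;
        default:
            // Anything else is treated as a nested object.
            foreach (var field in Flatten(value, name + "."))
            {
                yield return field;
            }
            break;
    }
}

// Usage: an anonymous object with a nested object and an array.
var sample = new
{
    Id = 1234,
    Name = "My Lucene.Net mapped Object",
    Meta = new { ModifiedBy = "the dude", Modifications = new[] { "changed a", "removed b" } },
};

foreach (var field in Flatten(sample).ToList())
{
    Console.WriteLine($"{field.Name} ({field.Kind}): {field.Value}");
}
```

The sketch needs C# 9 or later (top-level statements and the "or" patterns).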

Example Mapping

Let’s look at a simple example of an object and its mapping to a Lucene.Net Document. Consider the following object model.

public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ObjectMeta Meta { get; set; }
}

public class ObjectMeta
{
    public DateTime LastModified { get; set; }
    public string ModifiedBy { get; set; }
    public string[] Modifications { get; set; }
}

// Create an instance of MyObject
MyObject obj = new MyObject()
{
    Id = 1234,
    Name = "My Lucene.Net mapped Object",
    Meta = new ObjectMeta()
    {
        LastModified = DateTime.UtcNow,
        ModifiedBy = "the dude",
        Modifications = new string[] { "changed a", "removed b", "added c" },
    },
};

Document doc = obj.ToDocument();

The mapping rules called out above will add the following fields for searching to the document. Please note that I’m not calling out the fields needed for the internal workings of the Lucene.Net.ObjectMapping library.

Field Name          Type               Value
Id                  Numeric / Long     1234
Name                String / ANALYZED  My Lucene.Net mapped Object
Meta.LastModified   Numeric / Long     < the number of ticks at the current time >
Meta.ModifiedBy     String / ANALYZED  the dude
Meta.Modifications  String / ANALYZED  changed a
Meta.Modifications  String / ANALYZED  removed b
Meta.Modifications  String / ANALYZED  added c

The mapper is by no means complete. Ideas to extend it in the future exist, including functionality to

  • specify attributes on string properties (or properties mapped to string fields) to specify how to index the string (NO vs ANALYZED vs NOT_ANALYZED vs NOT_ANALYZED_NO_NORMS vs ANALYZED_NO_NORMS).
  • specify attributes on any properties to define how to map the field, e.g. by specifying a class which can map the field

I’ll talk a little more on how to use this all when searching for documents in your Lucene.Net index. But as a sneak preview: the library also provides extension methods to the Searcher class from Lucene.Net that you can use to specify an object type to filter your documents on.

Writing to Event Log — the right way

This one’s been on my mind for a long time. I know it’s very tempting to just use System.Diagnostics.EventLog.WriteEntry to write some string to the event log. But personally I never liked the fact that you write all that static text along with the variables like actual error messages etc. Why make your life harder analyzing events later on when there’s an easy way to fix that?

Instrumentation Manifests to the Rescue!

For a while now this has actually been quite easy, using instrumentation manifests. You can read more about it here: http://msdn.microsoft.com/en-us/library/windows/desktop/dd996930(v=vs.85).aspx. These manifests allow you to define events, templates for events, messages for events, even your own event channels (so you wouldn’t need to log into that crowded “Application” channel anymore) and a lot more. But let’s look at a little example.

<?xml version="1.0" encoding="utf-8"?>
<instrumentationManifest xsi:schemaLocation="http://schemas.microsoft.com/win/2004/08/events eventman.xsd" xmlns="http://schemas.microsoft.com/win/2004/08/events" xmlns:win="http://manifests.microsoft.com/win/2004/08/windows/events" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:trace="http://schemas.microsoft.com/win/2004/08/events/trace">
    <instrumentation>
        <events>
            <provider name="MyService" guid="{DDB3FC6E-6CC4-4871-9F27-88C1B1F19BBA}" symbol="TheEventLog"
                      message="$(string.MyService.ProviderMessage)"
                      resourceFileName="MyService.Events.dll"
                      messageFileName="MyService.Events.dll"
                      parameterFileName="MyService.Events.dll">
                <events>
                    <event symbol="ServiceStarted" version="0" channel="Application"
                           value="1000" level="win:Informational"
                           message="$(string.MyService.event.1000.message)" />
                    <event symbol="ServiceStopped" version="0" channel="Application"
                           value="1001" level="win:Informational"
                           message="$(string.MyService.event.1001.message)"/>
                    <event symbol="ServiceConfigurationError" version="0" channel="Application"
                           value="1002" level="win:Error" template="ServiceException"
                           message="$(string.MyService.event.1002.message)"/>
                    <event symbol="ServiceUnhandledException" version="0" channel="Application"
                           value="1003" level="win:Error" template="ServiceException"
                           message="$(string.MyService.event.1003.message)"/>
                </events>
                <levels/>
                <channels>
                    <importChannel name="Application" chid="Application"/>
                </channels>
                <templates>
                    <template tid="ServiceException">
                        <data name="Exception" inType="win:UnicodeString" outType="xs:string"/>
                    </template>
                </templates>
            </provider>
        </events>
    </instrumentation>
    <localization>
        <resources culture="en-US">
            <stringTable>
                <string id="level.Informational" value="Information"/>
                <string id="level.Error" value="Error"/>
                <string id="channel.Application" value="Application"/>

                <string id="MyService.ProviderMessage"
                        value="My Windows Service"/>

                <string id="MyService.event.1000.message"
                        value="My Windows Service has started."/>
                <string id="MyService.event.1001.message"
                        value="My Windows Service has stopped."/>
                <string id="MyService.event.1002.message"
                        value="My Windows Service encountered a problem with its configuration. Please fix these issues and start the service again:%n%n%1"/>
                <string id="MyService.event.1003.message"
                        value="My Windows Service encountered an unhandled exception:%n%n%1"/>
            </stringTable>
        </resources>
    </localization>
</instrumentationManifest>

Let’s start at the top. Lines 5-9 define some basic information about this instrumentation provider, like a name, a unique ID and a symbol (which will come in handy later). We can also define a friendly name for events logged this way (i.e. the event source). Let’s ignore the three xyzFileName attributes for now. On lines 11-22 we’re defining four events, some of them informational (like “the service started” or “the service stopped”), some are errors (e.g. configuration errors, or unhandled exceptions). If we wanted to define our own channel, we’d do so between lines 25 and 27. For now we’re just re-using (i.e. importing) the pre-defined “Application” channel.

Event Templates

Event templates are particularly handy if you want to write parameters with your events. Lines 29-31 define a template which has exactly one parameter, which happens to be a Unicode string. We’ll use it to store exceptions. We can define more than one parameter and there are many types to choose from, but I’ll let you explore those on your own. This template, as you can see, is referred to by the two events with IDs 1002 and 1003.

Resources

The localization gods are with us too. Our event and template definitions so far were abstract; they contained no actual UI strings. We can define those per language, as you can see starting on line 37. In the resources element and its sub-elements, we define the actual strings we want to show, including any parameters. Parameters are numbered (1-based) and are referred to with %1, %2, %3 and so on. As you can see on lines 51 and 53, we’re defining the strings for the two error events with one parameter each (“%1”), to contain the exception message. If you want line breaks, you’ll achieve those with “%n”.
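
If you want a feeling for how such a message string ends up looking, here’s a deliberately simplified sketch of the placeholder expansion. Expand is a hypothetical helper of mine, not an API of Windows or of the generated code, and the real formatter in the Event Viewer understands more escape sequences than the two handled here.

```csharp
using System;
using System.Text.RegularExpressions;

// Simplified illustration only: handles %n (line break) and %1, %2, ... (parameters).
static string Expand(string template, params string[] parameters)
{
    string withLineBreaks = template.Replace("%n", Environment.NewLine);

    return Regex.Replace(withLineBreaks, @"%(\d{1,2})", match =>
    {
        int index = int.Parse(match.Groups[1].Value) - 1; // %1 is the first parameter.

        // Leave placeholders without a matching parameter untouched.
        return index >= 0 && index < parameters.Length ? parameters[index] : match.Value;
    });
}

Console.WriteLine(Expand(
    "My Windows Service encountered an unhandled exception:%n%n%1",
    "Object reference not set to an instance of an object."));
```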

Compile, with some Sugar added

So now we have a fancy manifest, but what can we do with it? Well, eventually we want to log events using the definitions from this manifest, so let’s get to it. The Windows SDK comes with two very handy tools, MC.exe (the message compiler) and RC.exe (the resource compiler). We’ll use the first to compile the manifest (and generate some C# code as a side effect), then use the second to compile the output of the first into a resource which can be linked into an executable. The commands are as follows.

mc.exe -css MyService.Events manifest.man -r obj\Debug
rc.exe obj\Debug\manifest.rc

MC.exe was nice enough to generate a file called manifest.cs for us. That file contains some code that you can use to log every event you defined in the manifest. This is why it was so handy to define the events (and templates): depending on how many parameters an event’s template has, the generated methods will ask you to provide just as many (typed) values for those parameters. Isn’t that great?! You’ll also find the compiled manifest.res file in obj\Debug. You can link that into its own executable (or your main assembly too, if you wanted), as follows:

csc.exe /out:MyService.Events.dll /target:library /win32res:obj\Debug\manifest.res

And you have a satellite assembly which holds the manifest you’ve built! CSC will log a warning about missing source files (because you didn’t add any .cs files to be compiled) but so far that doesn’t hurt anyone. We could probably also use link.exe but so far the C# compiler does a nice enough job.

Use that generated Code

Remember the code that was generated for us by MC.exe? Let’s go ahead and use it.

// ...
TheEventLog.EventWriteServiceStarted();
// ...
TheEventLog.EventWriteServiceConfigurationError(exception.Message); // ... or log the entire exception, including stack traces.
// ...

Wasn’t that very easy?

Install the Event Provider

There’s still something missing though: we’ll need to install our instrumentation/event provider with the system. It’s similar to creating the event source (which in fact will happen automatically when installing the manifest). This will typically happen in your application’s/service’s installer, using a command line as follows. But before that, remember the xyzFileName attributes we talked about? These need to be updated to point to the full path of the MyService.Events.dll assembly we generated. Otherwise the following command is going to fail.

wevtutil.exe im path\to\my\manifest.man

From now on, when your app or service starts and logs those events, they’ll show up in the event viewer. For the two events we defined with parameters, the values of the parameters are essentially the only thing that’s stored along with the ID of the event. Likewise, they’ll be the only thing that’s going to be exported with the event — so the files with the exported events you’re going to ask your customers to send you are going to be a lot smaller and won’t contain the static part of the events you already know anyway!

To uninstall the manifest, just run this command:

wevtutil.exe um path\to\my\manifest.man

Both commands need to run elevated (particularly important to remember when writing your installer).

Next Steps

As a next step, you’ll probably want to turn the manual steps of compiling the manifest and linking it into the satellite assembly into automated targets in the project file. I’ll likely write another post about that in the future too.

Summary

As you can see, writing a manifest, compiling it and using the generated code to write to the event log is quite easy. So no more excuses to write each event as one big string (which can be a lot harder to analyze when the events come back from your customers, because you first need to parse the strings).

Gzip Encoding an HTTP POST Request Body

I was wondering how difficult it was to Gzip-compress the body of an HTTP POST request (or any HTTP request with a body, that is), for large request bodies. While the .Net HttpClient has supported compression of response bodies for a while, it appears that to this day there is no out-of-the-box support for encoding the body of a request. Setting aside for now that the server may not natively support Gzip-compressed request bodies, let’s look at what we need to do to support this on the client side.

Enter HttpMessageHandler

The HttpMessageHandler abstract base class and its derived classes are used by the HttpClient class to asynchronously send HTTP requests and receive the response from the server. But since we don’t actually want to send the message ourselves – just massage the body and headers a little bit before sending – we’ll derive a new class GzipCompressingHandler from DelegatingHandler so we can delegate sending (and receiving) to another handler and just focus on the transformation of the content. So here’s what that looks like.

public sealed class GzipCompressingHandler : DelegatingHandler
{
    public GzipCompressingHandler(HttpMessageHandler innerHandler)
    {
        if (null == innerHandler)
        {
            throw new ArgumentNullException("innerHandler");
        }

        InnerHandler = innerHandler;
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Requests without a body (Content == null) are passed through unchanged.
        if (request.Method == HttpMethod.Post && request.Content != null)
        {
            // Wrap the original HttpContent in our custom GzipContent class.
            // If you want to compress only certain content, make the decision here!
            request.Content = new GzipContent(request.Content);
        }

        return base.SendAsync(request, cancellationToken);
    }
}

As you can see, all we’re doing is just wrapping the original HttpContent in our GzipContent class. So let’s get right to that.

Gzip-compressed HttpContent: GzipContent

We’re almost there; all we need to do is actually compress the content and modify the request headers to indicate the new content encoding.

internal sealed class GzipContent : HttpContent
{
    private readonly HttpContent content;

    public GzipContent(HttpContent content)
    {
        this.content = content;

        // Keep the original content's headers ...
        foreach (KeyValuePair<string, IEnumerable<string>> header in content.Headers)
        {
            Headers.TryAddWithoutValidation(header.Key, header.Value);
        }

        // ... and let the server know we've Gzip-compressed the body of this request.
        Headers.ContentEncoding.Add("gzip");
    }

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        // Open a GZipStream that writes to the specified output stream.
        using (GZipStream gzip = new GZipStream(stream, CompressionMode.Compress, true))
        {
            // Copy all the input content to the GZip stream.
            await content.CopyToAsync(gzip);
        }
    }

    protected override bool TryComputeLength(out long length)
    {
        length = -1;
        return false;
    }
}

Easy, right? Of course you could add other supported compression algorithms, using more or less the same code (or even adding some abstraction for different compression algorithms), but this is basically all that’s required.
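
Since the receiving side has to undo the compression, a quick way to convince yourself that the GZipStream part does what you expect is a standalone round trip. The following is just a sketch: Compress and Decompress are helper names I picked for this example, the payload is made up, and no HTTP is involved at all.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Compresses a buffer the same way SerializeToStreamAsync does, but into memory.
static byte[] Compress(byte[] input)
{
    using (MemoryStream output = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            gzip.Write(input, 0, input.Length);
        }

        // The GZipStream must be disposed before reading, so the gzip footer is written.
        return output.ToArray();
    }
}

// What a receiver would have to do with a body marked "Content-Encoding: gzip".
static byte[] Decompress(byte[] input)
{
    using (MemoryStream source = new MemoryStream(input))
    using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        gzip.CopyTo(output);
        return output.ToArray();
    }
}

// Made-up, highly repetitive payload; real request bodies will compress less well.
string text = new string('a', 10000);
byte[] body = Encoding.UTF8.GetBytes(text);
byte[] compressed = Compress(body);
byte[] restored = Decompress(compressed);

Console.WriteLine($"{body.Length} bytes before, {compressed.Length} bytes after compression");
Console.WriteLine($"Round trip intact: {Encoding.UTF8.GetString(restored) == text}");
```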

Summary

Using the HttpMessageHandler and its associated classes makes it extremely easy to apply transformations to all (or a well-defined subset) of HTTP requests you’re sending. In this case, we’re applying Gzip-compression to the bodies of all outgoing POST requests, but the logic to decide when to compress can be as customized as you want; you could even apply Gzip-compression only if the requested URI ends with “.gzip” or for certain content types.

Dynamic AES Key Exchange Through RSA Encryption

I wanted to prototype an encrypted communication channel between a client and a server. Now of course there are HTTPS and other TLS channels that work quite well, but what I have in mind is supposed to be used to transfer rather sensitive data. So how can I establish a secure channel through an HTTP/HTTPS channel?

  1. Have the server generate an RSA key pair and send the public key to the client.
  2. Have the client generate an AES key, encrypt it with the received public key, and send the encrypted key to the server.
  3. Let the server decrypt the AES key.
  4. Both the client and the server are now in possession of the same AES key and can therefore communicate securely.

Of course, the generated AES key should only be used for the communication with the one client which sent it, so some sort of secure key management on the server (also regarding the RSA key pair) is vital. Also, the AES key could periodically be updated (i.e. a new key generated). At the very least, every message sent back and forth encrypted with AES will have to use a separate IV — but naturally that IV could be part of the transmitted message. So let’s get a very basic REST API-based implementation going.

Generate RSA key-pair on the Server

[...]

public sealed class SessionKey
{
    public Guid Id;
    public byte[] SymmetricKey;
    public RSAParameters PublicKey;
    public RSAParameters PrivateKey;
}

[...]

private Dictionary<Guid, SessionKey> sessionKeys;

[...]

public RSAParameters Generate(Guid sessionId)
{
    // NOTE: Make the key size configurable.
    using (RSACryptoServiceProvider rsa = new RSACryptoServiceProvider(2048))
    {
        SessionKey s = new SessionKey()
        {
            Id = sessionId,
            PublicKey = rsa.ExportParameters(false /* no private key info */),
            PrivateKey = rsa.ExportParameters(true /* with private key info */),
            SymmetricKey = null, // To be generated by the client.
        };

        sessionKeys.Add(sessionId, s);

        return s.PublicKey;
    }
}

[...]

This key generation can then be used to generate a new RSA key pair whenever a new client connects and requests secure communication. Of course, make sure you send the public key back to the client, and not the private key — else there’s no point in encrypting in the first place.

Generate an AES key on the Client

[...]

// Get the Public Key from the Server
RSAParameters publicKey = GetFromServer(...);

// Holds the current session's key.
byte[] MySessionKey;

// Send encrypted session key to Server.
SendToServer(GenerateAndEncryptSessionKey(publicKey));

[...]

private byte[] GenerateAndEncryptSessionKey(RSAParameters publicKey)
{
    using (Aes aes = Aes.Create())
    {
        aes.KeySize = aes.LegalKeySizes[0].MaxSize;
        // Setting the KeySize generates a new key, but if you're paranoid, you can call aes.GenerateKey() again.

        MySessionKey = aes.Key;
    }

    using (RSACryptoServiceProvider rsa = new RSACryptoServiceProvider())
    {
        rsa.ImportParameters(publicKey);

        return rsa.Encrypt(MySessionKey, true /* use OAEP padding */);
    }
}

[...]

As you can see, we just take the public key we got from the server to set up the RSA provider and then encrypt the generated AES key using that public key. Once the client sends the encrypted key to the server, they both share the same secret and can securely communicate with each other.

Decrypt AES Key on the Server

[...]

public void SetSymmetricKey(Guid id, byte[] encryptedKey)
{
    SessionKey session = sessionKeys[id];

    using (RSACryptoServiceProvider rsa = new RSACryptoServiceProvider())
    {
        rsa.ImportParameters(session.PrivateKey);

        session.SymmetricKey = rsa.Decrypt(encryptedKey, true /* use OAEP padding */);
    }
}

[...]

Since we already have the private key for this session, we can just use it to decrypt the AES key we got from the client. Again, keeping the stored symmetric key safe is key to security.
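
To see the whole exchange in one place, here’s a minimal sketch that plays both roles in a single process. It assumes the cross-platform RSA.Create and RSAEncryptionPadding.OaepSHA1 APIs instead of RSACryptoServiceProvider (OaepSHA1 is the padding you get when passing true for OAEP there), and there is no network, session bookkeeping or key storage involved.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Server: generate the RSA key pair and hand out only the public half.
using RSA serverRsa = RSA.Create(2048);
RSAParameters publicKey = serverRsa.ExportParameters(false /* no private key info */);

// Client: generate a fresh AES key ...
byte[] clientKey;
using (Aes aes = Aes.Create())
{
    aes.KeySize = 256;
    clientKey = aes.Key;
}

// ... and encrypt it with the server's public key.
byte[] encryptedKey;
using (RSA clientRsa = RSA.Create())
{
    clientRsa.ImportParameters(publicKey);
    encryptedKey = clientRsa.Encrypt(clientKey, RSAEncryptionPadding.OaepSHA1);
}

// Server: decrypt the AES key with the private key that never left the server.
byte[] serverKey = serverRsa.Decrypt(encryptedKey, RSAEncryptionPadding.OaepSHA1);

// Both sides now hold the same key.
Console.WriteLine(clientKey.SequenceEqual(serverKey)); // prints True
```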

Encrypt / Decrypt

Encrypting and decrypting can now be done the same way on both sides (since we’re using a symmetric-key algorithm). So here’s what that looks like.

[...]

public byte[] EncryptData(byte[] key, string data)
{
    using (Aes aes = Aes.Create())
    {
        byte[] result;

        aes.Key = key;
        aes.GenerateIV();

        using (ICryptoTransform encryptor = aes.CreateEncryptor())
        {
            using (MemoryStream ms = new MemoryStream())
            {
                using (CryptoStream cs = new CryptoStream(ms, encryptor, CryptoStreamMode.Write))
                {
                    using (StreamWriter writer = new StreamWriter(cs))
                    {
                        writer.Write(data);
                    }
                }

                byte[] encrypted = ms.ToArray();
                result = new byte[aes.BlockSize / 8 + encrypted.Length];

                // Result is built as: IV (plain text) + Encrypted(data)
                Array.Copy(aes.IV, result, aes.BlockSize / 8);
                Array.Copy(encrypted, 0, result, aes.BlockSize / 8, encrypted.Length);

                return result;
            }
        }
    }
}

public string Decrypt(byte[] key, byte[] data)
{
    using (Aes aes = Aes.Create())
    {
        aes.Key = key;

        // Extract the IV from the data first.
        byte[] iv = new byte[aes.BlockSize / 8];
        Array.Copy(data, iv, iv.Length);
        aes.IV = iv;

        // The remainder of the data is the encrypted data we care about.
        byte[] encryptedData = new byte[data.Length - iv.Length];
        Array.Copy(data, iv.Length, encryptedData, 0, encryptedData.Length);

        using (ICryptoTransform decryptor = aes.CreateDecryptor())
        {
            using (MemoryStream ms = new MemoryStream(encryptedData))
            {
                using (CryptoStream cs = new CryptoStream(ms, decryptor, CryptoStreamMode.Read))
                {
                    using (StreamReader reader = new StreamReader(cs))
                    {
                        return reader.ReadToEnd();
                    }
                }
            }
        }
    }
}

[...]

As you can see, each time we encrypt something we generate a new IV, which we send at the beginning of the data to the other side. The other side then extracts the IV first and uses it to initialize AES.

REST APIs?

Using all this through REST APIs is trivial: All you really need to make sure is that the client sends the session GUID (or whatever you use to identify a session) with every encrypted message, either through the URL, parameters or headers. Of course it is vital to guarantee that a client cannot get access to another client’s session (e.g. to provide a new session key), but through ordinary (secure) authentication that should easily be doable.

Next Steps

As far as encryption is concerned, this should already do the trick. You may want to add signatures to the encrypted messages too, to make sure that the encrypted blocks have not been tampered with. In addition, the AES key exchange could be repeated periodically (maybe even after every exchanged message).
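
As a rough idea of what the tamper check could look like, here’s a sketch that appends an HMAC-SHA256 tag to an already encrypted message and verifies it on the other side. Note that this is a message authentication code rather than an asymmetric signature (my substitution, since both sides already share a secret). Protect and TryVerify are names I made up, and how the separate MAC key gets to both sides is left open here.

```csharp
using System;
using System.Security.Cryptography;

// Appends an HMAC-SHA256 tag to the (already encrypted) message.
static byte[] Protect(byte[] macKey, byte[] message)
{
    using (HMACSHA256 hmac = new HMACSHA256(macKey))
    {
        byte[] tag = hmac.ComputeHash(message);
        byte[] result = new byte[message.Length + tag.Length];

        // Result is built as: message + tag
        Array.Copy(message, result, message.Length);
        Array.Copy(tag, 0, result, message.Length, tag.Length);

        return result;
    }
}

// Splits message and tag again and checks the tag before handing out the message.
static bool TryVerify(byte[] macKey, byte[] protectedMessage, out byte[] message)
{
    const int TagLength = 32; // HMAC-SHA256 produces 32 bytes.
    message = null;

    if (protectedMessage.Length < TagLength)
    {
        return false;
    }

    byte[] candidate = new byte[protectedMessage.Length - TagLength];
    byte[] tag = new byte[TagLength];
    Array.Copy(protectedMessage, candidate, candidate.Length);
    Array.Copy(protectedMessage, candidate.Length, tag, 0, TagLength);

    using (HMACSHA256 hmac = new HMACSHA256(macKey))
    {
        // Compare in constant time so the check doesn't leak how many bytes matched.
        if (!CryptographicOperations.FixedTimeEquals(hmac.ComputeHash(candidate), tag))
        {
            return false;
        }
    }

    message = candidate;
    return true;
}

// Usage with stand-in values: a random MAC key and a fake "IV + ciphertext" blob.
byte[] macKey = new byte[32];
RandomNumberGenerator.Fill(macKey);
byte[] encrypted = { 0x01, 0x02, 0x03, 0x04 };

byte[] sealedMessage = Protect(macKey, encrypted);
Console.WriteLine(TryVerify(macKey, sealedMessage, out _)); // prints True

sealedMessage[0] ^= 0xFF; // Simulate tampering.
Console.WriteLine(TryVerify(macKey, sealedMessage, out _)); // prints False
```

Computing the tag over the IV and the ciphertext (rather than over the plain text) lets the receiver reject a modified message before decrypting anything.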