roger, Author at Tales of a Code Monkey … the adventures of a guy making software.

Deprecation Notice (Fri, 06 Jan 2023)
https://cymbeline.ch/2023/01/06/deprecation-notice/

Hi there!

This post is to notify anybody interested that this page in its current form will soon be turned off. Relevant content has already been moved to a new location, https://flrx39.net, or will be moved there shortly.

Copy files with PowerShell Remoting (Fri, 24 Sep 2021)
https://cymbeline.ch/2021/09/24/copy-files-with-powershell-remoting/

Recently, at work, I found myself in the situation where I needed to copy some files from my workstation to a jump box. Now of course, on Linux I’d just use rsync or scp. But our IT doesn’t like provisioning Linux boxes and therefore uses Windows for jump servers too, so no luck here. Luckily, I could convince them to turn on and allow PowerShell Remoting, so with some simple scripts I can still easily copy files over without using SMB and without more hassle with IT.

# Copy a local file to a remote host: read the file's raw bytes locally,
# pass them to the remote session, and write them to the remote path.
function Copy-LocalToRemote(
    [Parameter(Mandatory = $true)] $LocalPath,
    [Parameter(Mandatory = $true)] $RemotePath,
    $ComputerName = 'my.default.target.host'
) {
    Invoke-Command -ComputerName $ComputerName `
        {
            param($path, $content)
            Set-Content -Path $path -Value $content `
                -AsByteStream
        } `
        -ArgumentList $RemotePath,(
            Get-Content $LocalPath -Raw -AsByteStream)
}

# Copy a file from a remote host to the local machine: read the raw bytes
# remotely, then write them to the local path.
function Copy-RemoteToLocal(
    [Parameter(Mandatory = $true)] $RemotePath,
    [Parameter(Mandatory = $true)] $LocalPath,
    $ComputerName = 'my.default.source.host'
) {
    Invoke-Command -ComputerName $ComputerName `
        {
            param($path)
            Get-Content -Path $path -Raw -AsByteStream
        } `
        -ArgumentList $RemotePath |
    Set-Content -Path $LocalPath -AsByteStream
}

New-Alias -Name 'ltr' -Value 'Copy-LocalToRemote'
New-Alias -Name 'rtl' -Value 'Copy-RemoteToLocal'
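
With the aliases in place, copying a file either way becomes a one-liner. The paths and file names below are just examples:

# workstation -> jump box
ltr .\deploy.ps1 'C:\Temp\deploy.ps1'

# jump box -> workstation
rtl 'C:\Temp\results.log' .\results.log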

As you can see, this is quite simple. Obviously, the functions above can only copy one file at a time. Maybe in the future I’ll build something that can copy entire directory structures recursively. I also haven’t spent any time looking at how efficient it is to pass streams this way. In fact, I wouldn’t be surprised at all if this performed poorly for large files. But then again, I’m mostly pushing around scripts and config files, so this works just fine.

Cheap and Secure Cloud Backups (Fri, 28 Sep 2018)
https://cymbeline.ch/2018/09/28/cheap-and-secure-cloud-backups/

I’ve wanted to find a good provider of cheap and secure cloud backups for a while. I’ve compared some cloud drive providers, but didn’t quite like those. They usually have very limited free plans, somewhat pricey paid plans (e.g. 50GB for about $24 a year for OneDrive), or, as in the case of Google, no pricing information available at all. By the way, “Google One is coming soon” isn’t an announcement I want to stare at for days while looking for pricing info.

Then I looked at the pricing of cloud storage providers such as AWS, Azure and Google Cloud. They offer storage at around 1 cent ($0.01) per GB per month. That’s a quarter of the OneDrive cost! It’s even less if you consider their archive offerings (AWS Glacier, Archive storage in Azure, Coldline Storage on Google). The cheapest offering here is from Microsoft at 0.2 cents ($0.002) per GB per month, albeit with some usage caveats. Since the point of backups is to keep them for a long time, these savings add up quickly.

Now, I’ve written a line or two of code before, so I figured I might as well write my own tool for this. So here is bart, the backup and restore tool. Note that at this point I do not offer bart as a ready-to-use executable, but only as MIT-licensed source code. In addition, bart currently works only with Azure Blob Storage – or with storage mounted into the machine’s file system. However, adding other cloud providers/archive destinations should be relatively easy, given the interfaces used in the tool.

Security

In terms of security, bart encrypts every file before storing it in the archive destination. A user-provided password is used together with a randomly generated salt to derive a key for encryption with AES. On first use of an archive destination, bart generates a random salt, so each archive has its own password and salt. To prevent anybody with access to the archive destination from even snooping the names of your files, the names are hashed (SHA1) and the hashes are used as the names under which the encrypted files are stored. This has the disadvantage that renaming/moving a file results in another file in the destination archive, though.
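
The scheme can be sketched in a few lines of C# (an illustrative sketch only, not bart’s actual code; the salt size and iteration count here are assumptions):

using System;
using System.Security.Cryptography;
using System.Text;

// 'password' and 'relativeFilePath' are placeholders for the user-provided
// password and the path of the file being archived.

// Generate the per-archive salt once, on first use of the destination.
byte[] salt = new byte[16];
using (var rng = RandomNumberGenerator.Create())
{
    rng.GetBytes(salt);
}

// Derive a 256-bit AES key from the password and the salt (PBKDF2).
byte[] key;
using (var kdf = new Rfc2898DeriveBytes(password, salt, 10000))
{
    key = kdf.GetBytes(32);
}

// Hash the file name (SHA1) and use the hex digest as the name under
// which the encrypted file is stored.
string nameInArchive;
using (var sha1 = SHA1.Create())
{
    byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(relativeFilePath));
    nameInArchive = BitConverter.ToString(hash).Replace("-", string.Empty);
}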

Usage

Once you’ve compiled bart, you can use it as follows.

./bart [-name string] [-path string] [-m noop|restore|delete] -acct string -key string
  -name string
        The name of the backup archive. (default "backup")
  -path string
        The path to the directory to backup and/or restore. (default ".")
  -m string
        A behavior for files missing locally: 'noop' to do nothing, 'restore' to restore them from the backup, 'delete' to delete them in the backup archive. (default "noop")
  -acct string
        The Azure Storage Account name.
  -key string
        The Azure Storage Account Key.
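
For example, to back up a local photo directory to an archive called “photos” (the storage account name and key are placeholders):

./bart -name photos -path ~/Pictures -acct mystorageaccount -key <storage-account-key>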

Sources

The sources are on GitHub @ https://github.com/rokeller/bart.

Conclusion

I’ve used bart to back up some photos/videos for a while now. For the roughly 42GB I have uploaded so far, my monthly bill from Microsoft is about 42 cents ($0.42). In months where I upload new files, the cost is a little higher (usually by a few cents) because of the extra transactions. My backed-up files are encrypted. If this isn’t cheap and secure cloud backups, what is?

Fix slow kubectl on Windows (Tue, 10 Apr 2018)
https://cymbeline.ch/2018/04/10/fix-slow-kubectl-on-windows/

Over the last few days I noticed that when I use kubectl to manage a k8s test cluster in Azure, it takes forever to actually carry out the operations remotely. Today I took some time to debug this. Here’s how to fix a slow kubectl on Windows.

Get Verbose Output

I started with changing the log level, and capturing the details, like this:

kubectl get pods -v=20

The good news is, given that the commands worked so slowly, I had enough time to just read what was going on, and even understand where the problem was. If it’s not so slow, it helps to redirect stderr to a file, like this:

kubectl get pods -v=20 2> err.txt

In my case, it turned out that the command was going through a cache on the H: drive. That may not mean much to you, but my employer’s IT maps the H: drive to the (remote) home directory. They also set the HOMEDRIVE, HOMEPATH and HOMESHARE environment variables on login; HOMEDRIVE in particular is set to H:. Given that Windows (unlike Linux) doesn’t come with a HOME environment variable by default, kubectl for Windows tries to make up for that by constructing the HOME path from HOMEDRIVE and HOMEPATH. So kubectl ended up caching everything on a remote share some 8500 km away. Needless to say, the lag between my workstation and the remote share is noticeable.

How to fix Slow kubectl on Windows

So, how do you fix this? Well, it’s actually very easy: set the HOME environment variable to a local directory, run kubectl again, and now it’s a lot faster. In PowerShell, for that session, I just did

$env:HOME = $env:USERPROFILE
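
If you don’t want to repeat that for every new session, one option is to persist the variable in your user environment (new processes will pick it up from there):

[Environment]::SetEnvironmentVariable('HOME', $env:USERPROFILE, 'User')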

Now what’s left for me is to try and convince the IT department to stop using the HOMEDRIVE and HOMESHARE for remote users. That’s the tough part 😉

Lucene.Net.ObjectMapping for .Net Standard 2.0 (Tue, 19 Dec 2017)
https://cymbeline.ch/2017/12/19/lucene-net-objectmapping-net-standard-2-0/

It’s been a long time since I’ve done some work on my Lucene.Net.ObjectMapping library. Recently I accepted a pull request that added support for the 4.8 beta releases of Lucene.Net itself, but when I involuntarily needed to update one of my services to bring it up to speed with running in a Docker container, I decided that it was about time to update Lucene.Net.ObjectMapping for .Net Standard 2.0. The last time I used the library in a Docker container, ASP.NET vNext RC1 was just about to become final, so that’s a long time ago. Accordingly, there was quite a bit of work to understand the changes needed: both in .Net (and ASP.NET) between the 1.0 RC1 and the .Net Standard 2.0 releases, and also between the Lucene.Net 3.x and 4.8 releases. Luckily, the latter was largely taken care of by the pull request for the library itself. The former however proved a bit challenging. After all, the toolset has changed significantly.

Updated Sources

To cut a long story short, the updated sources are now available on GitHub. I decided to track it in a separate branch for better isolation. This new branch is aptly called netstandard. I’ll try to stay up-to-date with the more recent releases of Lucene.Net, and also with .Net Standard 2.0. That is, provided that I find the time for it. You may notice that the project files have become quite a bit simpler. That’s certainly one change in .Net Standard and Core that I welcome. The other is the better integration of Nuget for package referencing and package creation/pushing.
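
To illustrate just how much simpler: an SDK-style project file for a .Net Standard 2.0 library can be as short as the following (a generic example, not this library’s actual project file):

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
  </PropertyGroup>
</Project>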

Updated Unit Tests

As a side effect, I also figured that it was going to be easier to update NUnit to the latest version, since its toolset is also well integrated with the new dotnet toolset. Since I’m doing all changes through VSCode and building/testing/packaging in Docker containers based on the microsoft/aspnetcore-build:2 images, I wanted to keep it simple. The good thing here is that the dotnet toolset seems to offer really everything I need for this, and is surprisingly easy to handle, especially when compared to the RC1 version.
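
Concretely, the whole build/test/pack loop inside the container comes down to a handful of commands (shown generically here):

dotnet restore
dotnet build -c Release
dotnet test
dotnet pack -c Release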

Updated Nuget Package

As I’ve mentioned in the beginning, I primarily made this effort because I needed a newer version of Lucene.Net with compatibility for .Net Standard 2.0. As a result, I published a new RC build as a Nuget package too. It is built on the latest Lucene.Net 4.8 beta release and currently supports only .Net Standard 2.0. If there’s a great demand for it, I’ll see if I can add support for other targets – or accept pull requests accordingly.

Conclusions

Nothing much besides the obvious: .Net Standard seems to be in good shape with regard to libraries and toolset, as well as support on Linux. There are a few gotchas, but overall nothing much of a problem. Lucene.Net itself is still somewhat badly documented, and the tracking of breaking changes between major/minor versions (and in fact also between revisions/beta releases of the same major/minor version) could be greatly improved. Online documentation would be very useful – maybe it exists, and I just haven’t found it? In any case, skimming through the Lucene.Net sources on GitHub works too, though it is much slower.

You can find more information about object mapping for Lucene.Net on the Lucene.Net.ObjectMapping page.

Azure Queue Agent – Introduction (Sun, 06 Dec 2015)
https://cymbeline.ch/2015/12/06/azure-queue-agent-introduction/

For a small side project I’ve been working on I needed a way to schedule background tasks and handle them on a pool of workers. Since I was planning to run this on Azure, the Queues offered with Azure Storage seemed to be a no-brainer. Even more so since the Azure Scheduler, which can be used to periodically execute some action, can also be used to add messages to Queues. I figured that I wasn’t the only one needing something to handle such tasks, so I decided to build a lightweight open-source library for this.

Enter Azure Queue Agent (AQuA)

AQuA comes with two main components: a Producer and a Consumer. As the names suggest, the Producer can be used to produce (i.e. enqueue) new jobs, while the Consumer can be used to consume (i.e. dequeue and then handle) jobs from the queue. Job Descriptors define which job should be executed and which parameters should be used for the execution. They are encoded as simple JSON objects like the one below, so they can also easily be written manually (e.g. when used with the Azure Scheduler). With AQuA it is thus very simple to build a scalable and robust pool of workers which take care of all your background processing jobs.

{ "Job": "HelloWho", "Properties": { "Who": "World" } }

The above example would queue the HelloWho job, which does nothing more than print the value of the Who parameter on stdout, like this: “Hello, <Who>!”. In addition, the Azure Queue Agent Consumer can be configured to either delete or requeue messages which are badly formatted, reference unknown jobs, or could not be executed successfully, such that you can even use a single queue for multiple different pools of workers, should you ever find yourself in that situation.

Getting Started

The Azure Queue Agent is available as a NuGet package, currently however only as a pre-release. You can get it like this:

Install-Package aqua.lib -Pre

Once this is done, you need to create an instance of Producer (if you want to create job requests from your code), and an instance of Consumer (for when you want to handle job requests).

// Setup and initialization
JobFactory factory = new JobFactory();

// Register all the jobs you want your consumer to be able to handle.
factory.RegisterJobType(typeof(HelloWho));
factory.RegisterJobType(typeof(MyBackgroundJob));

// Use the storage account from the emulator with queue "jobs".
ConnectionSettings connection = new ConnectionSettings("jobs");

Producer producer = new Producer(connection, factory);
Consumer consumer = new Consumer(connection, factory);

// Produce (i.e. enqueue) a HelloWho job request.
HelloWho job = new HelloWho() { Who = "Azure Queue Agent Example" };
producer.One(job);

// Consume (i.e. dequeue and handle) a job request.
consumer.One();

This should get you going for now. I’ll follow up with more later. Oh, and you can read the sources on GitHub.

Offline JSON Pretty Printing (Thu, 16 Apr 2015)
https://cymbeline.ch/2015/04/16/offline-json-pretty-printing/

Today when you’re dealing with Web APIs, you often find yourself in the situation of handling JSON, either in the input for these APIs or in the output, or both. Some browsers have the means to pretty print the JSON from their dev tools. But you don’t always have that opportunity. That’s why there are tools to pretty print JSON. I’ve found quite a few of them on the web, but all the ones I’ve found have one terrible flaw: they actually send the JSON you’re trying to pretty print to the server (*shudder*). I don’t want my JSON data (sensitive or not) to be sent to some random servers!

All your JSON are belong to us!

Now as I wrote, I don’t particularly like the fact that my JSON data is sent over the wire for pretty printing. It may not be super secret or anything, but these days you cannot be too careful. Besides, it’s completely unnecessary: everything you need is already in your browser! So I quickly built my own JSON pretty printer (and syntax highlighter). You can find it right here.

Offline JSON Pretty Printing to the Rescue

Actually, the design is very simple. All my JSON pretty printer does is take your JSON input and try to parse it as JSON, right in the browser.

JSON.parse(yourJsonInput)

If that fails, I show the parsing error and that’s it. If it succeeds, I get back a JavaScript object/array/value, which I then inspect. For objects, I use basic tree navigation to walk all the properties and nested objects/arrays/values for pretty printing. That’s it, really simple. No need to transmit the data anywhere — it stays right in your browser!
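
In fact, if you leave the syntax highlighting aside, the pretty-printing core needs nothing more than the built-in serializer (a sketch):

var pretty = JSON.stringify(JSON.parse(yourJsonInput), null, 2);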

So like it, hate it, use it or don’t: cymbeline.ch JSON Pretty Printer

LINQ with Lucene.Net.ObjectMapping (Tue, 10 Feb 2015)
https://cymbeline.ch/2015/02/10/linq-lucene-net-objectmapping/

Last time I mentioned that I started to work on supporting LINQ with Lucene.Net.ObjectMapping. That includes LINQ queries like the following:

using (Searcher searcher = new IndexSearcher(directory))
{
    IQueryable<BlogPost> posts =
        from post in searcher.AsQueryable<BlogPost>()
        where post.Tag == "lucene"
        orderby post.Timestamp descending
        select post;
}

Now granted, the above example is a very basic one. So here’s a short list of other methods on IQueryable<T> that are already supported at this point: Any *, Count *, First *, FirstOrDefault *, OrderBy, OrderByDescending, Single *, SingleOrDefault *, Skip, Take, ThenBy, ThenByDescending, and finally Where.

* Method is supported both with and without a filter predicate.
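
Since Skip and Take are supported, paging falls out naturally. For example (a sketch; the field names and page size are made up):

var page = searcher.AsQueryable<BlogPost>()
                   .Where(post => post.Tag == "lucene")
                   .OrderByDescending(post => post.Timestamp)
                   .Skip(2 * 10) // third page ...
                   .Take(10)     // ... at ten posts per page
                   .ToList();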

With this, it becomes easy to build paging based on objects you get back as a result of a query on Lucene.Net. I’m still working on improving the supported filter expressions (most of all for Where, but all the other filterable methods naturally profit too). For instance, with the default JSON-based object mapping it is already possible to search for entries in a dictionary that maps a string to another property or object. Say you have a set of classes, defined as follows.

public class MyClass
{
    public int Id { get; set; }
    public Dictionary<string, MyOtherClass> Map { get; set; }
}

public class MyOtherClass
{
    public string Text { get; set; }
    public int Sequence { get; set; }
    public DateTime Timestamp { get; set; }
}

Now you can actually search for instances of MyClass that satisfy certain conditions in the Map dictionary, like this:

var query = from c in searcher.AsQueryable<MyClass>()
            where c.Map["MyKey"].Sequence == 123
            select c;

Since the items in the dictionary are mapped to analyzed fields in the Lucene.Net document, we can search on them!

Delete and Update By Query

Now since I have this query expression binder to create Lucene.Net queries based on LINQ filter expressions, I’ve added an extension method to update and one to delete documents that match a query. So it is now possible to do this:

indexWriter.Delete<MyClass>(x => x.Id == 1234);
indexWriter.Update(myObject, x => x.Id == myObject.Id);

Call to Action

Now with all this said, I’m looking for volunteers to help me get more coverage on the LINQ queries, because that’s definitely where the weak spot is right now. If you’re interested, leave a comment here or on GitHub.

Improvements to Lucene.Net.ObjectMapping (Fri, 30 Jan 2015)
https://cymbeline.ch/2015/01/30/improvements-lucene-net-objectmapping/

I’d like to discuss some improvements to Lucene.Net.ObjectMapping which I published yesterday as a new version (1.0.3) to NuGet. In addition, I want to take this opportunity to give a quick outlook on what’s to come next.

CRUD Operations

The library now comes with support for all of the CRUD operations. Let’s look at them one by one, starting with Create.

Create / Add

In Lucene.Net terms, that would be AddDocument. Since the library does object to document mapping, this is simplified to an Add operation.

IndexWriter myIndexWriter = ...;
MyClass myObject = new MyClass(...);

myIndexWriter.Add(myObject);

Or, if you need a specific analyzer for the document the object gets mapped to, you can use the overload which accepts a second parameter of type Analyzer.

IndexWriter myIndexWriter = ...;
MyClass myObject = new MyClass(...);

myIndexWriter.Add(myObject, new MyOwnAnalyzer());

Retrieve / Query

The retrieve operation, or mapping of a document to an object hasn’t changed since v1.0.0. There are examples for how to query and retrieve in my previous post. Of course, if you happen to know the ID of the document without a query, then you can just map that document to your class without going through a query. But since the document IDs can change over time, it’s usually more practical to pivot off a query.
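
For completeness, mapping a document you already have back to an object is a one-liner with the library’s ToObject extension (docId stands for a document ID you obtained elsewhere):

IndexReader myIndexReader = ...;
Document doc = myIndexReader.Document(docId);
MyClass myObject = doc.ToObject<MyClass>();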

Update

Update is maybe the most interesting operation here. Since document IDs can change over time, there’s really no good way to reliably update a specific document without making a query. That’s why the UpdateDocument method on the IndexWriter asks you for a query/term to use to match the document to update. And that’s why it’s generally a good idea to bring your own unique identifier to the game. Suppose your class has a property of type Guid named “Id”, which is used as the unique identifier for objects of that type.

IndexWriter myIndexWriter = ...;
MyClass myObject = ...;

myObject.MyPropertyToUpdate = "new value";

myIndexWriter.Update(
    myObject,
    new TermQuery(new Term("Id", myObject.Id.ToString())));

Under the covers, this will find all the documents matching the query and matching the type (MyClass), delete them and then add a new document for the mapped myObject. If you need an analyzer for the newly mapped document, you can use the second overload.

IndexWriter myIndexWriter = ...;
MyClass myObject = ...;

myObject.MyPropertyToUpdate = "new value";

myIndexWriter.Update(
    myObject,
    new TermQuery(new Term("Id", myObject.Id.ToString())),
    new MyOwnAnalyzer());

Delete

Just like the retrieve operation, the Delete operation is also supported since v1.0.0. I realize though that I haven’t given any examples yet. But really, it’s quite simple again. You give the type of objects you want to delete the mapped documents for, and you give a query to identify the objects to delete. No magic at all.

IndexWriter myIndexWriter = ...;
myIndexWriter.DeleteDocuments<MyClass>(
    new TermQuery(new Term("Tag", "deleted")));

Naturally, you can use any Query you want for the delete operation (as well as for updates). You can make them arbitrarily complex as long as they’re still supported by Lucene.Net.
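
For instance, a delete based on a combination of conditions might look like this (the field names are made up for illustration):

BooleanQuery query = new BooleanQuery();
query.Add(new TermQuery(new Term("Tag", "deleted")), Occur.MUST);
query.Add(new TermQuery(new Term("Archived", "true")), Occur.MUST);

myIndexWriter.DeleteDocuments<MyClass>(query);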

Summary and Outlook

That’s it, CRUD with no magic, no tricks. Let me know if there’s functionality you’d like to see added, either by commenting here or by opening a bug/enhancement/whatever on GitHub. I’ve started working on LINQ support for the ObjectMapping library too, with the goal that you can write LINQ queries like the following.

var query = from myObject in mySearcher.AsQueryable<MyClass>()
            where myObject.Tag == "history"
            select myObject;

It will likely take a little longer to get that stable, but I’ll try to make a pre-release on NuGet in the next few weeks.

Search Mapped Objects in Lucene.Net (Thu, 15 Jan 2015)
https://cymbeline.ch/2015/01/15/search-mapped-objects-lucene-net/

In my previous post (Lucene.Net Object Mapping) I introduced the Lucene.Net.ObjectMapping NuGet package. The post describes how the package can be used to map virtually any .Net object to a Lucene.Net Document and how to reconstruct the object from that same Document later. Now it’s time to look at the search aspect of it, so how can you search mapped objects in Lucene.Net?

You already know Searcher

The Searcher class in Lucene.Net can be used to run queries on an index and retrieve documents matching that query. The Lucene.Net.ObjectMapping library comes with additional extensions to the Searcher class which help you search for Documents. There’s a variety of different extensions: some just return a TopDocs object with the number of results you’ve specified, and some allow sorting, but the more powerful ones require you to specify a Collector to gather the results. Using a Collector makes it very easy to support paging over all the results for a specific query, and after all that’s usually what you’d do today if you want to show search results.

So let’s look at an example of searching for Documents that contain mapped .Net objects using a Collector. Let’s assume we’re building a blog engine, for which we want to index the posts.

public class BlogPost
{
    public Guid Id { get; set; }
    public DateTime Created { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
    public string[] Tags { get; set; }
}

// ... as before, you'd store your BlogPost objects like this:
luceneIndexWriter.AddDocument(thePost.ToDocument());

Use a Collector for Paging

Creating a paged index of all your blog posts is very easy, really. You’ll need a Searcher, a Collector (the TopFieldCollector will do for now) and that’s about it. Let’s look at some code.

private const int PageSize = 10;

public BlogPost[] GetPostsForPage(int page)
{
    // Sanitize the 'page' before doing anything with it.
    if (page < 0)
    {
        page = 0;
    }

    int start = page * PageSize;
    int end = start + PageSize;

    using (Searcher searcher = new IndexSearcher(myIndexReader))
    {
        TopFieldCollector collector = TopFieldCollector.Create(
            // Let's sort descending by create date.
            new Sort(new SortField("Created", SortField.LONG, true)),
            end, // Need to get the hits until 'end'.
            false,
            false,
            false,
            false);

        // Let's use the object mapping extensions for Search! This will
        // filter results to only those Documents which hold a BlogPost.
        searcher.Search<BlogPost>(new MatchAllDocsQuery(), collector);

        // At this point we know how many hits there are in total. So
        // let's check that the requested page is within range.
        if (start >= collector.TotalHits)
        {
            page = (collector.TotalHits - 1) / PageSize;
            start = page * PageSize;
            end = start + PageSize;
        }

        TopDocs docs = collector.TopDocs(start, PageSize);
        List<BlogPost> posts = new List<BlogPost>();

        foreach (ScoreDoc scoreDoc in docs.ScoreDocs)
        {
            Document doc = searcher.Doc(scoreDoc.Doc);

            posts.Add(doc.ToObject<BlogPost>());
        }

        return posts.ToArray();
    }
}

That’s it, no magic, no tricks. One thing you could do, instead of just returning a plain array with the results, is to return an object which holds some more meta information, like the total number of hits or the actual page you’re returning results for. But the core logic remains the same.

You can play around with different ways to sort the results. Keep in mind though that tokenized/analyzed fields in Lucene.Net are sorted based on the tokens, not based on the actual string value. To help address this, I’m thinking about extending the object mappers to allow specifying not only that a field should be analyzed (because you want to search it), but also that a non-analyzed copy of the field should be added for sorting purposes. That way, you can search and sort on the same logical field. Keep in mind though that the index will grow, since the data is indexed twice: once tokenized/analyzed, once as-is.
