Cheap and Secure Cloud Backups

I’ve wanted to find a good provider of cheap and secure cloud backups for a while. I’ve compared some cloud drive providers, but didn’t quite like those. They usually have very limited free plans, somewhat pricey paid plans (e.g. 50GB for about 24$ a year for OneDrive), or like in the case of Google no information available at all. By the way, “Google one is coming soon” isn’t an announcement that I want to look at for more than a few days when looking for pricing info.

Then, I’ve looked at pricing of cloud storage providers, such as AWS, Azure and Google Cloud. Those offer storage around 1 cent ($0.01) per GB per month. That’s a quarter of the OneDrive cost! It’s even less if you consider their archive offerings (AWS Glacier, Archive in Azure, Coldline Storage for Google). The cheapest offering here is from Microsoft at 0.2 cents ($0.002) per GB per month, but with some usage caveats. Since the point of backups is to keep them for a long time, this quickly adds up though.

Now I’ve written a line or two of code before, so I figured I could as well write my own tool for this. So here is bart, the backup and restore tool. Note that at this point I do not offer bart as a ready-to-use executable, but only as MIT-licensed source code. In addition, bart currently works only with Azure Blob Storage – or with storage mounted into the machine’s file system. However, adding other cloud providers/archive destinations should be relatively easy, given the interfaces used in the tool.

Security

In terms of security, bart encrypts every file before storing it in the archive destination. A user-provided password is used together with a randomly generated salt to derive a key for encryption with AES. On first use of any archive destination, bart generates a random salt, and each archive has its own password and salt. To avoid anybody with access to the archive destination from even snooping the names of your files, the names are hashed (SHA1) and the hashes used to store the encrypted files. This has the disadvantage that renaming/moving a file results in another file in the destination archive, though.

Usage

Once you compiled bart, you can use it as follows.

./bart [-name string] [-path string] [-m noop|restore|delete] -acct string -key string
  -name string
        The name of the backup archive. (default "backup")
  -path string
        The path to the directory to backup and/or restore. (default ".")
  -m string
        A behavior for files missing locally: 'noop' to do nothing, 'restore' to restore them from the backup, 'delete' to delete them in the backup archive. (default "noop")
  -acct string
        The Azure Storage Account name.
  -key string
        The Azure Storage Account Key.

Sources

The sources are on GitHub @ https://github.com/rokeller/bart.

Conclusion

I’ve used bart for backup of some photos/videos for a while now. For the about 42GB I have uploaded so far my monthly bill from Microsoft is about 42 cents ($0.42). Those months where I upload new files the cost is a little higher (a few cents usually) because of the extra transactions. My backed up files are encrypted. If this isn’t cheap and secure cloud backups, what is?

Fix slow kubectl on Windows

Over the last few days I noticed that when I use kubectl to manage a k8s test cluster in Azure, it takes forever to actually carry out the operations remotely. Today I took some time to debug this. Here’s how to fix a slow kubectl on Windows.

Get Verbose Output

I started with changing the log level, and capturing the details, like this:

kubectl get pods -v=20

The good news is, given that the commands worked so slowly, I had enough time to just read what was going on, and even understand where the problem was. If it’s not so slow, it helps to redirect stderr to a file, like this:

kubectl get pods -v=20 2> err.txt

In my case, it turned out that the command was going through a cache which was on the H: drive. That may not mean much to you, but my employer’s IT maps the H: drive to the (remote) home directory. They also set the HOMEDRIVE, HOMEPATH and HOMESHARE environment variables on login. HOMEDRIVE in particular is set to H:. Given that Windows (unlike Linux) by default doesn’t come with a HOME environment variable, kubectl for Windows tries to make up by constructing the HOME path using HOMEDRIVE and HOMEPATH. So kubectl ended up caching everything on a remote share, some 8500 km away. Needless to say, the lag between my workstation and the remote share is noticable.

How to fix Slow kubectl on Windows

So, how do you fix this? Well, it’s actually very easy: set the HOME environment variable to a local directory, run kubectl again, and now it’s a lot faster. In PowerShell, for that session, I just did

$env:HOME = $env:USERPROFILE

Now what’s left for me is to try and convince the IT department to stop using the HOMEDRIVE and HOMESHARE for remote users. That’s the tough part 😉

Azure Queue Agent – Introduction

For a small side project I’ve been working on I needed a way to schedule background tasks and handle them on a pool of workers. Since I was planning to run this on Azure, the Queues offered with Azure Storage seemed to be a no-brainer. Even more so since the Azure Scheduler, which can be used to periodically execute some action, can also be used to add messages to Queues. I figured that I wasn’t the only one needing something to handle such tasks, so I decided to build a lightweight open-source library for this.

Enter Azure Queue Agent (AQuA)

AQuA comes with two main components: a Producer and a Consumer. As the names suggest, the Producer can be used to produce (i.e. enqueue) new jobs, while the Consumer can be used to consume (i.e. dequeue and then handle) jobs from the queue. Job Descriptors are used to define which job should be executed and what parameters should be used for the execution. They are encoded as simple JSON objects like the one below, i.e. they can also easily be written manually (e.g. when used with the Azure Scheduler). That said, with AQuA it is very simple to build a scalable and robust collection of workers which take care of all your background processing jobs.

{ "Job": "HelloWho", "Properties": { "Who": "World" } }

The above example for instance would queue the HelloWho job, which does nothing more but print the value of the Who parameter on stdout like this: “Hello, <Who>!”. In addition, the Azure Queue Agent Consumer can be configured to either delete or requeue messages which were badly formatted, using unknown jobs or which could not be executed successfully, such that you can even use a single queue for multiple different pools of workers, should you ever find yourself in that situation.

Getting Started

The Azure Queue Agent is available as a NuGet package, currently however only in pre-release. You can get it like this:

Install-Package aqua.lib -Pre

Once this is done, you need to create an instance of Producer (if you want to create job requests from your code), and an instance of Consumer (for when you want to handle job requests).

// Setup and initialization
JobFactory factory = new JobFactory();

// Register all the jobs you want your consumer to be able to handle.
factory.RegisterJobType(typeof(HelloWho));
factory.RegisterJobType(typeof(MyBackgroundJob));

// Use the storage account from the emulator with queue "jobs".
ConnectionSettings connection = new ConnectionSettings("jobs");

Producer producer = new Producer(connection, factory);
Consumer consumer = new Consumer(connection, factory);

// Produce (i.e. enqueue) a HelloWho job request.
HelloWho job = new HelloWho() { Who = "Azure Queue Agent Example" };
producer.One(job);

// Consume (i.e. dequeue and handle) a job request.
consumer.One();

This should get you going for now. I’ll follow up with more later. Oh, and you can read the sources on GitHub.