So here’s the story: one day I was working at one of our clients. They have an import process that runs daily, and it had started to perform badly. The import is an Azure Function with an HTTP endpoint that accepts a file. The function did all of this:
- Parse the file
- Validate values
- Check persistence for entries that ‘already exist’
- Add or update values in the database where needed
If you read this list, you can probably pinpoint some problems right away. The most obvious one: why the heck is this one function responsible for all these tasks? An approach like this scales like… well… it doesn’t. And at least one bullet is missing: what to do when an error occurs.
The primary issue here
As I said, the client ran into performance issues. The function took over 20 minutes to import a text file with roughly 8,000 entries, so the Azure Function timed out and the import failed with an error. There are a lot of ways to solve this, for example breaking the file into smaller pieces or (and this is what they did) converting the function into a Durable Function. In principle, there’s nothing wrong with Durable Functions, but I don’t think that solution fits this scenario. So I came up with an alternative, and I’m curious what you think of it.
So the problem is a process timing out. Or at least, that’s what the consumer of the service thinks. In reality (in my opinion), the import function has way too many responsibilities, so I proposed a more event-driven design. It all starts with one serverless function, which is triggered not by an HTTP endpoint but by Blob Storage. You can use the Valet Key pattern to let the client upload the file directly to Blob Storage, and create a function that is triggered once the upload completes. From here on, we’re going to break things apart.
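The Valet Key pattern means the service hands the client a short-lived, narrowly scoped key so the client can upload straight to storage. The service never has to stream the file itself. In Azure you would normally issue a shared access signature (SAS) for this. The sketch below does not reproduce the SAS format. It only illustrates the idea with a hypothetical HMAC-signed URL: the names `issue_upload_url`, `is_valid_upload`, `SECRET_KEY` and the `exp`/`perm`/`sig` parameters are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret known only to the issuing service and the storage side.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(path: str, expires: int, permission: str) -> str:
    """HMAC-SHA256 over the resource path, expiry time and permission."""
    message = f"{path}\n{expires}\n{permission}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_upload_url(base_url: str, blob_path: str, ttl_seconds: int = 300,
                     now: Optional[int] = None) -> str:
    """Hand out a time-limited, write-only URL for one blob (the 'valet key')."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    resource = f"{base_url.rstrip('/')}/{blob_path.lstrip('/')}"
    path = urlparse(resource).path
    query = urlencode({"exp": expires, "perm": "w", "sig": _sign(path, expires, "w")})
    return f"{resource}?{query}"


def is_valid_upload(url: str, now: Optional[int] = None) -> bool:
    """Storage-side check: the signature matches, it grants write, and it has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    try:
        expires = int(params["exp"])
        permission = params["perm"]
        signature = params["sig"]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path, expires, permission)
    return (hmac.compare_digest(expected.encode(), signature.encode())
            and permission == "w" and now <= expires)
```

Because the key covers a single path and expires after a few minutes, leaking it exposes very little. The blob-triggered function then picks up the file once the upload lands.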
Parsing the file
This first function is responsible for parsing the uploaded file. It reads the file and, for each import entry, posts a message to a queue. To allow monitoring of the import process, it also writes a message to a status queue. That message contains the number of entries found in the file or, if parsing failed, states that the file could not be parsed. Done! This function’s job is finished. Its responsibility is to parse the file, and it did!
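A minimal sketch of this fan-out step follows. It assumes the file is a JSON array and uses plain Python lists as stand-ins for the two Storage Queues. The `Queues` class, the message fields (`correlationId`, `index`, `payload`, `status`, `entryCount`) and the function name are hypothetical and not taken from the demo repo.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Queues:
    """In-memory stand-ins for the import queue and the status queue (hypothetical)."""
    import_queue: List[dict] = field(default_factory=list)
    status_queue: List[dict] = field(default_factory=list)


def parse_uploaded_file(content: str, queues: Queues,
                        correlation_id: Optional[str] = None) -> str:
    """Parse the uploaded file and post one queue message per entry."""
    correlation_id = correlation_id or str(uuid.uuid4())
    try:
        entries = json.loads(content)
        if not isinstance(entries, list):
            raise ValueError("expected a JSON array")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        queues.status_queue.append(
            {"correlationId": correlation_id, "status": "parse_failed", "error": str(exc)}
        )
        return correlation_id

    # Tell the monitoring side how many entries to expect.
    queues.status_queue.append(
        {"correlationId": correlation_id, "status": "parsed", "entryCount": len(entries)}
    )
    # One message per entry, each tagged with the correlation ID and its position.
    for index, entry in enumerate(entries):
        queues.import_queue.append(
            {"correlationId": correlation_id, "index": index, "payload": entry}
        )
    return correlation_id
```

Keeping the entry’s index in each message is a design choice that makes it possible to restore the original order later, for example when rebuilding a file of failed entries.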
Meanwhile, you may want to monitor the progress of the entire import, so I created two functions that report progress through SignalR. The first function ‘creates’ a new import process with a correlation ID, which is used to refer to the current import. It creates a process entry in the database containing the number of import entries, the correlation ID and the date/time the process started. The second function accepts updates on that process and broadcasts each update through SignalR as well.
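The two progress functions can be sketched as one small tracker. This assumes an in-memory dictionary in place of the process database and a plain callback in place of the SignalR broadcast. `ImportProcess`, `ProgressTracker`, `create_process`, `report` and the snapshot fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class ImportProcess:
    """Hypothetical shape of the process record stored when an import starts."""
    correlation_id: str
    total_entries: int
    started_at: datetime
    succeeded: int = 0
    failed: int = 0


class ProgressTracker:
    def __init__(self, broadcast: Callable[[dict], None]):
        # `broadcast` stands in for pushing a message to clients through SignalR.
        self._broadcast = broadcast
        self._processes: Dict[str, ImportProcess] = {}

    def create_process(self, correlation_id: str, total_entries: int) -> ImportProcess:
        """First function: register a new import under its correlation ID."""
        process = ImportProcess(correlation_id, total_entries, datetime.now(timezone.utc))
        self._processes[correlation_id] = process
        self._broadcast(self._snapshot(process))
        return process

    def report(self, correlation_id: str, success: bool) -> dict:
        """Second function: apply one update, then broadcast the new state."""
        process = self._processes[correlation_id]
        if success:
            process.succeeded += 1
        else:
            process.failed += 1
        snapshot = self._snapshot(process)
        self._broadcast(snapshot)
        return snapshot

    @staticmethod
    def _snapshot(process: ImportProcess) -> dict:
        done = process.succeeded + process.failed
        total = process.total_entries
        percent = 100.0 if total == 0 else 100.0 * done / total
        return {
            "correlationId": process.correlation_id,
            "succeeded": process.succeeded,
            "failed": process.failed,
            "total": total,
            "percent": round(percent, 1),
        }
```

In the real setup, many function instances report at the same time, so the counters in a shared store need some form of concurrency control. Table Storage, for instance, supports optimistic concurrency through ETags. This in-memory sketch deliberately leaves that out.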
Validating incoming stuff
A second function, triggered by the storage queue, validates the entries. Let’s say there are 8,000 entries in the uploaded file. That means 8,000 new messages on the queue, so this function will be executed 8,000 times. The advantage here is that, by default, the queue trigger retrieves up to 16 messages at once (the batchSize setting in host.json) and processes them in parallel. It does this on every instance the function app scales out to. You can raise that setting to 32! This function validates each entry and, depending on the outcome, posts a message to either a success queue or an error queue.
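To make the parallelism concrete, here is a sketch that validates a batch of queue messages on 16 worker threads, mirroring the default batchSize, and routes each one to a success or an error list. The validation rules in `validate_person`, the `BATCH_SIZE` constant and the `errors` field are assumptions for illustration. The Functions host does this batching for you. This is not its implementation.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Mirrors the default queue-trigger batchSize of 16 messages per instance.
BATCH_SIZE = 16

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_person(payload: dict) -> List[str]:
    """Hypothetical rules for one person entry; returns a list of error messages."""
    errors = []
    if not str(payload.get("name", "")).strip():
        errors.append("name is required")
    email = payload.get("email")
    if email is not None and not EMAIL_PATTERN.match(str(email)):
        errors.append(f"invalid email: {email!r}")
    return errors


def validate_batch(messages: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Validate messages in parallel, then route each to the success or error queue."""
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
        # pool.map keeps results in the same order as the input messages.
        results = list(pool.map(lambda m: (m, validate_person(m["payload"])), messages))
    success_queue, error_queue = [], []
    for message, errors in results:
        if errors:
            error_queue.append({**message, "errors": errors})
        else:
            success_queue.append(message)
    return success_queue, error_queue
```

Each message is validated independently, so one bad entry never blocks or fails the rest of the import.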
The error bin
My example import files contain persons to import, in JSON format, so each file is basically one huge JSON array. Again, each entry in that array is handled separately. When something goes wrong with an entry, it’s sent to the error queue and handled by a function triggered by that queue. This function stores the entry in a persistence store, together with the correlation ID, the error message(s) and the original JSON for that entry. This allows you to re-create a JSON file in the shape the user originally sent, containing all the entries that failed. That is convenient when the user wants to correct those errors and try again.
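A sketch of the error bin follows, assuming an in-memory dictionary keyed by correlation ID in place of the real persistence store. `ErrorStore`, `handle_error_message` and `rebuild_failed_file` are hypothetical names. The sketch re-serializes each entry’s parsed JSON rather than keeping the raw bytes, so the rebuilt file is equivalent to the original entries but not necessarily byte-identical.

```python
import json
from collections import defaultdict
from typing import Dict, List


class ErrorStore:
    """In-memory stand-in for the persistence store behind the error queue."""

    def __init__(self):
        self._failures: Dict[str, List[dict]] = defaultdict(list)

    def handle_error_message(self, message: dict) -> None:
        """Error-queue handler: keep the correlation ID, errors and the entry's JSON."""
        self._failures[message["correlationId"]].append(
            {
                "index": message["index"],
                "errors": message["errors"],
                "originalJson": json.dumps(message["payload"]),
            }
        )

    def rebuild_failed_file(self, correlation_id: str) -> str:
        """Re-create a JSON array of only the failed entries, in original file order."""
        failures = sorted(self._failures.get(correlation_id, []), key=lambda f: f["index"])
        return json.dumps([json.loads(f["originalJson"]) for f in failures], indent=2)
```

Sorting by the stored index restores the order of the uploaded file, even though failures arrive in whatever order the parallel functions produce them.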
The success story
As you probably expected, there’s also a queue full of successfully validated import entries that need to be stored in persistence. I use Table Storage in my demo project, which makes creates and updates easy to manage, since an upsert covers both. If you use SQL Server (for example), it helps to know whether you need to create or update an entity, so you may want to pass a hint from the validation service where applicable. Storing the entity can also go wrong. In that case, you simply pass the entity to the error queue with an error message saying that storing it failed. Easy, right?
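Here is a sketch of the storing step with an optional create/update hint. It uses SQLite from the Python standard library purely as a stand-in for SQL Server. The `persons` table, the `hint` values and `store_person` are hypothetical. Without a hint, it falls back to `INSERT OR REPLACE`, which plays the role of the upsert you get from Table Storage. Any storage failure forwards the message to the error queue.

```python
import sqlite3
from typing import List, Optional


def store_person(conn: sqlite3.Connection, message: dict, error_queue: List[dict],
                 hint: Optional[str] = None) -> bool:
    """Persist one validated entry; on failure, forward it to the error queue.

    `hint` is the optional 'create' / 'update' hint passed on by the validation step.
    """
    person = message["payload"]
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            if hint == "create":
                conn.execute("INSERT INTO persons (id, name) VALUES (?, ?)",
                             (person["id"], person["name"]))
            elif hint == "update":
                cursor = conn.execute("UPDATE persons SET name = ? WHERE id = ?",
                                      (person["name"], person["id"]))
                if cursor.rowcount == 0:
                    raise LookupError(f"no person with id {person['id']!r} to update")
            else:
                # No hint: behave like an upsert.
                conn.execute("INSERT OR REPLACE INTO persons (id, name) VALUES (?, ?)",
                             (person["id"], person["name"]))
        return True
    except (sqlite3.Error, LookupError) as exc:  # KeyError is also a LookupError
        error_queue.append({**message, "errors": [f"storing the entry failed: {exc}"]})
        return False
```

A wrong hint, such as a create for an id that already exists, ends up in the error bin like any other failure. The user can then see and fix it.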
So what did we do here
In fact, the process from beginning to end is pretty much the same as before. However, we split the steps of the process into separate functions, leaving us with functions that each have a single responsibility. We placed queues between the steps, allowing the functions to scale with the amount of work queued. The functions are now lean and fast, so we also solved the time-out issue. I created a GitHub repo with demo code* that you can play with. You can run the project locally or in the cloud. Note that the functions don’t perform as fast locally as they do in the cloud.
* At the time of writing, there is a small issue with the SignalR connection. Because of this bug, the front-end project does not always display the correct progress.