15.8.2023

Queueing in NestJS with BullMQ & Redis

Problem

The project for one of our customers includes a complex price calculation pipeline which, soon after it went live, burned a lot of money in the form of AWS resources. In essence, the pipeline was a job system on a 15-minute time grid: every 15 minutes a scheduled AWS Lambda function queried the central PostgreSQL database for all jobs due in that slot and distributed the calculation work via AWS SQS to several other Lambda functions.
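To make the time grid concrete, here is a minimal sketch of how a scheduler could decide which jobs belong to the current 15-minute slot. It is not the original pipeline code: `slotStart`, `jobsDueInSlot` and the `ScheduledJob` shape are hypothetical names chosen for illustration, and slots are assumed to be aligned to UTC quarter hours.

```typescript
// Length of one grid slot: 15 minutes, expressed in milliseconds.
const SLOT_MS = 15 * 60 * 1000;

// Round a timestamp down to the start of its 15-minute slot.
function slotStart(at: Date): Date {
  return new Date(Math.floor(at.getTime() / SLOT_MS) * SLOT_MS);
}

// Hypothetical job shape: only the fields needed for this illustration.
interface ScheduledJob {
  id: number;
  dueAt: Date;
}

// Select the jobs whose due time falls into the slot that contains `now`.
function jobsDueInSlot(jobs: ScheduledJob[], now: Date): ScheduledJob[] {
  const start = slotStart(now).getTime();
  const end = start + SLOT_MS;
  return jobs.filter((job) => {
    const due = job.dueAt.getTime();
    return due >= start && due < end;
  });
}
```

In the real system this selection happened as a database query; the in-memory filter only shows the slot arithmetic.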

The purpose of using Lambda functions was to be able to scale up price calculations when needed, but in practice this approach had a lot of problems:

  • Storing the jobs in PostgreSQL had a negative performance impact, since each job keeps information about which products need a price update
  • SQS has limitations on message payloads and a rate limit, so we could not query all relevant information in one step. Instead we had to pass IDs to subsequent Lambda functions, which increased the number of database queries dramatically
  • Many Lambda functions accessed the database, and mainly the same tables, at the same time, resulting in long waits and very high AWS costs
  • Local testability was a pain since we had to emulate a complex Lambda pipeline using LocalStack
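The ID fan-out behind the second and third point can be sketched roughly like this. The function name `chunkIdsByPayload` is hypothetical, and the payload limit is passed in as a parameter instead of being hard-coded, since the exact limit depends on the SQS configuration. Every resulting chunk stands for one message, and therefore for one more Lambda invocation that has to query the database again.

```typescript
// Split product IDs into messages whose JSON body stays within `maxBytes`.
// Assumption: IDs are plain integers, so the JSON text is ASCII and its
// character count equals its size in bytes.
function chunkIdsByPayload(ids: number[], maxBytes: number): number[][] {
  const chunks: number[][] = [];
  let current: number[] = [];
  let size = 2; // the surrounding "[" and "]"

  for (const id of ids) {
    const idLength = String(id).length;
    // Every ID after the first one also needs a separating comma.
    const extra = current.length === 0 ? idLength : idLength + 1;

    if (current.length > 0 && size + extra > maxBytes) {
      // The current message is full: close it and start a new one.
      chunks.push(current);
      current = [id];
      size = 2 + idLength;
    } else {
      // Note: a single ID larger than the limit still gets its own chunk.
      current.push(id);
      size += extra;
    }
  }

  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

With a small limit the effect is easy to see: `chunkIdsByPayload([1, 2, 3, 4, 5], 7)` produces two messages, and the number of follow-up queries grows with the number of messages.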

Essentially, our customer paid for Lambda functions that were waiting for the database to return their data.

Solution

To get rid of all these problems and, most importantly, reduce our customer's costs, we decided to introduce a dedicated queueing system called BullMQ. It is a Node.js library that implements a fast and robust queue system built on top of Redis (ElastiCache for Redis on AWS in our case) and is intended to be used in a microservice architecture. For us it solves the problem of queueing jobs and offloading price calculation work to another server. It is very robust and has a lot of standard queueing features like progress updates, retries, delayed jobs, etc. Using the NestJS wrapper, it is also very easy to implement.
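As a rough idea of what queueing such work looks like, the sketch below builds job descriptions in the `{ name, data, opts }` shape that BullMQ's `Queue.addBulk` accepts. It deliberately avoids importing BullMQ so that it runs without Redis. The job name, the payload shape, the retry values and the `buildPriceJobs` helper are assumptions for illustration, not our production code; only the option names (`jobId`, `attempts`, `backoff`, `removeOnComplete`) are taken from BullMQ.

```typescript
// Payload of one price calculation job (hypothetical shape).
interface PriceJobData {
  slot: string; // ISO timestamp of the 15-minute slot
  productIds: number[]; // products that need a price update
}

// Subset of BullMQ job options used in this sketch.
interface PriceJobOptions {
  jobId: string;
  attempts: number;
  backoff: { type: "exponential"; delay: number };
  removeOnComplete: boolean;
}

// One entry in the shape expected by BullMQ's `Queue.addBulk`.
interface PriceJobSpec {
  name: string;
  data: PriceJobData;
  opts: PriceJobOptions;
}

// Build one job description per batch of product IDs.
function buildPriceJobs(slot: Date, batches: number[][]): PriceJobSpec[] {
  const slotIso = slot.toISOString();
  return batches.map((productIds, index): PriceJobSpec => ({
    name: "calculate-prices", // assumed job name
    data: { slot: slotIso, productIds },
    opts: {
      // Deterministic ID per slot and batch, so that queueing the same
      // slot twice does not create duplicate jobs.
      jobId: `prices-${slot.getTime()}-${index}`,
      attempts: 3, // assumed retry count
      backoff: { type: "exponential", delay: 1000 }, // assumed retry delay in ms
      removeOnComplete: true,
    },
  }));
}
```

With a real queue, the result could be handed over via `await queue.addBulk(jobs)`. In a NestJS service the queue instance would typically come from the `@nestjs/bullmq` package through `@InjectQueue`.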

Over two sprints (4 weeks) we completely migrated from the old Lambda-based calculation pipeline to a two-server architecture. The first server already existed and now just queues up jobs in BullMQ. The second server is only responsible for processing the queued jobs: it queries the relevant data from the database and calculates prices with it. With this approach we are able to perform one single big database query that fetches all the relevant data, which reduces the database load drastically. So far we have not had to scale this system up, but BullMQ is built in such a way that we can simply start any number of calculation server instances. Combined with the auto scaling feature from AWS, scaling things up should be easy.
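The processing side can be sketched in a similar way. The function below has the general shape of a BullMQ processor: it receives a job, reads the payload from `job.data` and returns a result. The data access is injected, so the example runs without a database. `makePriceProcessor`, `LoadProducts`, the row shape and the price formula are placeholders, not our real calculation logic.

```typescript
// Hypothetical row shape returned by the single big query.
interface ProductRow {
  id: number;
  basePrice: number;
  factor: number;
}

// Minimal view of a queued job: BullMQ jobs expose their payload as `data`.
interface JobLike<T> {
  data: T;
}

// Loads all rows for the given IDs in one round trip, for example with a
// single `WHERE id = ANY($1)` query. The implementation is injected.
type LoadProducts = (ids: number[]) => Promise<ProductRow[]>;

function makePriceProcessor(loadProducts: LoadProducts) {
  return async (
    job: JobLike<{ productIds: number[] }>,
  ): Promise<Record<number, number>> => {
    // One database query per job instead of one query per product.
    const rows = await loadProducts(job.data.productIds);

    const prices: Record<number, number> = {};
    for (const row of rows) {
      // Placeholder formula, rounded to two decimals.
      prices[row.id] = Math.round(row.basePrice * row.factor * 100) / 100;
    }
    return prices;
  };
}
```

A processor like this could be passed to BullMQ's `Worker` as `new Worker(queueName, processor, { connection })`. Starting the same worker in several server instances is what allows the calculation side to scale horizontally.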

Keeping track of queues and the jobs inside them is now also easy. BullMQ comes with a paid tool for that, and there are a number of open source tools as well. One of them is Bull Board, which we use for now to keep track of jobs and hunt down bugs.

All in all, we were able to reduce the costs for our customer by a factor of 10 (about $10k per month).

Sven

Software developer
