Problem
Our customer's project includes a complex price calculation pipeline that, soon after going live, was burning a significant amount of money on AWS resources. At its core was a job scheduled on a 15-minute grid using AWS Lambda functions and SQS. The implementation did not deliver the benefits we had anticipated and instead brought several drawbacks: degraded performance, an increased number of database queries, high AWS costs caused by concurrent access to the same database tables, and poor local testability.
Solution
To resolve these issues and, importantly, to reduce our customer's costs, we decided to introduce a dedicated queueing system, BullMQ. BullMQ is a Node.js library that implements a fast and robust queue system on top of Redis (in our case, Amazon ElastiCache for Redis).
Over two sprints, we switched from the old Lambda-based calculation pipeline to a two-server architecture. The first server now queues jobs with BullMQ, while the second is dedicated to processing the queued jobs and calculating prices. With this approach, we were able to significantly reduce the database load, which in turn substantially lowered our customer's costs.
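To make the producer side more concrete, here is a minimal sketch of how the first server might enqueue one job per product for each 15-minute slot. It is an illustration under stated assumptions, not our production code: the payload shape, the job name `calculate-price`, and the helpers `slotStart`, `jobIdFor`, and `enqueuePriceJobs` are all hypothetical. The producer is written against a small `JobQueue` interface instead of importing BullMQ directly; BullMQ's `Queue` has an `add(name, data, opts)` method, so a `Queue` instance should be usable in its place.

```typescript
// Minimal shape of the queue the producer needs. BullMQ's Queue offers
// add(name, data, opts), so a Queue instance should fit this interface.
interface JobQueue<T> {
  add(name: string, data: T, opts?: { jobId?: string; attempts?: number }): Promise<unknown>;
}

// Hypothetical payload; the real job data of the pipeline is not shown here.
interface PriceJob {
  productId: string;
  slotStart: string; // ISO timestamp of the 15-minute slot
}

const SLOT_MS = 15 * 60 * 1000;

// Round a timestamp down to the start of its 15-minute slot.
function slotStart(now: Date): Date {
  return new Date(Math.floor(now.getTime() / SLOT_MS) * SLOT_MS);
}

// Deterministic id: the same product and slot always yield the same job id,
// so a repeated enqueue for that slot can be recognised as a duplicate.
// "-" is used as separator because ":" is what BullMQ uses in its Redis keys.
function jobIdFor(productId: string, slot: Date): string {
  return `price-${productId}-${slot.getTime()}`;
}

// Enqueue one price calculation job per product for the slot containing `now`.
async function enqueuePriceJobs(
  queue: JobQueue<PriceJob>,
  productIds: string[],
  now: Date,
): Promise<string[]> {
  const slot = slotStart(now);
  const ids: string[] = [];
  for (const productId of productIds) {
    const jobId = jobIdFor(productId, slot);
    await queue.add(
      "calculate-price",
      { productId, slotStart: slot.toISOString() },
      { jobId, attempts: 3 }, // retry count is an arbitrary example value
    );
    ids.push(jobId);
  }
  return ids;
}
```

With BullMQ installed, the call would look roughly like `enqueuePriceJobs(new Queue("prices", { connection }), productIds, new Date())`, where the queue name `prices` and the Redis `connection` settings are again assumptions. Because BullMQ ignores a job whose id already exists in the queue, the deterministic id keeps a retried or overlapping scheduler run from enqueueing the same slot twice.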
One advantage of BullMQ is that it is designed to scale out easily: any number of worker instances can be started against the same queue. Combined with AWS auto-scaling, this makes the system quite resilient to heavy loads. And with tools like bull-board, monitoring queues and hunting down bugs has become a breeze.
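The worker side can be sketched in the same spirit. The example below assumes a hypothetical price rule (a base price plus a margin) purely for illustration, since the real calculation is not described here; `applyMargin`, `makeProcessor`, and the `BasePriceLookup` type are invented names. The processor only relies on the job's `data` field, which BullMQ's `Job` provides.

```typescript
// Minimal view of the job the processor needs; BullMQ's Job exposes `id` and `data`.
interface PriceJobLike {
  id?: string;
  data: { productId: string; slotStart: string };
}

interface PriceResult {
  productId: string;
  price: number;
}

// Hypothetical stand-in for wherever base prices come from (database, API, ...).
type BasePriceLookup = (productId: string) => Promise<number>;

// Hypothetical price rule: add a margin to the base price and round to cents.
function applyMargin(basePrice: number, marginRate: number): number {
  return Math.round(basePrice * (1 + marginRate) * 100) / 100;
}

// Build a processor function. Each worker instance runs this for every job it
// picks up; the queue hands each job to one worker at a time.
function makeProcessor(lookupBasePrice: BasePriceLookup, marginRate: number) {
  return async function processPriceJob(job: PriceJobLike): Promise<PriceResult> {
    const basePrice = await lookupBasePrice(job.data.productId);
    return { productId: job.data.productId, price: applyMargin(basePrice, marginRate) };
  };
}

// Wiring with BullMQ (not executed here; queue name and options are assumptions):
//   import { Worker } from "bullmq";
//   new Worker("prices", makeProcessor(lookupBasePrice, 0.25), { connection, concurrency: 5 });
```

Scaling out then means starting more processes or instances that create a `Worker` for the same queue name, while the `concurrency` option controls how many jobs a single worker handles in parallel.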