
Scalability and elasticity
Scalability has always been a primary factor when designing a solution. Ask any enterprise about its existing and new solutions, and most of the time you will find that it plans ahead for scalability. Scalability means giving your system the ability to handle growing workloads, and it can apply to multiple layers, such as the application server, web app, and database.
As most applications nowadays are web-based, let's also talk about elasticity. Elasticity is not only about growing your system by adding more capacity but also about shrinking it to save cost. With the adoption of the public cloud in particular, it has become easy to grow and shrink your environment quickly as the workload changes, and the term elasticity is replacing the term scalability.
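To make the grow-and-shrink idea concrete, here is a minimal sketch of how an elastic system might decide its instance count from the current load. The function name `desired_instances`, the per-instance capacity of 500 requests per second, and the minimum and maximum bounds are all illustrative assumptions, not values from any particular cloud provider.

```python
import math


def desired_instances(load_rps: float, per_instance_rps: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the instance count for the current load (hypothetical helper).

    The count grows when load rises and shrinks when it falls, but always
    stays between min_instances and max_instances.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    # Round up so that a partially loaded instance still counts as a full one.
    needed = math.ceil(load_rps / per_instance_rps)
    # Clamp to the configured bounds.
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    # Load rises and then falls; the instance count follows in both directions.
    for load in (1000, 2000, 400):
        print(load, "req/s ->", desired_instances(load, per_instance_rps=500), "instances")
```

Under these assumed numbers, the count goes from two instances to four as load doubles, and then drops to one when load falls. The upper bound caps cost, while the lower bound keeps the application available when traffic is low.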
Traditionally, there are two modes of scaling:
- Horizontal scaling: This is becoming increasingly popular as commodity compute has become much cheaper in the last decade. In horizontal scaling, the team adds more instances to handle increasing workloads:

For example, as shown in the preceding diagram, let's say your application is capable of handling 1,000 requests per second with two instances. As your user base grows, the application starts receiving 2,000 requests per second, which means you may want to double your application instances from two to four to handle the increased load.
- Vertical scaling: This has been around for a long time. It's where the team adds more compute, storage, and memory to the same instance to handle increasing workloads. As shown in the following diagram, during vertical scaling, you get a larger instance rather than adding new instances to handle the increased workload:

The vertical scaling model may not be cost-effective. When you purchase hardware with more compute and memory capacity, the cost rises steeply. You want to avoid vertical scaling after a certain threshold unless it is essential. Vertical scaling is most commonly used to scale relational database servers. However, a single server cannot grow beyond a certain memory and compute capacity, so once your database server hits the limits of vertical scaling, you need to think about database sharding.
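Database sharding splits the data across several database servers so that each one holds only a subset of the records. The following is a minimal sketch of one common approach, hash-based routing, and is not a complete sharding design. The shard names in `SHARDS` and the helper functions `shard_for_key` and `server_for_customer` are hypothetical and exist only for illustration.

```python
import hashlib

# Hypothetical shard names; a real deployment would hold connection details here.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a cryptographic hash rather than Python's built-in hash(), which is
    # randomized per process for strings and so is not stable across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def server_for_customer(customer_id: str) -> str:
    """Return the (hypothetical) database server that stores this customer."""
    return SHARDS[shard_for_key(customer_id, len(SHARDS))]


if __name__ == "__main__":
    for customer_id in ("customer-1", "customer-2", "customer-3"):
        print(customer_id, "->", server_for_customer(customer_id))
```

The same key always maps to the same shard, so reads and writes for one customer go to one server. A design note: with simple modulo routing like this, changing the number of shards remaps most keys, so this sketch assumes the shard count is fixed.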