
How to Scale Your Application from Zero to Millions of Users

Designing an application to serve a larger audience is an evolutionary journey. Let’s start this voyage together. We’ll begin with a single-user system and gradually evolve it to serve millions of users.

Single Server setup

1. The user accesses the website through a domain name.
2. DNS returns the IP address to the browser.
3. The browser sends the HTTP request to the web server at that IP address.
4. The web server returns an HTML page or another response.
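The four steps above can be sketched with Python's standard library. This is only an illustration, not how a browser is implemented: `HelloHandler`, `fetch` and `demo` are hypothetical names, and the "web server" is a throwaway local server on `localhost` so the example runs without Internet access.

```python
import http.client
import http.server
import socket
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical web server: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(domain, port):
    # Steps 1-2: resolve the domain name to an IPv4 address.
    ip = socket.getaddrinfo(domain, port, family=socket.AF_INET)[0][4][0]
    # Step 3: send the HTTP request to the web server at that IP address.
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    conn.request("GET", "/", headers={"Host": domain})
    # Step 4: read the response returned by the web server.
    response = conn.getresponse()
    body = response.read().decode()
    conn.close()
    return response.status, body


def demo():
    # Port 0 asks the OS for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("localhost", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the local server, resolves `localhost`, and returns the status code and HTML body of the response.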

As the user base grows, a single server won’t be sufficient to handle the load. Separating the web tier from the DB server allows the two to scale independently.

Which DB to use?

There are two major kinds of DBs:

1. Relational DB (MySQL, SQL Server, Oracle, PostgreSQL, etc.)

Relational DBs are well suited for the below use cases:

  1. Data is structured
  2. Transactions are important
  3. Normalization is required

2. Non-relational DB (MongoDB, CouchDB, Cosmos DB, DynamoDB, Cassandra, HBase, Neo4j, etc.)

NoSQL DBs are well suited for the below use cases:

  1. Data is unstructured
  2. We need to store a humongous amount of data
This setup works fine for an initial set of users. Suppose our business is doing great in the market: many users are now accessing the application and, with this setup, they are facing performance and outage issues. The solution is to scale the infrastructure, either vertically or horizontally.

Vertical scaling vs Horizontal Scaling

1. Vertical scaling (scale up): adding more power (CPU, RAM) to a server instance.

  • Vertical scaling is a good solution for low traffic, but it has some limitations
  • It is impossible to add infinite CPU and RAM to a server
  • It is a single point of failure. Failover isn’t possible with this setup, so a server failure results in an application outage
  • The solution to this problem is horizontal scaling

2. Horizontal scaling (scale out): adding multiple server instances.

  • With multiple servers we can serve more requests than a single-server setup can
  • But we still have challenges
  • How do we decide which server to route each request to, so that all servers are utilized effectively?
  • How do we keep track of each request and its response?
  • The solution to these challenges is the load balancer

Load balancer (Layer 7)

It efficiently distributes incoming traffic among the backend servers with the help of an algorithm (e.g. round robin, least connections, least time, IP hash).

- Here, we have achieved horizontal scaling
- We have added an LB (load balancer) and a second backend server. With this setup we have resolved the failover issue and improved system availability
- If traffic grows and 2 backend servers are not enough to handle it efficiently, we can add more backend servers to the web server pool and the LB will automatically start routing requests to them
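To make the algorithms named above more concrete, here is a minimal sketch of three of them (round robin, least connections and IP hash). The class names and server names such as `web1` are made up for illustration. A real load balancer also does health checks, timeouts and connection handling, none of which is shown here.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Hands out servers one after another, in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server that currently has the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # a new connection is now open
        return server

    def release(self, server):
        self.active[server] -= 1  # the connection has finished


class IPHashBalancer:
    """Maps a client IP to the same server as long as the pool is unchanged."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Adding a backend server to the pool is then just a matter of constructing the balancer with a longer server list. Note that with IP hash, changing the pool size remaps many clients to different servers.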

Congratulations! We have successfully scaled our application from 1K to 10K users.

With the above setup, the web tier is scaled, but since we are using only one DB, a DB outage is still possible 😐.

DB replication

“Storing copies of data at more than one site, node or DB to improve the availability of data”

We can set up DB servers in a master (main) / slave (copies) relationship. The master handles write operations (DML) and the slaves handle read operations (SELECT). Many applications perform far more reads than writes, so there are usually more slave DBs than master DBs.

With this setup, we can achieve the following:

  1. Better performance: segregating the load (read/write) among DB servers allows more queries to be processed in parallel
  2. HA (high availability): our application stays available in both of the scenarios below

    a. Master is offline: if the master DB server is down, one of the slave servers is promoted to master. In production, promoting a new master can be complicated because the data in the slave might not be up to date. The missing data can be synced using data recovery scripts. Another approach is to set up a different replication method such as multi-master or circular replication, though each has its own pros and cons.

    b. Slaves are offline: the master handles both read and write operations temporarily. Once a slave is back online, everything returns to its normal state

  3. Reliability: in case of an outage or disaster we can minimize data loss, since data is replicated across multiple locations
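The read/write split and the two failure scenarios can be sketched as a small router. This is a simplified illustration only: `ReplicatedDB` and the server names are hypothetical, a statement is treated as a read simply because it starts with `SELECT`, and the data syncing that a real promotion needs is left out.

```python
import itertools


class ReplicatedDB:
    """Decides which DB server should receive a given SQL statement."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._counter = itertools.count()

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.slaves:
            # Spread reads across the slaves in rotation.
            return self.slaves[next(self._counter) % len(self.slaves)]
        # Writes always go to the master; so do reads when no slave is up.
        return self.master

    def slave_failed(self, name):
        self.slaves.remove(name)

    def master_failed(self):
        # Promote the first slave. Catching up on missing data is NOT shown.
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)
        return self.master
```

For example, with one master and two slaves, an `INSERT` is routed to the master while successive `SELECT` statements alternate between the slaves.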

Now we have scaled from 10,000 to 1,000,000 users 😎. But we can still improve the load time of our setup by adding a cache and a CDN.


Cache

A cache is temporary in-memory storage. When the application gets a request, it first checks the cache. If the data is available, it reads the data from the cache. If not, it queries the DB, stores the response in the cache and sends it back to the client. This caching strategy is called a read-through cache (when the application itself performs the lookup and fill, it is also commonly known as cache-aside).

With this, we get better system performance, reduced DB load and the ability to scale the cache independently.
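The lookup flow described above could look roughly like this. It is a sketch under stated assumptions: `ReadThroughCache` and `load_from_db` are hypothetical names, the store is a plain in-process dictionary rather than a real cache server, and a fixed time-to-live stands in for an expiration policy.

```python
import time


class ReadThroughCache:
    """Checks the cache first and falls back to the DB on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load_from_db = load_from_db  # hypothetical function that queries the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no DB query
        # Cache miss or expired entry: query the DB and remember the answer.
        value = self._load_from_db(key)
        self._store[key] = (value, now + self._ttl)
        return value
```

Repeated calls to `get` with the same key reach the DB only once until the entry expires, which is where the reduction in DB load comes from.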

Points to consider

1. Expiration policy

2. Data Consistency

3. Mitigating failure

4. Eviction Policy

[Note: There are several more considerations to explore when using a cache, which we’ll not be discussing in this blog.]

CDN (Content Delivery Network)

A CDN is a network of geo-distributed servers used to deliver static content (e.g. images, videos, CSS and JS files). A CDN allows quick transfer of assets, which ultimately results in faster HTML page rendering.

We can leverage the below advantages with the help of a CDN:

  1. Improved page rendering time
  2. Reduced bandwidth costs
  3. Increased content availability & redundancy
  4. Improved application security: a properly configured CDN may help protect the web application from DDoS attacks.

There are a few considerations we need to think through while configuring a CDN:

  1. Cost
  2. Cache expiry
  3. CDN fallback
  4. Invalidating files
  5. Use object versioning to serve a different version of the object
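One common way to do object versioning is to put a fingerprint of the file's content into its name, so that a changed file gets a new URL and no explicit invalidation is needed. The sketch below assumes this naming scheme. `versioned_url` and the CDN host `cdn.example.com` are made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_url(cdn_base, path, content):
    """Builds a CDN URL whose file name embeds a short hash of the content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    original = PurePosixPath(path)
    # css/site.css becomes css/site.<digest>.css
    versioned = original.with_name(f"{original.stem}.{digest}{original.suffix}")
    return f"{cdn_base}/{versioned}"
```

For instance, `versioned_url("https://cdn.example.com", "css/site.css", css_bytes)` returns the same URL for as long as `css_bytes` is unchanged, and a different URL as soon as the stylesheet is edited.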

So far so good, our application is scalable, available and also rendering at a faster speed.

Stateful Architecture

Stateful services keep track of sessions or transactions and react accordingly based on history.

- Consider a scenario where Users A and B are trying to access the application and the LB sends User A to Web Server 1 and User B to Web Server 2. Their individual user information (principal), claims and auth details are stored on the respective servers.
- If at any point User A’s request is routed to Web Server 2, authentication will fail because Web Server 2 doesn’t have User A’s session data.

Issue: Every request from the same client must be routed to the same server.

Solution: This can be resolved using sticky sessions at the LB (load balancer).

But this adds overhead to the system. Adding or removing servers is much more difficult with this approach.

With the help of a stateless architecture, this issue can be mitigated to an extent.

Stateless Architecture

Servers don’t maintain or share session data. Every HTTP request contains the minimal information about the user that the server needs to authenticate them and serve the resource. In this case, the server relies on the client to store state information using cookies, local storage or cache. The server can now get rid of redundant functionality, and we can auto-scale the web tier independently.
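One way a request can carry its own user information is a signed token that any web server can verify without looking up session data. The sketch below is a simplified stand-in for schemes such as JWT, not a production design. `SECRET`, `issue_token` and `verify_token` are hypothetical, and the example assumes every web server is configured with the same secret.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: the same secret is deployed to every web server


def issue_token(claims):
    """Encodes the claims and appends an HMAC signature over them."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token):
    """Returns the claims if the signature is valid, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None  # the token was tampered with or signed with another secret
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because verification needs only the token and the shared secret, User A's request can land on Web Server 1 or Web Server 2 and be authenticated either way.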

Now with this, we can cater to international users. To improve availability and provide a better user experience across wider geo-locations we need to set up multiple availability zones.

Availability Zones

We need to set up availability zones close to our users or markets so that we can geo-route them to the application at blazing speed.

In the event of a significant availability zone outage, we route all the traffic to the nearest healthy availability zone.

For example, if AZ (India) is offline, we can route 100% of its traffic to AZ (Singapore).
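The routing rule can be sketched as a lookup over a preference list per user region. The function name `pick_zone`, the zone names and the preference table are illustrative assumptions. In practice this decision is usually made by geo-aware DNS or a global load balancer rather than application code.

```python
def pick_zone(user_region, zones_by_preference, healthy_zones):
    """Returns the closest availability zone that is currently healthy."""
    for zone in zones_by_preference[user_region]:
        if zone in healthy_zones:
            return zone
    raise RuntimeError(f"no healthy availability zone for {user_region}")


# Hypothetical preference table: closest zone first, fallback next.
ZONES_BY_PREFERENCE = {
    "india": ["az-india", "az-singapore"],
    "singapore": ["az-singapore", "az-india"],
}
```

With both zones healthy, users in India are served from `az-india`. If `az-india` drops out of the healthy set, the same call returns `az-singapore`.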

Below are the technical challenges that must be resolved to achieve multi-AZ:

1. Data Synchronization

2. Traffic redirection

3. Test and Deployment

4. Data compliance and governance

This solution is good enough to cater to millions of users. But to further scale this system we need to decouple different components of the system so they can be scaled independently.


We discussed a lot about the scalability and availability of the system. When running such a system, we also need to set up observability:

  1. To troubleshoot issues and failures
  2. Logging
  3. Monitoring
  4. Metrics setup
  5. Automation
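As a small illustration of logging and metrics working together, the sketch below wraps a function so that every call is timed, counted and logged. The `observed` decorator and the in-process `metrics` counter are hypothetical. A real setup would typically ship these numbers to a dedicated monitoring system instead of keeping them in memory.

```python
import logging
import time
from collections import Counter
from functools import wraps

logger = logging.getLogger("app")
metrics = Counter()  # in-process counters, for illustration only


def observed(func):
    """Logs the duration of each call and counts successes and errors."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            metrics[f"{func.__name__}.success"] += 1
            return result
        except Exception:
            metrics[f"{func.__name__}.error"] += 1
            logger.exception("%s failed", func.__name__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)

    return wrapper
```

Decorating a request handler with `@observed` then gives a success count, an error count and a latency log line per call, which is the raw material for monitoring and troubleshooting.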


Scaling a system is a continuous process. If we want to serve millions of users we need to decouple, fine-tune and optimize our components. Stay tuned for the next blog on system design, where I’ll be discussing more system design concepts.

