Programming Software Architecture

From 0 to Millions: A Guide to Scaling Your App

In this article, we’ll discuss a typical architectural evolution of a website/app, and how/why we make technical choices at different stages to Scaling Your App from 0 to Millions.

  • Do we build a monolithic application at the beginning?
  • When do we add a cache?
  • When do we add a full-text search engine?
  • Why do we need a message queue?
  • When do we use a cluster?

In the first two segments, we examine the traditional approach to building an application, starting with a single server and concluding with a cluster of servers capable of handling millions of daily active users. The basic principles are still relevant.

In the final two segments, we examine the impact of recent trends in cloud and serverless computing on application building, how they change the way we build applications, and provide insights on how to consider these modern approaches when creating your next big hit.

Let’s examine how a typical startup used to build its first application. This basic approach was common up until about 5 years ago. Now with serverless computing, it is much easier to start an application that could scale to tens of thousands of users with very little upfront investment. We will talk about how we should take advantage of this trend later in the series.

For now, we will dive into how an application (Llama) is traditionally built at the start. This lays a firm foundation for the rest of our discussion.

Llama 1.0 – Monolithic Application, Single Server

In this first architecture, the entire application stack lives on a single server.

The server is publicly accessible over the Internet. It provides a RESTful API to handle the business logic and for the mobile and web client applications to access. It serves static contents like images and application bundles that are stored directly on the local disk of the server. The application server is connected to the database which also runs on the same server.

With the architecture, it could probably serve hundreds, maybe even thousands, of users. The actual capacity depends on the complexity of the application itself.

When the server begins to struggle with a growing user load, one way to buy a bit more time is to scale up to a larger server with more CPU, memory, and disk space. This is a temporary solution, and eventually, even the biggest server will reach its limit.

Additionally, this simple architecture has significant drawbacks for production use. With the entire stack running on a single server, there is no failover or redundancy. When the server inevitably goes down, the resulting downtime could be unacceptably long.

As the architecture evolves, we will discover how to solve these operational issues.

Llama 5.0 – Add Cache

After implementing the primary-replica architecture, most applications should be able to scale to several hundred thousand users, and some simple applications might be able to reach a million users.

However, for some read-heavy applications, primary-replica architecture might not be able to handle traffic spikes well. For our e-commerce example, flash sale events like Black Friday sales in the United States could easily overload the databases. If the load is sufficiently heavy, some users might not even be able to load the sales page.

The next logical step to handle such situations is to add a cache layer to optimize the read operations.

Redis is a popular in-memory cache for this purpose. Redis reduces the read load for a database by caching frequently accessed data in memory. This allows for faster access to the data since it is retrieved from the cache instead of the slower database. By reducing the number of read operations performed on the database, Redis helps to reduce the load on the database cluster and improve its overall scalability. As summarized below by Jeff Dean et al, in-memory access is 1000X faster than disk access.

For our example application, we deploy the cache using the read-through caching strategy. With this strategy, data is first checked in the cache before being read from the database. If the data is found in the cache, it is returned immediately, otherwise, it is loaded from the database and stored in the cache for future use.

There are other cache strategies and operational considerations when deploying a caching layer at scale. For example, with another copy of data stored in the cache, we have to maintain data consistency. We will have a deep dive series on caching soon to explore this topic in much greater detail.

There is another class of application data that is highly cacheable: the static contents for the application, such as images, videos, style sheets, and application bundles, which are infrequently updated. They should be served by a Content Delivery Network (CDN).

A CDN serves the static content from a network of servers located closer to the end user, reducing latency, and improving the loading speed of the web pages. This results in a better user experience, especially for users located far away from the application server.

Llama 6.0 – DB Sharding

A cache layer can provide some relief for read-heavy applications. However, as we continue to scale, the amount of write requests will start to overload the single primary database. This is when it might make sense to shard the primary database.

There are two ways to shard a database: horizontally or vertically.

Horizontal sharding is more common. It is a database partitioning technique that divides data across multiple database servers based on the values in one or more columns of a table. For example, a large user table can be partitioned based on user ID. It results in multiple smaller tables stored on separate database servers, with each handling a small subset of the rows that were previously handled by the single primary database.

Vertical sharding is less common. It separates tables or parts of a table into different database servers based on the specific needs of the application. This optimizes the application based on specific access patterns of each column.

Database sharding has some significant drawbacks.

First, sharding adds complexity to the application and database layers. Data must be partitioned and distributed across multiple databases, making it difficult to ensure data consistency and integrity.

Second, sharding introduces performance overhead, increasing application latency, especially for operations that require data from multiple shards.

In the first two parts of this series, we explored the traditional approach to building and scaling an application. It started with a single server that ran everything, and gradually evolved to a microservice architecture that could support millions of daily active users.

In the final two parts of this series, we examine the impact of recent trends like cloud and serverless computing, along with the proliferation of client application frameworks and the associated developer ecosystem. We explore how these trends alter the way we build applications, especially for early-stage startups where time-to-market is critical, and provide valuable insights on how to incorporate these modern approaches when creating your next big hit.

Recent Trends

Let’s start by briefly explaining these computing trends we mentioned.

The first trend is cloud computing. Cloud computing, in its most basic form, is running applications on computing resources managed by cloud providers. When using cloud computing, we do not have to purchase or manage hardware ourselves.

The second trend is serverless computing. Serverless computing builds on the convenience of cloud computing with even more automation. It enables developers to build and run applications without having to provision cloud servers. The serverless provider handles the infrastructure and automatically scales the computing resources up or down as needed. This provides a great developer experience since developers can focus on the application code itself, without having to worry about scaling.

The third trend rides on the waves of the first two. It is the proliferation of the client application frameworks and the frontend hosting platforms that make deploying these frontend applications effortless.

Modern Frontend Frameworks and Hosting Platforms

Let’s investigate the third trend a bit more.

In recent years, there has been a significant shift in the way web applications are built. Many of today’s popular applications are what is called a Single Page Application (SPA).

A SPA provides a more seamless user experience by dynamically updating the current page instead of loading a new one each time the user interacts with the application. In a SPA, the initial HTML and its resources are loaded once, and subsequent interactions with the application are handled using JavaScript to manipulate the existing page content.

The traditional way of building web applications, like the one we discussed previously for our e-commerce company Llama, involved serving up new HTML pages from the server every time the user clicked on a link or submitted a form. This conventional model is referred to as a Multi-Page Application (MPA). Each page request typically involves a full page refresh, which could be slow and sometimes disruptive to the user experience.

In contrast, a SPA loads the application’s initial HTML frame and then makes requests to the server for data as needed. This approach allows for more efficient use of server resources. The server is not constantly sending full HTML pages and can focus on serving data via a well-defined API instead.

Another benefit of the API-centric approach of a SPA is that the same API is often shared with mobile applications, making the backend easier to maintain.

SPAs are often built using JavaScript frameworks like React. These tools provide a set of abstractions and tools for building complex applications that are optimized for performance and maintainability. In contrast, building an MPA requires a more server-centric approach, which can be more challenging to scale and maintain as the application grows in complexity.

The rise in popularity of these client frameworks brought a wide array of production-grade frontend hosting platforms to the market. Some popular examples include Netify and Vercel. Major cloud providers have similar offerings.

These hosting platforms handle the complexity of building and deploying modern frontend applications at scale. Developers check their code into a repo, and the hosting platforms take over from there. They automatically build the web application bundle and its associated resources and distribute them to the CDN.

Since these hosting platforms are built on the cloud and serverless computing foundation, using best practices like serving data at the edge close to the user, they offer practically infinite scale, and there is no infrastructure to manage.

This is what the modern frontend landscape looks like. The frontend application is built with a modern framework like React. The client application is served by a production-grade hosting platform for scale, and it dynamically fetches data from the backend via a well-defined API.

Modern Backend Options

As mentioned in the previous section, the role of a modern backend is to serve a set of well-defined APIs to support the frontend web and mobile applications.

What are the modern options for building a backend? The shift is similarly dramatic.

Let’s see what a small, resource-constrained startup should use to build its backend, and see how far such a backend could scale.

When time-to-market is critical and resources are limited, a startup should offload as much non-core work as possible. Serverless computing options are attractive, for the following reasons:

  • Serverless computing manages the operational aspects of the backend, such as scaling, redundancy, and failover, freeing the startup team from managing infrastructure.
  • Serverless computing follows a cost-effective pay-per-use pricing model. There is no up-front commitment.
  • Serverless computing allows developers to focus on writing code and testing the backend without worrying about managing servers, leading to shorter time to market.

What Is App Scaling and Why It Matters

Leave a Reply Cancel reply