Scaling is hard. It's the mantra of lots of startups, it's a sales point for software vendors, and it's true.
It wasn't always. Up to a few years ago, scalability meant your application could handle a few extra people on a sunday night. Nowadays, your app can go viral and have a million new users overnight. The meaning of scalability has changed, and so has the impact of a badly scalable application. There are horror stories about startups that couldn't handle the sudden load and went bust when their application died. There is nothing worse for your reputation than when your website goes down the moment a lot of people want to use it. Suddenly, being able to build something properly scalable is a marketable skill, seen by some as more important than building for correctness or robustness.
Scaling is hard for a reason. Not enough people have done it yet. Sure, there's the Facebooks, Microsofts, Amazons, and Twitters of the world, but ask an average programmer (or architect, for that matter), and they'll look at you with a puzzled expression. It's hard because we're at the start of it all, and the knowledge we have hasn't spread far enough yet. This post hopes to remedy at least a bit of that. I intend to give a few tips and things to think about, in the hopes that you will be able to apply them when needed. They're not technology-specific, or even paradigm-specific, because the rules of scalability are the same no matter what you write. So without further ado...
1. Bottlenecks are everywhere
And they're never where you expect them to be. If your userbase doubles, what part of the application will be first to give? Most will say "the load on the CPU", "not enough memory", "my database might crash", etc. In some cases, that's true. More likely though, is that you'll run into other limits of your system. Maybe it'll be hard drive speed, leading to write and read faults. Maybe it's the network speed. If your application is split over multiple machines, are you sure all those machines can handle the number of requests that might come in? How many requests can your webserver handle per second, and how many can it queue? Is your cache big enough, and is the invalidation system for it robust enough?
It doesn't matter how many boxes you tick, at some point something will give. The clue here is, know which one is most likely to break before you scale. Do load tests, and try to simulate proper user behavior. A load test where you just hit a blank page on the web server is useless if your database cracks. Test thoroughly, and test often. Every major addition to the software, hardware or infrastructure should prompt a new battery of load tests. Call this "scalability hygiene".
Once you know what your primary bottleneck is, fix it and move on. Then fix your secondary and your tertiary, and so on until you're satisfied the application will stay up and running. Don't do this until you're sure about where the problems lie. Remember premature optimization is the root of all evil.
2. Robustness is just as important
The more users you have, the more load you have, and the more machines you need, the more likely it is that something will go wrong. Once you hit a certain scale, it's a guarantee that there will always be some percentage of your infrastructure that is in a failure mode. It's your job to handle these cases gracefully. Google has mastered this, their core functionality keeps working no matter what goes on in their data centers. Some things might run slower, specific (tiny) parts might not work, but the bulk of Google's services almost never goes down.
Endeavor to do the same. Here's a thought experiment: take a UML diagram of your application's architecture on a whiteboard. Take a big red pen, and cross out one of the blocks. Now go through it all, what happens to the other blocks? How much functionality can you still provide without it? What steps do you need to take to ensure you can provide that functionality? Are there single points of failure, blocks that will kill the entire application if they fail? Can they be avoided? Then do the same for 2 blocks, and 3.
Consider using a Chaos Monkey. If none exist for your platform or infrastructure, write a simple one. Don't ever turn it off. It's a great tool for learning where potential disaster are situated, and it's unparalleled for learning about robustness.
3. Load balancers are stupid
I don't mean they are a bad idea. A load balancer is the first part of your application any user will hit, and it is absolutely vital. Love your load balancer, tune it and take good care of it. What I mean is that they are not very intelligent. Because of the amount of data they're supposed to handle, they can be a blindingly stupid part of your infrastructure. The more intelligence you expect from it, the slower it will be, and the more likely that it will end in tears. It would also mean your application is harder to migrate to a different one, since you would need to be sure that the new LB can handle the exact same logic yours does now.
Keep it as dumb as possible. All a load balancer has to do is pick a machine to send a request to, anything else is a risk. This is doubly important if you're using one of the main cloud hosts. All of them have built-in load balancers, that can handle staggering amounts of requests, but that are dumb as nails. Forget doing sticky sessions, forget advanced routing tables or content inspection, they are fire and forget. Build your application around that, by making sure every server on the LB can handle every request.
3. Define a scaling unit
If you carry nothing else from this, remember this heading. A scaling unit is a vertical slice of your application on an infrastructure level. It contains everything, from your front-end server, to your database instance, to your caching and queuing servers. It's the smallest unit of your application that can provide all the functionality, with maximum usage of your resources. In simple applications, this will be a web server and a database server (or instance, for cloud hosts). In advanced applications, the smallest slice might be 15 servers and 5 switches. In Windows Azure Storage, a scaling unit ('stamp') is 10-20 racks of servers with 20PB of storage space. Ideally, under the maximum load the unit can handle, everything in it will be at capacity, with your current primary bottleneck just about holding on. Keep the unit as small as possible, because the smaller it is, the finer you can tune your scale to your needs.
Once you have defined your scaling unit, perform load testing on it, the same testing as above. Make sure you know exactly when your scaling unit is at capacity, when it can handle more and when it will crack, and what its behavior is in the intervals. Most importantly, know exactly how much users your scaling unit can handle. Congratulations, you've defined a minimum scaling step. You now have a template that you can duplicate to scale.
Once you have this down, scaling becomes increasingly easy. If you get close to the tipping point of your first scaling unit, duplicate it. You now have 2 separate instances of your application running side by side, either behind a common load balancer or a more advanced traffic manager. Your application can now handle twice the number of users. Still not enough? Add another. If you've done your testing and building properly, every duplicate of the scaling unit you add increases your maximum userbase by the exact amount your testing showed. It won't matter if your have just one instance of your scaling unit, or a million, it'll be perfectly predictable.
The difficult part in this is making sure data gets to where it belongs. This depends a lot on your application structure. Some share the data among their scaling units through database syncing, others split the units over locations and let a traffic manager send users to the right place. Still others segregate with subdomains. There is no perfect solution, but they all fit an important criterium: they depend as little as possible on the load on the units. What good is having an application that scales perfectly and predictably, but mysteriously give the wrong data after a certain number of scaling units? Make sure that the way you manage your units doesn't ever become a bottleneck.
Don't bother. Not at first. It's very tempting to make scaling an automatic step that you don't have to worry about, but unless you've chosen manually when to scale up and down for a while, you won't know when to do it. Scaling too late can be a problem, but scaling the wrong way or scaling too early can end up costing a lot more money. Start doing it manually, keeping an eye on your metrics, your userbase and your logging, and learn when the best moments are and what the criteria are for needing to scale up. Learn the behavior of your users, and try to divine rules that you can apply so you can anticipate when you will need to do a big or small scaling action.
5. Don't forget to scale down
Last but not least. Not many applications grow in only one direction. Most have peaks where they serve a large number of requests and quiet moments where all is calm. Some grow, only to decline again after a long enough time. Often, your application might need to scale because users are using it differently than you thought. Remember to scale down when the load goes down. If your users' focus has changed, go back and redefine your scaling unit. Maybe you need less of server X and more of server Y in your unit, to handle the most representative load. It might save you a lot of money. And after all, money is the reason you don't just buy enough hardware to handle everybody on the planet.
And many more
I could keep going, but I've taken up enough of your time. There's a million more tricks to do and things to think about, but keep these things in mind and hopefully you'll be able to build an application from scratch, and be confident it will be ready for whatever the future throws at it.