Congratulations to you, budding entrepreneur! You’ve successfully navigated the difficult, even emotional process of slimming down your grand product vision into a lean, focused MVP and, more importantly, have gotten it built and launched. While you haven’t yet gone full tilt on marketing and customer acquisition, you are starting to see some traction, maybe some organic growth, or other evidence supporting your product hypothesis. Mazel tov!
Understandably this is an exciting and extremely busy time for you. You’re gathering feedback, establishing a plan for growth, figuring out the rubric of marketing, monetization strategies, pricing models, and all that. There are still bugs to be fixed and basic maintenance to be performed, analytics to be gathered and, well — analyzed. But for the most part, the storm of technical execution has died down. You want to be sufficiently armed with data before coming up with too many drastic changes or new features. Sounds like you’re doing everything right.
But is your app really ready to handle the growth you’re striving for?
Understandably, you probably haven’t spent much time yet obsessing over average request processing times, total request throughput, or doing load testing on your young MVP. You knowingly and rationally incurred some technical debt by not spending valuable development time on fully optimizing an app that will take time to accrue users.
But now you need to accrue users. Hopefully loads of them. I think people in this situation tend to think “OK, I’ll just need to get more stuff.” But the truth is, scaling horizontally isn’t always the solution. Sure, at some point, most successful apps will need to do it. There’s a theoretical limit, after all, to how much work one unit of computing can do at any given time. And in certain scenarios (Heroku comes to mind) scaling horizontally can be a quick, temporary solution to unpredicted traffic spikes. But it isn’t a magic bullet; it comes with its own set of caveats. For one, increasing the number or size of one type of resource might not make any difference in overall performance if the bottleneck is somewhere else. And sometimes scaling one layer in the stack can actually be deleterious. More concurrency can mean more contention for resources, and synchronizing access to those resources can be expensive. I’ve worked on systems where increasing the number of processes concurrently writing to a content store actually decreased data throughput into it. So much time was spent managing who could write, and when, that performance degraded overall.
There’s good news, though. There are likely massive improvements to be made in app performance post-MVP with relatively little development and architectural effort. Focusing now on how much work each unit can do (what I’m loosely calling vertical scalability) means that when you do scale horizontally, you’ll get much more bang for your buck, as each unit you add is capable of doing much more work. But before running headlong into your codebase to hack, you should spend some time identifying problem areas. Scour your logs for types of requests that have high average or extremely variable response times. What’s high? It depends on who you ask, but anything that on average is taking hundreds of milliseconds, instead of tens, is a good place to start. Chances are those operations have the most room for improvement. Once problem areas are identified, here are some things you may find that, once addressed, might confer huge benefits.
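Before getting to the list, here is a minimal sketch of the log-scouring step described above. It assumes a made-up log format of "METHOD path status duration_ms" per line; real logs will look different, so the parsing, the sample data, and the 100 ms threshold are illustrative assumptions only. It groups requests by endpoint and reports the mean and spread of response times, slowest first.

```python
import statistics
from collections import defaultdict

# Hypothetical log format: "<METHOD> <path> <status> <duration_ms>"
SAMPLE_LOG = """\
GET /home 200 12
GET /home 200 15
GET /reports 200 480
GET /reports 200 1250
GET /reports 200 310
POST /comments 201 45
POST /comments 201 38
"""

def summarize(log_text):
    """Return {endpoint: (mean_ms, stdev_ms, count)} for each endpoint."""
    durations = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that don't match the assumed format
        method, path, _status, ms = parts
        durations[f"{method} {path}"].append(float(ms))
    summary = {}
    for endpoint, values in durations.items():
        mean = statistics.mean(values)
        # Population stdev also works when an endpoint has a single sample.
        spread = statistics.pstdev(values)
        summary[endpoint] = (mean, spread, len(values))
    return summary

def slow_endpoints(summary, threshold_ms=100.0):
    """Endpoints whose mean exceeds the threshold, slowest first."""
    slow = [(ep, stats) for ep, stats in summary.items() if stats[0] > threshold_ms]
    return sorted(slow, key=lambda item: item[1][0], reverse=True)

if __name__ == "__main__":
    for endpoint, (mean, spread, count) in slow_endpoints(summarize(SAMPLE_LOG)):
        print(f"{endpoint}: mean={mean:.0f}ms stdev={spread:.0f}ms n={count}")
```

An endpoint with a modest mean but a large spread is worth a look too, since it usually means some requests are hitting a slow path.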
1) Serving static content
I know what you’re thinking: “I already have Apache or nginx serving my static assets, so it doesn’t affect my app performance.” Nonsense! Why is your app bothering with static content at all? Add a memory-based reverse proxy cache (like Varnish) or use a CDN. Apache and nginx typically read assets from disk. In a typical LAMP configuration, this is the same disk that your datastore is reading from and writing to. And even though disks and operating systems do their own caching, performance will rarely approach what Varnish can muster. A CDN goes a step further and keeps static asset requests away from your application almost entirely, and has the added benefit of responding from a location geographically close to the client. There may also be requests that are handled by your app server but whose responses change infrequently. Obviously, these too can, and should, be cached in the same way as static assets.
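Reverse proxy caches and CDNs generally decide what to keep, and for how long, based on HTTP caching headers such as Cache-Control. As a rough sketch of how an app might mark responses as cacheable, here is a small WSGI middleware. The "/static/" prefix, the one-day lifetime, and the demo app are assumptions made up for illustration; your proxy or CDN may also need its own configuration.

```python
# Assumed (hypothetical) URL prefix and cache lifetime; adjust to your app.
STATIC_PREFIX = "/static/"
ONE_DAY = 86400

def cache_headers(app, prefix=STATIC_PREFIX, max_age=ONE_DAY):
    """WSGI middleware: mark 200 responses under `prefix` as publicly cacheable."""
    def wrapped(environ, start_response):
        cacheable = environ.get("PATH_INFO", "").startswith(prefix)

        def patched_start(status, headers, exc_info=None):
            if cacheable and status.startswith("200"):
                # Drop any existing Cache-Control header, then add ours.
                headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
                headers.append(("Cache-Control", f"public, max-age={max_age}"))
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)

        return app(environ, patched_start)
    return wrapped

def demo_app(environ, start_response):
    """Tiny stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

def call(app, path):
    """Invoke a WSGI app once and return (status, headers_dict, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call(cache_headers(demo_app), "/static/logo.png")` returns headers that include the Cache-Control entry, while a path outside the prefix is left untouched. The same idea applies to app-generated pages that change infrequently, usually with a shorter lifetime.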
2) 100% synchronous database writes
In your app there are likely places where you are doing synchronous database writes on the responder thread, even though it is perfectly fine if the data isn’t written until later. Why is your user waiting for your system to do something that doesn’t affect their experience? Send the data to an in-memory, memcache-based, or Redis-based asynchronous queue for processing. In fact, any operation that isn’t critical to responding adequately to a request should be processed asynchronously, especially sending transactional emails or doing data aggregation.
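To illustrate the shape of this, here is a minimal sketch using only an in-process queue and a background thread from the Python standard library. The function names, the event fields, and the list standing in for a database table are all hypothetical. An in-process queue loses its contents if the process dies, which is one reason a Redis-backed queue is often preferred in practice.

```python
import queue
import threading

write_queue = queue.Queue()
stored_events = []  # stand-in for a database table

def save_event(event):
    """Placeholder for a slow, synchronous database write."""
    stored_events.append(event)

def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        event = write_queue.get()
        try:
            if event is None:
                return
            save_event(event)
        finally:
            write_queue.task_done()

def handle_request(user_id, page):
    """Responder: enqueue the non-critical write and return right away."""
    write_queue.put({"user_id": user_id, "page": page})
    return f"rendered {page}"

def run_demo():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    responses = [handle_request(1, "/home"), handle_request(1, "/pricing")]
    write_queue.join()     # wait for queued writes (only needed for the demo)
    write_queue.put(None)  # tell the worker to stop
    thread.join()
    return responses
```

The point is that `handle_request` never waits on `save_event`; the user gets a response as soon as the event is queued.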
3) Using SQL databases for everything
There is a range of persistence options available to your app. Don’t assume you can only use one at a time, or that it has to be a SQL database. Take high-frequency, transient, non-relational, non-transactional data interactions and move them to Redis, Memcache, Mongo, or any other such store that can comfortably outperform Postgres and MySQL on write and/or query performance for that kind of data.
4) Not properly indexing tables/collections
This should go without saying, but properly designed database indexes drastically increase query performance on large tables/collections. You may not notice it at first, but as your data grows, indexes can keep query performance close to constant, where an unindexed query slows down in proportion to the size of the table. Of course, care should be taken not to over-index a particular table or collection, as every index has to be updated on each write and can negatively impact write performance.
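One way to check whether a query is actually using an index is to ask the database for its query plan. The sketch below uses SQLite from the Python standard library purely because it is easy to run; the table, column, and index names are made up, and Postgres, MySQL, and MongoDB each have their own explain facilities with different output.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10000)],
)

LOOKUP = "SELECT total FROM orders WHERE user_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, LOOKUP, (42,))

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, LOOKUP, (42,))

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
```

The exact wording of the plan varies by SQLite version, but the first plan should describe a scan of the table and the second should mention the index by name.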
5) Not using application level memory caches
You should use memcache to cache any data your app uses that does not need to be 100% real time. This is especially important if the data is stored in SQL databases. MongoDB query performance is closer to the ballpark of memcache key retrieval, but is still not quite as fast. Rails (and I’m sure other stacks as well) can also cache action/template output whole hog, which is useful if you still need to do request filtering on content that doesn’t change frequently.
6) Serving model/information heavy pages
Why are you responding to a request for a page with all the information that may possibly ever be presented on it? For heavy pages, respond with enough to present the UI to the user, and wait until the user intends to interact with a specific set of data before retrieving it via AJAX.
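On the server side, this usually means splitting one heavy response into a light initial payload plus small on-demand endpoints. The sketch below shows only that split, with invented section names and data; routing, authentication, and the client-side AJAX calls are left out.

```python
import json

# Hypothetical data for one dashboard page.
SECTIONS = {
    "orders": [{"id": 1, "total": 30.0}, {"id": 2, "total": 12.5}],
    "messages": [{"id": 7, "subject": "Welcome"}],
}

def page_shell():
    """Initial response: section names and counts, no row data."""
    outline = [{"name": name, "count": len(rows)} for name, rows in SECTIONS.items()]
    return json.dumps({"sections": outline})

def section_data(name):
    """Follow-up response, requested only when the user opens a section."""
    if name not in SECTIONS:
        return json.dumps({"error": "unknown section"})
    return json.dumps({"name": name, "rows": SECTIONS[name]})
```

The client renders the UI from `page_shell()` and calls the equivalent of `section_data("orders")` only when the user actually opens that section.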
7) Doing operations on your server that you could do on the client (especially third party API requests)
While sometimes it is necessary for your app to sign or otherwise provide authentication for an API request, generally it’s the client who is the authorized, trusted party, not your app. The Facebook, Google, YouTube, and Twitter APIs can all be called directly by the client once authorized. Oftentimes, no authorization is even required. Proxying these services is a waste of your app server’s time. There are many other operations that clients can capably perform, too: sorting, file processing using HTML5’s File API, you name it. Client computing power scales naturally: more clients means more available resources.
Once you’ve gotten your app nicely tuned with respect to these kinds of items, you should have some breathing room within your current architectural configuration for increases in load. Eventually, when you’re well funded and have full time, dedicated technical resources, you’ll probably want to move off of managed hosts like Heroku and on to your own infrastructure, where you can get substantially more computing resources for the money. Until then, I hope I’ve convinced you that there are many tools at your disposal to help bridge the gap when you are beginning to scale.