Twiddla has been getting a ton of attention this week. We picked up the Technical Achievement award at SXSW Interactive, and have been getting a bunch of good press ever since. 25,000 people have signed up for the service since the award was mentioned, with 7,500 of those signups happening in a single day. It's about to get good.
For me though, it's been even better. We're finally getting enough traffic to start thinking about scaling issues. You might remember an article that I wrote a few months back, where I told people not to sweat Performance and Scaling issues too much, but rather to focus on Readability, Debugability, Maintainability, and Development Pace. The idea was that getting your product to market quickly and being able to move fast if necessary are more important than having the Perfect Dream System that takes forever to build. Of course, the implied point was that when and if that Big Day came, you'd be able to move fast enough to deal with Scalability and Performance concerns as they appeared.
On March 12th, 2008, I got to see firsthand whether I was talking out my arse…
3/11/2008 7:00pm: 150 signups/hr, 50 hits/sec, 0-5% CPU
It's the day after the awards, and the first brief announcements are out. Traffic has been building steadily all day, but we've seen worse. The only crisis at the moment is that we don't yet have a Press Kit, so we're seeing writeups with the old logo and screenshots from the old UI. D'oh!
3/11/2008 11:00pm: 350 signups/hr, 120 hits/sec, 1-9% CPU
Japan wakes up. The Asian press really liked us, so we saw a big spike in users from China and Japan the first few days. The sandbox is pretty clogged, and with 30 people drawing simultaneously it's starting to tax people's browsers. Every once in a while, somebody navigates the sandbox over to a porn site, and people write our support line to complain. We're wiping the sandbox every 5 minutes, but it's still not acceptable. Gotta get a handle on that.
3/12/2008 9:00am: 300 signups/hr, 100 hits/sec, 1-6% CPU
The sandbox is completely overloaded. There are 100 people in there, which is too many people communicating at once for any medium to really handle. Imagine 100 people drawing on a real whiteboard at the same time, or 100 people talking over each other on a conference call. It just doesn't work. To bring a little order into the picture, I fire up Visual Studio .NET and add a little switcher that will direct traffic to any one of 5 sandboxes, each one holding 8 users. Throw that live, and now there are 5 overloaded sandboxes.
3/12/2008 9:30am: 500 signups/hr, 300 hits/sec, 3-15% CPU
I bump up the sandbox count to 10. Then think better of it and bump it up to 20 before pushing. Then think better of THAT and add a new page to show users in case all 20 of those sandboxes fill up. Push that live.
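For the curious, a switcher like that doesn't need to be anything clever. Here's a minimal sketch of the idea, assuming an in-memory head count per sandbox. The class and method names (SandboxSwitcher, Join, Leave) are made up for illustration and are not the actual Twiddla code:

```csharp
using System;

// Hypothetical sketch: SandboxSwitcher and its members are illustrative
// names, not taken from the real Twiddla code base.
public class SandboxSwitcher
{
    private readonly int[] _occupancy;   // current user count per sandbox
    private readonly int _capacity;      // max users allowed in one sandbox
    private readonly object _lock = new object();

    public SandboxSwitcher(int sandboxCount, int capacity)
    {
        _occupancy = new int[sandboxCount];
        _capacity = capacity;
    }

    // Returns the index of the first sandbox with a free seat,
    // or -1 when every sandbox is full (time to show the "Sorry" page).
    public int Join()
    {
        lock (_lock)
        {
            for (int i = 0; i < _occupancy.Length; i++)
            {
                if (_occupancy[i] < _capacity)
                {
                    _occupancy[i]++;
                    return i;
                }
            }
            return -1;
        }
    }

    // Frees a seat when a user leaves the given sandbox.
    public void Leave(int sandbox)
    {
        lock (_lock)
        {
            if (_occupancy[sandbox] > 0)
            {
                _occupancy[sandbox]--;
            }
        }
    }
}
```

With 20 sandboxes of 8 seats each, the 161st caller of Join() gets -1. Filling the first open sandbox keeps rooms lively; picking the emptiest one instead would spread people thinner.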
3/12/2008 9:41am
Testing out the above changes, I am immediately redirected to a page saying "Sorry, all the Sandboxes are full." Let me restate that: From the time I pushed those changes live to the time I could test them out, 160 people had beaten me into the sandboxes. Wow.
3/12/2008 10:00am: 700 signups/hr, 500 hits/sec, 5-20% CPU
Looking through the error logs, I'm starting to see our first concurrency issues. These are the little one-in-a-million things that you'd never find in test, but that happen every ten minutes under load. They're mostly low-hanging fruit, so I spend the next hour patching and re-deploying until the error logs go silent.
3/12/2008 12:00pm: 600 signups/hr, 400 hits/sec, 5-17% CPU
I've been doing all of this from my sister's house up in Ft. Worth. I was supposedly visiting her for a couple of days, but I've mostly been using her house as an office (thanks, Lisa, for tolerating that, and I promise to get out and visit sometime when I'm not trying to launch a new website!). Now I have to hop in the car and drive back to Austin to fly home. Our trusty server will be on its own for the next 12 hours, taking the beating of its life. I won't even know if it goes down.
3/13/2008: 4000 signups/day, 100 hits/sec, 3-10% CPU
Back in a stable place, and ready to deal with the flood of feedback emails we've been getting. This part is fun, since most people have nice things to say, and it becomes readily apparent what features everybody wants to see. Nothing has broken, so I actually have some time to put a few minor features live. The "Wite-out" button was added this day, I think, and I re-did the way we handle snapshots and image exporting.
3/14/2008: 3000 signups/day, 100 hits/sec, 2-5% CPU
I implemented a fix for the last little concurrency bug that we'd been seeing. Then, while profiling that fix on the server, I noticed that TwiddleBot was flipping out. TwiddleBot is the little service that runs the Guided Tour feature, and is also responsible for clearing out the sandboxes from time to time. Turns out, he was also pounding the database 20 times a second, asking for instructions. Hmm… Chill, TwiddleBot. Pushed a fix for that, and suddenly CPU usage dropped to zero. Like, ZERO! Every 5 seconds, it would spike up to 1%. Cool. I think we're gonna be able to scale this thing…
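I won't reproduce the real fix here, but the general cure for an over-eager polling loop is a throttle that only lets the database query through once per interval. A minimal sketch, assuming a 5 second interval (a guess based on those CPU blips) and a made-up class name, PollThrottle:

```csharp
using System;

// Hypothetical sketch: PollThrottle is an illustrative name, and this is
// not the actual TwiddleBot fix. The caller passes in the current time.
public class PollThrottle
{
    private readonly TimeSpan _interval;
    private DateTime _lastPoll = DateTime.MinValue;

    public PollThrottle(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns true when enough time has passed since the last allowed poll.
    // The caller should only hit the database when this returns true.
    public bool ShouldPoll(DateTime utcNow)
    {
        if (utcNow - _lastPoll < _interval)
        {
            return false;
        }
        _lastPoll = utcNow;
        return true;
    }
}
```

The bot's main loop would call ShouldPoll(DateTime.UtcNow) on each pass and skip the query whenever it comes back false. Passing the clock in as a parameter keeps the thing easy to step through and test.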
One week later, ~1000 signups per day, 50 hits/sec, 0% CPU
In the end, we came through our first little scaling event rather well. We were actually a bit over-prepared. Our colocation facility (Easystreet in Beaverton, Oregon) had a couple extra boxes waiting to go for us, and I had taken the time a week earlier to write up and test a little software load balancer to allocate whiteboard sessions to various boxes when needed. As it turned out, we didn't get to try any of that out. Hell, we never spiked the processor on our one server over 50%. I'd love to congratulate myself for the design choices I made all those months back when I wrote that article, but I think it's still too early in the game to conclude that we'll really scale when we ramp up to the next level.
Still, it's worth noting that everything in Twiddla was built using the simple, Readable, Debuggable backend that we've been using on our more pedestrian sites for years, and it held up just fine under traffic. When it turned out that parts of that backend needed refactoring to handle the kind of concurrency we saw last week, it was a simple 5-minute task to crack open the code, find what needed to change, and change it.
Readable, Debuggable, Maintainable. That's the plan. Thus far, that has enabled us to keep on top of any Performance and Scalability issues that have come along. With luck, things will continue to work that way!
Labels: best practices, dogfood, Scalability, software
If you've come within 30 feet of the internet this last month, you'll have come across this list of best practices at least a dozen times. Everybody seems to be writing about it and linking to it and building little tools that tell you you're not doing it right.
Most of the stuff on that list is low-hanging fruit. You can spend 5 minutes in IIS, flipping compression on and telling all your /images/ directories not to expire content until we're all driving flying cars, and suddenly you'll find your site loading a lot faster.
That's cool and all, but what if you also followed their advice and stuck a bunch of your static content out on Amazon S3? I guess you just fire up S3Fox and start playing with the metadata on all those… whoa, hang on… hey, you can't change that stuff once it's written. Crap. You've gotta upload all those files again. And you can't use that cool Firefox tool to do it anymore, because it has no way to set an "Expires" header when you upload a file. Crap. Crap. Crap.
Well if you're running C# and ASP.NET, you're in luck. Because I just went through that pain for a few of my sites, and now I'm going to let you mooch off my code.
First step: download the right library from Amazon
In this case, you're going to need the Amazon S3 REST Library for C#. No, not the SOAP library, because evidently that one is crap. Either drop the source straight into your project or build it elsewhere and link it in.
Last step: swipe this code
This zip contains everything you'll need. Just airlift it into your project and you'll be good to go. Now, since this is an article about programming, I'm legally obligated to provide at least one code sample for you to gloss over. So here is the meat of what we're doing:
public void PushToAmazonS3ViaREST(string bucket, string relativePath, HttpServerUtility server)
{
    relativePath = relativePath.TrimStart('/');
    string fullPath = _basePath + relativePath.Replace(@"/", @"\");

    AWSAuthConnection s3 = new AWSAuthConnection(_publicKey, _secretKey);

    string sContentType = "image/jpeg";
    SortedList sList = new SortedList();
    sList.Add("Content-Type", sContentType);
    // Set access control list to "publicly readable"
    sList.Add("x-amz-acl", "public-read");
    // Set to expire in ten years
    sList.Add("Expires", GetHttpDateString(DateTime.Now.AddYears(10)));

    S3Object obj = new S3Object(FileContentsAsString(fullPath), sList);
    s3.PutObjectAsStream(bucket, relativePath, fullPath, obj.Metadata);
}
There are only two lines you need to care about if you're using S3 to host web content, and they're both commented. One sets the file to be readable by the public, and the other tells it not to expire until after you've left the company. Sorted.
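The GetHttpDateString helper isn't shown above. If you're rolling your own, something along these lines should do the job. This is a sketch of one plausible implementation, not necessarily what's in the zip, and the HttpDates wrapper class is a name I've made up for the example:

```csharp
using System;
using System.Globalization;

// Hypothetical sketch: one plausible way to write GetHttpDateString.
// The HttpDates class name is an assumption made for this example.
public static class HttpDates
{
    // Formats a date the way HTTP headers such as "Expires" expect it,
    // e.g. "Mon, 12 Mar 2018 09:30:00 GMT".
    public static string GetHttpDateString(DateTime date)
    {
        // The "r" format always prints "GMT" but does not convert the
        // value for you, so convert to UTC first.
        return date.ToUniversalTime().ToString("r", CultureInfo.InvariantCulture);
    }
}
```

The conversion to UTC is the part that's easy to miss: format a local time with "r" and you get a header that claims GMT but is off by your time zone offset.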
I've included a cheesy .aspx page that you can use to push your files by hand. Hopefully you can figure out how to change which directories it's putting in the list, and how to add your own. It's actually pretty ugly code, but hey, it's just an admin tool that you'll only run a few times in your life.
Be Warned though: I've stripped out the security that keeps people from the outside world (and GoogleBot) from hitting this page and bogging your server. If there's any chance that this might escape to the live site, be sure to lock it down so that you can't see it unless you're logged in as an admin!
Anyway, I hope you find some use out of that code. I certainly wasn't planning to publish it, so please refrain from mentioning the 47-odd things in it that you should never do in production!
Enjoy!
Labels: best practices, development, Performance, Scalability, software
Back in my Contractor days, I would occasionally take a job bringing a bunch of C++ guys up to speed in C# and ASP.NET. Invariably, I would have to break them of old habits that they had picked up back in the days when memory and hard drive space were expensive, and applications had to run in real time. Most of these little battles were quickly won, so flat files were replaced by relational databases, bit masks gave way to association tables, and data access code was pulled out into its own layer.
But one thing never went over well. Performance. Speed is largely irrelevant for a web application. Sure, it's important that your thing run fast, but there are a half dozen other things that are more important for a big web application. This is difficult to hear if your major skill is writing inline assembly for critical routines, but it's still the truth. Readability, Debugability, Maintainability and Development Pace are much more important than raw speed.
To deal with this rift, I would ask the developers to list out the most important qualities of a piece of software, and to rank those qualities in order. I've hinted at my answer above, but I'll take a few minutes to list them out below. Everything you see in the list is important, but the things toward the top are relatively more important than the ones towards the bottom. For what it's worth, we're talking about Web Applications here, so clearly this list does not apply to Game Development or even Windows Apps. Here goes:
Readability
In my mind, this is the single most important quality of a piece of software. Assuming your thing is going to be around for a while, you're always going to need to return to a given piece of code from time to time and make modifications. The faster you can read and understand what's going on, the sooner you will be able to start making modifications and adding new functionality. Better still, if you can quickly figure out what the code is doing and why, you'll be less likely to break anything in the process.
Debugability
Your code is going to break. Often. That's how it goes, so you'd better structure things so that it's easy to step through and figure out what's going on. That means declaring variables instead of stringing together 17 object methods on a single line. That means using real IF/THEN/ELSE blocks with squiggly brackets instead of inlined immediate if's. And it means thinking twice before committing to some automagically generated database framework that sniffs out all your column names, writes its own SQL, and keeps your data in ArrayLists of ArrayLists.
Keep your design simple enough that any exception will drop you into the debugger looking at a single line of code that does a single thing. Even if it turns out it's doing that single thing wrong, at least you'll be able to find and fix it.
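To make that concrete, here is a small made-up example (the Greeter class and its methods are invented for illustration). Both methods return the same string, but only the second gives you somewhere useful to put a breakpoint:

```csharp
using System;

// Hypothetical example: Greeter is an invented class used only to contrast
// a terse one-liner with a version that is easy to step through.
public static class Greeter
{
    // Hard to debug: one line, several chained calls, an inline conditional.
    public static string GreetingTerse(string rawName)
    {
        return "Hello, " + (rawName == null || rawName.Trim().Length == 0 ? "guest" : rawName.Trim().Split(' ')[0].ToUpperInvariant()) + "!";
    }

    // Easy to debug: every intermediate value has a name and its own line.
    public static string Greeting(string rawName)
    {
        string displayName;
        if (rawName == null)
        {
            displayName = "guest";
        }
        else
        {
            string trimmed = rawName.Trim();
            if (trimmed.Length == 0)
            {
                displayName = "guest";
            }
            else
            {
                string[] parts = trimmed.Split(' ');
                string firstName = parts[0];
                displayName = firstName.ToUpperInvariant();
            }
        }
        return "Hello, " + displayName + "!";
    }
}
```

When the second version throws, the debugger lands on a line that does exactly one thing, and every value leading up to it is sitting there to inspect.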
Maintainability
Over time, new features are going to get added and old features are going to get dropped. Some of those new features will be stupid ones, with dorky business logic that rubs the fur the wrong way in your elegantly designed class structure. You want to be able to make those changes quickly, without breaking anything else. This means you need unit testing. You'll also want to refactor large sections of your backend to work in ways you had never anticipated, and you'll need to propagate those changes all the way out to the client code. For that, you'll need even more unit tests (and some good tools), but also you'll need an architecture that doesn't fall apart when you rip chunks out of it.
Development Pace
Modern applications are big and complicated. It doesn't matter how nicely written your thing is or how many simultaneous users it can support if you never manage to get it out the door. If you want to get your application shipped, you're going to need to put out a ton of code in a hurry. That means you're going to need the best tools available, and the most productive environment that you can find.
Side Note: PHP might seem fast if you've never seen the alternatives, but let's see how many Ex-Ruby-on-Rails and Ex-ASP.NET guys you can find doing PHP development by choice.
Keeping the above points in mind, you're going to want a development framework of some description. Here at Expat, we've rolled our own specifically to keep us fast without sacrificing Readability, Debugability, or Maintainability. I'd recommend doing the same, but there are any number of 3rd party frameworks out there that might fit the bill. Just make sure you keep those three qualities in mind when you are evaluating any new framework.
Scalability
At some point, your thing is going to get popular. Actually, chances are it won't, but you shouldn't architect your thing to preclude the possibility that people might start using it in the Millions. So how do you pull that off without undoing all those Important Things further up this list? Simple. Just be aware that one day you might need the ability to add more database and web servers to the mix. Add a few little abstractions such as a Database Connection Factory, and a Session wrapper that you can replace someday with something BEEFY. For now, they don't have to do anything fancier than wrapping the existing stuff in whatever framework you're using. But if you're diligent in using these wherever you would normally use the framework components, you might end up saving yourself a lot of headache down the road.
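Here's roughly what I mean by a little abstraction, sketched at the level of connection strings so it runs without a database driver. All of the names (IConnectionStringProvider, SingleServerProvider, RoundRobinProvider) are invented for this example and are not from our framework:

```csharp
using System;

// Hypothetical sketch: all type names here are illustrative assumptions.
// Application code asks the provider for a connection string instead of
// reading one from configuration directly.
public interface IConnectionStringProvider
{
    string GetConnectionString();
}

// Today: a thin wrapper around the one database server you actually have.
public class SingleServerProvider : IConnectionStringProvider
{
    private readonly string _connectionString;

    public SingleServerProvider(string connectionString)
    {
        _connectionString = connectionString;
    }

    public string GetConnectionString()
    {
        return _connectionString;
    }
}

// Someday: something beefier, swapped in without touching the callers.
public class RoundRobinProvider : IConnectionStringProvider
{
    private readonly string[] _connectionStrings;
    private readonly object _lock = new object();
    private int _next = 0;

    public RoundRobinProvider(string[] connectionStrings)
    {
        if (connectionStrings == null || connectionStrings.Length == 0)
        {
            throw new ArgumentException("At least one connection string is required.");
        }
        _connectionStrings = connectionStrings;
    }

    // Hands out the servers in turn, wrapping around at the end of the list.
    public string GetConnectionString()
    {
        lock (_lock)
        {
            string result = _connectionStrings[_next];
            _next = (_next + 1) % _connectionStrings.Length;
            return result;
        }
    }
}
```

The first class does nothing you couldn't do with a config lookup, and that is the point. As long as every data access call goes through the interface, adding a second database server later is a one-line change where the provider gets created.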
For the most part though, don't worry too much about scalability. Having a million people that want to use your thing on a single day is a good thing. If you've done a little homework, you'll work things out when the time comes.
Performance
Computers are fast. Seriously, computers are faster than you think. If you try to imagine which piece of your application is slow, you're probably wrong. I once worked with a developer who spent the better part of 6 months hand optimizing an algorithm to do fast fuzzy string comparisons. It turned out that the server doing the text processing was only spending about 10% of its time actually processing text (even with a simplified, non-optimized algorithm), and 90% of its time battling database locks to get the results put away. He could have figured this out in one day with a profiler, and then spent a few hours tweaking database indices and optimizing queries. Instead, he spent half a year solving the wrong problem.
So yeah, keep a profiler handy, and if you see something that is obviously taking a lot of extra time, go ahead and fix it. But don't spend too much time sweating performance issues. At least, wait until they present themselves as issues before you start sweating them!
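If you don't have a real profiler within reach, even a crude timer around the suspect sections will tell you where the time is going. A minimal sketch using the framework's Stopwatch class, with SectionTimer being a name invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: SectionTimer is an illustrative stand-in for a
// real profiler. It accumulates wall-clock time per labeled section.
public class SectionTimer
{
    private readonly Dictionary<string, TimeSpan> _totals = new Dictionary<string, TimeSpan>();

    // Runs the given work and adds its elapsed time to the label's total.
    public void Measure(string label, Action work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        try
        {
            work();
        }
        finally
        {
            watch.Stop();
            TimeSpan soFar;
            _totals.TryGetValue(label, out soFar);  // stays at zero if the label is new
            _totals[label] = soFar + watch.Elapsed;
        }
    }

    public IDictionary<string, TimeSpan> Totals
    {
        get { return _totals; }
    }
}
```

Wrap the text processing in one Measure call and the database write in another, run a realistic batch, and compare the totals before deciding which half deserves six months of anybody's life.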
Life imitates Rant...
As I write this, Blogabond (one of my diversions from real work) is starting to show its first signs of scaling pain. Every once in a while, a misbehaved crawler will swing by and hit it 500 times in a second, causing SQL Server to time out on a specific long-running query. This is a good thing in my mind, as it gives me a chance to tackle a potential bottleneck before it starts affecting real users.
Still, Blogabond has been up and running for almost two years now, and it is only now that I'm having to think about performance at all. Those other qualities though: Readability, Debugability, Maintainability, Development Pace. I'm seeing benefits from them every day.
Labels: best practices, Debugability, Development Pace, Performance, Priorities, Readability, Scalability