|
StackOverflow.com
just launched this last week, and it looks pretty cool. It seems like it might be our best shot at getting back to the sort of useful discussion that we used to have on the Usenet back in the 90's. Lots of signal, hardly any noise, and even the occasional correct answer. Sign me up!
Uh... wait a sec... I can't sign up.
StackOverflow has made the inexplicable blunder of requiring its users to sign in via OpenID. That means you can't simply pick a username and password, but must instead go away and find yourself an OpenID provider, sign up for that, and bring it back to StackOverflow. It's like 14 steps, depending on which provider you choose. Observe:
- Click login
- Read a ton of instructions
- Locate and click the "get one" link
- Dismiss the javascript error popup from openid.net
- Read a bunch more instructions
- Find and click the "ClaimID" link (it's the first one on the list of providers)
- Click "Create a new account"
- Type in your information
- Open your email, find their email, click the link
- Go back to StackOverflow, click login again
- Paste in that giant URL that is now your OpenID
- Type in your Username & Password
- Type in a bunch of Personal Info
- ... and you're in! Easy as that!
Now, for sake of comparison, let's take a look at the steps required to start using
Twiddla
(the web meeting playground that we've been working on these last several months here at Expat):
Can you spot the difference?
Look, it's not just me saying this. Talk to any Usability expert you like, and they'll tell you that every barrier that you put in front of your users will cause a certain percentage of them to leave and not come back. For most sites, even stopping to ask for a Username & Password is too intrusive. That's why we built Twiddla the way we did.
Our stated goal with Twiddla is to get the hell out of your way so that you can get some work done. We've taken that idea so far that most of our users will never see a login screen of any description. Some might not ever know they've used Twiddla at all, since we keep our Logo hidden away in the corner where it's not in your way.
Can we say the same about StackOverflow's new registration system? Unfortunately not. For me, it was 10 minutes of grumbling "StackOverflow", "F'ng StackOverflow" under my breath while stumbling through the painful OpenID signup process. Complete usability failure. I can only hope they'll come to their senses and put in a reasonable username/password login like everybody else. Labels: best practices, development, frustration, software
I just don't get it.
How can so many smart people be so collectively bad at something as simple as Javascript on a web page?
It's just not that hard. And yet, not an hour goes by when I'm not stopped in my tracks by at least one javascript error. And it's especially sad because many of these errors are coming from well known sites, with huge development budgets and plenty of good talent that really should know better. Observe:
That was just a ten minute sample of browsing today.
The One Rule of DHTML Programming
Look, it's not that hard to do this stuff right. In fact, here is everything you'll ever need to know about Dynamic HTML Programming with Javascript:
Test EVERYTHING before you reference it.
That's it. Simple. Every little scrap of code you write needs to live inside its own little IF block that tests to make sure that the things it's expecting to interact with really exist. Here's how:
BAD:
gbN2Loaded.style.display='none';
|
Good:
if (window.gbN2Loaded)
{
gbN2Loaded.style.display='none';
}
|
BAD:
document.getElementById('myDiv').innerHTML
= 'stuff';
|
Good:
if (document.getElementById('myDiv'))
{
document.getElementById('myDiv').innerHTML
= 'stuff';
}
|
I don't care that Google's API Reference told you to put <body onunload='GUnload()'> into all your pages. That's just example code, and it's not intended to be used in the real world.
Real World Javascript will need to survive in dozens of strange browser environments that do things in strange unexpected ways, and as soon as you get it working right, Junior Dev Jimmy will accidently include it on every single page on your site and suddenly it won't be able to find the things it needs to live. When that happens, it needs to quietly stop trying to do stuff instead of throwing error messages all over the place.
What you need to do about it
Ok, cool, you've fixed everything, but you're not done yet. There's one more thing you need to do right this second. You need to turn on those annoying Script Error popups in both Internet Explorer and Firefox, and you need to keep them on from here on out. Don't just do it for your own machine, but for every computer owned by every employee of your company.
Yes, I know that you turned them off on purpose because they make the internet basically unsurfable, but most casual users of your site will have them on by default. That means that most casual users will see every single little script error that your site throws at them, and they won't like it. Those errors are pissing off real people right this minute, and you need to know about it. If you arrange it so that they start pissing off you and your co-workers too, you just might find the incentive to get rid of them once and for all. Labels: best practices, frustration, software
PHP used to have a cool little feature where it would automatically detect single quotes in text strings and escape them for you whenever they needed to be. It was called Magic_quotes_runtime. Maybe you've heard of it. It was a disaster.
Countless developer hours were spent trying to chase down mysterious runtime errors where single quotes were either introduced, doubled up, or removed, causing disastrous crashes, data corruption and so much untold havoc that the feature was deprecated and eventually removed from PHP entirely.
You would think that people would have learned their lesson.
People are, by and large, dumb. We make the same dumb mistakes over and over again because we didn't bother to do any research or read about the last time that somebody tried whatever stupid idea we just re-invented. As a result, we have development frameworks and tools like Hibernate, ASP.NET's SmartNav, and Rails' ActiveRecord, all trying to magically solve problems that weren't very hard in the first place, and silently making a lot of people's lives a lot harder without them even realizing it.
The big problem with Magic tools is that they work fine the first time you try them. "Wow!", you say, " It posted the page back and scrolled my browser back down to the Submit button!" So you turn that feature on for all your pages and start to trust it. You get used to it. You take it for granted. You forget you're even using it. Then suddenly something weird starts happening with one of your pages and you can't figure out why.
Examples of this sort of side effect abound, but nobody yet has taken a stand and done something about it. How many developer hours have been lost trying to figure out what magical SQL statement was running behind the scenes and only “Hibernating” half of an object? How many CPU cycles have been squandered (and slanderous blog entries written) because some poor developer didn’t realize that ActiveRecord was hitting the database three times for every single row in that recordset? Are we really so scared of Outer Joins that we allow ourselves to be subject to this torment?
I’ll leave you with an axiom that I’ve been telling developers for years without much success. Call it Kester’s Caution:
Never use any language feature that describes itself as "Smart" or "Magic."
Such features will invariably be trying to abstract
out some behavior that is not that hard to deal with anyway, and will
make any number of incorrect assumptions about your application that
will result in strange behavior cropping up that could possibly be
described as "Magic", but certainly would never be labeled "Smart".
Labels: best practices, software
Twiddla has been getting a ton of attention this week. We picked up the Technical Achievement award at SXSW Interactive, and have been getting a bunch of good press ever since. 25,000 people have signed up for the service since the award was mentioned, with 7,500 of those signups happening in a single day. It's about to get good.
For me though, it's been even better. We're finally getting enough traffic to start thinking about scaling issues. You might remember an article that I wrote a few months back, where I told people
not to sweat Performance and Scaling issues too much, but rather to focus on Readability, Debugability, Maintainability, and Development Pace. The idea was that getting your product to market quickly and being able to move fast if necessary are more important than having the Perfect Dream System that takes forever to build. Of course, the implied point was that when and if that Big Day came, you'd be able to move fast enough to deal with Scalability and Performance concerns as they appeared.
On March 12th, 2008, I got to see first hand whether I was talking out my arse…
3/11/2008 7:00pm: 150 signups/hr, 50 hits/sec, 0-5% CPU
It's the day after the awards, and the first brief announcements are out. Traffic has been building steadily all day, but we've seen worse. The only crisis at the moment is that we don't yet have a Press Kit, so we're seeing writeups with the old logo and screenshots from the old UI. D'oh!
3/11/2008 11:00pm: 350 signups/hr, 120 hits/sec, 1-9% CPU
Japan wakes up. The Asian press really liked us, so we saw a big spike in users from China and Japan the first few days. The sandbox is pretty clogged, and with 30 people drawing simultaneously it's starting to tax people's browsers. Every once in a while, somebody navigates the sandbox over to a porn site, and people write our support line to complain. We're wiping the sandbox every 5 minutes, but it's still not acceptable. Gotta get a handle on that.
3/12/2008 9:00am: 300 signups/hr, 100 hits/sec, 1-6% CPU
The sandbox is completely overloaded. There are 100 people in there, which is too many people communicating at once for any medium to really handle. Imagine 100 people drawing on a real whiteboard at the same time, or 100 people talking over each other on a conference call. It just doesn't work. To bring a little order into the picture, I fire up the Visual Studio.NET and add a little switcher that will direct traffic to any one of 5 sandboxes, each one holding 8 users. Throw that live, and now there are 5 overloaded sandboxes.
3/12/2008 9:30am: 500 signups/hr, 300 hits/sec, 3-15% CPU
I bump up the sandbox count to 10. Then think better of it and bump it up to 20 before pushing. Then think better of THAT and add a new page to show users in case all 20 of those sandboxes fill up. Push that live.
3/12/2008 9:41am
Testing out the above changes, I am immediately redirected to a page saying "Sorry, all the Sandboxes are full." Let me restate that: From the time I pushed those changes live to the time I could test them out, 160 people had beaten me into the sandboxes. Wow.
3/12/2008 10:00am: 700 signups/hr, 500 hits/sec, 5-20% CPU
Looking through the error logs, I'm starting to see our first concurrency issues. These are the little one-in-a-million things that you'd never find in test, but that happen every ten minutes under load. They're mostly low-hanging fruit, so I spend the next hour patching and re-deploying until the error logs go silent.
3/12/2008 12:00pm: 600 signups/hr, 400 hits/sec, 5-17% CPU
I'd been doing all of this from my sister's house up in Ft. Worth, who I had supposedly been visiting for a couple days, but whose house I had been mostly using for an office (thanks Lisa for tolerating that, and I promise to get out and visit sometime when I'm not trying to launch a new website!) Now I had to hop in the car and drive back to Austin to fly home. Our trusty server will be on its own for the next 12 hours, taking the beating of its life. I won't even know if it goes down.
3/13/2008 4000 signups/day, 100 hits/sec, 3-10% CPU
 Twiddla ArtBack in a stable place, and ready to deal with the flood of feedback emails we've been getting. This part is fun, since most people have nice things to say, and it becomes readily apparent what features everybody wants to see. Nothing has broken, so I actually have some time to put a few minor features live. The "Wite-out" button was added this day, I think, and I re-did the way we handle snapshots and image exporting.
3/14/2008 3000 signups/day, 100 hits/sec, 2-5% CPU
I implemented a fix for the last little concurrency bug that we'd been seeing. Then, while profiling that fix on the server, I noticed that TwiddleBot was flipping out. TwiddleBot is the little service that runs the Guided Tour feature, and is also responsible for clearing out the sandboxes from time to time. Turns out, he was also pounding the database 20 times a second, asking for instructions. Hmm… Chill, TwiddleBot. Pushed a fix for that, and suddenly CPU usage dropped to zero. Like, ZERO! Every 5 seconds, it would spike up to 1%. Cool. I think we're gonna be able to scale this thing…
One week later, ~1000 signups per day, 50 hits/sec, 0% CPU
In the end, we came through our first little scaling event rather well. We were actually a bit over-prepared. Our colocation facility ( Easystreet in Beaverton, Oregon) had a couple extra boxes waiting to go for us, and I had taken the time a week earlier to write up and test a little software load balancer to allocate whiteboard sessions to various boxes when needed. In the end, we didn't get to try any of that out. Hell, we never spiked the processor on our one server over 50%. I'd love to congratulate myself for the design choices I made all those months back when I wrote that article, but I think it's still too early in the game to conclude that we'll really scale when we ramp up to the next level.
Still, it's worth noting that everything in Twiddla was built using the simple, Readable, Debuggable backend that we've been using on our more pedestrian sites for years, and it held up just fine under traffic. When it turned out that parts of that backend needed refactoring to handle the kind of concurrency we saw last week, it was a simple 5 minute task to crack open the code, find what needed to change, and change it.
Readable, Debuggable, Maintainable. That's the plan. Thus far, that has enabled us to keep on top of any Performance and Scalability issues that have come along. With luck, things will continue to work that way! Labels: best practices, dogfood, Scalability, software
Edit:
 Web Stats for Amazon S3
This was written before we launched S3stat, a service that parses your Amazon S3 server access logs and delivers usage reports back to your S3 bucket.
So if you're not interested in the technical details, and just want web stats for your S3 account, you can head over to www.S3stat.com and save yourself a bunch of hassle.
Amazon's Simple Storage Service (S3) is a great content delivery network for those of us with modest needs. It's got all the buzzwords you could possibly want: geo-targeted delivery, fully redundant storage, guaranteed 99.9% uptime, and a bunch of other stuff that you could never pull off on your own. And it's dirt cheap.
Of course, there's always a catch, and in S3's case you'll soon find that your $4.83 a month doesn't buy you much in the way of reports. With some digging around at Amazon's AWS site, you can find out how much you were charged last month, but that's about it. (OK, If you're persistent, you can download a CSV report full of tiny fractions of pennies that, when added together, tell you how much you were charged last month.)
The Motivation
I love my web statistics. I'm up and waiting at 12:07am every morning for the nightly Webalizer job to run so that I can see how many unique visitors came in to Blogabond today (1227), and what they were searching for ( tourist trap in Beijing). I've been hosting my user's photos out at S3 for a few months now, and though I've watched my bandwidth usage drop through the floor, I've also been missing my web stats fix for all those precious pageviews. Something had to be done. I started digging around through Amazon's AWS docs.
It turns out that you can actually get detailed usage logs out of S3, and if you're willing to suffer through some tedium, you can even get useful reports out of them.
Setting it up
Turning on Server Access Logging is just about the easiest thing that you can do in S3. If you've ever tried to use Amazon's APIs, you can translate that to mean that it's hard. It takes two steps, and unless you're looking at a Unix command prompt, you'll need to write some custom code to pull it off. Here's what you do:
1. Set the proper permissions for the bucket you'd like to log. You'll need to add a special Logging user to the Access Control List for the bucket, and give that user permission to upload and modify files.
2. Send the "Start Logging" command, including a little XML packet filled with settings for your bucket.
The nice people at Amazon have put together a simple 4 page walkthrough that you can follow to accomplish the above. I've run through it, and it works as advertised
Parsing the logs
Now we're getting to the fun part. Remember above where we noted that S3 has servers living all over the world delivering redundant copies of your content to users in different countries? Well now we get to pay the price for that. You see, Amazon sort of punted on the issue of how to put all those server logs back together into something you can use. Instead, every once in a while, each server will hand you back a little log fragment containing anywhere between 1 and 1,000,000 lines of data. Over a 24 hour period, you can expect to accumulate about 200 files, ordered roughly by date but overlapping substantially with one another.
So, now in order to get a single day's logs into a usable form, we get to:
3. Download the day's logs. This is simple enough, as the S3 Rest API gives us a nice ListBucket() method that accepts a file filter. We can ask for, say all files that match the pattern "log/access_log-2007-10-25-*", and download each file individually. We'll end up with a folder containing something like this:
10/30/2007 02:13 PM 21,380 access_log-2007-10-25-10-22-37-2C695527C7FEAEE5
10/30/2007 02:13 PM 19,653 access_log-2007-10-25-10-22-37-8FFF80109E278103
10/30/2007 02:13 PM 15,829 access_log-2007-10-25-10-23-24-D97886677E5A8670
10/30/2007 02:13 PM 185,195 access_log-2007-10-25-10-24-11-7F5172BFA139167D
10/30/2007 02:13 PM 94,795 access_log-2007-10-25-10-27-14-3EDC4E89A03E96EB
10/30/2007 02:13 PM 3,812 access_log-2007-10-25-10-32-20-DD96FC8F8B880232
10/30/2007 02:13 PM 121,863 access_log-2007-10-25-10-33-59-A44E699EE741CEF7
10/30/2007 02:13 PM 51,315 access_log-2007-10-25-10-39-52-313F98B8F52AA150
10/30/2007 02:13 PM 34,984 access_log-2007-10-25-11-18-37-DE9AB5D324881BC2
10/30/2007 02:13 PM 8,451 access_log-2007-10-25-11-22-16-BC5BCE4A49C4EC44
10/30/2007 02:13 PM 10,271 access_log-2007-10-25-11-22-54-54F77DE85AD20F84
10/30/2007 02:13 PM 14,949 access_log-2007-10-25-11-23-28-08D3DED923404EA5
4. Transform columns from S3's Server Access Log Format into the more useful Combined Logfile Format. In the Unix world, we could easily pull this off with sed. In this case though, we might actually want to process each line by hand, since we still need to...
5. Concatenate and Sort records into a single file. There are lots of ways to accomplish this, and they're all a bit painful and slow. When I did this myself, I wrote a little combined transformer/sorter that spun through all the files at once and accomplished steps 4 and 5 in a single pass. Still, there's lots of room here for speed tweaking, so I'll leave this one as an exercise for the reader.
6. Feed the output from Step 5 into your favorite Web Log Analyzer. This is the big payoff, since you'll soon be looking at some tasty HTML files full of charts and graphs. I prefer the output produced by The Webalizer, but there are plenty of free and cheap options out there for this.
Wrapping up
And that's about it. Now all that's left is to tape it all together into a single script and set it to run as a nightly job. Keep in mind that S3 dates its files using Greenwich Mean Time, so, depending where you live, you might have to wait a few extra hours past midnight before you can process your logs.
All together, this took me a little more than a day of effort to get a good script running. It wasn't easy, but then nothing about administering S3 ever is.
Epilogue (the birth of S3STAT.com)
I went through this pain and wrote this article about a week ago. Before posting it, it occurred to me that hardly anybody will ever actually follow the steps that I outlined above. It's just too much work, with too little payoff.
What the world needs is a simple service that people can use to just automate the process. Type in your access keys and bucket name, and it will just set everything up for you.
Let's see... People need this thing... I've already built it... ...umm... Hey! I've got an idea!
 So yeah, get yourself over to www.s3stat.com and sign up for an account. It's a service that does everything I described above, and gives you pretty charts and graphs of your S3 usage without any setup hassle. At some point I'm going to start charging a buck a month to cover the bandwidth of moving all those log files around, but for now I just want to get some feedback as to how it's working.
Let me know what you think! Labels: best practices, Performance, S3, S3stat, software
If you've come within 30 feet of the internet this last month, you'll have come across this list of best practices at least a dozen times. Everybody seems to be writing about it and linking to it and building little tools that tell you you're not doing it right.
 Most of the stuff on that list is low hanging fruit. You can spend 5 minutes in IIS, flipping compression on and telling all your /images/ directories not to expire content until we're all driving flying cars, and suddenly you'll find your site loading a lot faster.
That's cool and all, but what if you also followed their advice and stuck a bunch of your static content out on Amazon S3? I guess you just fire up S3Fox and start playing with the metadata on all those… whoa, hang on… hey, you can't change that stuff once it's written. Crap. You've gotta upload all those files again. And you can't use that cool Firefox tool to do it anymore, because it has no way to set an "Expires" header when you upload a file. Crap. Crap. Crap.
Well if you're running C# and ASP.NET, you're in luck. Because I just went through that pain for a few of my sites, and now I'm going to let you mooch off my code.
First step: download the right library from Amazon
In this case, you're going to need the Amazon S3 REST Library for C#. No, not the SOAP library, because evidently that one is crap. Either drop the source straight into your project or build it elsewhere and link it in.
Last step: swipe this code
This zip contains everything you'll need. Just airlift it into your project and you'll be good to go. Now, since this is an article about programming, I'm legally obligated to provide at least one code sample for you to gloss over. So here is the meat of what we're doing:
public void PushToAmazonS3ViaREST(string bucket, string relativePath, HttpServerUtility server)
{
relativePath = relativePath.TrimStart('/');
string fullPath = _basePath + relativePath.Replace(@"/", @"\");
AWSAuthConnection s3 = new AWSAuthConnection(_publicKey, _secretKey);
string sContentType = "image/jpeg";
SortedList sList = new SortedList();
sList.Add("Content-Type", sContentType);
// Set access control list to "publicly readable"
sList.Add("x-amz-acl", "public-read");
// Set to expire in ten years
sList.Add("Expires", GetHttpDateString(DateTime.Now.AddYears(10)));
S3Object obj = new S3Object(FileContentsAsString(fullPath), sList);
s3.PutObjectAsStream(bucket, relativePath, fullPath, obj.Metadata);
}
There's only two lines you need to care about if you're using S3 to host web content, and they're both commented. One sets the file to be readable by the public, and the other tells it not to expire until after you've left the company. Sorted.
I've included a cheesy .aspx page that you can use to push your files by hand. Hopefully you can figure out how to change which directories it's putting in the list, and how to add your own. It's actually pretty ugly code, but hey, it's just an admin tool that you'll only run a few times in your life.
Be Warned though: I've stripped out the security that keeps people from the outside world (and GoogleBot) from hitting this page and bogging your server. If there's any chance that this might escape to the live site, be sure to lock it down so that you can't see it unless you're logged in as an admin!
Anyway, I hope you find some use out of that code. I certainly wasn't planning to publish it, so please refrain from mentioning the 47-odd things in it that you should never do in production!
Enjoy!
paint chat softwareLabels: best practices, development, Performance, Scalability, software
Let's say you're a freelance web designer, and you come across a project description for, say, the redesign of a little site that lets you host web meetings. This could be your ticket to fame and fortune. But how should you proceed? Well, frankly, you could have proceeded in just about any moderately professional way imaginable last week and scored that gig, but instead you did the following. All 200 of you.
 Solid Gold!!! 1. Send out a Canned Proposal. And not just any canned proposal, send a giant letter filled with the entire contents of your website, but without the proofreading. Make sure it reads like a long-form sales letter, complete with opening phrases such as "Webmaster, your search is over! We're just the candidate for you!!!", but with more exclamation points. Be sure to include links to at least 500 websites that you may have been remotely involved with.
2. Send that Canned Proposal within the first 20 minutes, because time is of the essence. You want to make the statement that "I Didn't Give This Any Thought At All!" And you want to make that statement fast. Good employers tend to hand out knowledge work on a first-come, first-served basis, so you want to make sure you end up on top of the stack. Extra credit for sending the same bid twice!
3. Don't Read the Project Description, or visit the site you're expected to be writing a proposal for. Bid on pieces that weren't up for bid. Ignore the direct questions asked of you in the project description.
Honorable mention:
To the guy in Florida who tallied up his hours, then tacked on the 5% overhead that this particular freelance site charges (expected to be borne by the freelancer, not the employer), then made a point of demanding payment up front. Good way to break the ice and build confidence.
4. Don't, under any circumstances, write any text specific to the project at hand. If you somehow feel obligated to write a sentence or two to show how much thought you've put into your proposal, be sure to tack it on to the bottom. After your cheesy promo text, after that list of 50 links to your class projects, and after that special offer of a 20% discount if I ACT NOW because it's your Fall Sale!
5. Quote an exact dollar figure, after having completed the steps above. This is a nice final touch, as it shows that even though you didn't bother to skim through the project description before sending off your automated response, you still have a firm grasp on our needs.
How to do it right:
Freelancing on Guru, Elance, Rentacoder, etc. is all about first impressions. Your prospective client will no doubt be swamped with dozens if not hundreds of proposals, and you need to find a way to make yours stand out. This is surprisingly easy to do in the current climate, as even the slightest hint of professionalism will do it. Your competition is all trying to shout over the top of each other, thinking that's how you get heard. But you know what? Shouting "I'm an Idiot" at the top of your lungs may in fact get you heard, but it won't usually get you hired.
So here's what you need to do if you actually want to hear back from a potential client: Write a simple, three paragraph proposal, from scratch. Spend the whole first paragraph summarizing the project and explaining why it's something you're good at. Go over your planned approach in the second paragraph, and end it with a ballpark estimate if the project description gave you enough information to do so. And finally, wrap up with a little pleasantry, and give the client a way of learning more about your operation and contacting you if they want to proceed.
Sounds simple, eh? Well, I used to do a lot of freelancing, and got a lot of work with that approach. Trust me, nobody else is doing it. Behave like a professional, and you'll clean up! Labels: best practices, freelancing, frustration, software
I have a new candidate for the Most Infuriating Feature Ever. It's an innocuous little part of the source control implementation for Visual Studio.NET.
Let's say you're working on a new and risky set of changes to a project in Visual Studio.NET. You set off and start breaking things in existing files, safe in the knowledge that if you can't make it all work in the end, you'll be able to roll everything back in source control. Cut to half an hour later: things are hopelessly broken, and it's apparent that you're heading in the wrong direction. Best to cut your losses and start again from scratch, so you right click the solution in VS.NET and select "Undo Checkout" to roll everything back. As if to confirm, the following dialog pops up:
Note the default option. It's not really very descriptive, but what it's actually saying is "Roll these changes back in a half-baked way that virtually guarantees I'll accidentally re-implement them all the next time I modify any of these files."
You see, what it's doing by default is leaving a copy of your broken code sitting on your local machine. Forever. Getting latest won't even overwrite it. Neither will checking the file out. So the next time you want to modify that file, it will pull up the changes you thought you had un-done and not even warn you about it. You'll make some innocuous little text modification, check in, and find that the whole application is broken.
This is just one of many hazardous dialogs that developers running VSS have to tiptoe their way past every day. Dialogs with FIVE BUTTONS, only one of which does what Source Control was intended to do, and that one is hidden second from the left. It's enough to make you want to switch over to subversion.
Oh, and in case you're wondering, the correct response (and the only one that anybody should ever use) to that dialog above is to tick the "Replace your local file…" radio button, check the box, and hit OK. Any other combination and you're screwed.
ps. We're currently rebranding Twiddla as a design collaboration tool for distributed teams. If you're in the industry, we'd love to hear your feedback!
Labels: best practices, development, frustration
I didn't get anything done last week. Nothing at all. Most of my days were spent reading random stuff on the internet, making minor tweaks to Blogabond, and obsessing over traffic stats for Twiddla (why did 5000 people suddenly show up from StumbleUpon in one day???)
This week, on the other hand, I've been on fire. In the last 4 days, here's what I've accomplished:
- Built a Reddit clone from the ground up for Rootdown's soon-to-be-live Clinical Pearls section.
- Built a Google-Maps powered acupuncture chart, cut up 3000 tiles for it, incorporated effective Lat/Lon coordinates into Rootdown's database, and built a little Ajax data entry tool to drag & drop acupuncture points and meridians onto the chart.
- Reworked the Photo uploading and Photo management pieces of Blogabond.
- Tore out and streamlined the installation process for Regressor.NET
- Wrote this lame article
Trust me, that's a lot of stuff.
I've noticed this same pattern happening over and over again. I think of it as Sprinting, and I think I'm getting better at harnessing it. There are a few factors that play into it, but I think the key is knowing that I have an entire day to Sprint on whatever it is that I want to do. Knowing Absolutely that the door to my office won't open, the phone won't ring, and no little IM popups will bother me for the Next Twelve Hours. Knowing that I'm free to get as deep into what I'm doing as I need to get whatever I'm doing Done and Done for good. Those are the days that I get the most accomplished.
 Sprinting for a different reason: Team Expat, Running with the Bulls in 1998
Another thing that seems to help, at least for me, is to have more than one ball in the air at a time. When I've only got one project, I seem more content to move slowly, check my email, read the occasional blog, and essentially stuff my productivity. But when I've got 3 things the Need to Get Done, and there just aren't enough hours in the day to do it all, I find that I work a lot faster.
Better still is to have something ELSE that I should really be doing. You should SEE the days I spend blowing off paid work to Sprint on a side project.
But it more than just blowing off one project for another. The real advantage of having more than one project going at a time is that if I get blocked for any reason, I can switch over to another project and continue Sprinting at the same pace. If I'm motivated to move fast, it doesn't really matter all that much what I'm moving on, provided I'm moving. If I can ignore the little bottlenecks and keep Sprinting until the inspiration fades, I can get a lot more done overall. If I only had a single thing to work on, any minor distraction, such as missing graphics from a screen designer, could derail me and send me off to check Reddit (and thus get stuck there for six hours.)
I don't think that any of this is new. It's common knowledge that developers tend to work in bursts. I guess the difference for me is that I'm starting to work on ways to facilitate those bursts. To keep them going once they get started. To finally look up and find it's dark out, and I haven't eaten for 16 hours and wow, did I really get all that done in a single day??? That's where I want to be. That's Sprinting. Labels: best practices, Development Pace, productivity, software
Back in my Contractor days, I would occasionally take a job bringing a bunch of C++ guys up to speed in C# and ASP.NET. Invariably, I would have to break them of old habits that they had picked up back in the days when memory and hard drive space were expensive, and applications had to run in real time. Most of these little battles were quickly won, so flat files were replaced by relational databases, bit masks gave way to association tables, and data access code was pulled out into its own layer.
But one thing never went over well. Performance. Speed is largely irrelevant for a web application. Sure, it's important that your thing run fast, but there are a half dozen other things that are more important for a big web application. This is difficult to hear if your major skill is writing inline assembly for critical routines, but it's still the truth. Readability, Debugability, Maintainability and Development Pace are much more important than raw speed.
To deal with this rift, I would ask the developers to list out the most important qualities of a piece of software, and to rank those qualities in order. I've hinted at my answer above, but I'll take a few minutes to list them out below. Everything you see in the list is important, but the things toward the top are relatively more important than the ones towards the bottom. For what it's worth, we're talking about Web Applications here, so clearly this list does not apply to Game Development or even Windows Apps. Here goes:
Readability
In my mind, this is the single most important quality of a piece of software. Assuming your thing is going to be around for a while, you're always going to need to return to a given piece of code from time to time and make modifications. The faster you can read and understand what's going on, the sooner you will be able to start making modifications and adding new functionality. Better still, if you can quickly figure out what the code is doing and why, you'll be less likely to break anything in the process.
Debugability
Your code is going to break. Often. That's how it goes, so you'd better structure things so that it's easy to step through and figure out what's going on. That means declaring variables instead of stringing together 17 object methods on a single line. That means using real IF/THEN/ELSE blocks with squiggly brackets instead of inlined immediate if's. And it means thinking twice before committing to some automagically generated database framework that sniffs out all your column names, writes its own SQL, and keeps your data in ArrayLists of ArrayLists.
Keep your design simple enough that any exception will drop you into the debugger looking at a single line of code that does a single thing. Even if it turns out it's doing that single thing wrong, at least you'll be able to find and fix it.
Maintainability
Over time, new features are going to get added and old features are going to get dropped. Some of those new features will be stupid ones, with dorky business logic that rubs the fur the wrong way in your elegantly designed class structure. You want to be able to make those changes quickly, without breaking anything else. This means you need unit testing. You'll also want to refactor large sections of your backend to work in ways you had never anticipated, and you'll need to propagate those changes all the way out to the client code. For that, you'll need even more unit tests (and some good tools), but also you'll need an architecture that doesn't fall apart when you rip chunks out of it.
Development Pace
Modern applications are big and complicated. It doesn't matter how nicely written your thing is or how many simultaneous users it can support if you never manage to get it out the door. If you want to get your application shipped, you're going to need to put out a ton of code in a hurry. That means you're going to need the best tools available, and the most productive environment that you can find.
Side Note: PHP might seem fast if you've never seen the alternatives, but let's see how many Ex-Ruby-on-Rails and Ex-ASP.NET guys you can find doing PHP development by choice.
Keeping the above points in mind, you're going to want a development framework of some description. Here at Expat, we've rolled our own specifically to keep us fast without sacrificing Readability, Debugability, or Maintainability. I'd recommend doing the same, but there are any number of 3rd party frameworks out there that might fit the bill. Just make sure you keep those three qualities in mind when you are evaluating any new framework.
Scalability
At some point, your thing is going to get popular. Actually, chances are it won't, but you shouldn't architect your thing to preclude the possibility that people might start using it in the Millions. So how do you pull that off without undoing all those Important Things further up this list? Simple. Just be aware that one day you might need the ability to add more database and web servers to the mix. Add a few little abstractions such as a Database Connection Factory, and a Session wrapper that you can replace someday with something BEEFY. For now, they don't have to do anything fancier than wrapping the existing stuff in whatever framework you're using. But if you're diligent in using these wherever you would normally use the framework components, you might end up saving yourself a lot of headache down the road.
For the most part though, don't worry too much about scalability. Having a million people that want to use your thing on a single day is a good thing. If you've done a little homework, you'll work things out when the time comes.
Performance
Computers are fast. Seriously, computers are faster than you think. If you try to imagine which piece of your application is slow, you're probably wrong. I once worked with a developer who spent the better part of 6 months hand optimizing an algorithm to do fast fuzzy string comparisons. It turned out that the server doing the text processing was only spending about 10% of its time actually processing text (even with a simplified, non-optimized algorithm), and 90% of its time battling database locks to get the results put away. He could have figured this out in one day with a profiler, and then spent a few hours tweaking database indices and optimizing queries. Instead, he spent half a year solving the wrong problem.
So yeah, keep a profiler handy, and if you see something that is obviously taking a lot of extra time, go ahead and fix it. But don't spend too much time sweating performance issues. At least, wait until they present themselves as issues before you start sweating them!
Life imitates Rant...
As I write this, Blogabond (one of my diversions from real work) is starting to show its first signs of scaling pain. Every once in a while, a misbehaved crawler will swing by and hit it 500 times in a second, causing SQL Server to time out on a specific long-running query. This is a good thing in my mind, as it gives me a chance to tackle a potential bottleneck before it starts affecting real users.
Still, Blogabond has been up and running for almost two years now, and it is only now that I'm having to think about performance at all. Those other qualities though: Readability, Debugability, Maintainability, Development Pace. I'm seeing benefits from them every day. Labels: best practices, Debugability, Development Pace, Performance, Priorities, Readability, Scalability
If you've been around software for any length of time, you've probably heard the term "Eating your own Dogfood." Other people have given better definitions of this than I can, but basically it means using your own application in house.
So if your company is developing a little web-based word processor that it hopes will get bought out by Google, you would be well served to force your management and marketing teams to use that little word processor in lieu of Microsoft Word. The idea being that you'll quickly discover about 100 new top-priority bugs in your thing that are stopping the CEO from being able to write a simple letter to his lawyer.
Now once you start thinking about your new thing in terms of Dogfood, you are immediately given a new goal for development: "We've got to get this thing to Dogfood." Meaning, our stupid new mail client has been in development for 3 years now, so why are we still using Outlook internally?!??
The Idea
 Working with a distributed team is hard. I hate to say that, since it's sort of our thing here at Expat Software, but it is true to an extent. We have a design team up in Portland doing mockups for the new Rootdown look and feel. Down in LA, we would look at the designs that came in, mark them up and send them back to Portland, sometimes calling the designers on the phone, sometimes getting in touch via chat. It was taking forever. Just explaining the concept of "This button isn't necessary, and could you move the logo down to here" would take a couple days to get across.
Over dinner one night, we were griping about this process, and somebody suggested WebEx as a solution. "Yeah, but WebEx sucks." "And it's expensive." "And it sucks." And yeah, all the real-time collaboration software out there really does suck. It's all got too many hoops to jump through to get up and working, and it's all too bloated with stuff you don't really need. All we want to do is draw lines on a web page. Why should that be hard?
And that got us thinking. Why should it be hard? What would you need to do something like that in a Browser? Not much, really. All the technology is there. Hell, we've done most of what you'd need before. Like, back in 1998! It's got to be easy to reproduce that today.
Thus, the seed was planted.
The Provocation
A few weeks back I wrote an article that touched on some of the effort we've put into our backend framework here at Expat. It got a lot of feedback, some of which asked how we could possibly be productive with such an expansive backend to maintain. This really took me by surprise, because our experience has shown that we're a lot faster now than we used to be before we had that infrastructure in place. In my mind, that framework is the reason that I was able to put up a site like Blogabond in a few months of my free time, while it has taken other companies in the same space over a year to put up a similar site with a dozen developers and a million dollars in venture capital.
So hey, if we're so fast and all that, and this little collaborative marker-upper is so easy, why can't we just build it? Like fast, even?
Yeah. How about we set aside a day to do a little proof of concept and see how far we get.
The Day
10am
Technical proof of concept
First off, there are a few fundamental questions that need answering. What do we absolutely need? Can we layer a DIV over an IFrame? In every browser? Can we put a transparent-backround Canvas in that DIV and draw on it? Even using the IECanvas hack? Can we hook mouse events to it? Cool. We're in business.
 12pm
Silly Drawing App
Next up, we needed a quick and dirty little drawing application for marking up photos and web pages. One day we'll want to put some more effort into this piece, but for now all we needed to do was draw little scribbly lines on the canvas.
2pm
The Proxy
We needed a simple proxy of some sort to show web pages on that IFrame, to avoid annoying Cross-Site-Scripting issues. It would need to mess with the HTML somehow to ensure that any clicks on those pages got redirected back to the proxy.
For the time being, we just grabbed an open source ASP.NET proxy tool and plugged it in. (This got swapped out about 2 days later for a home built version that worked a lot better for what we were doing.)
3pm
First Cut at the Backend
This was still a proof of concept, so we just mocked up a few basic objects and stored them in static memory on the server. Throw in a few little web services that the client can call to talk to the backend, and we're off to the races. (This piece was blown away and rebuilt the next morning to use a real data layer, but it kept us on task and out of the mire until we had the rest of the thing working.)
 5pm
Testing
First multi-user twiddle session. Basically, 3 guys in one room drawing words and pictures over Google's home page. I'm really glad we don't have screen captures of most of the things we were drawing.
6pm
Chat
Somebody asked for Chat, so we threw in a little ghetto chat window. Nothing fancy, but at least you could see what people were saying (but not who was saying it!)
6:30pm
Refactoring
Much polishing and refinement of the original concept. And we added a few more features like being able to choose what color you were drawing with.
7pm
Outside users
At this point, things were looking basically usable, so we invited a few friends from the outside to try it out. Lots of childish graffiti was drawn, and a few more major issues were uncovered.
 8pm
Dogfood
Finally, with the last showstopper issues out of the way, it was time to get the design team in Portland on to the site. Somebody pulled up Rootdown.us in the main window and we all started drawing lines on it and suggesting things to nudge around.
Holy wow. We were using this thing to do real work!
The Analysis
So how did we pull it off? Simple. We cheated.
The nice thing about Dogfood is that it doesn't have to be a finished product. It just needs to be useful for the task you're trying to accomplish. Sure, it needs to be stable enough to get stuff done, and it can't go losing critical data. But mostly, it just has to limp along well enough that you can start using it to do real work.
Since we weren't trying to build the whole product all at once, we were able to cut a few corners to get that Dogfood version up as quickly as possible. You'll notice that we had to go back the next day and tape on a new back end, and that we had to throw out the crappy third party proxy we were using. Better still, in the version we used that night, you couldn't even log in or create new WhiteBoard sessions. We had a single session, and a hard-limit of 3 users. There was still loads of work to be done before we could let the general public see the thing.
Another thing we had going for us was a really clear vision of what we were trying to accomplish. That vision was small enough to fit inside a single brain, and compact to the point that we could throw a single programmer at it for a day to get it implemented. You get a huge speed advantage with a team size of one. I doubt we would have finished in a day had we had three guys working on it.
One Week Later
 So here we are, a week later, with a big pile of bugs and feature requests in the hopper. All found through simply trying to use the application to get work done. We're on the thing every day reviewing designs with the guys in Portland, and every day I'll spend another couple hours tweaking the thing to be less annoying and more useful.
 With all the positive feedback we've gotten from friends and family, we're starting to think about opening Twiddla up as a public Alpha. Maybe even turning it into a real product at some point.
Lucky for us, we have this little blog with its little readership of early-adopter types. I'd encourage anybody reading this to go to www.twiddla.com and put our little whiteboard app through the paces. Naturally, we'll want to hear honest feedback about what you like and dislike about the thing. And hey, it's only been alive for a week now, so you're not going to hurt our feelings by telling us that it sucks.
We know it sucks, and we have a good idea as to why. That's the power of eating your own dogfood. With luck, maybe you'll have ideas to make it suck less! Labels: best practices, distributed team, dogfood, software
Take a look at this little piece of code. It looks pretty innocuous, like it was taken straight from "Teach Yourself ASP.NET in 21 Days". Pull a list of Trips out of the database for a given user, and bind it to a select list. Nothing fancy. Teaches you a little bit about ADO.NET and databinding all in one place.
Figure 1., Book Sample
public class ExampleCode : System.Web.UI.Page
{
protected HtmlSelect selTripID;
private void Page_Load(object sender, System.EventArgs e)
{
if (!IsPostBack)
{
DataSet ds = new DataSet();
SqlConnection connection = new SqlConnection(
"server=OurProductionServer;database=Payroll;
UID=jimmy;PWD=j1mmy;");
SqlDataAdapter adapter = new SqlDataAdapter(
"select * from Trip where UserID="
+ Request["UserID"], connection);
adapter.Fill(ds);
selTripID.DataSource = ds;
selTripID.DataTextField = "TripName";
selTripID.DataValueField = "TripID";
selTripID.DataBind();
}
}
}
Imagine my surprise, however, when I walked in to a small software shop recently and found a whole project written with code like the above. What were these guys thinking? Are they seriously relying on this fragile, unmaintainable mess in a real software product?
And then it dawned on me. Maybe nobody had ever told them that the little examples in the book are just that: Little Examples. For teaching purposes. Never intended for use in the real world. Come to think of it, it doesn't even tell you that in the book. It aught to be in block capitals across the cover of the book:
WARNING: DO NOT PASTE THE SAMPLES FROM THIS BOOK DIRECTLY INTO PRODUCTION SOFTWARE!!!
Somehow, it seems that this message never got through to a substantial portion of the software industry. Every time I see a "Senior Developer" writing ad-hoc SQL or referencing a hashtable with a string I just want to cry.
So what do we do about it? I guess we try to get the message out. Here is some code I copied out of the Blogabond source that is functionally equivalent to the above:
Figure 2., Production Sample
public class ProductionCode : System.Web.UI.Page
{
protected HtmlSelect selTripID;
private int _userID;
private void Page_Load(object sender, System.EventArgs e)
{
if (!IsPostBack)
{
_userID = StringConvert.ToInt32(
Request[User.Columns.UserID], 0);
if (_userID != 0)
{
PopulateTripList();
}
else
{
// bail gracefully...
}
}
}
private void PopulateTripList()
{
selTripID.DataSource = Trip.GetByUserID(_userID);
selTripID.DataTextField = Trip.Columns.TripName;
selTripID.DataValueField = Trip.Columns.TripID;
selTripID.DataBind();
}
}
Short and to the point. And obviously only the tip of the iceberg. This bit of code goes deep, but we can learn a few things just by looking at it:
- We're using a Data Layer of some sort. Somewhere in the back end lives a class that is wrapping a stored procedure for me. There's a ConnectionFactory back there someplace that knows to hand me a connection to the Production database because of a server setting.
The code that's handing me the DataSet I'm binding to can be reused by any page in the project, so I know that little select statement is only living in a single place. In fact, everything that touches that table is sitting in a single class back there. So if something changes in the schema I won't have to go hunting through the client code to fix it.
Best of all, because we've separated the database code out into its own place, we can drop all the boilerplate CRUD into CodeSmith templates and let it auto-generate itself from the database schema. In the case of Blogabond, I can flip a switch and watch as 50k lines of boilerplate C# and SQL in our backend gets blown away and recreated in about 30 seconds. And since it's also generating all the boilerplate Unit Tests for all that code, we can be sure that the datalayer and the database line up correctly. So if junior dev Jimmy comes by and unchecks a not-null constraint in Enterprise manager, we'll see the continuous build fail on an integrity check a few minutes later.
- We're using Enums instead of inline strings to reference our column names. This may not sound like a big deal, but it buys us a couple things. First, we've abstracted out the concept of the Typo. There is simply no way we can misspell "TripID", because the compiler will catch it for us. It will sit there underlined in red if we even try. We don't even have to remember what columns we have available, since Intellisense will tell us when we hit ctrl+space.
Second, and bigger, is that we buy the freedom to monkey with our tables and never worry that we're causing runtime errors someplace. I can rename the column "TripName" to "BlogName" if I want, re-generate the database wrappers, and watch as the project fails to compile until I fix the references. That is some powerful stuff. And what happens if I forget to re-generate the wrappers for that table? The continuous build will fail in about 5 minutes, since the unit tests that check to make sure the wrappers match the schema will fail.
Where do we go from here?
Copy and paste code reuse is bad. Everybody knows that. Ad-hoc SQL is bad. Everybody knows that. Inline strings are bad. Everybody knows that.
At least that's what I thought. But you know what? They don’t. And they should. And it's our job to tell them. Labels: best practices, example code, software
|
|