Expat Software
A laptop, some ideas, and a one-way ticket.
 
 

Tuesday, May 27, 2008

The One Rule of DHTML Programming

I just don't get it.

How can so many smart people be so collectively bad at something as simple as Javascript on a web page?

It's just not that hard. And yet, not an hour goes by when I'm not stopped in my tracks by at least one Javascript error. And it's especially sad because many of these errors are coming from well-known sites, with huge development budgets and plenty of good talent that really should know better. Observe:

That was just a ten minute sample of browsing today.

The One Rule of DHTML Programming

Look, it's not that hard to do this stuff right. In fact, here is everything you'll ever need to know about Dynamic HTML Programming with Javascript:

Test EVERYTHING before you reference it.

That's it. Simple. Every little scrap of code you write needs to live inside its own little IF block that tests to make sure that the things it's expecting to interact with really exist. Here's how:

BAD:
gbN2Loaded.style.display='none';
Good:
if (window.gbN2Loaded)
{
  gbN2Loaded.style.display='none';
}
BAD:
document.getElementById('myDiv').innerHTML
     = 'stuff';
Good:
if (document.getElementById('myDiv'))
{
  document.getElementById('myDiv').innerHTML
       = 'stuff';
}

I don't care that Google's API Reference told you to put <body onunload='GUnload()'> into all your pages. That's just example code, and it's not intended to be used in the real world.

Real World Javascript will need to survive in dozens of strange browser environments that do things in strange unexpected ways, and as soon as you get it working right, Junior Dev Jimmy will accidentally include it on every single page on your site and suddenly it won't be able to find the things it needs to live. When that happens, it needs to quietly stop trying to do stuff instead of throwing error messages all over the place.
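If you get tired of typing the same IF block over and over, you can wrap the test up in a couple of tiny helpers. This is just a sketch: setInnerHtml and callIfDefined are names I made up for illustration, not part of any library, and the document object is passed in so the thing can be exercised outside a browser.

```javascript
// Hypothetical helpers illustrating "test EVERYTHING before you reference it".

// Sets innerHTML only if the element really exists. Returns true on success,
// false if there was nothing to update (and throws no errors either way).
function setInnerHtml(doc, id, html) {
  var el = (doc && typeof doc.getElementById === 'function')
    ? doc.getElementById(id)
    : null;
  if (!el) {
    return false;
  }
  el.innerHTML = html;
  return true;
}

// Calls scope[name](...) only if it is actually a function on this page.
// Extra arguments are passed through; a missing function is quietly ignored.
function callIfDefined(scope, name) {
  if (scope && typeof scope[name] === 'function') {
    var args = Array.prototype.slice.call(arguments, 2);
    return scope[name].apply(scope, args);
  }
  return undefined;
}
```

With those in place, the onunload example becomes <body onunload="callIfDefined(window, 'GUnload')">, which does nothing at all on pages where the maps script never loaded.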

What you need to do about it

Ok, cool, you've fixed everything, but you're not done yet. There's one more thing you need to do right this second. You need to turn on those annoying Script Error popups in both Internet Explorer and Firefox, and you need to keep them on from here on out. Don't just do it for your own machine, but for every computer owned by every employee of your company.

Yes, I know that you turned them off on purpose because they make the internet basically unsurfable, but most casual users of your site will have them on by default. That means that most casual users will see every single little script error that your site throws at them, and they won't like it. Those errors are pissing off real people right this minute, and you need to know about it. If you arrange it so that they start pissing off you and your co-workers too, you just might find the incentive to get rid of them once and for all.


Saturday, May 03, 2008

No Magic

PHP used to have a cool little feature where it would automatically detect single quotes in text strings and escape them for you whenever they needed to be. It was called Magic_quotes_runtime. Maybe you've heard of it. It was a disaster.

Countless developer hours were spent trying to chase down mysterious runtime errors where single quotes were either introduced, doubled up, or removed, causing disastrous crashes, data corruption and so much untold havoc that the feature was deprecated and eventually removed from PHP entirely.
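To see how the "doubled up" case happens, here's a small sketch in Javascript. The addSlashes function is my own stand-in written in the spirit of PHP's addslashes(); it is an illustration of the failure mode, not PHP's actual implementation.

```javascript
// Hypothetical stand-in for an "escape my quotes for me" feature:
// puts a backslash in front of every quote and backslash.
function addSlashes(text) {
  return text.replace(/(['"\\])/g, '\\$1');
}

// The magic layer escapes the incoming string once...
var once = addSlashes("O'Brien");

// ...and then the developer, not realizing that, escapes it again by hand.
var twice = addSlashes(once);

// `once` now contains one backslash before the quote.
// `twice` contains three: the original backslash got escaped too,
// and that is the garbage that ends up stored in the database.
```

Neither call is wrong on its own. The trouble is that nobody looking at the second call can tell whether the first one already happened.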

You would think that people would have learned their lesson.

People are, by and large, dumb. We make the same dumb mistakes over and over again because we didn't bother to do any research or read about the last time that somebody tried whatever stupid idea we just re-invented. As a result, we have development frameworks and tools like Hibernate, ASP.NET's SmartNav, and Rails' ActiveRecord, all trying to magically solve problems that weren't very hard in the first place, and silently making a lot of people's lives a lot harder without them even realizing it.

The big problem with Magic tools is that they work fine the first time you try them. "Wow!", you say, "It posted the page back and scrolled my browser back down to the Submit button!" So you turn that feature on for all your pages and start to trust it. You get used to it. You take it for granted. You forget you're even using it. Then suddenly something weird starts happening with one of your pages and you can't figure out why.

Examples of this sort of side effect abound, but nobody yet has taken a stand and done something about it. How many developer hours have been lost trying to figure out what magical SQL statement was running behind the scenes and only “Hibernating” half of an object? How many CPU cycles have been squandered (and slanderous blog entries written) because some poor developer didn’t realize that ActiveRecord was hitting the database three times for every single row in that recordset? Are we really so scared of Outer Joins that we allow ourselves to be subject to this torment?
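The hidden-query problem is easy to demonstrate with a toy. The sketch below uses a fake in-memory "database" that counts how many times it gets asked for something; every name in it is hypothetical, and it doesn't model any real ORM, only the shape of the problem (one lookup per row versus one join).

```javascript
// Toy in-memory "database" that counts the queries it receives.
function createFakeDb(posts, authors) {
  var queryCount = 0;
  return {
    allPosts: function () {
      queryCount++;
      return posts.slice();
    },
    authorById: function (id) {
      queryCount++;
      return authors[id] || null;
    },
    // Stands in for a single query with a LEFT OUTER JOIN.
    postsWithAuthors: function () {
      queryCount++;
      return posts.map(function (p) {
        return { title: p.title, author: authors[p.authorId] || null };
      });
    },
    queries: function () {
      return queryCount;
    }
  };
}

// What the magic layer does behind your back: one extra query per row.
function lazyListing(db) {
  return db.allPosts().map(function (p) {
    return { title: p.title, author: db.authorById(p.authorId) };
  });
}

// What you would have written by hand: one query, one join.
function joinedListing(db) {
  return db.postsWithAuthors();
}
```

Both functions return the same rows. The only difference is the query count, which is exactly the part you can't see from the calling code.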

I’ll leave you with an axiom that I’ve been telling developers for years without much success. Call it Kester’s Caution:

Never use any language feature that describes itself as "Smart" or "Magic." Such features will invariably be trying to abstract out some behavior that is not that hard to deal with anyway, and will make any number of incorrect assumptions about your application that will result in strange behavior cropping up that could possibly be described as "Magic", but certainly would never be labeled "Smart".


Friday, March 21, 2008

6 million hits a day. Time to think scale!

Twiddla has been getting a ton of attention this week. We picked up the Technical Achievement award at SXSW Interactive, and have been getting a bunch of good press ever since. 25,000 people have signed up for the service since the award was mentioned, with 7,500 of those signups happening in a single day. It's about to get good.

For me though, it's been even better. We're finally getting enough traffic to start thinking about scaling issues. You might remember an article that I wrote a few months back, where I told people not to sweat Performance and Scaling issues too much, but rather to focus on Readability, Debuggability, Maintainability, and Development Pace. The idea was that getting your product to market quickly and being able to move fast if necessary are more important than having the Perfect Dream System that takes forever to build. Of course, the implied point was that when and if that Big Day came, you'd be able to move fast enough to deal with Scalability and Performance concerns as they appeared.

On March 12th, 2008, I got to see first hand whether I was talking out my arse…

3/11/2008 7:00pm: 150 signups/hr, 50 hits/sec, 0-5% CPU

It's the day after the awards, and the first brief announcements are out. Traffic has been building steadily all day, but we've seen worse. The only crisis at the moment is that we don't yet have a Press Kit, so we're seeing writeups with the old logo and screenshots from the old UI. D'oh!

3/11/2008 11:00pm: 350 signups/hr, 120 hits/sec, 1-9% CPU

Japan wakes up. The Asian press really liked us, so we saw a big spike in users from China and Japan the first few days. The sandbox is pretty clogged, and with 30 people drawing simultaneously it's starting to tax people's browsers. Every once in a while, somebody navigates the sandbox over to a porn site, and people write our support line to complain. We're wiping the sandbox every 5 minutes, but it's still not acceptable. Gotta get a handle on that.

3/12/2008 9:00am: 300 signups/hr, 100 hits/sec, 1-6% CPU

The sandbox is completely overloaded. There are 100 people in there, which is too many people communicating at once for any medium to really handle. Imagine 100 people drawing on a real whiteboard at the same time, or 100 people talking over each other on a conference call. It just doesn't work. To bring a little order into the picture, I fire up the Visual Studio.NET and add a little switcher that will direct traffic to any one of 5 sandboxes, each one holding 8 users. Throw that live, and now there are 5 overloaded sandboxes.

3/12/2008 9:30am: 500 signups/hr, 300 hits/sec, 3-15% CPU

I bump up the sandbox count to 10. Then think better of it and bump it up to 20 before pushing. Then think better of THAT and add a new page to show users in case all 20 of those sandboxes fill up. Push that live.

3/12/2008 9:41am

Testing out the above changes, I am immediately redirected to a page saying "Sorry, all the Sandboxes are full." Let me restate that: From the time I pushed those changes live to the time I could test them out, 160 people had beaten me into the sandboxes. Wow.
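The real switcher lives in the ASP.NET backend, but the idea fits in a few lines. Here's a sketch in Javascript; pickSandbox and the occupancy array are names invented for this example, and the first-open-slot rule is my assumption about the simplest thing that could work.

```javascript
// Returns the index of the first sandbox with room, or -1 if every
// sandbox is full (the caller then shows the "Sorry, all full" page).
function pickSandbox(occupancy, capacity) {
  for (var i = 0; i < occupancy.length; i++) {
    if (occupancy[i] < capacity) {
      return i;
    }
  }
  return -1;
}

// Admits one visitor and returns the sandbox they landed in (or -1).
function admitVisitor(occupancy, capacity) {
  var index = pickSandbox(occupancy, capacity);
  if (index !== -1) {
    occupancy[index]++;
  }
  return index;
}
```

The arithmetic behind that "160 people" figure is just 20 sandboxes times 8 users each. Visitor number 161 is the one who gets the apology page.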

3/12/2008 10:00am: 700 signups/hr, 500 hits/sec, 5-20% CPU

Looking through the error logs, I'm starting to see our first concurrency issues. These are the little one-in-a-million things that you'd never find in test, but that happen every ten minutes under load. They're mostly low-hanging fruit, so I spend the next hour patching and re-deploying until the error logs go silent.

3/12/2008 12:00pm: 600 signups/hr, 400 hits/sec, 5-17% CPU

I'd been doing all of this from my sister's house up in Ft. Worth. I was supposedly visiting her for a couple days, but had mostly been using her house as an office (thanks Lisa for tolerating that, and I promise to get out and visit sometime when I'm not trying to launch a new website!). Now I had to hop in the car and drive back to Austin to fly home. Our trusty server will be on its own for the next 12 hours, taking the beating of its life. I won't even know if it goes down.

3/13/2008: 4000 signups/day, 100 hits/sec, 3-10% CPU


Back in a stable place, and ready to deal with the flood of feedback emails we've been getting. This part is fun, since most people have nice things to say, and it becomes readily apparent what features everybody wants to see. Nothing has broken, so I actually have some time to put a few minor features live. The "Wite-out" button was added this day, I think, and I re-did the way we handle snapshots and image exporting.

3/14/2008: 3000 signups/day, 100 hits/sec, 2-5% CPU

I implemented a fix for the last little concurrency bug that we'd been seeing. Then, while profiling that fix on the server, I noticed that TwiddleBot was flipping out. TwiddleBot is the little service that runs the Guided Tour feature, and is also responsible for clearing out the sandboxes from time to time. Turns out, he was also pounding the database 20 times a second, asking for instructions. Hmm… Chill, TwiddleBot. Pushed a fix for that, and suddenly CPU usage dropped to zero. Like, ZERO! Every 5 seconds, it would spike up to 1%. Cool. I think we're gonna be able to scale this thing…
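I won't reproduce the actual TwiddleBot fix here, but the general shape of "chill, bot" is a throttle around the thing that hits the database. A sketch, assuming a 5 second minimum gap between polls; createThrottledPoller and fetchInstructions are hypothetical names, and the clock is passed in so the behavior is easy to check.

```javascript
// Wraps a polling function so that it only reaches the database if at
// least minIntervalMs have passed since the last real poll.
function createThrottledPoller(fetchInstructions, minIntervalMs) {
  var lastPoll = null;
  return function poll(nowMs) {
    if (lastPoll !== null && nowMs - lastPoll < minIntervalMs) {
      return null; // too soon: skip the database entirely
    }
    lastPoll = nowMs;
    return fetchInstructions();
  };
}
```

Call the wrapped function as often as you like. At 20 calls a second with a 5000ms gap, only one call in a hundred actually reaches the database.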

One week later, ~1000 signups per day, 50 hits/sec, 0% CPU

In the end, we came through our first little scaling event rather well. We were actually a bit over-prepared. Our colocation facility (Easystreet in Beaverton, Oregon) had a couple extra boxes waiting to go for us, and I had taken the time a week earlier to write up and test a little software load balancer to allocate whiteboard sessions to various boxes when needed. In the end, we didn't get to try any of that out. Hell, we never spiked the processor on our one server over 50%. I'd love to congratulate myself for the design choices I made all those months back when I wrote that article, but I think it's still too early in the game to conclude that we'll really scale when we ramp up to the next level.

Still, it's worth noting that everything in Twiddla was built using the simple, Readable, Debuggable backend that we've been using on our more pedestrian sites for years, and it held up just fine under traffic. When it turned out that parts of that backend needed refactoring to handle the kind of concurrency we saw last week, it was a simple 5 minute task to crack open the code, find what needed to change, and change it.

Readable, Debuggable, Maintainable. That's the plan. Thus far, that has enabled us to keep on top of any Performance and Scalability issues that have come along. With luck, things will continue to work that way!


Wednesday, February 06, 2008

A Naive Bayesian Spam Filter for C#

Human-powered comment spam has been piling up recently at Blogabond, so I spent a few hours putting together a C# implementation of Paul Graham's Naive Bayesian Spam Filter algorithm.

You can find a nice long-winded article along with the source code over at The Code Project. Let me know if you find it useful. Here's a link:

http://www.codeproject.com/KB/recipes/BayesianCS.aspx


Monday, November 05, 2007

Roll your own Web Stats for Amazon S3

Edit:
This was written before we launched S3stat, a service that parses your Amazon S3 server access logs and delivers usage reports back to your S3 bucket.

So if you're not interested in the technical details, and just want web stats for your S3 account, you can head over to www.S3stat.com and save yourself a bunch of hassle.

Amazon's Simple Storage Service (S3) is a great content delivery network for those of us with modest needs. It's got all the buzzwords you could possibly want: geo-targeted delivery, fully redundant storage, guaranteed 99.9% uptime, and a bunch of other stuff that you could never pull off on your own. And it's dirt cheap.

Of course, there's always a catch, and in S3's case you'll soon find that your $4.83 a month doesn't buy you much in the way of reports. With some digging around at Amazon's AWS site, you can find out how much you were charged last month, but that's about it. (OK, if you're persistent, you can download a CSV report full of tiny fractions of pennies that, when added together, tell you how much you were charged last month.)

The Motivation

I love my web statistics. I'm up and waiting at 12:07am every morning for the nightly Webalizer job to run so that I can see how many unique visitors came in to Blogabond today (1227), and what they were searching for (tourist trap in Beijing). I've been hosting my users' photos out at S3 for a few months now, and though I've watched my bandwidth usage drop through the floor, I've also been missing my web stats fix for all those precious pageviews. Something had to be done. I started digging around through Amazon's AWS docs.

It turns out that you can actually get detailed usage logs out of S3, and if you're willing to suffer through some tedium, you can even get useful reports out of them.

Setting it up

Turning on Server Access Logging is just about the easiest thing that you can do in S3. If you've ever tried to use Amazon's APIs, you can translate that to mean that it's hard. It takes two steps, and unless you're looking at a Unix command prompt, you'll need to write some custom code to pull it off. Here's what you do:

1. Set the proper permissions for the bucket you'd like to log. You'll need to add a special Logging user to the Access Control List for the bucket, and give that user permission to upload and modify files.

2. Send the "Start Logging" command, including a little XML packet filled with settings for your bucket.

The nice people at Amazon have put together a simple 4 page walkthrough that you can follow to accomplish the above. I've run through it, and it works as advertised.
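For a feel of what that "little XML packet" in step 2 looks like, here's a sketch that builds one. The element names and namespace are from my memory of the S3 logging docs, so check them against the walkthrough before trusting them; buildLoggingStatusXml and the bucket and prefix values are made up for this example. The packet gets sent as the body of a signed PUT to the bucket's ?logging resource, which isn't shown here.

```javascript
// Escapes the characters that would break element content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Builds the settings packet for the "Start Logging" request.
// targetBucket: where the log files get written.
// targetPrefix: prepended to every log file name.
function buildLoggingStatusXml(targetBucket, targetPrefix) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<BucketLoggingStatus xmlns="http://doc.s3.amazonaws.com/2006-03-01">',
    '  <LoggingEnabled>',
    '    <TargetBucket>' + escapeXml(targetBucket) + '</TargetBucket>',
    '    <TargetPrefix>' + escapeXml(targetPrefix) + '</TargetPrefix>',
    '  </LoggingEnabled>',
    '</BucketLoggingStatus>'
  ].join('\n');
}
```

Calling buildLoggingStatusXml('my-log-bucket', 'log/access_log-') gives you log files with names like the ones in the directory listing further down.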

Parsing the logs

Now we're getting to the fun part. Remember above where we noted that S3 has servers living all over the world delivering redundant copies of your content to users in different countries? Well now we get to pay the price for that. You see, Amazon sort of punted on the issue of how to put all those server logs back together into something you can use. Instead, every once in a while, each server will hand you back a little log fragment containing anywhere between 1 and 1,000,000 lines of data. Over a 24 hour period, you can expect to accumulate about 200 files, ordered roughly by date but overlapping substantially with one another.

So, now in order to get a single day's logs into a usable form, we get to:

3. Download the day's logs. This is simple enough, as the S3 REST API gives us a nice ListBucket() method that accepts a file filter. We can ask for, say, all files that match the pattern "log/access_log-2007-10-25-*", and download each file individually. We'll end up with a folder containing something like this:

10/30/2007  02:13 PM            21,380 access_log-2007-10-25-10-22-37-2C695527C7FEAEE5
10/30/2007  02:13 PM            19,653 access_log-2007-10-25-10-22-37-8FFF80109E278103
10/30/2007  02:13 PM            15,829 access_log-2007-10-25-10-23-24-D97886677E5A8670
10/30/2007  02:13 PM           185,195 access_log-2007-10-25-10-24-11-7F5172BFA139167D
10/30/2007  02:13 PM            94,795 access_log-2007-10-25-10-27-14-3EDC4E89A03E96EB
10/30/2007  02:13 PM             3,812 access_log-2007-10-25-10-32-20-DD96FC8F8B880232
10/30/2007  02:13 PM           121,863 access_log-2007-10-25-10-33-59-A44E699EE741CEF7
10/30/2007  02:13 PM            51,315 access_log-2007-10-25-10-39-52-313F98B8F52AA150
10/30/2007  02:13 PM            34,984 access_log-2007-10-25-11-18-37-DE9AB5D324881BC2
10/30/2007  02:13 PM             8,451 access_log-2007-10-25-11-22-16-BC5BCE4A49C4EC44
10/30/2007  02:13 PM            10,271 access_log-2007-10-25-11-22-54-54F77DE85AD20F84
10/30/2007  02:13 PM            14,949 access_log-2007-10-25-11-23-28-08D3DED923404EA5

4. Transform columns from S3's Server Access Log Format into the more useful Combined Log Format. In the Unix world, we could easily pull this off with sed. In this case though, we might actually want to process each line by hand, since we still need to...

5. Concatenate and Sort records into a single file. There are lots of ways to accomplish this, and they're all a bit painful and slow. When I did this myself, I wrote a little combined transformer/sorter that spun through all the files at once and accomplished steps 4 and 5 in a single pass. Still, there's lots of room here for speed tweaking, so I'll leave this one as an exercise for the reader.

6. Feed the output from Step 5 into your favorite Web Log Analyzer. This is the big payoff, since you'll soon be looking at some tasty HTML files full of charts and graphs. I prefer the output produced by The Webalizer, but there are plenty of free and cheap options out there for this.
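Steps 4 and 5 can be sketched in a few dozen lines of Javascript. This is not the script I actually ran, just a minimal illustration. It assumes the S3 field order as I understand it (owner, bucket, time, remote IP, requester, request ID, operation, key, request line, status, error code, bytes sent, object size, total time, turnaround time, referrer, user agent), and it takes the log fragments as already-downloaded strings. All function names are made up for the example.

```javascript
var MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
               Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Splits a log line into fields, keeping [bracketed] and "quoted" fields whole.
function tokenize(line) {
  return line.match(/\[[^\]]*\]|"(?:[^"\\]|\\.)*"|\S+/g) || [];
}

// Converts "[25/Oct/2007:10:22:37 +0000]" to milliseconds since the epoch.
function parseTimestamp(field) {
  var m = /^\[(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]$/.exec(field);
  if (!m || !MONTHS.hasOwnProperty(m[2])) {
    return NaN;
  }
  var utc = Date.UTC(+m[3], MONTHS[m[2]], +m[1], +m[4], +m[5], +m[6]);
  var offsetMinutes = (m[7] === '-' ? -1 : 1) * (+m[8] * 60 + +m[9]);
  return utc - offsetMinutes * 60000;
}

// Step 4: turn one S3 line into a Combined Log Format line plus a sort key.
// Returns null for lines that don't look like access log records.
function toCombined(line) {
  var f = tokenize(line);
  if (f.length < 17) {
    return null;
  }
  var when = parseTimestamp(f[2]);
  if (isNaN(when)) {
    return null;
  }
  // host ident user [time] "request" status bytes "referrer" "user-agent"
  var text = [f[3], '-', '-', f[2], f[8], f[9], f[11], f[15], f[16]].join(' ');
  return { when: when, text: text };
}

// Step 5: concatenate every fragment, then sort all records by timestamp.
function mergeLogs(fragments) {
  var records = [];
  fragments.forEach(function (fragment) {
    fragment.split(/\r?\n/).forEach(function (line) {
      var record = line ? toCombined(line) : null;
      if (record) {
        records.push(record);
      }
    });
  });
  records.sort(function (a, b) { return a.when - b.when; });
  return records.map(function (r) { return r.text; }).join('\n');
}
```

This version loads everything into memory and sorts it in one go, which is fine for a day's worth of modest traffic. Anything bigger wants the single-pass merge described in step 5.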

Wrapping up

And that's about it. Now all that's left is to tape it all together into a single script and set it to run as a nightly job. Keep in mind that S3 dates its files using Greenwich Mean Time, so, depending where you live, you might have to wait a few extra hours past midnight before you can process your logs.

All together, this took me a little more than a day of effort to get a good script running. It wasn't easy, but then nothing about administering S3 ever is.

Epilogue (the birth of S3STAT.com)

I went through this pain and wrote this article about a week ago. Before posting it, it occurred to me that hardly anybody will ever actually follow the steps that I outlined above. It's just too much work, with too little payoff.

What the world needs is a simple service that people can use to just automate the process. Type in your access keys and bucket name, and it will just set everything up for you.

Let's see... People need this thing... I've already built it... ...umm... Hey! I've got an idea!

So yeah, get yourself over to www.s3stat.com and sign up for an account. It's a service that does everything I described above, and gives you pretty charts and graphs of your S3 usage without any setup hassle. At some point I'm going to start charging a buck a month to cover the bandwidth of moving all those log files around, but for now I just want to get some feedback as to how it's working. Let me know what you think!


Monday, October 15, 2007

How to do all that website optimizing stuff that Yahoo recommends if you're running ASP.NET and storing your content at Amazon S3

If you've come within 30 feet of the internet this last month, you'll have come across this list of best practices at least a dozen times. Everybody seems to be writing about it and linking to it and building little tools that tell you you're not doing it right.

Most of the stuff on that list is low hanging fruit. You can spend 5 minutes in IIS, flipping compression on and telling all your /images/ directories not to expire content until we're all driving flying cars, and suddenly you'll find your site loading a lot faster.

That's cool and all, but what if you also followed their advice and stuck a bunch of your static content out on Amazon S3? I guess you just fire up S3Fox and start playing with the metadata on all those… whoa, hang on… hey, you can't change that stuff once it's written. Crap. You've gotta upload all those files again. And you can't use that cool Firefox tool to do it anymore, because it has no way to set an "Expires" header when you upload a file. Crap. Crap. Crap.

Well if you're running C# and ASP.NET, you're in luck. Because I just went through that pain for a few of my sites, and now I'm going to let you mooch off my code.

First step: download the right library from Amazon

In this case, you're going to need the Amazon S3 REST Library for C#. No, not the SOAP library, because evidently that one is crap. Either drop the source straight into your project or build it elsewhere and link it in.

Last step: swipe this code

This zip contains everything you'll need. Just airlift it into your project and you'll be good to go. Now, since this is an article about programming, I'm legally obligated to provide at least one code sample for you to gloss over. So here is the meat of what we're doing:

public void PushToAmazonS3ViaREST(string bucket, string relativePath, HttpServerUtility server)
{
    relativePath = relativePath.TrimStart('/');
    string fullPath = _basePath + relativePath.Replace(@"/", @"\");

    AWSAuthConnection s3 = new AWSAuthConnection(_publicKey, _secretKey);
    string sContentType = "image/jpeg";
    SortedList sList = new SortedList();
    sList.Add("Content-Type", sContentType);

    // Set access control list to "publicly readable"
    sList.Add("x-amz-acl", "public-read"); 

    // Set to expire in ten years
    sList.Add("Expires", GetHttpDateString(DateTime.Now.AddYears(10))); 

    S3Object obj = new S3Object(FileContentsAsString(fullPath), sList);
    s3.PutObjectAsStream(bucket, relativePath, fullPath, obj.Metadata);
}

There are only two lines you need to care about if you're using S3 to host web content, and they're both commented. One sets the file to be readable by the public, and the other tells it not to expire until after you've left the company. Sorted.
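If you're not on C#, the same two headers are easy to build anywhere. Here's a sketch in Javascript of just the header part; buildS3WebHeaders is a made-up name, and actually signing and sending the PUT is left to whatever S3 library you're using.

```javascript
// Builds the metadata headers for a publicly readable, long-lived S3 object.
// now: a Date to count from; yearsUntilExpiry: how far out to set Expires.
function buildS3WebHeaders(contentType, now, yearsUntilExpiry) {
  var expires = new Date(now.getTime());
  expires.setUTCFullYear(expires.getUTCFullYear() + yearsUntilExpiry);
  return {
    'Content-Type': contentType,
    // Set access control list to "publicly readable"
    'x-amz-acl': 'public-read',
    // HTTP dates are always expressed in GMT; toUTCString() produces that format
    'Expires': expires.toUTCString()
  };
}
```

Remember that these headers are fixed at upload time, so if you want a different Expires date later you get to push the file again.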

I've included a cheesy .aspx page that you can use to push your files by hand. Hopefully you can figure out how to change which directories it's putting in the list, and how to add your own. It's actually pretty ugly code, but hey, it's just an admin tool that you'll only run a few times in your life.

Be Warned though: I've stripped out the security that keeps people from the outside world (and GoogleBot) from hitting this page and bogging your server. If there's any chance that this might escape to the live site, be sure to lock it down so that you can't see it unless you're logged in as an admin!

Anyway, I hope you find some use out of that code. I certainly wasn't planning to publish it, so please refrain from mentioning the 47-odd things in it that you should never do in production!

Enjoy!



Monday, September 17, 2007

How to Fail at Freelancing, in 5 Easy Steps

Let's say you're a freelance web designer, and you come across a project description for, say, the redesign of a little site that lets you host web meetings. This could be your ticket to fame and fortune. But how should you proceed? Well, frankly, you could have proceeded in just about any moderately professional way imaginable last week and scored that gig, but instead you did the following. All 200 of you.


1. Send out a Canned Proposal. And not just any canned proposal, send a giant letter filled with the entire contents of your website, but without the proofreading. Make sure it reads like a long-form sales letter, complete with opening phrases such as "Webmaster, your search is over! We're just the candidate for you!!!", but with more exclamation points. Be sure to include links to at least 500 websites that you may have been remotely involved with.

2. Send that Canned Proposal within the first 20 minutes, because time is of the essence. You want to make the statement that "I Didn't Give This Any Thought At All!" And you want to make that statement fast. Good employers tend to hand out knowledge work on a first-come, first-served basis, so you want to make sure you end up on top of the stack. Extra credit for sending the same bid twice!

3. Don't Read the Project Description, or visit the site you're expected to be writing a proposal for. Bid on pieces that weren't up for bid. Ignore the direct questions asked of you in the project description.

Honorable mention:
To the guy in Florida who tallied up his hours, then tacked on the 5% overhead that this particular freelance site charges (expected to be borne by the freelancer, not the employer), then made a point of demanding payment up front. Good way to break the ice and build confidence.

4. Don't, under any circumstances, write any text specific to the project at hand. If you somehow feel obligated to write a sentence or two to show how much thought you've put into your proposal, be sure to tack it on to the bottom. After your cheesy promo text, after that list of 50 links to your class projects, and after that special offer of a 20% discount if I ACT NOW because it's your Fall Sale!

5. Quote an exact dollar figure, after having completed the steps above. This is a nice final touch, as it shows that even though you didn't bother to skim through the project description before sending off your automated response, you still have a firm grasp on our needs.

How to do it right:

Freelancing on Guru, Elance, Rentacoder, etc. is all about first impressions. Your prospective client will no doubt be swamped with dozens if not hundreds of proposals, and you need to find a way to make yours stand out. This is surprisingly easy to do in the current climate, as even the slightest hint of professionalism will do it. Your competition is all trying to shout over the top of each other, thinking that's how you get heard. But you know what? Shouting "I'm an Idiot" at the top of your lungs may in fact get you heard, but it won't usually get you hired.

So here's what you need to do if you actually want to hear back from a potential client: Write a simple, three paragraph proposal, from scratch. Spend the whole first paragraph summarizing the project and explaining why it's something you're good at. Go over your planned approach in the second paragraph, and end it with a ballpark estimate if the project description gave you enough information to do so. And finally, wrap up with a little pleasantry, and give the client a way of learning more about your operation and contacting you if they want to proceed.

Sounds simple, eh? Well, I used to do a lot of freelancing, and got a lot of work with that approach. Trust me, nobody else is doing it. Behave like a professional, and you'll clean up!


Friday, July 20, 2007

Sprinting

I didn't get anything done last week. Nothing at all. Most of my days were spent reading random stuff on the internet, making minor tweaks to Blogabond, and obsessing over traffic stats for Twiddla (why did 5000 people suddenly show up from StumbleUpon in one day???)

This week, on the other hand, I've been on fire. In the last 4 days, here's what I've accomplished:

  • Built a Reddit clone from the ground up for Rootdown's soon-to-be-live Clinical Pearls section.
  • Built a Google-Maps powered acupuncture chart, cut up 3000 tiles for it, incorporated effective Lat/Lon coordinates into Rootdown's database, and built a little Ajax data entry tool to drag & drop acupuncture points and meridians onto the chart.
  • Reworked the Photo uploading and Photo management pieces of Blogabond.
  • Tore out and streamlined the installation process for Regressor.NET
  • Wrote this lame article
Trust me, that's a lot of stuff.

I've noticed this same pattern happening over and over again. I think of it as Sprinting, and I think I'm getting better at harnessing it. There are a few factors that play into it, but I think the key is knowing that I have an entire day to Sprint on whatever it is that I want to do. Knowing Absolutely that the door to my office won't open, the phone won't ring, and no little IM popups will bother me for the Next Twelve Hours. Knowing that I'm free to get as deep into what I'm doing as I need to get whatever I'm doing Done and Done for good. Those are the days that I get the most accomplished.


Sprinting for a different reason: Team Expat, Running with the Bulls in 1998

Another thing that seems to help, at least for me, is to have more than one ball in the air at a time. When I've only got one project, I seem more content to move slowly, check my email, read the occasional blog, and essentially stuff my productivity. But when I've got 3 things that Need to Get Done, and there just aren't enough hours in the day to do it all, I find that I work a lot faster.

Better still is to have something ELSE that I should really be doing. You should SEE the days I spend blowing off paid work to Sprint on a side project.

But it's more than just blowing off one project for another. The real advantage of having more than one project going at a time is that if I get blocked for any reason, I can switch over to another project and continue Sprinting at the same pace. If I'm motivated to move fast, it doesn't really matter all that much what I'm moving on, provided I'm moving. If I can ignore the little bottlenecks and keep Sprinting until the inspiration fades, I can get a lot more done overall. If I only had a single thing to work on, any minor distraction, such as missing graphics from a screen designer, could derail me and send me off to check Reddit (and thus get stuck there for six hours.)

I don't think that any of this is new. It's common knowledge that developers tend to work in bursts. I guess the difference for me is that I'm starting to work on ways to facilitate those bursts. To keep them going once they get started. To finally look up and find it's dark out, and I haven't eaten for 16 hours and wow, did I really get all that done in a single day??? That's where I want to be. That's Sprinting.


Saturday, July 07, 2007

Free Online Flash Card Software

I've been taking Spanish lessons these last couple weeks. I'm getting to the point where I can usually communicate my thoughts, but my vocabulary is seriously lacking. No sweat, I thought, I'll just hop online and find a little Flash Card site. There's got to be half a dozen websites where you can make flash cards and test yourself with them, right?

Uh... Hang on. They all suck! And they're trying to sell me something. This is lame.

Fortunately for me, I write software for a living. Cut to six hours later, and we now have flash-card.org, a site where you can actually make some flash cards and test yourself with them. It will even pay attention to your answers, and stop showing you cards that you always get right after a while.
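The "stop showing you cards that you always get right" part is simple enough to sketch. This isn't the site's actual code. It assumes one plausible rule, which is to retire a card after a fixed number of correct answers in a row, and createDeck is a name invented for the example.

```javascript
// Tracks a streak of correct answers per card and drops a card from
// rotation once its streak reaches retireAfter.
function createDeck(cards, retireAfter) {
  var streaks = Object.create(null);
  cards.forEach(function (card) {
    streaks[card] = 0;
  });
  return {
    // A right answer extends the streak; a wrong one resets it to zero.
    recordAnswer: function (card, wasCorrect) {
      if (!(card in streaks)) {
        return;
      }
      streaks[card] = wasCorrect ? streaks[card] + 1 : 0;
    },
    // The cards still worth showing.
    activeCards: function () {
      return cards.filter(function (card) {
        return streaks[card] < retireAfter;
      });
    }
  };
}
```

A single wrong answer wipes the streak, so a card only retires once you've gotten it right several times in a row.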


I'm going to keep taking more classes, so I bet the site will get some more attention soon, as I become frustrated by things it can't do. And, since I can never leave things alone, it will probably get a few more useless bells and whistles tacked on. Check it out if you're a student or otherwise need to memorize stuff. Let me know what you think!

www.flash-card.org - Online Flash Cards


Wednesday, April 18, 2007

Twiddla - 1000 Signups on Day One!

Twiddla has been getting enough attention this last week that I moved it out to its own blog. Check out this recap of day one at Twiddla.com.

Putting stuff up on Reddit seems to be a good plan. Twiddla got another 1000 signups this morning. Most of it was traffic flowing through that article. Damn. I wish I'd spent some time getting it ready to show off!


Sunday, April 08, 2007

Zero to Dogfood in one day

If you've been around software for any length of time, you've probably heard the term "Eating your own Dogfood." Other people have given better definitions of this than I can, but basically it means using your own application in house.

So if your company is developing a little web-based word processor that it hopes will get bought out by Google, you would be well served to force your management and marketing teams to use that little word processor in lieu of Microsoft Word. The idea being that you'll quickly discover about 100 new top-priority bugs in your thing that are stopping the CEO from being able to write a simple letter to his lawyer.

Now once you start thinking about your new thing in terms of Dogfood, you are immediately given a new goal for development: "We've got to get this thing to Dogfood." Meaning, our stupid new mail client has been in development for 3 years now, so why are we still using Outlook internally?!??

The Idea

Working with a distributed team is hard. I hate to say that, since it's sort of our thing here at Expat Software, but it is true to an extent. We have a design team up in Portland doing mockups for the new Rootdown look and feel. Down in LA, we would look at the designs that came in, mark them up and send them back to Portland, sometimes calling the designers on the phone, sometimes getting in touch via chat. It was taking forever. Just explaining the concept of "This button isn't necessary, and could you move the logo down to here" would take a couple days to get across.

Over dinner one night, we were griping about this process, and somebody suggested WebEx as a solution. "Yeah, but WebEx sucks." "And it's expensive." "And it sucks." And yeah, all the real-time collaboration software out there really does suck. It's all got too many hoops to jump through to get up and working, and it's all too bloated with stuff you don't really need. All we want to do is draw lines on a web page. Why should that be hard?

And that got us thinking. Why should it be hard? What would you need to do something like that in a Browser? Not much, really. All the technology is there. Hell, we've done most of what you'd need before. Like, back in 1998! It's got to be easy to reproduce that today.

Thus, the seed was planted.

The Provocation

A few weeks back I wrote an article that touched on some of the effort we've put into our backend framework here at Expat. It got a lot of feedback, some of which asked how we could possibly be productive with such an expansive backend to maintain. This really took me by surprise, because our experience has shown that we're a lot faster now than we used to be before we had that infrastructure in place. In my mind, that framework is the reason that I was able to put up a site like Blogabond in a few months of my free time, while it has taken other companies in the same space over a year to put up a similar site with a dozen developers and a million dollars in venture capital.

So hey, if we're so fast and all that, and this little collaborative marker-upper is so easy, why can't we just build it? Like fast, even?

Yeah. How about we set aside a day to do a little proof of concept and see how far we get.

The Day

10am
Technical proof of concept
First off, there are a few fundamental questions that need answering. What do we absolutely need? Can we layer a DIV over an IFrame? In every browser? Can we put a transparent-background Canvas in that DIV and draw on it? Even using the IECanvas hack? Can we hook mouse events to it? Cool. We're in business.
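
For anybody wondering what "layer a DIV over an IFrame and draw on it" boils down to, here is a rough sketch. It is an illustration under assumptions, not Twiddla's code: the function names are invented, it positions the overlay from the IFrame's bounding box, and it simply gives up (returns null) when canvas support is missing rather than pulling in any IE workaround.

```javascript
// CSS for a DIV that sits exactly on top of the IFrame.
// `rect` is the IFrame's bounding box; scrollX/scrollY are the page scroll offsets.
function overlayStyle(rect, scrollX, scrollY) {
  return {
    position: 'absolute',
    left: (rect.left + scrollX) + 'px',
    top: (rect.top + scrollY) + 'px',
    width: rect.width + 'px',
    height: rect.height + 'px',
    zIndex: '1000'
  };
}

// Browser-only: build the DIV plus a canvas over the given IFrame and report
// mouse events as onMouse(type, x, y) in canvas coordinates.
// Returns the canvas, or null (instead of throwing) if something it needs is missing.
function attachOverlay(doc, iframe, onMouse) {
  if (!doc || !doc.body || !iframe || !iframe.getBoundingClientRect) { return null; }

  var rect = iframe.getBoundingClientRect();
  var win = doc.defaultView || {};
  var css = overlayStyle(rect, win.pageXOffset || 0, win.pageYOffset || 0);

  var div = doc.createElement('div');
  for (var key in css) { div.style[key] = css[key]; }

  // A canvas is transparent until you draw on it, so the page shows through.
  var canvas = doc.createElement('canvas');
  if (!canvas.getContext) { return null; } // no canvas support in this browser
  canvas.width = rect.width;
  canvas.height = rect.height;

  div.appendChild(canvas);
  doc.body.appendChild(div);

  var names = ['mousedown', 'mousemove', 'mouseup'];
  for (var i = 0; i < names.length; i++) {
    canvas.addEventListener(names[i], function (evt) {
      var box = canvas.getBoundingClientRect();
      onMouse(evt.type, evt.clientX - box.left, evt.clientY - box.top);
    });
  }
  return canvas;
}
```

In a page you would call something like attachOverlay(document, document.getElementById('myFrame'), handler), where 'myFrame' is whatever id your IFrame has, and handler receives the event type plus canvas-relative x and y.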

12pm
Silly Drawing App
Next up, we needed a quick and dirty little drawing application for marking up photos and web pages. One day we'll want to put some more effort into this piece, but for now all we needed to do was draw little scribbly lines on the canvas.
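
Scribbly lines really are about this simple. A minimal sketch, assuming something upstream hands you mouse events as (type, x, y) with canvas-relative coordinates; createScribble and its methods are hypothetical names, and the only canvas calls used are the standard beginPath, moveTo, lineTo and stroke.

```javascript
// Collects mouse events into strokes, where a stroke is an array of {x, y} points.
function createScribble() {
  var strokes = [];
  var current = null; // the stroke being drawn right now, or null when the button is up

  return {
    // Feed it mouse events; moves are ignored unless the button is down.
    handle: function (type, x, y) {
      if (type === 'mousedown') {
        current = [{ x: x, y: y }];
        strokes.push(current);
      } else if (type === 'mousemove' && current) {
        current.push({ x: x, y: y });
      } else if (type === 'mouseup') {
        current = null;
      }
    },

    strokes: function () { return strokes; },

    // Replay every stroke onto a canvas 2D context.
    draw: function (ctx) {
      for (var s = 0; s < strokes.length; s++) {
        var points = strokes[s];
        ctx.beginPath();
        ctx.moveTo(points[0].x, points[0].y);
        for (var p = 1; p < points.length; p++) {
          ctx.lineTo(points[p].x, points[p].y);
        }
        ctx.stroke();
      }
    }
  };
}
```

Keeping the strokes as plain arrays of points, rather than drawing straight to the canvas and forgetting them, is what makes it possible to ship them to somebody else's browser later.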

2pm
The Proxy
We needed a simple proxy of some sort to show web pages in that IFrame, to avoid annoying cross-domain scripting restrictions (the browser won't let our script touch a page served from somebody else's domain). It would need to mess with the HTML somehow to ensure that any clicks on those pages got redirected back to the proxy.

For the time being, we just grabbed an open source ASP.NET proxy tool and plugged it in. (This got swapped out about 2 days later for a home built version that worked a lot better for what we were doing.)
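
The "mess with the HTML" part can be sketched like this. It is deliberately naive and is not either proxy mentioned above: it only looks at double-quoted href attributes, and the /proxy?url= endpoint in the usage below is a made-up name.

```javascript
// Rewrite every link in `html` so that clicking it goes back through the proxy.
// `pageUrl` is the address the HTML was fetched from (used to resolve relative links).
// `proxyBase` is the proxy address that the encoded target URL gets appended to.
function rewriteLinks(html, pageUrl, proxyBase) {
  return html.replace(/href="([^"]*)"/gi, function (match, href) {
    var target;
    try {
      target = new URL(href, pageUrl); // resolves relative links against the page
    } catch (e) {
      return match; // unparseable link: leave it alone
    }
    if (target.protocol !== 'http:' && target.protocol !== 'https:') {
      return match; // mailto:, javascript:, and friends stay as they are
    }
    return 'href="' + proxyBase + encodeURIComponent(target.href) + '"';
  });
}
```

So rewriteLinks('<a href="/about">About</a>', 'http://www.google.com/', '/proxy?url=') gives back a link pointing at /proxy?url=http%3A%2F%2Fwww.google.com%2Fabout. A real proxy also has to deal with forms, scripts, single-quoted attributes and so on, which a regular expression like this one will not catch.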

3pm
First Cut at the Backend
This was still a proof of concept, so we just mocked up a few basic objects and stored them in static memory on the server. Throw in a few little web services that the client can call to talk to the backend, and we're off to the races. (This piece was blown away and rebuilt the next morning to use a real data layer, but it kept us on task and out of the mire until we had the rest of the thing working.)
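
A throwaway backend like that does not need to be much more than the following. This is a sketch in JavaScript rather than the real server code, and it assumes clients poll for whatever is new; postEvent and getEventsSince are invented names standing in for the web services.

```javascript
// Everything lives in this one object, so a server restart wipes it.
// Fine for a proof of concept, useless for anything you want to keep.
var sessions = Object.create(null); // session id -> array of events, oldest first

// "Web service" #1: a client reports something it did (a stroke, a chat line...).
// Returns the new length of that session's event log.
function postEvent(sessionId, event) {
  if (!sessions[sessionId]) { sessions[sessionId] = []; }
  sessions[sessionId].push(event);
  return sessions[sessionId].length;
}

// "Web service" #2: a client asks what has happened since it last checked.
// `cursor` is the number of events the client has already seen.
function getEventsSince(sessionId, cursor) {
  var log = sessions[sessionId] || [];
  return { events: log.slice(cursor), cursor: log.length };
}
```

Each client remembers the cursor it got back and sends it with the next request, so it only ever receives the events it has not drawn yet.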

5pm
Testing
First multi-user twiddle session. Basically, 3 guys in one room drawing words and pictures over Google's home page. I'm really glad we don't have screen captures of most of the things we were drawing.

6pm
Chat
Somebody asked for Chat, so we threw in a little ghetto chat window. Nothing fancy, but at least you could see what people were saying (but not who was saying it!)

6:30pm
Refactoring
Much polishing and refinement of the original concept. And we added a few more features like being able to choose what color you were drawing with.

7pm
Outside users
At this point, things were looking basically usable, so we invited a few friends from the outside to try it out. Lots of childish graffiti was drawn, and a few more major issues were uncovered.

8pm
Dogfood
Finally, with the last showstopper issues out of the way, it was time to get the design team in Portland on to the site. Somebody pulled up Rootdown.us in the main window and we all started drawing lines on it and suggesting things to nudge around.

Holy wow. We were using this thing to do real work!

The Analysis

So how did we pull it off? Simple. We cheated.

The nice thing about Dogfood is that it doesn't have to be a finished product. It just needs to be useful for the task you're trying to accomplish. Sure, it needs to be stable enough to get stuff done, and it can't go losing critical data. But mostly, it just has to limp along well enough that you can start using it to do real work.

Since we weren't trying to build the whole product all at once, we were able to cut a few corners to get that Dogfood version up as quickly as possible. You'll notice that we had to go back the next day and tape on a new back end, and that we had to throw out the crappy third party proxy we were using. Better still, in the version we used that night, you couldn't even log in or create new WhiteBoard sessions. We had a single session, and a hard-limit of 3 users. There was still loads of work to be done before we could let the general public see the thing.

Another thing we had going for us was a really clear vision of what we were trying to accomplish. That vision was small enough to fit inside a single brain, and compact to the point that we could throw a single programmer at it for a day to get it implemented. You get a huge speed advantage with a team size of one. I doubt we would have finished in a day had we had three guys working on it.

One Week Later

So here we are, a week later, with a big pile of bugs and feature requests in the hopper. All found through simply trying to use the application to get work done. We're on the thing every day reviewing designs with the guys in Portland, and every day I'll spend another couple hours tweaking the thing to be less annoying and more useful.

With all the positive feedback we've gotten from friends and family, we're starting to think about opening Twiddla up as a public Alpha. Maybe even turning it into a real product at some point.

Lucky for us, we have this little blog with its little readership of early-adopter types. I'd encourage anybody reading this to go to www.twiddla.com and put our little whiteboard app through the paces. Naturally, we'll want to hear honest feedback about what you like and dislike about the thing. And hey, it's only been alive for a week now, so you're not going to hurt our feelings by telling us that it sucks.

We know it sucks, and we have a good idea as to why. That's the power of eating your own dogfood. With luck, maybe you'll have ideas to make it suck less!


Friday, March 16, 2007

exampleCode != productionCode

Take a look at this little piece of code. It looks pretty innocuous, like it was taken straight from "Teach Yourself ASP.NET in 21 Days". Pull a list of Trips out of the database for a given user, and bind it to a select list. Nothing fancy. Teaches you a little bit about ADO.NET and databinding all in one place.

Figure 1. Book Sample

public class ExampleCode : System.Web.UI.Page
{
    protected HtmlSelect selTripID;

    private void Page_Load(object sender, System.EventArgs e)
    {
        if (!IsPostBack)
        {
            DataSet ds = new DataSet();
            SqlConnection connection = new SqlConnection(
                "server=OurProductionServer;database=Payroll;"
                + "UID=jimmy;PWD=j1mmy;");
            SqlDataAdapter adapter = new SqlDataAdapter(
                "select * from Trip where UserID=" 
                + Request["UserID"], connection);
            adapter.Fill(ds);

            selTripID.DataSource = ds;
            selTripID.DataTextField = "TripName";
            selTripID.DataValueField = "TripID";
            selTripID.DataBind();
        }
    }
}
Imagine my surprise, however, when I walked into a small software shop recently and found a whole project written with code like the above. What were these guys thinking? Are they seriously relying on this fragile, unmaintainable mess in a real software product?

And then it dawned on me. Maybe nobody had ever told them that the little examples in the book are just that: Little Examples. For teaching purposes. Never intended for use in the real world. Come to think of it, it doesn't even tell you that in the book. It ought to be in block capitals across the cover of the book:

WARNING: DO NOT PASTE THE SAMPLES FROM THIS BOOK DIRECTLY INTO PRODUCTION SOFTWARE!!!

Somehow, it seems that this message never got through to a substantial portion of the software industry. Every time I see a "Senior Developer" writing ad-hoc SQL or referencing a hashtable with a string I just want to cry.

So what do we do about it? I guess we try to get the message out. Here is some code I copied out of the Blogabond source that is functionally equivalent to the above:

Figure 2. Production Sample

public class ProductionCode : System.Web.UI.Page
{
    protected HtmlSelect selTripID;
    private int _userID;

    private void Page_Load(object sender, System.EventArgs e)
    {
        if (!IsPostBack)
        {
            _userID = StringConvert.ToInt32(
                Request[User.Columns.UserID], 0);
            if (_userID != 0)
            {
                PopulateTripList();
            }
            else
            {
                // bail gracefully...
            }
        }
    }

    private void PopulateTripList()
    {
        selTripID.DataSource = Trip.GetByUserID(_userID);
        selTripID.DataTextField = Trip.Columns.TripName;
        selTripID.DataValueField = Trip.Columns.TripID;
        selTripID.DataBind();
    }
}
Short and to the point. And obviously only the tip of the iceberg. This bit of code goes deep, but we can learn a few things just by looking at it:
  1. We're using a Data Layer of some sort. Somewhere in the back end lives a class that is wrapping a stored procedure for me. There's a ConnectionFactory back there someplace that knows to hand me a connection to the Production database because of a server setting.

    The code that's handing me the DataSet I'm binding to can be reused by any page in the project, so I know that little select statement is only living in a single place. In fact, everything that touches that table is sitting in a single class back there. So if something changes in the schema I won't have to go hunting through the client code to fix it.

    Best of all, because we've separated the database code out into its own place, we can drop all the boilerplate CRUD into CodeSmith templates and let it auto-generate itself from the database schema. In the case of Blogabond, I can flip a switch and watch as 50k lines of boilerplate C# and SQL in our backend get blown away and recreated in about 30 seconds. And since it's also generating all the boilerplate Unit Tests for all that code, we can be sure that the data layer and the database line up correctly. So if junior dev Jimmy comes by and unchecks a not-null constraint in Enterprise Manager, we'll see the continuous build fail on an integrity check a few minutes later.

  2. We're using enum-style constants instead of inline strings to reference our column names. This may not sound like a big deal, but it buys us a couple things. First, we've abstracted out the concept of the Typo. There is simply no way we can misspell "TripID", because the compiler will catch it for us. It will sit there underlined in red if we even try. We don't even have to remember what columns we have available, since Intellisense will tell us when we hit ctrl+space.

    Second, and bigger, is that we buy the freedom to monkey with our tables and never worry that we're causing runtime errors someplace. I can rename the column "TripName" to "BlogName" if I want, re-generate the database wrappers, and watch as the project fails to compile until I fix the references. That is some powerful stuff. And what happens if I forget to re-generate the wrappers for that table? The continuous build will fail in about 5 minutes, since the unit tests that check to make sure the wrappers match the schema will fail.

Where do we go from here?

Copy and paste code reuse is bad. Everybody knows that. Ad-hoc SQL is bad. Everybody knows that. Inline strings are bad. Everybody knows that.

At least that's what I thought. But you know what? They don't. And they should. And it's our job to tell them.


Copyright © 2008 Expat Software