This is a technical overview of the techniques you'll use to collect Metrics on your SaaS Trial Users so that you can leverage that information to stop your trial users from churning, which in turn will make your business a lot of money.
But be warned: There Is Math Here. If you'd prefer to step back a notch and learn what this whole Onboarding Analytics thing is all about, here's a much more entertaining overview I wrote: How I Quadrupled My SaaS Trial Conversions (with math).
OK. Just us nerds left? Cool. Let me break out the Entity Relationship Diagrams…
Step One: Collecting Data
First, a bit of background. What we're doing here boils down to collecting data about everything your users do during their Trial. Every interaction with your site, every lifecycle mail they open, even every time they didn't show up for an entire week. If it happens, collect it.
We'll then use that data to find patterns in what your successful, gonna-be-paying users do versus what your unsuccessful, churn-bound users are doing. And once we have those patterns, we'll be able to flag failing users so that we can guide them back onto the happy path.
So again, step one: we need to store everything they do. How about this for a schema:
Here, a Participant is one of your trial users. You could skip this table if you want to hook actions directly to Users, but this gives us a bit of flexibility and keeps our example self-contained. The Label table will hold the names of the Actions that our Participants can do, and allow us to add extra information, and to link off to any supplemental tables we'll build later. And, of course, Actions holds one record for each thing that happens, ever. ParticipantStatus is just a lookup table for "Trial", "Paid", "Expired" & "Cancelled".
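The schema diagram itself didn't survive, so here is a minimal sketch of those four tables as SQLite DDL run from Python. The table names follow the description above; every column name, and the choice of SQLite, is an assumption for illustration. Any relational store will do.

```python
import sqlite3

# Hypothetical DDL for the tables described above. Table names follow the text;
# every column name is an illustrative assumption.
SCHEMA = """
CREATE TABLE ParticipantStatus (
    ParticipantStatusID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE               -- 'Trial', 'Paid', 'Expired', 'Cancelled'
);
CREATE TABLE Participant (
    ParticipantID INTEGER PRIMARY KEY,
    UserID INTEGER,                         -- optional link to your own Users table
    ParticipantStatusID INTEGER NOT NULL REFERENCES ParticipantStatus,
    TrialStarted TEXT NOT NULL              -- ISO-8601 timestamp
);
CREATE TABLE Label (
    LabelID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE,              -- e.g. 'FilledOutProfile'
    Description TEXT                        -- room for extra information
);
CREATE TABLE Actions (
    ActionID INTEGER PRIMARY KEY,
    ParticipantID INTEGER NOT NULL REFERENCES Participant,
    LabelID INTEGER NOT NULL REFERENCES Label,
    OccurredAt TEXT NOT NULL                -- one row per thing that happens, ever
);
CREATE INDEX IX_Actions_Participant ON Actions (ParticipantID, OccurredAt);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the tables and seed the ParticipantStatus lookup."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ParticipantStatus (Name) VALUES (?)",
        [("Trial",), ("Paid",), ("Expired",), ("Cancelled",)],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

The composite index on (ParticipantID, OccurredAt) is there because "show me everything this participant did, in order" is the query you'll run most.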
Now we can build a little library in the backend of our app to quickly stash Actions for us. Ideally, we'll want to expose a single function to our developers that is so simple that they can't come up with an excuse not to use it:
It can really be that simple, as we'll know who our user is (since he's logged in), and we can probably ask somebody for the current time.
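A minimal sketch of such a function, assuming a Python backend with SQLite, and assuming your auth layer stores the logged-in participant's ID in a context variable. `current_participant` and `track` are hypothetical names (the latter mirrors the Track() calls mentioned later). Labels are created on first use, so nobody has to register them up front.

```python
import contextvars
import sqlite3
from datetime import datetime, timezone

# Hypothetical: your auth middleware sets this once per request, so callers
# never have to pass the participant around.
current_participant = contextvars.ContextVar("current_participant")

# Trimmed copy of the Label and Actions tables so this sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Label (LabelID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Actions (
    ActionID INTEGER PRIMARY KEY,
    ParticipantID INTEGER NOT NULL,
    LabelID INTEGER NOT NULL REFERENCES Label,
    OccurredAt TEXT NOT NULL
);
""")


def track(label_name: str) -> None:
    """Record that the logged-in participant just did `label_name`."""
    participant_id = current_participant.get()  # raises LookupError if nobody is logged in
    # Create the Label the first time it is seen, so nobody has to register it.
    conn.execute("INSERT OR IGNORE INTO Label (Name) VALUES (?)", (label_name,))
    (label_id,) = conn.execute(
        "SELECT LabelID FROM Label WHERE Name = ?", (label_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO Actions (ParticipantID, LabelID, OccurredAt) VALUES (?, ?, ?)",
        (participant_id, label_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


# Usage, somewhere inside a request for participant 42:
current_participant.set(42)
track("FilledOutProfile")
```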
Now all that's left is to sprinkle a few hundred of those calls throughout the codebase, every time anything even mildly interesting happens, and to expose a way for your marketing folk to manually add more Actions when they've interacted with your trialers in person. (Which, incidentally, is something that marketing folk actually do!)
Step Two: Finding Patterns
Here we go, straight into the Machine Learning stuff. Ready?
No, we're not ready. Sadly, it'll be months of boring data collection before we have enough for our Machine to Learn anything about. So unless you want to embarrass yourself with a Decision Forest that happily maps "User Logged In" to "Always Churns, 90% Confidence", we'll hold off on that for the moment. First, let's step back and just look at that data with our eyes.
Step 2A: Visualizing the data
Here are two of our Participants, cruising through their trials. Which would you say is more likely to convert?
It's amazing how much you'll learn about your users just by watching them. You'll find out that some people reset their password every single time they log in to your site. You'll find users who log in four times every day and check the same screen to see if anything has changed. And you'll find plenty of users who got stuck on something during the first 10 minutes and never came back.
Those are useful things to know, so it's worth building some simple reports right away to get that information to the folks who can use it. This is our Minimum Viable Product to justify collecting this data in the first place. Even though we know the Big Payoffs will come later on, it's surprising how big a win you can get just from this first step.
Here are a few report ideas to get you started:
Actions By User
Actions By Label
Trial Users (with action counts)
Paid Users (with action counts)
Expired Trials (with action counts)
Who Logged In This Week
Who Didn't Log In This Week
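Most of these reports are a single query each. Here is a sketch of three of them against a trimmed-down, hypothetical version of the schema: a plain-text Status column stands in for the ParticipantStatus lookup, and a Label named 'Login' is assumed to be tracked on every login. The sample rows are made up.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Trimmed, hypothetical schema: a text Status column stands in for the
# ParticipantStatus lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Participant (ParticipantID INTEGER PRIMARY KEY, Status TEXT NOT NULL);
CREATE TABLE Label (LabelID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Actions (ParticipantID INTEGER, LabelID INTEGER, OccurredAt TEXT);
""")

NOW = datetime(2015, 6, 15, tzinfo=timezone.utc)  # fixed "now" so the sample is repeatable


def ago(days: int) -> str:
    return (NOW - timedelta(days=days)).isoformat()


# Sample data: participant 2 last logged in ten days ago.
conn.executemany("INSERT INTO Participant VALUES (?, ?)", [(1, "Trial"), (2, "Trial"), (3, "Paid")])
conn.executemany("INSERT INTO Label VALUES (?, ?)", [(1, "Login"), (2, "ResetPassword")])
conn.executemany(
    "INSERT INTO Actions VALUES (?, ?, ?)",
    [(1, 1, ago(2)), (1, 2, ago(2)), (2, 1, ago(10)), (3, 1, ago(1))],
)

ACTIONS_BY_LABEL = """
    SELECT l.Name, COUNT(*) AS ActionCount
    FROM Actions a JOIN Label l ON l.LabelID = a.LabelID
    GROUP BY l.Name
    ORDER BY ActionCount DESC, l.Name
"""

# Covers "Trial Users", "Paid Users" and "Expired Trials": just change the status.
USERS_WITH_ACTION_COUNTS = """
    SELECT p.ParticipantID, COUNT(a.LabelID) AS ActionCount
    FROM Participant p LEFT JOIN Actions a ON a.ParticipantID = p.ParticipantID
    WHERE p.Status = ?
    GROUP BY p.ParticipantID
    ORDER BY p.ParticipantID
"""

# ISO-8601 strings with the same UTC offset sort chronologically, so >= works.
WHO_DIDNT_LOG_IN_SINCE = """
    SELECT p.ParticipantID
    FROM Participant p
    WHERE NOT EXISTS (
        SELECT 1 FROM Actions a JOIN Label l ON l.LabelID = a.LabelID
        WHERE a.ParticipantID = p.ParticipantID
          AND l.Name = 'Login'
          AND a.OccurredAt >= ?
    )
    ORDER BY p.ParticipantID
"""


def report(sql: str, *params):
    return conn.execute(sql, params).fetchall()


print(report(WHO_DIDNT_LOG_IN_SINCE, ago(7)))
```

"Who Logged In This Week" is the same query with EXISTS instead of NOT EXISTS.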
Step 2B: Statistics
Even though true ML is still a while off, we'll have trialers start converting (or expiring) right away, so we can start building statistics.
For a given user at the end of his trial, we have a list of things that he has done, and we know his outcome. That gives us enough information to determine which Labels tend to affect outcomes, and (to a lesser extent) by how much. We can at least state something along the lines of
"38% of participants with the Label 'FilledOutProfile' converted to paid, whereas only 8% of participants without that Label converted."
We can build little stories like that around all of our Labels, to see which are the most interesting. We can even combine them to generate something like a Score for a given Participant based on which Labels he has seen and which he hasn't.
So, assuming we can compile a list of Participants that have finished their trials and either paid (which IsGood) or expired or explicitly cancelled (which !IsGood), we can calculate a few properties for our Labels like this:
// requires: using System.Linq;
foreach (var label in Labels)
{
    label.InGoodCount = Participants.Count(p => p.IsGood && p.LabelIDs.Contains(label.LabelID));
    label.NotInGoodCount = Participants.Count(p => p.IsGood && !p.LabelIDs.Contains(label.LabelID));
    label.InBadCount = Participants.Count(p => !p.IsGood && p.LabelIDs.Contains(label.LabelID));
    label.NotInBadCount = Participants.Count(p => !p.IsGood && !p.LabelIDs.Contains(label.LabelID));
}
And then all we need are a couple of helper methods on our Labels:
public double PercentIfIn()
{
    var denominator = InGoodCount + InBadCount;
    if (denominator == 0)
        return 0; // no finished participant has this Label yet
    return (double)InGoodCount / denominator;
}

public double PercentIfNotIn()
{
    var denominator = NotInGoodCount + NotInBadCount;
    if (denominator == 0)
        return 0; // every finished participant has this Label
    return (double)NotInGoodCount / denominator;
}
... and we can generate the little snippet of text describing the performance of a given Label. While we're at it, we can keep track of the expected conversion rate:
var goodCount = Participants.Count(p => p.IsGood);
var badCount = Participants.Count(p => !p.IsGood);
var expected = (double)goodCount / (goodCount + badCount);
... and we can use each Label's deviation from that expected rate to assign an "Interestingness" value to each Label that we can sort by when presenting them or using them to "score" Participants:
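// for each Label...
// Fold each ratio so that "twice the baseline" and "half the baseline" score the same (>= 1.0)
var up = Math.Max(PercentIfIn() / expected, expected / PercentIfIn());
var down = Math.Max(PercentIfNotIn() / expected, expected / PercentIfNotIn());
Interestingness = Math.Abs(up + down);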
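The same calculation as a self-contained Python sketch, for readers not on .NET. It folds the four Count() passes into a single pass per Label and uses made-up data chosen to reproduce the 38% / 8% example from earlier; the class and function names are hypothetical. Python raises on division by zero where C# doubles quietly produce Infinity, so the sketch guards zero rates explicitly and treats them as maximally interesting, which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    is_good: bool          # paid, as opposed to expired or cancelled
    label_ids: frozenset   # every Label this participant triggered during the trial


@dataclass
class LabelStats:
    in_good: int = 0
    not_in_good: int = 0
    in_bad: int = 0
    not_in_bad: int = 0

    def percent_if_in(self) -> float:
        denominator = self.in_good + self.in_bad
        return self.in_good / denominator if denominator else 0.0

    def percent_if_not_in(self) -> float:
        denominator = self.not_in_good + self.not_in_bad
        return self.not_in_good / denominator if denominator else 0.0


def label_stats(participants, label_id) -> LabelStats:
    """One pass over finished trials, filling in the four cells of the 2x2 table."""
    stats = LabelStats()
    for p in participants:
        has_label = label_id in p.label_ids
        if p.is_good and has_label:
            stats.in_good += 1
        elif p.is_good:
            stats.not_in_good += 1
        elif has_label:
            stats.in_bad += 1
        else:
            stats.not_in_bad += 1
    return stats


def expected_rate(participants) -> float:
    """Baseline conversion rate across all finished trials."""
    return sum(p.is_good for p in participants) / len(participants)


def fold(rate: float, expected: float) -> float:
    """max(r/e, e/r): 1.0 means 'same as baseline'; 2x and 0.5x both give 2.0."""
    if rate == 0 or expected == 0:
        return float("inf")   # avoid ZeroDivisionError; assumed maximally interesting
    return max(rate / expected, expected / rate)


def interestingness(stats: LabelStats, expected: float) -> float:
    return fold(stats.percent_if_in(), expected) + fold(stats.percent_if_not_in(), expected)


def describe(name: str, stats: LabelStats) -> str:
    return (f"{stats.percent_if_in():.0%} of participants with the Label '{name}' "
            f"converted to paid, whereas only {stats.percent_if_not_in():.0%} "
            f"of participants without that Label converted.")


# Made-up data reproducing the 38% / 8% example.
FILLED_OUT_PROFILE = 1
participants = (
    [Participant(True, frozenset({FILLED_OUT_PROFILE}))] * 19
    + [Participant(False, frozenset({FILLED_OUT_PROFILE}))] * 31
    + [Participant(True, frozenset())] * 4
    + [Participant(False, frozenset())] * 46
)
stats = label_stats(participants, FILLED_OUT_PROFILE)
print(describe("FilledOutProfile", stats))
```

With a 23% baseline, that Label scores 0.38/0.23 + 0.23/0.08, or roughly 4.5, which would put it near the top of a sorted list.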
Step Three: Intervention
Now that we have all the collection and analysis ticking away, we'll have a bit of Calendar Time on our hands, waiting for enough Participants to stumble through their trials for our conversion statistics to be meaningful. That gives us some time to automate things.
We're going to want a nightly job of some description to come through and calculate all those statistics for our Labels and apply them to our Participants. We'll want to generate a report showing which of them are likely to convert and which of them are pretty much hopeless (and why).
It'd also be cool if that nightly job fired off some webhooks for certain conditions so that we could set up worker tasks to do things like send off rescue mails to problem users, or at the least notify that guy in Marketing so that he can do something drastic like actually call them on the phone or something. You can even build him something that plugs directly into his CRM and Calendar so that he'll have up-to-date information and action items ready to go each morning.
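The webhook half of that nightly job might look something like this sketch. The endpoint URL, the 0.2 threshold and the payload fields are all hypothetical. The job only builds requests for at-risk participants; actually sending them is a separate step, so you can log, batch or retry however suits your workers.

```python
import json
import urllib.request
from dataclasses import dataclass, field

# Hypothetical values: point the URL at your own worker and tune the threshold
# against your own conversion data.
WEBHOOK_URL = "https://example.com/hooks/trial-at-risk"
AT_RISK_THRESHOLD = 0.2


@dataclass
class ScoredParticipant:
    participant_id: int
    conversion_score: float                          # 0..1, e.g. from the Label statistics
    reasons: list = field(default_factory=list)      # interesting Labels they haven't hit


def build_webhooks(scored):
    """Return one POST request per at-risk participant, without sending anything."""
    hooks = []
    for p in scored:
        if p.conversion_score >= AT_RISK_THRESHOLD:
            continue
        body = json.dumps({
            "event": "trial.at_risk",
            "participant_id": p.participant_id,
            "score": p.conversion_score,
            "reasons": p.reasons,
        }).encode("utf-8")
        hooks.append(urllib.request.Request(
            WEBHOOK_URL,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return hooks


def send_all(hooks, timeout: float = 10.0) -> None:
    """Fire the requests; urlopen raises on HTTP errors, so wrap this in your own retry policy."""
    for request in hooks:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()


hooks = build_webhooks([
    ScoredParticipant(1, 0.05, ["FilledOutProfile", "AddedTeam"]),
    ScoredParticipant(2, 0.60),
])
```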
The important thing is that by this point you'll be the hero, having demonstrated that this whole Customer Lifecycle Metrics thing is real, and that it will in fact make a measurable-in-dollars difference to your business by steering problem trialers back onto the happy path.
It's also worth noting, and I'm almost embarrassed to mention it since we're both software folk and could easily build all this from scratch, but just putting it out there: This is all built already.
You can sign up for a Free Trial today over at Unwaffle.com, and have it all up and running 20 minutes from now. We have drop-in client libraries for every programming language known to man, so all you need to do is sprinkle those Track() calls around in your code then sit back and watch the data flow in.
But either way, build or buy, that's how it works. And you really should be doing it. It'll make you a lot of money.
Discuss on hacker news
Computer Programmers make me sad sometimes. We're so good at so many things, and so good at learning whatever we want, but there are certain things we simply refuse to get better at, to our own detriment, despite them not being all that hard.
Talking to girls and negotiating salary are two things that we're historically bad at. We're actually bad at lots of things, but these two are unique in that rather than finding ways to get better at them, we instead proclaim them to be Impossible Things For Computer Programmers To Learn and refuse to even attempt to get a little bit better at them.
This is really strange.
If you were bad at writing Objective-C code, but it was 2008 and knowing Objective-C would let you answer "Yes" to any of those dozen emails a week you were getting offering $150/hr to write Objective-C, what would you do? Would you perhaps spend a few hours learning how to do it?
Now what if you were bad at negotiating? Or at talking to girls?
"No way. Negotiating is evil. They should just pay everybody the same. That'd be fair."
"No way. Talking to girls seems hard and scary. And I'd actually have to walk over there and talk to one of them."
But here's the thing. It's only hard because you've never done it. Go do some of it and you'll find it's really not that bad.
Developers have a reason for all this, of course. They like things to be "fair". They like meritocracies. The idea that somebody off the street could just talk his way into making more money than somebody with more skill just comes off as wrong. An affront to the way the world should work. Immoral.
Salary Transparency is all the rage these days because it presses those fairness buttons in Engineers. Negotiation is unfair, and I'm no good at it anyway. But no need to bother with all that. Salaries are set. Everything is "Fair". Sign me up!
Transparency also sounds good from the employer perspective. You get to control the negotiation procedure (by completely eliminating it) and set salaries how you like. Take it or leave it.
Unions also resonate nicely with Software guys. Again, negotiation goes away and everybody is paid "fairly" according to some scale. Everybody wins.
But not everybody wins. In fact, both these ideas fall down because roughly 50% of the developer population loses in any arrangement where you set all salaries to the same level. That's how distributions work.
Imagine you are an engineer who's genuinely good at what he does and knows how to negotiate a rate that reflects that. Why would you leave, say, $100k on the table to go work for a place like Buffer, where even the CEO is only making $150k/year? Why would you join a Union where your bill rate was set based on the average Joe with the same number of years of experience? In either case, you're worth a lot more on the open market, so you'd opt out.
That also means that shops with Transparent Salaries may have a hard time attracting the best talent (since only a guy who knew he was in the bottom half would seek them out), but that's their problem not ours, and a bit off topic for today.
We're talking about fairness, and how negotiating is immoral because it places negotiating skill ahead of technical skill in determining how much you get paid, and how companies are ripping developers off by paying them less than they deserve just because they're bad at negotiation.
But I disagree.
- I don't agree that negotiating is "immoral", nor that rewarding people for being good at negotiation is "unfair", nor that you can get "ripped off" by receiving a salary that you agreed to.
- I think it's fine that people can negotiate with one another, and that being good at negotiating has advantages.
- I think the most productive thing to do in a world where people negotiate is to learn some basic negotiation skills so as to live in that world.
- I think the least productive thing you can do in that world is to try to make everybody stop negotiating for things. Doing so can only make things worse for you, since everybody else will continue negotiating after you stop.
Ah, but you counter:
Negotiation is stressful, a waste of time and not interesting at all for many software developers. It would just be better for the public good for no salary negotiations to take place, because people would worry less about being underpaid. Less stress is almost always good for work productivity. And there will be more justice.
Setting aside the disputable bit where the author refers to a process that can add millions of dollars to one's total career earnings as a "waste of time", have a quick re-read of that comment and notice how he uses terms like "public good" and "justice." We're back to capturing the in-built developer-brain need for things to be fair.
As developers, we tend to look for technical solutions to things, so when we see a situation like the one above, we come up with ideas to "fix" it and make everything fair. Transparent Salaries, no-negotiation workplaces, a precise formula translating skill to compensation. Keep building technical solutions until we've coded the problem out of existence. That's how things are done in our world.
But a more productive approach when confronted with the fact that "negotiation skill seems to have more bearing on how much I can make than technical skill" might be to just take advantage of that fact. If tech folk are as generally bad at negotiating as everybody seems to agree, the best course would seem to be to simply get better at negotiating.
That's actually quite easy to do. And if you do so, you'll make a lot more money.
That's a good thing.
I spend way too much time on Hacker News. It's a fun place, and a good way to keep up to date on all the new tech that us developer folk seem to need to know about. But it also leaves a fella feeling like he really needs to keep up with all this stuff. I mean, if you don't have a side project using the latest client-side framework, well, good luck ever finding a job again in this industry.
Thinking about it, I find that I straddle the line on this. As a long-time contractor, I try to stay up to date on the New Shiny and will happily run with whatever flavor of the month language, framework, and programming paradigm that a given gig wants. Yeah, sure, Node.js with tons of functional stuff mixed in pulling from a NoSQL store and React on the front end. I'm your guy. We're gonna change the world!
But for my own products, there's no way I'd use any of that crap. Good old C#, SQL Server and a proper boring stack and tool set that I know won't just up and fall over on a Saturday morning and leave me debugging NPM dependencies all weekend instead of bouldering in the forest with the kids. This stuff is my proper income stream, and the most important thing is that it works. If that means I have to write a "for" loop and declare variables and risk 19-year-old kids snooting down at my code, so be it.
I can't tell you how nice it is to have software in production on a boring stack. It gives you freedom to do other things.
I can (and often do) go entire months without touching the codebase of my main rent-paying products. It means I can, among other things, pick up a full-time development gig to sock away some extra runway, take off and go backpacking around the world, or better still, build yet another rent-paying product without having to spend a significant amount of time keeping the old stuff alive.
It seems like on a lot of stacks, keeping the server alive, patched and serving webpages is a part-time job in itself. In my world, that's Windows Update's job. Big New Releases come and go, but they're all 100% backwards compatible, so when you get around to upgrading it's just a few minutes of point and clicking with nothing broken.
I see it as analogous to Compound Interest, but for productivity. The less effort you need to spend on maintenance, the faster you can keep moving forward.
But yeah, the key is to never get so far down into that comfy hole that you can't hop back into the present day when it's time to talk shop with the cool kids. Shine on, flavor of the week!
I run a little single-player software empire, and right now the product that’s paying my bills is S3stat. It’s a SaaS product that produces human-readable reports from the basic logfiles that Amazon generates for its Cloudfront and S3 services. It’s kinda like Google Analytics, but for your Cloud stuff. You should totally sign up.
But anyway, the way it works is that you sign up for a Trial account with us and use a little installable tool we provide that lists out your S3 Buckets and Cloudfront Endpoints and walks you through setting up logging for the ones you’re interested in. Then it tells us where to find those logs so we can produce reports. Or at least that’s the way it originally worked.
Over the last few years though, more and more new users started coming on board with logging already running on their stuff. The tool will detect this, of course, and not screw everything up by changing your settings. It’ll just note the log location and send it along as per usual.
Every once in a while I’d get an email from somebody with a few months of old logfiles sitting around asking if we (because we’re a company and therefore a “we” even though it’s just me) could generate reports for them. Sure, no worries, we’d say. And I’d kick off a job to run those old reports. Happy new customer.
But I’m a nerd. One of those lazy ones who likes to automate things, and even though this only took a few minutes out of my day, I’d much prefer to keep those minutes for things like blowing off work for the day to go rock climbing because it’s sunny and I can do that because I run my own company. The less time I have to spend dealing with these customers, the better. So I built a little sniffer that would detect pre-existing logfiles and gave the users an option of hitting the “go” button on the report runner themselves.
But I’m also a capitalist. So I didn’t push that out just yet.
Instead, I changed it to automatically run just one month’s worth of logs, so that new users could get a nice big taste of what they could expect from the service. Then I offered the option to purchase additional months of service so that they could process any older logs. And I priced those additional months at the full cost of normal S3stat service. All you’re doing is moving back the start date for your subscription.
And just as it has surprised me every other time I’ve asked my customers for more money for things, people actually started buying those extra months of service.
But it gets better.
Over time, I’ve noticed that the average purchase for extra service seems to be climbing. People are showing up with more and more logs sitting around that they want to have processed. And why has this started happening? Because of this:
That’s what you see when you go to create a new S3 Bucket in the AWS Console these days. Notice how the Next Step after naming the thing is to Set Up Logging. Gee, that sounds important, if Amazon is telling me to do it. And sure enough, most people seem to do so when creating new Buckets. Cloudfront does something similar when creating a new Distribution, where it throws Logging right in your face before you can move forward.
So I’m sure you’ve made the connection back to my service, but I’ll go ahead and spell it out. Amazon’s new default way of setting up S3 and Cloudfront prompts you to start logging immediately. That means that when and if you do eventually decide to try S3stat, you’ll show up with a bucket full o’ pre-existing logfiles for us to process. And that means that you are pretty likely to decide to move your start date with us all the way back to the day you set up your stuff in the first place, giving us an extra several months’ worth of lifetime value from you right off the bat.
Big companies have a way of accidentally stepping on little companies as they move around sometimes, crushing a thriving business with a minor change to their Terms of Service or a decision to stop publishing a certain data feed. But every once in a while, one of those seemingly random little steps manages to squish up the dirt under one of us little guys, leaving us better off than we were before.
It’s nice to be on the receiving end of that for a change.
I kinda stumbled into this idea a few months back. I had added a feature on S3stat where Users can create extra logins for other members of their team. So instead of printing out reports and sending them to their boss, they can simply add Mr. Boss Guy to their team and he can check his own reports. Less work for the account user.
But it turns out that it was a better idea than I'd thought. Because if Joe User let his trial lapse, he'd get an email the next week from Mr. Boss Guy asking where the hell this week's reports were and why couldn't he log in to S3stat anymore? Joe would explain that gee, that thing costs like ten dollars a month and I don't have a budget for software. And Mr. B would yell at him for 24 seconds then explain about how he had just this minute wasted $10 of his own time yelling at somebody because $10 is not a lot of money for a business and go subscribe right now. Here's the company card.
Well that was cool. I wonder how many times that has happened?
So I wrote a big ugly SQL query to count how many Users had signed up for trials since that feature went live, how many of them converted to paid, and to see whether Trial Users who added at least one team member converted at a higher rate than users who didn't.
And they did. By a huge margin:
Well wow. I wonder what other features I have (or could build) that can make that sort of difference. And I wonder what sort of tools I could build to get a better handle on this sort of thing. Because this is not the sort of thing I want to have to find out by accident ever again.
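The big ugly SQL query itself wasn't published, so here is a cleaned-up sketch of that kind of comparison, run through Python's sqlite3. The Users and TeamMember tables, the launch date and the sample rows are all hypothetical (the numbers say nothing about S3stat's real results). It only counts finished trials, since users still mid-trial haven't had a chance to convert yet.

```python
import sqlite3

# Hypothetical tables and launch date; the real query was never published.
TEAM_FEATURE_LAUNCHED = "2015-01-01"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, TrialStarted TEXT, Status TEXT);
CREATE TABLE TeamMember (OwnerUserID INTEGER, Email TEXT);
""")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", [
    (1, "2015-02-01", "Paid"), (2, "2015-02-03", "Paid"), (3, "2015-02-05", "Expired"),
    (4, "2015-02-07", "Paid"), (5, "2015-02-09", "Expired"), (6, "2015-02-11", "Expired"),
    (7, "2015-02-13", "Cancelled"),
    (8, "2015-03-01", "Trial"),     # still in trial: excluded
    (9, "2014-12-01", "Paid"),      # before the feature existed: excluded
])
conn.executemany("INSERT INTO TeamMember VALUES (?, ?)", [
    (1, "boss@example.com"), (1, "cfo@example.com"),
    (2, "boss@example.com"), (3, "boss@example.com"),
])

CONVERSION_BY_COHORT = """
    SELECT
        CASE WHEN EXISTS (SELECT 1 FROM TeamMember t WHERE t.OwnerUserID = u.UserID)
             THEN 'AddedTeam' ELSE 'NoTeam' END AS Cohort,
        COUNT(*) AS FinishedTrials,
        SUM(u.Status = 'Paid') AS Converted
    FROM Users u
    WHERE u.TrialStarted >= ?
      AND u.Status IN ('Paid', 'Expired', 'Cancelled')   -- finished trials only
    GROUP BY Cohort
    ORDER BY Cohort
"""

rows = conn.execute(CONVERSION_BY_COHORT, (TEAM_FEATURE_LAUNCHED,)).fetchall()
for cohort, finished, converted in rows:
    print(f"{cohort}: {converted}/{finished} converted ({converted / finished:.0%})")
```

Using EXISTS rather than a join keeps a user with three team members from being counted three times.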
Here's an equation:
(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
   (unless (< (+ g b) 5)
     (max .01
          (min .99 (float (/ (min 1 (/ b nbad))
                             (+ (min 1 (/ g ngood))
                                (min 1 (/ b nbad)))))))))
If it looks familiar, that's not a surprise, since I lifted it verbatim from an article Paul Graham wrote way back in 2002 called A Plan For Spam. In it, he popularized the Bayesian Spam Filter, which goes a long way toward explaining why email is still usable today even though a huge share of what heads toward your Gmail inbox is in fact spam.
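For readers who don't speak Lisp, here is a Python translation of that per-word formula as it appears in Paul Graham's essay, including its .01 floor. `good` and `bad` map each token to how many times it appeared in the good and bad corpora, and `ngood`/`nbad` are the number of messages in each. The Python names and the sample counts are illustrative; mapped onto trial users, "bad" means churned.

```python
def token_probability(token, good, bad, ngood, nbad):
    """Per-token "bad" probability, translated from the Lisp in "A Plan for Spam".

    good/bad: how many times each token occurred in the good/bad corpus.
    ngood/nbad: how many good/bad messages (or, here, trial users) there are.
    Returns None for tokens seen too rarely to judge, as the Lisp returns nil.
    """
    g = 2 * good.get(token, 0)   # good occurrences count double to bias against false positives
    b = bad.get(token, 0)
    if g + b < 5:
        return None
    bad_freq = min(1.0, b / nbad)
    good_freq = min(1.0, g / ngood)
    return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))


# Made-up counts: 40 converted users, 160 churned users.
good = {"AddedTeam": 30}
bad = {"AddedTeam": 5, "NoLoginLast7Days": 80}
print(token_probability("AddedTeam", good, bad, ngood=40, nbad=160))
```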
At its heart, the algorithm is just a way of classifying piles of information into Good and Bad groups. For email, you look at the words in a message and determine whether an email with those words is likely to be Spam (which is Bad) or not (which is Good). There's nothing about the algorithm specific to the problem of Email. It just happens to be good at classifying stuff, and email benefits greatly from being classified.
But Users (of your SaaS product, remember) can be classified too. Over the course of a trial, they do lots of things and give off all sorts of signals that you can collect and analyze. You can feed those signals into a Bayesian Classifier to train it about what Good (subscription activating) users tend to do, and what Bad (trial expiring, canceling) users tend to do. And you can feed the signals from a new Trial User into that trained Classifier to produce a guess about what he might do in the future.
So in the case above, where we're looking at Users who have invited Team Members vs. those who haven't, we can now just define an "AddedTeam" Signal that we track for that user and the Classifier will take care of determining whether that is in fact statistically more likely to indicate that our User will convert to a paid subscriber.
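The essay also describes how to combine the individual probabilities: keep the fifteen most interesting ones (those farthest from a neutral 0.5) and apply Bayes' rule assuming they are independent. A sketch of that step applied to one trial user's signals, where each input is a per-signal churn probability. The function name is hypothetical, and returning 0.5 when there is no evidence is an assumption, not something from the essay.

```python
from math import prod


def combined_churn_probability(signal_probs, top_n=15):
    """Combine per-signal churn probabilities the way "A Plan for Spam" combines
    per-word spam probabilities, assuming the signals are independent.

    signal_probs: one probability per signal this trial user has produced.
    """
    # Keep the signals that say the most: those farthest from a neutral 0.5.
    interesting = sorted(signal_probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    if not interesting:
        return 0.5   # no evidence either way (an assumption, not from the essay)
    churn = prod(interesting)
    stay = prod(1 - p for p in interesting)
    return churn / (churn + stay)


# A user who added a team (low churn signal) but hasn't logged in for a week (high).
print(combined_churn_probability([0.03, 0.99]))
```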
And that's good. Like measurable-in-dollars good.
I've been experimenting with other things you can track, such as aggregates like "NoLoginLast7Days" and rollups like "10LoginsLast30Days" that might give more for the classifier to grab on to. And, of course, just watching the data as it rolls past is enlightening too. You learn a lot when you start collecting usage data (in human readable form), like just how many people are forgetting their password every time they log in, and that there are customers who do in fact check in several times a day as part of their workflow. Even without the crazy Bayesian math, you can get a good feel for which people are planning to buy the thing when their trial runs out.
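Rollups like those are easy to derive from raw login timestamps during the nightly job. A sketch, where the signal names come from the text but the thresholds (no logins at all in the trailing seven days, at least ten in the trailing thirty) are assumptions:

```python
from datetime import datetime, timedelta


def rollup_signals(logins, now):
    """Derive coarse signals from raw login timestamps.

    Signal names follow the text; the exact thresholds are assumptions.
    """
    last_7 = [t for t in logins if now - timedelta(days=7) <= t <= now]
    last_30 = [t for t in logins if now - timedelta(days=30) <= t <= now]
    signals = set()
    if not last_7:
        signals.add("NoLoginLast7Days")
    if len(last_30) >= 10:
        signals.add("10LoginsLast30Days")
    return signals


now = datetime(2015, 6, 30)
# Twelve logins, all between 8 and 19 days ago.
busy_then_gone = [now - timedelta(days=d) for d in range(8, 20)]
print(sorted(rollup_signals(busy_then_gone, now)))
```

Each resulting signal can then be stored as an ordinary Label, so the classifier treats it exactly like "AddedTeam".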
I'm going to write more on this as I build it out, but I guess this is a good time for a heads up that I'm hoping to build this into a product at some point. It's live, in its infancy, at unwaffle.com. If this stuff sounds like something you'd use for your own SaaS, hit me up with an email or go sign up for an invite.
There's something rather romantic about the notion of living on a tropical beach and working away on your laptop. And you know what? It's one of those few instances where the real version is actually just as awesome as the romantic ideal.
In fact, you need to quit thinking about it as a romantic notion. It's something you can do. And it's high time you did it.
Have you noticed how much easier it is to run a software company than pretty much any other type of business? You don’t need office space, a retail storefront, telephones, employees, or even an address. Other businesses need all those things.
We get to run the whole thing on a little server stuffed into a data center someplace, probably someplace we’ve never even visited, for all we know, because the only indication we have is that the machine has “us-east” in its name. Is there any reason that we need to be in that half of the US as well?
In practice, we can just as easily be in “Thailand-south” and nobody would ever know the difference.
Would anybody know the difference?
Real Companies have phone lines with people who answer them. We have Google Voice patched into Twilio patched into some funky 3rd world carrier who, incidentally, does a much better job of ensuring that your phone will ring on the beach than AT&T ever did drilling through the walls of my apartment in Portland.
Real Companies have mailing addresses and Business Bank Accounts. So do we. We set that bank account up before we left, using the business address, which by coincidence happens to be the same as that of our parents’ house or, if we were getting all fancy, an office-in-a-box in Delaware or Nevada.
Am I allowed to work there?
No, not really.
But they’re not checking. Places like Southeast Asia are chock full of expats living there, doing silly things like “visa runs” to the next country and back every couple months to remain a tourist for years on end.
Even in Europe, where you wouldn’t be able to get a self-employment working visa (if such a thing existed) with less than six months’ effort and a good lawyer, you’ll find that they’re really a lot more interested in keeping the various people coming in from the South from doing so than they are in messing with you. Given the level of effort required to find somebody at the Santander ferry port to even stamp your passport, it’s unlikely that anybody is conducting a multi-month surveillance of your AirBnB “office” up in the hills above the Côte d’Azur.
What about taxes?
Here, you’re in luck. The IRS, being awesome, actually encourages you to piss off to parts unknown with its Foreign Earned Income Exclusion. Basically, if you can prove you’re a Bona Fide Resident of another Country (which you can’t) or that you’ve been Physically Present outside the US for at least 330 full days during a 12-month period (which you can), then they’ll let you exclude nearly $100,000 of foreign earned income from your taxable income.
Naturally, it’s never as good as it seems, but even after they’ve clawed back your self-employment taxes and a few other things, you still get to write off a nice chunk of change from your taxes. Certainly enough to pay for your room & board in a place like Southeast Asia.
Can you really work from the beach?
Of course. It’s the Future. The Internet is everywhere. Step One is to find the most pleasant, most remote, cheapest, most Swedish girl having beach available (preferably with good rock climbing and/or surfing). Step Two is to find a thatch-roofed bar on said beach with cheap beer and a good view of the sunset. And we’re done. They’ll have wifi.
Now we start a radial search outward until we find a pleasant little bungalow with monthly rates. $400/month will buy you a lot of developing-world-luxury in this day and age. That’s even less than it would cost for a 1BR apartment in San Francisco, I’m told, and it comes with Utilities paid.
How long until I run out of money?
An alternate plan where you get to keep your day job...
You don't even need to go out on a limb to pull this off these days.
Plenty of companies offer remote work as an option, and few will actually go so far as to specify just how remote you can get. Nicaragua is on Dallas time, and yes, they have internet there. They also have nights and weekends, so if you think you can bootstrap your idea up here, there's no reason to expect you won't be able to down there.
The unexpected reality is that living out of a backpack in most of the sunnier bits of the world is cheaper by far than living in a city in the US. Even on a couch. With roommates.
You’ll spend your time building a business. A real one that charges people money in exchange for stuff and hits profitability fast. You’ll fight for a while and start bringing in a few hundred bucks a month from paying customers. Beach life will be paid for at this point and now the goal shifts to building the revenues and getting to “I can live on this money”, then “day job replacing money” and hopefully one day “I can retire off this money”.
And you can do it all from that beach.
I’ve actually done all of this. Chances are I’m not the first person you’ve met who has. It’s not in any way difficult except in that it represents a Big Change.
Figure out how to convince yourself that it’s going to work. Then save up $10k in case it doesn’t. Then book that flight.
Can’t ask fairer than that. Good luck!
You should probably be working remotely.
Seriously. Take a quick look around you right now. Do you appear to be sitting in a felt cube? Does the word "software" appear in your job description? Cool. You're in a position to make your life a whole lot better.
I've been working 100% remotely for the better part of ten years now.
The short answer for why? Because I can.
That's really the single greatest feature of being a developer today. You can do your thing from pretty much anywhere in the world with no reduction in throughput.
I can (and have) set up shop for the winter on some remote Central American surf break. I can (and have) moved my main residence to a small village in the French countryside where the quality of life is good and there's enough bouldering to last me a lifetime of afternoons off. I can (and have) simply packed my whole development world onto a 12" Thinkpad and headed off on the road for an entire year.
And all those places have wifi. And I can work there. So I do.
So even if I found a company that did happen to have an office right next to that perfect left reef pass off the coast of Sumatra, I probably still wouldn't want to commit myself to working there full time. I already have an office there. As well as everywhere else I'd like to be.
It didn't use to be like this. And it still isn't for most professions. But it absolutely is for software. As a developer, I think you'd be crazy to pass it up.
So yeah, that's why.
And here's the thing, in case you missed it above. You can do this too.
The industry is waking up to the fact that remote working works. There are tons of companies hiring remote workers right now. Enough so that it doesn't even make sense for me to list any of them here.
So yeah, get on it today. Find a way out of that cubicle, and at the very least onto your kitchen table. You can sort out the whole laptop & beach thing later. But the first step is to acknowledge that we're living in the future, and start doing so.
Common knowledge: You need to do A/B testing on your site.
Why? Because it will make you more money.
Cool, but why? Because if your business sells things on the web or otherwise makes you money as a result of people doing stuff on your website, you want to maximize the percentage of people doing stuff. That's where A/B testing comes in.
What is it?
Basically, if you test one version of your website against another version, you can measure which one better compels users to do something.
So, for example, you might try testing your normal "Buy Now" button against a double-sized, bright red shiny button. You'd show one version to half your visitors, the other version to the other half. Pay attention to who saw which version, and whether they actually bought your thing.
After a week or so of collecting data, you can see that, for example, 4.8% of visitors who saw your old button clicked it, whereas 6.6% of those who saw the big red one clicked it. That's valuable. As in, measurable-in-dollars valuable, so you need to be doing it.
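Before acting on a gap like 4.8% versus 6.6%, it's worth checking that it isn't just noise, because the answer depends on how many visitors each version saw. The post doesn't give visitor counts, so the numbers below are hypothetical. This is a minimal sketch of a standard pooled two-proportion z-test using only the Python standard library; the function name is illustrative, not taken from any particular A/B testing library.

```python
import math

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Pooled two-proportion z-test comparing variant B against variant A.

    Returns (z, two_sided_p_value). A positive z means B converted better.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Under the null hypothesis both versions share one conversion rate
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("need at least one conversion and one non-conversion")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical traffic: 4.8% vs 6.6% at two different sample sizes
print(two_proportion_z_test(48, 1000, 66, 1000))    # 1,000 visitors per version
print(two_proportion_z_test(240, 5000, 330, 5000))  # 5,000 visitors per version
```

With 1,000 visitors per version, the same 4.8% vs 6.6% split gives z of roughly 1.74 (p around 0.08), which falls short of the usual 0.05 cutoff. With 5,000 per version it gives z of roughly 3.88 (p well below 0.001). Same percentages, very different confidence.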
How to do it
It's not too hard to code something up yourself, but there are some good libraries out there that you can simply drop in. I'm writing this post to tell you about the one I wrote for ASP.NET and ASP.NET MVC:
FairlyCertain - A/B Split Testing for ASP.NET
I've been using this for the past 6 months for S3stat, with some pretty impressive results. Now that it's good and stable, I finally motivated myself to package it up and release it as Open Source.
It's essentially a simplified version of Rails' A/Bingo, with one major departure: participation data is stored in the user's browser rather than in a local database. That helps it scale out better and means that you can pretty much just drop the code into your project and have it start working without having to configure anything.
Check it out and let me know if you find it useful.
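The post doesn't show FairlyCertain's actual API, so what follows is only a hypothetical Python sketch of the underlying idea: pick a version at random on a visitor's first request and remember the choice in a cookie, so no server-side database is needed. The function name `choose_variant`, the `ab_` cookie prefix, and the dict-of-cookies interface are all illustrative assumptions.

```python
import random

def choose_variant(test_name, options, request_cookies, rng=random):
    """Return (variant, cookies_to_set) for one visitor.

    request_cookies: dict of cookie name -> value sent by the browser.
    cookies_to_set: dict the caller should write into the response
    (empty when the visitor already carries a valid assignment).
    """
    cookie_name = f"ab_{test_name}"  # hypothetical naming scheme
    existing = request_cookies.get(cookie_name)
    if existing in options:
        # Returning visitor: keep showing the same version
        return existing, {}
    # First visit (or an unrecognized value): pick uniformly at random
    variant = rng.choice(options)
    return variant, {cookie_name: variant}

# First request from a new visitor: no cookies yet
variant, new_cookies = choose_variant("buy_button", ["normal", "big_red"], {})
```

The trade-off of keeping assignments in the browser is that a visitor who clears cookies or switches devices gets re-assigned, which adds a little noise to the counts in exchange for needing no setup at all.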
is our latest project here at Expat. It's a website that connects Spanish teachers in South America with students in the US and lets them hold live Spanish classes online.
We'll be starting Beta classes soon, so if you want to score some free Spanish lessons, you might want to go sign up for the waiting list.
We're happy to announce that S3stat now offers support for CloudFront Streaming distributions. We've offered S3 and CloudFront Analytics for quite a while, so it was an easy decision to extend the service to include Streaming.
Basically, we'll handle all the setup and configuration needed to get Logging enabled on your CloudFront distribution, and each night we'll download and process those logfiles, and deliver reports back to your S3 bucket.
Web Stats for Cloudfront & Amazon S3
This feature has been out of Beta for about a week, so go ahead and give it a try when you get a chance. I'd love to hear your feedback.
I remember the day I got my first Spam post at Blogabond, back in 2005. It was actually kind of flattering, since the site had only been live for a few months. I deleted it by hand and moved on.
Things have progressed substantially since then. Automated Spam Bots gave way to armies of cheap workers posting by hand, and now we've reached a point where roughly 90% of new blog entries on the site are attempted spam. The sheer volume of incoming posts is enough to sneak some of them past our Bayesian Filtering, so we're lucky to have some extra measures in place to make sure that the general public never sees any spam on Blogabond.
I've learned a lot about Blog Spam over the years, so I thought I'd share some advice for anybody building their own user-generated-content site. Presuming, of course, that you don't want to be overrun with spam.
Never throw spam away
It's valuable. You need tons of spam to train your Bayesian filters, and you need to use real spam from your own site to get the filtering results you want. Our filters, for example, can differentiate between a post written by a backpacker traveling through Guatemala and a resort offering package vacations there.
Mark posts as spam and ensure that nobody can see them, but keep them around. They're handy!
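Blogabond's actual filter isn't described in the post. As a rough illustration of why a corpus of your own real spam matters, here is a minimal multinomial naive Bayes classifier in Python with add-one smoothing. The class name, the crude tokenizer, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labeled post ("spam" or "ham") to the corpus."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Estimated P(spam | text) under the naive independence assumption."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                if word in vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long posts
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)

# Every post a moderator flags becomes more training data
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap resort package vacations book now", "spam")
spam_filter.train("hiked to the volcano in guatemala with backpackers", "ham")
```

Every flagged post you keep is another call to `train`, which is exactly why deleting spam throws away the thing that makes the filter work.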
Classify your Users
At Blogabond, we have the concept of a "Trusted User", whose posts we're comfortable showing on our front page, in RSS feeds, sitemaps, location searches, etc. The only way to become Trusted is to have a moderator flip you there by hand after reading enough of your posts. Everybody else is either a Known Spammer or simply Unknown.
These classifications are the main reason that the average person will never see any spam on Blogabond. All publicly browsable content is from Trusted Users, so the only way to see something from an Unknown user is to go to the URL directly. That means that you can start a new blog today and send out a link that people can use to see what you've written, but until you've convinced us you're trustworthy we're not going to let people off the street stumble across your stuff.
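A minimal Python sketch of the visibility rule described above, assuming a three-way trust level per author. The names `TrustLevel`, `Post`, `is_publicly_listed`, and `is_reachable_by_url` are hypothetical, not Blogabond's code.

```python
from dataclasses import dataclass
from enum import Enum

class TrustLevel(Enum):
    UNKNOWN = "unknown"        # default for every new account
    TRUSTED = "trusted"        # only ever set by hand by a moderator
    KNOWN_SPAMMER = "spammer"

@dataclass
class Post:
    post_id: int
    author_trust: TrustLevel

def is_publicly_listed(post: Post) -> bool:
    """Front page, RSS, sitemaps, location search: Trusted authors only."""
    return post.author_trust is TrustLevel.TRUSTED

def is_reachable_by_url(post: Post) -> bool:
    """Direct links work for Trusted and Unknown authors; Known Spammers' posts are treated as spam."""
    return post.author_trust is not TrustLevel.KNOWN_SPAMMER

def front_page(posts: list[Post]) -> list[Post]:
    return [p for p in posts if is_publicly_listed(p)]
```

The key design choice is that "listed" and "reachable" are separate questions: an Unknown user's post still works when someone follows a direct link, but nothing on the site will lead a stranger to it.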
Never Give Feedback
The last thing you want to tell a Spammer is that his post was rejected as spam. Never tell him that his account has been disabled. Let him figure these things out on his own, hopefully after a lot of wasted time and effort.
Pages with spam content return a 404 (Not Found) to anybody accessing them from outside the author's IP block. That way, the author can (mistakenly) verify that his post is live, while the rest of the world and Google never get to see it.
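A minimal sketch of that response rule using Python's standard `ipaddress` module. The post doesn't define "IP block", so treating it as the author's IPv4 /24 network is an assumption, as are the function names.

```python
import ipaddress

def same_ip_block(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True if ip_b falls in the same /prefix_len network as ip_a.

    The /24 default (and IPv4 addresses) is an assumption about what "IP block" means.
    """
    block = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in block

def status_for_post(is_spam: bool, author_ip: str, request_ip: str) -> int:
    """HTTP status to return for a post page."""
    if not is_spam:
        return 200
    # Spam looks live to its author and missing to everyone else, crawlers included
    return 200 if same_ip_block(author_ip, request_ip) else 404
```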
Never Show Untrusted Content to Google
The whole point of blog spam is SEO. Once Google gets ahold of a post, the game is over and the spammer has won. The worst thing you can do is blindly trust your spam filters to keep spam off your site and out of Google's index.
Assuming you're categorizing your users, this is simple. If it's from a Trusted User, it goes to places that Google can see it. If not, it doesn't. Sorted.
Maximize Collateral Damage
Stack the deck so that every action a Spammer takes increases the odds that he'll undo all his previous work.
When we flag something as spam, we also go back and flag everything in the past that came from that User and from his IP Address Block (as well as poisoning that IP Block and User for the future). So while he may get lucky and sneak a post through the filter on his first try, chances are he'll end up retroactively flagging that post as spam if he presses his luck.
We can actually watch as new messages drop onto the "Maybe Ham" pile, then mysteriously disappear a few minutes later. In essence, the spammer is cleaning up his own mess.
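Here is a minimal Python sketch of the retroactive flagging described above. `SpamLedger`, the `Post` fields, and representing an IP block as a CIDR string are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    post_id: int
    user_id: int
    ip_block: str          # e.g. "203.0.113.0/24"; this format is an assumption
    is_spam: bool = False

@dataclass
class SpamLedger:
    poisoned_users: set = field(default_factory=set)
    poisoned_blocks: set = field(default_factory=set)

    def flag(self, post: Post, all_posts: list) -> list:
        """Flag one post, poison its user and IP block, and retroactively
        flag every stored post sharing either one.

        Returns the IDs of the posts that were flagged retroactively.
        """
        post.is_spam = True
        self.poisoned_users.add(post.user_id)
        self.poisoned_blocks.add(post.ip_block)
        retro = []
        for other in all_posts:
            if other.is_spam:
                continue
            if other.user_id in self.poisoned_users or other.ip_block in self.poisoned_blocks:
                other.is_spam = True
                retro.append(other.post_id)
        return retro

    def is_poisoned(self, user_id: int, ip_block: str) -> bool:
        """Check a new submission against the poisoned users and IP blocks."""
        return user_id in self.poisoned_users or ip_block in self.poisoned_blocks
```

Because every new submission is checked with `is_poisoned`, a spammer who keeps posting from the same account or network only widens the net that a single flag casts.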
You're going to get a lot of spam, so if you want to stay happy you need tools that make it really easy to moderate. Our Spam Dashboard has a view showing snippets from every recent post that lets us flag an item with a single click (in a speedy, AJAX fashion). I'll spend maybe a minute a day running down that list turning Maybes into Spam, and occasionally marking a new user as Trusted.
We also have a pretty view of everything that's been marked as spam recently, along with reasons why and daily stats to see how well we're doing:
That's a screenshot from our Spam Dashboard this morning. As you can see, we're doing pretty well.
items are ones recently caught by the filter, RED
items are attempts by a Known Spammer to post something, and items that have been retroactively flagged (from the spammer pushing his luck too far) are shown in BLUE
items (none shown) are ones that we had to flag by hand because they made it past the filter.
In this shot, you can see a busy spammer creating new accounts, posting enough blog entries to trip the filter and undo all his efforts, then creating a new account and trying again.
There are two categories of people using your site: Real Users and Spammers. When you first start out, you tend to see it less as two distinct groups and more as a broad spectrum with some people falling in between. The longer you run a site, the more you come to realize that no, there are no Real Users with "good intentions" who are mistakenly posting commercial links on your site. Those people are spammers.
So don't hesitate to flag anything that looks even a little bit fishy. Woman talking about her fabulous Caribbean Cruise out of the blue? Spam. Random person posting poetry in China? Spam. Guy from India who really wants to tell you about his hometown? Spam.
And how do you know you were right? Because you will never hear complaints from any of those people. We've labeled thousands and thousands of "bloggers" as Spammers over the years, and so far I've heard back from exactly one of them. Spammers know that what they're doing is Bad Behavior. When you shut down their account, they'll know why.
Make the Spammers feel successful
Spammers will put in a surprising amount of effort to get their posts past your spam filter. The harder you fight back, the harder they'll try. Once they've found something that works, however, they'll sit back and watch the posts flow. That's the place you want them, happily sending post after post into your Spam corpus and training your Bayesian filters.
A happy spammer is a spammer who's not going to spend any more time trying to work your system. A happy spammer is reporting success to his boss and costing the bad guys money. A happy spammer is constantly teaching your filter about new trends in the spam world so that it can do its job better.
You want to cultivate a community of happy spammers on your site.