A Naive Bayesian Spam Filter for C#
Wednesday, February 06, 2008
Human-powered comment spam has been piling up recently at Blogabond, so I spent a few hours putting together a C# implementation of Paul Graham's Naive Bayesian Spam Filter algorithm.
You can find a nice long-winded article along with the source code over at The Code Project. Let me know if you find it useful. Here's a link:
http://www.codeproject.com/KB/recipes/BayesianCS.aspx
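If you want a feel for what the filter does before reading the full article, here is a minimal sketch of the step at the heart of Graham's approach: combining per-token spam probabilities into a single score. It follows the combining formula from Graham's "A Plan for Spam" essay. The class and method names are hypothetical and are not taken from the Code Project source, which may be structured differently.

```csharp
using System.Collections.Generic;

// Hypothetical helper; not the class used in the Code Project article.
public static class NaiveBayesCombiner
{
    // Combines individual token spam probabilities p1..pn into one score:
    //   P = (p1 * ... * pn) / (p1 * ... * pn + (1 - p1) * ... * (1 - pn))
    // Graham clamps each token probability to [0.01, 0.99] and feeds in only
    // the 15 most "interesting" tokens (furthest from 0.5), which also keeps
    // these products from underflowing. Callers are assumed to do both.
    public static double Combine(IEnumerable<double> tokenProbabilities)
    {
        double spamProduct = 1.0;
        double hamProduct = 1.0;
        foreach (double p in tokenProbabilities)
        {
            spamProduct *= p;
            hamProduct *= 1.0 - p;
        }
        // With no tokens at all, both products are 1 and the score is 0.5.
        return spamProduct / (spamProduct + hamProduct);
    }
}
```

In Graham's scheme, a message counts as spam when the combined score is above 0.9. Computing the per-token probabilities from the spam and non-spam corpora is a separate step that is not shown here.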
Labels: example code, software

Friday, March 16, 2007
exampleCode != productionCode
Take a look at this little piece of code. It looks pretty innocuous, as if it were taken straight from "Teach Yourself ASP.NET in 21 Days": pull a list of Trips out of the database for a given user and bind it to a select list. Nothing fancy. It teaches you a little bit about ADO.NET and data binding, all in one place.
Figure 1. Book Sample

    public class ExampleCode : System.Web.UI.Page
    {
        protected HtmlSelect selTripID;

        private void Page_Load(object sender, System.EventArgs e)
        {
            if (!IsPostBack)
            {
                DataSet ds = new DataSet();
                SqlConnection connection = new SqlConnection(
                    "server=OurProductionServer;database=Payroll; UID=jimmy;PWD=j1mmy;");
                SqlDataAdapter adapter = new SqlDataAdapter(
                    "select * from Trip where UserID=" + Request["UserID"],
                    connection);
                adapter.Fill(ds);

                selTripID.DataSource = ds;
                selTripID.DataTextField = "TripName";
                selTripID.DataValueField = "TripID";
                selTripID.DataBind();
            }
        }
    }

Imagine my surprise, however, when I walked into a small software shop recently and found a whole project written with code like the above. What were these guys thinking? Are they seriously relying on this fragile, unmaintainable mess in a real software product? It pastes raw request input straight into a SQL string, hard-codes production database credentials into the page, and scatters column names around as string literals.

And then it dawned on me. Maybe nobody had ever told them that the little examples in the book are just that: Little Examples. For teaching purposes. Never intended for use in the real world. Come to think of it, the book doesn't even tell you that. It ought to be in block capitals across the cover:

WARNING: DO NOT PASTE THE SAMPLES FROM THIS BOOK DIRECTLY INTO PRODUCTION SOFTWARE!!!

Somehow, that message never got through to a substantial portion of the software industry. Every time I see a "Senior Developer" writing ad-hoc SQL or referencing a hashtable with a string, I just want to cry.
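To make the ad-hoc SQL complaint concrete: Figure 1 glues Request["UserID"] straight onto the query text, so whatever a visitor puts in the query string becomes part of the SQL. The sketch below isolates just that concatenation so you can see what the database would receive. The class and method names are hypothetical.

```csharp
// Hypothetical demo class that isolates the query-building step of Figure 1.
public static class AdHocSqlDemo
{
    // Reproduces the string concatenation from Figure 1. Nothing here checks
    // that the incoming value is actually a number.
    public static string BuildTripQuery(string userIdFromRequest)
    {
        return "select * from Trip where UserID=" + userIdFromRequest;
    }
}
```

Passing "42" produces `select * from Trip where UserID=42`, which is what the author of the sample had in mind. Passing "0 or 1=1" produces `select * from Trip where UserID=0 or 1=1`, which returns every user's trips. The usual fix is a parameterized query (a SqlCommand whose UserID value goes into its Parameters collection), or at the very least parsing the value to an int before it gets anywhere near the SQL.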
So what do we do about it? I guess we try to get the message out. Here is some code I copied out of the Blogabond source that is functionally equivalent to the above:

Figure 2. Production Sample

    public class ProductionCode : System.Web.UI.Page
    {
        protected HtmlSelect selTripID;
        private int _userID;

        private void Page_Load(object sender, System.EventArgs e)
        {
            if (!IsPostBack)
            {
                _userID = StringConvert.ToInt32(
                    Request[User.Columns.UserID], 0);
                if (_userID != 0)
                {
                    PopulateTripList();
                }
                else
                {
                    // bail gracefully...
                }
            }
        }

        private void PopulateTripList()
        {
            selTripID.DataSource = Trip.GetByUserID(_userID);
            selTripID.DataTextField = Trip.Columns.TripName;
            selTripID.DataValueField = Trip.Columns.TripID;
            selTripID.DataBind();
        }
    }

Short and to the point. And obviously only the tip of the iceberg. This bit of code goes deep, but we can learn a few things just by looking at it:
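Figure 2 leans on helpers the post doesn't show. Here is a minimal sketch of two of them, assuming that StringConvert.ToInt32 returns its second argument whenever the input isn't a valid integer, and that Trip.Columns is nothing more than a set of string constants. Both bodies are hypothetical stand-ins; the real Blogabond versions may differ. Trip.GetByUserID is left out because it needs a database.

```csharp
// Hypothetical stand-in for the StringConvert helper used in Figure 2.
public static class StringConvert
{
    // Returns the parsed integer, or defaultValue if the input is null,
    // empty, or not a valid 32-bit integer.
    public static int ToInt32(string value, int defaultValue)
    {
        int result;
        return int.TryParse(value, out result) ? result : defaultValue;
    }
}

// Hypothetical sketch of the Trip.Columns constants referenced in Figure 2.
// Only the column-name constants are shown; the data access lives elsewhere.
public class Trip
{
    public static class Columns
    {
        public const string TripID = "TripID";
        public const string TripName = "TripName";
        public const string UserID = "UserID";
    }
}
```

With a helper like this, a hostile query-string value such as "0 or 1=1" never reaches the data layer: StringConvert.ToInt32("0 or 1=1", 0) returns 0, and the page bails out. Keeping column names in one set of constants also means a misspelled column becomes a compile error instead of a runtime surprise.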
Where do we go from here?

Copy-and-paste code reuse is bad. Everybody knows that. Ad-hoc SQL is bad. Everybody knows that. Inline strings are bad. Everybody knows that.

At least that's what I thought. But you know what? They don't. And they should. And it's our job to tell them.

Labels: best practices, example code, software
Copyright © 2008 Expat Software