Tuesday, July 29, 2008

Tripping into the valley of fail

C# 2.0 introduced nullable types to the language (apparently late in the dev cycle - more on that soon), something that I could have used way back when.

I know, LtU duders - nobody can prove that we really need null and it's a terrible idea. Or an OK enough idea in the absence of rigorous mathematical proofs but, and don't let nobody in on this, I nearly flunked out of my 9th grade math class (which was really the advanced 10th grade math class; I can't explain it neither) because I could not prove my way out of a paper bag. Calculus eluded me and vector math haunts my nightmares. I'm no math pro and this is a blind spot I'm all too aware of.

But null's really useful, honest.

In our application, we have to deal with dates that the user's supplying. Since this is client-side input, we have to deal with two possible fundamental problems with the dates. Did they forget to enter a (mandatory) date? Did they enter an invalid date?

I mention dates because they're (in .Net) a value type. Like other value types, a DateTime always holds a value (a field you never assign still gets the type's default), as opposed to reference types, which are null until you point them at an object (the pointer, she points nowhere). And then there's string, which lives in a state of sin in the gray area between value and reference (it's a reference type, so passing it around copies the reference rather than the characters, but since it's immutable and == compares contents, it behaves as though you'd gotten your own copy). Officially, a string's an "immutable reference type" (thanks, Google!). Tangentially, are there other immutable reference types in the .Net framework? Damned if I can think of one.
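
A quick sketch of the difference, for the skeptical. The class and method names are made up for illustration; default(T) is just the value an unassigned field of that type would hold.

```csharp
using System;

public static class DefaultsDemo
{
    // A value type always has a value; its default is DateTime.MinValue.
    public static bool DateDefaultsToMinValue()
    {
        DateTime unset = default(DateTime);
        return unset == DateTime.MinValue;
    }

    // A reference type's default is null, string included.
    public static bool StringDefaultsToNull()
    {
        string unset = default(string);
        return unset == null;
    }

    public static void Main()
    {
        Console.WriteLine(DateDefaultsToMinValue()); // True
        Console.WriteLine(StringDefaultsToNull());   // True
    }
}
```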

So what's any of this have to do with nullable values?

The user forgot to enter a date. OK. What value should we use to represent the fact that the user didn't enter a date? DateTime.MinValue, maybe? Sounds reasonable.

That's covered, so on to fundamental bloop number two - the user entered an invalid date. Hmm. DateTime.MinValue's already been recycled as a magic number to represent "missing date", so we'll use DateTime.MaxValue. Game, set, ma... oh, wait.

You mean we'll have a need throughout the application to use DateTime.MaxValue to represent things that are open-ended?

Now we've got a problem. Do we want to pick out a second magic number to represent the fact that the user entered an invalid date? Maybe treat an invalid date and a date that hasn't been entered identically? Maybe we want to wrap dates in a struct that contains a DateTime and booleans for invalid/missing date?

Hmm. That code's starting to stink pretty badly. There's got to be a better way.

That's where I was hoping nullable types would come to the rescue. Where previously we had to use magic numbers to represent error states in our values, we can now use a not-so-magic value - null.

Suddenly, we're not looking to wrap things in a struct and perform all sorts of acts that no shower will ever quite rinse off. User forgot to enter a value? Null. Invalid date? DateTime.MinValue. Take in the win.
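
Roughly what that could look like. This is a sketch, not our actual code: the DateInput class, the Read method and the "yyyy-MM-dd" input format are all assumptions made up for the example.

```csharp
using System;
using System.Globalization;

public static class DateInput
{
    // Hypothetical helper: null means "nothing entered",
    // DateTime.MinValue means "entered, but not a valid date".
    public static DateTime? Read(string raw)
    {
        if (raw == null || raw.Trim().Length == 0)
            return null;

        DateTime parsed;
        // The expected format is an assumption; use whatever your form accepts.
        if (DateTime.TryParseExact(raw.Trim(), "yyyy-MM-dd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
            return parsed;

        return DateTime.MinValue;
    }

    public static void Main()
    {
        Console.WriteLine(Read("") == null);                          // True
        Console.WriteLine(Read("next Tuesday") == DateTime.MinValue); // True
        Console.WriteLine(Read("2008-07-29") == new DateTime(2008, 7, 29)); // True
    }
}
```

That leaves DateTime.MaxValue free to keep meaning "open-ended" everywhere else.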

Did I mention that nullable types came late in the development cycle? Let's take a look at a little code and see how it works out.

string nullStr = null;

nullStr.ToString();

Trying to call a method on a null reference - what do you get? A NullReferenceException. No-brainer.

int? nullInt = null;

nullInt.ToString();

Let's try the same thing on a nullable integer. We should get the same thing, right? Wrong. It returns string.Empty. To get it to throw at all (and even then it's an InvalidOperationException, not a NullReferenceException), you'd have to say...

int? nullInt = null;

nullInt.Value.ToString();

What the shit? Nullable types have a .Value property? Pro move, guys. Way to leak the fuck out of that abstraction.
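
For the record, here are the three cases side by side. The Outcome helper is something I made up to report which exception (if any) a call throws. Note that going through .Value on a null int? throws an InvalidOperationException rather than a NullReferenceException.

```csharp
using System;

public static class NullableToStringDemo
{
    // Hypothetical helper: runs the call and reports the exception type, if any.
    public static string Outcome(Func<string> call)
    {
        try { call(); return "no exception"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    public static void Main()
    {
        string nullStr = null;
        int? nullInt = null;

        Console.WriteLine(Outcome(() => nullStr.ToString()));       // NullReferenceException
        Console.WriteLine(Outcome(() => nullInt.ToString()));       // no exception
        Console.WriteLine(nullInt.ToString() == string.Empty);      // True
        Console.WriteLine(Outcome(() => nullInt.Value.ToString())); // InvalidOperationException
    }
}
```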

Truth be told, ToString() not horking a NullReferenceException doesn't bother me that much. It's the unexpected coalescing of a null value that gets me. I set out with my golden hammer to create a new li'l method called ToStringOrNull() and hang it off of Nullable<T> that does what I'd have designed ToString() to do in the first place - return a null string if the value's null and call the generic ToString() function otherwise. But I can't attach a constraint to that function because Nullable<T> is a structure, not a class. Fail, fail again.
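
For what it's worth, here's one shape that could work, assuming C# 3.0 extension methods are on the table: put the constraint on the underlying T rather than on Nullable<T> itself. The NullableExtensions class name is invented for the sketch, and I can't promise this dodges whatever wall I actually hit.

```csharp
using System;

public static class NullableExtensions
{
    // Hypothetical helper: returns null for a null value instead of string.Empty.
    // The constraint goes on T (a struct), so T? means Nullable<T>.
    public static string ToStringOrNull<T>(this T? value) where T : struct
    {
        return value.HasValue ? value.Value.ToString() : null;
    }
}

public static class ToStringOrNullDemo
{
    public static void Main()
    {
        int? missing = null;
        int? present = 5;

        Console.WriteLine(missing.ToStringOrNull() == null); // True
        Console.WriteLine(present.ToStringOrNull());         // 5
    }
}
```

It still means one more extension method hanging around in scope for every nullable type.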

Polluting the namespace with this feels wrong, so what do I do?

Tell other developers to always use nullableType.Value.ToString() and hope that nobody slips up?

Add bunches of tests to our increasingly tag soup-y MVC app (and hope that nobody forgets to do it)?

Not good times. Small inconsistencies pile up until you're so busy bookkeeping for them that you can pretty easily lose sight of the bigger picture. Either that or you grow your Unix beard out and spend your days using your phallus to point to chapter and verse for the reference specification for your language of choice on Usenet. The latter's not an option for me since I can't grow a beard and the former ain't pretty neither.

I'm hoping that I can sneak in some elegant solution to calm this jittery behavior, but I've got no idea what it'll look like.

I just wanted to give a special shout-out to whoever for the head-scratching behavior. Wait. Is the person behind [DebuggerStepThrough] behind this? By all that is unholy, I will get you for this. These. Whatever.

Sunday, July 20, 2008

Unit testing - prefer messages

tl;dr version - if your unit test tool lets you associate (informational) messages with your test assertions, use the fuck out of them. It's great that you're driving towards 100% code coverage. How much greater will it be in 2 months when someone (probably you) breaks a test and has a useful indicator of what exactly was being exercised rather than trying to puzzle out a simple assertion?

I'm sadly new to the unit testing game, so I've been learning the wrong way to do things at an astonishing clip, while mistakenly stumbling over things that work by accident every now and then.

I never quite understood the hubbub over unit testing - why do I want to do extra work that doesn't go towards getting a working product out the door? Now that I'm writing oodles of unit tests, I understand exactly why I want to write them - they save my ass early and often.

Case in point - the object to XML mapper (this isn't NIHS; I genuinely can't use LINQ to XML because the hierarchy the external service produces is not only unpublished but subject to change) I'm writing. It's been working, but I noticed that it was... how shall I say... less than performant?

So I set out to start refactoring critical sections of the code. I started by gut - I started taking FxCop up on its suggestion to use IXPathNavigable and knocked bunches of stuffs out. Minor improvements.

Then I stopped programming by guesswork and profiled a generic run pattern. Creating objects (with objects creating other objects), updating the persistent XML store, blah blah blah. Found the genuinely astonishingly slow parts of my code and broke out the chainsaw to fix them up.

For a change, I had a really, really high level of confidence in all the changes that I was making. Before unit tests, it was just more guesswork as to what I might be breaking outside of the code that I was touching just by looking at it funny and changing this postfix increment to a prefix increment (OK; that's an exaggeration, but you kinda know what I mean). Now that I've got unit tests in place, it's a whole different story. I can try things out and see if anything breaks in real time. If the coverage is good enough, I've got silly confidence that everything's on the up-and-up. If it isn't, whatever. Adding a few more tests isn't moving mountains.

But a strange thing happened along the way - unit tests that I'd written a month or two earlier started breaking. Even stranger, I had no idea what some of them were doing. Not many of them, but I was clueless about the provenance of some of the tests that I'd written a couple of months earlier.

That sends up red flags for me - there's still value in having those unit tests, but I can recognize that if I don't have a little more context associated with them, they're going to bit rot really, really fast. I started by putting comments above the tests explaining what they were doing, but that felt kind of unsatisfying.

I find that when I write unit tests, I slip into a lightweight QA state of mind - I think less about the cases that should work and more about the edge cases, the awkward states that I can put my code into to get it to break. It gives me a chance to stand back and re-examine the code from that stance as well as getting a feel for how easy the class is to use, since I have to instantiate objects (and everything else in its dependency graph) before I can start to test it.

The time that I'm thinking about what the class is doing for me and how to use it lends itself naturally to embedding context in the tests. Not simple messages like "validating that CountryCode gets populated when the object's hydrated from XML" but "validating that nullable enumerations are being populated properly."

Prefer messages in the unit tests you write. They'll help you make the most out of your unit tests as you write them and they'll help you understand your unit tests when they break down the road.
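
A sketch of what that buys you. Check.AreEqual below is a made-up stand-in so the example runs on its own; NUnit and MSTest both have an Assert.AreEqual overload that takes a message in the same position. The CountryCode enum values are placeholders.

```csharp
using System;

public enum CountryCode { US, CA }

public static class Check
{
    // Hypothetical stand-in for a test framework's assertion-with-message.
    public static void AreEqual<T>(T expected, T actual, string message)
    {
        if (!object.Equals(expected, actual))
            throw new Exception(message + " (expected: " + expected + ", actual: " + actual + ")");
    }
}

public static class MessageDemo
{
    // Returns the failure text, or null when the assertion passes.
    public static string FailureText(CountryCode? hydrated)
    {
        try
        {
            Check.AreEqual<CountryCode?>(CountryCode.CA, hydrated,
                "validating that nullable enumerations are being populated properly");
            return null;
        }
        catch (Exception ex) { return ex.Message; }
    }

    public static void Main()
    {
        // Simulate the mapper leaving the nullable enum unset.
        Console.WriteLine(FailureText(null));
    }
}
```

When that breaks two months from now, the failure tells you what the test was there to prove, not just that two values didn't match.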

Sunday, July 13, 2008

Mea culpa

When I first started writing .Net code, I was all about implementing IDisposable because I figured that the GC wouldn't be as smart, as fast, as efficient as the stuff I could write. I mean, sure - they optimize for the general case, but who knows better than I do just when to free memory and resources? Not some jackhole Microsoft programmer, amiritepeople?

Since those were the heady days of VC and no clients demanding things change yesterday, I actually spent half a day working with the clumsy spike I'd slapped together and let it fly - it worked well enough under load, so I was happy. Then I ripped out my destructor and let it roll again - I was figuring I'd see the CPU thrashing as .Net's garbage collector did its thing, working on the general, sub-optimal, case it was written for. No egghead knows better than I do how and when this should run!

Except that there was no difference that I could see. If anything, run time was a little faster and memory overhead was a little lower. I mean, probably statistical noise faster and lower (that Excel spreadsheet's been lost to the river of time at this point), but that was a pretty well-defined zenslap moment for me.

I thought about it a little and realized that oh yeah - that garbage collector. I'm not allocating memory either. People way smarter than me have already implemented a garbage collector so I don't have to worry about allocating and freeing memory on the fly. The bold promise of distilling your codebase down to actual business logic rather than bookkeeping allocations and all that.

It's a solved problem, so why am I solving it again, only invariably worse this time? Maybe it was written for a "general case" (whatever the fuck that means because I obviously can't defend it) but it was a pretty good general case.

This all came flooding back to me on Friday. I'm working on a bit of code - an object-XML mapper. This isn't as stupid as it sounds (I hope), honest. It's running, well. Not so good. I mean, it does what I want it to, just way way way slower than I want it to run.

One of the "optimizations" I made was ripping out a lambda expression iterating over a singleton (I know, I know) - I figured that there ain't nothing faster than a hand-rolled for loop with a break condition... right? But I wasn't making any headway with the other two offending methods after re-ordering my if block, so I decided that I might as well, you know, test it out to see how it performed.

I didn't check memory this time around, but damn if it wasn't just as fast as the for loop. Maybe a little faster, even.
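
The two versions looked something like this. It's a reconstruction with invented names and a stand-in predicate, not the mapper's real code. Both stop at the first match; the second one just doesn't make me write the bookkeeping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Search
{
    // Hand-rolled loop with an early exit.
    public static bool HasNegativeLoop(IList<int> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            if (items[i] < 0)
                return true;
        }
        return false;
    }

    // Same check with Any(), which also stops at the first match.
    public static bool HasNegativeAny(IEnumerable<int> items)
    {
        return items.Any(x => x < 0);
    }

    public static void Main()
    {
        List<int> sample = new List<int> { 3, 1, -4, 1, 5 };
        Console.WriteLine(HasNegativeLoop(sample)); // True
        Console.WriteLine(HasNegativeAny(sample));  // True
    }
}
```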

Again, the zenslap - the framework's made by people way smarter than I am. I need to count on them to have done their homework and made stuff easy to use and scary fast.

Stop reinventing the wheel. I'd bark at co-workers who tried to roll their own second-rate mechanism for mapping objects to an XML hierarchy we don't control, so why am I confident in my ability to roll my own iteration loop? On some level, doesn't it make sense that smart people who get paid to work on iterators might find a way to wring a little more out of them?

It's not easy discovering that something so simple that you've taken for granted for so long (a for loop!!!) is halfway obsolete, but it's liberating once you get over yourself and embrace it.

So here's to you, whoever implemented .Any() - you did a helluva job. Way better than the jackass who shat out [DebuggerStepThrough].