Saturday, April 19, 2008

Python Hack - Unicode Sniffer

Unfortunately, I sort of built the build system we use at work.

I say unfortunately because, since I'm the one that built it and I'm cheap, it was rolled with a minimum of brains and a maximum of free software.

On the database side of things, I use a handy little program called GSAR (global search and replace) that does what it sounds like it does - searches and replaces strings in files.  Unfortunately, it isn't exactly Unicode-aware, which means that files not saved as ASCII can slip through the cracks, causing subtle problems (depending on how subtle of a problem you see a missing piece of functionality being).

This normally wouldn't be a problem, but SQL Server Management Studio seems to flip a coin on installation to decide whether it's going to save files as ASCII or as UCS-2/UTF-8/UTF-16 by default.  Most computers in the office save files as ASCII, but there are a few that like to emit Unicode (especially when scripting out tables and stuff).

Like a hawk, or a ninja, or a ninja hawk, I quickly figured out who the offenders were and now keep an eye on their check-ins.  I've got an RSS feed for the repository and, to be honest, they don't check in a whole lot of code so it's not a monstrous burden.  I keep an eye on check-ins anyway because, well.  I'm anal like that.

But why bother grepping the files manually when I've got a computer to do the grunt work for me?  I've been putting off knocking together a C# program to do it for me (why should I do it?  I'm not the one who sucks!), Ruby isn't so Unicode-savvy... but wait!  I'm a world-famous Python hacker now and Python knows Unicode!

So I put together a script that will recurse down the directory tree you put it in and, if it finds a file that isn't saved as ASCII, print its name out.  As a word of warning, it's not the greatest thing I've ever done and it catches binary files in its trawl line as well.  For my purposes, that's just fine - there are only text files living in the /Database portion of the central repository.  Now I can have it shut the build down as soon as it finds an offending file.  Everyone goes home happy!  Except for the developers who got the bum install of SSMS and have to do some Save As... gymnastics every time they touch a file.
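
For the curious, the idea boils down to something like the sketch below.  It's a minimal illustration written for Python 3, not necessarily how the downloadable script does it, and the names looks_ascii, find_non_ascii and check_tree are made up for this example.  It also assumes that "not ASCII" means "contains a byte above 0x7F or a NUL byte", which flags BOM-marked UTF-8 and UTF-16 files, BOM-less UTF-16 text, and (as warned above) binary files.

```python
import os


def looks_ascii(path, chunk_size=65536):
    """Return True if the file holds only 7-bit, non-NUL bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a suspicious byte.
                return True
            # Bytes above 0x7F cover BOMs and UTF-8 multibyte sequences;
            # NUL bytes cover BOM-less UTF-16 text (an assumption of this sketch).
            if any(byte > 0x7F or byte == 0 for byte in chunk):
                return False


def find_non_ascii(root):
    """Yield the path of every file under root that is not plain ASCII."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not looks_ascii(path):
                yield path


def check_tree(root):
    """Print offending files; return 1 if any were found, else 0."""
    offenders = list(find_non_ascii(root))
    for path in offenders:
        print(path)
    return 1 if offenders else 0
```

Handing the return value of check_tree to sys.exit gives the build a non-zero exit code to choke on whenever something turns up.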

You can grab the little script here.  Feel the magic!  Feel the power!  Marginal utility!

Oh, and I updated the links to my world-famous marginal utility IsDebug for .Net 2.0, too.


4/23/2008 update - I've gone ahead and slapped the WTFPL on it. So do WTF you want with it now. Or don't. You won't hurt my feelings either way, honest.
