  • How Bad Data Happens

    A lot of what we do as data engineers is fix things that have broken. Either we’ve been alerted that a job has failed again, or a user is telling us that some value for a record is wrong. How did we get here? When you’re...


  • Hive, managed vs external tables

    One of the things that comes up often in conversations about Hive is whether to use managed or external tables. What are managed tables? Managed tables are Hive tables where Hive manages the data; Hive stores the data internally in its own warehouse directory, and generally you wouldn’t interact with the data...


  • Announcing fn, a serverless framework

    For the last couple of months I’ve been working on fn (pronounced fun), a serverless framework. It’s been slow going, but hopefully I’ll get some more time to spend on it over the coming months. Serverless frameworks like AWS Lambda look really good; the problem is they’re tightly tied to...


  • Installing Java 8 on OS X 10.10 Yosemite

    So I’ve been running OS X 10.10 for a while, and tonight I decided I’d try to install the Java 8 JDK (JDK8u05) and have a play for a project I’ve been messing around with. Unfortunately I got this: A quick Google search and the best I could come up with...


  • Validation 1.0.0 released.

    I just pushed my little JSR-303 validation lib out as a 1.0.0 release. JSR-303 is now 1.0, so it seemed like a good idea. The biggest changes are probably the ones needed to support those. If you were using the library you won’t notice any difference; the changes are all internal and...