OAuth 2.0

I've been updating my OAuth library to support OAuth 2.0, mostly so I can add Facebook to Announce.ly and Sproozi, but more on that later. OAuth 2.0 is similar to 1.0, but it changes a few key things fundamentally and isn't backwards compatible.

What's wrong with 1.0, doesn't it work? It does, but probably the biggest issue is that you have to sign the message knowing all its content beforehand. This works well if the data is on the query string of a GET request, or for simple operations, but isn't optimal if your data is part of the POST body. It also means you have to construct your requests in a certain way, which is a bad thing. Take photo, audio or video data: to post that you'll need to sign the whole request, and it's not clear how that should work with multipart data. There are several extensions to the spec that deal with some of these issues, but the fact that there are non-standard extensions to do something pretty standard kinda says it all. Even if you're not dealing with these issues, you still have to treat your requests as units whose whole content you know beforehand.

What's new in OAuth 2? OAuth 2.0 in its simplest form works over HTTPS connections and simply asks for a token - the security and trust are built into the protocol. It's that easy. OAuth 2.0 still lets users sign messages to transmit them over insecure channels such as plain HTTP, but the signing methods are much easier to implement. Gone is the complicated parameter normalisation algorithm, and in its place is a much simpler version that doesn't require POST data in the signature. So even with multipart submissions it should just work.

At the moment I'm cleaning things up and preparing the OAuth library to work with OAuth 2.0, changing the way it works to reflect the simpler approach OAuth 2.0 takes. You can check it out on GitHub [http://github.com/andrewmccall/oauth]
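For a sense of what the 1.0 parameter normalisation mentioned above involves, here is a minimal sketch following the rules in the OAuth 1.0 spec (RFC 5849, section 3.4.1.3.2): percent-encode every name and value, sort by name and then by value, and join the pairs. The class and method names are made up for illustration and are not taken from the library; the full signature base string also needs the HTTP method and the base URL, which this leaves out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library: sketches OAuth 1.0 parameter normalisation.
class ParameterNormalizer {

    // Percent-encode per RFC 3986: only letters, digits, '-', '.', '_' and '~' stay as they are.
    // URLEncoder is form-oriented, so its output is patched for ' ', '*' and '~'.
    static String percentEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // Encode each name and value, sort by name then by value, join as name=value with '&'.
    static String normalize(List<Map.Entry<String, String>> params) {
        List<String[]> encoded = new ArrayList<>();
        for (Map.Entry<String, String> p : params) {
            encoded.add(new String[] { percentEncode(p.getKey()), percentEncode(p.getValue()) });
        }
        encoded.sort((a, b) -> {
            int byName = a[0].compareTo(b[0]);
            return byName != 0 ? byName : a[1].compareTo(b[1]);
        });
        StringBuilder out = new StringBuilder();
        for (String[] pair : encoded) {
            if (out.length() > 0) {
                out.append('&');
            }
            out.append(pair[0]).append('=').append(pair[1]);
        }
        return out.toString();
    }
}
```

Every query parameter, every OAuth parameter and every form-encoded body parameter has to go through this before the request can be signed, which is why you need the whole request in hand up front.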

Another Open Source Library.

I'm having a bit of a clear out, taking a look at some of the code I've written, and I've been pushing some of the stuff I'm currently using up to GitHub under an Apache 2 licence. I've used these things in Announce.ly, Sproozi and some other small projects and figure they may be useful to someone else. My only criterion has been whether I'm using it now in a project. If so, I'm actively supporting it and I've started pushing that stuff to GitHub. Everything else is dormant, and I don't want to release something I'm not actively supporting - it also occurs to me that if even I'm not using it, it can't be all that worthwhile.

I've just pushed some code to GitHub that I've been testing for a few months in a couple of projects. It's an accounts package written for Spring that ties my OAuth library and Twitter together, with either Hibernate or HBase as backend storage. In its simplest form, when you log in with Twitter it creates a new user for you and persists it along with the OAuth access tokens you need to act on behalf of that user. I'll write some more about it, add better documentation and probably throw a little more code up on GitHub over the course of the next couple of weeks, as and when I get a chance.

My new open source Java OAuth library

I've just pushed out a new open source Java OAuth library because I couldn't find one that did what I needed. My key requirement was simplicity. I didn't like the idea of using the library for HTTP stuff, and there is no reason I should. Once I've obtained the access token, all I'm doing with OAuth is signing my requests. I want to use HttpClient directly and only use the OAuth library to sign the message, for various reasons, not the least of which is that I already have an HttpClient object set up in my IoC container.

The closest I found was Signpost, but it wasn't very IoC friendly or thread-safe. That meant every time I wanted to make a call I'd have to create new objects, or at the very least call a bunch of methods to set them up, which highlights the third problem: there were no clear objects that I could store for later.

The library I've just released is a fork of the Signpost code that's now thread-safe and should be more IoC friendly. You create your method calls as you would normally, and just before you call HttpClient.execute(HttpMethod) simply call OAuthConsumer.sign(HttpMethod, AccessToken). I've added a few new objects that handle most of the work. Service, RequestToken and AccessToken are all beans that you pass to a consumer depending on what you want to do. Starting with a Service, you call:
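// Describe the provider you're talking to, here Twitter.
Service service = new Service();
service.setRequestTokenUrl("http://twitter.com/oauth/request_token");
service.setAccessTokenUrl("http://twitter.com/oauth/access_token");

// The key and secret issued to your application.
service.setConsumerKey("b8sA385mBBNqOTD6Omlsw");
service.setSharedSecret("MD4Sve6AdaDasjdvOAsbpAJsA87S8s64e5rE4");

// How requests are signed and where the OAuth parameters go.
service.setMessageSigner(new PlainTextMessageSigner());
service.setSigningStrategy(new AuthorizationHeaderSigningStrategy());

// oAuthConsumer is your OAuthConsumer instance, e.g. injected by your IoC container.
RequestToken requestToken = oAuthConsumer.getRequestToken(service);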
You'll have to send the user off to Twitter to check their credentials. When they come back they'll be given a verifier. Set it on the request token, then trade the request token for an access token:
requestToken.setVerifier(verifier);
AccessToken accessToken = oAuthConsumer.getAccessToken(requestToken);
Now you can store the accessToken to use later. When you want to make a call, simply set up your HTTP method as you would normally and sign it just before executing it:
HttpUriRequest request...
// do your HttpClient stuff here

oAuthConsumer.sign(request, accessToken);
HttpResponse response = httpClient.execute(request);
There is also code in there for the Jetty HttpClient, but it's a bit rough and I haven't used it. Have a play with it and let me know what you think. UPDATE: Forgot to link to it... Dumb. It's on GitHub here.
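As an aside on what a plain-text signer like the one configured above is meant to produce: in the OAuth 1.0 spec (RFC 5849, section 3.4.4) the PLAINTEXT signature is just the consumer secret and the token secret, each percent-encoded and joined with an ampersand. Here is a minimal sketch of that rule, assuming the spec's definition. The class and method names are hypothetical, and this is not the library's actual PlainTextMessageSigner code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the OAuth 1.0 PLAINTEXT signature rule, not the library's code.
class PlainTextSignatureSketch {

    // Percent-encode per RFC 3986, patching URLEncoder's form-style output.
    static String percentEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // The signature is encoded consumer secret + '&' + encoded token secret.
    // The token secret is empty while you're still fetching the request token.
    static String sign(String consumerSecret, String tokenSecret) {
        String token = tokenSecret == null ? "" : tokenSecret;
        return percentEncode(consumerSecret) + "&" + percentEncode(token);
    }
}
```

Because nothing is hashed, both secrets travel with every request, so the spec expects PLAINTEXT to be used only over a secure channel such as HTTPS.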

Downloading maven dependency source jars

I've been working on a new project that I'm planning to open source real soon - stay tuned. When I'm implementing interfaces from a dependency using IDEA/Maven, I want to tick the "copy javadoc" button to at least have the documentation from the interface. The issue of course is that I don't have the sources. Run the following command:
mvn dependency:sources
Maven will download any sources it can find in remote repositories for your dependencies, and IDEA finds them like magic. So now not only can you copy javadoc, you can also click the line number in a stack trace and get something meaningful - not "compiled code".

Scaling up vs scaling out

Jeff Atwood goes into some calculations about the cost of scaling up vs scaling out and makes an interesting point: scaling out quickly becomes impractical if you're not using open source software. I think Jeff slightly missed the point though. It's not about open or closed source; it's that scaling out is simply impractical if you're paying for traditional software licences. This is something we came across when building Sproozi. If we wanted to store petabytes of data and run hundreds or thousands of concurrent processors, there was no way we could ever afford to do it on machines running Windows that we were paying for by the box. But it's not because we'd have to pay for software, per se, it's how we'd have to pay for it.

Software has traditionally been licensed by machine. When machines got bigger, vendors wanted to cash in, so the licences got a little bigger. After all, they had to cover their losses when you threw a few new processors into the machine rather than getting a new one to put alongside it. It has always been in their best interest, though, for you to get a bigger box rather than more cheap ones - scaling out is very hard and the software doesn't do it well. Most RDBMSs just can't do it well, and they certainly can't get anywhere near the scale of something like Hadoop. If you want to scale out, forget SQL servers; you need software that's built to scale out.

But let's forget the specific software for the time being and just assume that the big boys (MS, Oracle, IBM) will have a scaling-out solution soon - don't worry, this isn't going to kill them, but it will change them. They will still want to license an operating system and a data storage and retrieval system to you. What I'm almost positive you're going to see is these companies introducing new pricing schemes to meet the needs of the cloud. They have to, or they're going to lose all that revenue to the open source projects that have a head start on them.
Just look at EC2: you can already provision MS and other software there, and I think that's a trend that's just going to continue. So while Jeff is right that buying as many cheap boxes as I could for the hardware cost of a big iron server, and putting Windows and SQL Server on them, would cost a small fortune, it's not really a fair argument. You're taking an old big iron way of thinking and trying to apply it to the cloud. What it fails to take into account is how much more powerful your new cloud cluster is than the big iron box. Let the software vendors figure out the economics of making their software an attractive ROI compared to OSS, because if they want to compete in the cloud they're going to have to.

Open Source for Business

I didn't get this posted yesterday because the Internet crapped out in our area. Nothing but excuses, I know. I've been working beyond the bleeding edge, using a version of the Nutch code that's not even made it into the Apache SVN for the project yet. To celebrate the fact that my contributions will make it in, I figure it's a good time to get into open source and business. To put it briefly, and as you can probably guess, I'm pro open source. I use it extensively and I push back as much as I can. When it comes to most of the code I write, there really isn't much commercial benefit in keeping it hidden, so it just makes sense to give back.

There are two types of business on the web: one where you provide a software service and that is the product, and another where you provide access to data. It's pretty easy to tell which camp you're in. Take 37 Signals, for example. They're in the first camp, and their software probably isn't something they just want to let people download - unless they're incredibly brave. Doing that would mean competing on margins for the cheapest hosting; users would flock to the cheaper services, have a poor experience and blame the software. Sproozi, on the other hand, is the second type: our data is what users mostly care about, and we're not planning to be precious about our code.

I've already been pushing some of the changes I've made to Nutch back into the project, and we're planning open source projects of our own in the coming months. One of our plans is to build iPhone, Android and other phone-based applications for our service and release them as open source projects. We're planning to write them (or have them written for us) and release 'official' versions, then release that code as an open source project to provide a framework for developers, so that they can build great things from it and on our API.
If there are any experienced iPhone, Android, Blackberry, Symbian or Pre developers out there who want to get involved, drop me a line. We're a ways off yet, but would love to chat about it and get some very early feedback.