Archive for the 'code' tag
OAuth 2.0
Posted on May 20, 2010
I’ve been updating my OAuth library to support OAuth 2.0 mostly so I can add Facebook to Announce.ly and Sproozi, but more on that later. OAuth 2.0 is similar to 1.0 but changes a few key things fundamentally and isn’t backwards compatible.
What’s wrong with 1.0, doesn’t it work?
It does, but probably the biggest issue is the fact that you have to sign the message knowing all its content beforehand. This works well if the data is on the query string in a GET request or for simple operations but isn’t optimal if your data is part of the POST body. It also means you have to construct your requests in a certain way, which is a bad thing.
Take photo, audio or video data – to post that you’ll need to sign the whole request and it’s not clear how it should work with multipart data. There are several extensions to the spec that deal with some of these issues, but the fact that there are non-standard extensions to do something pretty standard kinda says it all.
Even if you’re not dealing with these issues you still have to work with your requests as units where you know the whole content beforehand.
What’s new in OAuth 2?
OAuth 2.0 in its simplest form works over HTTPS connections and simply asks for a token – the security and trust come from the encrypted connection rather than from signing each request. It’s that easy.
OAuth 2.0 still lets users sign messages to transmit them over insecure channels, such as plain HTTP, but the signing methods are much easier to implement. Gone is the complicated parameter normalisation algorithm and in its place is a much simpler version that doesn’t require POST data in the signature. So even with multipart submissions it should just work.
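To make the contrast concrete, here is a minimal Python sketch of the simplest case: attach a token to a request that goes over HTTPS, with no signature base string and no parameter normalisation. The helper name, URL and token are made up for illustration, and the `Authorization: Bearer` header is the form later standardised in RFC 6750, so a 2010 draft or any particular library may spell it differently.

```python
import urllib.request


def build_token_request(url, access_token, body=None):
    """Build (but do not send) an HTTPS request that carries an OAuth 2.0 token.

    Hypothetical helper: nothing about the body is hashed or signed, so it
    can be a form post, JSON or multipart data without special handling.
    """
    if not url.lower().startswith("https://"):
        # A bare token is only safe on an encrypted channel.
        raise ValueError("token requests must use HTTPS")
    request = urllib.request.Request(url, data=body)
    request.add_header("Authorization", "Bearer " + access_token)
    return request


# Placeholder values, not a real endpoint or credential.
req = build_token_request(
    "https://api.example.com/photos", "example-token", body=b"raw image bytes"
)
print(req.get_method(), req.get_header("Authorization"))
```

The point of the sketch is what is missing: the request body never has to be known in advance, which is exactly what made 1.0 awkward for uploads.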
At the moment I’m cleaning things up and preparing the OAuth library to work with OAuth 2.0, changing the way it works to reflect the simpler way OAuth 2.0 does things. You can check it out on GitHub [http://github.com/andrewmccall/oauth]
Continuous Deployment
Posted on February 23, 2010
I’ve been working lately trying to implement some of the things Eric Ries talks about. Primarily the idea of continuous deployments for lean startups and getting away from the fear of release.
On the surface, and looking back on some of the disastrous releases I’ve seen (and deployed), it’s a frightening concept. The problem with most of those releases, though, hasn’t so much been that there was a bug or a problem; it was that there was no fallback position – the release was made and there was no way to unmake it. Also, working in Java, releasing WARs has been a pretty big issue – you have to upload the whole release to fix even a small bug like a typo. One bug in these cases can mean you’re 15-20 minutes from uploading a fix.
It’s a very bad place to be and I want to avoid it at all costs in the future, both the problems and as a result the fear of a release.
With that in mind I’ve been fashioning a system for Sproozi where a release is made as an exploded WAR using Hudson. On commit it’s tested, deployed to a single server in the cluster and tested some more. The server then joins the main cluster and is monitored as the release is pushed to the rest of the servers to make sure it’s stable.
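The shape of that rollout can be sketched in a few lines of Python. This is only an illustration under assumptions: `deploy` and `healthy` are hypothetical callables standing in for the real Hudson job and monitoring checks, and the server names and build labels are invented. The one thing it tries to show is the fallback position: every server remembers what it ran before, so a bad release can be unmade.

```python
def staged_rollout(build, servers, current, deploy, healthy):
    """Deploy `build` one server at a time, starting with the first in the list.

    `current` maps a server name to the build it is running now.
    If any server is unhealthy after deploying, every server touched so far
    is put back on its previous build and False is returned.
    """
    rolled = []  # (server, previous build) pairs, in deployment order
    for name in servers:
        rolled.append((name, current[name]))
        deploy(name, build)
        if not healthy(name):
            # Unmake the release, newest change first.
            for touched, old_build in reversed(rolled):
                deploy(touched, old_build)
            return False
    return True


# Toy cluster state and stand-in deployment/monitoring functions.
running = {"web1": "r41", "web2": "r41", "web3": "r41"}


def deploy(name, build):
    running[name] = build


def healthy(name):
    # Pretend build r42 misbehaves on web2 only.
    return not (name == "web2" and running[name] == "r42")


ok = staged_rollout("r42", ["web1", "web2", "web3"], running, deploy, healthy)
print(ok, running)
```

In this toy run the second server fails its check, so both servers that were touched go back to the old build and the third is never changed.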
The only problem I’m having conceptually is how do I treat a commit that comes in before a release is finished – I guess I have to sit on it, but what if the release fails and I have to lock the repository? What about if the release is a success but in the meantime I have 2, 3 or even more commits that I’m sitting on? Do I deploy them one at a time until they’re either all successful or one fails?
How has anyone else practising continuous deployment dealt with these issues?
EC2 Spot instances
Posted on December 14, 2009
The thing I really like about Amazon’s cloud stuff is they’re constantly undermining themselves with new innovations – spot instances are another great example. Taking the utility metaphor a step further you can now rent their services when nobody else is for cheaper, like buying electricity at night.
A lot of the tasks I’m envisioning for Sproozi aren’t really time dependent. While it’s important to show you a page in a timely manner, to quickly crawl and index a brand new website you add, and basically to be interactive, there is also a lot I have to do in the background. The huge and growing list of URLs people add, all of which need to be re-crawled and re-indexed regularly, is just one of many examples of processing vast amounts of data. These tasks are always running, always in the background.
Spot instances are a perfect fit – I can bid the price I want on extra capacity and spin up some extra instances to join the cluster when they’re cheap. Over the next few weeks I’ll probably try to add some spot instances to my Hadoop dev cluster and see what happens.
Useful commit messages
Posted on November 20, 2009
I realised that I’m not very good at this when I looked at my last commit for this theme. The message was “added a few things”, a message I find myself using when I have nothing else to say, or if I’ve done too much work and either can’t remember all of what I did or can’t be bothered writing it all out.
It’s bad form though, and I’m trying to learn my way into using git properly to make it happen less. Basically I’m still trying to find the right place between a commit for a unit of work and my previous more traditional view that all commits must compile and pass tests. At the same time I’m trying to reconcile that with a Kanban board way of organising what to do.
How does everyone else commit? When, how often and what ‘rules’ do you apply?
Stop asking questions.
Posted on July 19, 2009
I’ve had an ongoing conversation with a few people over the course of a few projects about the questions to ask when people sign up on the web or what hoops you make them jump through.
Ask most people what they need to know about users and you’ll end up with a list of information ranging from the useful, like name, to the truly useless, like job title or title. Give me one good reason for either. Are you going to be sending Mr John Smith or Dr. Sarah Jones a letter using their formal titles?
When it comes to things like Job Title, the reason I usually hear is that it’ll be useful for marketing and demographics later. Maybe, but I’ve never actually seen that later come to anything. So I don’t buy it.
If the signup form is the depth of your engagement with your user you might as well drop the pretense of a web application and go back to sending newsletters. Just don’t do it. Ask your user the bare minimum number of questions you need to get them started using what your product does now and stop throwing up barriers to entry.
Aside from the questions that are asked there are other things: the hoops almost every site makes you jump through before you can start using it. Probably the best example is the ubiquitous confirm your email step. Do we need to be doing this? In some circumstances I’ll admit it makes sense, if you’re delivering content via email for example or signing up for a mailing list. Of course you want to make sure the user wants what you’re sending them.
But for every other site, I’m not sold. It’s just another step they have to take to start using your service and all the little steps add up to users hating signing up for new things.
It also raises the question, why even bother? For 99% of websites the only reason to make sure a user has entered their email address correctly is so that you can let them recover a password. You’re adding the step for people that have lost their password and haven’t managed to get their email right. That’s going to be a fairly small proportion of your users; even fewer if you let them log in with their email address, because then they’ll know it’s wrong as soon as they try it.
Do we pander to the idiots or make things as quick and easy for everyone else? I’ll take quick and easy any day.
I’m finding myself not just asking what any given field in a form is for and making it justify itself, I’m trying to figure out how to get by without it. I’m leaning towards going beyond just the bare minimum and into the realm of how can I modify this service to work with even less.
Scaling up vs scaling out
Posted on June 24, 2009
Jeff Atwood goes into some calculations about the cost of scaling up vs scaling out and makes an interesting point: it quickly becomes impractical if you’re not using open source software. I think Jeff slightly missed the point though. It’s not about open or closed source, it’s that scaling out is simply impractical if you’re paying for traditional software licences.
This is something we came across when building Sproozi. If we wanted to store petabytes of data and run hundreds or thousands of concurrent processors there was no way we could ever afford to do it on machines running Windows that we were paying for by the box. But it’s not because we’d have to pay for software, per se, it’s how we’d have to pay for it.
Software has traditionally been licensed by machine. When machines got bigger vendors wanted to cash in, so the licences got a little bigger. They had to cover their losses when you threw a few new processors in the machine rather than getting a new one to put alongside, after all. It has always been in their best interest, though, for you to get a bigger box rather than more cheap ones – scaling out is very hard and the software doesn’t do it well. Most RDBMS just can’t do it well and they certainly can’t get anywhere near the scale of something like Hadoop. If you want to scale out, forget SQL servers, you need software that’s going to scale out.
But let’s forget the specific software for the time being and just assume that the big boys (MS, Oracle, IBM) will have a scaling out solution soon – don’t worry, this isn’t going to kill them, but it will change them. They will still want to license an operating system and a data storage and retrieval system to you.
What I’m almost positive you’re going to see is these companies introduce new pricing schemes to meet the needs of the cloud, they have to or they’re going to lose all that revenue to the open source projects that have a head start on them. Just look at EC2, you can already provision MS and other software and I think that’s a trend that’s just going to continue.
So while Jeff is right that buying as many cheap boxes as I could for the hardware cost of a big iron server, and putting Windows and SQL on them, would cost a small fortune, it’s not really a fair argument: you’re taking an old big iron way of thinking and trying to apply it to the cloud. What it fails to take into account is how much more powerful your new cloud cluster is than the big iron box. Let the software vendors figure out the economics of making their software an attractive ROI when compared to OSS, because if they want to compete in the cloud they’re going to have to.
Open Source for Business
Posted on June 8, 2009
I didn’t get this posted yesterday because the Internet crapped out in our area. Nothing but excuses, I know.
I’ve been working beyond the bleeding edge, using a version of the Nutch code that’s not even made it into the Apache SVN for the project yet. To celebrate the fact that my contributions will make it in I figure it’s a good time to get into open source and business.
To put it briefly, and as you can probably guess, I’m pro open source. I use it extensively and I push back as much as I can. When it comes to most of the code I write there really isn’t much commercial benefit in keeping it hidden so it just makes sense to give back.
There are two types of business on the web, one where you provide a software service and that is the product and others where you provide access to data. It’s pretty easy to tell which camp you’re in.
37 Signals for example, they’re in the first and their software probably isn’t something they just want to let people download – unless they’re incredibly brave. Doing that would mean that they’d be competing on margins for the cheapest hosting, users would flock to the cheaper services, have a poor experience and blame the software.
Sproozi on the other hand is the second type, our data is what users mostly care about and we’re not planning to be precious about our code. I’ve already been pushing some of the changes I’ve made to Nutch back into the project and we’re planning open source projects of our own in the coming months.
One of our plans is to build iPhone, Android and other phone based applications for our service and release them as open source projects. We’re planning to write them (or have them written for us) and release ‘official’ versions, then release that code as open source projects to provide a framework for developers so that they can build great things from it and on our API.
If there are any experienced iPhone, Android, Blackberry, Symbian or Pre developers out there that want to get involved, drop me a line. We’re a ways off yet but would love to chat about it and get some very early feedback.
Reverse HTTP and the cloud
Posted on March 13, 2009

I recently read the IETF draft RFC for Reverse HTTP, and it looks like a pretty simple and elegant solution to a number of problems I’ve seen, especially with the move to cloud computing.
The cloud brings with it some great possibilities but with them some great challenges. Computing on demand is great: if I need more power for a computationally intensive task I can just spin up a few instances for as long as I need them and shut them down when I’m done. Great in an ideal world, but RPC, cluster management and many of the tasks you’d have to take on to run nodes in the cloud can be troublesome.
Apache Hadoop, for example, is a great, free, open source Map/Reduce framework, but it makes assumptions based on a traditional view of the world: a datacenter full of real hardware that is always there. One of the biggest and most troublesome for the cloud is the fact that a master needs to be aware of the slaves before they try to connect. Implementing access controls in a secure manner for node connections is no small task because the whole system, from end to end, is based on a custom client/server model written specifically for the task.
I’m not singling Hadoop out here, just using it as an example because it’s well known and I’m familiar with it.
Let’s take a very simple API. Imagine there is no cluster, just one node. A client submits a job to the server, the server processes it and returns the result. Now let’s make it a little more complicated: let’s make it a Map/Reduce job and add a few nodes to the cluster. As far as the client is concerned the same thing is happening. They’re just submitting the job to the server and it’s handling everything else: it breaks the job down into work units and submits them to the nodes in the cluster, then all the results are merged together and passed back to the client.
In order to implement this you’re going to need at least a basic client/server API between the master and each slave. You could do it using traditional HTTP but you’d run into a scalability issue, imagine you have 10,000 nodes in your cluster. The server is going to need to have 10,000 open HTTP connections and each of them is going to have to poll the server at fixed intervals just to ask “Any work boss?”, “Nope, not at the moment. Take 5.” Sure you could increase the interval between asking, but 10,000 nodes doing nothing for 30 seconds is almost 3 1/2 days of computing power wasted.
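The arithmetic behind that figure is easy to check. The numbers are the ones from the paragraph above; the variable names are just for illustration.

```python
nodes = 10_000
idle_seconds_per_node = 30

# Every node sits idle for one polling interval.
wasted_seconds = nodes * idle_seconds_per_node
wasted_days = wasted_seconds / (60 * 60 * 24)

print(f"{wasted_seconds} node-seconds = {wasted_days:.2f} node-days")
# prints "300000 node-seconds = 3.47 node-days"
```

And that is the cost of a single polling interval, paid again every time the cluster has nothing to do.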
To get around the problem you’ve got to design your architecture to push jobs to the nodes as soon as they come in, which means writing your own client/server architecture and your own access control mechanisms amongst other things. If we flipped things around, though, and the slave connected to the master over HTTP and then told the master it wanted to be the server, we’d achieve exactly what we wanted. The master, knowing nothing about a slave beforehand, can now interact with the slave as if it were a client and can submit a job as soon as it comes in.
An added benefit, the master/slave API can be the same as the user/master! After all, the master would be doing almost the exact same thing on a slave as the user is doing connecting via HTTP and submitting a job. No more custom client/server and vastly simplified code.
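Here is a rough Python sketch of that role swap, to show how little is involved. It is an illustration under assumptions rather than an implementation of the draft: the `Upgrade: PTTH/1.0` header follows my reading of the Reverse HTTP draft and should be checked against it, the `/register` and `/jobs` paths are made up, the messages carry no bodies, and `socket.socketpair()` stands in for the TCP connection a slave would open to the master.

```python
import socket
import threading


def read_head(sock):
    """Read one HTTP header block, up to and including the blank line.

    Reads a byte at a time so it never swallows the start of the next
    message; slow, but fine for a sketch with body-less messages.
    """
    data = b""
    while not data.endswith(b"\r\n\r\n"):
        byte = sock.recv(1)
        if not byte:
            break
        data += byte
    return data.decode("ascii")


def slave(sock, log):
    # 1. The slave dials in as an ordinary HTTP client and asks to swap roles.
    sock.sendall(
        b"POST /register HTTP/1.1\r\nHost: master\r\n"
        b"Upgrade: PTTH/1.0\r\nConnection: Upgrade\r\n\r\n"
    )
    log.append(read_head(sock).splitlines()[0])  # the master's 101 reply
    # 2. Roles are reversed: the slave now answers requests like a server.
    log.append(read_head(sock).splitlines()[0])  # the job pushed by the master
    sock.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")


def master(sock, log):
    read_head(sock)  # the slave's upgrade request
    sock.sendall(
        b"HTTP/1.1 101 Switching Protocols\r\n"
        b"Upgrade: PTTH/1.0\r\nConnection: Upgrade\r\n\r\n"
    )
    # The master is now the HTTP client and can push work without being polled.
    sock.sendall(b"POST /jobs HTTP/1.1\r\nHost: slave\r\nContent-Length: 0\r\n\r\n")
    log.append(read_head(sock).splitlines()[0])  # the slave's response


master_end, slave_end = socket.socketpair()
master_log, slave_log = [], []
threads = [
    threading.Thread(target=master, args=(master_end, master_log)),
    threading.Thread(target=slave, args=(slave_end, slave_log)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
master_end.close()
slave_end.close()

print(slave_log, master_log)
```

After the 101 response everything on the wire is plain HTTP again, just travelling in the other direction, which is why the master/slave API can look the same as the user/master one.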
It would be easy to make it even more robust and allow for multiple tiers of masters and sub nodes. Just add a call to the API which asks the server how many slots it has free for jobs. Useful to a user from a management perspective, but also it would allow the master to partition the work into chunks based on the cluster size and based on the number of nodes served by any particular master. This would also be useful in terms of best use of resources given network topology issues – not all nodes are in the same rack or even datacentre.
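As a sketch of how a master might use that call, the Python below hands out work units according to the free slots each node (or sub-master) reports. The function name, node names and the shape of the slot report are all hypothetical; a real scheduler would also weigh network topology, which this ignores.

```python
def partition_by_slots(work_units, free_slots):
    """Give each node at most as many work units as it has free slots.

    `free_slots` maps a node name to the number it reported from the
    hypothetical "how many slots are free?" API call. Returns the
    per-node assignments and the units left waiting in the queue.
    """
    queue = list(work_units)
    assignments = {}
    for node, slots in free_slots.items():
        assignments[node] = queue[:slots]
        queue = queue[slots:]
    return assignments, queue


units = ["u1", "u2", "u3", "u4", "u5", "u6"]
reported = {"rack1-master": 3, "rack2-master": 1}

assigned, waiting = partition_by_slots(units, reported)
print(assigned, waiting)
```

Because a sub-master can answer the same call by summing the slots of the nodes beneath it, the same function works at every tier.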
Add to this the simplicity and power of simply adding something like HTTP Digest authentication at the server end and you’ve got ready made access controls. One set of credentials for clients, one for slaves. Clients can submit jobs, slaves get the work and there is no real need to know anything about a slave before the first time it connects.
Why is this better than something like XMPP, I can hear you asking? It’s not better. Not for any real reason, and yes, it has some crossover in functionality with other technologies that are already out there. In the right situations, though, it gives developers the option to simplify things, and that’s never a bad thing.
Openness, strength or weakness?
Posted on October 15, 2008

As I’ve announced here, we’ve recently taken the difficult decision to put development of Citrus on hold, indefinitely (let’s be realistic, probably forever). I’m now wondering what, if anything, we should announce about our new project.
Is openness a strength or a weakness? Is there value to the community in publicly documenting our design decisions, and how we’ve come to them? Is it something our competitors could and would use against us? Is there any way to mitigate the effects?
I like what Jeff Atwood and Joel Spolsky have done on the Stack Overflow blog, where they’ve released their design meetings as a series of podcasts and given us all an insight into how they’ve come to some of the decisions they’ve taken. But let’s face it, I’m neither Jeff nor Joel and I don’t get anywhere near their traffic. So the first obvious question I’m asking myself is if it’s even worth the effort of writing posts, let alone recording and releasing podcasts.
Next is the question about maintaining a competitive advantage, given the fact that our team is so small, just the two of us at the moment; I’m only working about half time on the project and Simon, who isn’t a coder, is working even less. So giving someone a guide to competing with us could be downright foolish. Then again, nobody is really listening, are they? So does it even matter?
The other question, which I think becomes equally important as the project grows (hopefully into a product), is what level of openness, if any, is appropriate post launch. Will an open discussion of ongoing design decisions, as they’re being taken or even as they’re released, be of any benefit to the community? Communities can be some of your biggest supporters and most passionate users, but those passions can work both ways. How much ongoing support will a public discussion of your decisions require when users don’t yet have the prototyped interfaces to play with? Is there an ongoing effort you’ll need to make to maintain the trust and confidence of the community?
I think I’ve talked myself out of a podcast of our internal discussions, for now – but I’m still keen to use community interaction to help us develop a product that meets the needs of the people who are interested in using it and to drum up support and interest in the project. I’ll no doubt return to this topic in the coming weeks as I develop a clearer idea of how I think we should accomplish that.
If any of the few of you out there listening have any questions or comments, please let me know your thoughts.
Interfaces first
Posted on September 30, 2008
Don’t write spec documents, throw away any you have near you, they’re next to useless anyway. It might seem like heresy to some but it’s true and deep down even the most anal planner in all of us knows it. Two of the biggest reasons for this are:
- size and scope: In order to address every aspect of your project a full specification document has to, well, address every aspect of your project and it’s going to be huge as a result.
- lack of vision: For most people relating what’s written on paper to a vision of a design in their heads is difficult if not impossible.
Loose documents don’t work well either, because leaving the obvious unsaid just leads to problems, chiefly not everything is obvious to everyone. The devil is in the details and avoiding them until the end of a project is just going to cause trouble.
“Why haven’t we implemented x, that should have been obvious!”
create the interfaces
Prototype the interfaces and use them as your spec. Working this way has so many benefits, try it once and I promise you’ll be converted. Before you write a single line of backend code, sit down and work out your interfaces. Build each screen, put all the buttons, fields and elements on the page.
I normally start with index cards and write a few simple requirements for each screen or page – just a few simple sentences listing what the page is expected to achieve.
My next step is to turn those requirements into interfaces.
in HTML, not photoshop
Photoshop is great, but let’s face it, it’s no better for trying to work out a functional interface prototype than some paper and a pencil. It’s very easy to focus on making things look pretty over making them functional and you risk missing something important out.
The best way to avoid both of these traps is to write your interfaces in HTML, mark them up semantically as they will be marked up in the release and use them as your working spec.
I know good design but I’m no designer
Another benefit of coding up your interfaces in HTML before you start writing code is you can hand them off to designers so they can make them look pretty as you’re working on making them work. I have a pretty basic idea of design principles and the tools – but I find it hard work. So with interface first development I don’t have to think about it.
let the testing begin
Another thing which is made infinitely easier with interface first development is functional testing, because your testers don’t have to wait for a release before they can start developing a test suite. You are using an automated test suite, aren’t you? Because the interfaces exist up front, by the time developers add functionality testers can already have been through the interfaces and written a whole series of failing tests.
updating later
The great thing about using HTML as your design and communication tool is that when it comes to making a change to a page you have the most up to date working copy there ready to modify in the form of your site. Just save the page, add the change to the HTML and you’ve got your interfaces ready for discussion and later implementation. What could be easier, faster or more clear for everyone?
it’s not a panacea
There have been times when I’ve been working with notoriously difficult individuals where nothing was going to satisfy. An example that springs to mind is a multi month project where we produced an interface design early in the project. The design was a set of PDF images not HTML (see above), but the delivered product was a pixel perfect implementation of it. Only once the project was delivered was there any input on the design or functioning of the project – I’m not just talking things that a PDF can’t show you like what clicking a link looks like. I’m talking basic things like colours, layout and the size of elements!
Would an HTML mock up have made any difference? I actually doubt it. I take blame for miscommunication and resulting mistakes when it’s my fault but in this case even after the site was fully developed and deployed to a staging area for testing none of these issues were highlighted. It was only after the site was put into production that these “critical issues” were discovered.