I recently read the IETF draft RFC for Reverse HTTP, and it looks like a pretty simple and elegant solution to a number of problems I've seen, especially with the move to cloud computing. The cloud brings with it some great possibilities, but with them some great challenges. Computing on demand is great: if I need more power for a computationally intensive task I can just spin up a few instances for as long as I need them and shut them down when I'm done. Great in an ideal world, but RPC, cluster management and many of the other tasks involved in running nodes in the cloud can be troublesome. Apache Hadoop, for example, is a great, free, open-source Map/Reduce framework, but it assumes a traditional datacenter full of real hardware that is always there. One of the biggest and most troublesome assumptions for the cloud is that the master needs to know about the slaves before they try to connect. Implementing secure access controls for node connections is no small task, because the whole system, end to end, is built on a custom client/server model written specifically for the job. I'm not singling Hadoop out here, just using it as an example because it's well known and I'm familiar with it.

Let's take a very simple API and imagine there is no cluster, just one node. A client submits a job to the server, the server processes it and returns the result. Now let's make it a little more complicated: make it a Map/Reduce job and add a few nodes to the cluster. As far as the client is concerned, the same thing is happening. They're just submitting the job to the server and it's handling everything else: it breaks the job down into work units, submits them to the nodes in the cluster, and all the results are merged together and passed back to the client. To implement this you're going to need at least a basic client/server API between the master and each slave.

You could do it using traditional HTTP, but you'd run into a scalability issue. Imagine you have 10,000 nodes in your cluster. The master would have to handle 10,000 open HTTP connections, each node polling at a fixed interval just to ask "Any work, boss?", "Nope, not at the moment. Take 5." Sure, you could increase the interval between asking, but 10,000 nodes doing nothing for 30 seconds is almost 3 1/2 days of computing power wasted. To get around the problem you've got to design your architecture to push jobs to the nodes as soon as they come in, which means writing your own client/server architecture and your own access control mechanisms, amongst other things.

If we flipped things around, though, and the slave connected to the master over HTTP and then told the master it wanted to be the server, we'd have achieved exactly what we wanted. The master, knowing nothing about a slave in advance, can now send requests to the slave as if the master were the client, and can push a job the moment one comes in. An added benefit: the master/slave API can be the same as the user/master API! After all, the master would be doing almost exactly the same thing on a slave as the user does when connecting to the master over HTTP and submitting a job. No more custom client/server, and vastly simplified code. It would be easy to make it even more robust and allow for multiple tiers of masters and sub-nodes: just add a call to the API which asks the server how many slots it has free for jobs.
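To make the role reversal concrete, here's a minimal sketch of the slave's side in Python. It assumes the draft's Upgrade-based handshake (the PTTH/1.0 token), and the master host, port and /reverse endpoint are hypothetical names of my own; a real slave would also need proper HTTP parsing, reconnection and error handling.

```python
# reverse_http_slave.py -- sketch of a slave flipping into a server over one socket.
import socket

MASTER = ("master.example.com", 8080)   # hypothetical master host and port


def run_job(raw_request: bytes) -> str:
    """Placeholder for doing real work on the unit the master just pushed."""
    return "done"


def flip_to_server():
    sock = socket.create_connection(MASTER)

    # Step 1: act as an ordinary HTTP client and ask the master to reverse roles.
    sock.sendall(
        b"POST /reverse HTTP/1.1\r\n"
        b"Host: master.example.com\r\n"
        b"Upgrade: PTTH/1.0\r\n"
        b"Connection: Upgrade\r\n"
        b"Content-Length: 0\r\n"
        b"\r\n"
    )

    # Step 2: the master answers 101 Switching Protocols; from this point the
    # roles are reversed and HTTP requests flow master -> slave on this socket.
    status_line = sock.recv(4096).split(b"\r\n", 1)[0]
    if b"101" not in status_line:
        raise RuntimeError("master refused to reverse roles: %r" % status_line)

    # Step 3: behave like a tiny HTTP server on the already-open connection,
    # so work arrives the instant the master has it -- no polling.
    while True:
        request = sock.recv(65536)
        if not request:
            break                        # master closed the connection
        body = run_job(request).encode()
        header = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Length: %d\r\n"
            "\r\n" % len(body)
        )
        sock.sendall(header.encode() + body)


if __name__ == "__main__":
    flip_to_server()
```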
A free-slots call is useful to a user from a management perspective, but it would also let the master partition the work into chunks based on the size of the cluster and on the number of nodes served by any particular master. It would also help make the best use of resources given network topology: not all nodes are in the same rack, or even the same datacenter.

Add to this the simplicity of something like HTTP Digest authentication at the server end and you've got ready-made access controls: one set of credentials for clients, one for slaves. Clients can submit jobs, slaves get the work, and there is no need to know anything about a slave before the first time it connects.

Why is this better than something like XMPP, I can hear you asking. It's not better, not for any fundamental reason, and yes, it overlaps in functionality with other technologies that are already out there. In the right situations, though, it gives developers the option to simplify things, and that's never a bad thing.
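For the curious, here's a rough sketch of what the master's end of all this could look like. The endpoint names (/jobs, /slots, /reverse) and the token-to-role lookup are mine, and the credential check is only a stand-in for real Digest authentication, which also involves nonces and hash verification; the point is just how two sets of credentials keep the access-control story simple.

```python
# master_sketch.py -- sketch of the master's HTTP surface: jobs, free slots, role flip.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

# Two roles, two credential sets: clients may submit jobs, slaves may reverse.
# The tokens below are purely illustrative stand-ins for Digest credentials.
ROLES = {"client-token": "client", "slave-token": "slave"}

FREE_SLOTS = 4   # hypothetical: job slots this master can still hand out


class MasterHandler(BaseHTTPRequestHandler):
    def role(self):
        # Stand-in for Digest verification: map the Authorization header to a
        # role. Real Digest auth means issuing nonces and checking hashes.
        return ROLES.get(self.headers.get("Authorization", ""))

    def do_GET(self):
        if self.path == "/slots":
            # Anyone authenticated can ask how many job slots are free, which
            # lets a user (or an upstream master) size the work it submits.
            if self.role() is None:
                self.send_error(401)
                return
            body = json.dumps({"free_slots": FREE_SLOTS}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def do_POST(self):
        if self.path == "/jobs" and self.role() == "client":
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)      # drain the submitted job description
            self.send_response(202)      # accepted; work units fan out from here
            self.end_headers()
        elif self.path == "/reverse" and self.role() == "slave":
            # A real master would now hand this socket to its client-side
            # machinery and start pushing work units down it.
            self.send_response(101)
            self.end_headers()
        else:
            self.send_error(403)


if __name__ == "__main__":
    HTTPServer(("", 8080), MasterHandler).serve_forever()
```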