by Andrew on November 4, 2008
Previously I made some vague statements about location based services being the future of mobile. I just want to go back and touch on the point I made at the end of that post and highlight it a bit.
when I’m out there the data I want most is about out there
When I’m sitting somewhere waiting for someone I use my phone to check on any feeds I’m following or to read my email, but I want so much more than that. The fact of the matter is location does matter, but more often than not I want to know more than just where my nearest coffee shop or bar is, and I want to know more than just how to get there. I want the depth of information I can find on the internet, but I want to see it on a map and I want to see it relative to where I am.
I want more than is on offer right now with location based services. I want to be able to pull a richer set of information off the Internet and overlay it on a map. I’m not entirely sure what that richer set of data is, or how to display it - I just know it’s not what is out there now. It’s not just a glorified business directory, it’s not just directions to somewhere and it’s not just a simple news mashup.
Watch this space.
by Andrew on November 3, 2008
I’ve been faced with a bit of a problem lately: how do you test the quality of results produced from a changing dataset? It’s easy to write unit tests for components of an application, and to make sure they’re producing the expected results, but how do you test end to end?
Andraz from Zemanta asked the same question last year in How do you test a complex system that is trying to mimic being smart:
when you have new content in the system, you get completely new related stories and you have to go back and have a human judge them. There is expansion of the evaluation data - as you add new tests you generally can’t send them through previous versions of your algorithms, since that would be prohibitely expansive. And there is statistics that hardly gives you overview over what exactly your changes caused, just few final numbers. And then there is the problem of pipelining the processing. Even if you improve the first stage, end results might be worse, since you’ve already adapted the second stage to previous first one. So you need to actually evaluate each part of the system in isolation and then together.
At the end you actually find out that you spend disproportional amount of time evaluating even the smallest changes. So you are in danger to just skip that evaluation which naturally you shouldn’t.
The fundamental problem you run up against is that the index is constantly changing, and it’s meant to change. So it’s hard to automatically test the output without a clear idea of what is going in. It’s also difficult to get an accurate picture of how small changes in code affect the general results if you’re just using a testing index with a small dataset.
One way to go about it is to gauge result quality by measuring user interaction. Basically there are things users do when they get the results they’re expecting and things they do when they haven’t found what they were looking for. So if you can measure how they’re reacting, you can get an idea of quality.
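To make that a bit more concrete, here is a rough sketch of the kind of thing I have in mind. Everything in it is an assumption for illustration: the `SearchSession` record, its fields, and the choice of signals (clicks as a sign of a good result, an immediate new query or walking away as a sign of a bad one) are all hypothetical, not something we have built or validated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchSession:
    """One query and what the user did with its results (hypothetical log record)."""
    query: str
    clicked_ranks: List[int]  # 1-based positions of the results the user clicked
    reformulated: bool        # True if the user immediately issued a new query


def quality_signals(sessions: List[SearchSession]) -> Dict[str, float]:
    """Summarise interaction logs into a few rough quality indicators."""
    total = len(sessions)
    if total == 0:
        return {
            "click_rate": 0.0,
            "reformulation_rate": 0.0,
            "abandonment_rate": 0.0,
            "mean_reciprocal_rank": 0.0,
        }

    # Sessions where the user clicked at least one result.
    clicked = sum(1 for s in sessions if s.clicked_ranks)
    # Sessions where the user rephrased the query straight away.
    reformulated = sum(1 for s in sessions if s.reformulated)
    # Sessions where the user neither clicked nor tried again.
    abandoned = sum(
        1 for s in sessions if not s.clicked_ranks and not s.reformulated
    )
    # Reciprocal rank of the first click; a session with no click scores 0.
    reciprocal_ranks = sum(
        1.0 / min(s.clicked_ranks) for s in sessions if s.clicked_ranks
    )

    return {
        "click_rate": clicked / total,
        "reformulation_rate": reformulated / total,
        "abandonment_rate": abandoned / total,
        "mean_reciprocal_rank": reciprocal_ranks / total,
    }


if __name__ == "__main__":
    log = [
        SearchSession("coffee near me", [1], False),
        SearchSession("cofee", [], True),
        SearchSession("bars open late", [2, 4], False),
        SearchSession("news", [], False),
    ]
    print(quality_signals(log))
```

The appeal of numbers like these is that they can be tracked over time as the index changes, without needing to know in advance what the right results for any given query are. Whether these particular signals actually line up with quality is something that would have to be checked against real users.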
At the moment I’m a lone developer putting all the data in the index, and I have a good idea of what I should be seeing come out if it’s actually working. In the next while though we’re going to be rolling out to a few more internal beta users as we get a prototype system developed, and we’re not going to be in control of the inputs or outputs anymore. So soon enough we’re going to be faced with trying to measure the quality of the results we’re giving users in a dynamic system. Expect to hear much more about this as we go on.