Sproozi generates a lot of data, and when we launch we're going to be generating a lot more. Some of it is overt and displayed on the website for all to see. Some of it sits in a database somewhere: things like searches, locations, session information and clicks on outbound links. Some of that information could lead back to the people doing the searching. It's happened before to other sites, even after they thought they had taken steps to protect users.
We're also not the only ones; it's pretty much par for the course in search and on the web. If you didn't know it before now, rest assured that every click you make on any website of any significance is tracked, including, if they can manage it, the fact that it was you who clicked. Also know that from that data they can learn an awful lot about you, possibly even who you are.
Which brings up an obvious question which many people rightly ask: if a company is concerned at all about privacy, why in the world would they keep any of this data? Well, there are a few very good reasons. First and foremost for us is one related to a previous post of mine about Testing for Search Result Quality. It boils down to a very real problem: it's hard to test user interaction when the results the system produces for any given input change between submissions of the same input - by design. To measure how we're performing we need to measure the actual user behaviour, as opposed to measuring what we spit back. Without collecting the data to measure how changes are improving (or worsening) the user experience, we're more or less releasing code and hoping it's better. Not exactly a professional or analytical approach to take.
On the other hand, my privacy is important to me, and it makes me uncomfortable to think about all the data I generate lying about the web. I'm not sure exactly what makes me uncomfortable, but I do know it's not just me. I want to be in control of what others know about me, and I feel I need to treat everyone else's data with the same respect I want for mine.
Let's face it, I don't want to keep personally identifiable data about anyone for any longer than I need to. After a while they become just a statistic anyway. So we're looking into what to do with the data and at which points we can start to anonymise and filter it to remove personally identifiable information. But we're starting the process from the beginning as opposed to the end: if we're keeping something, I want a reason to keep it.
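To make the idea of anonymising data at a certain point a little more concrete, here is a minimal Python sketch of what such a step might look like. Everything in it is an assumption made up for illustration: the `ClickEvent` record and its fields, the 30-day `RETENTION` window and the `anonymise` function are hypothetical, not a description of what Sproozi actually stores or runs.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical retention window: how long a record keeps its identifying fields.
RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class ClickEvent:
    """An illustrative click record; the field names are assumptions."""
    timestamp: datetime
    query: str
    clicked_url: str
    session_id: Optional[str]
    ip_address: Optional[str]
    location: Optional[str]  # free text such as "Dublin, Ireland"


def anonymise(event: ClickEvent, now: datetime) -> ClickEvent:
    """Return the event unchanged while it is recent, otherwise a reduced copy."""
    if now - event.timestamp < RETENTION:
        return event

    # Keep only the last, coarsest part of the location (e.g. the country).
    coarse_location = None
    if event.location:
        coarse_location = event.location.rsplit(",", 1)[-1].strip()

    return replace(
        event,
        # Round the timestamp down to the day so it is harder to correlate.
        timestamp=event.timestamp.replace(hour=0, minute=0, second=0, microsecond=0),
        session_id=None,
        ip_address=None,
        location=coarse_location,
    )
```

This sketch only drops or coarsens the obviously identifying fields once a record is older than the window. The query text itself can also be identifying, so a real pipeline would probably need to deal with that as well, for example by keeping only aggregate counts.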
Some argue that data is the most valuable thing a business has in the modern world, but we tend to take a smaller, more pragmatic, more local view: we can always build up a data set for some metric we want to start measuring, but we can never rebuild trust if we lose it. Plus, what value will a click stream have in 2, 3, 6 or even 12 months' time? So why bother collecting what we're not using?
Some might ask: why bring this all up before you've launched? Well, partly because I've been thinking about it recently, and partly because I think it's important to be upfront about the data we collect, what we use it for and what we'll do with it. I also think it's important to show that privacy isn't an afterthought. It's something we considered before we ever collected any data and something we continue to consider as we develop, because we take the privacy of our users very seriously.
A post a day for the month of June - Day 2