Data Dumps could be full of Gold
Data dumps are a concept we have thought about doing through Backstage over the last few years. We would take anything and everything that was licensed in a way that let us release it under the Backstage licence, zip it up and just dump it on a web server for you all to unzip and explore.
However, three problems crept up: first, finding data which we could clearly put out as a dump; second, removing all references to personal data and people (anonymising it); and third, putting it somewhere sensible.
For example, we tried to get a selection of the web traffic logs out, but at what I believe was 2+ GB per month, it would have been a small nightmare even hosting them or moving them anywhere like archive.org. And that's after having to remove all the secret and private information, a lesson one organisation learned the hard way last year when it gave away a dump of data for research. Obviously we would never risk our or your data in this way.
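To give a feel for the kind of scrubbing involved, here is a minimal Python sketch that replaces IPv4 addresses in a log line with a salted hash. The log format, the `scrub_line` name and the salt value are all assumptions made up for this example, not a description of how our logs are actually processed. Real anonymisation needs a lot more than this, since query strings, cookies and referrers can identify people too.

```python
import hashlib
import re

# Matches dotted-quad IPv4 addresses (assumed log format; IPv6 is not handled).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrub_line(line, salt="change-me"):
    """Replace each IPv4 address in a log line with a short salted hash.

    The same address always maps to the same token, so visits can still be
    grouped without the original address appearing in the dump.
    """
    def _mask(match):
        raw = (salt + match.group(0)).encode("utf-8")
        return "ip-" + hashlib.sha256(raw).hexdigest()[:12]

    return IPV4.sub(_mask, line)
```

Keeping the salt private matters here: with a known salt, the small IPv4 address space makes the hashes easy to reverse by brute force.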
About this time last year, we decided to try experimenting with raw data stacks via an XML database (eXist-db), using data which was already public. You can find them under the . The Tweetstore is a good example of what we're trying to achieve with data dumps. It archives all the tweets which the official BBC Twitter users create. By themselves they're not that interesting, but the value is in the patterns you can pull out over time. With good analysis it would be possible, for example, to find keywords which attract followers.
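As a rough sketch of that kind of analysis, the Python below averages the follower gain recorded alongside each tweet, grouped by the words the tweet contains. The input shape (a list of text and followers-gained pairs) and the `keyword_follower_gain` name are hypothetical; the Tweetstore does not necessarily hold follower counts, so treat this as an illustration of the idea rather than something that runs against it as-is.

```python
import re
from collections import defaultdict


def keyword_follower_gain(tweets):
    """Average follower gain per keyword.

    tweets: iterable of (text, followers_gained) pairs (hypothetical fields).
    Returns a dict mapping each lowercase word to the mean gain of the
    tweets that contain it.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for text, gained in tweets:
        # Count each word once per tweet, however often it is repeated.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            totals[word] += gained
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}
```

A high average only shows that a word tends to appear near follower growth, not that it caused it, so results like these are a starting point for looking closer rather than an answer.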
We're interested in people's views on data dumps: are they useful, or are they not worth looking at unless there's a nice clean API? Also, what do people think of a hybrid model like the one we have built with the XML database? Is it still too abstract to use?