Judging Open Data Aggregators

Let’s forget about Google’s domination of search for a minute; I believe open data aggregators* have a future as an alternative sort of search engine. Why? They enable a user to find:

1. Information that is not easily available elsewhere. (Open datasets are generally not indexed by Google or other search platforms.)

2. Relevant information without knowing exactly the right source.

* Let’s note that I’m defining open data aggregators as any site that allows a user to search through datasets from at least two sources with a single key term. Generally the datasets are from open government sources but they don’t have to be.
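To make that definition concrete, here is a minimal sketch of the idea in Python. Everything in it is made up for illustration: the two catalogs, the dataset titles, and the `search` and `is_aggregator` helpers are hypothetical and are not taken from any of the sites discussed in this post. It only shows the shape of the thing: one key term, checked against datasets from more than one source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    source: str        # which portal the dataset comes from
    title: str
    description: str


# Hypothetical catalogs standing in for two separate data sources.
CITY_PORTAL = [
    Dataset("city-portal", "Building permits", "Permits issued by the city since 2006"),
    Dataset("city-portal", "Pothole reports", "Street repair requests from residents"),
]
FEDERAL_PORTAL = [
    Dataset("federal-portal", "Company filings", "Public filings from registered companies"),
    Dataset("federal-portal", "Housing permits survey", "Monthly residential permits by region"),
]


def search(term, *catalogs):
    """Return every dataset, from any catalog, whose title or description mentions the term."""
    needle = term.lower()
    return [
        ds
        for catalog in catalogs
        for ds in catalog
        if needle in ds.title.lower() or needle in ds.description.lower()
    ]


def is_aggregator(catalogs):
    """Check the post's threshold: datasets from at least two distinct sources."""
    sources = {ds.source for catalog in catalogs for ds in catalog}
    return len(sources) >= 2


results = search("permits", CITY_PORTAL, FEDERAL_PORTAL)
for ds in results:
    print(f"{ds.source}: {ds.title}")
```

A single search for "permits" turns up one dataset from each made-up portal, which is the whole appeal: the user never had to know which source to ask. A real aggregator would of course sit on top of live catalogs and a proper index rather than a substring match over two short lists.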

Open data aggregators can be even more useful than a search engine for certain purposes. If I am interested in finding out about a particularly private company, for instance, searching public filings might provide me with more information than the organization’s website. If I’m a journalist or an academic looking for numbers to back up a claim, an open data aggregator could be just the resource to turn to.

Open data aggregators could do for open data what search did for the web. Which is the Google of open data aggregators in that strained analogy? That part remains to be seen.

I’m impatient though, so I evaluated a few myself to see which might win out.

The contenders

Plenar.io, Quandl, Knoema, Datahub, Engage, Enigma and Google Public Data.

What’s their deal?

1) Plenar.io was started in a partnership between the University of Chicago, the Urban Center for Computation and Data, and DataMade. Right now Plenar.io focuses on government data but lists “unstructured data such as tweets and crowdsourced observations” in its roadmap. The site just launched and is still in beta.

2) Quandl is a Canadian company that positions itself as a numerical data marketplace. Public data is available for free on Quandl, but the site also mediates the sale of proprietary datasets.

3) Knoema calls itself a “knowledge platform” – what that means is that it is more of an interactive aggregator than the others here. Users can upload their own datasets.

4) Datahub is the Open Knowledge Foundation’s data management platform. Lots of government datasets are available, but they are meant for developers to download. There is little you can do to interact with the data on the website itself. As with Knoema, users can upload their own datasets.

5) Engage, funded by the European Commission, aims to bring together public datasets from all over the European Union for researchers to access.

6) Enigma is beautiful. I’ve said this before on this blog, but it’s still true. The company’s site has a range of datasets available to search. Users can browse whatever they like for free but have to pay for substantial API access.

7) Google Public Data feels empty and sad.

Who’s the best?

Coolest interface: Enigma.io

Easiest APIs: Plenar.io

Prettiest visualizations: Plenar.io

Most datasets: Quandl

Fewest datasets: Google Public Data (Only 11! Gooooooooogle…)

Winner: Disappointingly, no one aggregator has got it all (yet). It depends on what you’re looking to do.