The 70 Online Databases that Define Our Planet

The Metropolitan Travel Survey Archive was identified by Technology Review as one of "The 70 Online Databases that Define Our Planet," based on the article "From Social Data Mining to Forecasting Socio-Economic Crisis" by Helbing and Balietti.

ed, March 25, 2010. Also see the BBC article "Earth project aims to 'simulate everything'":

It could be one of the most ambitious computer projects ever conceived.

An international group of scientists is aiming to create a simulator that can replicate everything happening on Earth - from global weather patterns and the spread of diseases to international financial transactions or congestion on Milton Keynes' roads.

Nicknamed the Living Earth Simulator (LES), the project aims to advance the scientific understanding of what is taking place on the planet, encapsulating the human actions that shape societies and the environmental forces that define the physical world.

"Many problems we have today - including social and economic instabilities, wars, disease spreading - are related to human behaviour, but there is apparently a serious lack of understanding regarding how society and the economy work," says Dr Helbing, of the Swiss Federal Institute of Technology, who chairs the FuturICT project which aims to create the simulator.

Knowledge collider
Thanks to projects such as the Large Hadron Collider, the particle accelerator built by Cern, scientists know more about the early universe than they do about our own planet, claims Dr Helbing.

What is needed is a knowledge accelerator, to collide different branches of knowledge, he says.

"Revealing the hidden laws and processes underlying societies constitutes the most pressing scientific grand challenge of our century."

The result would be the LES. It would be able to predict the spread of infectious diseases, such as Swine Flu, identify methods for tackling climate change or even spot the inklings of an impending financial crisis, he says.


But how would such a colossal system work?

For a start it would need to be populated by data - lots of it - covering the entire gamut of activity on the planet, says Dr Helbing.

It would also be powered by an assembly of yet-to-be-built supercomputers capable of carrying out number-crunching on a mammoth scale.

Although the hardware has not yet been built, much of the data is already being generated, he says.

For example, the Planetary Skin project, led by US space agency Nasa, will see the creation of a vast sensor network collecting climate data from air, land, sea and space.

In addition, Dr Helbing and his team have already identified more than 70 online data sources they believe can be used including Wikipedia, Google Maps and the UK government's data repository Data.gov.uk.

Drowning in data
Integrating such real-time data feeds with millions of other sources of data - from financial markets and medical records to social media - would ultimately power the simulator, says Dr Helbing.

The next step is to create a framework to turn that morass of data into models that accurately replicate what is taking place on Earth today.

That will only be possible by bringing together social scientists and computer scientists and engineers to establish the rules that will define how the LES operates.

Such work cannot be left to traditional social science researchers, whose years of work typically produce limited volumes of data, argues Dr Helbing.

Nor is it something that could have been achieved before - the technology needed to run the LES will only become available in the coming decade, he adds.

Human behaviour
For example, while the LES will need to be able to assimilate vast oceans of data it will simultaneously have to understand what that data means.

That becomes possible as so-called semantic web technologies mature, says Dr Helbing.

Today, a database chock-full of air pollution data would look much the same to a computer as a database of global banking transactions - essentially just a lot of numbers.

But semantic web technology will encode a description of data alongside the data itself, enabling computers to understand the data in context.
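As a rough illustration of that idea (not the FuturICT design, and not a real semantic web vocabulary), here is a minimal Python sketch. The `annotate` and `same_kind` helpers, the field names, and the sample values are all hypothetical. It shows two identical lists of numbers that a program can only tell apart once a description travels with the data.

```python
# Two columns of raw numbers: without context they are indistinguishable.
air_pollution = [12.4, 18.9, 9.7]
bank_transfers = [12.4, 18.9, 9.7]


def annotate(values, quantity, unit, source):
    """Wrap raw values with a description of what they measure.

    The field names are illustrative assumptions, not a standard schema.
    """
    return {
        "description": {"quantity": quantity, "unit": unit, "source": source},
        "values": values,
    }


def same_kind(a, b):
    """Treat two datasets as comparable only if quantity and unit match."""
    da, db = a["description"], b["description"]
    return da["quantity"] == db["quantity"] and da["unit"] == db["unit"]


pollution = annotate(
    air_pollution,
    quantity="PM2.5 concentration",
    unit="microgram per cubic metre",
    source="hypothetical air-quality sensor",
)
transfers = annotate(
    bank_transfers,
    quantity="transaction value",
    unit="million USD",
    source="hypothetical bank ledger",
)

print(air_pollution == bank_transfers)  # True: the bare numbers match
print(same_kind(pollution, transfers))  # False: the descriptions differ
```

Real semantic web formats such as RDF express the same idea with shared, published vocabularies rather than ad hoc dictionary keys.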

What's more, our approach to aggregating data stresses the need to strip out any of that information that relates directly to an individual, says Dr Helbing.


That will enable the LES to incorporate vast amounts of data relating to human activity, without compromising people's privacy, he argues.

Once an approach to carrying out large-scale social and economic data analysis is agreed upon, it will be necessary to build the supercomputer centres needed to crunch that data and produce the simulation of the Earth, says Dr Helbing.

Generating the computational power to deal with the amount of data needed to populate the LES represents a significant challenge, but it's far from being a showstopper.

If you look at the data-processing capacity of Google, it's clear that the LES won't be held back by processing capacity, says Pete Warden, founder of the OpenHeatMap project and a specialist on data analysis.

While Google is somewhat secretive about the amount of data it can process, in May 2010 it was believed to use in the region of 39,000 servers to process an exabyte of data per month - that's enough data to fill 2 billion CDs every month.
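A quick back-of-the-envelope check of those figures, as a sketch only. It assumes decimal units (1 exabyte = 10^18 bytes) and a nominal 700 MB CD, neither of which the article specifies. The variable names are just for illustration.

```python
EXABYTE = 10**18          # bytes, decimal convention (assumption)
CDS_QUOTED = 2 * 10**9    # "2 billion CDs" from the article
SERVERS = 39_000          # server count quoted in the article
CD_CAPACITY = 700 * 10**6 # nominal 700 MB CD (assumption)

# Capacity per CD implied by the article's comparison.
bytes_per_cd = EXABYTE // CDS_QUOTED
print(bytes_per_cd)  # 500000000, i.e. 500 MB per CD

# CDs needed if each holds the nominal 700 MB instead.
cds_at_700mb = EXABYTE / CD_CAPACITY
print(round(cds_at_700mb / 10**9, 2), "billion CDs")

# Average monthly load per server, in terabytes.
tb_per_server = EXABYTE / SERVERS / 10**12
print(round(tb_per_server, 1), "TB per server per month")
```

So the "2 billion CDs" comparison implies roughly 500 MB per disc, which is in the right range for a CD, and the quoted server count works out to a few tens of terabytes per server per month.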

Reality mining
"If you accept that only a fraction of the several hundred exabytes of data being produced worldwide every year... would be useful for a world simulation, the bottleneck won't be the processing capacity," says Mr Warden.

"Getting access to the data will be much more of a challenge, as will figuring out something useful to do with it," he adds.

Simply having lots of data isn't enough to build a credible simulation of the planet, argues Warden. "Economics and sociology have consistently failed to produce theories with strong predictive powers over the last century, despite lots of data gathering. I'm sceptical that larger data sets will mark a big change," he says.

"It's not that we don't know enough about a lot of the problems the world faces, from climate change to extreme poverty, it's that we don't take any action on the information we do have," he argues.

Regardless of the challenges the project faces, the greater danger is not attempting to use the computer tools we have now - and will have in future - to improve our understanding of global socio-economic trends, says Dr Helbing.

"Over the past years, it has for example become obvious that we need better indicators than the gross national product to judge societal development and well-being," he argues.

At its heart, the LES is about working towards better methods to measure the state of society, he says, which

David Levinson

About this Entry

This page contains a single entry by David Levinson published on December 10, 2010 8:38 AM.
