Cloud Data Persistence

Is this a renaissance for the database?

Today we have new problems and new challenges, but the computing power we now have provides alternatives, and that in turn is changing the approaches being taken. The following are examples of the alternatives in the cloud database area.

Physical limitations and computational complexity are driving implementations into new areas that we are not used to. It is similar to when multi-core processors first came out: many thought lots of software would quickly take advantage of the extra cores, but that did not happen quickly, and it is still happening now.

A common solution for a social network with heavy database use is to shard (divide the data into chunks spread across multiple servers). But remember this changes things for the developer, who now needs to look up data across multiple shards: you cannot use an SQL JOIN across shards, and you cannot guarantee uniqueness and integrity across them. We are also seeing developers resorting to unnatural acts to deal with large data sets, e.g. turning MySQL into a schemaless store (FriendFeed). A good solution for them, but now the DB is no longer really a DB.
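To make the cost to the developer concrete, here is a minimal sketch of hash-based sharding with the JOIN done in application code. Everything in it is an assumption for illustration: the shard count, the in-memory dicts standing in for separate servers, and the function names are made up and not from the talk.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the illustration

# Each dict stands in for a table on a separate database server.
user_shards = [dict() for _ in range(NUM_SHARDS)]
post_shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def save_user(user_id: str, name: str) -> None:
    user_shards[shard_for(user_id)][user_id] = {"name": name}


def save_post(post_id: str, author_id: str, text: str) -> None:
    # No foreign key check is possible here: the author may live on another shard.
    post_shards[shard_for(post_id)][post_id] = {"author_id": author_id, "text": text}


def posts_with_author_names() -> list:
    """Application-side replacement for a JOIN between posts and users."""
    joined = []
    for shard in post_shards:  # scatter: every post shard has to be visited
        for post_id, post in shard.items():
            author_id = post["author_id"]
            user = user_shards[shard_for(author_id)].get(author_id)  # second lookup
            joined.append((post_id, user["name"] if user else None, post["text"]))
    return sorted(joined)


save_user("u1", "Ada")
save_user("u2", "Linus")
save_post("p1", "u1", "hello")
save_post("p2", "u2", "world")
save_post("p3", "u9", "orphan")  # nothing stops a post pointing at a missing user
print(posts_with_author_names())
```

The "orphan" post shows the integrity problem: with the data split up, nothing in the store itself checks that the author exists.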

The cloud can be viewed as:

Hiding complexity
Scalable – elastic resource availability
Pay as you go
No need to worry about tuning
Geographical diversity

And it can be broken into loose types, the *aaS model:

SaaS – software as a service
PaaS – platform as a service
TaaS – tools as a service
IaaS – infrastructure as a service
?aaS

All sounds great, so what’s the catch? Safety, geographical availability and commodity hardware. Also, believe it or not, the speed of light, which in data terms is still slow when transferring between geographical locations.
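A rough back-of-the-envelope calculation shows why the speed of light matters. The distance and the fibre slowdown factor below are my own approximate assumptions, not figures from the talk, and real networks add routing and queuing delays on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3               # assumption: light in optical fibre travels at roughly 2/3 c
DISTANCE_KM = 17_000               # assumption: rough London to Sydney great-circle distance


def round_trip_ms(distance_km: float, factor: float = FIBRE_FACTOR) -> float:
    """Lower bound on round-trip time in milliseconds, ignoring all equipment delays."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * factor)
    return 2 * one_way_seconds * 1000


print(f"vacuum: {round_trip_ms(DISTANCE_KM, factor=1.0):.0f} ms")
print(f"fibre:  {round_trip_ms(DISTANCE_KM):.0f} ms")
```

Even under these ideal assumptions a single round trip costs well over a tenth of a second, so a chatty protocol that needs many round trips between distant data centres gets slow very quickly.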

Two alternatives to the relational model that can cope with massive datasets in the cloud:

Google BigTable

Data tables are sharded into tablets, and each tablet is served by a single tablet server; each tablet server can hold around 1000 tablets. The tablet servers have a master, and if the master is removed the system will still work for a limited period.

Distributed store
Hundreds of terabytes
Effectively a big sorted map
Column keys grouped into column families
Data is versioned
Fast, scalable and transactional
Metadata is stored the same way, in tablets, reached via a root metadata tablet
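The "big sorted map" idea can be sketched as a toy in-memory structure keyed by (row key, "family:qualifier" column, timestamp). This is only an illustration of the data model under my own assumptions; the class and method names are invented and have nothing to do with Google's actual API.

```python
import bisect


class SortedVersionedMap:
    """Toy model: (row key, "family:qualifier", timestamp) -> value, kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # the same key tuple -> value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so the newest version sorts first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value stored for a cell, or None if there is none."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SortedVersionedMap()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 2, "<html>v2</html>")
table.put("com.example/index", "anchor:home", 1, "Home")
table.put("org.example/about", "contents:html", 1, "<html>about</html>")

print(table.get("com.example/index", "contents:html"))
for cell in table.scan("com", "con"):
    print(cell)
```

Because the keys are sorted, a contiguous range of rows is a natural unit to hand to one server, which is roughly what a tablet is.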

You cannot use BigTable yourself, but there are some open source alternatives: Hypertable and Apache HBase. There is also BigTable via Google App Engine: you need to use Python and there is a layer in between (although the speaker had not worked out what it is), but you are getting the benefits of BigTable in a roundabout way.

Amazon Dynamo

The Voldemort and Cassandra projects use these ideas.

Distributed key value store
Designed for high availability – tolerates network partitions and server failures without effect
Decentralized – no master
Data replicated via consistent hashing
Multi-node reads and writes for redundancy
Objects versioned for consistency
Uses a vector clock to disambiguate between different servers’ versions of the same object
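Two of the ideas in this list, consistent hashing and vector clocks, are easier to see in code. The following is a simplified sketch under my own assumptions: the node names, the replica count of 3 and all function names are invented, and it leaves out refinements such as virtual nodes that a real system would want.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Place a key or node name on the ring using a stable hash."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._ring = sorted((_position(node), node) for node in nodes)

    def preference_list(self, key: str) -> list:
        """Walk clockwise from the key's position and pick `replicas` distinct nodes."""
        start = bisect.bisect(self._ring, (_position(key), ""))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def descends(a: dict, b: dict) -> bool:
    """True if vector clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: dict, b: dict) -> str:
    """Decide whether one version supersedes the other or the two conflict."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # concurrent writes: the application has to reconcile them


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42"))

print(compare({"node-a": 2}, {"node-a": 1}))
print(compare({"node-a": 2, "node-b": 1}, {"node-a": 2, "node-c": 1}))
```

The second comparison is the interesting one: each version has an update the other has not seen, so neither wins and both have to be kept until something reconciles them.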

And for a lighter touch, the smaller alternatives:

Amazon SimpleDB

Tabular store
Domains, which are like tables and contain items
Schemaless
Auto-indexing
Eventually consistent
No cross-domain joins
Queries limited to 250 items
Everything is a string
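"Everything is a string" has a practical consequence: comparisons are lexicographic, so numbers need to be encoded before they sort or compare correctly. A common workaround is zero-padding with an offset for negative values. The width and offset below are arbitrary assumptions for the sketch, and the helper names are my own.

```python
WIDTH = 10       # assumed fixed width of every encoded number
OFFSET = 10**9   # assumed offset that shifts negative numbers to be non-negative


def encode_int(n: int) -> str:
    """Encode an integer so that string order matches numeric order."""
    shifted = n + OFFSET
    if not 0 <= shifted < 10**WIDTH:
        raise ValueError(f"{n} is outside the encodable range")
    return str(shifted).zfill(WIDTH)


def decode_int(s: str) -> int:
    """Reverse encode_int."""
    return int(s) - OFFSET


numbers = [100, -5, 9, 10]

# Naive strings sort in the wrong order for numbers.
print(sorted(str(n) for n in numbers))

# Encoded strings sort the same way the numbers do.
encoded = sorted(encode_int(n) for n in numbers)
print([decode_int(s) for s in encoded])
```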

Microsoft’s Azure SQL Services – in test

Non-relational – really an XML document store
Containers which have entities
Queries through LINQ
REST and SOAP interfaces

Apache CouchDB – looks pretty good for JavaScript apps.

Document store in JSON
REST API: GET, PUT, POST
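As a rough idea of what talking to a JSON document store over REST looks like, the sketch below prepares a PUT and a GET for a single document using only the Python standard library. The base URL assumes a local CouchDB on its default port, and the database name, document id and helper names are all hypothetical. The requests are only built here; sending them needs a running server with that database already created.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB on its default port
DATABASE = "notes"                  # hypothetical database name


def build_put(doc_id: str, doc: dict) -> urllib.request.Request:
    """Prepare PUT /{db}/{id}, which stores the JSON body as a document."""
    return urllib.request.Request(
        f"{BASE_URL}/{DATABASE}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_get(doc_id: str) -> urllib.request.Request:
    """Prepare GET /{db}/{id}, which returns the document as JSON."""
    return urllib.request.Request(f"{BASE_URL}/{DATABASE}/{doc_id}", method="GET")


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


put_request = build_put("qcon-2009", {"title": "Cloud Data Persistence", "tags": ["cloud", "db"]})
get_request = build_get("qcon-2009")
print(put_request.get_method(), put_request.full_url)
print(get_request.get_method(), get_request.full_url)
```

Since both the request body and the reply are JSON, a browser-side JavaScript app can consume the documents with very little translation.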

Other things to watch:

MongoDB – the speaker’s own project in this area
MemcacheDB – distributed key/value store
Drizzle – a fork of MySQL for the cloud
Hadoop – distributed file system
Scalaris – google

2 responses to “Cloud Data Persistence”

Blogposts about QCon London 2009 | JAOO Community Blog says:

March 30, 2009 at 12:16 pm

[…] https://markedgington.com/2009/03/11/web-oriented-architecture-woa/ https://markedgington.com/2009/03/11/cloud-data-persistence/ […]

Rob says:

June 16, 2009 at 9:53 pm

Another one to add: M/DB:X, a lightweight, REST-interfaced XML database, designed for use in the cloud. http://www.mgateway.com/mdbx.html for more details.
