
Cloud Data Persistence

March 11th, 2009 | 2 Comments | Posted in Qcon, n810 post, technology, web 2.0

Is this a renaissance for the database?

Today we have new problems and new challenges, but the computing power now available gives us alternatives, and that in turn is changing the approaches being taken. The following are examples of the alternatives in the cloud database area.

Physical limitations and computational complexity are driving implementations into new areas that we are not used to. It is similar to when multi-core processors first came out: many thought software would quickly take advantage of the extra cores, but that did not happen quickly, and it is still happening now.

A common solution for a social network with heavy database use is to shard (divide the data into chunks across multiple servers). But remember that this changes things for the developer, who now has to look up data across multiple shards: you cannot use an SQL JOIN across shards, and the database can no longer guarantee uniqueness and integrity across them. We are also seeing developers resorting to unnatural acts to deal with the issues of large data sets, e.g. turning MySQL into a schema-less store (FriendFeed). It is a good solution for them, but now the DB is no longer really a DB.
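To make the sharding trade-off concrete, here is a minimal sketch of what the application ends up doing once the JOIN is gone. Everything in it is an assumption for illustration: the shard count, the function names, and the plain dicts standing in for database servers.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: int) -> int:
    """Map a user id to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each dict stands in for one database server holding part of the users table.
user_shards = [dict() for _ in range(NUM_SHARDS)]

def save_user(user_id: int, name: str) -> None:
    """Write the row to the one shard that owns this id."""
    user_shards[shard_for(user_id)][user_id] = {"id": user_id, "name": name}

def load_users(user_ids):
    """Application-side stand-in for a JOIN: group ids by shard,
    query each shard separately, then merge the results in memory."""
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(shard_for(uid), []).append(uid)
    found = {}
    for shard_no, ids in by_shard.items():
        shard = user_shards[shard_no]
        for uid in ids:
            if uid in shard:
                found[uid] = shard[uid]
    return found

for uid in range(1, 11):
    save_user(uid, f"user{uid}")

friends = load_users([1, 5, 9, 42])   # id 42 was never stored
print(sorted(friends))
```

Note that nothing here stops two shards from holding conflicting rows; uniqueness and integrity checks that the database used to do now live in application code.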

The cloud can be viewed as:

  • Hiding complexity
  • Scalable – elastic resource availability
  • Pay as you go
  • no need to worry about tuning
  • geographical diversity

And it can be broken into loose types, the *aaS model:

  • SaaS – software as a service
  • PaaS – platform as a service
  • TaaS – tools as a service
  • IaaS – infrastructure as a service
  • ?aaS

It all sounds great, so what’s the catch? Safety, geographical availability and commodity hardware. Also, believe it or not, the speed of light, which in data terms is still slow when transferring between geographical locations.

Two alternatives to the relational model that can cope with massive datasets in the cloud:
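A quick back-of-the-envelope calculation shows why the speed of light matters. This is only a sketch: it assumes light in optical fibre travels at roughly two thirds of its speed in a vacuum, and the 6,000 km distance is a made-up example, not a measurement between real data centres.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3                # assumption: fibre carries light at about 2/3 of c

def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre, ignoring routing and queuing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    return 2 * one_way_s * 1000

# Hypothetical 6,000 km between two data centres.
print(round(min_round_trip_ms(6000), 1))
```

That comes out at roughly 60 ms per round trip before any real-world overhead, so a chatty protocol that needs many round trips between regions pays that cost every time.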

Google BigTable

Data tables are sharded into tablets, and each tablet is served by a single tablet server; each tablet server can have 1000 tablets. These tablet servers have a master, and if the master is removed the system will still work for a limited period.

  • Distributed store
  • Hundreds of terabytes
  • Effectively a big sorted map
  • Column keys grouped into column families
  • Data is versioned
  • Fast, scalable and transactional
  • Metadata is also stored in the same way in the tablets, via a root metadata tablet.
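The "big sorted map" idea can be sketched in a few lines. This is a toy, in-memory model and not the BigTable API: the class name `TinyTable`, its methods, and the reversed-domain row keys are all illustrative assumptions. It shows rows kept in sorted order, columns named `family:qualifier`, and several timestamped versions per cell.

```python
import bisect

class TinyTable:
    """Toy model of a sorted, versioned map. Not the BigTable API."""

    def __init__(self, families):
        self.families = set(families)   # column families are fixed up front
        self.row_keys = []              # kept sorted so range scans are cheap
        self.cells = {}                 # (row, column) -> [(timestamp, value)], newest first

    def put(self, row, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        i = bisect.bisect_left(self.row_keys, row)
        if i == len(self.row_keys) or self.row_keys[i] != row:
            self.row_keys.insert(i, row)
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Newest value, or the newest value at or before `timestamp`."""
        for ts, value in self.cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Row keys in [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start_row)
        hi = bisect.bisect_left(self.row_keys, end_row)
        return self.row_keys[lo:hi]

table = TinyTable(["contents", "anchor"])
table.put("com.cnn.www", "contents:html", "<html>v1", timestamp=1)
table.put("com.cnn.www", "contents:html", "<html>v2", timestamp=2)
table.put("com.bbc.www", "contents:html", "<html>bbc", timestamp=1)
table.put("org.apache", "anchor:home", "Apache", timestamp=1)
print(table.scan("com.", "com/"))   # all rows whose key starts with "com."
```

Because the rows are sorted, a contiguous range of keys is a natural unit to hand to one server, which is roughly what a tablet is.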

You cannot use BigTable yourself, but there are some open source alternatives: Hypertable and Apache HBase. BigTable is also available via Google App Engine; you need to use Python, and there is something in between (although the speaker had not worked out what it is). But you are getting the benefits of BigTable in a roundabout way.

Amazon Dynamo

Projects Voldemort and Cassandra use this idea.

  • Distributed key value store
  • Designed for high availability – tolerate network partitions and server failures without effect
  • Decentralized – no master
  • Data replicated via consistent hashing
  • Multi-node reads and writes for redundancy
  • Objects versioned for consistency
  • Uses a vector clock to disambiguate between different servers’ versions of the same object
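Two of the ideas above, consistent hashing and vector clocks, are easier to see in code. What follows is a minimal sketch under my own assumptions, not Dynamo's implementation: the `Ring` class, the number of virtual nodes, the node names and the `compare` function are all hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Consistent-hash ring: a key belongs to the first node clockwise from its hash."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets several points on the ring to even out the load.
        self.points = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [h for h, _ in self.points]

    def preference_list(self, key, replicas=3):
        """First `replicas` distinct nodes clockwise from the key's position."""
        start = bisect.bisect_right(self.hashes, _hash(key))
        found = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in found:
                found.append(node)
            if len(found) == replicas:
                break
        return found

def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks of the form {node: counter}."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"   # concurrent writes: someone has to reconcile them
    if a_ahead:
        return "a newer"
    if b_ahead:
        return "b newer"
    return "equal"

ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("user:42"))
print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))
print(compare({"A": 2}, {"B": 1}))
```

The useful property of the ring is that removing a node only moves the keys that node owned; every other key keeps its first-choice node. The vector clock comparison is what lets the store tell "one version simply supersedes the other" apart from "two servers accepted different writes".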

And for the lighter touch, the smaller alternatives:

Amazon SimpleDB

  • tabular store
  • domains which are like tables and contain items
  • schemaless
  • auto-indexing
  • eventually consistent
  • no cross domain joins
  • queries limited to 250 items
  • everything is a string
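"Everything is a string" has a practical consequence: values are compared as text, so numbers have to be stored in a form that sorts correctly. One common way to handle that is zero-padding plus an offset for negative values. The sketch below is an assumption-laden illustration; the width, the offset and the function names are mine, not part of any SimpleDB client library.

```python
WIDTH = 10          # hypothetical fixed width, big enough for the values stored
OFFSET = 10 ** 9    # hypothetical offset so negative numbers sort correctly

def encode_int(n: int) -> str:
    """Store an integer as a fixed-width string that sorts in numeric order."""
    shifted = n + OFFSET
    if not 0 <= shifted < 10 ** WIDTH:
        raise ValueError("value out of range for this encoding")
    return str(shifted).zfill(WIDTH)

def decode_int(s: str) -> int:
    return int(s) - OFFSET

values = [100, -5, 20, 3]
print(sorted(str(v) for v in values))                        # text order, not numeric order
print([decode_int(s) for s in sorted(encode_int(v) for v in values)])
```

The same thinking applies to dates: store them in a format such as ISO 8601 where text order and time order agree.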

MSFT’s Azure SQL Services – in test

  • non-relational – really an XML document store
  • Containers which have entities
  • Queries through LINQ
  • REST and SOAP interfaces

Apache CouchDB – looks pretty good for JavaScript apps.

  • Document store in JSON
  • REST API: GET, PUT, POST
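Since the API is plain HTTP plus JSON, talking to CouchDB needs nothing beyond a standard HTTP client. Here is a hedged sketch using Python's standard library. It assumes a CouchDB server on its default port 5984 on localhost and a database called `notes`, which is a made-up name; the helper functions are mine. The requests are only built here, and `send` is left for when a server is actually running.

```python
import json
import urllib.request

BASE = "http://localhost:5984"   # 5984 is CouchDB's default port; the host is an assumption
DB = "notes"                     # hypothetical database name

def put_document(doc_id: str, doc: dict) -> urllib.request.Request:
    """PUT /{db}/{id}: create a document under an id chosen by the client.
    Updating an existing document also needs its current `_rev` in the body."""
    return urllib.request.Request(
        f"{BASE}/{DB}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def post_document(doc: dict) -> urllib.request.Request:
    """POST /{db}: create a document and let the server pick the id."""
    return urllib.request.Request(
        f"{BASE}/{DB}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def get_document(doc_id: str) -> urllib.request.Request:
    """GET /{db}/{id}: fetch a document as JSON."""
    return urllib.request.Request(f"{BASE}/{DB}/{doc_id}", method="GET")

def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

request = put_document("first-note", {"title": "hello", "tags": ["cloud"]})
print(request.get_method(), request.full_url)
```

With a server running, `send(put_document(...))` followed by `send(get_document("first-note"))` would round-trip the document.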

Other things to watch:


Deploy Java EE to Amazon EC2

February 5th, 2009 | No Comments | Posted in technology

Cloud Tools, hosted on Google Code, is a set of tools for deploying, managing and testing Java EE applications on Amazon’s Elastic Compute Cloud (EC2). There are three main parts to Cloud Tools:

  • Amazon Machine Images (AMIs) that are configured to run Tomcat and work with EC2Deploy.
  • EC2Deploy – the core framework. This framework manages EC2 instances, configures MySQL, Tomcat, Terracotta and Apache and deploys the application.
  • Maven and Grails plugins that use EC2Deploy to deploy an application to EC2


Cloud Computing – Why is it so good for business?

December 28th, 2008 | No Comments | Posted in technology
  • From a financial perspective, Cloud Computing pushes risk onto the people who own the assets. The business in effect rents a particular set of assets, based on its usage. For the business this transforms IT capex into opex.
  • From a development perspective, Cloud Computing enables you to potentially roll out your solution in minutes or hours, instead of weeks or months, and to scale according to demand.
  • From a work activity perspective, Cloud Computing enables the enterprise to involve people regardless of organization boundaries and empower them with the necessary knowledge to perform their tasks.

There are three types of Cloud Computing paradigms that build on each other.

  • Infrastructure-as-a-Service (IaaS) – e.g. Amazon Web Services (AWS). At its core, Amazon provides three basic services: storage (S3), computing (EC2) and queues (SQS).
  • Platform-as-a-Service (PaaS) – provides different combinations of services to support the application development lifecycle.
  • Software-as-a-Service (SaaS) – e.g. Salesforce.com. An application is hosted as a service and provided to customers across the Internet, eliminating the need to install and run the application on the customer’s own computer.

And these can be used in the business environment to:

  • Reduce the risks associated with capital expenditure by moving to a pay-as-you-go model
  • Scale based on actual demand rather than best guess
  • Use Amazon or another company to manage Service Level Agreements
  • Allow the cloud provider to manage problems and incidents
  • Seamlessly upgrade to new versions of software as the provider upgrades
  • Secure data, processes and infrastructure; using the cloud for disaster recovery is very economical and avoids supporting a redundant data center for disaster recovery purposes
  • Reduce the need for staff with specific skill sets
