This article discusses the technical differences between SQL and NoSQL databases and why your organisation should consider migrating to NoSQL in the Cloud.
The RDBMS, or SQL database, evolved from the original design published by Dr. E. F. Codd in 1970. The technology has been entrenched in business ever since and is still used today by organisations such as banks that have significant data integrity requirements.
The rigid, static schemas of an RDBMS do not allow for agile, flexible application development. This encumbrance is a problem for businesses that need to compete aggressively and stay flexible in their use of technology in order to achieve a short time to market, which is why so many businesses are turning to NoSQL.
RDBMS – SQL Database
SQL (Structured Query Language) is the language used to communicate with Relational Database Management Systems (RDBMS).
In 1970 Dr. E. F. Codd published the paper “A Relational Model of Data for Large Shared Data Banks”. This paper became the basis of the RDBMS and of the products developed from it.
The basis of the RDBMS is normalisation, which is the process of organising tables (relations) and their columns (attributes) to reduce unnecessary duplication of data. Minimising data redundancy in this way improves data integrity. Databases can be designed to first, second or third normal form (1NF, 2NF and 3NF). At a basic level, a table in which every column holds a single atomic value, with no repeating groups, is considered to be in 1NF.
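As a minimal sketch of what normalisation buys, the Python example below uses the standard library sqlite3 module with two hypothetical tables, customer and purchase, that are not taken from any real system. The customer's city is stored once rather than being repeated on every purchase row, so a single update corrects it everywhere.

```python
import sqlite3

def build_normalised_db():
    """Create an in-memory database with customers and purchases in separate tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            city        TEXT NOT NULL
        );
        CREATE TABLE purchase (
            purchase_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            item        TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'London')")
    conn.executemany(
        "INSERT INTO purchase (customer_id, item) VALUES (?, ?)",
        [(1, "keyboard"), (1, "monitor")],
    )
    conn.commit()
    return conn

def purchases_with_city(conn):
    """Join the two tables back together, for example for a report."""
    rows = conn.execute(
        "SELECT c.name, c.city, p.item FROM purchase p "
        "JOIN customer c ON c.customer_id = p.customer_id "
        "ORDER BY p.purchase_id"
    )
    return rows.fetchall()

conn = build_normalised_db()
# The city lives in exactly one row, so one UPDATE fixes it for every purchase.
conn.execute("UPDATE customer SET city = 'Cambridge' WHERE customer_id = 1")
print(purchases_with_city(conn))
```

The price of this design is the join needed to read the data back, which is part of what makes reporting over very large normalised datasets expensive.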
Relational databases are characterised by the ACID (Atomicity, Consistency, Isolation, Durability) transaction. An ACID transaction can be thought of as a single logical operation: either all of its changes are applied or none of them are.
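The atomicity part of ACID can be illustrated with a small bank transfer, again using Python's sqlite3 module. The account table, the names and the amounts are assumptions made up for this sketch. If the debit would break the balance rule, the credit that was already applied in the same transaction is rolled back with it.

```python
import sqlite3

def open_bank():
    """Create an in-memory bank with a rule that balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    conn.commit()
    return conn

def transfer(conn, source, target, amount):
    """Move money as one transaction; True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))

bank = open_bank()
print(transfer(bank, "alice", "bob", 30), balances(bank))
print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The second transfer fails on the debit, and the balances are left exactly as they were after the first one.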
Since Codd’s original design there has been a proliferation of RDBMS products that serve businesses, such as banks, that require high levels of data integrity. Products on the market include, but are not limited to, SQL Server, Oracle, MySQL and DB2.
The RDBMS databases have the following characteristics:
- Rigid static schemas
- Vertical scalability
- Mature in development and vendor support
- Support complex queries using SQL
- Have structured database objects, e.g. tables
- Downtime can be required to make changes to the database and applications
As data volumes increased meteorically over time, the management, utilisation and value of RDBMS-style databases became increasingly problematic. Using the data for reporting purposes became a nightmare. These frustrations led to new methodologies and technologies that solved many of these issues.
Generational change came to the database world in the form of NoSQL
NoSQL began in various guises in the 1980s. Its proliferation really started in 2000 and has steadily increased as new products have been developed and marketed. Between 2000 and 2005, Neo4j, CouchDB and Google Bigtable were released. 2007 saw the release of Amazon Dynamo and MongoDB, followed by Apache Cassandra in 2008 and Riak and HBase in 2009. There are now in excess of 150 NoSQL database products on the market.
Many of the products are still available today and continue to be improved to meet market demands.
The benefits of NoSQL are many, but the most significant are its ability to process large datasets and its scalability. Another main advantage of NoSQL databases is agility. Because the schema is flexible, changes to the database are not disruptive, which lets businesses meet urgent time to market requirements and become more competitive.
NoSQL databases are distributed systems whose design is governed by the CAP Theorem (Consistency, Availability, Partition Tolerance). It is imperative that the developer understands the data, how it will be used and the potential load. The system can tolerate inconsistent data, as NoSQL databases do not have transactions in the RDBMS sense. The developer has the discretion to design the transaction points and the required behaviour, for example whether a write ‘must succeed’ or whether some loss of data is acceptable.
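The choice between a write that ‘must succeed’ and one where loss is tolerable can be sketched as a required number of replica acknowledgements. The Replica class and the replicated_write function below are hypothetical names invented for this illustration, not the API of any particular NoSQL product.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """A stand-in for one database node holding a copy of the data."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        """Store the value and acknowledge, or refuse if the node is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True

def replicated_write(replicas, key, value, required_acks):
    """Send the write to every replica; succeed only if enough acknowledge."""
    acks = sum(replica.write(key, value) for replica in replicas)
    return acks >= required_acks

# Hypothetical cluster in which two of the three replicas are unavailable.
replicas = [Replica("a"), Replica("b", up=False), Replica("c", up=False)]

# Availability first: one acknowledgement is enough, so the write is accepted.
print(replicated_write(replicas, "cart:42", ["book"], required_acks=1))
# Consistency first: a majority is required, so the same write is rejected.
print(replicated_write(replicas, "cart:42", ["book"], required_acks=2))
```

Note that the rejected write is not undone on the replica that accepted it. In this sketch there is no rollback, which is the kind of behaviour the developer has to design for.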
NoSQL databases have the following characteristics:
- Non-relational
- Horizontal scalability
- Dynamic (schemaless)
- Emerging Technology with Community Support Structure
NoSQL Database Types
There are four different types of NoSQL Databases that do not operate under the same principle and are diverse in architecture and function.
- Key Value Databases, e.g. Riak, Redis, Couchbase. The use cases for these databases include user profile data and shopping carts
- Document Databases, e.g. MongoDB, CouchDB. The use cases for these databases include analytics, eCommerce applications and blogging sites
- Column Family Stores, e.g. Apache Cassandra, Apache HBase. The use cases include content management, blogging platforms and maintaining counters
- Graph Databases, e.g. Neo4j, InfiniteGraph. The use case for Graph Databases is highly connected data, such as social networks and recommendation engines
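To make the four types more concrete, the sketch below models the same kind of user data in plain Python structures. It illustrates the shape of the data only. The keys, field names and helper functions are assumptions made up for this example and do not reflect how any of the products above store data internally.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:1": json.dumps({"name": "Ada", "cart": ["book"]})}

# Document: self-describing records; each one may have different fields.
document_collection = [
    {"_id": 1, "name": "Ada", "cart": ["book"]},
    {"_id": 2, "name": "Grace", "email": "grace@example.com"},
]

# Column family: row key -> column name -> value; rows need not share columns.
column_family = {
    "user:1": {"name": "Ada", "page_views": 3},
    "user:2": {"name": "Grace"},
}

# Graph: relationships are first-class, stored as (source, label, target).
graph_edges = [(1, "FOLLOWS", 2)]

def find_documents(collection, **criteria):
    """Return documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(k) == v for k, v in criteria.items())
    ]

def followers_of(edges, user_id):
    """Return the ids of users with a FOLLOWS edge pointing at user_id."""
    return [src for src, label, dst in edges if label == "FOLLOWS" and dst == user_id]

print(json.loads(key_value_store["user:1"])["cart"])
print(find_documents(document_collection, name="Grace"))
print(column_family["user:1"]["page_views"])
print(followers_of(graph_edges, 2))
```

The key-value store can only fetch by key, the document collection can be searched by field, and the graph answers relationship questions directly.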
NoSQL in the Cloud
Many organisations deploy NoSQL databases in the Cloud. NoSQL is a perfect candidate for Cloud deployment because the applications that use it as a data store tend to be very large and require high availability with automated failover, high levels of performance, fault tolerance and data consistency. Deploying NoSQL databases in the Cloud is on demand, quick and easy, and it is cost effective because organisations only pay for what they use.
Apache Cassandra is one of the more popular NoSQL databases. The underlying design of Cassandra means that it performs very well in the Cloud. As a technology, it has the following characteristics that naturally lend Apache Cassandra to a cloud implementation:
- Fault tolerant. Data is automatically replicated to multiple nodes. It is easy, quick and seamless to add extra nodes across multiple data centres, which can be in local or global locations. Failed nodes can be replaced without any downtime.
- Replication across multiple data centres
- Scalable. Apache Cassandra in the Cloud can be as scalable as required. Apple has over 75,000 nodes storing over 10 petabytes of data.
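The idea of replicating each piece of data to several nodes, and of adding nodes without disruption, can be sketched with a simple hash ring. This is a deliberately simplified illustration and not Cassandra's actual implementation. The Ring class, the node names and the replication factor of 3 are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

def token(value):
    """Map a string to a position on the ring using a stable hash."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    """A toy hash ring that assigns each key to several distinct nodes."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.nodes = sorted(nodes, key=token)
        self.tokens = [token(n) for n in self.nodes]

    def add_node(self, node):
        """Place a new node on the ring; existing nodes keep their positions."""
        self.nodes = sorted(self.nodes + [node], key=token)
        self.tokens = [token(n) for n in self.nodes]

    def replicas_for(self, key):
        """Walk clockwise from the key's position and pick distinct nodes."""
        start = bisect_right(self.tokens, token(key)) % len(self.nodes)
        count = min(self.replication_factor, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas_for("user:1"))  # three distinct nodes hold this key
ring.add_node("node-e")
print(ring.replicas_for("user:1"))  # placement after the cluster grows
```

Because every key lives on several nodes, losing one node does not lose the data, and a new node only takes over the keys that fall next to its position on the ring.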