In this post, we have seen some of the Big data technologies that are used to store and analyse data.
For the past few years, NoSQL or non-relational database tools have gained much popularity for storing huge amounts of data and scaling easily. There are debates on whether non-relational databases will replace relational databases in the future. With the increasing volume of social data and other unstructured data, the following are some of the questions raised about relational databases.
Are relational databases capable of handling big data?
Are relational databases able to scale out to massive amounts of data?
Are relational databases suited to modern-age data?
Before answering these questions, let us review some basics of both relational and non-relational databases.
Relational databases: The concept of the relational database was developed in the 1970s. The most important feature of relational databases is their support for the ACID (Atomicity, Consistency, Isolation and Durability) properties, which ensure that all transactions are processed reliably.
Atomicity: Each transaction is treated as a single unit. If one logical part of a transaction fails, the whole transaction is rolled back so that the data are unchanged.
Consistency: All data written to the database are subject to the rules defined (constraints, triggers, etc.).
Isolation: Changes made in a transaction are not visible to other transactions until they are committed.
Durability: Changes committed in a transaction are stored and remain available in the database even if there is a power failure or the database goes offline suddenly.
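To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer function and the sample balances are all made up for illustration, and SQLite simply stands in for any relational database. The transfer credits one account and then debits the other; when the debit violates a constraint, the credit is rolled back as well.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back on an error.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# alice only has 100, so this transfer fails and nothing changes.
failed_ok = transfer(conn, "alice", "bob", 500)
print(failed_ok, balances(conn))
```

The credit is deliberately executed first, so that the rollback has something visible to undo.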
Strictly structured: The objects in relational databases are strictly structured. All data in a table are stored as rows and columns, and each column has a datatype. The data are mostly normalized. Structured Query Language (SQL) is used to store and retrieve data in a structured way, and its queries read like plain English commands. A table always has a fixed number of columns, although additional columns can be added later. Most tables are related to each other through primary and foreign keys, thus providing “Referential Integrity” among the objects. The major products are Oracle, SQL Server, MySQL, PostgreSQL, etc.
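The idea of typed columns, primary and foreign keys, and referential integrity can be sketched with Python's built-in sqlite3 module. The customers and orders tables and their rows are hypothetical examples, not taken from any real system. Note that SQLite enforces foreign keys only after the PRAGMA shown below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Fixed, typed columns; orders.customer_id must match an existing customer.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)

with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.5)")


def insert_orphan_order(conn):
    """Try to insert an order for a customer that does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 5.0)"
            )
        return True
    except sqlite3.IntegrityError:
        # The foreign key constraint rejected the row.
        return False


# A JOIN follows the foreign key back to the related row.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o "
    "JOIN customers c ON c.id = o.customer_id"
).fetchall()
print(rows, insert_orphan_order(conn))
```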
Non-relational databases: Non-relational databases came into the picture to handle the rapid growth of unstructured data and to scale out easily. They provide a flexible schema, so “Referential Integrity” as we see it in relational databases is not enforced. The data are highly de-normalized and do not require JOINs between objects. These databases relax the ACID properties and are usually discussed in terms of the CAP theorem (Consistency, Availability and Partition tolerance), which says that a distributed system can guarantee only two of these three at any point in time. So, as opposed to ACID, they typically offer only BASE (Basically Available, Soft state, Eventual consistency). Early databases built on these concepts include BigTable by Google, HBase (an open-source Apache project modeled on BigTable) and Cassandra by Facebook.
Categories of non-relational databases: Non-relational databases can be classified into four major categories: key-value databases, column databases, document databases and graph databases.
Key-value database: This is the simplest form of NoSQL database, where each value is associated with a unique key (e.g. Redis).
Column database: This database is capable of storing and processing large amounts of data by using a row key that points to many columns, which are distributed over a cluster (e.g. HBase).
Document database: This database can contain many key-value documents with many nested levels. Efficient querying is possible with this database. The documents are stored in a JSON-like format (e.g. MongoDB).
Graph database: Instead of traditional rows and columns, this database uses nodes and edges to represent graph structures and store data (e.g. Neo4j).
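The four categories above differ mainly in the shape of the data they hold. The sketch below models each shape with plain Python structures. It does not use the client API of Redis, HBase, MongoDB or Neo4j, and every name and value in it is invented for illustration. Edges in the graph are assumed to be undirected.

```python
from collections import deque

# Key-value: each unique key maps to one opaque value.
key_value = {"user:1:name": "Asha", "user:2:name": "Ben"}

# Column database: a row key points to a set of columns,
# and different rows may carry different columns.
column_family = {
    "user:1": {"name": "Asha", "city": "Pune"},
    "user:2": {"name": "Ben", "email": "ben@example.com"},
}

# Document database: nested, JSON-like documents that can be queried by field.
documents = [
    {"_id": 1, "name": "Asha", "address": {"city": "Pune"}, "tags": ["admin"]},
    {"_id": 2, "name": "Ben", "address": {"city": "Oslo"}, "tags": []},
]


def find_by_city(docs, city):
    """Return the names of documents whose nested address.city matches."""
    return [d["name"] for d in docs if d.get("address", {}).get("city") == city]


# Graph database: nodes connected by edges (treated as undirected here).
edges = [("Asha", "Ben"), ("Ben", "Chen"), ("Chen", "Dana")]


def hops(edges, start, goal):
    """Breadth-first search: fewest edges between two nodes, or None."""
    adjacent = {}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacent.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(key_value["user:1:name"])
print(column_family["user:2"].get("city"))  # this row has no "city" column
print(find_by_city(documents, "Pune"))
print(hops(edges, "Asha", "Dana"))
```

A lookup in the first two shapes always starts from a key, the document shape allows filtering on nested fields, and the graph shape answers questions about how nodes are connected.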
In the next post, we will learn about the scalability of both kinds of database, their major customers and how to choose the right database.