Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. Big data, often characterised by its volume, velocity and variety, is difficult to analyse using a Relational Database Management System (RDBMS). An RDBMS stores data items in a set of formally described tables from which data can be accessed or reassembled in many different ways. Most commercial RDBMSs use the Structured Query Language (SQL), a standard interactive and programming language for retrieving information from and updating a database. But what happens when the data is multi-structured or unstructured, such as social media feeds, video, e-commerce activity or third-party data? For analysing such data, NoSQL is usually the better fit.
NoSQL, or "Not only SQL", is a database technology with a non-relational, schema-less data model. This is especially useful when you are working with large amounts of data that don’t necessarily fit a fixed structure. NoSQL databases also differ from relational databases in their ability to scale out, spreading data across new nodes as they are added. This matters more and more as transaction rates and availability requirements increase.
Data models in NoSQL are grouped into four categories:
Key-value stores: The simplest form of NoSQL database, in which each item consists of a unique identifier (the key) and an associated value. Examples include Amazon DynamoDB, Azure Table Storage (ATS), Riak and Berkeley DB.
Document stores: Extending the basic idea of key-value stores, a document store holds semi-structured documents (for example JSON) as its values and can store, retrieve, index and query the fields inside them. Examples include MongoDB and CouchDB.
Column-oriented stores: These databases, also called wide-column stores, store data in column families (groups of related columns) rather than in fixed-schema rows. Wide-column stores offer high performance and a highly scalable architecture. Examples include Apache Cassandra, HBase, Bigtable and Hypertable.
Graph databases: Graph databases are designed for data whose relationships are well represented as a graph of interconnected elements. They map more directly to object-oriented programming models and are faster for highly associative data sets and graph-traversal queries. Many of them also support ACID transactions in the same way as most RDBMSs. Neo4j is a well-known example.
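The four models differ mainly in how much the database understands about a value. The plain-Python sketch below holds the same small set of hypothetical users in each shape; it uses no real database, and the structures only approximate how each family organises data.

```python
"""Toy illustration of the four NoSQL data models using plain Python structures.

No database is involved; the users, fields and helper below are hypothetical.
"""

# Key-value: the store sees only an opaque value behind a unique key.
kv_store: dict[str, str] = {"user:42": '{"name": "Asha", "city": "Pune"}'}

# Document: the value is a structured document whose fields can be queried.
doc_store: dict[str, dict] = {
    "42": {"name": "Asha", "city": "Pune", "tags": ["admin"]},
    "43": {"name": "Ravi", "city": "Delhi"},  # no "tags": documents need not match
}

# Wide-column: row key -> column family -> column -> value.
wide_column: dict[str, dict[str, dict[str, str]]] = {
    "42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2024-01-05"},
    },
}

# Graph: users connected by "follows" edges, kept as an adjacency list.
follows: dict[str, set[str]] = {"42": {"43"}, "43": {"44"}, "44": set()}


def two_hops(user: str) -> set[str]:
    """Users reachable in exactly two "follows" steps: a typical graph traversal."""
    reached: set[str] = set()
    for friend in follows.get(user, set()):
        reached |= follows.get(friend, set())
    return reached - {user}


if __name__ == "__main__":
    print(kv_store["user:42"])
    print([d["name"] for d in doc_store.values() if d["city"] == "Pune"])
    print(wide_column["42"]["profile"]["city"])
    print(two_hops("42"))
```

In a real graph database the two-hop query would be a single traversal over stored relationships rather than nested dictionary lookups.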
Using a NoSQL database gives organisations a range of benefits, including the following:
When storage runs short, the database administrator (DBA) has two choices: vertical scaling (scale up) or horizontal scaling (scale out). For years, DBAs relied on vertical scaling, adding bigger and more expensive servers. With horizontal scaling, i.e. adding more servers, each with modest processors and RAM, organisations can scale out and take advantage of new nodes according to their data storage needs. As transaction rates and availability requirements increase, and as databases move into the cloud, the economic advantages of scaling out on commodity hardware become hard to ignore.
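Scaling out only pays off if data can be spread across new nodes without reshuffling everything. Dynamo-style stores such as Cassandra and Riak rely on consistent hashing for this. The sketch below is a simplified, in-memory illustration of that technique; the class and node names are hypothetical, and real systems add replication, token management and rebalancing on top.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is used only for even spread)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes; a teaching sketch, not a client library."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Scale out: place `vnodes` points for the new server on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first ring point at or after the key's hash."""
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")  # add one commodity server
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Only keys now owned by node-d move; the rest stay where they were.
    print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Going from three nodes to four should move roughly a quarter of the keys, all of them onto the new node; a naive `hash(key) % n` placement would instead relocate most keys.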
Both structured and unstructured data can be stored because there is no fixed data model. This flexibility gives organisations access to much larger quantities of data. Applications can store data in virtually any structure or format they need, which makes change management much easier and, in turn, means more uptime and better reliability. Contrast this with relational databases, whose schemas must be strictly and carefully managed and where even a minor change may result in downtime or a reduction of service.
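To make the "no fixed data model" point concrete, here is a plain-Python sketch of a document collection whose records carry different fields. It only imitates what a document store such as MongoDB or CouchDB allows; the product data and the `names_with` helper are hypothetical.

```python
import json

# A hypothetical "collection" of product documents with differing fields.
products: list[dict] = [
    {"_id": 1, "name": "Kettle", "price": 25.0},
    {"_id": 2, "name": "E-book", "price": 9.5, "format": "epub"},
    {"_id": 3, "name": "T-shirt", "price": 12.0, "sizes": ["S", "M", "L"]},
]

# Introducing a new attribute needs no ALTER TABLE step: new documents simply
# carry it, and existing documents are left untouched.
products.append(
    {"_id": 4, "name": "Headphones", "price": 60.0,
     "reviews": [{"stars": 5, "text": "Great bass"}]}
)


def names_with(docs: list[dict], field: str) -> list[str]:
    """Names of documents that have `field`; documents without it are skipped."""
    return [d["name"] for d in docs if field in d]


if __name__ == "__main__":
    print(names_with(products, "sizes"))
    print(json.dumps(products[-1]))
```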
NoSQL databases were designed with redundancy in mind because their designers accepted that hardware failures will occur. They were built to run at massive scale, where even the rarest hardware problems stop being edge cases and become near-certainties. Rather than treating hardware failure as an exceptional event, NoSQL databases are designed to handle it. Hardware failure is still a serious concern, but it is addressed at the architectural level of the database rather than by requiring developers, DBAs and operations staff to build their own redundant solutions. For example, Cassandra uses a number of techniques to estimate the likelihood that a node has failed. Riak takes a different approach: it can survive network partitioning (when one or more nodes in a cluster become isolated) and repair itself.
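Cassandra and Riak both let you tune consistency, that is, how many of the N replicas must acknowledge a write (W) or answer a read (R), and both use read repair to bring stale replicas back up to date. The in-memory sketch below simulates that idea under simplifying assumptions (a global version counter instead of timestamps or vector clocks, no hinted handoff); all class and node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One copy of the data; set `up = False` to simulate a failed node."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """N replicas; writes need W acks, reads consult R replicas, with R + W > N."""

    def __init__(self, n: int = 3, w: int = 2, r: int = 2) -> None:
        if r + w <= n:
            raise ValueError("read and write quorums must overlap (R + W > N)")
        self.replicas = [Replica(f"node-{i}") for i in range(n)]
        self.w, self.r = w, r
        self._version = 0  # global counter standing in for timestamps/vector clocks

    def put(self, key: str, value: str) -> bool:
        """Write to every live replica; report whether the write quorum was met."""
        self._version += 1
        acks = 0
        for rep in self.replicas:
            if rep.up:
                rep.data[key] = (self._version, value)
                acks += 1
        return acks >= self.w

    def get(self, key: str) -> Optional[str]:
        """Read from R live replicas, return the newest value, repair stale copies."""
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.r:
            raise RuntimeError("too few live replicas for a read quorum")
        responders = live[: self.r]
        found = [rep.data[key] for rep in responders if key in rep.data]
        if not found:
            return None
        newest = max(found)  # highest version wins
        for rep in responders:
            if rep.data.get(key) != newest:
                rep.data[key] = newest  # read repair
        return newest[1]


if __name__ == "__main__":
    store = QuorumStore()
    store.put("cart:7", "v1")
    store.replicas[0].up = False      # node-0 fails
    print(store.put("cart:7", "v2"))  # two acks still meet W=2
    store.replicas[0].up = True       # node-0 returns with stale data
    store.replicas[1].up = False      # and node-1 fails
    print(store.get("cart:7"))        # overlapping quorums still find "v2"
```

Because any R replicas must overlap any W replicas when R + W > N, at least one replica in every read quorum holds the latest acknowledged write, even with one node down.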
NoSQL databases tend to be less complex and considerably simpler to deploy than SQL databases. It is easy to change how data is stored or which queries you run. Large-scale changes to data can be made with simple refactoring and batch processing rather than complex migration scripts and outages. Nodes in a cluster can also be taken offline for changes and added back, as replication features take care of syncing data and propagating the new data design to the other servers in the cluster.
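One common way to carry out such a change without an outage is a versioned, idempotent batch migration, with readers that accept both document shapes while it runs. The sketch below assumes a hypothetical `schema_version` field and a hypothetical change that splits a `name` field in two; it is a general pattern, not a feature of any particular product.

```python
def migrate_v1_to_v2(doc: dict) -> bool:
    """Hypothetical schema change: split a v1 'name' string into first/last names.

    Returns True if the document was changed. Already-migrated documents are
    left alone, so the batch job can be safely re-run after an interruption.
    """
    if doc.get("schema_version", 1) >= 2:
        return False
    first, _, last = doc.pop("name", "").partition(" ")
    doc.update(first_name=first, last_name=last, schema_version=2)
    return True


def run_batch(collection: list[dict], batch_size: int = 500) -> int:
    """Migrate in small batches so the store keeps serving traffic in between."""
    changed = 0
    for start in range(0, len(collection), batch_size):
        for doc in collection[start:start + batch_size]:
            if migrate_v1_to_v2(doc):
                changed += 1
        # a real job would commit or checkpoint each batch here
    return changed


def full_name(doc: dict) -> str:
    """Readers accept both shapes while the migration is still rolling out."""
    if doc.get("schema_version", 1) >= 2:
        return f"{doc['first_name']} {doc['last_name']}".strip()
    return doc["name"]


if __name__ == "__main__":
    users = [
        {"name": "Asha Rao"},
        {"name": "Ravi Kumar"},
        {"first_name": "Meera", "last_name": "Iyer", "schema_version": 2},
    ]
    print([full_name(u) for u in users])
    print(run_batch(users, batch_size=2))  # migrates the two v1 documents
    print(run_batch(users, batch_size=2))  # re-running changes nothing
```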
A key strategic driver for implementing a NoSQL database environment is the ability to perform analytics. Mining the accumulated data for insights gives your business a competitive advantage. Extracting meaningful business intelligence from very high volumes of data is difficult with traditional relational database systems. Modern NoSQL database systems not only store and manage business application data but also provide integrated analytics that help make sense of complex data sets quickly and support flexible decision-making.
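Much of this built-in analytics comes down to group-and-aggregate queries over documents, such as MongoDB's `$group` stage or an Elasticsearch `terms` aggregation with a `sum` sub-aggregation. The plain-Python sketch below shows the same map-then-reduce idea over hypothetical order documents; the data and the `total_by` helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical order documents; note that the last one has no "region" field.
orders = [
    {"region": "West", "channel": "web", "amount": 120.0},
    {"region": "East", "channel": "store", "amount": 80.0},
    {"region": "West", "channel": "store", "amount": 30.5},
    {"channel": "web", "amount": 10.0},
]


def total_by(docs: list[dict], group_field: str, value_field: str = "amount") -> dict:
    """Group documents by one field and sum another (map each doc, then reduce)."""
    totals: defaultdict = defaultdict(float)
    for doc in docs:
        totals[doc.get(group_field, "unknown")] += doc.get(value_field, 0.0)
    return dict(totals)


# The same question written as a MongoDB aggregation pipeline (not executed here):
MONGO_PIPELINE = [{"$group": {"_id": "$region", "total": {"$sum": "$amount"}}}]

if __name__ == "__main__":
    print(total_by(orders, "region"))
    print(total_by(orders, "channel"))
```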
These benefits make NoSQL databases well suited to organisations that need a database able to cope with large amounts of disparate data. With the advent of big data, SQL’s role has narrowed to structured data. Even so, SQL has been the predominant database technology since the 1980s for storing financial records, manufacturing and logistics information, personnel data and much more. So a well-rounded big data ecosystem should use the best of both technologies.
At Maruti Techlabs, we use both SQL and NoSQL technologies to build an efficient big data ecosystem with the necessary analytics. Data collected from clients is usually in relational (RDBMS) form, which is difficult and time-consuming to analyse. We have developed logic to convert this relational data into NoSQL form, and the transformed data is then analysed using Elasticsearch. Elasticsearch is a search and analytics engine for querying text: it returns documents that match a given query, ranked by relevance, and can compute statistical analyses over a corpus of text. To understand how NoSQL can boost your big data analysis, visit us at Maruti Techlabs.
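The conversion logic described above is not published here, so the following is only a generic sketch of one common approach: denormalising related rows into self-contained JSON documents and formatting them for Elasticsearch's `_bulk` API, which expects an action line followed by a source line for each document, with the body ending in a newline. The tables, the `customers` index name and the helper functions are hypothetical.

```python
import json

# Hypothetical rows, as they might come out of two relational tables.
customers = [
    {"id": 1, "name": "Asha Rao", "city": "Pune"},
    {"id": 2, "name": "Ravi Kumar", "city": "Delhi"},
]
orders = [
    {"id": 101, "customer_id": 1, "item": "Kettle", "amount": 25.0},
    {"id": 102, "customer_id": 1, "item": "Headphones", "amount": 60.0},
    {"id": 103, "customer_id": 2, "item": "T-shirt", "amount": 12.0},
]


def denormalise(customers: list[dict], orders: list[dict]) -> list[dict]:
    """Embed each customer's orders, producing one self-contained document per customer."""
    by_customer = {c["id"]: {**c, "orders": []} for c in customers}
    for o in orders:
        order = {k: v for k, v in o.items() if k != "customer_id"}  # drop the foreign key
        by_customer[o["customer_id"]]["orders"].append(order)
    return list(by_customer.values())


def to_bulk_ndjson(docs: list[dict], index: str) -> str:
    """Format documents for the _bulk endpoint: action line, then source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Request body for GET /customers/_search: full-text match on purchased items.
SEARCH_BODY = {"query": {"match": {"orders.item": "headphones"}}}

if __name__ == "__main__":
    print(to_bulk_ndjson(denormalise(customers, orders), "customers"))
```

Embedding the orders trades some duplication for documents that can be indexed and searched without joins; the HTTP calls that send the bulk body and the search request are left out.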