Hadoop Tutorials: Using Hive with HBase

Published September 5, 2013 | Chandeep

Here is another interesting use case that came up while I was working with one of our clients in the insurance industry. The client had an enormous amount of claim data residing in multiple SQL Server databases, which needed to be consolidated into one. Some of the queries on this data took days to run, so we went looking for an alternative solution that could process the data in a distributed fashion and save us some time. Since the company was already using Hadoop, we started looking into a Hadoop-based solution.

We had a few options on the table, such as Hive, Pig, and HBase, and after some brainstorming we decided to go with HBase for the following reasons:

  1. It is an open-source distributed database that would give us higher performance while remaining cost effective.
  2. We do not have to worry about distributing the data for faster processing, since Hadoop takes care of that.
  3. It suits batch-processing workloads without requiring real (secondary) indexes.
  4. Data integrity: HBase confirms a write only after its write-ahead log has reached all three in-memory HDFS replicas.
  5. It is easily scalable, fault tolerant, and highly available.

The next step was to move the data from the SQL Server databases into HDFS, for which we used Sqoop; a minimal import command is sketched below.
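As a rough sketch of what such an import looks like (the server name, database, table, credentials file, and target directory are hypothetical placeholders, and the Microsoft SQL Server JDBC driver is assumed to be on Sqoop's classpath), a single table can be pulled into HDFS like this:

    # pull one claims table from SQL Server into HDFS (placeholder names)
    sqoop import \
      --connect "jdbc:sqlserver://dbserver:1433;databaseName=claims" \
      --username claims_user \
      --password-file /user/hadoop/.sqlserver.password \
      --table claim_records \
      --target-dir /data/claims/claim_records \
      --num-mappers 4

Sqoop can also load rows straight into HBase using the --hbase-table, --column-family, and --hbase-row-key options, but in our case the first step was simply landing the data in HDFS.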
