Traditional data processing applications are becoming insufficient for handling the large volumes of data so prominent in corporate computing today. As a result, it is difficult to find a platform that can help companies stay...
Hadoop
Recent Articles
Top open source big data tools for your business analytic needs
Let’s assume you run a business enterprise that has major data sources generating real-time information about its users. Choosing the right big data tool for your enterprise is an important step because once you begin with the project, it is extremely cumbersome...
Apache Spark vs Hadoop: Which is the big data winner?
With the evolution of technology, data is present everywhere, thanks to the internet, which has enabled the interconnection of millions of devices across the globe. There has been unprecedented growth in data usage in recent years, which is likely to expand...
Components of Hadoop Architecture & Frameworks used for Data Science
Every business now recognizes the power of Big Data Analytics in developing deep actionable insights to enjoy business advantages. However, unlike before when businesses were required to deal with gigabytes of data, the present scenario requires to store and process...
What are Hadoop alternatives and should you look for one?
Hadoop’s development from a batch-oriented, large-scale analytics tool to an entire ecosystem comprising various applications, tools, services, and vendors goes hand in hand with the rise of the big data marketplace. It is predominantly used for large scale data analysis...
The business of transferring data from Salesforce to Hadoop
The sustained success of Hadoop has brought about a radical change in big data management. This highly popular open-source MapReduce technology allows easy access and provides reliable answers to advanced data questions. Data management has been taken to the next...
Reasons why Hadoop as a service is recommended for your business
The importance of data in business is hard to downplay. Data is growing exponentially in size, and as it does, it not only affects our business decisions but also has the potential to become the bedrock of some of the highest earning industries...
Why use Hadoop? Top pros and cons of Hadoop
Big Data is one of the major areas of focus in today’s digital world. There are tons of data generated and collected from the various processes carried out by the company. This data could contain patterns and methods as to how the company can improve its processes....
How to fetch HBase table data in Apache Phoenix?
This exclusive post is shared by big data services providers to help developers. They explain the best way to fetch HBase table data in Apache Phoenix. Read this article and discover what they have to say about Big Data related services. The term 'Big...
Top 11 key tuning checklists for Apache Hadoop
Apache Hadoop is a well-known, de facto framework for processing large data sets through distributed and parallel computing. YARN (Yet Another Resource Negotiator) allowed Hadoop to evolve from a simple MapReduce engine to a big data ecosystem that can run...
Eight breakthrough changes in Apache Flink 1.0.0
Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds...
What is SMACK (Spark, Mesos, Akka, and Kafka)?
This blog introduces the convergence of complementary technologies in the SMACK stack: Spark, Mesos, Akka, Cassandra and Kafka. We will see how Apache Kafka can help us get data under control and what its role is in our data pipeline, and how Spark & Akka help us to...
Top ten pointers in the new Apache Spark release (version 1.6)
In 2016, we should be excited that the Apache Spark community launched Apache Spark 1.6. Committers – There are around 1000 contributors to Apache Spark, a number that has doubled. Patches – Apache Spark 1.6 includes 1000 patches. Run SQL query on files –...
What is the role of RDDs in Apache Spark? – Part 1
This blog introduces Spark’s core abstraction for working with data, the RDD (Resilient Distributed Dataset). An RDD is simply a distributed collection of elements or objects (Java, Scala, Python, and user defined functions) across the Spark cluster. In Spark, all...
Is Apache Hadoop the only option to implement big data?
No, Hadoop is not the only option for big data problems; it is one of the solutions. The HPCC (High-Performance Computing Cluster) Systems technology is an open source data-driven and intensive processing and delivery platform developed by LexisNexis Risk...
The top 12 Apache Hadoop challenges
Hadoop is a large-scale distributed batch processing infrastructure. While it can be used on a single machine, its true power lies in its ability to scale to hundreds or thousands of computers, each with several processor cores. Hadoop is also designed to efficiently...
What are the 3 S's of Spark and its effect on big data?
Many thanks for your cherished time. This time we would like to share the details of the 3 S’s of Spark. As we all know, the 3 V’s of Big Data are Volume, Variety & Velocity, with further V’s such as Veracity & Value added to them. Big Data is defined as a...
(Big) Data in Data Lake vs. Data Warehouse – Interesting things to consider
Big data is used across verticals like Insurance, Healthcare, Manufacturing, Financial, Retail and more. Companies are using big data to improve top & bottom line revenue with business values. In this data-driven era, enterprise readiness and data management needs...
Advantages of NoSQL Databases – Everything you need to know
Moving on from the relational databases that characterized the last two decades and more, NoSQL databases have gained popularity as a better method of data handling, and below are five reasons why: 1. Elastic Scalability In the past, the best DBA services still had to depend...
Three reasons why business users may want to learn Hadoop
Big data is a popular topic these days, not only in the tech media, but also among mainstream news outlets. Executives see Big Data as providing significant business benefits – greater insight and learning, the ability to obtain answers and make decisions faster and...
Seven common problems of scaling Hadoop
Every Hadoop implementation encounters the occasional crisis, including moments when the folks running Hadoop feel like their hair is on fire. Sometimes it happens before you get to production, which can cause organizations to throw the Hadoop baby out with the...
Global Hadoop market is expected to reach $13.95 billion by 2017
According to a new market research report, Global Hadoop Market - Industry Analysis, Size, Share, Growth, Trends, and Forecast 2012 - 2018, published by MarketsandMarkets, the total Hadoop market, which had an estimated value worth USD 1.5 billion in 2012, is expected...
Hadoop Glossary: 20 most important terms
This is a list of the most important Hadoop terms you need to know and understand before going into the Hadoop ecosystem. [To read about top 10 most popular myths about Hadoop, click here.] Most important Hadoop terms Apache or Apache Software Foundation (ASF): A...
Top Facebook groups for Analytics, Big Data, Data Mining, Hadoop, NoSQL, Data Science
Facebook may not be the best place for professionals, but like LinkedIn, it too has a good number of Big Data groups/communities/public forums that function to spread knowledge about technologies used to mine, manage and analyse data for businesses. This is our...
Exploring the world of data: A complete list of Big Data blogs
This list contains almost all frequently-updated Big Data blogs, belonging to a wide range of categories: Data Science, Data Analytics, Business Intelligence, Machine Learning, Data Visualization, Data Mining, NoSQL, Hadoop etc. The blogs are arranged alphabetically....
Top 20 essential Hadoop tools for crunching Big Data
Hadoop is an open source distributed processing framework which is at the center of a growing big data ecosystem. Used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning applications, Hadoop manages data...
Top 10 books to get started with Hadoop
These books are our recommendations if you are planning to start your Big Data journey with Hadoop, an open source distributed processing framework which is at the center of a growing big data ecosystem. The books are listed in no specific order. 1. Hadoop: The...
Learn Hadoop with 10 SlideShare presentations
Want to learn Hadoop? Watch these presentations on SlideShare to understand Hadoop HDFS, the MapReduce algorithm, the Pig Latin language, and the Hive SQL language. 1. Introduction to MapReduce, an Abstraction for Large-Scale Computation by Ilan Horn, Google...
Best LinkedIn groups all Hadoop experts should join
There are hundreds of Hadoop groups on LinkedIn, but these are the best ones you should definitely consider joining. Join them to learn about the latest happenings in the world of Hadoop, and engage in discussions with other professionals online. 1. Hadoop Users Group...
How to install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager
It’s been a while since we provided a how-to for this purpose. Thanks, Daan Debie (@DaanDebie), for allowing us to re-publish the instructions below (for CDH 5)! I recently started as a Big Data Engineer at The New Motion. While researching our best options for...
Top 10 most popular myths about Hadoop
Hadoop and Big Data are practically synonymous these days. There is so much info on Hadoop and Big Data out there, but as the Big Data hype machine gears up, there's a lot of confusion about where Hadoop actually fits into the overall Big Data landscape. Let’s have a...
Understanding the power of Hadoop as a Service
Across a wide range of industries from health care and financial services to manufacturing and retail, companies are realizing the value of analyzing data with Hadoop. With access to a Hadoop cluster, organizations are able to collect, analyze, and act on data at a...
Business or Pleasure? – why not both: The Roadmap to ‘Hadoop in the Cloud’
The Twitter ball started rolling again just now. Matt Asay posed an interesting question about Forrester suggesting Hadoop isn't a great fit for the cloud. (Even) without context Vijay Vijayasankar and I started firing off questions and answers which inevitably led to...
Hadoop, big data, and the elephant in the room
In the 1800s, John Godfrey Saxe wrote a poem about six blind men and an elephant based on an old Indian story. In an effort to discover what the elephant is, each man touches a different part of the creature and subsequently draws his own unique — and incorrect —...
Hadoop’s ability to deliver business growth is worth the bother
Hadoop is going to be big, but today, its adoption is still small. According to Gartner, there are only 1,000 Hadoop systems in production, with most companies not moving Hadoop beyond the proof of concept phase. Partly, this is a matter of difficulty: Hadoop isn't...
Hadoop and big data: Where Apache Slider slots in and why it matters
Code submitted this week for inclusion in the Hadoop stack will help speed the spread of the distributed big-data platform, according to Hortonworks co-founder Arun Murthy. The submission of the Slider framework to the Apache Software Foundation Incubator will result...
Hadoop Market is Expected to Reach $50.2 Billion, Globally, by 2020
Hadoop is a distributed processing technology used for Big Data analysis. The Hadoop market is expanding at a significant rate, as Hadoop technology provides cost effective and quick solutions compared to traditional data analysis tools such as RDBMS. The Hadoop Market...
A Guide to Checkpointing in Hadoop
Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion...
Hadoop 2 puts big data environment on friendlier analytics turf
In a recent conversation with project team members from a client, one shared an internal slide deck used to promote the benefits of big data (in general) and Hadoop (in particular) among both key management decision makers and the development and implementation groups...
The Use and Abuse of Big Data and Hadoop
Big data analytics is having a huge impact on us all. The ability to collect, store and analyze massive amounts of disparate data---using analytics platforms such as Hadoop in the cloud to uncover hidden connections, correlations and insights---is playing a bigger...
5 examples of big companies managing big data on Hadoop
As one of the world’s most popular free Java-based programming frameworks, Apache Hadoop is being used by an increasing number of companies that can no longer manage their data using traditional methods. The open-source platform can deal with a wide variety of data,...
5 Reasons Why Hadoop Is Ready for Enterprise Prime Time
As 2014 gets into full swing, Hadoop is increasingly being used for applications that are integral to daily business operations. No longer is Hadoop viewed by some organizations as just a platform for big data proof-of-concept applications. IT leaders should be...
Faster, more capable: What Apache Spark brings to Hadoop
Apache Spark is an execution engine that broadens the type of computing workloads Hadoop can handle, while also tuning the performance of the big data framework. Hadoop specialist Cloudera recently announced that it will offer commercial support for Apache Spark,...
Big data infrastructure goes far beyond Hadoop
Wikibon Principal Research Contributor Jeff Kelly provides an inclusive basic tutorial of the big data environment, including technologies, skill sets, and use cases, in “Big Data: Hadoop, Business Analytics and Beyond”, and while the environment starts with Hadoop...
8 Features of a True Enterprise-Grade Platform for Hadoop and NoSQL
Businesses have several options when looking for a Hadoop and NoSQL solution. The advantage of using the right enterprise-grade solution is that it can provide the dependability, ease-of-use, and speed required for real production use. Without these, you can’t deploy...
How-to: Create a Simple Hadoop Cluster with VirtualBox
I wanted to get familiar with the big data world, and decided to test Hadoop. Initially, I used Cloudera’s pre-built virtual machine with its full Apache Hadoop suite pre-configured (called Cloudera QuickStart VM), and gave it a try. It was a really interesting and...
Top 3 reasons Hadoop is heading to the Cloud
Cloud computing and big data have been vying for the attention of business owners for several years now. Both initiatives are compelling, as big data analytics promises the potential of powerful new business insights, and cloud computing offers greater flexibility,...
Big data: 5 major advantages of Hadoop
By now, you have probably heard of Apache Hadoop. The name is derived from a cute toy elephant, but Hadoop is anything but a soft toy. Hadoop is an open source project that offers a new way to store and process big data. The software framework is written in Java for...
The three most common ways data junkies are using Hadoop
Analytic applications come in all shapes and sizes–and most importantly, are oriented around addressing a particular vertical need. At first glance, they can seem to have little relation to each other across industries and verticals. But in reality, when observed at...