Hadoop and Big Data are practically synonymous these days. There is so much info on Hadoop and Big Data out there, but as the Big Data hype machine gears up, there’s a lot of confusion about where Hadoop actually fits into the overall Big Data landscape. Let’s have a look at some of the popular myths about Hadoop.
Myth #1: Hadoop is a database
Hadoop is often talked about like it’s a database, but it isn’t. Hadoop is primarily a distributed file system and doesn’t contain database features like query optimization, indexing and random access to data. However, Hadoop can be used to build a database system.
Myth #2: Hadoop is a complete, single product
It’s not. This is the biggest myth of all! Hadoop consists of multiple open source products like HDFS (Hadoop Distributed File System), MapReduce, Pig, Hive, HBase, Ambari, Mahout, Flume and HCatalog. Basically, Hadoop is an ecosystem: a family of open source products and technologies overseen by the Apache Software Foundation (ASF).
Myth #3: Hadoop is cheap
This is a common misconception associated with anything open source. Just because you’re able to reduce or eliminate the initial costs of purchasing software doesn’t mean you’ll necessarily save money. Although Hadoop itself is free to download, deploying it brings real costs, such as cluster hardware, administration and skilled staff.
Myth #4: Hadoop needs a bunch of programmers
This depends entirely on what the organization plans to do. If the plan is to build a fancy Hadoop-based Big Data suite, then programmers come into the picture. If not, programming should not be a worry, as most data integration tools have GUIs and pre-built templates that abstract away MapReduce programming complexity.
Myth #5: Hadoop can only handle web analytics
When it comes to Hadoop, web analytics gets the most attention because many companies use it to analyze web logs and other web data. But its application is not limited to web analytics. Hadoop can handle a wider range of data and analytics, which appeals to a broader range of organizations.
Myth #6: Big Data can do without Hadoop
When we say Big Data, the first thing that comes to mind is Hadoop, in spite of the other options available in the market. Therefore, when dealing with Big Data, there has to be Hadoop. The two have become synonymous.
Myth #7: Hive resembles SQL
People who know SQL can quickly learn to hand-code Hive, but that doesn’t solve compatibility issues with SQL-based tools. Over time, it is expected that Hadoop products will support standard SQL and that SQL-based vendor tools will support Hadoop.
Myth #8: Hadoop requires MapReduce
Hadoop and MapReduce are related, but they are not married to each other. That said, they are not mutually exclusive either. There are variations of MapReduce that work with a variety of storage technologies, including HDFS and some relational DBMSs. Some users opt to deploy HDFS with Hive or HBase, but not MapReduce.
Myth #9: MapReduce only controls analytics
MapReduce is not limited to analytics. It handles parallel execution and fault tolerance for a wide variety of hand-coded logic and other applications.
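To make the point concrete, here is a minimal sketch of the map, shuffle and reduce steps applied to a non-analytics task: building an inverted index of which documents contain which words. It is plain single-process Python, not the Hadoop API, and the names `map_phase`, `shuffle`, `reduce_phase` and `build_inverted_index`, along with the sample documents, are invented for illustration. A real framework would run the map and reduce calls in parallel across machines and rerun failed tasks.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair per word in a single input record."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group emitted values by key, the step a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Collapse all values for one key into a sorted list of distinct documents."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run map, shuffle and reduce in sequence (hypothetical driver, single process)."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample input, for illustration only.
documents = {
    "doc1": "hadoop stores data",
    "doc2": "hadoop processes data in parallel",
}
index = build_inverted_index(documents)
print(index["hadoop"])    # ['doc1', 'doc2']
print(index["parallel"])  # ['doc2']
```

The same three-step shape works for other jobs such as data cleansing or format conversion; only the bodies of the map and reduce functions change.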
Myth #10: Hadoop is too risky for enterprise use
Many organizations fear that Hadoop is too new and untested to be suited for the enterprise. Nothing could be further from the truth. Today, Hadoop is used by everyone from Netflix to Twitter to eBay, and major vendors including Microsoft, IBM and Oracle all sell Hadoop tools.