These books are our recommendations if you are planning to start your Big Data journey with Hadoop, an open source distributed processing framework at the center of a growing big data ecosystem. The books are listed in no particular order.
1. Hadoop: The Definitive Guide
Author: Tom White
Publisher: O’Reilly Media
The book nicely covers basic Hadoop concepts as well as the whole Hadoop galaxy (HDFS, MapReduce, HBase, ZooKeeper, Hive, Pig…). With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN).
2. Hadoop in Practice
Author: Alex Holmes
Publisher: Manning Publications
Hadoop in Practice collects 85 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you’ll face, like querying big data using Pig or writing a log file loader. You’ll explore each problem step by step, learning both how to build and deploy that specific solution along with the thinking that went into its design. As you work through the tasks, you’ll find yourself growing more comfortable with Hadoop and at home in the world of big data.
3. Hadoop in Action
Author: Chuck Lam
Publisher: Manning Publications
Hadoop in Action introduces the subject and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show how Hadoop is used in more complex data analysis tasks. It also includes best practices and design patterns for MapReduce programming.
4. Hadoop Operations
Author: Eric Sammer
Publisher: O’Reilly Media
A guide to running large-scale Hadoop clusters, written by someone who has practical experience in such deployments. If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must.
5. Pro Hadoop
Author: Jason Venner
Publisher: Apress
This book is a step-by-step guide to writing, running, and debugging MapReduce jobs using Hadoop, and to installing and managing Hadoop clusters. It is ideal for training new MapReduce users and cluster administrators, and for polishing existing Hadoop skills.
6. Hadoop Beginner’s Guide
Author: Garry Turkington
Publisher: Packt Publishing
Written for complete beginners to Hadoop, the book covers how to install and run Hadoop on a local Ubuntu host or create an on-demand Hadoop cluster on Amazon Web Services (EC2), before getting to grips with MapReduce.
7. Optimizing Hadoop for MapReduce
Author: Khaled Tannir
Publisher: Packt Publishing
Optimizing Hadoop for MapReduce is an example-based tutorial on improving MapReduce job performance. The book introduces advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, it will help you make full use of your cluster’s node resources to run MapReduce jobs optimally.
8. Scaling Big Data with Hadoop and Solr
Author: Hrishikesh Karambelkar
Publisher: Packt Publishing
Scaling Big Data with Hadoop and Solr is a step-by-step guide to building a search engine while scaling data. Starting with the basics of Apache Hadoop and Solr, this book then dives into advanced topics of optimizing search with some real-world use cases and sample Java code.
9. Hadoop Operations and Cluster Management Cookbook
Author: Shumin Guo
Publisher: Packt Publishing
Hadoop Operations and Cluster Management Cookbook is a guide for designing and managing a Hadoop cluster.
10. Hadoop Real World Solutions Cookbook
Authors: Jonathan Owens, Brian Femiano, Jon Lentz
Publisher: Packt Publishing
A collection of real-world code, analytics, and design patterns using various tools from the Hadoop community. Each recipe walks the reader through the implementation or, in some cases, debugging and configuration tuning. The book covers tools including MapReduce, Hive, Pig, MRUnit, serialization using Avro/Thrift/Protocol Buffers, Giraph, Accumulo, and several others.