It’s been a while since we provided a how-to for this purpose. Thanks, Daan Debie (@DaanDebie), for allowing us to re-publish the instructions below (for CDH 5)!
I recently started as a Big Data Engineer at The New Motion. While researching our best options for running an Apache Hadoop cluster, I wanted to try out some of the features available in the newest version of Cloudera’s Hadoop distribution: CDH 5. Of course I could’ve downloaded the QuickStart VM, but I rather wanted to run a virtual cluster, making use of the 16GB of RAM my shiny new 15″ Retina Macbook Pro has 😉
There are some tutorials, and repositories available for installing a local virtualized cluster, but none of them did what I wanted to do: install the bare cluster using Vagrant, and install the Hadoop stack using the Cloudera Manager. So I created a simple Vagrant setup myself. You can find it here.
Setting up the Virtual Machines
As per the instructions from the Gitub repo:
Depending on the hardware of your computer, installation will probably take between 15 and 25 minutes.
First install VirtualBox and Vagrant.
Install the Vagrant Hostmanager plugin.