# Create a Hadoop Cluster
Chef-bach can be used to create a Hadoop test cluster using VirtualBox virtual machines on a hypervisor host with enough resources.
The virtual cluster created by chef-bach consists of five nodes. One of the nodes acts as the bootstrap node and hosts a chef server. The other four nodes act as Hadoop nodes. Three of the four Hadoop nodes are master nodes, and the last node is a worker node.
Steps to create the test cluster are detailed below. This process has been tested on hypervisor hosts running Mac OS and Ubuntu.
- Install `curl` on the hypervisor host.
- Install `virtualbox` on the hypervisor host (virtualbox.org).
  - This will likely require `make` and `gcc` to already be installed.
  - `sudo apt-get install virtualbox`
- Install `vagrant` on the hypervisor host (vagrantup.com).
  - `sudo apt-get install vagrant`
  - It's recommended you manually install the latest version of vagrant from vagrantup.com. See the note below.
- Delete the default DHCP server built into `virtualbox`.

  ```
  $ vboxmanage list dhcpservers
  NetworkName:    HostInterfaceNetworking-vboxnet0
  IP:             192.168.56.100
  NetworkMask:    255.255.255.0
  lowerIPAddress: 192.168.56.101
  upperIPAddress: 192.168.56.254
  Enabled:        Yes

  $ vboxmanage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0

  # Run the above two commands with sudo as well in case the current user does not have access to view/edit the dhcpservers
  ```
- Run `sudo pkill -f VBox` on the hypervisor host.
- Clone the chef-bach repository onto the hypervisor host: `git clone https://github.com/bloomberg/chef-bach.git` (see the optional shortcut after this list).
- Rename the `chef-bach` directory to `chef-bcpc` on the hypervisor host.
- cd to the `chef-bcpc` directory on the hypervisor host.
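As an optional shortcut (not part of the original steps), the clone and rename can be combined, since `git clone` accepts a target directory name:

```
# Clone directly into a directory named chef-bcpc, avoiding the separate rename step
git clone https://github.com/bloomberg/chef-bach.git chef-bcpc
cd chef-bcpc
```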
Note: The vagrant release MUST support your current version of VirtualBox. You may need to downgrade VirtualBox to an earlier version if an older version of vagrant is used. For more information, check the vagrant release notes. You can download the latest version of vagrant here.
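Before continuing, it can help to confirm that the installed VirtualBox and vagrant versions are actually compatible. Both tools expose a standard version flag:

```
# Print the installed versions on the hypervisor host and compare against the vagrant release notes
vboxmanage --version
vagrant --version
```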
- Run the installation script under the tests directory: `./tests/automated_install.sh`
  - This will download all the required software, create the five-node cluster, and install all the HDP (Hortonworks Data Platform) Hadoop components. As you can imagine, this takes some time. Depending on the size of the hypervisor host, network bandwidth, etc., it can take around 2 to 4 hours to complete.
- Once the installation is complete, log in to the bootstrap node with `vagrant ssh`. You need to be in the `chef-bcpc` directory on the hypervisor.
- cd to the `chef-bcpc` directory and run the following set of commands twice in sequence:

  ```
  for vm in bcpc-vm{1..4}; do
    ./cluster-assign-roles.sh Test-Laptop hadoop $vm;
  done
  ```
- This completes the creation of the four Hadoop nodes.
  - `bcpc-vm1` (10.0.100.11) is a master node which hosts an HDFS NameNode, HBase HMaster, and MySQL server.
  - `bcpc-vm2` (10.0.100.12) is a master node which hosts a YARN ResourceManager, Hive HCatalog, and MySQL server.
  - `bcpc-vm3` (10.0.100.13) is a master node which hosts an Ambari Server.
  - `bcpc-vm4` (10.0.100.14) is a worker node which hosts an HDFS DataNode, HBase RegionServer, and YARN NodeManager.
- Passwords for various components, including those used to log in to the Hadoop nodes, can be retrieved with the following commands. All of these commands MUST be run from the `chef-bcpc` directory on the bootstrap host (`vagrant@bcpc-bootstrap:~/chef-bcpc$`):
  ```
  # SSH keys for various role accounts
  sudo knife data bag show configs Test-Laptop

  # The cobbler-root-password is also stored in Chef Vault.
  # Use this password to log in to the four Hadoop nodes as the "ubuntu" user.
  # ubuntu is also part of the sudoers list.
  sudo knife vault show os cobbler "root-password" --mode client
  ```
- Graphite (https://10.0.100.5:8888) collects various system and JMX stats from the nodes and Hadoop components.
- Zabbix (https://10.0.100.5:7777) handles metrics monitoring and alerting.
At this point we can start testing the various components. Since the cluster uses Kerberos authentication, we first need to create a Kerberos principal for the ubuntu user:
```
root@bcpc-bootstrap:/home/vagrant/chef-bcpc# kadmin.local
Authenticating as principal root/admin@<REALM> with password.
kadmin.local: add_principal ubuntu
WARNING: no policy specified for ubuntu@<REALM>; defaulting to no policy
Enter password for principal "ubuntu@<REALM>":
Re-enter password for principal "ubuntu@<REALM>":
Principal "ubuntu@<REALM>" created.
kadmin.local:
```

(`<REALM>` is the cluster's Kerberos realm.)
For the purposes of testing, make ubuntu a superuser:

```
$ sudo -u hbase hbase shell
hbase(main):001:0> grant 'ubuntu', 'RWCAX'
..skipping standard messages..
0 row(s) in 0.9490 seconds
```

The execute permission 'X' is unnecessary if users do not require HBase delegation tokens.
Once the cluster is created and set up, its health can be verified by examining the results of the automated smoke tests or by running manual tests. The following sections explain both approaches. The verification process requires a valid Kerberos ticket, which can be created by running `kinit`. `kinit` uses the logged-in user (`ubuntu` in this case) and prompts for the password to complete ticket creation.

```
kinit
```
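To confirm the ticket was actually granted, `klist` (the standard MIT Kerberos client tool, assumed to be available on the nodes) lists the cached credentials:

```
# After kinit succeeds, the ticket cache should show a ticket for the ubuntu principal
klist
```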
## Automated Cluster Health Verification (Smoke Tests)
Chef-Bach also runs smoke tests to verify cluster health. These smoke tests are scheduled to run every 10 minutes. To check the cluster's health through the smoke tests, follow the steps below.
The Oozie server is installed on the vm2 node. Make sure `OOZIE_URL` is set if you are running these steps from other nodes, i.e. the vm1 or vm3 nodes:

`export OOZIE_URL=http://f-bcpc-vm2:11000/oozie`

1. Run the following command to find the Job Id (Coordinator Id) of the `Oozie-Smoke-Test-Coordinator` jobs.

   `oozie jobs --jobtype=coordinator`
2. Use the Job Id from step 1 to find the workflow runs.

   `oozie job -info <JobId_FROM_STEP 1>`
3. Step 2 will list all the workflow runs; copy the latest Ext ID (workflow ID). It will look like `xxxxxxx-xxxxxxxxxxxxxx-oozie-oozie-W`, where `x` represents a random digit.
4. Executing the following command with the workflow ID will show the result of all the workflow actions.

   `oozie job -info <WORKFLOW_ID_FROM_STEP3>`
For the cluster to be considered healthy, all the tests should have succeeded.
## Manual Cluster Health Verification
This section explains the process to verify the cluster manually. Tests are performed on each Hadoop component to confirm its health.
### HDFS
- Log on to `bcpc-vm3`. From the hypervisor you can do `ssh ubuntu@10.0.100.13` using the cobbler_root password from above, or from the `chef-bcpc` directory on the bootstrap node issue `./nodessh.sh Test-Laptop 10.0.100.13`.
- Take note of the cobbler_root password that is printed if you used the second option to log in. This is the password you will need for sudo below.
- Run `sudo -u hdfs hdfs dfs -copyFromLocal /etc/passwd /passwd`
- Run `sudo -u hdfs hdfs dfs -cat /passwd`
- Run `sudo -u hdfs hdfs dfs -rm /passwd`
- If all of these are successful, the HDFS component is verified (a one-shot version of the same check is sketched below).
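If you prefer a single check, the three HDFS commands from the list above can be wrapped in a short script (a sketch; it assumes the same `hdfs` service user and paths as the steps above):

```
#!/bin/bash
# Minimal HDFS smoke check: copy a local file in, read it back, then remove it
set -e
sudo -u hdfs hdfs dfs -copyFromLocal /etc/passwd /passwd
sudo -u hdfs hdfs dfs -cat /passwd > /dev/null
sudo -u hdfs hdfs dfs -rm /passwd
echo "HDFS smoke check passed"
```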
### HBase
- Run `hbase shell`
- Under the hbase shell, run `create 't1','cf1'`
- Run `list`, which should display the newly created table in a list
- Run `put 't1','r1','cf1:c1','v1'`
- Run `scan 't1'`, which should display the row created in the previous step
- Run `disable 't1'`
- Run `drop 't1'`
- Run `list`, and it should display an empty list
- Run `exit`
- If all these steps are complete, the HBase component is verified along with ZooKeeper (a non-interactive version is sketched below)
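The same sequence can also be run non-interactively by piping the statements into `hbase shell` (a sketch reusing the exact commands from the list above; it assumes a valid Kerberos ticket for the current user):

```
# Non-interactive HBase smoke check
hbase shell <<'EOF'
create 't1','cf1'
put 't1','r1','cf1:c1','v1'
scan 't1'
disable 't1'
drop 't1'
list
exit
EOF
```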
### Map/Reduce
- Assuming your username is xyz, we need to create that user on all the Hadoop nodes and a home directory for it on HDFS.
- Run the following on any node:

  ```
  sudo -u hdfs hdfs dfs -mkdir /user/xyz
  sudo -u hdfs hdfs dfs -chown xyz /user/xyz
  ```
- Create the new user on all three Hadoop nodes using the `adduser` command.
- Log in to the bcpc-vm2 (10.0.100.12) node and switch to the new user created in the previous step.
- Run `yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 1 100`

  Note: Replace the HDP version based on your installation. You may also need to make sure that the requested container size is within the YARN minimum container size (see yarn-site.xml):

  ```
  yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi -Dmapreduce.map.memory.mb=256 -Dmapreduce.reduce.memory.mb=512 -Dmapreduce.reduce.java.opts="-Xmx410m" -Dmapreduce.map.java.opts="-Xmx205m" -Dyarn.app.mapreduce.am.resource.mb=256 1 10
  ```
- If the previous step completes successfully, it verifies the YARN and MapReduce components (see the YARN CLI sketch below if the job stalls).
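If the example job appears to stall, the standard YARN CLI can show what is running and how an application finished (a sketch; the application id placeholder below is whatever the `yarn` commands report, not a value from this wiki):

```
# List applications currently known to the ResourceManager
yarn application -list

# Show the state and final status of a specific application
yarn application -status <application_id>
```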
### Hive
- Prepare the warehouse location for the ubuntu user:

  ```
  sudo -u hdfs hdfs dfs -mkdir /user/hive/warehouse/t1
  sudo -u hdfs hdfs dfs -chown ubuntu:hdfs /user/hive/warehouse/t1
  ```
- If you plan to use Hive, bring up the Hive shell on the bcpc-vm2 Hadoop node by running `hive`
- Create a table.

  `create table t1 (id int, data string);`
- Populate data.

  ```
  insert into table t1 values(1, "aaa111");
  insert into table t1 values(2, "bbb222");
  insert into table t1 values(3, "ccc333");
  ```
- Retrieve data.

  `select * from t1;`
- Describe the newly created table.

  `describe t1;`
- Trigger a Map/Reduce job.

  `select count(*) from t1;`
- Drop the newly created table.

  `drop table t1;`
- If these steps are successful, it verifies the Hive component.

You may have noticed that in some queries, such as those using aggregations, Hive runs a Map/Reduce job. Just as in the previous Map/Reduce example, Hive's Map/Reduce job may also stall. You can give Hive hints as to how much memory to request by passing command-line options:

```
hive -hiveconf mapreduce.map.memory.mb=256 -hiveconf mapreduce.reduce.memory.mb=512 -hiveconf mapreduce.reduce.java.opts="-Xmx410m" -hiveconf mapreduce.map.java.opts="-Xmx205m" -hiveconf yarn.app.mapreduce.am.resource.mb=256
```
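These checks can also be scripted rather than typed into the interactive shell; `hive -e` runs a single statement and `hive -f` runs a file of statements (a sketch; the file name is hypothetical and would contain the statements from the list above):

```
# Run a single statement non-interactively (lists tables in the default database)
hive -e 'show tables;'

# Run a whole file of HiveQL statements
hive -f hive_smoke_test.hql
```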
### Verify Spark
On a worker node:

```
cd /usr/spark/current
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar
```
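One way to pull the driver output when running in `yarn-cluster` mode is the YARN log aggregation CLI (a sketch; the application id comes from the `spark-submit` console output or from `yarn application -list`):

```
# Fetch the aggregated container logs for the finished Spark application
yarn logs -applicationId <application_id> | less
```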
Examine the YARN logs and see if you find output similar to this:

```
LogType:stdout
Log Upload Time:Tue Sep 13 17:17:44 -0400 2016
LogLength:22
Log Contents:
Pi is roughly 3.14482
End of LogType:stdout
```