# Create a Hadoop Cluster
Chef-bach can be used to create a Hadoop test cluster using VirtualBox virtual machines on a hypervisor host with enough resources.
The virtual cluster created by chef-bach consists of five nodes. One of the nodes acts as the bootstrap node and hosts a chef server. The other four nodes act as Hadoop nodes. Three of the four Hadoop nodes are master nodes, and the last node is a worker node.
Steps to create the test cluster are detailed below. This process has been tested on hypervisor hosts running Mac OS and Ubuntu.
- Install `curl` on the hypervisor host.
- Install `virtualbox` on the hypervisor host (virtualbox.org).
  - This will likely require `make` and `gcc` to already be installed.
  - `sudo apt-get install virtualbox`
- Install `vagrant` on the hypervisor host (vagrantup.com).
  - `sudo apt-get install vagrant`
  - It's recommended you manually install the latest version of vagrant from vagrantup.com. See the note below.
- Delete the default DHCP server built into `virtualbox`.

  ```
  $ vboxmanage list dhcpservers
  NetworkName:    HostInterfaceNetworking-vboxnet0
  IP:             192.168.56.100
  NetworkMask:    255.255.255.0
  lowerIPAddress: 192.168.56.101
  upperIPAddress: 192.168.56.254
  Enabled:        Yes

  $ vboxmanage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0

  # Run the above two commands with sudo as well in case the current user does not have access to view/edit the dhcpservers
  ```
- Run `sudo pkill -f VBox` on the hypervisor host.
- Clone the chef-bach repository onto the hypervisor host: `git clone https://github.com/bloomberg/chef-bach.git` (see the optional shortcut after this list).
- Rename the `chef-bach` directory to `chef-bcpc` on the hypervisor host.
- cd to the `chef-bcpc` directory on the hypervisor host.
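As an optional shortcut (not part of the original steps), the clone and rename can be combined, since `git clone` accepts a target directory name:

```
# Clone directly into a directory named chef-bcpc, avoiding the separate rename step
git clone https://github.com/bloomberg/chef-bach.git chef-bcpc
cd chef-bcpc
```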
Note: The vagrant release MUST support your current version of VirtualBox. You may need to downgrade VirtualBox to an earlier version if an older version of vagrant is used. For more information, check the vagrant release notes. You can download the latest version of vagrant here.
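Before continuing, it can help to confirm that the installed VirtualBox and vagrant versions are actually compatible. Both tools expose a standard version flag:

```
# Print the installed versions on the hypervisor host and compare against the vagrant release notes
vboxmanage --version
vagrant --version
```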
- Run the installation script under the tests directory: `./tests/automated_install.sh`
  - This will download all the required software, create the five-node cluster, and install all the HDP (Hortonworks Data Platform) Hadoop components. As you can imagine, this takes some time. Depending on the size of the hypervisor host, network bandwidth, etc., it can take around 2 to 4 hours to complete.
- Once the installation is complete, log in to the bootstrap node with `vagrant ssh`. You need to be in the `chef-bcpc` directory on the hypervisor.
- cd to the `chef-bcpc` directory and run the following set of commands twice in sequence:

  ```
  for vm in bcpc-vm{1..4}; do
    ./cluster-assign-roles.sh Test-Laptop hadoop $vm;
  done
  ```
- This completes the creation of the four Hadoop nodes.
  - `bcpc-vm1` (10.0.100.11) is a master node which hosts an HDFS NameNode, HBase HMaster, and MySQL server.
  - `bcpc-vm2` (10.0.100.12) is a master node which hosts a YARN ResourceManager, Hive HCatalog, and MySQL server.
  - `bcpc-vm3` (10.0.100.13) is a master node which hosts an Ambari Server.
  - `bcpc-vm4` (10.0.100.14) is a worker node which hosts an HDFS DataNode, HBase RegionServer, and YARN NodeManager.
- Passwords for various components, including those used to log in to the Hadoop nodes, can be retrieved with the following commands. All of these commands MUST be run from the `chef-bcpc` directory on the bootstrap host (`vagrant@bcpc-bootstrap:~/chef-bcpc$`):
  ```
  # SSH keys for various role accounts
  sudo knife data bag show configs Test-Laptop

  # The cobbler-root-password is also stored in Chef Vault.
  # Use this password to log in to the four Hadoop nodes as the "ubuntu" user.
  # ubuntu is also part of the sudoers list.
  sudo knife vault show os cobbler "root-password" --mode client
  ```
- Graphite (https://10.0.100.5:8888) collects various system and JMX stats from the nodes and Hadoop components.
- Zabbix (https://10.0.100.5:7777) handles metrics monitoring and alerting.
At this point we can start testing the various components. Since the cluster uses Kerberos authentication, we first need to create a Kerberos principal for the ubuntu user:
```
root@bcpc-bootstrap:/home/vagrant/chef-bcpc# kadmin.local
Authenticating as principal root/admin@<REALM> with password.
kadmin.local: add_principal ubuntu
WARNING: no policy specified for ubuntu@<REALM>; defaulting to no policy
Enter password for principal "ubuntu@<REALM>":
Re-enter password for principal "ubuntu@<REALM>":
Principal "ubuntu@<REALM>" created.
kadmin.local:
```

(`<REALM>` is the cluster's Kerberos realm.)
For the purposes of testing, make ubuntu a superuser:

```
$ sudo -u hbase hbase shell
hbase(main):001:0> grant 'ubuntu', 'RWCAX'
..skipping standard messages..
0 row(s) in 0.9490 seconds
```

The execute permission 'X' is unnecessary if users do not require HBase delegation tokens.
Once the cluster is created and set up, its health can be verified by examining the results of the automated smoke tests or by running manual tests. The following sections explain both approaches. The verification process requires a valid Kerberos ticket, which can be created by running `kinit`. `kinit` uses the logged-in user (`ubuntu` in this case) and prompts for the password to complete ticket creation.

```
kinit
```
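To confirm the ticket was actually granted, `klist` (the standard MIT Kerberos client tool, assumed to be available on the nodes) lists the cached credentials:

```
# After kinit succeeds, the ticket cache should show a ticket for the ubuntu principal
klist
```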
## Automated Cluster Health Verification (Smoke Tests)
Chef-Bach also runs smoke tests to verify cluster health. These smoke tests are scheduled to run every 10 minutes. To check the cluster's health through the smoke tests, follow the steps below.
The Oozie server is installed on the vm2 node. Make sure `OOZIE_URL` is set if you are running these steps from other nodes, i.e. the vm1 or vm3 nodes:

`export OOZIE_URL=http://f-bcpc-vm2:11000/oozie`

1. Run the following command to find the Job Id (Coordinator Id) of the `Oozie-Smoke-Test-Coordinator` jobs.

   `oozie jobs --jobtype=coordinator`
2. Use the Job Id from step 1 to find the workflow runs.

   `oozie job -info <JobId_FROM_STEP 1>`
3. Step 2 will list all the workflow runs; copy the latest Ext ID (workflow ID). It will look like `xxxxxxx-xxxxxxxxxxxxxx-oozie-oozie-W`, where `x` represents a random digit.
4. Executing the following command with the workflow ID will show the result of all the workflow actions.

   `oozie job -info <WORKFLOW_ID_FROM_STEP3>`
For the cluster to be considered healthy, all the tests should have succeeded.
## Manual Cluster Health Verification
This section explains the process to verify the cluster manually. Tests are performed on each Hadoop component to confirm its health.
### HDFS
- Log on to `bcpc-vm3`. From the hypervisor you can do `ssh ubuntu@10.0.100.13` using the cobbler_root password from above, or from the `chef-bcpc` directory on the bootstrap node issue `./nodessh.sh Test-Laptop 10.0.100.13`.
- Take note of the cobbler_root password that is printed if you used the second option to log in. This is the password you will need for sudo below.
- Run `sudo -u hdfs hdfs dfs -copyFromLocal /etc/passwd /passwd`
- Run `sudo -u hdfs hdfs dfs -cat /passwd`
- Run `sudo -u hdfs hdfs dfs -rm /passwd`
- If all of these are successful, the HDFS component is verified (a one-shot version of the same check is sketched below).
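If you prefer a single check, the three HDFS commands from the list above can be wrapped in a short script (a sketch; it assumes the same `hdfs` service user and paths as the steps above):

```
#!/bin/bash
# Minimal HDFS smoke check: copy a local file in, read it back, then remove it
set -e
sudo -u hdfs hdfs dfs -copyFromLocal /etc/passwd /passwd
sudo -u hdfs hdfs dfs -cat /passwd > /dev/null
sudo -u hdfs hdfs dfs -rm /passwd
echo "HDFS smoke check passed"
```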
### HBase
- Run `hbase shell`
- Under the hbase shell, run `create 't1','cf1'`
- Run `list`, which should display the newly created table in a list
- Run `put 't1','r1','cf1:c1','v1'`
- Run `scan 't1'`, which should display the row created in the previous step
- Run `disable 't1'`
- Run `drop 't1'`
- Run `list`, and it should display an empty list
- Run `exit`
- If all these steps are complete, the HBase component is verified along with ZooKeeper (a non-interactive version is sketched below)
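The same sequence can also be run non-interactively by piping the statements into `hbase shell` (a sketch reusing the exact commands from the list above; it assumes a valid Kerberos ticket for the current user):

```
# Non-interactive HBase smoke check
hbase shell <<'EOF'
create 't1','cf1'
put 't1','r1','cf1:c1','v1'
scan 't1'
disable 't1'
drop 't1'
list
exit
EOF
```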
### Map/Reduce
- Assuming your username is xyz, we need to create that user on all the Hadoop nodes and a home directory for it on HDFS.
- Run the following on any node:

  ```
  sudo -u hdfs hdfs dfs -mkdir /user/xyz
  sudo -u hdfs hdfs dfs -chown xyz /user/xyz
  ```
- Create the new user on all three Hadoop nodes using the `adduser` command.
- Log in to the bcpc-vm2 (10.0.100.12) node and switch to the new user created in the previous step.
- Run `yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 1 100`

  Note: Replace the HDP version based on your installation. You may also need to make sure that the requested container size is within the YARN minimum container size (see yarn-site.xml):

  ```
  yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi -Dmapreduce.map.memory.mb=256 -Dmapreduce.reduce.memory.mb=512 -Dmapreduce.reduce.java.opts="-Xmx410m" -Dmapreduce.map.java.opts="-Xmx205m" -Dyarn.app.mapreduce.am.resource.mb=256 1 10
  ```
- If the previous step completes successfully, it verifies the YARN and MapReduce components (see the YARN CLI sketch below if the job stalls).
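If the example job appears to stall, the standard YARN CLI can show what is running and how an application finished (a sketch; the application id placeholder below is whatever the `yarn` commands report, not a value from this wiki):

```
# List applications currently known to the ResourceManager
yarn application -list

# Show the state and final status of a specific application
yarn application -status <application_id>
```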
### Hive
- Prepare the warehouse location for the ubuntu user:

  ```
  sudo -u hdfs hdfs dfs -mkdir /user/hive/warehouse/t1
  sudo -u hdfs hdfs dfs -chown ubuntu:hdfs /user/hive/warehouse/t1
  ```
- If you plan to use Hive, bring up the Hive shell on the bcpc-vm2 Hadoop node by running `hive`
- Create a table.

  `create table t1 (id int, data string);`
- Populate data.

  ```
  insert into table t1 values(1, "aaa111");
  insert into table t1 values(2, "bbb222");
  insert into table t1 values(3, "ccc333");
  ```
- Retrieve data.

  `select * from t1;`
- Describe the newly created table.

  `describe t1;`
- Trigger a Map/Reduce job.

  `select count(*) from t1;`
- Drop the newly created table.

  `drop table t1;`
- If these steps are successful, it verifies the Hive component.

You may have noticed that in some queries, such as those using aggregations, Hive runs a Map/Reduce job. Just as in the previous Map/Reduce example, Hive's Map/Reduce job may also stall. You can give Hive hints as to how much memory to request by passing command-line options:

```
hive -hiveconf mapreduce.map.memory.mb=256 -hiveconf mapreduce.reduce.memory.mb=512 -hiveconf mapreduce.reduce.java.opts="-Xmx410m" -hiveconf mapreduce.map.java.opts="-Xmx205m" -hiveconf yarn.app.mapreduce.am.resource.mb=256
```
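These checks can also be scripted rather than typed into the interactive shell; `hive -e` runs a single statement and `hive -f` runs a file of statements (a sketch; the file name is hypothetical and would contain the statements from the list above):

```
# Run a single statement non-interactively (lists tables in the default database)
hive -e 'show tables;'

# Run a whole file of HiveQL statements
hive -f hive_smoke_test.hql
```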
### Verify Spark
On a worker node:

```
cd /usr/spark/current
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar
```
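One way to pull the driver output when running in `yarn-cluster` mode is the YARN log aggregation CLI (a sketch; the application id comes from the `spark-submit` console output or from `yarn application -list`):

```
# Fetch the aggregated container logs for the finished Spark application
yarn logs -applicationId <application_id> | less
```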
Examine the YARN logs and see if you find output similar to this:

```
LogType:stdout
Log Upload Time:Tue Sep 13 17:17:44 -0400 2016
LogLength:22
Log Contents:
Pi is roughly 3.14482
End of LogType:stdout
```