Configures and compiles nutch 2.1 and installs it in configured cloudera machine.
- nutch21 (default) - Installs nutch on cloudera
Create a role in "roles/nutch.rb":
name "nutch21"
description "Installs nutch 2.1 with all dependencies on cloudera 3"
default_attributes(
:java => {
:oracle => {
"accept_oracle_download_terms" => true
},
"remove_deprecated_packages" => false
}
)
override_attributes(
"cloudera" => {
"version" => 3,
"installyarn" => false
},
"java" => {
"install_flavor" => "oracle"
},
"ant" => {
"install_method" => "source"
}
)
run_list(
"recipe[hosts]",
"recipe[apt]",
"recipe[java]",
"recipe[ant]",
"recipe[cloudera]",
"recipe[cloudera::hbase]",
"recipe[nutch21]"
)
I've used the following Cheffile:
#!/usr/bin/env ruby
#^syntax detection
site 'http://community.opscode.com/api/v1'
cookbook "apt"
cookbook "java", "1.6.0"
cookbook "ant"
cookbook "sudo"
cookbook "vim"
cookbook "git"
cookbook "hosts"
cookbook "openssh"
cookbook "openssl"
cookbook "cloudera", :git => "git://github.com/timoschmidt/cloudera.git"
cookbook "nutch21", :git => "git://github.com/timoschmidt/nutch21-cookbook.git"
Create a Vagrant file (nutch currently not works with hbase 0.94 thats why we use cloudera 3 and lucid32):
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant::Config.run do |config|
config.vm.box = "lucid32"
config.vm.box_url = "http://files.vagrantup.com/lucid32.box"
config.vm.network :hostonly, "192.168.33.10"
config.vm.host_name = "localhost"
config.vm.provision :chef_solo do |chef|
chef.cookbooks_path = "cookbooks"
chef.roles_path = "roles"
chef.add_role "nutch"
end
# chef.validation_client_name = "ORGNAME-validator"
config.vm.customize ["modifyvm", :id,"--memory", "4096"]
end
java cookbook: https://github.com/opscode-cookbooks/java cloudera cookbook: https://github.com/timoschmidt/cloudera (https://github.com/danielpoe/cloudera)
- Ubuntu