Name: bman
Owner: Hortonworks Inc
Description: Bman - An Apache Hadoop cluster manager
Created: 2018-04-16 17:08:18.0
Updated: 2018-04-24 21:25:28.0
Pushed: 2018-04-24 21:25:27.0
Size: 219
Language: Python
The Bman cluster manager is named after Bheem, the legendary Indian warrior said to have the strength of ten thousand elephants. We hope this cluster manager is able to manage thousands of machines.
Bman is a Python tool that deploys Apache Hadoop tarballs to a cluster. Bman reads a set of configuration values from a YAML file called config.yaml. This configuration file describes the machines in the cluster as well as the Hadoop settings.
Bman requires ~/.config/bman/config.yaml on the bman host (more on this below).
bman developers test with CentOS 7; however, most Linux distributions should work well.
Kerberos packages must already be installed on the cluster nodes, e.g. with yum install -y krb5-libs rng-tools krb5-workstation. bman will not install Kerberos packages.
bman can be installed as a Python 3 package. Download a release package from GitHub and install it with pip, e.g.
pip3 install bman-0.1.tar.gz
bman is not available on pypi yet.
bman is driven by a YAML file called config.yaml. It is intended to be a self-documenting configuration file.
Copy the supplied config.yaml.template to ~/.config/bman/config.yaml. Edit config.yaml as appropriate for your cluster by defining the cluster nodes, the location of the Hadoop distribution tarball, the locations of the NameNode metadata and DataNode storage directories, and (optionally) any custom site settings for core-site.xml, hdfs-site.xml, etc.
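The copy step above can also be scripted. The following is a minimal sketch, not part of bman itself; the helper name install_config is hypothetical, and it assumes you do not want an already edited config.yaml to be overwritten.

```python
import shutil
from pathlib import Path


def install_config(template: Path, config_dir: Path) -> Path:
    """Copy the template to config_dir/config.yaml unless one already exists.

    install_config is a hypothetical helper for illustration; bman does not
    ship it.
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "config.yaml"
    if target.exists():
        # Keep the existing file so local edits are not lost.
        return target
    shutil.copyfile(template, target)
    return target
```

A typical call would be install_config(Path("config.yaml.template"), Path.home() / ".config" / "bman"), run from the directory that contains the template.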
bman is scriptable, e.g. the following shell script installs Apache Hadoop on a cluster. In scriptable mode, the ForceWipe property must be set to True in config.yaml.
#!/usr/bin/env bash
set -euo pipefail
bman prepare # Wipe existing data on cluster nodes.
bman deploy # Deploy packages and config files, and start all services.
As the script shows, cluster installation occurs in two steps:
prepare: The existing cluster data is wiped. Service users are recreated.
deploy: Hadoop config files are generated. The Hadoop distribution and config files are copied to all cluster nodes. If Kerberos is enabled, service principals and keytabs are created. The HDFS NameNode is also formatted at this step, 'tmp' directories are created, and (optionally) the Tez distribution is uploaded to the cluster. Finally, services are started.
Run bman without any parameters to launch the interactive shell.
$ bman
Press ctrl-D or type 'quit' to exit.
Type 'help' to get the list of commands.
Dev> prepare
Dev> deploy
Dev> stop all
bman can enable Hadoop security on the cluster. This requires additional settings in config.yaml (see the template for more details).
Additionally, the following four settings must be defined in CoreSiteSettings:
CoreSiteSettings:
  hadoop.security.authentication: 'kerberos'
  hadoop.security.authorization: 'true'
  hadoop.rpc.protection: 'authentication'
  hadoop.security.auth_to_local: |-
    RULE:[2:$1@$0](rm@.*REALM)s/.*/yarn/
    RULE:[2:$1@$0](nm@.*REALM)s/.*/yarn/
    RULE:[2:$1@$0](nn@.*REALM)s/.*/hdfs/
    RULE:[2:$1@$0](dn@.*REALM)s/.*/hdfs/
    RULE:[2:$1@$0](snn@.*REALM)s/.*/hdfs/
    RULE:[2:$1@$0](jn@.*REALM)s/.*/hdfs/
    RULE:[2:$1@$0](jhs@.*REALM)s/.*/mapred/
    DEFAULT
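To see what the auth_to_local rules above do, here is a rough Python sketch that maps a Kerberos principal to a local user. It is a simplified model for illustration, not Hadoop's or bman's implementation: it handles only the RULE:[n:format](regex)s/pattern/replacement/ form used above, and its DEFAULT handling ignores the default-realm check that Hadoop performs. The function name map_principal and the sample principals are made up.

```python
import re

RULES = """\
RULE:[2:$1@$0](rm@.*REALM)s/.*/yarn/
RULE:[2:$1@$0](nm@.*REALM)s/.*/yarn/
RULE:[2:$1@$0](nn@.*REALM)s/.*/hdfs/
RULE:[2:$1@$0](dn@.*REALM)s/.*/hdfs/
RULE:[2:$1@$0](snn@.*REALM)s/.*/hdfs/
RULE:[2:$1@$0](jn@.*REALM)s/.*/hdfs/
RULE:[2:$1@$0](jhs@.*REALM)s/.*/mapred/
DEFAULT"""

# Matches only the rule shape used in the settings above.
RULE_RE = re.compile(r"RULE:\[(\d+):([^\]]*)\]\(([^)]*)\)s/([^/]*)/([^/]*)/")


def map_principal(principal, rules=RULES):
    """Return the local user for a principal, or None if no rule applies."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    for line in rules.splitlines():
        line = line.strip()
        if line == "DEFAULT":
            # Simplification: take the first component and skip the
            # default-realm comparison that Hadoop applies here.
            return components[0]
        parsed = RULE_RE.fullmatch(line)
        if parsed is None:
            continue
        count, fmt, match, pattern, replacement = parsed.groups()
        if int(count) != len(components):
            continue
        # $0 is the realm; $1, $2, ... are the name components.
        base = fmt.replace("$0", realm)
        for index, component in enumerate(components, start=1):
            base = base.replace(f"${index}", component)
        if re.fullmatch(match, base):
            # Like sed without the g flag: replace the first match only.
            return re.sub(pattern, replacement, base, count=1)
    return None
```

With these rules, a NameNode principal such as nn/host1.example.com@REALM is first rewritten to nn@REALM and then substituted to the hdfs user, while a single-component principal such as alice@REALM falls through to DEFAULT.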
Hadoop security requires many other configuration settings, including principals, service keytabs and other NameNode/DataNode settings. bman will auto-generate sensible values for all of these.
It is assumed that you have installed the Kerberos client on all the cluster nodes and that all nodes have a valid /etc/krb5.conf file.
config.yaml has a settings section for each of the following Hadoop site files. The required values are noted below.
core-site.xml. The only required setting is fs.defaultFS.
hdfs-site.xml.
yarn-site.xml. If absent, YARN services will not be started. There is only one mandatory yarn-site.xml setting: yarn.resourcemanager.address.
mapred-site.xml.
tez-site.xml.
ozone-site.xml.
Settings such as OzoneEnabled and CblockCacheEnabled can be turned on if you want to run Ozone or cblocks. Once again, please take a look at config.yaml for details.
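The required values listed above can be checked before a deployment. The sketch below works on an already parsed config.yaml represented as a plain dict. CoreSiteSettings is the section name used earlier in this README; YarnSiteSettings is an assumed name chosen by analogy, so check config.yaml.template for the real one. The function check_required_settings is hypothetical and not part of bman.

```python
def check_required_settings(config: dict) -> list:
    """Return a list of human-readable problems; empty means OK."""
    problems = []

    core = config.get("CoreSiteSettings") or {}
    if "fs.defaultFS" not in core:
        problems.append("CoreSiteSettings: fs.defaultFS is required")

    # "YarnSiteSettings" is an assumed section name (see the lead-in).
    # A missing section is fine: YARN services are simply not started.
    yarn = config.get("YarnSiteSettings")
    if yarn is not None and "yarn.resourcemanager.address" not in yarn:
        problems.append(
            "YarnSiteSettings: yarn.resourcemanager.address is required"
        )

    return problems
```

Returning a list rather than raising on the first error lets a caller report every missing value in one pass.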
Clone the source repository. Install venv (virtual env) and dependencies from a POSIX compatible shell like bash (zsh should also work). venv does not support alternative shells like fish.
pip3 install virtualenv
virtualenv -p $(which python3) venv
source venv/bin/activate
pip install -r requirements.txt
brew install https://raw.githubusercontent.com/kadwanev/bigboybrew/master/Library/Formula/sshpass.rb
Now you should have a working virtual env.
(venv) username@hostname /bman$ fab --list
Before using bman, activate the Python venv with:
source venv/bin/activate
In developer mode, start the bman shell with `python -m bman`.
(venv) username@hostname /bman$ python -m bman
Press ctrl-D or type 'quit' to exit.
Type 'help' to get the list of commands.
Dev >
bman.py - Trivial wrapper module to launch the shell.
bman/bman.py - A simple shell loop. It reads commands from the user and dispatches them to bman_commands.py.
bman/bman_commands.py - The command parser that dispatches to execution methods.
bman/remote_tasks.py - Most commands that work against the cluster are located in this file.
bman/deployment_manager.py - Understands the steps necessary to deploy different Apache Hadoop configurations (with/without NameNode HA, federation, and combinations thereof).
bman/local_tasks.py - Contains routines to generate the config and private key, routines to copy the files to the right location, etc.
bman/bman_config.py - Reads the YAML file and stores each key in a map. These keys can be accessed by calling cluster.get_config. To add new keys, define a key name at the top of the file and use the appropriate key-reading function in the cluster_constructor function.
utils.py - A few utility functions used by both local_tasks.py and remote_tasks.py.
The configuration keys can be accessed anywhere using cluster.get_config. You can see many examples in the code.
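The key-handling pattern described for bman_config.py can be pictured roughly as follows. This is an illustrative sketch only and does not reproduce bman's actual code: the Cluster class, the read_string and read_bool helpers, the get_config signature, and the ClusterName key are all assumptions. Only the names cluster_constructor, get_config and ForceWipe come from this README.

```python
# Key names are defined once at the top of the module.
KEY_FORCE_WIPE = "ForceWipe"      # property named in this README
KEY_CLUSTER_NAME = "ClusterName"  # invented key, for illustration only


class Cluster:
    """Holds the parsed configuration values in a map."""

    def __init__(self):
        self._config = {}

    def get_config(self, key):
        # Assumed signature: look up a single value by key name.
        return self._config[key]


def read_string(values, key, default=None):
    """Read a string value, failing loudly if it is missing."""
    value = values.get(key, default)
    if value is None:
        raise ValueError(f"missing required key: {key}")
    return str(value)


def read_bool(values, key, default=False):
    """Read a boolean, accepting YAML booleans or common string forms."""
    value = values.get(key, default)
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "yes", "1")


def cluster_constructor(values):
    """Build a Cluster from the dict produced by parsing the YAML file."""
    cluster = Cluster()
    cluster._config[KEY_CLUSTER_NAME] = read_string(values, KEY_CLUSTER_NAME)
    cluster._config[KEY_FORCE_WIPE] = read_bool(values, KEY_FORCE_WIPE)
    return cluster
```

Under these assumptions, adding a key means adding one constant and one line in cluster_constructor that picks the reader matching the value's type.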
If you make source code changes and wish to build a new Python package for testing or release, run the following commands. If you are building a new release, you must update the package version in setup.py before building the package.
rm -fr bman.egg-info/ dist/
python setup.py sdist
pip3 uninstall -y bman
pip3 install $PWD/dist/bman-*.tar.gz
The first step is important to ensure your changes are picked up. See Python Packaging Pitfalls.
bman includes contributions from @anuengineer, @arp7, @elek, @mukul1987, @nandakumar131, @chen-liang and @ajayydv.
Apache®, Apache Hadoop, Hadoop®, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.