datafibers/lab_env

Build Status

Overview

This is a very lightweight Vagrant image for a Hadoop big data lab. The total memory needed is only 4 GB (about 450 MB is left after all services are started). Download and setup take around 30 minutes, depending on your download speed. The daily build status above indicates whether the software download URLs are live or broken.

Software Installed

This distribution is compatible with HDP 2.6.4, except that Hive and Hadoop are upgraded to stable versions.

| Hadooper | Stream | Visualization | Utility |
|---|---|---|---|
| hadoop-2.7.7 | flink-1.5.0 | grafana-5.1.3 | git |
| hive-1.2.2 | spark-2.3.3 | zeppelin-0.8.1 | mysql |
| hive-2.3.6 | confluent-4.1.1 | | maven |
| hbase-1.3.6 | | | dos2unix |
| phoenix-4.13.2 | | | aria2 |

Quick Setup

  1. Install Oracle VirtualBox on the host operating system.
  2. Install Vagrant on the host operating system.
  3. Go to a suitable folder and clone this repository: git clone https://github.com/datafibers/lab_env.git
  4. To install a specific configuration from a branch, run git checkout <branch_name>
  5. To customize the install configuration, modify conf/install_config.sh or install_version.sh or this section.
  6. To install, run cd lab_env && vagrant up
  7. After installation, run ops format inside the VM to format Hadoop. This is needed only the first time.
  8. To update the default settings, run cd lab_env && git pull && vagrant provision
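
Before starting the steps above, it can help to confirm that the host tools are actually on the PATH. The following is a minimal sketch, not part of this repository: the helper name missing_tools is hypothetical, and it assumes the executables are named git, vagrant and VBoxManage, which is the usual case but may differ on your system.

```shell
#!/bin/sh
# Print every command name given as an argument that is not found on PATH.
# (missing_tools is a hypothetical helper, not shipped with lab_env.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names for the tools required by steps 1-3.
missing=$(missing_tools git vagrant VBoxManage)
if [ -n "$missing" ]; then
  echo "Install these before continuing:" $missing
else
  echo "All prerequisites found; continue with the git clone step."
fi
```

The check only reports whether the commands exist; it does not verify versions or that VirtualBox can start a VM.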

Operation Command Reference (Run inside VM)

  • Enter ops to get full command help
  • Enter ops start all to start all services
  • Enter ops status to check the status of each service, as follows
vagrant@vagrant:~$ ops status
****************Starting Operations****************
[INFO]   [ZooKeeper]          is running at [3232]
[INFO]   [Kafka]              is running at [3302]
[INFO]   [Kafka_Connect]      is running at [3464]
[INFO]   [Schema_Registry]    is running at [3387]
[INFO]   [Flink_JobMgr]       is running at [4298]
[INFO]   [Flink_TaskMgr]      is running at [4644]
[INFO]   [Spark_Master]       is running at [4702]
[INFO]   [Spark_Worker]       is running at [4926]
[INFO]   [Zeppelin_Server]    is running at [5060]
[INFO]   [HBase_Master]       is running at [3658]
[INFO]   [HBase_Region]       is running at [3777]
[INFO]   [Hadoop_NameNode]    is running at [2131]
[INFO]   [Hadoop_DataNode]    is running at [2251]
[INFO]   [Yarn_ResourceMgr]   is running at [2615]
[INFO]   [Yarn_NodeMgr]       is running at [2737]
[INFO]   [HiveServer2]        is running at [2953]
[INFO]   [HiveMetaStore]      is running at [2952]
[INFO]   [Hive2Server2]       is running at [2954]
[INFO]   [Hive2MetaStore]     is running at [2955]
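
The status listing above is regular enough to check from a script. The sketch below is an illustration under stated assumptions: it relies only on the "is running at" line format shown above, and the helper names running_services and not_running are hypothetical. How ops status reports a stopped service is not shown here, so the sketch simply treats any expected name without a matching "is running at" line as not running.

```shell
#!/bin/sh
# Print "<service> <pid>" for every status line that reports a running service.
running_services() {
  sed -n 's/^\[INFO\][[:space:]]*\[\([^]]*\)\][[:space:]]*is running at \[\([0-9][0-9]*\)\].*$/\1 \2/p'
}

# Read status text on stdin and print each service name given as an
# argument that has no "is running at" line.
not_running() {
  running=$(running_services | cut -d' ' -f1)
  for svc in "$@"; do
    printf '%s\n' "$running" | grep -qxF "$svc" || printf '%s\n' "$svc"
  done
}

# Demonstration on two lines copied from the listing above.
not_running ZooKeeper Kafka HBase_Master <<'EOF'
[INFO]   [ZooKeeper]          is running at [3232]
[INFO]   [Kafka]              is running at [3302]
EOF
```

Inside the VM, the same function could be fed live output, for example ops status | not_running ZooKeeper Kafka.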

Tool Command Reference (Run inside VM)

Vagrant Command Reference (Run outside VM)

| Purpose | Command |
|---|---|
| Start the VM / install the image | vagrant up |
| Stop the VM | vagrant halt |
| Update the VM | git pull && vagrant provision |
| Suspend (hibernate) the VM | vagrant suspend |
| Wake up the VM | vagrant resume |
| Restart the VM | vagrant reload |

Customization

  • To customize VM memory, either modify this line before installing or adjust the memory setting in VirtualBox once the installation is done.
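
For the second option, the memory setting can also be changed from the command line with VBoxManage modifyvm instead of the VirtualBox GUI. This is a rough sketch under assumptions: set_vm_memory is a hypothetical helper, lab_env_default is a placeholder VM name (list the real names with VBoxManage list vms), and the VM should be powered off first, for example with vagrant halt. By default the helper only prints the command; set RUN=1 to execute it.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the VBoxManage call that sets VM memory.
# Usage: set_vm_memory <vm-name> <memory-in-MB>
set_vm_memory() {
  vm=$1
  mb=$2
  # Reject empty or non-numeric memory values.
  case $mb in
    ''|*[!0-9]*) echo "memory must be a whole number of MB" >&2; return 1 ;;
  esac
  if [ "${RUN:-0}" = "1" ]; then
    VBoxManage modifyvm "$vm" --memory "$mb"
  else
    echo "VBoxManage modifyvm \"$vm\" --memory $mb"
  fi
}

# "lab_env_default" is a hypothetical VM name used for illustration only.
set_vm_memory lab_env_default 4096
```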

Known Issues

  • If startup prompts for a password, set up passwordless SSH as follows.
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
  • To re-associate Vagrant and VirtualBox, see here.
  • When vagrant provision has SSH authentication issues, add the following to the Vagrantfile.
config.ssh.username = "vagrant"  
config.ssh.password = "vagrant"  
config.ssh.insert_key = false
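
For the passwordless SSH fix in the first item above, note that appending the public key with cat and >> adds a duplicate entry each time the command is repeated. The following sketch shows one way to append the key only when it is missing. The helper name add_key is hypothetical, and the mode 600 it applies (owner read/write only) is an assumption chosen for illustration. The demonstration works on throwaway files, so nothing under ~/.ssh is modified.

```shell
#!/bin/sh
# Append the public key in <pubfile> to <authfile> unless it is already there.
# Usage: add_key <pubfile> <authfile>
add_key() {
  pub=$1
  auth=$2
  touch "$auth"
  # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}

# Demonstration on temporary files with a placeholder (not real) key.
tmp=$(mktemp -d)
echo "ssh-rsa AAAAexample demo@host" > "$tmp/id_rsa.pub"
add_key "$tmp/id_rsa.pub" "$tmp/authorized_keys"
add_key "$tmp/id_rsa.pub" "$tmp/authorized_keys"   # second run adds nothing
wc -l < "$tmp/authorized_keys"                      # line count stays at 1
rm -rf "$tmp"
```

In real use the call would be add_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys, after generating the key with the ssh-keygen command shown above.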