BIG DATA LAB

S.No    Date    Name of the Experiment    Page No.    Marks    Signature

1.    HADOOP INSTALLATION    1
2.    HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS    11
3.    IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE    17
4.    IMPLEMENTATION OF WORD COUNT PROGRAMS USING MAP REDUCE    37
5.    INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES    49
6.    INSTALLATION OF HBASE, INSTALLING THRIFT ALONG WITH EXAMPLES    75
7.    IMPORTING AND EXPORTING DATA FROM VARIOUS DATABASES    85
Ex. No:1
HADOOP INSTALLATION
Date:

AIM
To download and install Hadoop and understand the different Hadoop modes, startup scripts, and configuration files.

PROCEDURE:
Step-by-step Hadoop 2.8.0 installation on Windows 10.

Prerequisites: the following software should be downloaded before installing Hadoop 2.8.0 on 64-bit Windows 10.

1) Hadoop 2.8.0
(Link: [Link] OR [Link])
2) Java JDK 1.8.0
(Link: [Link])

Set up:
1) Check whether Java 1.8.0 is already installed on your system; use "javac -version" to check the Java version.

2) If Java is not installed on your system, first install Java under "C:\JAVA".

3) Extract the Hadoop archive and place the contents under "C:\Hadoop-2.8.0".

4) Set the HADOOP_HOME environment variable on Windows 10 (see Steps 1, 2, 3 and 4 below).

5) Set the JAVA_HOME environment variable on Windows 10 (see Steps 1, 2, 3 and 4 below).

6) Next, add the Hadoop bin directory path and the Java bin directory path to the Path variable.
Configuration
a) Edit the file C:/Hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML paragraph below and save the file.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

b) Rename "mapred-site.xml.template" to "mapred-site.xml", edit the file C:/Hadoop-2.8.0/etc/hadoop/mapred-site.xml, paste the XML paragraph below and save the file.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

c) Create a folder "data" under "C:\Hadoop-2.8.0"

1) Create a folder "datanode" under "C:\Hadoop-2.8.0\data"
2) Create a folder "namenode" under "C:\Hadoop-2.8.0\data"

d) Edit the file C:\Hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste the XML paragraph below and save the file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-2.8.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-2.8.0\data\datanode</value>
</property>
</configuration>

e) Edit the file C:/Hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML paragraph below and save the file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

f) Edit the file C:/Hadoop-2.8.0/etc/hadoop/hadoop-env.cmd and replace the line set "JAVA_HOME=%JAVA_HOME%" with set "JAVA_HOME=C:\Java" (where C:\Java is the path to the JDK 1.8.0 installation).

Hadoop Configuration
7) Download the Hadoop configuration zip file.
(Link: [Link])

8) Delete the bin folder under C:\Hadoop-2.8.0 and replace it with the bin folder from the file just downloaded.

9) Open cmd and type the command "hdfs namenode -format". This formats the HDFS namenode.

Testing
10) Open cmd, change directory to "C:\Hadoop-2.8.0\sbin" and type "start-all.cmd" to start Hadoop.

11) Make sure these services are running:

a) Hadoop Namenode
b) Hadoop Datanode
c) YARN Resource Manager
d) YARN Node Manager
12) Open: [Link]
13) Open: [Link]

RESULT
Thus, Hadoop was downloaded and installed, and the different Hadoop modes, startup scripts and configuration files were understood and executed successfully.
Ex. No: 2
HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS

Date:

AIM:
To write a program to implement file management tasks in Hadoop.

PROGRAM :
Implement the following file management tasks in Hadoop:
i. Adding files and directories
ii. Retrieving files
iii. Deleting files
Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your login user name. This directory isn't automatically created for you, though, so let's create it with the mkdir command. For the purpose of illustration, we use chuck. You should substitute your user name in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put [Link] /user/chuck

Retrieving Files from HDFS

The Hadoop command get copies files from HDFS back to the local filesystem, and cat displays a file's contents. To view the file, we can run the following command:
hadoop fs -cat [Link]
Deleting files from HDFS
hadoop fs -rm [Link]

• The command for creating a directory in HDFS is "hdfs dfs -mkdir /lendicse".
• Adding a directory is done through the command "hdfs dfs -put lendi_english /".
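
The same tasks can also be performed programmatically through the HDFS Java API. The sketch below is only an illustration and makes a few assumptions: the class name HdfsFileTasks and the file names used here are placeholders, and the Hadoop client libraries configured in Experiment 1 are assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileTasks {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // i. Adding files and directories (like hadoop fs -mkdir / -put).
        fs.mkdirs(new Path("/user/chuck"));
        fs.copyFromLocalFile(new Path("example.txt"), new Path("/user/chuck/example.txt"));

        // ii. Retrieving files (like hadoop fs -get).
        fs.copyToLocalFile(new Path("/user/chuck/example.txt"), new Path("example_copy.txt"));

        // iii. Deleting files (like hadoop fs -rm); 'false' means non-recursive.
        fs.delete(new Path("/user/chuck/example.txt"), false);

        fs.close();
    }
}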

RESULT

Thus, the program to implement file management tasks in Hadoop was executed and verified successfully.
Ex. No: 3
IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
Date:

AIM:
To develop a MapReduce program to implement matrix multiplication.

Procedure:
In mathematics, matrix multiplication or the matrix product is a binary operation that produces a
matrix from two matrices. The definition is motivated by linear equations and linear transformations on
vectors, which have numerous applications in applied mathematics, physics, and engineering. In more
detail, if A is an n × m matrix and B is an m × p matrix, their matrix product AB is an n × p matrix, in
which the m entries across a row of A are multiplied with the m entries down a column of B and
summed to produce an entry of AB. When two linear transformations are represented by matrices, then
the matrix product represents the composition of the two transformations.

• Create two files M1 and M2 and put the matrix values in them (separate columns with spaces and rows with a line break).

For this example the matrices are:

M1 (2 x 3):
1 2 3
4 5 6

M2 (3 x 2):
7 8
9 10
11 12
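
As a quick hand check of what the MapReduce job should produce: multiplying the 2 x 3 matrix M1 by the 3 x 2 matrix M2 gives a 2 x 2 result, where each entry is the sum of products of a row of M1 with a column of M2. For example, the (1,1) entry is 1*7 + 2*9 + 3*11 = 58, and the full product is

58 64
139 154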

Put the above files into HDFS at the location /user/cloudera/matrices/

hdfs dfs -mkdir /user/cloudera/matrices
hdfs dfs -put /path/to/M1 /user/cloudera/matrices/
hdfs dfs -put /path/to/M2 /user/cloudera/matrices/
Algorithm for the Map Function.

a. For each element mij of M, produce (key, value) pairs ((i,k), (M, j, mij)) for k = 1, 2, 3, ... up to the number of columns of N.
b. For each element njk of N, produce (key, value) pairs ((i,k), (N, j, njk)) for i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs, in which each key (i,k) has a list with the values (M, j, mij) and (N, j, njk) for all possible values of j.

Algorithm for the Reduce Function.

d. For each key (i,k):
e. Sort the values beginning with M by j into listM, and the values beginning with N by j into listN; multiply mij and njk for the j-th value of each list.
f. Sum up the products and return ((i,k), sum over j of mij x njk).
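
To make the map step concrete for the 2 x 3 and 3 x 2 example above: the element m11 = 1 is emitted under keys (1,1) and (1,2) with value (M, 1, 1), and the element n11 = 7 is emitted under keys (1,1) and (2,1) with value (N, 1, 7); the reducer for key (1,1) then pairs the values with matching j and sums the products. Note also that the Java program listed below reads its input as comma-separated (row, column, value) triples, one matrix entry per line, rather than as the space-separated grid shown earlier, so prepare the input files in that form before running it.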
Step 1. Download the Hadoop jar files from these links.
Download the Hadoop Common jar file: [Link]
$ wget [Link] -O [Link]
Download the Hadoop MapReduce jar file: [Link]
$ wget [Link] -O [Link]

Step 2. Creating the Mapper file for matrix multiplication (Multiply.java).

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

// Element carries one matrix entry together with a tag (0 = matrix M, 1 = matrix N)
// and its join index j, so the reducer can pair up the right entries.
class Element implements Writable {
    int tag;
    int index;
    double value;

    Element() {
        tag = 0;
        index = 0;
        value = 0.0;
    }

    Element(int tag, int index, double value) {
        this.tag = tag;
        this.index = index;
        this.value = value;
    }

    @Override
    public void readFields(DataInput input) throws IOException {
        tag = input.readInt();
        index = input.readInt();
        value = input.readDouble();
    }

    @Override
    public void write(DataOutput output) throws IOException {
        output.writeInt(tag);
        output.writeInt(index);
        output.writeDouble(value);
    }
}
// Pair (i, k) is the output key: the row index of M and the column index of N.
class Pair implements WritableComparable<Pair> {
    int i;
    int j;

    Pair() {
        i = 0;
        j = 0;
    }

    Pair(int i, int j) {
        this.i = i;
        this.j = j;
    }

    @Override
    public void readFields(DataInput input) throws IOException {
        i = input.readInt();
        j = input.readInt();
    }

    @Override
    public void write(DataOutput output) throws IOException {
        output.writeInt(i);
        output.writeInt(j);
    }

    @Override
    public int compareTo(Pair compare) {
        if (i > compare.i) {
            return 1;
        } else if (i < compare.i) {
            return -1;
        } else {
            if (j > compare.j) {
                return 1;
            } else if (j < compare.j) {
                return -1;
            }
        }
        return 0;
    }

    @Override
    public String toString() {
        return i + " " + j + " ";
    }
}
public class Multiply {
    // Mapper for matrix M: each input line is "i,j,value".
    // Emits key = j (the join index) and value = (tag 0, row i, mij).
    public static class MatriceMapperM extends Mapper<Object, Text, IntWritable, Element> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String readLine = value.toString();
            String[] stringTokens = readLine.split(",");
            int index = Integer.parseInt(stringTokens[0]);
            double elementValue = Double.parseDouble(stringTokens[2]);
            Element e = new Element(0, index, elementValue);
            IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[1]));
            context.write(keyValue, e);
        }
    }

    // Mapper for matrix N: each input line is "j,k,value".
    // Emits key = j (the join index) and value = (tag 1, column k, njk).
    public static class MatriceMapperN extends Mapper<Object, Text, IntWritable, Element> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String readLine = value.toString();
            String[] stringTokens = readLine.split(",");
            int index = Integer.parseInt(stringTokens[1]);
            double elementValue = Double.parseDouble(stringTokens[2]);
            Element e = new Element(1, index, elementValue);
            IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[0]));
            context.write(keyValue, e);
        }
    }

    // First reducer: for every join index, multiply each element of M with each
    // element of N and emit ((i,k), mij * njk) as a partial product.
    public static class ReducerMxN extends Reducer<IntWritable, Element, Pair, DoubleWritable> {
        @Override
        public void reduce(IntWritable key, Iterable<Element> values, Context context)
                throws IOException, InterruptedException {
            ArrayList<Element> M = new ArrayList<Element>();
            ArrayList<Element> N = new ArrayList<Element>();
            Configuration conf = context.getConfiguration();
            for (Element element : values) {
                // Hadoop reuses the value object, so copy it before storing it.
                Element tempElement = ReflectionUtils.newInstance(Element.class, conf);
                ReflectionUtils.copy(conf, element, tempElement);
                if (tempElement.tag == 0) {
                    M.add(tempElement);
                } else if (tempElement.tag == 1) {
                    N.add(tempElement);
                }
            }
            for (int i = 0; i < M.size(); i++) {
                for (int j = 0; j < N.size(); j++) {
                    Pair p = new Pair(M.get(i).index, N.get(j).index);
                    double multiplyOutput = M.get(i).value * N.get(j).value;
                    context.write(p, new DoubleWritable(multiplyOutput));
                }
            }
        }
    }
    // Second-job mapper: reads the intermediate "i k value" lines back in.
    public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String readLine = value.toString();
            String[] pairValue = readLine.split(" ");
            Pair p = new Pair(Integer.parseInt(pairValue[0]), Integer.parseInt(pairValue[1]));
            DoubleWritable val = new DoubleWritable(Double.parseDouble(pairValue[2]));
            context.write(p, val);
        }
    }

    // Second-job reducer: sums all partial products for each (i,k) cell.
    public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair, DoubleWritable> {
        @Override
        public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0.0;
            for (DoubleWritable value : values) {
                sum += value.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Job 1: join M and N on the shared index and emit partial products.
        Job job = Job.getInstance();
        job.setJobName("MapIntermediate");
        job.setJarByClass(Multiply.class);
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, MatriceMapperM.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, MatriceMapperN.class);
        job.setReducerClass(ReducerMxN.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Element.class);
        job.setOutputKeyClass(Pair.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.waitForCompletion(true);

        // Job 2: group the partial products by (i,k) and sum them.
        Job job2 = Job.getInstance();
        job2.setJobName("MapFinalOutput");
        job2.setJarByClass(Multiply.class);
        job2.setMapperClass(MapMxN.class);
        job2.setReducerClass(ReduceMxN.class);
        job2.setMapOutputKeyClass(Pair.class);
        job2.setMapOutputValueClass(DoubleWritable.class);
        job2.setOutputKeyClass(Pair.class);
        job2.setOutputValueClass(DoubleWritable.class);
        job2.setInputFormatClass(TextInputFormat.class);
        job2.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job2, new Path(args[2]));
        FileOutputFormat.setOutputPath(job2, new Path(args[3]));
        job2.waitForCompletion(true);
    }
}

Step 3. Compiling the program in a particular folder named operation.

#!/bin/bash
rm -rf Multiply.jar classes
module load hadoop/2.6.0
mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath` Multiply.java
jar cf Multiply.jar -C classes .
echo "end"

Step 4. Running the program in a particular folder named operation.

export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-configure.sh
start-dfs.sh
start-yarn.sh
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -put M-matrix-large.txt /user/$USER/M-matrix-large.txt
hdfs dfs -put N-matrix-large.txt /user/$USER/N-matrix-large.txt
hadoop jar Multiply.jar Multiply /user/$USER/M-matrix-large.txt /user/$USER/N-matrix-large.txt /user/$USER/intermediate /user/$USER/output
rm -rf output-distr
mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr
Output:

module load hadoop/2.6.0
rm -rf output intermediate
hadoop --config $HOME jar Multiply.jar Multiply M-matrix-large.txt N-matrix-large.txt intermediate output
stop-yarn.sh
stop-dfs.sh
myhadoop-cleanup.sh
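
If the same jobs are run on the small 2 x 3 and 3 x 2 example matrices shown earlier instead of the large input files, the final output directory should contain one line per cell of the product, i.e. the values 58, 64, 139 and 154 keyed by their row and column indices, matching the hand-computed product above.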

RESULT

Thus, the program to implement matrix multiplication with Hadoop MapReduce was written, executed and verified successfully.
Ex. No: 4
IMPLEMENTATION OF WORD COUNT PROGRAMS USING MAP REDUCE
Date:

AIM:
To write a MapReduce application for word counting and run it on a Hadoop cluster.

PROGRAM
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount
{
public static class Map extends Mapper<LongWritable,Text,Text,IntWritable> {
public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
context.write(value, new IntWritable(1));
}
}}
public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable x : values)
{
sum += x.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);
//Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//deleting the output path automatically from hdfs so that we don't have to delete it explicitly
FileSystem.get(conf).delete(outputPath);
//exiting the job only if the flag value becomes false
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
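
As an optional design note (not part of the listing above): because the Reduce class simply sums integers, it can also be registered as a combiner so that partial counts are merged on the map side before the shuffle, by adding the single driver line job.setCombinerClass(Reduce.class); alongside the other job.set...() calls.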
The entire MapReduce program can be fundamentally divided
into three parts:
• Mapper Phase Code
• Reducer Phase Code
• Driver Code
We will understand the code for each of these three parts
sequentially.

Steps to execute the word count job:

1. Create an input directory in HDFS.

hadoop fs -mkdir /input_dir

2. Copy the input text file named input_file.txt into the input directory (input_dir) of HDFS.

hadoop fs -put C:/input_file.txt /input_dir

3. Verify that input_file.txt is available in the input directory (input_dir).

hadoop fs -ls /input_dir/

Mapper code:
public static class Map extends
Mapper<LongWritable,Text,Text,IntWritable> {
public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
context.write(value, new IntWritable(1));
}
• We have created a class Map that extends the class Mapper which is already defined in the MapReduce framework.
• We define the data types of the input and output key/value pair after the class declaration using angle brackets.
• Both the input and the output of the Mapper is a key/value pair.
• Input:
◦ The key is nothing but the offset of each line in the text file: LongWritable
◦ The value is each individual line: Text
• Output:
◦ The key is the tokenized words: Text
◦ We have the hardcoded value in our case which is 1: IntWritable
◦ Example – Dear 1, Bear 1, etc.
• We have written Java code where we have tokenized each word and assigned it a hardcoded value equal to 1.
Reducer Code:
public static class Reduce extends
Reducer<Text,IntWritable,Text,IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable x : values)
{
sum += x.get();
}
context.write(key, new IntWritable(sum));
}
}

4. Verify the content of the copied file.

hadoop dfs -cat /input_dir/input_file.txt
• We have created a class Reduce which extends the class Reducer, like that of Mapper.
• We define the data types of the input and output key/value pair after the class declaration using angle brackets, as done for Mapper.
• Both the input and the output of the Reducer is a key/value pair.
• Input:
◦ The key is nothing but those unique words which have been generated after the sorting and shuffling phase: Text
◦ The value is a list of integers corresponding to each key: IntWritable
◦ Example – Bear, [1, 1], etc.
• Output:
◦ The key is all the unique words present in the input text file: Text
◦ The value is the number of occurrences of each of the unique words: IntWritable
◦ Example – Bear, 2; Car, 3, etc.
• We have aggregated the values present in each of the lists corresponding to each key and produced the final answer.
• In general, a single reducer is created for each of the unique words, but you can specify the number of reducers in mapred-site.xml.
Driver Code:
Configuration conf = new Configuration();
Job job = new Job(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);
//Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

5. Run the word count jar and also provide the input and output directories.

hadoop jar C:/[Link] wordcount /input_dir /output_dir

• In the driver class, we set the configuration of our MapReduce job to run in Hadoop.
• We specify the name of the job, and the data types of the input/output of the mapper and reducer.
• We also specify the names of the mapper and reducer classes.
• The path of the input and output folder is also specified.
• The method setInputFormatClass() is used for specifying how a Mapper will read the input data, or what the unit of work will be. Here, we have chosen TextInputFormat so that a single line is read by the mapper at a time from the input text file.
• The main() method is the entry point for the driver. In this method, we instantiate a new Configuration object for the job.
Run the MapReduce code:
The command for running a MapReduce code is:
hadoop jar [Link] WordCount /sample/input /sample/output
6. Verify the content of the generated output file.

hadoop dfs -cat /output_dir/*
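
For illustration, if input_file.txt contained the single line "Dear Bear River Car Car River Deer Car Bear", the generated output file would contain:

Bear 2
Car 3
Dear 1
Deer 1
River 2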

Some other useful commands

1) To leave safe mode

hadoop dfsadmin -safemode leave

2) To delete a file from an HDFS directory

hadoop fs -rm -r /input_dir/input_file.txt

3) To delete a directory from HDFS

hadoop fs -rm -r /input_dir
46
Ex. No: 5
INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES
Date:

AIM:

To install Hive along with practice examples.

PROCEDURE:
Step 1: Verifying JAVA Installation
Java must be installed on your system before installing Hive. Let us verify java installation using the
following command:
$ java -version
If Java is already installed on your system, you get to see the following response:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If java is not installed in your system, then follow the steps given below for installing java.
Installing Java
Step I:
Download java (JDK <latest version> - [Link]) by visiting the following
link [Link]
Then [Link] will be downloaded onto your system.
Step II:
Generally you will find the downloaded java file in the Downloads folder. Verify it and extract the jdk-
[Link] file using the following commands.
$ cd Downloads/
$ ls
[Link]
$ tar zxf [Link]
$ ls
jdk1.7.0_71 [Link]
Step III:

To make java available to all the users, you have to move it to the location “/usr/local/”. Open root, and
type the following commands.
$ su
password:
# mv jdk1.7.0_71 /usr/local/
# exit
Step IV:
For setting up PATH and JAVA_HOME variables, add the following commands to ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
Now apply all the changes into the current running system.
$ source ~/.bashrc
Step V:
Use the following commands to configure java alternatives:
# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2

# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2

# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2

# alternatives --set java /usr/local/java/bin/java

# alternatives --set javac /usr/local/java/bin/javac

# alternatives --set jar /usr/local/java/bin/jar


Now verify the installation using the command java -version from the terminal as explained above.
Step 2: Verifying Hadoop Installation
Hadoop must be installed on your system before installing Hive. Let us verify the Hadoop installation
using the following command:
$ hadoop version
If Hadoop is already installed on your system, then you will get the following response:
Hadoop 2.4.1 Subversion [Link] -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
If Hadoop is not installed on your system, then proceed with the following steps:
Downloading Hadoop
Download and extract Hadoop 2.4.1 from Apache Software Foundation using the following commands.

$ su
password:
# cd /usr/local
# wget [Link]
[Link]
# tar xzf [Link]
# mv hadoop-2.4.1/* hadoop/
# exit
Installing Hadoop in Pseudo Distributed Mode
The following steps are used to install Hadoop 2.4.1 in pseudo distributed mode.
Step I: Setting up Hadoop
You can set Hadoop environment variables by appending the following commands to ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native export
PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply all the changes into the current running system.
$ source ~/.bashrc
Step II: Hadoop Configuration
You can find all the Hadoop configuration files in the location “$HADOOP_HOME/etc/hadoop”. You
need to make suitable changes in those configuration files according to your Hadoop infrastructure.
$ cd $HADOOP_HOME/etc/hadoop
In order to develop Hadoop programs using java, you have to reset the java environment variables
in [Link] file by replacing JAVA_HOME value with the location of java in your system.
export JAVA_HOME=/usr/local/jdk1.7.0_71
Given below are the list of files that you have to edit to configure Hadoop.
core-site.xml
The core-site.xml file contains information such as the port number used for the Hadoop instance, memory allocated for the file system, memory limit for storing the data, and the size of Read/Write buffers.
Open core-site.xml and add the following properties in between the <configuration> and </configuration> tags.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

</configuration>
hdfs-site.xml
The hdfs-site.xml file contains information such as the value of replication data, the namenode path, and the datanode path of your local file systems. It means the place where you want to store the Hadoop infrastructure.
Let us assume the following data.
dfs.replication (data replication value) = 1

(In the following path, /hadoop/ is the user name.

hadoopinfra/hdfs/namenode is the directory created by the hdfs file system.)

namenode path = //home/hadoop/hadoopinfra/hdfs/namenode

(hadoopinfra/hdfs/datanode is the directory created by the hdfs file system.)

datanode path = //home/hadoop/hadoopinfra/hdfs/datanode
Open this file and add the following properties in between the <configuration>, </configuration> tags in this file.
<configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>

</configuration>
Note: In the above file, all the property values are user-defined and you can make changes according to
your Hadoop infrastructure.
yarn-site.xml

This file is used to configure YARN into Hadoop. Open the yarn-site.xml file and add the following properties in between the <configuration>, </configuration> tags in this file.
<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

</configuration>
mapred-site.xml
This file is used to specify which MapReduce framework we are using. By default, Hadoop contains a template of mapred-site.xml. First of all, you need to copy the file from mapred-site.xml.template to mapred-site.xml using the following command.

$ cp mapred-site.xml.template mapred-site.xml
Open the mapred-site.xml file and add the following properties in between the <configuration>, </configuration> tags in this file.

<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

</configuration>
Verifying Hadoop Installation
The following steps are used to verify the Hadoop installation.
Step I: Name Node Setup
Set up the namenode using the command “hdfs namenode -format” as follows.
$ cd ~
$ hdfs namenode -format
The expected result is as follows.
10/24/14 [Link] INFO [Link]: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/[Link]
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.4.1
...
...

10/24/14 [Link] INFO [Link]: Storage directory
/home/hadoop/hadoopinfra/hdfs/namenode has been successfully formatted.
10/24/14 [Link] INFO [Link]: Going to
retain 1 images with txid >= 0
10/24/14 [Link] INFO [Link]: Exiting with status 0
10/24/14 [Link] INFO [Link]: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/[Link]
************************************************************/
Step II: Verifying Hadoop dfs
The following command is used to start dfs. Executing this command will start your Hadoop file
system.
$ start-dfs.sh
The expected output is as follows:
10/24/14 [Link]
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-namenode-
[Link]
localhost: starting datanode, logging to /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-datanode-
[Link]
Starting secondary namenodes [[Link]]
Step III: Verifying Yarn Script
The following command is used to start the yarn script. Executing this command will start your yarn
daemons.
$ start-yarn.sh
The expected output is as follows:
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.4.1/logs/yarn-hadoop-resourcemanager-
[Link]
localhost: starting nodemanager, logging to /home/hadoop/hadoop-2.4.1/logs/yarn-hadoop-
[Link]
Step IV: Accessing Hadoop on Browser
The default port number to access Hadoop is 50070. Use the following url to get Hadoop services on
your browser.
[Link]

Step V: Verify all applications for cluster


The default port number to access all applications of cluster is 8088. Use the following url to
visit this service.

Step 3: Downloading Hive
We use hive-0.14.0 in this tutorial. You can download it by visiting the following link
[Link] Let us assume it gets downloaded onto the /Downloads
directory. Here, we download Hive archive named “[Link]” for this tutorial. The
following command is used to verify the download:
$ cd Downloads
$ ls
On successful download, you get to see the following response:
[Link]

Step 4: Installing Hive


The following steps are required for installing Hive on your system. Let us assume the Hive archive is
downloaded onto the /Downloads directory.
Extracting and verifying Hive Archive
The following command is used to verify the download and extract the hive archive:
$ tar zxvf [Link]
$ ls
On successful download, you get to see the following response:

Copying files to /usr/local/hive directory


We need to copy the files as the super user "su -". The following commands are used to copy the
files from the extracted directory to the /usr/local/hive directory.
$ su -
passwd:

# cd /home/user/Download
# mv apache-hive-0.14.0-bin /usr/local/hive
# exit
Setting up environment for Hive
You can set up the Hive environment by appending the following lines to ~/.bashrc file:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
The following command is used to execute ~/.bashrc file.

$ source ~/.bashrc
Step 5: Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed in the
$HIVE_HOME/conf directory. The following commands redirect to the Hive config folder and copy the
template file:
$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh
Edit the hive-env.sh file by appending the following line:
export HADOOP_HOME=/usr/local/hadoop
Hive installation is completed successfully. Now you require an external database server to configure
Metastore. We use Apache Derby database.

Step 6: Downloading and Installing Apache Derby


Follow the steps given below to download and install Apache Derby:
Downloading Apache Derby
The following command is used to download Apache Derby. It takes some time to download.
$ cd ~
$ wget [Link]
The following command is used to verify the download:
$ ls
On successful download, you get to see the following response:
db-derby-[Link]-[Link]
Extracting and verifying Derby archive
The following commands are used for extracting and verifying the Derby archive:
$ tar zxvf db-derby-[Link]-[Link]
$ ls
On successful download, you get to see the following response:
db-derby-[Link]-bin db-derby-[Link]-[Link]
Copying files to /usr/local/derby directory
We need to copy from the super user “su -”. The following commands are used to copy the files from
the extracted directory to the /usr/local/derby directory:
$ su -
passwd:
# cd /home/user

# mv db-derby-[Link]-bin /usr/local/derby
# exit
Setting up environment for Derby
You can set up the Derby environment by appending the following lines to the ~/.bashrc file:
export DERBY_HOME=/usr/local/derby
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
The following command is used to execute ~/.bashrc file:
$ source ~/.bashrc
Create a directory to store Metastore
Create a directory named data in $DERBY_HOME directory to store Metastore data.
$ mkdir $DERBY_HOME/data
Derby installation and environmental setup is now complete.
Step 7: Configuring the Metastore of Hive
Configuring the Metastore means specifying to Hive where the database is stored. You can do this by editing
the hive-site.xml file, which is in the $HIVE_HOME/conf directory. First of all, copy the template file
using the following command:
$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
Edit hive-site.xml and append the following lines between the <configuration> and </configuration>
tags:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Create a file named jpox.properties and add the following lines into it:
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = false
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1527/metastore_db;create = true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
Step 8: Verifying Hive Installation
Before running Hive, you need to create the /tmp folder and a separate Hive folder in HDFS. Here, we
use the /user/hive/warehouse folder. You need to set write permission for these newly created folders as
shown below:
chmod g+w
Now set them in HDFS before verifying Hive. Use the following commands:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
The following commands are used to verify Hive installation:
$ cd $HIVE_HOME
$ bin/hive
On successful installation of Hive, you get to see the following response:
Logging initialized using configuration in jar:file:/home/hadoop/hive-0.9.0/lib/hive-common-
[Link]!/[Link]
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201312121621_1494929084.txt
………………….
hive>
The following sample command is executed to display all the tables:
hive> show tables;
OK
Time taken: 2.798 seconds
hive>

Hive Examples:

OBJECTIVE:
Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.

PROGRAM :
SYNTAX for HIVE Database Operations
DATABASE Creation
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
Drop Database Statement
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
Creating and Dropping Table in HIVE
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]
table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment] [ROW FORMAT row_format] [STORED AS
file_format]
Loading Data into table log_data
Syntax:
LOAD DATA LOCAL INPATH '<path>/[Link]' OVERWRITE INTO TABLE
u_data;
Alter Table in HIVE
Syntax
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Creating and Dropping View
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT
column_comment], ...) ] [COMMENT table_comment] AS SELECT ...
Dropping View
Syntax:
DROP VIEW view_name
Functions in HIVE
String Functions: substr(), upper(), regexp_replace(), etc.
Mathematical Functions: round(), ceil(), etc.
Date and Time Functions: year(), month(), day(), to_date(), etc.
Aggregate Functions: sum(), min(), max(), count(), avg(), etc.
INDEXES
CREATE INDEX index_name ON TABLE base_table_name (col_name, ...)
AS '[Link]'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]

[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
Creating Index
CREATE INDEX index_ip ON TABLE log_data(ip_address) AS
'[Link]' WITH DEFERRED
REBUILD;
Altering and Inserting Index
ALTER INDEX index_ip_address ON log_data REBUILD;
Storing Index Data in Metastore
SET
[Link]=/home/administrator/Desktop/big/metastore_db/tmp/index_ipadd
ress_result;
SET
[Link]=[Link]
mat;
Dropping Index
DROP INDEX INDEX_NAME on TABLE_NAME;
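
Putting the above syntax together, a short practice session might look like the following; the database, table and column names used here are only illustrative and can be replaced with your own.

CREATE DATABASE IF NOT EXISTS college;
USE college;
CREATE TABLE IF NOT EXISTS student (id INT, name STRING, marks INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/hadoop/student.txt' OVERWRITE INTO TABLE student;
SELECT name, marks FROM student WHERE marks > 60;
ALTER TABLE student ADD COLUMNS (dept STRING);
CREATE VIEW toppers AS SELECT name FROM student WHERE marks > 90;
DROP VIEW toppers;
DROP TABLE student;
DROP DATABASE college;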

RESULT:
Thus, Hive was installed and the practice examples were written, executed and verified successfully.
Ex. No: 6
INSTALLATION OF HBASE, INSTALLING THRIFT ALONG WITH
EXAMPLES
Date:

AIM:

To write a procedure for the installation of HBase and Thrift, along with examples.

PROCEDURE:

Installing HBase
We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and Fully
Distributed mode.
Installing HBase in Standalone Mode
Download the latest stable version of HBase form [Link] [Link]/apache/hbase/stable/
using “wget” command, and extract it using the tar “zxvf” command. See the following command.
$cd usr/local/
$wget [Link]
[Link]
$tar -zxvf [Link]
Shift to super user mode and move the HBase folder to /usr/local as shown below.
$su
$password: enter your password here
mv hbase-0.99.1/* Hbase/
Configuring HBase in Standalone Mode
Before proceeding with HBase, you have to edit the following files and configure HBase.
hbase-env.sh
Set the Java home for HBase by opening the hbase-env.sh file from the conf folder. Edit the JAVA_HOME
environment variable and change the existing path to your current JAVA_HOME value as shown below.
cd /usr/local/Hbase/conf
gedit hbase-env.sh
This will open the hbase-env.sh file of HBase. Now replace the existing JAVA_HOME value with your
current value as shown below.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0
hbase-site.xml
This is the main configuration file of HBase. Set the data directory to an appropriate location by opening
the HBase home folder in /usr/local/HBase. Inside the conf folder, you will find several files; open the
hbase-site.xml file as shown below.
#cd /usr/local/HBase/
#cd conf
# gedit hbase-site.xml
Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within
them, set the HBase directory under the property key with the name "hbase.rootdir" as shown below.
<configuration>
//Here you have to set the path where you want HBase to store its files.
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>

//Here you have to set the path where you want HBase to store its built-in zookeeper files.
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
With this, the HBase installation and configuration part is successfully complete. We can start HBase by
using the start-hbase.sh script provided in the bin folder of HBase. For that, open the HBase home folder and
run the HBase start script as shown below.
$ cd /usr/local/HBase/bin
$ ./start-hbase.sh
If everything goes well, when you try to run HBase start script, it will prompt you a message saying that
HBase has started.
starting master, logging to /usr/local/HBase/bin/../logs/[Link]
Installing HBase in Pseudo-Distributed Mode
Let us now check how HBase is installed in pseudo-distributed mode.
Configuring HBase
Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system
and make sure they are running. Stop HBase if it is running.
hbase-site.xml
Edit the hbase-site.xml file to add the following properties.

<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
This mentions in which mode HBase should be run. In the same file, change the hbase.rootdir from the
local file system to your HDFS instance address, using the hdfs:// URI syntax. We are running HDFS on
the localhost at port 8030.
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8030/hbase</value>
</property>
Starting HBase
After configuration is over, browse to HBase home folder and start HBase using the following
command.
$cd /usr/local/HBase
$ bin/start-hbase.sh
Note: Before starting HBase, make sure Hadoop is running.
Checking the HBase Directory in HDFS
HBase creates its directory in HDFS. To see the created directory, browse to Hadoop bin and type the
following command.
$ ./bin/hadoop fs -ls /hbase
If everything goes well, it will give you the following output.
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/[Link]
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/[Link]
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
Starting and Stopping a Master
Using the "local-master-backup.sh" script you can start up to 10 backup master servers. Open the home
folder of HBase and execute the following command to start them.
$ ./bin/local-master-backup.sh 2 4
To kill a backup master, you need its process id, which will be stored in a file named
"/tmp/hbase-USER-X-master.pid". You can kill the backup master using the following command.
$ cat /tmp/hbase-user-1-master.pid | xargs kill -9

HBase Web Interface
To access the web interface of HBase, type the following url in the browser.
[Link]
This interface lists your currently running Region servers, backup masters and HBase tables.
Starting and Stopping RegionServers
You can run multiple region servers from a single system using the following command.
$ ./bin/local-regionservers.sh start 2 3
To stop a region server, use the following command.
$ ./bin/local-regionservers.sh stop 3

Starting HBaseShell
After Installing HBase successfully, you can start HBase Shell. Below given are the sequence of steps
that are to be followed to start the HBase shell. Open the terminal, and login as super user.
Start Hadoop File System
Browse through Hadoop home sbin folder and start Hadoop file system as shown below.
$cd $HADOOP_HOME/sbin
$ start-all.sh
Start HBase
Browse through the HBase root directory bin folder and start HBase.
$ cd /usr/local/HBase
$ ./bin/start-hbase.sh
Start HBase Master Server
This will be in the same directory. Start it as shown below.
$ ./bin/local-master-backup.sh start 2 (the number signifies the specific server)
Start Region Server
Start the region server as shown below.
$ ./bin/local-regionservers.sh start 3
Start HBase Shell
You can start HBase shell using the following command.
$cd bin
$./hbase shell
This will give you the HBase Shell Prompt as shown below.
2014-12-09 [Link],526 INFO [main] [Link]:
[Link] is deprecated. Instead, use [Link]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 [Link] PST 2014
hbase(main):001:0>
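
From this prompt a short practice session can be run; the table name 'emp' and column family 'personal' below are only illustrative.

hbase(main):001:0> create 'emp', 'personal'
hbase(main):002:0> list
hbase(main):003:0> put 'emp', '1', 'personal:name', 'raju'
hbase(main):004:0> put 'emp', '1', 'personal:city', 'hyderabad'
hbase(main):005:0> scan 'emp'
hbase(main):006:0> get 'emp', '1'
hbase(main):007:0> disable 'emp'
hbase(main):008:0> drop 'emp'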

81
HBase Tables

82
RESULT:

Thus, the installation of HBase and Thrift, along with examples, was executed and verified successfully.
Ex. No: 7
IMPORTING AND EXPORTING DATA FROM VARIOUS
DATABASES
Date:

AIM:

To write a procedure for importing and exporting data from various databases.

PROCEDURE:
SQOOP is basically used to transfer data from relational databases such as MySQL and Oracle to data
warehouses such as Hadoop HDFS (Hadoop Distributed File System). Thus, when data is transferred from a relational
database to HDFS, we say we are importing data. Otherwise, when we transfer data from HDFS to
relational databases, we say we are exporting data.
Note: To import or export, the order of columns in both MySQL and Hive should be the same.

Importing data from MySQL to HDFS

In order to store data into HDFS, we make use of Apache Hive which provides an SQL-like interface
between the user and the Hadoop distributed file system (HDFS) which integrates Hadoop. We perform
the following steps:
Step 1: Login into MySQL
mysql -u root -pcloudera

Step 2: Create a database and table and insert data.


create database geeksforgeeks;
create table [Link](author_name varchar(65), total_no_of_articles int,
phone_no int, address varchar(65));
insert into geeksforgeeks values("Rohan",10,123456789,"Lucknow");

Step 3: Create a database and table in the hive where data should be imported.
create table geeks_hive_table(name string, total_articles int, phone_no int, address string) row format
delimited fields terminated by ',';

Step 4: Run below the import command on Hadoop.


sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--username root --password cloudera \
--table table_name_in_mysql \
--hive-import --hive-table database_name_in_hive.table_name_in_hive \
--m 1

In the above command the following things should be noted.

• 127.0.0.1 is the localhost IP address.
• 3306 is the port number for MySQL.
• m is the number of mappers.
A filled-in example is shown below.
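
For instance, with the database created in Step 2 and the Hive table from Step 3, the command might be filled in as follows (the MySQL table name geeks and the Hive database name default are assumed here purely for illustration):

sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/geeksforgeeks \
--username root --password cloudera \
--table geeks \
--hive-import --hive-table default.geeks_hive_table \
--m 1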
Step 5: Check-in hive if data is imported successfully or not.

Exporting data from HDFS to MySQL

To export data into MySQL from HDFS, perform the following steps:
Step 1: Create a database and table in the hive.
create table hive_table_export(name string, company string, phone int, age int) row format
delimited fields terminated by ',';

Step 2: Insert data into the hive table.


insert into hive_table_export
values("Ritik","Amazon",234567891,35);

Step 3: Create a database and table in MySQL in which data should be exported.

Step 4: Run the following command on Hadoop.

sqoop export --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--table table_name_in_mysql \
--username root --password cloudera \
--export-dir /user/hive/warehouse/hive_database_name.db/table_name_in_hive \
--m 1 \
--driver com.mysql.jdbc.Driver \
--input-fields-terminated-by ','

Step 5: Check-in MySQL if data is exported successfully or not.

RESULT:

Thus, the procedure for importing and exporting data from various databases was executed and verified successfully.
