
SESSION 2017-2018

[Link] (CSE) YEAR: III SEMESTER: VI


INTRODUCTION TO HIVE

(CSE6005)
MODULE 2 (L6)
Presented By
Vivek Kumar
Dept of Computer Engineering & Applications
GLA University India
Agenda
Introduction to Hive

Learning Objectives:
1. To study the Hive architecture.
2. To study the Hive file formats.
3. To study the Hive Query Language.

Learning Outcomes:
a) To understand the Hive architecture.
b) To create databases and tables and execute data manipulation language statements on them.
c) To differentiate between static and dynamic partitions.
d) To differentiate between managed and external tables.
Agenda

 What is Hive?
 Hive Architecture
 Hive Data Types
 Primitive Data Types
 Collection Data Types
 Hive File Format
 Text File
 Sequential File
 RCFile (Record Columnar File)
Agenda …

 Hive Query Language
 DDL (Data Definition Language) Statements
 DML (Data Manipulation Language) Statements
 Database
 Tables
 Partitions
 Buckets
 Aggregation
 Group By and Having
 SerDe
Case Study: Retail
 Major Indian retailers, including Future Group, Reliance Industries, Tata Group and Aditya Birla Group, use Hive.
 One of these retail groups, let us call it BigX, wanted its last 5 years of semi-structured data analyzed for trends and patterns.
 Let us see how we can solve their problem using Hadoop.
Case Study: Retail cont..
About BigX
 BigX is a chain of hypermarkets in India. There are currently 220+ stores across 85 cities and towns in India, employing 35,000+ people. Its annual revenue for the year 2011 was USD 1 billion. It offers a wide range of products, including fashion and apparel, food products, books, furniture, electronics, health care, general merchandise and entertainment sections.
Case Study: Retail cont..
Problem Scenario
1. One of the BigX log datasets that needed to be analyzed was approximately 12 TB in overall size and held 5 years of vital information in semi-structured form.
Case Study: Retail cont..
2. Traditional business intelligence (BI) tools are good up to a certain scale, usually several hundred gigabytes. But when the scale is of the order of terabytes and petabytes, these frameworks become inefficient. Also, BI tools work best when data is present in a known, pre-defined schema. The particular dataset from BigX was mostly logs, which did not conform to any specific schema.
Case Study: Retail cont..
3. It took around 12+ hours to move the data into their business intelligence systems bi-weekly. BigX wanted to reduce this time drastically.
4. Querying such a large dataset took too long.
Case Study: Retail cont..
Solution
 This is where Hadoop shines in all its glory as a solution. Since the size of the logs dataset is 12 TB, at such a large scale the problem is two-fold:
 Problem 1: Moving the logs dataset to HDFS periodically
 Problem 2: Performing the analysis on this HDFS dataset
Case Study: Retail cont..
Solution of Problem 1
 Since the logs are unstructured in this case, Sqoop was of little or no use, so Flume was used to move the log data periodically into HDFS.
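A minimal sketch of what such a Flume agent configuration might look like; the agent name, spool directory and HDFS path below are hypothetical, not taken from the case study:

# Hypothetical Flume agent: move local log files into HDFS
agent.sources = logsrc
agent.channels = ch1
agent.sinks = hdfssink

# Watch a local directory for completed log files (hypothetical path)
agent.sources.logsrc.type = spooldir
agent.sources.logsrc.spoolDir = /var/log/bigx
agent.sources.logsrc.channels = ch1

# Buffer events in memory between source and sink
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000

# Write events into date-partitioned HDFS directories
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.hdfs.path = /data/bigx/logs/%Y-%m-%d
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfssink.channel = ch1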
Case Study: Retail cont..
Solution of Problem 2
 Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis. It provides an SQL-like language called HiveQL and converts queries into MapReduce tasks.
Hive in this Case Study
 Hive uses "Schema on Read", unlike a traditional database, which uses "Schema on Write".
 While reading log files, the simplest recommended approach during Hive table creation is to use a RegexSerDe, as sketched below.
 By default, Hive metadata is stored in an embedded Derby database, which allows only one user to issue queries at a time. This is not ideal for production, where a standalone metastore database such as MySQL is typically used.
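As a hedged illustration of that RegexSerDe approach (the table name, columns, regular expression and location here are hypothetical, not from the BigX dataset; each capturing group in input.regex maps, in order, to one STRING column):

-- Hypothetical external table over raw log lines
CREATE EXTERNAL TABLE IF NOT EXISTS BIGX_LOGS (
  ts      STRING,
  level   STRING,
  message STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(\\S+) (\\S+) (.*)$")
LOCATION '/data/bigx/logs';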
Conclusion - Case Study: Retail
 Using the Hadoop system, log transfer time was reduced to ~3 hours bi-weekly, and querying time was also significantly improved.
 Thanks to Vijay (Big Data Lead at 8KMiles, who holds an M.Tech in Information Retrieval from IIIT-B) for this case study.
 [Link] retail-analysis/
What is Hive?
 Hive is a data warehousing tool used to query structured data, built on top of Hadoop.
 Facebook created Hive to manage its ever-growing volumes of data. Hive makes use of the following:
1. HDFS for storage
2. MapReduce for execution
3. An RDBMS for storing metadata
What is Hive ?
 Apache Hive is a popular SQL interface for batch processing on Hadoop.
 Hadoop was built to organize and store massive amounts of data.
 Hive gives another way to access data inside the cluster in an easy, quick way.
 Hive provides a query language called HiveQL that closely resembles the common Structured Query Language (SQL) standard.
 Hive was one of the earliest projects to bring higher-level languages to Apache Hadoop.
 Hive gives analysts and data scientists the ability to access data without being experts in Java.
 Hive gives structure to data on HDFS.
 This interface to Hadoop
 not only accelerates the time required to produce results from data analysis,
 it significantly broadens who can use Hadoop and MapReduce.
 Let us take a moment to thank the Facebook team:
 Hive was developed by the Facebook Data team and, after being used internally,
 it was contributed to the Apache Software Foundation.
 Currently Hive is freely available as an open-source project.
What Hive is not?
 Hive is not a relational database; it uses a database to store metadata, but the data that Hive processes is stored in HDFS.
 Hive is not designed for online transaction processing (OLTP).
 Hive is not suited for real-time queries and row-level updates; it is best used for batch jobs over large sets of immutable data such as web logs.
Typical Use-Case of Hive
 Hive takes a large amount of unstructured data and places it into a structured view.
 Hive supports use cases such as ad-hoc queries, summarization and data analysis.
 HiveQL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs) and table functions (UDTFs).
 It converts SQL queries into MapReduce jobs.
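A minimal sketch of how a custom UDF is typically plugged into HiveQL; the jar path, function name and Java class below are hypothetical:

-- Load the jar that contains the UDF implementation (hypothetical path)
ADD JAR /home/hadoop/udfs/my-udfs.jar;
-- Register the Java class as a temporary function
CREATE TEMPORARY FUNCTION normalize_name AS 'com.example.hive.NormalizeName';
-- Use it like any built-in function
SELECT normalize_name(name) FROM STUDENT;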
Features of Hive
1. It is similar to SQL.
2. HQL is easy to code.
3. Hive supports rich data types such as
structs, lists, and maps.
4. Hive supports SQL filters, group-by and
order-by clauses.
Prerequisites of Hive in
Hadoop
 The prerequisites for setting up Hive and running queries are:
1. A stable build of Hadoop
2. Java 1.6 installed on the machine
3. Basic Java programming skills
4. Basic SQL knowledge
 Start all the services of Hadoop using the command $ start-all.sh.
 Once all services are running, use $ hive to start Hive.

Hive Integration and
Workflow
 Hourly log data can be stored directly into HDFS.
 Data cleaning and log compression are then performed on the log files.
 Finally, Hive tables can be created over the cleaned data.
[Diagram: Hourly Log → Hadoop HDFS → Log Compression → Hive Table 1 / Hive Table 2]
Hive Architecture
[Diagram: Hive architecture. The Command-Line Interface, Hive Web Interface and Hive Server (Thrift) all communicate with the Driver (Query Compiler, Executor), which consults the Metastore and submits jobs to Hadoop (JobTracker, TaskTracker, HDFS).]
Hive Architecture
The various parts are as follows:
 Hive Command-Line Interface (Hive CLI): The most commonly used interface to interact with Hive.
 Hive Web Interface: A simple graphical user interface to interact with Hive and execute queries.
 Hive Server: An optional server. It can be used to submit Hive jobs from a remote client.
 JDBC/ODBC: Jobs can be submitted from a JDBC client. One can write Java code to connect to Hive and submit jobs to it.
Hive Architecture
 Driver: Hive queries are sent to the driver for compilation, optimization and execution.
 Metastore: Hive table definitions and mappings to the data are stored in a metastore. A metastore consists of the following:
 Metastore service: Offers an interface to Hive.
 Database: Stores data definitions, mappings to the data and others.
 The metadata stored in the metastore includes IDs of databases, tables and indexes, the time of creation of a table, and the input format and output format used for a table.
Hive Architecture
 1. Embedded Metastore: This metastore is mainly used for unit tests. Here, only one process is allowed to connect to the metastore at a time. This is the default metastore for Hive; it is an Apache Derby database. In this mode, both the database and the metastore service run embedded in the main Hive Server process. Figure 9.8 shows an Embedded Metastore.
 2. Local Metastore: Metadata can be stored in any RDBMS component, such as MySQL. A local metastore allows multiple connections at a time. In this mode, the Hive metastore service runs in the main Hive Server process, but the metastore database runs in a separate process, and can sit on a separate machine.
Hive Architecture
 3. Remote Metastore: In this, the Hive driver
and the metastore interface run on different JVMs
(which can run on different machines as well) as
in Figure 9.10. This way the database can be fire-
walled from the Hive user and also database
credentials are completely isolated from the
users of Hive.
Hive Data Units
Hive Data Model Contd.
 Tables
- Analogous to relational tables.
- Each table has a corresponding directory in HDFS.
- Data is serialized and stored as files within that directory.
- Hive has default serialization built in, which supports compression and lazy deserialization.
- Users can specify custom serialization/deserialization schemes (SerDes).
Hive Data Model Contd.
 Partitions
- Each table can be broken into partitions.
- Partitions determine the distribution of data within subdirectories.
Example:
CREATE TABLE Sales (sale_id INT, amount FLOAT)
PARTITIONED BY (country STRING, year INT, month INT);
Each partition is then split out into its own folder, such as
Sales/country=US/year=2012/month=12
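A short hedged sketch of why this layout matters: a query that filters on the partition columns reads only the matching subdirectory.

SELECT sale_id, amount
FROM Sales
WHERE country = 'US' AND year = 2012 AND month = 12;
-- Partition pruning: only Sales/country=US/year=2012/month=12 is scanned.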
Hierarchy of Hive Partitions

/hivebase/Sales
  /country=US
    /year=2012
      /month=11  → File
      /month=12  → File
  /country=CANADA
    /year=2012
    /year=2014
    /year=2015
      /month=11  → File
Partition
 The general definition of a partition is horizontally dividing the data into a number of slices in an equal and manageable manner.
 Every partition is stored as a directory within the data warehouse table.
 In data warehousing this partition concept is common, but there are two types of partitions available in data warehouse concepts:
 static partitions and dynamic partitions.
Hive Partition
 The main work of a Hive partition is the same as that of a SQL partition.
 The main difference between a SQL partition and a Hive partition is that a SQL partition is supported only for a single column in a table, whereas a Hive partition supports multiple columns in a table.
Hive Data Model Contd.

 Buckets
- Data in each partition is divided into buckets.
- Bucketing is based on a hash function of a column:
  hash(column) mod NumBuckets = bucket number
- Each bucket is stored as a file in the partition directory.
Hive Data Types
Numeric Types
TINYINT   1-byte signed integer
SMALLINT  2-byte signed integer
INT       4-byte signed integer
BIGINT    8-byte signed integer
FLOAT     4-byte single-precision floating-point number
DOUBLE    8-byte double-precision floating-point number

String Types
STRING
VARCHAR   Only available starting with Hive 0.12.0
CHAR      Only available starting with Hive 0.13.0
Strings can be expressed in either single quotes (') or double quotes (").

Miscellaneous Types
BOOLEAN
BINARY    Only available starting with Hive 0.8.0
Hive Data Types cont..
Collection Data Types

STRUCT  Similar to a C struct. Fields are accessed using dot notation.
        E.g.: struct('John', 'Doe')
MAP     A collection of key-value pairs. Fields are accessed using [] notation.
        E.g.: map('first', 'John', 'last', 'Doe')
ARRAY   An ordered sequence of elements of the same type. Fields are accessed using an array index.
        E.g.: array('John', 'Doe')
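A small sketch that combines all three collection types in one table; the table name, columns and delimiters are hypothetical:

CREATE TABLE IF NOT EXISTS EMPLOYEE_CONTACT (
  name      STRING,
  phones    ARRAY<STRING>,
  addresses MAP<STRING, STRING>,
  full_name STRUCT<first:STRING, last:STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
-- Accessing the collection fields:
-- SELECT phones[0], addresses['home'], full_name.first FROM EMPLOYEE_CONTACT;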
Hive File Format
 Text File: The default file format is the text file.
 Sequential File: Sequential files are flat files that store binary key-value pairs.
 RCFile (Record Columnar File): RCFile stores the data in a column-oriented manner, which ensures that aggregation operations are not expensive.
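The file format is chosen with a STORED AS clause at table-creation time; a minimal sketch (the table names are hypothetical):

CREATE TABLE logs_txt (line STRING) STORED AS TEXTFILE;     -- the default
CREATE TABLE logs_seq (line STRING) STORED AS SEQUENCEFILE; -- binary key-value pairs
CREATE TABLE logs_rc  (line STRING) STORED AS RCFILE;       -- column-oriented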
Hive Query Language (HQL)
 Works on databases, tables, partitions and buckets (clusters).
 Creates and manages tables and partitions.
 Supports various relational, arithmetic and logical operators.
 Evaluates functions.
 Downloads the contents of a table to a local directory, or the results of queries to an HDFS directory.
Database

 To create a database named "STUDENTS" with a comment and database properties:
CREATE DATABASE IF NOT EXISTS STUDENTS
COMMENT 'STUDENT Details'
WITH DBPROPERTIES ('creator' = 'JOHN');
Database

 To describe a database:
DESCRIBE DATABASE STUDENTS;
 To show databases:
SHOW DATABASES;
 To drop a database:
DROP DATABASE STUDENTS;
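Two related statements worth knowing (a hedged addition): USE switches the current database, and dropping a database that still contains tables requires CASCADE.

-- Make STUDENTS the current database
USE STUDENTS;
-- Drop a non-empty database along with its tables
DROP DATABASE IF EXISTS STUDENTS CASCADE;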
Tables
 There are two types of tables in Hive:
 Managed tables
 External tables
 The difference between the two shows when you drop a table:
 if it is a managed table, Hive deletes both the data and the metadata;
 if it is an external table, Hive deletes only the metadata.
 Use the EXTERNAL keyword to create an external table.
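One quick way to check which kind a given table is (a hedged sketch, assuming the STUDENT table created on the next slide):

-- The "Table Type" field shows MANAGED_TABLE or EXTERNAL_TABLE
DESCRIBE FORMATTED STUDENT;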
Tables
To create a managed table named 'STUDENT':
CREATE TABLE IF NOT EXISTS STUDENT (rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Tables
To create an external table named 'EXT_STUDENT':
CREATE EXTERNAL TABLE IF NOT EXISTS EXT_STUDENT (rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/STUDENT_INFO';
Tables
To load data into the table from a file named [Link]:
LOAD DATA LOCAL INPATH '/root/hivedemos/[Link]'
OVERWRITE INTO TABLE EXT_STUDENT;
To retrieve the student details from the 'EXT_STUDENT' table:
SELECT * FROM EXT_STUDENT;
Table ALTER Operations
 ALTER TABLE mytablename RENAME TO mt;
 ALTER TABLE mytable ADD COLUMNS (mycol STRING);
 ALTER TABLE name RENAME TO new_name;
 ALTER TABLE name DROP [COLUMN] column_name;
 ALTER TABLE name CHANGE column_name new_name new_type;
 ALTER TABLE name REPLACE COLUMNS (col_name data_type, ...);
Partitions
 Partitions split the larger dataset into more meaningful chunks.
 Hive provides two kinds of partitions: static partitions and dynamic partitions.
• To create a static partition based on the "gpa" column:
CREATE TABLE IF NOT EXISTS STATIC_PART_STUDENT (rollno INT, name STRING)
PARTITIONED BY (gpa FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
To load data into the partitioned table from another table:
INSERT OVERWRITE TABLE STATIC_PART_STUDENT PARTITION (gpa = 4.0)
SELECT rollno, name FROM EXT_STUDENT WHERE gpa = 4.0;
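To verify that the partition directory was created (a hedged check, not from the original slide):

SHOW PARTITIONS STATIC_PART_STUDENT;
-- Expected output: gpa=4.0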
Partitions
• To create a dynamic partition based on the "gpa" column:
CREATE TABLE IF NOT EXISTS DYNAMIC_PART_STUDENT (rollno INT, name STRING)
PARTITIONED BY (gpa FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
To load data into the dynamic partition table from another table:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Note: Dynamic partition strict mode requires at least one static partition column. To turn this off, set hive.exec.dynamic.partition.mode=nonstrict.
INSERT OVERWRITE TABLE DYNAMIC_PART_STUDENT PARTITION (gpa)
SELECT rollno, name, gpa FROM EXT_STUDENT;
Buckets
 Tables or partitions are sub-divided into buckets to provide extra structure to the data, which may be used for more efficient querying. Bucketing works based on the value of a hash function of some column of a table.
 We can add partitions to a table by
altering the table. Let us assume we
have a table called employee with fields
such as Id, Name, Salary, Designation,
Dept, and yoj.
Buckets
• To create a bucketed table having 3 buckets:
CREATE TABLE IF NOT EXISTS STUDENT_BUCKET (rollno INT, name STRING, grade FLOAT)
CLUSTERED BY (grade) INTO 3 BUCKETS;
To load data into the bucketed table:
FROM STUDENT
INSERT OVERWRITE TABLE STUDENT_BUCKET
SELECT rollno, name, grade;
To display the content of the first bucket:
SELECT DISTINCT grade FROM STUDENT_BUCKET
TABLESAMPLE (BUCKET 1 OUT OF 3 ON grade);
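One caveat worth noting (a hedged addition, not from the original slide): on Hive versions before 2.0, bucketing has to be enforced before inserting, otherwise the insert will not actually split the data into 3 files.

-- Required on Hive < 2.0 so that inserts produce one file per bucket
SET hive.enforce.bucketing = true;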
Aggregations
 Hive supports aggregation functions such as avg, count, etc.
 To use the average and count aggregation functions:
SELECT avg(gpa) FROM STUDENT;
SELECT count(*) FROM STUDENT;
Group by and Having

To use the GROUP BY and HAVING clauses:
SELECT rollno, name, gpa
FROM STUDENT
GROUP BY rollno, name, gpa
HAVING gpa > 4.0;
SerDe
 SerDe stands for Serializer/Deserializer.
 It contains the logic to convert unstructured data into records.
 SerDes are implemented using Java.
 Serializers are used at the time of writing.
 Deserializers are used at query time (SELECT statements).
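For comparison with the RegexSerDe used in the retail case study, a hedged sketch of a JSON SerDe table; the table and columns are hypothetical, and the HCatalog JsonSerDe class may require adding the hive-hcatalog-core jar first:

-- Each row of the backing files is one JSON object,
-- e.g. {"user_id": 42, "action": "click"}
CREATE TABLE IF NOT EXISTS EVENTS_JSON (
  user_id INT,
  action  STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';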
Fill in the blanks
 The metastore consists of ______________
and a ______________.
 The most commonly used interface to
interact with Hive is ______________.
 The default metastore for Hive is
______________.
 Metastore contains ______________ of Hive
tables.
 ______________ is responsible for
compilation, optimization, and execution
of Hive queries.
