MISRIMAL NAVAJEE MUNOTH JAIN
ENGINEERING COLLEGE
OWNED AND MANAGED BY TAMILNADU EDUCATIONAL AND MEDICAL FOUNDATION
A Jain Minority Institution
Approved by AICTE & Programmes Accredited by NBA, New Delhi
All Programmes Recognized by the Government of Tamil Nadu and Affiliated to Anna University, Chennai
Guru Marudhar Kesari Building, Jyothi Nagar, Rajiv Gandhi Salai, OMR, Thoraipakkam, Chennai - 600 097
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE
CCS334 - BIG DATA ANALYTICS LABORATORY
REGULATION - 2021
NAME:
REGISTER NUMBER:
YEAR/SEMESTER: III / V
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE
VISION
To produce high quality, creative and ethical engineers and technologists
contributing effectively to the ever-advancing Artificial Intelligence and Data
Science field.
MISSION
To educate future software engineers with strong fundamentals by
continuously improving the teaching-learning methodologies using
contemporary aids.
To produce ethical engineers/researchers by instilling the values of
humility, humaneness, honesty and courage to serve the society.
To create a knowledge hub of Artificial Intelligence and Data Science
with everlasting urge to learn by developing, maintaining and continuously
improving the resources.
Register No:
BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of the work done by
Mr./Ms. _______________ of III YEAR / V SEM
B.TECH - ARTIFICIAL INTELLIGENCE AND DATA SCIENCE in
CCS334- BIG DATA ANALYTICS LABORATORY during the Academic year
2023 — 2024.
Faculty-in-charge Head of the Department
Submitted for the University Practical Examination held on :_/_/
Internal Examiner External Examiner
DATE: DATE:
CCS334 BIG DATA ANALYTICS LABORATORY
COURSE OUTCOMES
Describe big data and use cases from selected business domains.
Explain NoSQL big data management.
Install, configure, and run Hadoop and HDFS.
Perform map-reduce analytics using Hadoop.
Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data
analytics.

CCS334 BIG DATA ANALYTICS LABORATORY
CONTENT

EX. NO.   EXPERIMENTS                                                        PAGE NO.   SIGNATURE

1. Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts, Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving files and Deleting files.
3. Implementation of Matrix Multiplication with Hadoop Map Reduce.
4. Run a basic Word Count Map Reduce program to understand the Map Reduce Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase along with practice examples. Installing Thrift.
7. Practice importing and exporting data from various databases.

SYLLABUS

CCS334 BIG DATA ANALYTICS LABORATORY
COURSE OBJECTIVES:
To understand big data.
To learn and use NoSQL big data management.
To learn mapreduce analytics using Hadoop and related tools.
To work with map reduce applications.
To understand the usage of Hadoop related tools for Big Data Analytics.
Tools: Cassandra, Hadoop, Java, Pig, Hive and HBase.
Suggested Exercises:
1. Downloading and installing Hadoop; Understanding different Hadoop modes.
Startup scripts, Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and
directories, retrieving files and Deleting files
3. Implementation of Matrix Multiplication with Hadoop Map Reduce.
4. Run a basic Word Count Map Reduce program to understand Map Reduce
Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase, Installing Thrift along with practice examples.
7. Practice importing and exporting data from various databases.

EX. NO: 1   DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING
DATE:       DIFFERENT HADOOP MODES. STARTUP SCRIPTS,
            CONFIGURATION FILES.
AIM:
To download and install Hadoop, and to understand the different Hadoop modes, startup
scripts, and configuration files.
PREREQUISITES TO INSTALL HADOOP ON WINDOWS
VIRTUAL BOX (for Linux): It is used for installing the operating system on it.
OPERATING SYSTEM: You can install Hadoop on Windows or Linux based
operating systems. Ubuntu and CentOS are very commonly used.
JAVA: You need to install the Java 8 package on your system.
HADOOP: You require the latest version of Hadoop.
1. Install Java
Java JDK link to download:
[Link]
Extract and install Java in C:\Java
Open cmd and type -> javac -version
(Command Prompt: "javac -version" displays the installed Java compiler version, confirming the installation.)
2. Download Hadoop
[Link]
Extract to C:\Hadoop
3. Set the path JAVA_HOME environment variable.
4. Set the path HADOOP_HOME environment variable.
(Screens: This PC -> Properties -> Advanced system settings -> Environment Variables. JAVA_HOME is set to the JDK bin path, e.g. C:\Java\jdk1.8.0_241\bin, and HADOOP_HOME is set to C:\hadoop-3.3.0\bin; both entries are also added to the Path variable.)
Downe Fe5. Configurations
Edit file C:/Hadoop-3,3.0/etc/hadoop/[Link],paste the xml code in folder and save
[Link]
hdfs://localhost:9000
Rename “[Link]” to “[Link]” and edit this file C:/Hadoop-
3.3.0/ete/hadoop/[Link], paste xml code and save this file.
[Link]
yam
Create folder “data” under “C:\Hadoop-3.3.0"
Create folder “datanode” under “C:\Hadoop-3.3.0\data”
Create folder “namenode” under “C:\Hadoop-3.3.0\data”
Edit file C:\Hadoop-3.3.0/ete/hadoop/[Link],
paste xml code and save this file,
[Link]
1
[Link]
/hadoop-3.3.0/data/namenode
[Link]/hadoop-3.3.0/data/datanode
Edit file C:/Hadoop-3.3.0/etc/hadoop/[Link],,
paste xml code and save this file.
[Link]-services
mapreduce_shufile
[Link]
[Link] ShufileHandler
Edit file C:/Hadoop-3.3.0/ete/hadoop/[Link]
by closing the command line
“JAVA_HOME=%JAVA_HOME%” instead of set “JAVA_HOME=C:\Java”
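The property values set above can optionally be sanity-checked from Java before moving on. The following is a small illustrative sketch and not part of the prescribed procedure; the class name ConfigCheck and the file paths are assumptions, and the Hadoop client libraries must be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfigCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed locations of the files edited in step 5
        conf.addResource(new Path("C:/hadoop-3.3.0/etc/hadoop/core-site.xml"));
        conf.addResource(new Path("C:/hadoop-3.3.0/etc/hadoop/hdfs-site.xml"));
        // Print the values that were configured above
        System.out.println("fs.defaultFS          = " + conf.get("fs.defaultFS"));
        System.out.println("dfs.replication       = " + conf.get("dfs.replication"));
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
    }
}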
6. Hadoop Configurations
Download the patched bin folder:
[Link]
or (for Hadoop 3)
[Link]
Copy the bin folder and replace the existing bin folder in
C:\Hadoop-3.3.0\bin
Format the NameNode:
Open cmd and type the command "hdfs namenode -format"
7. Testing
Open cmd and change directory to C:\Hadoop-3.3.0\sbin
Type start-all.cmd
(Or you can start the daemons separately.)
Start the namenode and datanode with this command:
Type start-dfs.cmd
Start YARN through this command:
Type start-yarn.cmd
Make sure these apps are running:
* Hadoop Namenode
* Hadoop Datanode
* YARN Resource Manager
* YARN Node Manager
Open the Resource Manager web UI ("All Applications" page): [Link]
Open the NameNode web UI (Overview page for 'localhost:9000'): [Link]
Hadoop installed Successfully...
RESULT:
Hadoop was downloaded and installed, the different Hadoop modes were understood, and the
startup scripts and configuration files were successfully implemented.

EX. NO: 2   HADOOP IMPLEMENTATION OF FILE MANAGEMENT
DATE:       TASKS, SUCH AS ADDING FILES AND DIRECTORIES,
            RETRIEVING FILES AND DELETING FILES.
AIM:
To implement the following file management tasks in Hadoop:
1. Adding files and directories
2. Retrieving files
3. Deleting files
1. Create a directory in HDFS at the given path(s).
Usage:
hadoop fs -mkdir <paths>
Example:
hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2
2. List the contents of a directory.
Usage:
hadoop fs -ls <args>
Example:
hadoop fs -ls /user/saurzcode
3. Upload and download a file in HDFS.
Upload: hadoop fs -put
Copy single src file, or multiple src files, from the local file system to the Hadoop file system.
Usage:
hadoop fs -put <localsrc> ... <dst>
Example:
hadoop fs -put /home/saurzcode/[Link] /user/saurzcode/dir3/
Download: hadoop fs -get
Copies/downloads files to the local file system.
Usage:
hadoop fs -get <src> <localdst>
Example:
hadoop fs -get /user/saurzcode/dir3/[Link] /home/
4. See the contents of a file.
Same as the Unix cat command.
Usage:
hadoop fs -cat <path[filename]>
Example:
hadoop fs -cat /user/saurzcode/dir1/[Link]
1. Copy a file from source to destination.
This command allows multiple sources as well, in which case the destination must be a directory.
Usage:
hadoop fs -cp <source> <dest>
Example:
hadoop fs -cp /user/saurzcode/dir1/[Link] /user/saurzcode/dir2
2. Copy a file from/to the local file system to HDFS.
copyFromLocal
Usage:
hadoop fs -copyFromLocal <localsrc> URI
Example:
hadoop fs -copyFromLocal /home/saurzcode/[Link] /user/saurzcode/[Link]
Similar to the put command, except that the source is restricted to a local file reference.
copyToLocal
Usage:
hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
Similar to the get command, except that the destination is restricted to a local file reference.
3. Move a file from source to destination.
Note: Moving files across file systems is not permitted.
Usage:
hadoop fs -mv <src> <dest>
Example:
hadoop fs -mv /user/saurzcode/dir1/[Link] /user/saurzcode/dir2
4. Remove a file or directory in HDFS.
Removes the files specified as argument. Deletes a directory only when it is empty.
Usage:
hadoop fs -rm <arg>
Example:
hadoop fs -rm /user/saurzcode/dir1/[Link]
Recursive version of delete.
Usage:
hadoop fs -rmr <arg>
Example:
hadoop fs -rmr /user/saurzcode/
5. Display the last few lines of a file.
Similar to the tail command in Unix.
Usage:
hadoop fs -tail <path[filename]>
Example:
hadoop fs -tail /user/saurzcode/dir1/[Link]
6. Display the aggregate length of a file.
Usage:
hadoop fs -du <path>
Example:
hadoop fs -du /user/saurzcode/dir1/[Link]
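Besides the hadoop fs commands above, the same file-management tasks can be performed programmatically through the HDFS Java API. The following is a small illustrative sketch and not part of the prescribed commands; the local file names, the HDFS paths and the NameNode address hdfs://localhost:9000 are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileTasks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        // 1. Adding a directory and a file
        fs.mkdirs(new Path("/user/saurzcode/dir1"));
        fs.copyFromLocalFile(new Path("/home/saurzcode/sample.txt"),
                             new Path("/user/saurzcode/dir1/sample.txt"));

        // 2. Retrieving a file back to the local file system
        fs.copyToLocalFile(new Path("/user/saurzcode/dir1/sample.txt"),
                           new Path("/home/saurzcode/sample_copy.txt"));

        // 3. Deleting a file (the second argument enables recursive delete for directories)
        fs.delete(new Path("/user/saurzcode/dir1/sample.txt"), false);

        fs.close();
    }
}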
RESULT:
Thus, the Hadoop implementation of file management tasks, such as adding
files and directories, retrieving files and deleting files, was executed successfully.

EX. NO: 3   IMPLEMENTATION OF MATRIX MULTIPLICATION WITH
DATE:       HADOOP MAP REDUCE

AIM:
To write a Map Reduce Program that implements Matrix Multiplication.

ALGORITHM:
We assume that the input matrices are already stored in the Hadoop Distributed File System
(HDFS) in a suitable format (e.g., CSV, TSV) where each row represents a matrix element. The
matrices are compatible for multiplication (the number of columns in the first matrix is equal
to the number of rows in the second matrix).
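For reference, each element of the result matrix C = A x B is a sum of partial products; this sum is exactly what the reducer accumulates for each (row, column) key. The formula below is added here only for clarity and is not part of the original algorithm text:

C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}

For example, for 2 x 2 matrices, C_{11} = A_{11}B_{11} + A_{12}B_{21}; with A = [[1,2],[3,4]] and B = [[5,6],[7,8]], C_{11} = 1*5 + 2*7 = 19.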
STEP 1: MAPPER
The mapper will take the input matrices and emit key-value pairs for each element in
the result matrix. The key will be the (row, column) index of the result element, and the value
will be the corresponding element value.
STEP 2: REDUCER
The reducer will take the key-value pairs emitted by the mapper and calculate the partial
sum for each element in the result matrix.
STEP 3: MAIN DRIVER
The main driver class sets up the Hadoop job configuration and specifies the input and
output paths for the matrices.
STEP 4: RUNNING THE JOB
To run the MapReduce job, you need to package your classes into a JAR file and then submit
it to Hadoop using the hadoop jar command. Make sure to replace input_path and output_path
with the actual HDFS paths to your input matrices and desired output directory.
PROGRAM:
// (Each public class below is saved in its own .java file.)
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Mapper: each input line holds "row,column,value" for one element of the result matrix
public class MatrixMultiplicationMapper extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // Emit the (row, column) index of the result element as the key
        // and the element value (a partial product to be summed) as the value
        context.write(new Text(fields[0] + "," + fields[1]), new Text(fields[2]));
    }
}

// Reducer: accumulates the partial sums for each element of the result matrix
public class MatrixMultiplicationReducer extends Reducer<Text, Text, Text, IntWritable> {
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int result = 0;
        for (Text value : values) {
            // Accumulate the partial sum for the result element
            result += Integer.parseInt(value.toString());
        }
        // Emit the final result for the result element
        context.write(key, new IntWritable(result));
    }
}

// Driver: sets up the job configuration and the input/output paths
public class MatrixMultiplicationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Matrix Multiplication");
        job.setJarByClass(MatrixMultiplicationDriver.class);
        job.setMapperClass(MatrixMultiplicationMapper.class);
        job.setReducerClass(MatrixMultiplicationReducer.class);
        // Map output value (Text) differs from the final output value (IntWritable)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run the program:
hadoop jar matrixmultiplication.jar MatrixMultiplicationDriver input_path output_path

OUTPUT:
hadoop jar MatrixMultiplication.jar /matrix_data/ /matrix_output_new
part-00000:
0,0  240.0
0,1  250.0
0,2  260.0
1,0  880.0
1,1  930.0
1,2  980.0

RESULT:
Thus the Map Reduce Program that implements Matrix Multiplication was executed
and verified successfully.

EX. NO: 4   RUN A BASIC WORD COUNT MAP REDUCE PROGRAM
DATE:       TO UNDERSTAND THE MAP REDUCE PARADIGM
AIM:
To write a Basic Word Count program to understand Map Reduce Paradigm.
ALGORITHM:
The entire MapReduce program can be fundamentally divided into three parts:
* Mapper Phase Code
* Reducer Phase Code
* Driver Code
STEP 1: MAPPER CODE:
We have created a class Map that extends the class Mapper which is already defined in
the MapReduce Framework.
We define the data types of input and output key/value pair after the class declaration
using angle brackets.
* Both the input and output of the Mapper is a key/value pair.
Input:
* The key is nothing but the offset of each line in the text file: LongWritable
+ The value is each individual line : Text
Output:
+ The key is the tokenized words: Text
* The value is the hardcoded value, which in our case is 1: IntWritable
* Example - Dear 1, Bear 1, etc.
We have written a Java code where we have tokenized each word and assigned it a
hardcoded value equal to 1.
STEP 2: REDUCER CODE:
* We have created a class Reduce which extends class Reducer like that of Mapper.
* We define the data types of input and output key/value pair after the class declaration
using angle brackets as done for Mapper.
* Both the input and the output of the Reducer is a key/value pair.
Input:
* The key is nothing but those unique words which have been generated after the sorting
and shuffling phase: Text
* The value is a list of integers corresponding to each key: IntWritable
* Example - Bear, [1, 1], etc.
Output:
* The key is all the unique words present in the input text file: Text
* The value is the number of occurrences of each of the unique words: IntWritable
* Example - Bear, 2; Car, 3, etc.
* We have aggregated the values present in each of the lists corresponding to each key and
produced the final answer.
* In general, a single reducer is created for each of the unique words, but you can specify the
number of reducers in mapred-site.xml.
STEP 3: DRIVER CODE:
* In the driver class, we set the configuration of our MapReduce job to run in Hadoop.
* We specify the name of the job and the data types of the input/output of the mapper and reducer.
* We also specify the names of the mapper and reducer classes.
* The path of the input and output folder is also specified.
* The method setInputFormatClass() is used for specifying how a Mapper will read
the input data or what will be the unit of work. Here, we have chosen TextInputFormat
so that a single line is read by the mapper at a time from the input text file. The main()
method is the entry point for the driver. In this method, we instantiate a new
Configuration object for the job.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "My Word Count Program");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configuring the input/output path from the filesystem into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Deleting the output path automatically from HDFS so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath, true);
        // Exiting the job only if the flag value becomes false
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run the MapReduce code:
The command for running a MapReduce code is:
hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output
OUTPUT:
(The console shows the job progress - map 0% reduce 0%, then map 100% reduce 100% - and reports that the job completed successfully. The output file part-r-00000 lists each word with its count, e.g. ADRIAN 2, ANGELO 2, ...)
RESULT:
Thus the Map Reduce Program that implements word count was executed and verified
successfully.

EX. NO: 5   INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES.
DATE:
AIM:
To install HIVE along with practice examples.
PREREQUISITES:
* Java Development Kit (JDK) installed and the JAVA_HOME environment variable
set
+ Hadoop installed and configured on your Windows system.
STEP-BY-STEP INSTALLATION:
1. Download HIVE:
Visit the Apache Hive website and download the latest stable version of Hive.
Official Apache Hive website: [Link]
2. Extract the downloaded Hive archive to a directory on your Windows machine,
e.g., C:\hive.
3. Configure Hive:
* Open the Hive configuration file (hive-site.xml) located in the conf folder of the
extracted Hive directory.
* Set the necessary configurations, such as Hive Metastore connection settings and
Hadoop configurations. Make sure to adjust paths accordingly for Windows. Here's an
example of some configurations:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore.</description>
  </property>
4. Environment Variables Setup:
Add the Hive binary directory (C:\hive\bin in this example) to your PATH environment
variable.
Set the HIVE_HOME environment variable to point to the Hive installation directory
(C:\hive in this example).
5. Start the Hive Metastore service:
To start the Hive Metastore service, you can use the schematool script:
schematool -dbType derby -initSchema
6. Start Hive:
* Open a command prompt or terminal and navigate to the Hive installation directory.
* Execute the hive command to start the Hive shell.
EXAMPLES:
1. Create a Database:
To create a new database in HIVE, use the following syntax:
CREATE DATABASE database_name;
Example:
CREATE DATABASE mydatabase;
2. Use a Database:
To use a specific database in HIVE, use the following syntax:
USE database_name;
Example:
USE mydatabase;
3. Show Databases:
To display a list of available databases in HIVE, use the following syntax:
SHOW DATABASES;
4. Create a Table:
To create a table in HIVE, use the following syntax:
CREATE TABLE table_name (
  column1 datatype,
  column2 datatype,
  ...
);
Example:
CREATE TABLE mytable (
  id INT,
  name STRING,
  age INT
);
5. Show Tables:
To display a list of tables in the current database, use the following syntax:
SHOW TABLES;
6. Describe a Table:
To view the schema and details of a specific table, use the following syntax:
DESCRIBE table_name;
Example:
DESCRIBE mytable;
7. Insert Data into a Table:
To insert data into a table in HIVE, use the following syntax:
INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
Example:
INSERT INTO mytable (id, name, age) VALUES (1, 'John Doe', 25);
8. Select Data from a Table:
To select data from a table in HIVE, use the following syntax:
SELECT column1, column2, ... FROM table_name WHERE condition;
Example:
SELECT * FROM mytable WHERE age > 20;
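Besides the interactive Hive shell, the same statements can be issued from a Java program over JDBC. The sketch below is optional and illustrative, not part of the prescribed procedure; it assumes HiveServer2 is running on localhost:10000, that the Hive JDBC driver jar is on the classpath, and it reuses the mydatabase/mytable names from the examples above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (requires the Hive JDBC jar on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumes HiveServer2 is running locally on its default port 10000
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/mydatabase", "", "");
        Statement stmt = con.createStatement();
        // Same query as the shell example above
        ResultSet rs = stmt.executeQuery("SELECT id, name, age FROM mytable WHERE age > 20");
        while (rs.next()) {
            System.out.println(rs.getInt(1) + "\t" + rs.getString(2) + "\t" + rs.getInt(3));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}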
RESULT:
Thus the installation of HIVE was done successfully.

EX. NO: 6   INSTALLATION OF HBASE ALONG WITH PRACTICE EXAMPLES
DATE:
AIM:
To install HBASE using Virtual Machine and perform some operations in HBASE.
ALGORITHM:
Step 1: Install a Virtual Machine
* Download and install virtual machine software such as VirtualBox ([Link]) or VMware ([Link]).
* Create a new virtual machine and install a Unix-based operating system like Ubuntu or
CentOS. You can download the ISO image of your desired Linux distribution from their
official websites.
Step 2: Set up the Virtual Machine
* Launch the virtual machine and install the Unix-based operating system following the
installation wizard.
* Make sure the virtual machine has network connectivity to download software
packages.
Step 3: Install Java
Open the terminal or command line in the virtual machine.
Update the package list
sudo apt update
Install OpenJDK (Java Development Kit)
sudo apt install default-jdk
Verify the Java installation:
java -version
Step 4: Download and Install HBase
* In the virtual machine, navigate to the directory where you want to install HBase.
* Download the HBase binary distribution from the Apache HBase website
([Link]). Look for the latest stable version.
* Extract the downloaded archive:
tar -xvf <hbase-archive-file>
* Replace <hbase-archive-file> with the actual name of the HBase archive file.
* Move the extracted HBase directory to a desired location:
sudo mv <extracted-hbase-directory> /opt/hbase
* Replace <extracted-hbase-directory> with the actual name of the extracted HBase
directory.
Step 5: Configure HBase
* Open the HBase configuration file for editing:
sudo nano /opt/hbase/conf/hbase-site.xml
* Add the following properties to the configuration file:
  <property>
    <name>hbase.rootdir</name>
    <value>file:///opt/hbase/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/lib/zookeeper</value>
  </property>
* Save the file and exit the text editor.
Step 6: Start HBase
* Start the HBase server:
sudo /opt/hbase/bin/start-hbase.sh
HBASE PRACTICE EXAMPLES:
Step 1: Start HBase
* Make sure HBase is installed and running on your system.
Step 2: Open HBase Shell
* Open a command prompt or terminal window and navigate to the directory where the
HBase installation is located. Run the following command to start the HBase shell:
>> hbase shell
Step 3: Create a Table
* In the HBase shell, you can create a table with column families.
* For example, let's create a table named "my_table" with a column family called "cf":
>> create 'my_table', 'cf'
Step 4: Insert Data
* To insert data into the table, you can use the put command.
* Here's an example of inserting a row with a specific row key and values:
>> put 'my_table', 'row1', 'cf:column1', 'value1'
>> put 'my_table', 'row1', 'cf:column2', 'value2'
Step 5: Get Data
* You can retrieve data from the table using the get command.
+ For example, to get the values of a specific row:
>> get 'my_table', 'row1'
* This will display all the column family values for the specified row.
Step 6: Scan Data
* To scan and retrieve multiple rows or the entire table, use the scan command.
* For instance, to scan all rows in the table:
>> scan 'my_table'
* This will display all rows and their corresponding column family values.
Step 7: Delete Data
* To delete a particular cell value you can use the delete command; to delete a whole row, use deleteall.
* Here's an example of deleting a specific row:
>> deleteall 'my_table', 'row1'
Step 8: Disable and Drop Table
* If you want to remove the table entirely, you need to disable and drop it.
* Use the following commands:
>> disable 'my_table'
>> drop 'my_table'
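The same put/get operations can also be performed from a Java client through the HBase client API. This is an optional, illustrative sketch and not part of the prescribed shell steps; it assumes the HBase client library is on the classpath, that HBase is running locally, and that the table 'my_table' with column family 'cf' has already been created as above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        // Assumes HBase (and its embedded ZooKeeper) is running on localhost
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {

            // Equivalent of: put 'my_table', 'row1', 'cf:column1', 'value1'
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("column1"), Bytes.toBytes("value1"));
            table.put(put);

            // Equivalent of: get 'my_table', 'row1'
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("column1"));
            System.out.println("cf:column1 = " + Bytes.toString(value));
        }
    }
}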
RESULT:
Thus the installation of HBase using a Virtual Machine was done successfully.

EX. NO: 7   INSTALLATION OF THRIFT
DATE:

AIM:
To install Apache Thrift on Windows OS.
ALGORITHM:
Step 1: Download Apache Thrift:
* Visit the Apache Thrift website: [Link]
* Go to the "Downloads" section and find the latest version of Thrift.
* Download the Windows binary distribution (ZIP file) for the desired version.
Step 2: Extract the ZIP file:
* Locate the downloaded ZIP file and extract its contents to a directory of your choice.
* This directory will be referred to as <thrift-dir> in the following steps.
Step 3: Set up environment variables:
Open the Start menu and search for "Environment Variables" and select "Edit the
system environment variables."
Click the "Environment Variables" button at the bottom right of the "System Properti¢
window.
Under the "System variables” section, find the "Path" variable and click "Edit.”
Add the following entries to the "Variable value" field (replace with
the actual directory path):
\bin
\lib
Click "OK" to save the changes.
Step 4: Verify the installation:
* Open a new Command Prompt window.
* Run the following command to verify that Thrift is installed and accessible:
thrift -version
* If everything is set up correctly, you should see the version number of Thrift printed
on the screen.

RESULT:
Thus the installation of Thrift on Windows OS was done successfully.

EX. NO: 8   PRACTICE IMPORTING AND EXPORTING DATA FROM
DATE:       VARIOUS DATABASES.

AIM:
To import and export data from various databases using SQOOP.
ALGORITHM:
Step 1: Install SQOOP.
* First, you need to install Sqoop on your Hadoop cluster or machine.
* Download the latest version of Sqoop from the Apache Sqoop website
([Link]) and follow the installation instructions provided in the
documentation.
Step 2: Importing data from a database:
* To import data from a database into Hadoop, use the following Sqoop command:
sqoop import --connect jdbc:<database-type>://<hostname>:<port>/<database-name> \
  --username <username> \
  --password <password> \
  --table <table-name> \
  --target-dir <hdfs-target-directory> \
  --m <number-of-mappers>
Replace the placeholders
(<database-type>, <hostname>, <port>, <database-name>, <username>, <password>,
<table-name>, <hdfs-target-directory>, and <number-of-mappers>) with the
appropriate values for your database and Hadoop environment.
Step 3: Exporting data to a database:
To export data from Hadoop to a database, use the following Sqoop command:
sqoop export --connect jdbc:<database-type>://<hostname>:<port>/<database-name> \
  --username <username> \
  --password <password> \
  --table <table-name> \
  --export-dir <hdfs-export-directory> \
  --input-fields-terminated-by '<field-delimiter>'
Replace the placeholders
(<database-type>, <hostname>, <port>, <database-name>, <username>, <password>,
<table-name>, <hdfs-export-directory>, and <field-delimiter>) with the appropriate
values for your database and Hadoop environment.
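Before running the Sqoop commands, it can be useful to confirm that the JDBC connection details are correct. The following is an optional Java sketch using plain JDBC, not part of the prescribed procedure; the placeholders mirror the Sqoop commands above, and a MySQL source with its JDBC driver on the classpath is an assumption.

import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Fill in the same placeholders used in the Sqoop commands above before running.
        // A MySQL database is assumed here; adjust the JDBC URL for your database type.
        String url = "jdbc:mysql://<hostname>:<port>/<database-name>";
        try (Connection con = DriverManager.getConnection(url, "<username>", "<password>")) {
            System.out.println("Connected successfully: " + !con.isClosed());
        }
    }
}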
RESULT:
Thus the import and export of data from various databases using SQOOP was done
successfully.