Installing Python for Data Visualization

EXP. NO: 1 INSTALLATION OF DATA ANALYSIS AND VISUALIZATION TOOL

AIM:
To install the data analysis and visualization tools like Python and R
OBJECTIVES :
● To understand the process of visualization
● To learn the software tools available for visualization
● To know the packages that support data visualization in Python and R

SOFTWARE REQUIRED:
Setup files of python and R
DESCRIPTION (MAPPING THE THEORY):
Data is a collection of discrete objects, numbers, words, events, facts, measurements, observations,
or descriptions of things. It is collected and stored by every event or process occurring in several disciplines,
including biology, economics, engineering, marketing, and others. Processing such data elicits useful
information and processing such information generates useful knowledge.
Several software tools are available:
1. Python: an open source programming language widely used in data analysis,
data mining, and data science.
2. R: an open source programming language widely used in statistical
computation and graphical data analysis.
3. Weka: an open source data mining package that includes several EDA tools
and algorithms.
4. KNIME: an open source tool for data analysis, based on Eclipse.

Data visualization is concerned with visually presenting sets of primarily quantitative raw data in a
schematic form. The visual formats used in data visualization include tables, charts and graphs (e.g. pie
charts, bar charts, line charts, area charts, cone charts, pyramid charts, donut charts, histograms,
spectrograms, cohort charts, waterfall charts, funnel charts, bullet graphs, etc.), diagrams, plots (e.g. scatter
plots, distribution plots, box-and-whisker plots), geospatial maps (such as proportional symbol maps,
choropleth maps, isopleth maps and heat maps), figures, correlation matrices, percentage gauges, etc., which
sometimes can be combined in a dashboard.

INSTALLATION OF PYTHON
Step1:
● Open website [Link] from the web browser.
● Click on Downloads and choose the version to download as per the operating system.
● Click on all releases to download an older version of Python.
Step2:
● Double-click the Python installer downloaded on the computer. A dialog box will appear as follows:

Step3:
Click on the Run button to continue the installation. A dialog box will appear as follows:

Step4:
Click on Install Now.
The Python installation will start and the following window will appear:
Step5:
When the installation is completed, the following window will appear:

Step6:
Click Close to close this window; the installation is complete.

INSTALLATION OF R
Step1:
Visit the RStudio official site and click on Download RStudio.

Step 2:
Select RStudio desktop for open-source license and click on download.

Step 3:
Select the appropriate installer to start downloading the RStudio setup.

Step 4:
Run the setup following the steps below:
1) Click on Next.

2) Click on Install.
3) Click on Finish.

4) RStudio is ready to work.

REAL-TIME APPLICATIONS:

● Retail: Exploratory data analysis can enable analysts to represent different sales trends
graphically and visualize data related to best-selling product categories, buyer demographics and
preferences, customer spending patterns, and units sold over a certain period.
● Fraud detection: When EDA data mining techniques are used on Medicare datasets, it’s possible
to evaluate the risk of a given individual for fraudulent activity.
● Auditing: EDA can be applied to several stages of auditing, for both internal and external
audit cycles.
● Geography: Exploratory spatial data analysis (ESDA) is a branch of EDA that is concerned
specifically with geographical data. Those with training in this field can perform a variety of
geographical tasks, such as visualizing spatial distributions, spotting physical outliers, and
uncovering spatial clusters or patterns.

VIVA QUESTIONS:
1. What is meant by data analysis?
Data analysis is a process for obtaining raw data, and subsequently converting it into
information useful for decision-making by users. Data is collected and analyzed to
answer questions, test hypotheses, or disprove theories.
2. What are the uses of python?
Python is commonly used for developing websites and software, task automation, data analysis and data visualization.
3. What are the advantages of Python?
● Simplicity and Readability
● Versatility and Flexibility
● Community and Support
● Integration and Compatibility
● Speed of Development
● Open Source Advantage
● Machine Learning and AI
● Education and Research

RESULT:
The installation of data analysis and visualization tools like Python and R was completed successfully.

EX NO : 2 IMPLEMENTATION OF EXPLORATORY DATA ANALYSIS (EDA)

AIM :
To perform exploratory data analysis using the personal email dataset.
OBJECTIVES :
• To export the emails from a mailbox to a dataset
• To import the dataset as a dataframe
• To visualize and get different insights from the data
SOFTWARE REQUIRED:
Python

DESCRIPTION (MAPPING THE THEORY):

Data encompasses a collection of discrete objects, numbers, words, events, facts, measurements,
observations, or even descriptions of things. Such data is collected and stored by every event or process
occurring in several disciplines, including biology, economics, engineering, marketing, and others.
Processing such data elicits useful information and processing such information generates useful knowledge.
EDA is a process of examining the available dataset to discover patterns, spot anomalies, test hypotheses,
and check assumptions using statistical measures. This experiment covers how to export all your emails as a
dataset, how to import them into a pandas dataframe, and how to visualize them.

Implementation:
Technical requirement:
1. Log in to your personal Gmail account.
2. Go to the following link: [Link]
3. Deselect all the items but Gmail, as shown in the following screenshot:
a. Select Send download link by email, One-time archive, .zip, and the
maximum allowed size. Customize the format. Once done, hit Create
archive.

4. Select the archive format, as shown in the following screenshot:


5. Use the path to the mbox file for further analysis.

Loading the dataset


1. load the required libraries:
code:

2. load the dataset:


code:

Output:

3. list the available keys:


code:

Output:

Data transformation
Data cleansing
1. Import the csv package:
Code:

2. Create a CSV file with only the required attributes:
Code:

3. Loading the CSV file
Code:

4. Converting the date


Check the datatypes of each column as shown here:
Code:

Output:

Note that the date field is of type object, so we need to convert it into a
datetime value. In the next step, we convert the date field into an actual
datetime value using the pandas to_datetime() method.

Code:
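A minimal sketch of this conversion (the dataframe and column names here are stand-ins for the email data):

```python
import pandas as pd

# Hypothetical stand-in for the email dataframe; the real column is also 'date'
df = pd.DataFrame({'date': ['2021-01-05 10:00:00', '2021-02-07 16:30:00']})
df['date'] = pd.to_datetime(df['date'])  # object -> datetime64[ns]
```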

5. Removing NaN values


Next, we are going to remove NaN values from the field.
Code:
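A sketch of the NaN removal, assuming a toy dataframe named `emails` with a `to` column (the names are illustrative):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the email dataset, with a missing 'to' value
emails = pd.DataFrame({'to': ['a@example.com', np.nan, 'b@example.com']})
emails = emails.dropna(subset=['to'])  # keep only rows where 'to' is present
```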

6. It is good to save the preprocessed file into a separate CSV file.
Code:

Applying descriptive statistics

1. Sanity checking using descriptive statistics techniques
Code:

Output:

2. Display the first few entries of the email dataset
Code:

Output:

Data refactoring
The from field contains more information than we need. We just need to extract an email address from that
field. Let's do some refactoring:

1. import the regular expression package:


Code:

2. Create a function that takes an entire string from any column and extracts an email address
Code:
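One way such a function might look (the regular expression is a simple illustration, not a fully RFC-compliant matcher):

```python
import re

def extract_email(text):
    # Pull the first email-address-shaped substring out of a free-form string
    match = re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', str(text))
    return match.group(0) if match else None
```

For example, `extract_email('John Doe <john.doe@example.com>')` yields `'john.doe@example.com'`.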

3. apply the function to the from column:


Code:

4. Refactor the label field. If an email is from your email address, then it is the sent email. Otherwise,
it is a received email, that is, an inbox email:
Code:
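A sketch of this refactoring, assuming the column is named `from` and using a hypothetical address:

```python
import pandas as pd

MY_ADDRESS = 'me@example.com'  # hypothetical; substitute your own address

df = pd.DataFrame({'from': ['me@example.com', 'friend@example.com']})
# Emails from our own address are 'sent'; everything else is 'inbox'
df['label'] = df['from'].apply(lambda a: 'sent' if a == MY_ADDRESS else 'inbox')
```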
Dropping columns
1. Drop the column
Code:

Output:

Refactoring timezones

1. Refactor the timezone into the US/Eastern timezone.
Code:

2. Call the function
Code:

3. Convert the day of the week variable into the name of the day
Code:

4. Do the same process for the time of the day
Code:

5. Refactor the hour
Code:

6. Refactor the year integer
Code:

7. Refactor the year fraction
Code:

8. Set the date to index; we will no longer require the original date field, so we can remove that:
Code:

Data analysis
1. Number of emails
Code:

Output:
2. Time of day at which emails were sent and received
2.1 Create two sub-dataframes, one for sent emails and another for received emails:

Code:

2.2 import the required libraries:

Code:

2.3 Create a function that takes a dataframe as an input and creates a plot.
Code:

2.4 Plot both received and sent emails.
Code:

Output:

3. Average emails per day and hour


3.1 create two functions, one that counts the total number of emails per day and one that plots
the average number of emails per hour:
Code:

3.2 import the required libraries:


Code:

3.3 Create a function that plots the average number of emails per hour:
Code:

Output:

3.4 create a class that plots the time of the day versus year for all the emails within the
given timeframe:
Code:
Output:

4. Number of emails per day


4.1 Find the busiest day of the week in terms of emails:
Code:

Output:

Code:
Output:

4.2 Find the most active days for receiving and sending emails separately:
Code:

Output

4.3 Find the most active time of day for email communication.
Code:

Output:

4.4 Analyze the most frequently used words in your emails. We can create a word cloud
to see the most frequently used words. First, remove the archived emails:

Code:

4.5 Plot the word cloud
Code:

Output:

REAL-TIME APPLICATIONS :
• Professional sports: Sports analysts rely on EDA to identify the most successful players
and teams, as well as to discover the variables that contribute to a team’s wins and losses. EDA is
a helpful tool for deciding which players or teams a company should select to endorse.
• History: EDA can be applied to create new data about past events.
• Healthcare: EDA is helpful for spotting natural patterns embedded in large stores of medical data.

VIVA QUESTIONS:

1. How to drop a column from the dataset?

dataset_name.drop(columns='column_name', inplace=True)

2. What is meant by data cleaning?
Data cleaning:
Preprocessed data may still not be ready for detailed analysis. The tasks performed in the
data cleaning stage include:
• matching the correct record
• finding inaccuracies in the dataset
• understanding the overall data quality
• removing duplicate items and
• filling in the missing values

3. How to create an email dataset?


1. Log in to personal Gmail account.
2. Go to the following link: [Link]
3. Deselect all the items but Gmail
4. Select the archive format and hit Create archive
Note that Send download link by email, One-time archive, .zip, and the maximum allowed size
should be selected
5. Download the email archive that will be received in the specified mail
6. The path to the mbox file has to be used for further analysis

RESULT:
The exploratory data analysis on personal email dataset was implemented successfully.

EX NO : 3 WORKING WITH PYTHON PACKAGES

AIM:
To perform various operations on Numpy arrays, Pandas data frames and construct basic plots using
Matplotlib.

OBJECTIVES :
• To learn different operations supported by Numpy package
• To implement operations on dataframes using Pandas package
• To visualize the data using plots from Matplotlib package

SOFTWARE REQUIRED:
Python - Numpy, Pandas, Matplotlib packages
DESCRIPTION (MAPPING THE THEORY):

Visual Aids for EDA:


The steps associated with creating various plots are:
1. Import the required libraries
2. Set up the data
3. Specify the layout of the figure and allocate space
4. Plot the graph
5. Display the graph on the screen
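The five steps above can be sketched for a simple line chart (the data is made up, and the figure is saved rather than shown so the snippet runs headless):

```python
import matplotlib
matplotlib.use('Agg')            # render off-screen
import matplotlib.pyplot as plt  # 1. import the required libraries

months = [1, 2, 3, 4]            # 2. set up the data
sales = [10, 14, 9, 17]

fig, ax = plt.subplots(figsize=(6, 3))  # 3. specify the layout and allocate space
ax.plot(months, sales)                  # 4. plot the graph
fig.savefig('sales.png')                # 5. display (plt.show()) or save the graph
```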

1. Line chart
A line chart is used to illustrate the relationship between two or more continuous variables.

2. Bar charts
Bar charts are used to distinguish objects between distinct collections to track variations over time.

3. Scatter plot
Scatter plots use a Cartesian coordinates system to display values of typically two variables for a set of data.

4. Bubble plot
A bubble plot is a manifestation of the scatter plot where each data point on the graph is shown as a bubble.

5. Area plot and stacked plot


The stacked plot represents the area under a line plot and several such plots can be stacked on top of one
another, giving the feeling of a stack. It can be useful when we want to visualize the cumulative effect
of multiple variables being plotted on the y axis.

6. Pie chart
The purpose of a pie chart is to communicate proportions, though it fails to appeal to most visualization experts.

7. Table chart
A table chart combines a bar chart and a table.

8. Polar chart or spider web plot


A polar chart is a diagram that is plotted on a polar axis. Its coordinates are angle and radius.

9. Histogram
Histogram plots are used to depict the distribution of any continuous variable. These types of plots are very
popular in statistical analysis.

10. Lollipop chart
A lollipop chart can be used to display ranking in the data. It is similar to an ordered bar chart.

Data Transformation
Data transformation is a set of techniques used to convert data from one format or structure to another.
The main reason for transforming the data is to get a better representation, such that the
transformed data is compatible with other data.

1. Merging on index
The index acts as the key for merging dataframes: pass left_index=True or right_index=True to indicate that
the index should be used as the merge key.
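A small sketch of an index-based merge (the student dataframes are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'maths': [90, 80]}, index=['s1', 's2'])
right = pd.DataFrame({'physics': [85, 75]}, index=['s1', 's3'])

# Inner join on the index: only 's1' appears in both frames
merged = pd.merge(left, right, left_index=True, right_index=True)
```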

2. Reshaping and pivoting


Helps to arrange data in a dataframe in some consistent manner. This can be done with hierarchical indexing
using two actions:
• Stacking: rotates data from the columns into the rows.
• Unstacking: rotates data from the rows into the columns.
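These two actions can be sketched on a toy dataframe (the weather values are made up):

```python
import pandas as pd

weather = pd.DataFrame({'rainfall': [10, 20], 'humidity': [60, 70]},
                       index=['India', 'Norway'])

stacked = weather.stack()      # columns rotate into an inner row index -> Series
unstacked = stacked.unstack()  # rows rotate back into columns -> DataFrame
```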

3. Transformation techniques
Includes data transformations like cleaning, filtering and deduplication
(i) Performing data deduplication - Removing duplicate rows to enhance the quality of the dataset
(ii) Replacing values - find and replace values inside a dataframe
(iii) Handling missing data - NaN indicates that there is no value specified for the particular index.
(iv) Filling missing values - replace NaN values with any particular value using the fillna() method
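A combined sketch of techniques (i), (ii) and (iv) on a toy dataframe (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'city': ['Chennai', 'Chennai', 'Delhi'],
                   'temp': [34.0, 34.0, np.nan]})

deduped = df.drop_duplicates()                    # (i) remove duplicate rows
replaced = deduped.replace('Delhi', 'New Delhi')  # (ii) find and replace values
filled = replaced.fillna(0)                       # (iv) fill NaN with a chosen value
```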

IMPLEMENTATION:
1. Create and display a 1D, 2D and 3D numpy array
Code:
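A possible version of this step:

```python
import numpy as np

a1 = np.array([1, 2, 3])                              # 1D array
a2 = np.array([[1, 2], [3, 4]])                       # 2D array
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # 3D array
print(a1, a2, a3, sep='\n')
```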

Output:

2. Display the attributes of a 2D array


Code:

Output:

3. Perform basic arithmetic operations on 2D arrays


Code:
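A possible version of this step (the array values are chosen for illustration):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[10, 20], [30, 40]])

added = x + y      # element-wise addition
product = x * y    # element-wise multiplication
matmul = x @ y     # matrix multiplication
```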

Output:

4. Use NumPy and pandas to create a dataframe and perform the following operations
A) Create a dataset by combining Numpy array and Pandas dataframes
Code:

Output:

B) Change the background and text colour of the dataframe


Code:

Output:

C) Apply different colors to the values that are less than 0, greater than zero and others
Code:

Output:

D) Highlight the NaN values in DataFrame


Code:

Output:

E) Highlight the Min values in each column


Code:

Output:

F) Highlight the maximum value, minimum value and null value in each column
Code:

Output:
G) Generate background gradient color variation
Code:

Output:

5. Write a function using the faker Python library to generate a dataset with two columns: Date and
Price, indicating the stock price on that date.
Code:

Output:

6. Create a line chart for the stock price dataset


Code:

Output:

7.(A) Draw a bar chart to keep track of the amount of a material sold every month. Use the calendar Python
library to keep track of the months of the year (1 to 12) corresponding to January to December
Code:

Output:

(B) Use horizontal bar chart to represent the same


Code:

Output:

8. Draw a scatter plot to represent the total hours of sleep required by persons of different ages using an
available dataset.
Code:

Output:

9.(A) Fit a line to the scatter plot so that it becomes more interpretable
Code:

Output:

(B) Generate scatter plot for Iris dataset


Code:

Output:

10. Use the [Link] method to draw a bubble chart for Iris dataset
Code:

Output:

11. Construct Area plot and stacked plot for the House loan Mortgage cost per month for a year
using appropriate dataset
Code:

Output:

12. (A) Use the Pokemon dataset to draw a pie chart

Code:
Output:

(B) Use the pandas library to create a pie chart

Code:

Output:

13. Prepare a dataset with information about LED bulbs that come in different wattages, namely 4.5 Watts, 6
Watts, 7 Watts, 8.5 Watts, 9.5 Watts, 13.5 Watts, and 15 Watts. It has the following variables: the year,
wattage, and number of units sold in a particular year. Draw a table chart to represent the information and
add the table to the bottom of the chart

Code:

Output:

14. Create a dataset with subjects, planned grade and actual grade and represent them using a Polar chart

Code:

Output:

15. (A) Create a dataset with number of years of Python programming experience ranging from 0 to 20
and visualize it using a Histogram. Draw a green vertical line at the average experience.

Code:

Output:

(B) Plot a normal distribution over the histogram

Code:

Output:

16. Use the carDF dataset to group it based on manufacturer and visualize it
using Lollipop chart

Code:

Output:

Data Transformation

17. Merging database-style dataframes


A. Create two dataframes for two subjects with student id and their scores in
the subject

Code:

B. Concatenate along an axis

Code:

Output:
C. Concatenate the dataframes using [Link] with an inner join

Code:

Output:

D. Use [Link]() method with a left join for concatenation

Code:

Output:

E. Use [Link]() method with a right join for concatenation

Code:

Output:

F. Use [Link]() method with outer join for concatenation

Code:

Output:

G. Create two dataframes and merge them on index using inner and outer join

Code:

Output:

H. Create a dataframe that records the rainfall, humidity, and wind conditions of five different countries.
Pivot the columns into rows to produce a series. Rearrange the series into a dataframe and unstack the
concatenated dataframe

Code:

Output:

Transformation techniques

18. Perform data deduplication


A. Identify the rows that are duplicated

Code:

Output:

C. Add a new column and find duplicated items based on the second column

Code:

Output:

19. Replacing values

A. Replace one value with the other value


Code:

Output:

B. Replace multiple values at once

Code:

Output:

20. Handling missing data

A. Create a dataframe and add some missing values to it

Code:

Output:

B. Identify NaN values from the dataframe

Code:

Output:

C. Count the number of NaN values in each column

Code:

Output:

D. Find the total number of missing values

Code:

Output:

E. Count the number of valid values

Code:

Output:

21. Dropping missing values


A. Display the not null values of a specific column

Code:

Output:

B. Remove the null values of a specific column

Code:

Output:

C. Drop the rows that have NaN values


Code:

Output:

D. Drop rows that have all NaN values

Code:

Output:

E. Drop columns that have all NaN values

Code:

Output:

F. Drop a column which has a minimum number of NaNs

Code:

Output:

22. Filling missing values

A. Replace NaN values with 0 and show that the replacement affects the mean value

Code:

Output:

B. Replace NaN values using forward-filling technique

Code:

Output:

C. Replace NaN values using backward-filling technique

Code:

Output:

D. Perform linear interpolation of missing values


Code:

Output:

REAL-TIME APPLICATIONS :

• Education: In the education industry, data visualization facilitates tracking student performance,
identifying learning outcomes, and informing pedagogical decisions. Analysis can include student
achievement, learning progress, and assessment results.

• Data Science: Data visualization is essential in the field of data science, enabling professionals to
extract insights from complex datasets and communicate findings effectively.
• Military: In the military sector, data visualization plays a critical role in enhancing decision-making
capabilities and situational awareness. Analyses can include intelligence data visualization,
operational analytics, and real-time tracking.

VIVA QUESTIONS:

RESULT:
Implementation of various operations on Numpy arrays, Pandas data frames and construction of basic plots
using Matplotlib were done successfully.

EX NO : 4 IMPLEMENTATION OF DATA CLEANING AND VISUALIZATION

AIM:
To implement data cleaning and visualization of data using R.

OBJECTIVES :
• To learn various operations supported by R
• To explore various variable and row filters in R for cleaning data
• To apply various plot features in R for data visualization

SOFTWARE REQUIRED:
R and RStudio
DESCRIPTION (MAPPING THE THEORY):
R is a language and environment for statistical computing and graphics. One of R’s strengths is the ease with
which well-designed publication-quality plots can be produced, including mathematical symbols and
formulae. There are about eight packages supplied with the R distribution. There are more than 100 datasets
available in R, included in the datasets package. The function data() provides the list of available datasets. All
available datasets in R can be accessed by their explicit names.

Functions in dplyr package:


• select() - is used to pick specific variables or features of a DataFrame. It selects columns based on
provided conditions. The select() function takes a minus sign (-) before the column name to specify that the
column should be removed.

• rename() - changes the names of individual variables and columns.

• filter() - is used to produce a subset of the data frame, retaining all rows that satisfy the specified
conditions. The subset data frame has to be retained in a separate variable.

Plotting packages in R
• Base R - takes a canvas approach to plot by painting layer after layer of detail onto the graphics.

• ggplot2 - includes themes for personalizing charts. With the theme function components, the colours,
line types, typefaces, and alignment of the plot can be changed. Various options allow users to personalize
the graph by adding titles, subtitles, arrows, texts, or lines.

• Plotly - is an alternative graphing library to ggplot2. Plotly uses JavaScript to render the final
graphics, which provides several advantages for digital viewing. Plotly graphics automatically contain
interactive elements that allow users to modify, explore, and experience the visualized data.

IMPLEMENTATION:

1. Install the required packages

Code:

Output:

2. Load the required libraries to the R environment

Code:

Output:

3. Load the Iris Dataset

Code:

Output:

4. Perform Data Cleaning and Filtering by selecting, deleting, renaming and filtering columns of the dataset

Code:

Output:

5. Visualize the data using various plotting functions from the base, ggplot2 and plotly packages
Code:

Output:

REAL-TIME APPLICATIONS :

• Education: providing charts for students to understand the concept


• Business meetings: using posters to present statistics
• Video games: displaying scores or progress

VIVA QUESTIONS:

RESULT:
Implementation of data cleaning and visualization of data using R was done successfully.

EX NO : 5 IMPLEMENTATION OF TIME SERIES ANALYSIS AND VISUALIZATION

AIM:
To perform Time Series Analysis and visualize the dataset using different plots.

OBJECTIVES :
• To understand time series data
• To load time series data to dataframes using Pandas package
• To visualize the time series data using plots from Matplotlib package

SOFTWARE REQUIRED:
Python - Numpy, Pandas, Matplotlib packages
DESCRIPTION (MAPPING THE THEORY):
Time series refers to a sequence of data points that are collected, recorded, or observed at regular intervals
over a specific period of time. In a time series, each data point is associated with a specific timestamp or time
period, which allows for the chronological organization of the data. The time series data can be visualized
using Python. Analyzing and visualizing time series data plays a crucial role in gaining insights,
making predictions, and understanding the underlying dynamics of a system or process over time.

Line plot:The line plots can be used to show seasonality, which is the presence of variations that occur at
specific regular time intervals less than a year, such as weekly, monthly, or quarterly.

Resampling: In time series analysis, resampling changes the frequency of the observations, aggregating
data to a coarser interval (downsampling) or filling it in at a finer interval (upsampling).

Differencing: It is used to take the difference between values of a specified interval (one by default). It is
the most popular method to remove trends in the data.

Trend In The Dataset: The trend helps to identify whether the value of the data moves upward or
downward in the long run.

Shifting: It is used to plot the changes that occurred in data over time. The shift function is used to shift the
data before or after the specified time interval.

Box Plot: It is used to view the distribution of values in a specific column.
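Resampling, differencing and shifting can be sketched on a toy series (the dates and values are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2021-01-01', periods=10, freq='D')
ts = pd.Series(np.arange(10.0), index=idx)  # toy series with a steady upward trend

resampled = ts.resample('2D').mean()  # resampling to a coarser 2-day interval
differenced = ts.diff()               # differencing removes the linear trend
shifted = ts.shift(1)                 # shifting moves values one period later
```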

IMPLEMENTATION:

1. Import the required libraries

Code:

2. Load the Dataset

Code:

Output:

3. Drop Unwanted Columns

Code:

Output:

4. Construct line plot


(i) ‘Volume’ column data

Code:

Output:

(ii) All columns using subplot

Code:

Output:

5. Resample and Plot The Data


Code:

Output:

6. Implement Differencing to remove trends in the data

Code:

Output:

7. Plot the Changes in Data Shift

Code:

Output:

8. Plot the date on a specific time interval - 2021

Code:

Output:

9. Construct Box Plot for the data

Code:

Output:

10. Visualize the Trend In The Dataset

Code:

Output:

REAL-TIME APPLICATIONS :

• Temperature Data: Continuous temperature recordings collected at regular


intervals, such as hourly or daily measurements.
• Stock Market Data: Continuous data representing the prices or values of stocks,
which are recorded throughout trading hours.
• Sensor Data: Measurements from sensors that record continuous variables
like pressure, humidity, or air quality at frequent intervals.

VIVA QUESTIONS:

RESULT:
The analysis and visualization of time series data by constructing various plots using Matplotlib was
implemented successfully.
EX NO : 6 IMPLEMENTATION OF DATA ANALYSIS AND REPRESENTATION ON A MAP

AIM:
To implement data analysis and representation on a map using various map datasets with mouse rollover
effect and user interaction.

OBJECTIVES :
• To learn different operations supported by folium package
• To visualize data on a map using folium package
• To learn to implement different effects on map using folium package
SOFTWARE REQUIRED:
Python - Numpy, Pandas, Matplotlib, folium packages

DESCRIPTION (MAPPING THE THEORY):

Python’s Folium library allows one to create interactive geographic visualizations of geospatial data. It is
built on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet
(JavaScript) library. Simple manipulation of the data can be done in Python, and the result is then visualized
on a Leaflet map via Folium. Folium makes it easy to visualize data that has been manipulated in Python on an
interactive Leaflet map. This library has a number of built-in tilesets from OpenStreetMap, Mapbox etc.
Folium enables users to generate a base map of specified width and height with either default map tilesets
(i.e., map styles) or a custom tile set URL. The following tilesets are available by default with Folium:
OpenStreetMap, Mapbox Bright, Mapbox Control Room, Stamen, Cloudmade, Mapbox and CartoDB.
Folium also supports choropleth maps, which is a thematic map in which areas are shaded or patterned in
proportion to the measurement of the statistical variable being displayed on the map, such as population
density or per-capita income.

IMPLEMENTATION:

1. Import the libraries

Code:

2. Create a map

Code:

Output:

3. Load the crimes dataset

Code:

4. Extract the data from the dataset satisfying a specific condition

Code:

Output:

5. Mark the places on a map in which daytime robberies have occurred

Code:

Output:
REAL-TIME APPLICATIONS :

• Fleet Tracking and Management:


Logistics companies can track the real-time location of their vehicles, optimize routes, and monitor delivery
progress. Each vehicle is represented by a marker on the map, and its status, speed, and direction can be
updated in real time.

• Emergency Response and Crisis Management:


Emergency services can use real-time mapping to coordinate responses during disasters or accidents. Display
of the locations of emergency vehicles, affected areas, and critical infrastructure on a map can facilitate
quick decision-making.

• Urban Planning and Infrastructure Management:


City planners can monitor real-time data on traffic, pedestrian flow, and energy consumption for
optimizing city infrastructure. Display of real-time information on a map, helps planners in making
informed decisions for urban development and resource allocation.

VIVA QUESTIONS:

RESULT:
Data analysis and representation on a map using various map datasets with mouse rollover effect and user
interaction has been implemented successfully.

EX NO:7 IMPLEMENTATION OF CARTOGRAPHIC VISUALISATION

AIM:
To perform cartographic visualization for multiple datasets using GeoPandas

OBJECTIVES :
• To learn different operations supported by GeoPandas package
• To construct maps using GeoPandas package
• To learn to create a geomap of India and visualize the data over it using shapefiles

SOFTWARE REQUIRED:
Python - Numpy, Pandas, Matplotlib, GeoPandas, Seaborn, Shapefile packages
DESCRIPTION (MAPPING THE THEORY):
Data visualization provides insights into the data. Like bar charts, line graphs, and scatter plots, maps also
help to understand the data better. GeoPandas is an open-source project to make working with geospatial data
in Python easier. GeoPandas produces a tangible, visible output that is directly linked to the real world.

REQUIRED DATASET:

Download the file required from the link and unzip it. Keep all of the files in the same folder
1. Shape files of India map :

[Link]

2. Global landslide data :

[Link]
Analysis/blob/main/[Link]

3. State wise latitudes and longitudes of India:

[Link] Analysis/blob/main/state%20wise%20lat
%20and%[Link]

IMPLEMENTATION:

1. Install GeoPandas and Shapely

Code:

Output:

2. Import the libraries

Code:

Output:

3. Plot the Shapefiles

Code:

Output:
4. Plot a map of only landslides that have happened within India

Code:

Output:

5. Merge the state data which contains landslide information with map shapefile.

Code:

Output:

6. Plot the data on the Shapefile


Code:

Output:

7. Find the latitudes and longitudes of landslides that took place in India

Code:

Output:

8. Load the dataset with coordinates of Indian states

Code:

Output:

9. Perform required preprocessing

Code:

Output:

# Handling missing values

Code:

10. Plot the map of India showing where landslides have occurred over the years

Code:

Output:

REAL-TIME APPLICATIONS :

• Wildland firefighting

Firefighters have been using sandbox environments to rapidly and physically model topography and fire for
wildfire incident command strategic planning.

• Forestry

Geovisualizers, working with European foresters, used CommonGIS and Visualization Toolkit to visualize
a large set of spatio-temporal data related to European forests, allowing the data to be explored by non-
experts over the Internet.

• Archaeology

Geovisualization provides archaeologists with a potential technique for mapping unearthed archaeological
environments as well as for accessing and exploring archaeological data in three dimensions.

VIVA QUESTIONS:

RESULT:
Cartographic visualization of multiple datasets using GeoPandas has been implemented successfully.
EX NO:8 VISUALIZATION USING POWER BI

AIM:
To implement exploratory data analysis on the wine quality dataset using Power BI.

OBJECTIVES :
• To install and explore the features of Power BI
• To learn to import a dataset into Power BI
• To learn to generate descriptive analytics and visualize the data in Power BI

SOFTWARE REQUIRED:
Microsoft Power BI Desktop; wine quality dataset (CSV)
DESCRIPTION (MAPPING THE THEORY):

Exploratory Data Analysis (EDA) is a technique used to analyze and understand data by summarizing its
characteristics. EDA is a powerful method for identifying patterns and trends, detecting outliers, and
understanding the distribution of data. Power BI is a business intelligence tool developed by Microsoft
that allows users to create and share interactive reports and dashboards. It can help businesses gain
insights and make data-driven decisions. Power BI offers a range of features that can be used to explore
data, including data modeling, data visualization, and data analysis. It involves the following steps:

Step 1: Import Data


• Get Power BI: Install Power BI Desktop, which can be downloaded from the official Power BI website.
• Load Data: Open Power BI Desktop and click on "Get Data" to load the wine quality dataset. Common
formats include CSV, Excel, or direct database connections.

Step 2: Data Cleaning and Transformation


• Clean Data: Identify and handle missing values, duplicates, or outliers in the dataset.
• Transform Data: Use Power Query Editor to perform transformations such as renaming columns,
changing data types, or creating calculated columns.

Step 3: Data Exploration


• Build Visualizations: Create visualizations like scatter plots, bar charts, histograms, or box plots
to understand the distribution and relationships in the data.
• Create Measures: Define measures or calculated columns based on the analysis needs.

Step 4: Relationship Analysis


• Establish Relationships: If the dataset includes multiple tables, establish relationships between them
using the Relationship view.
• Create Hierarchies: Create hierarchies to explore data at different levels of granularity easily.

Step 5: Statistical Analysis


• Descriptive Statistics: Use Power BI visuals to compute descriptive statistics such as mean, median,
or standard deviation.
• Correlation Analysis: Utilize scatter plots or correlation matrices to explore relationships between
different variables.

Step 6: Dashboard Design


• Design Dashboard: Create a dashboard by arranging visuals, charts, and tables. Ensure that the layout
is intuitive and follows best practices for data visualization.
• Interactivity: Add slicers, filters, or drill-through options to make the dashboard interactive.

Step 7: Insights and Reporting


• Narrative Insights: Use Power BI's "Quick Insights" or add text boxes to provide narrative insights
or comments about your findings.
• Generate Reports: Create multiple reports within your Power BI file to present different aspects of
your analysis.

Step 8: Data Publishing and Sharing


• Save and Publish: Save the Power BI file and publish it to the Power BI service if it has to be shared
with others.
• Share and Collaborate: Share the dashboard with others, and collaborate on the Power BI service.

Step 9: Monitor and Update


• Monitoring: Monitor the dashboard for any changes in the data or new insights.
• Update as Needed: Regularly update the analysis and dashboard based on changing data or new
business requirements.

IMPLEMENTATION:
1. Install and open the Power BI Desktop tool
2. Load the wine quality dataset as a CSV file into Power BI
3. Perform descriptive analysis and visualizations on the wine quality dataset

REAL-TIME APPLICATIONS :

• Quality Monitoring in Winemaking: Wineries can use real-time analysis to monitor and maintain
the quality of wine during the production process.

• Consumer Recommendations and Personalization: Online wine retailers or recommendation


platforms can provide personalized suggestions to consumers based on real-time analysis of their
preferences and current market trends.

• Health and Nutritional Insights: Providing health-conscious consumers with real-time information
about the nutritional content of wines.

VIVA QUESTIONS:

RESULT:
Exploratory data analysis on the wine quality dataset using Power BI has been implemented
successfully.
