Report Python

Uploaded by berrtit.mohammed
Copyright © All Rights Reserved

Final Year Project Report

HIGHER SCHOOL OF TECHNOLOGY-SAFI

Cycle: Licence Professionnelle (Professional Bachelor's Degree)


2019-2020

PYTHON FOR DATA ANALYSIS

Carried out by:
Abderrahim Bertit
Supervised by:
Mr. Othman Alaoui Fdili
Contents
Dedication
Thanks
INTRODUCTION
The Problem
I. What is Data Analytics?
A. Data Analytics
B. Why is Data Analytics important?
C. Top Tools in Data Analytics
D. Types of Data Analysis: Techniques and Methods
1. Text Analysis
2. Statistical Analysis
3. Diagnostic Analysis
4. Predictive Analysis
5. Prescriptive Analysis
E. Data Analysis Process
II. Python for Data Analysis
A. Definition
B. Why Python for Data Analysis?
C. Best Python Libraries
1. Matplotlib
2. NumPy
3. Pandas
III. Development environment and tools used
A. Introduction
B. What is Google Colab?
C. Setting up your drive
IV. Practical side
A. Introduction
B. Getting the Dataset
C. Imports for downloading the datasets
D. Downloading datasets into respective data frames
E. Exploratory Analysis
V. Data Modelling & Analysing Coronavirus: The Morocco Focus
A. Data Modelling and Prediction
CONCLUSION
WEBOGRAPHY

Dedication

I dedicate this work:

To my father, my first supervisor since my birth.

To my very dear mother: may she find here the tribute of my gratitude which, however
great it may be, will never be equal to her sacrifices and her prayers for me.

To my brothers and sisters, to whom I wish a lot of success and happiness.

To all my friends who are dear to me, to all those whom I love and who love me: may they
find here the expression of my most devoted sentiments and my most sincere wishes.

May Almighty God preserve you all and bring you wisdom and happiness.

Thanks

Before going any further into this professional experience, it seems appropriate to begin my
final year project report by thanking those who taught me a lot during this project,
as well as those who were kind enough to make it a very rewarding experience.

First of all, I would like to thank the Higher School of Technology of Safi for opening its doors
to me and giving me the opportunity to study there.

Next, I would like to thank everyone who helped me throughout this project,
in particular all the staff of the IT department.

I would like to give special thanks and tribute:

Mr. Othman Alaoui Fdili, my supervisor during this project, for his kindness and hospitality.

Finally, I thank each and every one of those who participated in the completion of my final
year project.

INTRODUCTION

Data has been a buzzword for years now. Whether it is generated by large-scale enterprises
or by individuals, every aspect of data needs to be analyzed in order to benefit from it.
But how do we do that? This is where the term ‘Data Analytics’ comes in. In this report on
‘What is Data Analytics?’, you will get an insight into this term along with hands-on practice.

The Problem

Coronaviruses are a large family of viruses that cause a variety of conditions ranging
from the common cold to more serious illnesses such as Middle East Respiratory
Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).

In this project, we will see how the disease has developed over time and which countries
have been affected by the coronavirus, and we will make some estimations. We will use
Python as the programming language for data science, and we will work on Google Colab.

I. What is Data Analytics?
A. Data Analytics
As the name suggests, Data Analytics refers to the techniques used to analyze data in order
to enhance productivity and business gain. Data is extracted from various sources, then
cleaned and categorized so that different behavioral patterns can be analyzed. The
techniques and tools used vary according to the organization or individual.

In short, if you understand business administration and are able to perform Exploratory
Data Analysis (EDA) to gather the required information, you are ready for a career in
Data Analytics.

Now that you know what Data Analytics is, let us see why it is important.

B. Why is Data Analytics important?


As enormous amounts of data are generated, extracting useful insights has become a must
for business enterprises. Data Analytics plays a key role in improving a business.
Here are 4 main factors that show the need for Data Analytics:

- Gather Hidden Insights – Hidden insights are gathered from the data and then analyzed
with respect to business requirements.
- Generate Reports – Reports are generated from the data and passed on to the
respective teams and individuals, who take further actions to grow the business.
- Perform Market Analysis – Market analysis can be performed to understand the
strengths and weaknesses of competitors.
- Improve Business Requirements – Analyzing data makes it possible to better align the
business with customer requirements and experience.

Now that you know why Data Analytics is needed, let me quickly cover the top tools used
in this field.

C. Top Tools in Data Analytics
With the increasing demand for Data Analytics in the market, many tools have emerged
with various functionalities for this purpose. Whether open-source or user-friendly, the top
tools in the data analytics market are as follows.

- R programming – R is one of the leading analytics tools for statistics and data
modeling. R compiles and runs on various platforms such as UNIX, Windows, and
Mac OS. It also provides tools to install packages automatically as required by the user.
- Python – Python is an open-source, object-oriented programming language that
is easy to read, write, and maintain. It provides various machine learning and
visualization libraries such as Scikit-learn, TensorFlow, Matplotlib, Pandas, Keras,
etc. It can also work with data from many sources, such as SQL Server, MongoDB
databases, or JSON files.
- Tableau Public – This free software connects to many data sources such as
Excel, corporate data warehouses, etc. It then creates visualizations, maps,
dashboards, etc. with real-time updates on the web.
- QlikView – This tool offers in-memory data processing, with the results delivered
to end users quickly. It also offers data association and data visualization, with
data compressed to almost 10% of its original size.
- SAS – A programming language and environment for data manipulation and
analytics, this tool is easily accessible and can analyze data from different sources.
- Microsoft Excel – This is one of the most widely used tools for data analytics.
Mostly used for clients’ internal data, it can summarize data using pivot tables.
- RapidMiner – A powerful, integrated platform that can connect to many data
source types such as Access, Excel, Microsoft SQL Server, Teradata, Oracle, Sybase, etc.
This tool is mostly used for predictive analytics, such as data mining, text
analytics, and machine learning.
- KNIME – Konstanz Information Miner (KNIME) is an open-source data analytics
platform that allows you to analyze and model data. With the benefit of visual
programming, KNIME provides a platform for reporting and integration through its
modular data pipeline concept.
- OpenRefine – Formerly known as Google Refine, this data cleaning software helps
you clean up data for analysis. It is used for cleaning messy data, transforming
data, and parsing data from websites.
- Apache Spark – One of the most widely used large-scale data processing engines, this
tool can run applications in Hadoop clusters up to 100 times faster in memory and 10 times
faster on disk. It is also popular for data pipelines and machine learning
model development.

Now that you know all this about Data Analytics, let me tell you what you can become by
gaining knowledge in this field: a well-renowned Data Analyst. If you ask me who a Data
Analyst is, my answer would be that a Data Analyst is a professional who analyzes
data by applying various tools and techniques to gather the required insights.

D. Types of Data Analysis: Techniques and Methods


Several types of data analysis techniques exist, depending on the business and the
technology. The major types of data analysis are:

- Text Analysis
- Statistical Analysis
- Diagnostic Analysis
- Predictive Analysis
- Prescriptive Analysis

1. Text Analysis

Text Analysis is also referred to as Data Mining. It is a method of discovering patterns in
large data sets using databases or data mining tools, and it is used to transform raw data
into business information. Business Intelligence tools available on the market are used
to take strategic business decisions. Overall, it offers a way to extract and examine data,
derive patterns, and finally interpret the data.
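As a minimal illustration of deriving patterns from raw text, the sketch below counts the most frequent words in a few short documents with Python's standard library. The sample reviews, the stop-word list, and the `top_terms` helper are hypothetical; real text mining would add steps such as stemming and larger stop-word dictionaries.

```python
import re
from collections import Counter

# Hypothetical stop words: very common words that carry little meaning
STOP_WORDS = frozenset({"the", "a", "is", "was", "and", "but", "too"})


def top_terms(documents, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent words across all documents."""
    counts = Counter()
    for doc in documents:
        # Lower-case, keep only runs of letters, then drop stop words
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(n)


reviews = [
    "The delivery was fast and the price was fair.",
    "Fast delivery, but the price is high.",
    "Price too high, delivery slow.",
]
print(top_terms(reviews))  # [('delivery', 3), ('price', 3), ('fast', 2)]
```

The pattern that emerges (customers talk mostly about delivery and price) is the kind of business information the paragraph above refers to.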

2. Statistical Analysis

Statistical Analysis shows "What happened?" by using past data, often in the form of dashboards.
Statistical Analysis includes the collection, analysis, interpretation, presentation, and
modeling of data. It analyzes a whole set of data or a sample of it. There are two categories
of this type of analysis: Descriptive Analysis and Inferential Analysis.
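The two categories can be contrasted with a short standard-library sketch: descriptive statistics summarize the sample itself, while inferential statistics use the sample to estimate something about the wider population. The sales figures are hypothetical, and the confidence interval uses a normal approximation, which is an assumption (a t-interval is more appropriate for very small samples).

```python
from statistics import NormalDist, mean, median, stdev


def describe(sample):
    """Descriptive statistics: summarize the observed sample."""
    return {
        "n": len(sample),
        "mean": mean(sample),
        "median": median(sample),
        "stdev": stdev(sample),
    }


def mean_confidence_interval(sample, level=0.95):
    """Inferential statistics: estimate the population mean from the sample.

    Uses a normal (z) approximation around the sample mean.
    """
    m = mean(sample)
    se = stdev(sample) / len(sample) ** 0.5        # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    return m - z * se, m + z * se


daily_sales = [12, 15, 11, 18, 14, 16, 13, 15]     # hypothetical sample
print(describe(daily_sales))
print(mean_confidence_interval(daily_sales))
```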

3. Diagnostic Analysis

Diagnostic Analysis shows "Why did it happen?" by finding the cause behind the insights
found in Statistical Analysis. This analysis is useful for identifying behavior patterns in data.
If a new problem arises in your business process, you can look into this analysis to
find similar patterns of that problem, and you may be able to apply similar
remedies to the new problem.

4. Predictive Analysis

Predictive Analysis shows "What is likely to happen?" by using previous data. The simplest
example: if last year I bought two dresses based on my savings, and this year my
salary doubles, then I can buy four dresses. Of course, it is not that simple, because
you have to consider other circumstances: the prices of clothes may increase this year,
or instead of dresses you may want to buy a new bike, or you may need to buy a house!

So this analysis makes predictions about future outcomes based on current or past
data. A forecast is just an estimate: its accuracy depends on how much detailed
information you have and how deeply you dig into it.
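A minimal way to turn past data into a prediction is to fit a straight-line trend by ordinary least squares and extrapolate it. The sketch below uses hypothetical yearly savings figures; like the dress example, it ignores every other circumstance (prices, changing plans), which is exactly why such a forecast is only an estimate.

```python
def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(xs, ys, future_x):
    """Extrapolate the fitted trend to a future x value."""
    slope, intercept = fit_linear_trend(xs, ys)
    return slope * future_x + intercept


# Hypothetical savings (arbitrary units) for years 1 to 4
years = [1, 2, 3, 4]
savings = [100, 120, 140, 160]
print(forecast(years, savings, 5))  # 180.0
```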

5. Prescriptive Analysis

Prescriptive Analysis combines the insights from all previous analyses to determine which
action to take for a current problem or decision. Many data-driven companies use
Prescriptive Analysis because predictive and descriptive analysis alone are not enough to
improve performance. Based on current situations and problems, they analyze the
data and make decisions.

E. Data Analysis Process


The Data Analysis Process is simply gathering information using a proper application or
tool that allows you to explore the data and find patterns in it. Based on that, you can
make decisions or draw final conclusions.

Data Analysis consists of the following phases:

- Data Requirement Gathering
- Data Collection
- Data Cleaning
- Data Analysis
- Data Interpretation
- Data Visualization
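The phases above can be illustrated with a small pandas sketch, where each comment maps a step to one phase. The records, column names, and the choice to simply drop incomplete or duplicate rows are hypothetical; real cleaning rules depend on the data requirements gathered in the first phase.

```python
import pandas as pd

# 1-2. Requirement gathering and collection: hypothetical raw sales records
raw = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", None],
    "sales": [250, 180, 250, 300, None, 120],
})

# 3. Cleaning: drop exact duplicate rows, then rows with missing values
clean = raw.drop_duplicates().dropna()

# 4. Analysis: total sales per region, largest first
totals = clean.groupby("region")["sales"].sum().sort_values(ascending=False)

# 5. Interpretation: which region sells the most?
best_region = totals.index[0]
print(totals)
print("Best region:", best_region)

# 6. Visualization would follow, e.g. totals.plot(kind="bar") if matplotlib is installed
```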

II. Python for Data Analysis
A. Definition
Python is one of the most flexible programming languages, which is why it is loved by
data scientists. People who want to enter the world of data science also prefer Python
over many other programming languages. Programmers who want to try something
interesting and unique can do it with Python; they can even script applications and
websites in creative ways. Python is also one of the easiest languages to master: it is
quite simple and highly readable. People who want to build a career in data science or
data analysis prefer Python above all, since they do not have to spend a lot of time
learning it.

Python is one of the most valuable and interesting languages for data analysis.
Therefore, the popularity of Python is growing day by day, especially in the world of data
analysis or data sciences.

B. Why Python for Data Analysis?


Python is an object-oriented, high-level, interpreted programming language known
for its dynamic semantics. Python is known worldwide for its capabilities in Rapid
Application Development, especially thanks to dynamic typing and dynamic binding.
Python is also used extensively for scripting, and even as a glue language to link existing
components together. Python is quite versatile, and thus its popularity keeps growing.
Python fits data analysis very well, and it is therefore considered one of the most
preferred languages for data science.

Python is also a general-purpose programming language that places a strong emphasis
on readability. With Python, engineers can complete tasks with fewer lines of code.
Many libraries, such as Matplotlib, make Python even more attractive, and many others
are dedicated to scientific computing. In short, the main reasons that make Python a
great language for data science are that it is open source, that development in it is fast
(while its numerical libraries are efficient), and that a lot of support is available for it.
In fact, people involved in data analysis also get the scope to try many different things.

C. Best Python Libraries
1. Matplotlib
Matplotlib is a Python library for creating 2-dimensional graphs and plots from Python
scripts. Mathematical or scientific applications often need more than a single set of axes
in one figure, and this library lets us build multiple plots at a time. Matplotlib can also
be used to customize many characteristics of figures.
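The sketch below shows the "multiple plots at a time" idea: one figure holding two axes, the same hypothetical doubling series drawn on a linear and on a logarithmic y-axis. It assumes matplotlib is installed and selects the non-interactive "Agg" backend so it also runs without a display; in a notebook such as Colab you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving the file.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

days = list(range(1, 8))
cases = [1, 2, 4, 8, 16, 32, 64]  # hypothetical series that doubles every day

# One figure containing two side-by-side axes
fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

ax_lin.plot(days, cases, marker="o")
ax_lin.set_title("Linear scale")

ax_log.plot(days, cases, marker="o")
ax_log.set_yscale("log")  # same data, logarithmic y-axis
ax_log.set_title("Log scale")

# Shared customization of both axes
for ax in (ax_lin, ax_log):
    ax.set_xlabel("Day")
    ax.set_ylabel("Cases")

fig.tight_layout()
fig.savefig("two_plots.png")
```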

2. NumPy
NumPy is a popular array-processing package for Python. It provides good support for
multi-dimensional array objects as well as for matrices. NumPy is not confined to
providing arrays: it also provides a variety of tools to manipulate these arrays. It is
fast, efficient, and well suited to managing matrices and arrays.
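A short sketch of typical NumPy operations on a 2-D array: reductions along an axis, vectorized differences without explicit loops, and a matrix-vector product. The counts and weights are hypothetical.

```python
import numpy as np

# Hypothetical weekly counts: 3 regions (rows) x 4 weeks (columns)
counts = np.array([
    [10, 12, 15, 20],
    [5, 7, 6, 9],
    [8, 8, 10, 11],
])
print(counts.shape)         # (3, 4)
print(counts.sum(axis=1))   # total per region (row sums)
print(counts.mean(axis=0))  # average per week (column means)

# Vectorized week-over-week change, no Python loop required
growth = np.diff(counts, axis=1)
print(growth)

# Matrix-vector product with a hypothetical weight per week
weights = np.array([0.1, 0.2, 0.3, 0.4])
print(counts @ weights)
```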

3. Pandas
pandas is an open-source library built on top of NumPy, providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming
language. It allows for fast analysis, data cleaning, and preparation. It excels in
performance and productivity, and it can work with data from a wide variety of
sources. pandas is suited for many different kinds of data: tabular data, time-series data,
arbitrary matrix data with row and column labels, and any other form of
observational/statistical data sets.
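A brief sketch of the labeled, tabular style of work pandas supports: building a DataFrame, handling a missing value, deriving a column, then filtering and sorting. The country labels and figures are made up for illustration, and filling the missing value with 0 is just one simple cleaning choice.

```python
import pandas as pd

# Hypothetical table with one missing value
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "cases": [1200, 450, None, 980],
    "deaths": [30, 12, 5, 21],
})

# Cleaning: replace the missing case count with 0
df["cases"] = df["cases"].fillna(0)

# Derived column; where() turns zero counts into NaN to avoid dividing by zero
df["death_rate_pct"] = 100 * df["deaths"] / df["cases"].where(df["cases"] > 0)

# Filtering and sorting
large = df[df["cases"] > 500].sort_values("cases", ascending=False)
print(large[["country", "cases", "death_rate_pct"]])
```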

III. Development environment and tools used
A. Introduction
Every development project needs an environment to work in and a set of tools. In this
project, we use Python as the programming language for data science, and Google Colab
as the platform on which we carry out our work.

B. What is Google Colab?


Google Colab, or Colaboratory, is a free cloud service offered by Google, based on
Jupyter Notebook and intended for training and research in machine learning. This
platform allows Machine Learning models to be trained directly in the cloud, without
having to install anything on our computer except a browser. Cool, right? Before
presenting this wonderful service, let us recall what a Jupyter Notebook is: an interactive
document, opened in the web browser, that combines executable code cells, their
outputs, and explanatory text.

C. Setting up your drive


Create a folder for your notebooks

Technically speaking, this step isn’t totally necessary if you want to just start working in
Colab. However, since Colab is working off of your drive, it’s not a bad idea to specify the
folder where you want to work. You can do that by going to your Google Drive and
clicking “New” and then creating a new folder.

If you want, while you’re already in your Google Drive you can create a new Colab
notebook. Just click “New” and drop the menu down to “More” and then select
“Colaboratory.”

Game on!

You can rename your notebook by clicking on the name of the notebook and
changing it or by dropping the “File” menu down to “Rename.”

It’s easy to create a new notebook by dropping “File” down to “New Python 3 Notebook.”

If you want to open something specific, drop the “File” menu down to “Open Notebook…”

Then you’ll see a screen that looks like this:

As you can see, you can open a recent file, files from your Google Drive, GitHub files, and
you can upload a notebook right there as well.

IV. Practical side
A. Introduction
In this part, we take a dataset, work with it, make some estimations, plot some graphs,
and try to predict how the coronavirus will evolve over the coming days.

B. Getting the Dataset

The dataset I used is provided by Johns Hopkins University’s Center for Systems
Science and Engineering (JHU CSSE).
We will work with four datasets, as shown below.

C. Imports for downloading the datasets

Here I import all the libraries I need: pandas, numpy, plotly.graph_objects,
plotly.express, and folium.

D. Downloading datasets into respective data frames
We use pandas to import each dataset directly into a data frame, keep each data frame
in its own variable, and use the shape attribute to get the number of rows and columns.
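The author's download code is only shown as a screenshot in the original report. The sketch below is a hypothetical stand-in: it assumes the three global time-series CSV files from the JHU CSSE GitHub repository (the URLs are an assumption, the repository has since been archived, and the fourth dataset is not reproduced here). `pd.read_csv` accepts a URL, a path, or a file-like object, so the function is checked offline with a tiny in-memory CSV in the same layout.

```python
import io

import pandas as pd

# Assumed location of the JHU CSSE global time-series files (not given in the report)
BASE_URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
            "csse_covid_19_data/csse_covid_19_time_series/")
SOURCES = {
    "confirmed": BASE_URL + "time_series_covid19_confirmed_global.csv",
    "deaths": BASE_URL + "time_series_covid19_deaths_global.csv",
    "recovered": BASE_URL + "time_series_covid19_recovered_global.csv",
}


def load_frames(sources):
    """Read each source (URL, path or file-like object) into its own DataFrame."""
    frames = {}
    for name, source in sources.items():
        frames[name] = pd.read_csv(source)
        rows, cols = frames[name].shape  # shape is an attribute, not a method
        print(f"{name}: {rows} rows x {cols} columns")
    return frames


# Offline check with an in-memory CSV (layout: Province/State, Country/Region, Lat, Long, dates...)
demo = io.StringIO("Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
                   ",Morocco,0.0,0.0,0,1\n")
demo_frames = load_frames({"confirmed": demo})

# In Colab (network access needed): frames = load_frames(SOURCES)
```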

Here is a view of the columns in the confirmed_cases data frame.

E. Exploratory Analysis

We first take a look at the global situation using the sum function.
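A minimal sketch of the global sum, assuming the JHU time-series layout in which the last column holds the most recent date: summing that column over all rows gives the latest worldwide total for each data frame. The two-country frames are hypothetical and omit the Province/State, Lat, and Long columns for brevity; taking the last column means those columns never enter the sum anyway.

```python
import pandas as pd


def global_totals(frames):
    """Sum the most recent date column of each time-series frame over all rows."""
    return {name: int(df.iloc[:, -1].sum()) for name, df in frames.items()}


# Hypothetical frames: two countries, two dates
confirmed = pd.DataFrame({"Country/Region": ["X", "Y"], "1/22/20": [5, 3], "1/23/20": [9, 4]})
deaths = pd.DataFrame({"Country/Region": ["X", "Y"], "1/22/20": [0, 0], "1/23/20": [1, 0]})

print(global_totals({"confirmed": confirmed, "deaths": deaths}))  # {'confirmed': 13, 'deaths': 1}
```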

- The 10 countries most affected by coronavirus


In the example below, we look at the 10 countries most affected by the coronavirus
(by confirmed cases).
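One possible way to obtain the top-10 list is sketched below (the report's own cell is not reproduced). Because the JHU files list some countries, such as China or Canada, on several province rows, the rows are first summed per country and then `nlargest` picks the countries with the most confirmed cases on the latest date. Country labels and numbers are hypothetical.

```python
import pandas as pd


def top_countries(confirmed, n=10):
    """Countries with the most confirmed cases on the latest date."""
    latest = confirmed.columns[-1]
    # Sum province/state rows so each country appears once
    per_country = confirmed.groupby("Country/Region")[latest].sum()
    return per_country.nlargest(n)


# Hypothetical data: country B is split into two provinces
confirmed = pd.DataFrame({
    "Province/State": [None, "P1", "P2", None],
    "Country/Region": ["A", "B", "B", "C"],
    "1/22/20": [0, 3, 1, 0],
    "4/1/20": [120, 90, 10, 7],
})
print(top_countries(confirmed, n=2))
```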

Viewing data on a map

After running the code above, we get a result like this:

Total Confirmed Coronavirus Cases (Globally).

In this part, we plot the total number of confirmed coronavirus cases globally. This graph
helps us see how the disease develops over time and understand the nature of the curve.

After running these two cells, we get results like this:

The sharp exponential curve that can be seen on the right side of the graph shows the
devastating rate at which the pandemic is spreading worldwide.
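The global curve can be built by summing every date column over all rows and converting the column labels into real dates, so that the x-axis is ordered in time. The sketch assumes the JHU layout (four leading columns, then dates written like "1/22/20") and uses a hypothetical two-country frame; `diff` additionally gives the new cases reported each day.

```python
import pandas as pd


def global_series(ts):
    """Total cases per date over all rows, indexed by real dates."""
    date_cols = ts.columns[4:]  # JHU layout: Province/State, Country/Region, Lat, Long, dates...
    series = ts[date_cols].sum()
    series.index = pd.to_datetime(series.index, format="%m/%d/%y")
    return series


ts = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["A", "B"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 0],
    "1/23/20": [2, 1],
    "1/24/20": [4, 3],
})
total = global_series(ts)
daily_new = total.diff().fillna(total.iloc[0])  # new cases reported each day
print(total.tolist(), daily_new.tolist())
# The curve itself can then be drawn, e.g. with total.plot() if matplotlib is installed.
```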

Covid-19 Case Status

In the graph below, I plot all the coronavirus case categories: confirmed, recovered,
deaths, and active.
The recovered, death, and active cases also show a sharp exponential curve on the right
side of the graph, reflecting how fast the pandemic is spreading worldwide.
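The report does not show how the active count was obtained; a common definition, assumed here, is active = confirmed − deaths − recovered, computed date by date. The sketch combines three hypothetical global series into one status table.

```python
import pandas as pd


def case_status(confirmed, deaths, recovered):
    """Combine per-date totals and derive active cases."""
    status = pd.DataFrame({"confirmed": confirmed, "deaths": deaths, "recovered": recovered})
    # Active cases: confirmed cases not yet closed by death or recovery
    status["active"] = status["confirmed"] - status["deaths"] - status["recovered"]
    return status


dates = pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"])
status = case_status(
    pd.Series([100, 150, 230], index=dates),  # hypothetical confirmed
    pd.Series([2, 4, 7], index=dates),        # hypothetical deaths
    pd.Series([10, 20, 35], index=dates),     # hypothetical recovered
)
print(status)
```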

Country-Level Drill Down
In this part, I sort all the countries in our dataset by number of confirmed cases.

As you can see, the country with the most confirmed cases is the US, followed by Spain,
and so on down to the last country.
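A minimal sketch of such a drill-down table, assuming the three JHU time-series frames share the same latest date column: each frame is summed per country, the results are aligned on the country name, active cases are derived, and the table is sorted by confirmed cases. The `frame` helper and its numbers are hypothetical.

```python
import pandas as pd


def country_table(confirmed, deaths, recovered):
    """Latest confirmed/deaths/recovered per country, sorted by confirmed cases."""
    latest = confirmed.columns[-1]
    table = pd.DataFrame({
        "confirmed": confirmed.groupby("Country/Region")[latest].sum(),
        "deaths": deaths.groupby("Country/Region")[latest].sum(),
        "recovered": recovered.groupby("Country/Region")[latest].sum(),
    })
    table["active"] = table["confirmed"] - table["deaths"] - table["recovered"]
    return table.sort_values("confirmed", ascending=False)


def frame(values):
    """Hypothetical one-date frame for countries A, B, C."""
    return pd.DataFrame({"Country/Region": ["A", "B", "C"], "4/1/20": values})


table = country_table(frame([50, 300, 120]), frame([1, 20, 3]), frame([10, 80, 30]))
print(table)
```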

V. Data Modelling & Analysing Coronavirus: The Morocco Focus
In this section, I focus on the data points for Morocco. For this, the Morocco rows need
to be filtered out of each data frame conditionally. This can be done as follows:
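The original filtering cell is a screenshot; a minimal equivalent, assuming the JHU layout, keeps the rows whose Country/Region equals "Morocco", drops the four leading columns, and sums over provinces (Morocco has a single row, but the sum makes the helper work for any country). Applying the hypothetical `country_series` helper to each data frame yields the Morocco-specific time series. The values below are made up.

```python
import pandas as pd


def country_series(ts, country):
    """Time series for one country: filter its rows, then sum over provinces."""
    rows = ts[ts["Country/Region"] == country]
    return rows.iloc[:, 4:].sum()  # keep only the date columns


ts = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Morocco", "Other"],
    "Lat": [0.0, 0.0],
    "Long": [0.0, 0.0],
    "3/2/20": [1, 10],
    "3/3/20": [2, 20],
})
morocco = country_series(ts, "Morocco")
print(morocco)
```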

Let’s see how COVID-19 has spread across Morocco so far by plotting the four
Morocco-specific time series and manually annotating them with key events.

A more scientific way to look at the Morocco data or even the global data would be to
look at it on a Semi-Log scale. This is how the visualization would be on a semi-log scale.
This can be achieved by a small change in the y-axis setting (type = “log”).

After running this cell, we get results like those in the screenshot below:

A. Data Modelling and Prediction

The fact that the number of cases is increasing exponentially does not mean that we can
simply fit the data to an exponential curve and predict the number of cases in the coming
days. Compartmental modeling techniques are normally used to model infectious diseases,
and the same approach can be used for COVID-19. The simplest compartmental model
is the SIR model, which divides the population into Susceptible, Infected, and Recovered
compartments.
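For reference, the standard SIR model can be written as the system below, with N the total population, β the transmission rate and γ the recovery (removal) rate, both per day. The report does not show its exact parametrization, so this standard form is an assumption; in it, the ratio R0 = β/γ is the basic reproduction number.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I, \qquad N = S + I + R
\end{aligned}
```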

The predict and train functions are defined as follows:
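The author's predict and train functions appear only as a screenshot, so the sketch below is a hypothetical stand-in, not that code. `predict` integrates the SIR equations with a simple daily Euler step, and `train` fits β and γ by a coarse grid search that minimizes the squared error on the infected and recovered series. A more robust approach, like the Lewuathe COVID19-SIR project in the webography, would use SciPy's ODE solver and optimizer instead.

```python
def predict(beta, gamma, s0, i0, r0, days):
    """Simulate the SIR model with a daily Euler step.

    Returns three lists (S, I, R) of length `days`, starting from (s0, i0, r0).
    """
    n = s0 + i0 + r0
    s, i, r = [float(s0)], [float(i0)], [float(r0)]
    for _ in range(days - 1):
        new_infections = beta * s[-1] * i[-1] / n
        new_recoveries = gamma * i[-1]
        s.append(s[-1] - new_infections)
        i.append(i[-1] + new_infections - new_recoveries)
        r.append(r[-1] + new_recoveries)
    return s, i, r


def train(infected, recovered, population, betas, gammas):
    """Pick (beta, gamma) from candidate grids minimizing the squared error."""
    days = len(infected)
    i0, r0 = infected[0], recovered[0]
    s0 = population - i0 - r0
    best = None
    for beta in betas:
        for gamma in gammas:
            _, i_sim, r_sim = predict(beta, gamma, s0, i0, r0, days)
            loss = sum((a - b) ** 2 for a, b in zip(i_sim, infected))
            loss += sum((a - b) ** 2 for a, b in zip(r_sim, recovered))
            if best is None or loss < best[0]:
                best = (loss, beta, gamma)
    return best[1], best[2]


# Self-check: synthetic data generated with known parameters, then recovered by train()
grid = [k / 100 for k in range(1, 51)]  # candidate rates 0.01 ... 0.50
_, true_i, true_r = predict(0.30, 0.10, 9_990, 10, 0, 60)
beta_hat, gamma_hat = train(true_i, true_r, 10_000, grid, grid)
print(beta_hat, gamma_hat)  # 0.3 0.1
```

On real data, one common choice is to use active cases for I and recovered plus deaths for R; both choices are assumptions here, not taken from the report.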

Simulations with Actual Data for Italy and Morocco

- For Italy
It can be observed that the model looks like a reasonable approximation overall:
  - The infected data and the infected curve are far apart.
  - The recovered data and the recovered curve are close.
  - Death and recovered cases are both rising.

- For Morocco
It can be observed that the model looks like a good approximation:
  - Death cases decrease.
  - The recovered data and the recovered curve are close.
  - The number of recovered cases is growing.

CONCLUSION

This project allowed us to put into practice the skills acquired during our training at
Cadi Ayyad University, and it also allowed us to familiarize ourselves with several
development and design tools.

Thanks to this project, we have been able to see the various difficulties we are likely to
encounter and the attitudes to adopt in the future when facing delicate situations in the field.

I acknowledge that the success of our project is above all the result of the valuable help
provided by our trainers, who gave us a clearer idea of the main steps needed to carry
out this project.

Finally, all that remains is to thank all our trainers without exception, as well as all the
staff of Cadi Ayyad University in Safi.

WEBOGRAPHY

1. https://github.com: a hosting platform for Git version control

2. https://www.kaggle.com: a web platform for data science competitions and datasets

3. https://scipython.com: a website about Python and its scientific libraries

4. https://www.lewuathe.com/COVID-19-dynamics-with-sir-model.html

5. https://github.com/Lewuathe/COVID19-SIR

6. https://medium.com/@tomaspueyo/coronavirus-act-today-or-people
