Report Python
Prepared by:
Abderrahim Bertit
Supervised by:
Mr. Othman Alaoui Fdili
Contents
Dedication
Thanks
INTRODUCTION
The Problem
I. What is Data Analytics
A. Data Analytics?
B. Why is Data Analytics important?
C. Top Tools in Data Analytics
D. Types of Data Analysis: Techniques and Methods
1. Text Analysis
2. Statistical Analysis
3. Diagnostic Analysis
4. Predictive Analysis
5. Prescriptive Analysis
E. Data Analysis Process
II. Python for Data Analysis
A. Definition
B. Why Python for Data Analysis?
C. Best Python Libraries
1. Matplotlib
2. Numpy
3. Pandas
III. Development environment and tools used
A. Introduction
B. What is Google Colab?
C. Setting up your drive
IV. Practical side
A. Introduction
B. Getting the Dataset
C. Imports for downloading data set
D. Downloading datasets into respective data frames
E. Exploratory Analysis
V. Data Modelling & Analysing Coronavirus: The Morocco Focus
A. Data Modelling and Prediction
CONCLUSION
WEBOGRAPHY
Dedication
To my very dear mother: may she find here the tribute of my gratitude which, however
great it may be, will never be worthy of her sacrifices and her prayers for me.
To all my friends who are dear to me, and to all those whom I love and who love me: may
they find here the expression of my most devoted sentiments and my most sincere wishes.
May Almighty God preserve you all and bring you wisdom and happiness.
Thanks
First of all, I would like to thank the Higher School of Technology of Safi for opening its
doors to me and giving me the opportunity to study there.
Then, I would like to thank all the people who helped me throughout this project, namely
all the people working in the "IT department" services.
I also thank Mr. Othman Alaoui Fdili, my supervisor during this project, for his kindness
and hospitality.
Finally, I thank each and every one of those who took part in the completion of my
final-year project.
INTRODUCTION
Data has been a buzzword for years now. Whether the data is generated by large-scale
enterprises or by an individual, every aspect of it needs to be analyzed if we want to
benefit from it. But how do we do it? Well, that’s where the term ‘Data Analytics’ comes
in. In this blog on ‘What is Data Analytics?’, you will get
The Problem
Coronaviruses are a large family of viruses that cause a variety of conditions ranging
from the common cold to more serious illnesses such as Middle East Respiratory
During this project we will see how this disease develops over time, which countries have
been affected by the coronavirus, and some estimates; we will use Python for data
I. What is Data Analytics:
A. Data Analytics?
As the name suggests, Data Analytics refers to the techniques used to analyze data in
order to enhance productivity and business gain. Data is extracted from various sources,
then cleaned and categorized so that different behavioral patterns can be analyzed. The
techniques and the tools used vary according to the organization or individual.
So, in short, if you understand your Business Administration and are able to perform
Exploratory Data Analysis to gather the required information, then you are good to go
for a career in Data Analytics.
So, now that you know what Data Analytics is, let me quickly explain why it is
important.
B. Why is Data Analytics important?
Gather Hidden Insights – Hidden insights from data are gathered and then analyzed
with respect to business requirements.
Generate Reports – Reports are generated from the data and passed on to the
respective teams and individuals, who take further action to grow the business.
Perform Market Analysis – Market Analysis can be performed to understand the
strengths and the weaknesses of competitors.
Now that you know why Data Analytics is needed, let me quickly cover the top tools used
in this field.
C. Top Tools in Data Analytics
With the increasing demand for Data Analytics in the market, many tools have emerged
with various functionalities for this purpose. Whether open-source or proprietary, the top
tools in the data analytics market are as follows.
R programming – This is a leading analytics tool used for statistics and data
modeling. R compiles and runs on various platforms such as UNIX, Windows, and
Mac OS. It also provides tools to automatically install packages as the user
requires.
Python – Python is an open-source, object-oriented programming language which
is easy to read, write and maintain. It provides various machine learning and
visualization libraries such as Scikit-learn, TensorFlow, Matplotlib, Pandas, Keras,
etc. It can also work with data from sources such as SQL Server, a MongoDB database
or JSON.
Tableau Public – This is free software that connects to data sources such as
Excel, corporate data warehouses, etc. It then creates visualizations, maps,
dashboards, etc. with real-time updates on the web.
QlikView – This tool offers in-memory data processing with the results delivered
to the end-users quickly. It also offers data association and data visualization with
data being compressed to almost 10% of its original size.
SAS – A programming language and environment for data manipulation and
analytics, this tool is easily accessible and can analyze data from different sources.
Microsoft Excel – This is one of the most widely used tools for data analytics.
Mostly used for clients’ internal data, it handles tasks that summarize the data,
for example with pivot tables.
RapidMiner – A powerful, integrated platform that can connect to many data
source types such as Access, Excel, Microsoft SQL, Teradata, Oracle, Sybase, etc.
This tool is mostly used for predictive analytics, such as data mining, text
analytics, and machine learning.
KNIME – Konstanz Information Miner (KNIME) is an open-source data analytics
platform, which allows you to analyze and model data. With the benefit of visual
programming, KNIME provides a platform for reporting and integration through its
modular data pipeline concept.
OpenRefine – Formerly known as Google Refine, this data cleaning software helps
you clean up data for analysis. It is used for cleaning messy data, transforming
data, and parsing data from websites.
Apache Spark – One of the most widely used large-scale data processing engines, this
tool is claimed to run applications in Hadoop clusters up to 100 times faster in memory
and 10 times faster on disk than Hadoop MapReduce. It is also popular for data pipelines
and machine learning model development.
Now that you know all this about Data Analysis, let me tell you what you can become by
gaining knowledge of this field.
Well, you can become a Data Analyst. If you ask me who a Data Analyst is, my answer
would be that a Data Analyst is a professional who analyzes data by applying various
tools and techniques and gathers the required insights.
D. Types of Data Analysis: Techniques and Methods
Text Analysis
Statistical Analysis
Diagnostic Analysis
Predictive Analysis
Prescriptive Analysis
1. Text Analysis
2. Statistical Analysis
Statistical Analysis shows "What happened?" by using past data in the form of dashboards.
Statistical Analysis includes the collection, analysis, interpretation, presentation, and
modeling of data. It analyzes a full set of data or a sample of it. There are two categories
of this type of Analysis: Descriptive Analysis and Inferential Analysis.
3. Diagnostic Analysis
Diagnostic Analysis shows "Why did it happen?" by finding the cause from the insights
found in Statistical Analysis. This Analysis is useful for identifying behavior patterns in
data. If a new problem arises in your business process, you can look into this Analysis to
find similar patterns for that problem, and you may be able to apply similar
prescriptions to the new problem.
4. Predictive Analysis
Predictive Analysis shows "What is likely to happen?" by using previous data. The simplest
example: if last year I bought two dresses based on my savings, and this year my salary
doubles, then I can buy four dresses. But of course it is not that easy, because you have
to think about other circumstances: clothes prices may go up this year, or instead of
dresses you may want to buy a new bike, or you may need to buy a house!
So this Analysis makes predictions about future outcomes based on current or past
data. A forecast is only an estimate. Its accuracy depends on how much detailed
information you have and how deeply you dig into it.
5. Prescriptive Analysis
Prescriptive Analysis combines the insights from all the previous Analyses to determine
which action to take on a current problem or decision. Most data-driven companies use
Prescriptive Analysis because predictive and descriptive Analysis alone are not enough to
improve data performance. Based on current situations and problems, they analyze the
data and make decisions.
II. Python for Data Analysis:
A. Definition
Python is one of the most flexible programming languages, which is why it is so popular
among data scientists. People who want to enter the world of data science also tend to
prefer Python over plenty of other programming languages. If programmers want to try
something interesting and unique, they can do it with Python. They can even script
applications and websites on their own, in creative ways, if they want to.
Python is also one of the easiest languages to master. The language is quite simple and
highly readable. People who want to build a career in data science or data analysis
often prefer Python to anything else, because Python programmers do not have to spend a
lot of time on learning.
Python is one of the most valuable and interesting languages for data analysis.
Therefore, the popularity of Python is growing day by day, especially in the world of data
analysis and data science.
C. Best Python Libraries
1. Matplotlib
Matplotlib is a Python library that uses Python scripts to draw 2-dimensional graphs and
plots. Mathematical or scientific applications often require more than a single set of axes
in a figure, and this library lets us build multiple plots at a time. You can also use
Matplotlib to adjust different characteristics of a figure, such as titles, colors and line
styles.
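As a minimal sketch of the "multiple plots at a time" idea, the snippet below puts two axes in one figure and changes a few figure characteristics. The values are illustrative only and are not taken from the project data; the `Agg` backend line is an assumption made so the sketch runs without a display, and it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Colab
import matplotlib.pyplot as plt

# Illustrative values only, not project data.
x = [0, 1, 2, 3, 4]
squares = [v ** 2 for v in x]
cubes = [v ** 3 for v in x]

# One figure holding two axes side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each axes object can be styled independently.
ax_left.plot(x, squares, color="tab:blue", marker="o")
ax_left.set_title("Squares")
ax_right.plot(x, cubes, color="tab:red", linestyle="--")
ax_right.set_title("Cubes")

fig.tight_layout()
```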
2. Numpy
Numpy is a popular array-processing package for Python. It provides good support for
array objects of different dimensions as well as for matrices. Numpy is not confined to
providing arrays; it also provides a variety of tools to manage them. It is fast,
efficient, and well suited to working with matrices and arrays.
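To make this concrete, here is a short sketch of a few array operations of the kind used later in this report. The numbers are made up for illustration.

```python
import numpy as np

# Illustrative daily counts, not real data.
daily_cases = np.array([3, 5, 8, 13, 21])

# Running total of the daily counts.
cumulative = np.cumsum(daily_cases)

# Day-over-day ratio, computed on whole arrays without an explicit loop.
growth = daily_cases[1:] / daily_cases[:-1]

# A 2-dimensional array (2 rows, 3 columns) and its per-column totals.
matrix = np.arange(6).reshape(2, 3)
column_totals = matrix.sum(axis=0)
```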
3. Pandas
pandas is an open-source library built on top of numpy that provides high-performance,
easy-to-use data structures and data analysis tools for the Python programming
language. It allows for fast analysis as well as data cleaning and preparation, and it
can work with data from a wide variety of sources. pandas is suited to many different
kinds of data: tabular data, time-series data, arbitrary matrix data with row and column
labels, and any other form of observational or statistical data set.
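The cleaning and preparation step can be sketched as follows. The column names and values are hypothetical and chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical table with one missing value.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "confirmed": [120, None, 45],
    "deaths": [3, 1, 0],
})

# Cleaning: drop rows where the confirmed count is missing.
clean = df.dropna(subset=["confirmed"]).copy()
clean["confirmed"] = clean["confirmed"].astype(int)

# Preparation: derive a new column from existing ones.
clean["fatality_rate"] = clean["deaths"] / clean["confirmed"]
```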
III. Development environment and tools used:
A. Introduction
Any development project needs an environment to work in and a set of tools. In this
project we use Python as the programming language for data science, and Google Colab as
the platform on which we carry out our work.
C. Setting up your drive
Technically speaking, this step isn’t totally necessary if you want to just start working in
Colab. However, since Colab is working off of your drive, it’s not a bad idea to specify the
folder where you want to work. You can do that by going to your Google Drive and
clicking “New” and then creating a new folder.
If you want, while you’re already in your Google Drive you can create a new Colab
notebook. Just click “New” and drop the menu down to “More” and then select
“Colaboratory.”
Game on!
You can rename your notebook by clicking on the name of the notebook and
changing it or by dropping the “File” menu down to “Rename.”
It’s easy to create a new notebook by dropping “File” down to “New Python 3 Notebook.”
If you want to open something specific, drop the “File” menu down to “Open Notebook…”
Then you’ll see a screen that looks like this:
As you can see, you can open a recent file, files from your Google Drive, GitHub files, and
you can upload a notebook right there as well.
IV. Practical side:
A. Introduction:
In this part we take a dataset, work with it, make some estimates, plot some graphs,
and try to predict coronavirus cases for the coming days.
D. Downloading datasets into respective data frames:
We will use pandas to import the datasets directly into data frames. I keep each dataset
in its own variable and use the shape attribute to get the number of rows and
columns.
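The loading step can be sketched as below. This is only a sketch under assumptions: the report does not name its data source here, so the Johns Hopkins CSSE time-series repository is assumed, and `load_series` is a hypothetical helper. To keep the example runnable offline, it reads a tiny in-memory CSV with the same layout and illustrative values.

```python
import io
import pandas as pd

# Assumption: the JHU CSSE time-series files; the report does not confirm the source.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
)

def load_series(source):
    """Read one time-series CSV from a URL, a path, or a file-like object."""
    return pd.read_csv(source)

# In a notebook with network access one could call, for example:
# confirmed_df = load_series(BASE_URL + "time_series_covid19_confirmed_global.csv")

# Offline stand-in with the same column layout (values are illustrative).
sample_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",Morocco,31.79,-7.09,0,1\n"
    ",Italy,41.87,12.57,2,3\n"
)
confirmed_df = load_series(sample_csv)

# shape is an attribute: (number of rows, number of columns).
n_rows, n_cols = confirmed_df.shape
```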
E. Exploratory Analysis
We will take a look at the global situation using the sum function.
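A minimal sketch of that global total, assuming a wide table with one column per date where the last column holds the most recent cumulative counts. The data frame below is a hypothetical stand-in with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for the confirmed-cases data frame.
confirmed_df = pd.DataFrame({
    "Country/Region": ["Morocco", "Italy", "Spain"],
    "1/22/20": [0, 2, 1],
    "1/23/20": [1, 3, 4],
})

# The last column is the latest date; summing it gives the global total.
latest = confirmed_df.columns[-1]
global_confirmed = int(confirmed_df[latest].sum())
```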
Viewing data on map
After running the code above, we get a result like this.
Total Confirmed Coronavirus Cases (Globally).
In this part we plot a graph of the total confirmed coronavirus cases globally. It helps
us see how the disease develops over time and understand the shape of the curve.
After running these two cells we get a result like this.
The sharp exponential curve that can be seen on the right side of the graph shows the
devastating rate at which the pandemic is spreading worldwide.
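A graph like the one described above could be produced roughly as follows. This is a sketch with a hypothetical three-day table, and it uses Matplotlib, which may differ from the plotting library used in the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the confirmed-cases data frame.
confirmed_df = pd.DataFrame({
    "Country/Region": ["Morocco", "Italy", "Spain"],
    "1/22/20": [0, 2, 1],
    "1/23/20": [1, 3, 4],
    "1/24/20": [2, 9, 5],
})

# Sum every date column over all countries to get one global series.
date_cols = confirmed_df.columns[1:]
global_series = confirmed_df[date_cols].sum(axis=0)
global_series.index = pd.to_datetime(global_series.index, format="%m/%d/%y")

fig, ax = plt.subplots()
ax.plot(global_series.index, global_series.values)
ax.set_title("Total Confirmed Coronavirus Cases (Globally)")
ax.set_xlabel("Date")
ax.set_ylabel("Confirmed cases")
```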
Covid-19 Case Status
In the graph below I plot all coronavirus case types: confirmed, recovered, death, and
active cases.
The curves for recovered, death, and active cases are also sharply exponential, as can
be seen on the right side of the graph, which shows that the pandemic is spreading worldwide.
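The active series is usually not downloaded but derived. A minimal sketch, assuming active cases are defined as confirmed minus recovered minus deaths (the report does not state its definition), with illustrative values:

```python
import pandas as pd

# Illustrative cumulative counts for three days.
status = pd.DataFrame(
    {
        "confirmed": [10, 25, 60],
        "recovered": [0, 5, 20],
        "deaths": [0, 1, 3],
    },
    index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
)

# Assumed definition: people still infected on each day.
status["active"] = status["confirmed"] - status["recovered"] - status["deaths"]
```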
For Country Level Drill Down
In this part I sort all the countries in our dataset by number of confirmed cases.
As you can see, the country with the most confirmed cases is the US, followed by
Spain, and so on down to the last one.
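The sorting step can be sketched as follows, assuming a table in which some countries are split over several province rows. The rows and counts are hypothetical and only illustrate the grouping and ordering.

```python
import pandas as pd

# Hypothetical rows; one country appears twice (two provinces).
confirmed_df = pd.DataFrame({
    "Province/State": ["Washington", "New York", None, None],
    "Country/Region": ["US", "US", "Spain", "Morocco"],
    "4/10/20": [40, 30, 50, 5],
})

latest = confirmed_df.columns[-1]

# Add up provinces per country, then sort from most to fewest cases.
by_country = (
    confirmed_df.groupby("Country/Region")[latest]
    .sum()
    .sort_values(ascending=False)
)
```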
V. Data Modelling & Analysing Coronavirus: The Morocco Focus
In this section, I focus on the data points for Morocco. To do this, the Morocco rows
need to be filtered out of each data frame. This can
be done as follows:
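A minimal sketch of such a filter is shown here. The helper name `country_series` is hypothetical, the sample values are illustrative, and the layout (four metadata columns followed by one column per date) is an assumption about the data frames.

```python
import pandas as pd

def country_series(df, country):
    """Return one country's time series, indexed by date."""
    rows = df[df["Country/Region"] == country]
    date_cols = df.columns[4:]  # assumption: first four columns are metadata
    series = rows[date_cols].sum(axis=0)
    series.index = pd.to_datetime(series.index, format="%m/%d/%y")
    return series

# Hypothetical stand-in for one of the data frames.
confirmed_df = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Morocco", "Italy"],
    "Lat": [31.79, 41.87],
    "Long": [-7.09, 12.57],
    "3/2/20": [1, 2036],
    "3/3/20": [2, 2502],
})

morocco_confirmed = country_series(confirmed_df, "Morocco")
```

The same helper can be applied to the recovered and deaths data frames to obtain the other Morocco series.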
Let’s see how COVID-19 has spread across Morocco so far by plotting the four
Morocco-specific time series and manually annotating them with the key events.
A more scientific way to look at the Morocco data, or even the global data, is to plot it
on a semi-log scale. This is how the visualization looks on a semi-log scale.
It can be achieved with a small change to the y-axis setting (type = “log”).
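The `type = “log”` setting mentioned above looks like a Plotly axis option. As a sketch of the same idea, the example below uses Matplotlib instead, which is a substitution and not necessarily what the original notebook did; the case counts are illustrative. On a semi-log plot, exponential growth appears as a roughly straight line.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Colab
import matplotlib.pyplot as plt

# Illustrative cumulative counts that roughly double each day.
days = list(range(8))
cases = [1, 2, 4, 9, 17, 33, 65, 130]

fig, ax = plt.subplots()
ax.plot(days, cases, marker="o")

# The only change compared with a linear plot: a logarithmic y-axis.
ax.set_yscale("log")
ax.set_xlabel("Day")
ax.set_ylabel("Confirmed cases (log scale)")
```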
After running this cell we get a result like the one in the screenshot below:
A. Data Modelling and Prediction
The fact that the number of cases grows exponentially does not mean that we can simply
fit an exponential curve to the data and predict the number of cases in the coming days.
Compartmental modeling techniques are normally used to model infectious diseases, and the
same can be done for COVID-19. The simplest compartmental model
is the SIR model.
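For reference, the SIR model splits a population N into Susceptible (S), Infected (I) and Recovered (R) compartments, with dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I and dR/dt = gamma*I. The sketch below integrates these equations with a simple forward Euler scheme. The function name and the parameter values are hypothetical and not fitted to any country, and the referenced notebooks may use a different solver.

```python
def simulate_sir(population, infected0, beta, gamma, days, steps_per_day=10):
    """Forward Euler integration of the SIR equations.

    beta is the contact (infection) rate and gamma the recovery rate.
    Returns one (S, I, R) tuple per day, including day 0.
    """
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical parameters, chosen only for illustration.
history = simulate_sir(population=1_000_000, infected0=10,
                       beta=0.3, gamma=0.1, days=160)

# Day on which the number of infected people peaks.
peak_day = max(range(len(history)), key=lambda d: history[d][1])
```

In practice beta and gamma would be estimated by fitting the simulated curves to the observed data, which is the step the simulations in the rest of this section rely on.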
Simulations for Italy and Morocco with Actual Data
- For Italy
It can be observed that the model looks like a good approximation.
The infected data and the infected curve are far apart.
The recovered data and the recovered curve are also close.
Death cases and recovered data are rising.
- For Morocco
It can be observed that the model looks like a good approximation.
Death cases also decrease.
The recovered data and the recovered curve are also close.
The number of recovered cases is growing.
CONCLUSION
This project allowed us to put into practice the skills acquired during our training at
Thanks to this project, we have been able to see the various difficulties that we will
face and the attitudes to adopt in the future when dealing with delicate situations in the field.
I acknowledge that the success of our project is the result of the valuable help provided by
our trainers, which gave us a clearer view of the main ideas needed to carry out this
project.
Finally, all that remains is to thank all our trainers without exception, as well as all the
WEBOGRAPHY
4. https://www.lewuathe.com/COVID-19-dynamics-with-sir-model.html
5. https://github.com/Lewuathe/COVID19-SIR
6. https://medium.com/@tomaspueyo/coronavirus-act-today-or-people