Python Unit 1
Data science aims to unify statistics, machine learning, data analysis, and other related
methods so that people can better understand and analyze information using data. It draws on
theories and techniques from many fields within computer science, information science,
statistics, and mathematics.
Data science has popped up in many different contexts over the past 30 years, but it did not
become established until recent years. In the '60s it was referred to as datalogy. In 2001,
William Cleveland introduced data science as its own discipline.
The IEEE launched a Task Force on Data Science and Advanced Analytics in 2013. The
European Association for Data Science was established in Luxembourg in the same year. The
IEEE held its first international conference on the subject in 2014. Later in 2014, The Data
Incubator created a data science fellowship.
Data is at the core of data science. Troves of raw information are streamed in and then
stored in data warehouses. There is a lot to learn by mining it, and advanced capabilities can
be built from it. In short, data science is about using data in creative ways to add business
value.
The main aspect of data science is discovering new findings in data. People explore data at
a granular level to mine and understand complex inferences, behaviors, and trends. It's about
uncovering hidden information that may help companies make smarter choices for their
business.
For example:
o Netflix mines data on movie viewing patterns to better understand users' interests
and to decide which Netflix series to produce.
o Target identifies the major segments in its customer base and their shopping
behaviors, which helps it guide messaging to different market groups.
o Procter & Gamble uses time series models to understand future demand and plan
production levels.
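To make the idea of a time series model concrete, here is a minimal sketch of one of the simplest forecasting techniques, a moving average. It is only an illustration: the function name, the window size, and the demand figures are all made up for this example, and real demand planning would normally use richer models.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    return mean(history[-window:])


# Hypothetical monthly unit sales (invented numbers, for illustration only).
monthly_demand = [120, 130, 125, 140, 150, 145]

# The forecast for next month is the average of the three most recent months.
next_month = moving_average_forecast(monthly_demand, window=3)
print("Forecast for next month:", next_month)
```

A larger window smooths out short-term noise but reacts more slowly to real changes in demand, so the window size is a judgment call.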
So how does the data scientist mine all this information? It begins with data exploration.
When a data scientist is given a challenging question, they become a detective. They will start to
investigate leads, and then try to understand characteristics or patterns in the data. This means
they need a lot of analytical creativity.
One of the classic examples of a data product is a recommendation engine, which takes in user
data and creates personalized recommendations based on that data. Other examples:
o The spam filter in Gmail is a data product. It is a behind-the-scenes algorithm
that processes incoming mail and decides whether or not it is junk.
o The computer vision used for self-driving cars is also a data product. Machine
learning algorithms can recognize pedestrians, traffic lights, other cars, and so on.
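The spam filter idea can be sketched with a toy example. The code below is a minimal illustration that assumes a hypothetical fixed list of suspicious words and an arbitrary threshold. Real filters, including Gmail's, rely on far more sophisticated machine learning models, and nothing here reflects how Gmail actually works.

```python
# Hypothetical list of words treated as suspicious (an assumption for this sketch).
SPAM_WORDS = {"winner", "free", "prize", "urgent"}


def spam_score(message):
    """Return the fraction of words in the message that appear in SPAM_WORDS."""
    words = [w.strip(".,!?:;").lower() for w in message.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in SPAM_WORDS)
    return hits / len(words)


def is_spam(message, threshold=0.2):
    """Flag the message as junk when its spam score reaches the threshold."""
    return spam_score(message) >= threshold


print(is_spam("Urgent! You are a WINNER, claim your FREE prize"))
print(is_spam("Lunch at noon tomorrow?"))
```

The point is the shape of a data product: data goes in (a message), an algorithm processes it, and a decision comes out (junk or not junk).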
Data science and data scientists add value to businesses in many different ways.
---
Q. What is data science, and why is it important?
Q. Explain how data science is used.
---
Advantages of data science:
Data science can be seen everywhere, from the information within your smartphone and its
apps to the idea of a car that drives itself. This modern phenomenon is why data scientists
are becoming more necessary.
When data science is brought into a business, it brings several different benefits with it.
Among those are the following:
1. It will monetize data:
Facebook turns the data that it gets from its subscribers into money, and so can any
business. For example, many retail sites show a section that says,
“Customers Who Bought This Item Also Bought,” which lists items that are likely to
produce another sale. This type of creative analysis is what allows a company to
increase its revenue.
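One simple way to produce an "also bought" list is to count how often pairs of items appear in the same order. The sketch below assumes that approach. The function names and the order data are hypothetical, and real retail sites use considerably more elaborate recommendation methods.

```python
from collections import Counter
from itertools import combinations


def build_co_purchase_counts(orders):
    """Count, for every item, how often each other item shares an order with it."""
    counts = {}
    for order in orders:
        # Use a set so a repeated item in one order is only counted once.
        for a, b in combinations(sorted(set(order)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(counts, item, top_n=2):
    """Return up to top_n items most often purchased together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


# Hypothetical order history (invented for illustration only).
orders = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["sleeping bag", "pillow"],
    ["tent", "sleeping bag", "pillow"],
]

counts = build_co_purchase_counts(orders)
print(also_bought(counts, "tent"))
```

In this made-up data, tents appear most often with sleeping bags and then with lanterns, so those are the items suggested to a customer viewing a tent.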
When a non-technical boss asks a data scientist to solve a data problem, the
description can be ambiguous at first. It becomes your job, as the data
scientist, to turn the task into a well-defined problem, figure out how to solve it, and
then present your solution to the boss. This process involves several steps:
o Frame the problem: Who is the client? What exactly are they asking you to
solve? How can you translate the ambiguous request into a well-defined and
concrete problem?
o Collect the data that you need to solve the problem: Do you already have
access to this data? If you do, what parts of it can help? If you don't, what
data do you need? What resources, such as infrastructure, time, and money, do you
need to get the data into a usable form?
o Process your data: Raw data can very rarely be used right out of the
box. There will be collection errors, missing values, corrupt records, and lots
of other challenges to take care of. You first have to clean the data to
change it into a form that you can analyze.
o Explore the data: After you have cleaned the data, you need a high-level
understanding of the information that it contains. What obvious
correlations or trends do you see in the data? What high-level characteristics
does it have, and are any of them more important than the others?
o Perform in-depth analysis: This is typically the core of the project. This is
where you use the machinery of data analysis to find the best predictions and
insights.
o Communicate the results of your analysis: All of the technical results and
analysis that you have produced are not very valuable unless you can explain them in
a way that is compelling and comprehensible. Data storytelling is an underrated
but critical skill that a data scientist needs to build and use.
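The process, explore, and analyze steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (ad_spend, sales), and the choice of a straight-line fit are all hypothetical, and a real project would typically use dedicated libraries and much more data.

```python
from statistics import mean

# Hypothetical raw records (invented): some values are missing or corrupt.
raw_records = [
    {"ad_spend": "100", "sales": "20"},
    {"ad_spend": "200", "sales": "41"},
    {"ad_spend": "", "sales": "15"},       # missing value
    {"ad_spend": "300", "sales": "abc"},   # corrupt record
    {"ad_spend": "400", "sales": "79"},
]


def clean(records):
    """Process step: keep only records whose fields parse as numbers."""
    rows = []
    for record in records:
        try:
            rows.append((float(record["ad_spend"]), float(record["sales"])))
        except (KeyError, ValueError):
            continue  # skip records that cannot be used
    return rows


def explore(rows):
    """Explore step: compute a few high-level summary statistics."""
    return {
        "count": len(rows),
        "mean_ad_spend": mean(x for x, _ in rows),
        "mean_sales": mean(y for _, y in rows),
    }


def fit_line(rows):
    """Analysis step: least-squares fit of sales = slope * ad_spend + intercept."""
    mean_x = mean(x for x, _ in rows)
    mean_y = mean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = clean(raw_records)
print(explore(rows))
slope, intercept = fit_line(rows)
print("Predicted sales at ad_spend=500:", slope * 500 + intercept)
```

Framing the problem, collecting the data, and communicating the results are not captured here, because they depend on people and context rather than code.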
---
Q. Explain the process of Data Science
---
Responsibilities of a data scientist:
Different companies have different ideas of what a data scientist does. Some
businesses treat their data scientists like glorified data analysts
or combine the role with data engineering. Others need top-level
analytics experts who are skilled in intensive data visualization and machine
learning.
---
Q. Explain the responsibilities of a data scientist.
---
Qualifications of data scientists:
There are three education options that you will need to look at when considering a
career in data science.
Burtch Works, in its salary report, found that 46% of data scientists have a
PhD and 88% have a master's degree. For the most part, these degrees are in
rigorous scientific, quantitative, or technical subjects, including statistics and
math (32%), engineering (16%), and computer science (19%).
Many companies are desperate to find candidates who have real-world skills.
If you have the technical knowledge, it could outweigh the preferred degree
requirements.
1) Technical skills:
o Cloud tools such as Amazon S3.
o Big data platforms such as Hadoop, Hive, and Pig.
o Programming languages such as Python, Perl, Java, and C/C++.
o SQL databases, as well as database querying languages.
o SAS and R languages.
o Unstructured data techniques.
o Data visualization and reporting techniques.
o Data munging and cleaning.
o Data mining.
o Software engineering skills.
o Machine learning techniques and tools.
o Statistics.
o Math.
2) Business Skills:
o Industry knowledge: It's important to understand how your chosen industry works
and how the data is utilized, collected, and analyzed.
o Intellectual curiosity: Data scientists have to explore new territories
and find unusual and creative ways to solve problems.
o Effective communication: Data scientists have to explain their discoveries and
techniques to non-technical and technical audiences in a way that they can understand.
---
Q. What are the qualifications required for a data scientist?
---
To figure out whether or not you would make a good data scientist, ask yourself
these questions:
o Are you interested in broadening your skills and taking on new challenges?
o Do you communicate well both visually and verbally?
o Do you enjoy problem-solving and individualized work?
o Are you interested in data analysis and collection?
o Do you have substantial work experience in the areas involved in data
science?
o Do you have a degree in marketing, management information systems,
computer science, statistics, or mathematics?
If you were able to answer yes to any of these questions, then you will probably
find a lot of enjoyment in data science.
It's important that data scientists have knowledge of statistics or math. It's also
important that they have natural curiosity, along with critical thinking and creativity.
What are you able to do with the data? What undiscovered information is
hidden within it? You need the ability to connect the dots and, when you notice
data that is full of potential, the desire to find answers to questions that
nobody has asked you yet.
---
Q. How can one be a good data scientist?
Q. What should you do to become a good data scientist?
---
Python is an interpreted, high-level, general-purpose programming language
created by Guido van Rossum in the late 1980s.
Van Rossum wanted to create a small core language with a large library and
an interpreter that was easy to use. His desire came from his frustration with ABC,
which used a very different approach.