Python and Its Libraries in Data Science and Related Fields
All content following this page was uploaded by Sai charan Marikala on 18 December 2020.
Abstract— This document discusses the features and characteristics of the Python programming language and its modules, and the reasons they have become more popular among developers of data science applications than other programming languages such as R. It also covers their role and importance in implementing essential techniques to improve applications for solving real-world problems.

Keywords—Python, Data Analytics, Data Science, Artificial Intelligence, Machine Learning, NumPy, Pandas, Scikit-learn.

I. INTRODUCTION

In modern civilization, technology has emerged and evolved as a powerful tool for solving present-day problems and challenges. Computers, which were initially used as computing devices for mathematics, have extended their compatibility with other machines and improved their capability to provide a wide range of operations for many different types of applications. This computing revolution pushed every industry toward rapid growth through better performance and quick improvements in overcoming challenges.

Several computer science subfields have evolved into an essential need in the technology industry for finding solutions to ever more challenging problems. These include data science, which uses statistics, probability and related methods to analyse data and understand its insights; machine learning, which is used for exploratory data analysis and for building models from training data; artificial intelligence, which is used to create intelligent systems; and deep learning, which uses multiple layers in a network to make predictions.

The last decade witnessed substantial and extraordinary growth in the amount of stored data in every industry, including healthcare, automotive, manufacturing, finance and food processing. With it came a need to use this raw data to build and invent new products, to renovate existing ones, and to improve customer experience in the respective fields. Handling such amounts of data requires mathematical tools such as statistics, integral calculus, differential calculus and probability, which play a prominent role in understanding, interpreting and converting raw data into information.

This created a need for a programming language that is powerful and flexible enough to implement the methods required to develop data science applications, and that is easy to use and popular among developers. Python is a high-level, general-purpose programming language with built-in data types such as lists and dictionaries. Python source code is compiled to bytecode automatically, without a separate compilation step. In recent years, mathematical libraries such as NumPy, Pandas, SciPy and Scikit-learn have made Python a practical choice for machine learning and deep learning.

II. WHY PYTHON?

Scientists and developers have used compiled languages such as Lisp, C++ and C for data analytics and for developing other scientific applications. With the rise of integrated platforms and environments, there is a need for a programming language with good but clean and simple syntax, and with flexible but strong integration. Python satisfies all these qualities and is also easy to learn. Some important characteristics of Python are discussed below.

• Integration: Python is known for its integration capability. It can be integrated with a number of other programming languages and technologies such as C, C++, Java and CORBA, and with a wide variety of computer science and machine learning technologies such as TensorFlow, Google Cloud ML Engine and Amazon Machine Learning. Python not only integrates with platforms and programming language interfaces, it also has a large stack of libraries, which shows how strong its integration capability is.

• Easy to learn: Python was developed by Guido van Rossum with the intention of creating a programming language that can be easily learned and understood by students. Its syntax reads much like pseudocode: very simple to understand, yet elegant.

• Object-Oriented Programming: Python has strong object-oriented programming characteristics: we can create classes with methods and attributes and then call them from our functions. It also provides high-level data types, with dynamic and strong typing, in contrast to statically typed OOP languages such as C++ and Java.

• Byte Code Compilation: Python source code is first converted to bytecode, which is then executed by the interpreter. The program therefore runs on a virtual machine rather than directly on the CPU, which makes Python a portable programming language: code written on one system can be transferred to and executed on any other system that has a Python interpreter.

• Data Structures: Python provides a range of both immutable and mutable data structures: strings and tuples are immutable, whereas lists, sets and dictionaries are mutable. With these data structures we can easily organise data and perform operations on it.

III. IMPORTANT OPERATIONS IN DATA SCIENCE AND RELATED FIELDS:

• Data Extraction: Data science operations start with extracting raw data from the real world; this raw data can be in any format, shape or size. Python provides many libraries for extracting data from the web and from real-world machines, such as requests, beautifulsoup, scrapy and pypdf.
• Data Processing: This operation involves the steps needed to process raw data. Some important checks during this process are for missing values, corrupted values, timezone differences and date range errors. Python provides the NumPy and Pandas libraries for data processing.

• Data Analytics: Once the data is cleaned and ready for use, it is important to understand the insights it holds. The best way to learn about data is through graphs, which convey its overall meaning. The Python libraries pandas and matplotlib are powerful tools for graph representation.

• Pandas: The Pandas library is an efficient tool for data wrangling and manipulation. Its data structures, series and data frames, act as the building blocks of data analysis. A series is a one-dimensional labelled array, similar to a list in Python, whereas a data frame is a two-dimensional table. Using pandas we can load and read many types of data, such as JSON and MS Excel files.
Initialization: import pandas as pd
This command imports the pandas library under the alias pd.
Reading a file:
df = pd.read_csv("filename")
pandas has no generic read function; format-specific readers such as read_csv, read_json and read_excel load the contents of a file into a data frame.
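
The loading step described for Pandas and the missing-value check listed under Data Processing can be sketched together as follows. This is a minimal sketch, assuming a small CSV held in memory instead of a real file; the column names, values and helper function names are made up for illustration, while read_csv, to_datetime and isna are standard pandas calls.

```python
import io

import pandas as pd

# Hypothetical sample data: the column names and values are illustrative only.
RAW_CSV = """date,temperature
2020-01-01,21.5
2020-01-02,
2020-01-03,19.0
"""


def load_readings(source):
    """Load CSV data from a path or file-like object and parse the date column."""
    df = pd.read_csv(source)
    df["date"] = pd.to_datetime(df["date"])
    return df


def missing_counts(df):
    """Return the number of missing values in each column as a plain dict."""
    return {column: int(count) for column, count in df.isna().sum().items()}


readings = load_readings(io.StringIO(RAW_CSV))
print(missing_counts(readings))

# A Series is one-dimensional; a DataFrame is a two-dimensional table.
temperature = readings["temperature"]
print(temperature.ndim, readings.ndim)
```

Passing a file path to load_readings instead of the in-memory buffer would work the same way, since read_csv accepts both.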
V. PYTHON IN DEEP LEARNING:
Deep learning is a form of machine learning which uses several layers of features and representations. For performing deep learning, Python supports a number of extensions; one of the most important frameworks is Keras. Keras provides a number of modules such as initializers, regularizers, constraints, activations, losses, metrics and optimizers.
With the help of Keras, we can develop a wide range of state-of-the-art applications such as audio/video recognition, image recognition and robotics.

VI. PYTHON FOR ARTIFICIAL NEURAL NETWORKS
There are a number of Python packages, modules and libraries for artificial intelligence. neurolab is one such library for building neural networks. Its core functionality includes single-layer and multi-layer neural networks. It works together with the NumPy, SciPy and matplotlib libraries.
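
The difference between the single-layer and multi-layer networks mentioned above can be sketched with a plain forward pass. The sketch below uses NumPy only and is not neurolab's API; the function names, layer sizes, random weights and the tanh activation are assumptions chosen for illustration.

```python
import numpy as np


def forward(inputs, layers):
    """Propagate inputs through a list of (weights, bias) layers using tanh."""
    activation = inputs
    for weights, bias in layers:
        activation = np.tanh(activation @ weights + bias)
    return activation


def make_layers(sizes, seed=0):
    """Create random (weights, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


batch = np.ones((4, 3))               # 4 samples with 3 features each
single_layer = make_layers([3, 2])    # 3 inputs -> 2 outputs
multi_layer = make_layers([3, 5, 2])  # 3 inputs -> 5 hidden units -> 2 outputs

print(forward(batch, single_layer).shape)
print(forward(batch, multi_layer).shape)
```

Only the forward pass is shown here; training the weights is left out to keep the sketch short.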
VII. DISADVANTAGES OF USING PYTHON:
Although Python has a strong structure, implementations and integrations, there are also some disadvantages associated with it. Its strengths come at a cost, namely the speed of the language. It exhibits lower performance than compiled programming languages such as C, C++ and Go. This is because Python modules can take a long time to execute and may require large GPUs for better speed, which sometimes increases costs for developers, depending on the projects they are working on. There are also some design issues that stem from restrictions imposed by its core characteristics.
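
One contributor to this performance gap can be observed from within Python itself: loops written in Python are executed step by step by the interpreter. The sketch below, using only the standard library, times an explicit Python-level loop against the built-in sum, which CPython implements in C. The function name and input size are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit


def loop_sum(values):
    """Add the values one at a time in an interpreted Python loop."""
    total = 0
    for value in values:
        total += value
    return total


numbers = list(range(100_000))

# Both approaches compute the same result.
assert loop_sum(numbers) == sum(numbers)

loop_seconds = timeit.timeit(lambda: loop_sum(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: sum(numbers), number=20)

print(f"Python loop:  {loop_seconds:.4f} s")
print(f"built-in sum: {builtin_seconds:.4f} s")
```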
Python libraries are powerful in many areas, but each library also has limitations in its field. With NumPy, it is difficult to perform operations on missing values, and it requires contiguous memory, so insertion and deletion operations are costly because elements have to be shifted; this becomes a problem when dealing with big data and distributed systems. Pandas has a complex syntax, a steep learning curve and poor documentation, which hinders developers in referencing and using the syntax. Technically, its poor compatibility with 3D matrices is a major disadvantage.
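
Both NumPy limitations mentioned above can be seen on a small array. In this minimal sketch (the values are made up), a missing value is represented as NaN, which propagates through an ordinary aggregation unless a NaN-aware function such as nanmean is used, and deleting an element returns a new array instead of modifying the original in place.

```python
import numpy as np

# Hypothetical measurements; np.nan marks a missing value.
values = np.array([1.0, np.nan, 3.0])

plain_mean = np.mean(values)    # NaN propagates through the aggregation
safe_mean = np.nanmean(values)  # NaN-aware variant ignores missing entries
print(plain_mean, safe_mean)

# This array occupies one contiguous block, so np.delete builds a new array
# and copies the remaining elements instead of shrinking the original.
shorter = np.delete(values, 1)
print(shorter, values.size, shorter.size)
```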
Feature selection, one of the main data pre-processing steps, is not covered properly by Python for high-dimensional data. This leads to overfitting when dealing with neural networks and big data.
VIII. CONCLUSION:
In this paper we have discussed the characteristics of the Python programming language and the reasons behind Python becoming one of the most popular languages. We also discussed various Python libraries and their functionality in developing data science applications and analysis. We discussed the disadvantages of using Python in data science projects and the improvements required to meet the future needs of the industry. We also discussed deep learning and artificial neural networks and the Python libraries that support them.
Machine learning is a rapidly growing area, and its sub-branches such as deep learning and neural networks are headed towards new innovations and advancements. Every technology needs to evolve to meet the machine learning needs of the future;
this evolution process can be either by advancing the existing