Python Highway 2 Books in 1 The Fastest Way For Beginners To Learn Python Programming, Data Science and Machine Learning in 3 Days (Or Less) + Practical Exercises Included by Cox, Aaron
Aaron Cox
Book 1 Python for Beginners
That is quite literally all that you need. Before we move on to our
very first chapter and start learning the essentials, there is but one
more thing I would like to clarify right away.
If you picked up a copy of this book, or are considering it, under the
impression that it will teach you all the basics of Python, good
choice! However, if you expect that by the end of it you will be a
fully trained professional with a command of machine learning and
other advanced Python fields, please understand that this falls
outside its scope.
This book serves as a guide, a crash course of sorts. To learn more
advanced methods and skills, you will first need to establish
command over all the basic elements and components of the
language. Once that is done, it is highly recommended to seek out
resources that are aimed at advanced learners.
What I can recommend is that you continue practicing your code
after you have finished. Unlike driving and swimming, which you will
remember for the rest of your life even if you stop doing them,
Python continues to update itself. It is essential that you keep
yourself in practice and continue to write small programs like a
simple calculator, a number guesser, and so on. There are quite a
few exercises you can come across online.
For advanced courses, refer to Udemy. It is one of the finest sources
to gain access to some exceptional courses and learn new
dimensions of programming, amongst many other fields.
Phew! Now that this is out of the way, I shall give you a minute to flex
your muscles, adjust your seat, have a glass of water; we are ready
to begin our journey into the world of Python.
Chapter 1: What is Python
Python is a multi-purpose language created by Guido van Rossum.
The language boasts a simple syntax that makes it easy for a new
learner to understand and use. This chapter will introduce the basics
of the Python language. Stay tuned.
Let’s get started!
Python is described as a general-purpose language. It has many
applications and therefore, you can use it to accomplish many
different functions.
Python's syntax is clean, and its code tends to be short. Developers
who have used Python at some point in their lives will tell you how
fun it was to code with. The beauty of Python is that it offers you a
chance to think more about the task at hand than about the
language's syntax.
Some history of Python
The design of the Python language started back in the 1980s and it
was first launched in February 1991.
Why was Python developed?
The reason Guido van Rossum embarked on designing a new
programming language is that he wanted a language with a simple
syntax, just like that of ABC, a language he had worked on earlier.
This motivation led to the development of a new language named
Python.
But you may be wondering: why the name Python?
First, this language wasn't named after the huge snake called
python. No! One of Rossum's interests was watching comedy, and
he was a great fan of a British comedy series. As a result, the name
of the language was borrowed from "Monty Python's Flying Circus."
Properties of Python
Easy to learn – The syntax of Python is simple and beautiful.
Additionally, Python programmers enjoy writing it more than other
languages. Python simplifies the art of programming and allows the
developer to concentrate on the solution instead of the syntax. For a
newbie, this is a great choice to start your Python career.
Portability – When it comes to Python portability, it offers you the
ability to run Python on different platforms without making any
changes.
Python is described as a high-level language – In other words, you
don't need to worry about tedious tasks such as memory
management and so on. Whenever you execute Python code, the
interpreter automatically translates it into a language that your
computer understands. There is no need to worry about any
lower-level operations.
Object-oriented – Since it is an object-oriented language, it will allow
you to compute solutions for the most difficult problems. Object-
Oriented Programming makes it possible to divide a large problem
into smaller parts by building objects.
Has a huge standard library for common tasks – Python ships with
many standard libraries for the programmer to use. As a result,
you will not have to write all the lines of code yourself. Instead, you
simply import the relevant library.
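As a quick sketch of this idea, the math module from the standard library gives you ready-made functions instead of code you would otherwise write yourself:

```python
# math ships with Python itself, so no extra installation is needed.
import math

print(math.sqrt(16))  # square root of 16 -> 4.0
print(math.pi)        # the constant pi, about 3.14159
```

One import line replaces a square-root routine you would otherwise have to code by hand.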
A Brief Application of Python
Web Applications
You can develop scalable web applications using CMSs and
frameworks that are built on Python. Popular environments for
developing web applications include Pyramid, Django, Django CMS,
and Plone. Popular websites like Instagram, Mozilla, and Reddit are
written in Python.
Scientific and Numeric Calculations
There are different Python libraries designed for scientific and
numeric calculations. Libraries such as NumPy and SciPy support
general-purpose scientific computing. And there are specially
designed libraries like AstroPy for astronomy, and so on.
Additionally, the Python language is highly applied in data mining,
machine learning, and deep learning.
A great Language for Tech Programmers
The Python language is an important tool for introducing
programming to newbies and children. It is a great language with
powerful capabilities and features, and at the same time one of the
easiest languages to learn because of its simple syntax.
Building Software Prototypes
Compared to Java and C++, Python is a bit slow. It may not be a
great choice when resources are restricted and efficiency is
essential.
But Python is a powerful language to build prototypes. For instance:
You can apply the Pygame library to develop the prototype of your
game first. If you enjoy the prototype, you can decide to use C++ to
develop the actual game.
Chapter 2: Why Python is the Easiest
Language to Learn
Python is an interpreted, object-oriented, dynamically typed,
high-level programming language. Since its birth in the early 1990s,
Python has gradually come into wide use for system management
tasks and web programming.
Especially with the continuous development of artificial intelligence,
Python has become one of the most popular programming
languages.
The first benefit that you will notice with the Python language is that
it is easy to learn. This language was developed with the beginner in
mind, in the hopes of bringing more people into coding. Some of the
traditional languages were hard and bulky, and unless you were
really passionate about some of the work that you were doing with
coding, you would probably decide to give up long before anything
was done. But with the Python language, things are a bit different.
This language was designed to be easy to learn and easy to read,
which helped make it possible for more people to get into the world
of coding.
Even though you will be pleasantly surprised by how easy it is to
learn about the Python language, you will also find that it is a
powerful language. Don’t let the simplicity of this language fool you;
it has enough power to get the work done, no matter how complex or
hard the problem is. Even though Python is able to handle some of
the basic coding needs that you have, it also has the power to help
you to do things like machine learning and data analysis. And if you
have spent any time working with these topics, and these ideas, you
know that they are not easy.
With this in mind, Python is also going to have a lot of extensions
and libraries that help it to work better. This is primarily how you will
be able to get Python to work with some of those more complex
tasks. You can add these simply by installing them to your computer
or system, and the Python language is ready to go when you are.
You can then handle algorithms, finish your data analysis, and so
much more. There are many Python data science libraries available,
depending on which step of the process you are working on at the
time.
Why is Python special?
There are hundreds of programming languages now available for
programmers to start with. However, according to statistics from a
survey done by Harvard computer scientists, Python is a leading
language among beginners. We will discuss some of the reasons
that make Python an understandable language for new
programmers.
Python has the following major advantages over other programming
languages:
(1) The grammar is concise and clear, and the code is highly
readable. Python's syntax requires mandatory indentation, which is
used to reflect the logical relationship between statements and
significantly improve the readability of the program.
(2) Because it is simple and clear, it is also a programming language
with high development efficiency.
(3) Python can be truly cross-platform, for example, the programs we
develop can run on Windows, Linux, macOS systems. This is its
portability advantage.
(4) It has a large number of rich libraries and extensions. Python is
often nicknamed a "glue language": it can easily connect various
modules written in other languages, especially C/C++. Using these
abundant third-party libraries, we can easily develop our
applications.
(5) The amount of code is small, which improves the software quality
to a certain extent. Since the amount of code written in Python is
much smaller than that in other languages, the probability of errors is
much smaller, which improves the quality of the software written to a
certain extent.
Python is very versatile and can be used in the following areas:
(1) web page development;
(2) Visual (GUI) interface development;
(3) Network (can be used for network programming);
(4) System programming;
(5) Data analysis;
(6) Machine learning (Python has various libraries to support it);
(7) Web crawlers (such as those used by Google);
(8) Scientific calculation (Python is used in many aspects of the
scientific calculation).
For example, Python is used in many Google services, and YouTube
is also largely implemented in Python.
How does python work?
The principle behind Python program execution is quite simple. We
all know that programs written in compiled languages such as
C/C++ need to be converted from source files into the machine
language used by computers, and binary executable files are then
formed after linking by linkers. When running the program, the
binary is loaded from the hard disk into memory and executed.
However, for Python, Python source code does not need to be
compiled into binary code. It can run programs directly from the
source code. The Python interpreter converts the source code into
bytecode and then forwards the compiled bytecode to the Python
virtual machine (PVM) for execution.
When we run the Python program, the Python interpreter performs
two steps.
(1) Compiles Source Code into Byte Code
Bytecode is a Python-specific representation. It is not binary
machine code and needs further interpretation before it can be
executed by the machine. This is also why Python code cannot run
as fast as C/C++.
If the Python process has write permission on the machine, it will
save the bytecode of the program as a file with the extension .pyc. If
Python cannot write the bytecode to disk, the bytecode will be
generated in memory and automatically discarded at the end of the
program. When building a program, it is best to give Python
permission to write on the computer; that way, as long as the source
code is unchanged, the generated .pyc file can be reused to improve
execution efficiency.
(2) Forwarding the compiled bytecode to Python Virtual Machine
(PVM) for execution.
PVM is short for Python Virtual Machine. It is Python's running
engine and part of the Python system. It is a large loop that
iteratively runs bytecode instructions, completing operations one
after another.
Through this process, every Python program is executed, giving
results that can be further analyzed and tested before being
deployed as a complete application.
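If you are curious, you can peek at this bytecode yourself with the standard dis module. This is a minimal sketch; the exact instruction names vary between Python versions:

```python
import dis

def add(a, b):
    # A tiny function whose bytecode we can inspect.
    return a + b

# dis.dis prints the bytecode instructions the Python
# virtual machine will loop over when add() is called.
dis.dis(add)
```

Running this shows the low-level steps (load the arguments, add them, return the result) that the PVM executes one after another.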
Chapter 3: Installing the Interpreter
Python has many free IDEs and environments available online. With
this variety of options, some programs are better than others. With
their shortfalls in mind, the best software you can use to practice
your Python programming is PyCharm Community Edition.
Python is a common programming language for application
development. Python's design focuses on code readability and clear
programming for both small and big projects. You are able to run
modules and full applications from a massive library of resources on
the server. Python works on various operating systems, such as
Windows. Installing Python on a Windows server is a straightforward
process of downloading the installer and running it on your server;
configuring a few adjustments can make Python easier to work with.
It is this software that I recommend to many of my students,
although Anaconda is another one I have found quite useful.
PyCharm won't offer you the extraordinary power and capabilities
that professional software will, but for beginners, it's more than
adequate.
With that in mind, we need only to download and install the software.
I will go through this process with you, step-by-step with pictures.
Step 1: Open your preferred internet browser (Google Chrome,
Firefox, etc.) and search for 'PyCharm community edition'. You
should see the page link depicted in image 1.1 as your first result.
1.1: Searching PyCharm Community Edition
Step 6: Once you have started PyCharm up, you should see the
following as depicted in image 6.1. For the first startup, PyCharm
asks you to accept standard terms & conditions before you can use
the program.
You can read through these or not, but in order to continue, check
the box that states you have read and accepted the terms of this
user agreement. Once checked, click ‘Continue’.
6.1: Accepting User Agreement
Step 7: The next box you see is common to most programming
software. The developers ask whether you will allow the software to
send data on your usage to help with bug fixing and the like. For
more details, they offer an option to read more about it.
You can choose to provide this information or not; either way, you
still have full access to PyCharm.
7.1: Data Sharing Agreement
Step 8: We are in the final stages of the installation process. The
remaining steps are matters of preference more than anything else.
Once completed, you are ready to move on, and we will create a
project to code in.
Choose a theme for your UI. I will be using Darcula, but you can use
whichever you like. Once selected, click 'Next: Featured plugins'.
8.1: Theme Choosing
Once we have written the instruction that we want the program to
execute, we only have to press the Enter key. The interpreter
translates instruction by instruction; it does not wait to receive an
additional instruction, but executes as soon as we press the Enter
key.
Additional detail of the interpreter is that it can also be used from the
command prompt, which is also available on Windows, Linux and
Mac.
In order to use the interpreter from the command prompt, simply
type the word python and press the Enter key. This starts the
Python interpreter, and we know we are effectively in the interpreter
because we see the same header as we saw before.
Now we can start to execute instructions written in Python:
>>> print("Hello world")
The interpreter translates this line and immediately shows us the
result: Hello world.
Chapter 5: Variables and Operators
What are Variables?
A variable is nothing more than a reserved location in the memory, a
container if you like, where values are stored. The basic types of
values you can store are described below.
Integers
Integers in Python are no different from what you were taught in
math class: whole numbers, numbers that possess no decimal
points or fractions. Numbers like 4, 9, 39, -5, and 1215 are all
integers. Integers can be stored in variables just by using the
assignment operator, as we have seen before.
Floats
Floats are numbers that possess decimal parts. This makes
numbers like -2.049, 12.78, 15.1, and 1.01 floats. The method of
creating a float instance in Python is the same as declaring an
integer: just choose a name for the variable and then use the
assignment operator.
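A minimal sketch of both declarations; the variable names here are just examples, and type() is the built-in that reports what kind of value a variable holds:

```python
age = 39        # an integer: no decimal part
price = 12.78   # a float: has a decimal part

print(type(age))    # <class 'int'>
print(type(price))  # <class 'float'>
```

Notice that Python decides the type for you from the value you assign; there is no separate declaration step.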
String
While we’ve mainly dealt with numbers so far, Python can also
interpret and manipulate text data. Text data is referred to as a
“string,” and you can think of it as the letters that are strung together
in a word or series of words. To create an instance of a string in
Python, you can use either double quotes or single quotes.
string_1 = "This is a string."
string_2 = 'This is also a string.'
However, while either double or single quotes can be used, it is
recommended that you use double quotes when possible. This is
because there may be times you need to nest quotes within quotes,
and using the traditional format of single quotes within double quotes
is the encouraged standard.
Something to keep in mind when using strings is that numerical
characters surrounded by quotes are treated as a string and not as a
number.
# The 97 here is a string
stringy = "97"
# Here it is a number
numerical = 97
List
Lists are just collections of data. When you think about a list in
regular life, you often think of a grocery list or to-do list. These lists
are just collections of items, and that’s precisely what lists in Python
are; collections of items. Lists are convenient because they offer
quick and easy storage and retrieval of items.
Let’s say we have a bunch of values that we need to access in our
program. We could declare separate variables for all those values, or
we could store them all in a single variable as a list. Declaring a list
is as simple as using brackets and separating objects in the list with
commas. So, if we wanted to declare a list of fruits, we could do that
by doing the following:
fruits = ["apple", "pear", "orange", "banana"]
It's also possible to declare an empty list by just using empty
brackets. You can add items to the list with a specific function, the
append function, append(). We can access the items in the list
individually by specifying the position of the item that we want.
Remember, Python is zero-based, so the first item in the list is at
position 0. How do we select values from a list? We just declare a
variable that references that specific position:
apple = fruits[0]
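Putting the pieces above together, this is a minimal sketch of indexing and the append() function (the fruit names are just examples):

```python
fruits = ["apple", "pear", "orange", "banana"]

# Zero-based indexing: position 0 holds the first item.
apple = fruits[0]
print(apple)  # apple

# append() adds a new item to the end of the list.
fruits.append("mango")
print(fruits)  # ['apple', 'pear', 'orange', 'banana', 'mango']
```

Note that append() changes the list in place; you do not assign its result back to the variable.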
Tuple
Tuples are very similar to lists, but unlike lists, their contents cannot
be modified once they are created. The items that exist in the tuple
when created will exist for as long as the tuple exists. If it’s unclear
as to when tuples would be useful, they would be helpful whenever
you have a list of items that will never change. For example,
consider the days of the week. A list containing all the days of the
week won’t change. In practice, you are likely to use tuples far less
often than you will use lists, but it’s good to be aware of the
existence of tuples.
Functionally, tuples are declared and accessed very similarly to lists.
The major difference is that when a tuple is created, parentheses
are used instead of brackets.
This_is_a_tuple = ("these", "are", "values", "in", "a", "tuple")
The items can be accessed with brackets, just like a list.
word = this_is_a_tuple[0]
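Because tuple contents cannot be modified, assigning to a position raises an error. A quick sketch, using days of the week as suggested above:

```python
weekdays = ("Mon", "Tue", "Wed", "Thu", "Fri")

# Reading an item works exactly as with a list.
print(weekdays[0])  # Mon

# Writing to an item fails, because tuples are immutable.
try:
    weekdays[0] = "Sun"
except TypeError as err:
    print("Cannot modify a tuple:", err)
```

The TypeError is Python's way of enforcing the "create once, never change" guarantee that tuples provide.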
Dictionary
Dictionaries hold data that can be retrieved with reference items, or
keys. Dictionaries can be confusing for first-time programmers but try
to imagine a bank filled with a number of safety deposit boxes. There
are rows and rows of these boxes, and the contents of each box can
only be accessed when the correct key is provided. Much like
opening a deposit box, the correct key must be provided to retrieve
the value within the dictionary. In other words, dictionaries contain
pairs of keys and the value that can be accessed with those keys.
When you declare a dictionary, you must provide both the data and
the key that will point to that data. The keys must be unique.
Evidently, it would be a problem if one key could open multiple
boxes, so keys in a dictionary cannot be repeated; you cannot have
two keys both named "key1".
The syntax for creating a dictionary in Python is curly braces
containing the key on the left side and the value on the right side,
separated by a colon. To demonstrate, here's an example of a
dictionary:
Dict_example = {"key1": 39}
If you want to create a dictionary with multiple items, all you need to
do is separate the items with commas.
Dict_example2 = {"key1": 39, "key2": 21, "key3": 54}
Dictionaries can also be declared by using the dict() function. You
can create the same dictionary as above by passing keys and their
values using the assignment operator, still separating them with
commas.
Dict_example3 = dict(key1 = 39, key2 = 21, key3 = 54)
Note that this method uses parentheses instead of curly braces and
doesn’t use quotes.
To access items within the dictionary, you need to supply the
appropriate key. The syntax for this in Python is dictionary[‘key’], so
in order to get 39 from the dictionary above, you would use this
syntax:
number = Dict_example3["key1"]
Since the syntax above selects the value associated with the passed
key, you might be able to guess that we can overwrite the data by
selecting the value we want and using an assignment operator.
Dict_example3["key1"] = 99
Much like how it is possible to create an empty list with just an
empty pair of square brackets, we can also create an empty
dictionary by using empty curly braces when we declare the
dictionary.
Dict_example4 = {}
To add data to a dictionary, all we need to do is create a new
dictionary entry and assign a value to it.
Dict_example4["key1"] = 109
To drop values from the dictionary, we use the del command
followed by the dictionary and the key we want to drop.
del Dict_example4["key1"]
Chapter 7: Making your program Interactive
Input ()
When writing your program or creating an application, you may
require the users to enter an input such as their username and other
details. Python provides the input () function that helps you get and
process input from users. Other than entering input, you may require
the users to perform an action so that they may go to the next step.
For example, you may need them to press the enter key on the
keyboard to be taken to the next step.
Example:
input ("\n\n Press Enter key to Leave.")
Just type the above statement on the interactive Python interpreter
then hit the Enter key on the keyboard. You will be prompted to
press the Enter key:
The program waits for an action from the user to proceed to the next
step. Notice the use of \n\n, which are the characters that create
new lines. For a single new line, we use one \n. In this case, two
blank lines will be created. That is how the Python input() function
works.
Print ()
Python comes with many built-in functions. A good example of such
a function is the print() function, which we use for displaying
content on the screen. Beyond the built-ins, it is possible for us to
create our own functions in Python. Such functions are referred to
as "user-defined functions".
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)

functionExample()
Triple Quotes
Before we move into triple quotes, keep in mind that you can also
create a string like 'this' directly at the interpreter:
>>> 'this'
'this'
We can create the string, but we don't have to do anything with it;
the interpreter simply tells you what it is. Next, enter the following
into the interpreter (enter 3 single quotes before and after). Warning:
mixing a double quote and a single quote together will give a
different error.
>>> '''line 1
... line 2'''
'line 1\nline 2'
Notice the \n for a new line. Try it with print in front of it, and it will
put the string on two separate lines.
>>> print('''line 1
... line 2''')
line 1
line 2
If you enter '\n' inside a string, print will display what follows it on a
new line.
>>> print('I\ngo')
I
go
Now a raw string, with some slight changes. Try it at the interpreter.
>>> r'string\string2'
'string\\string2'
>>> r"string\string2"
'string\\string2'
>>> print(r"string\string2")
string\string2
The last examples show that you can test your expressions as your
scripts progress towards complexity. The string examples above will
become clearer as you progress.
Escape characters
Have a look at the following code:
print("\tHi there")
Output:
	Hi there    # tabbed to the right
Note that there is no space between the escape character \t and the
text that follows it.
Here are some of the most regularly used escape characters in
Python.
Escape character    Description
\n                  New line
\t                  Horizontal tab
\\                  Backslash
\'                  Single quote
\"                  Double quote
Let's have a look at some more escape characters.
The following sentences would result in an error when printed:
In the first example, Python thinks that the quotation mark before
hello is the end of the string, so the third quotation mark would
cause the program to crash. Likewise, in the second sentence,
Python would take the apostrophe in he'd to mean the end of the
string and throw an error when it encounters the third quote. One
solution is to use single quotes to delimit the string when you intend
to use double quotation marks inside it:
print('I said "hello mate" and he totally ignored me')
And use double quotes to delimit the string if you intend to use a lot
of apostrophes, such as he'd, there's, etc., in your string:
print("He said he'd be there at 2pm but there's no sign of him")
Now, what if you want to print just a backslash \ in Python? Yes, you
must escape that too.
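Following the same pattern as the table above, the backslash escapes itself (the file path shown is just an illustration):

```python
# Two backslashes in the source produce one in the output.
print("\\")                # prints a single backslash: \
print("C:\\Users\\guest")  # prints C:\Users\guest
```

This is why Windows paths written in ordinary strings need their backslashes doubled, and why raw strings are often used for them instead.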
If Statements
The if statement serves as a means of controlling whether the code
that follows it is executed, in this case an indented block or a single
statement. The if statement evaluates the expression that follows
the if keyword. Should the expression result in a value considered to
be true, the block is executed; if not, the whole block is skipped.
Doing this allows your Python script to make decisions on its own
based on a range of factors.
Syntax:
if expression:
    # code to run if the expression evaluates to true
Sample:
The following code would display "x is bigger than y" if x is greater
than y:
x = 5
y = 2
if x > y:
    print("x is bigger than y")
Elif
This statement is used alongside the if…else statement during the
execution of a series of blocks, should one of a variety of conditions
be true. As the name connotes, the elif statement is a mixture of
both the if and else statements. As with the else statement, the elif
statement extends the if statement to run another block in the event
that the main if expression is evaluated as False. Albeit, contrary to
the else statement, the elif statement runs its alternative block only
when its own conditional expression is evaluated to be True. So, put
simply, whenever you wish to run a set of code when one of many
different conditions evaluates to true, the elif statement should be
used.
Syntax:
if condition:
    # code to be run if this condition evaluates to true
elif condition:
    # code to be run if this condition evaluates to true
else:
    # code to be run if no condition evaluates to true
Sample:
The sample shown below produces "Good morning. Rise and shine!"
if the period of the day is morning, and "Good night! Sleep well."
when it is night. Otherwise, it produces "Have a great day!" Here, t
is simply a variable holding the current period of the day.
t = "Morn"
if t == "Morn":
    print("Good morning. Rise and shine!")
elif t == "Ngt":
    print("Good night! Sleep well.")
else:
    print("Have a great day!")
When executed, the result shown below will be output:
Good morning. Rise and shine!
While Loop
This type of loop runs a specific block of code for as long as the
given condition remains true. Once the given condition is no longer
valid, or turns to false, the block of code will end right away.
This is quite a useful feature, as there may be tasks you need to
repeat until something changes. To give you an idea: suppose the
user is to guess a number and has three tries. You want the prompt
to ask the user to guess the number. Once the user guesses a
wrong number, the program reduces the remaining number of tries
from three to two, informs the user that the number is wrong, and
then asks them to guess another time. This continues until either
the user guesses the right number or all the guesses are used up
and the user fails to identify the number.
Imagine just how many times you would have to write the code over
and over again. Now, thanks to Python, we just type it once
underneath the ‘while’ loop, and the rest is made for us.
Here's how the syntax for the 'while' loop looks:
while condition:
code
code
…
You begin by typing the word 'while' followed by the condition. We
then add a colon, just like we did for the 'if' statement. Whatever
follows afterward is indented to show that it works inside the loop.
Let us create a simple example from this. We start by creating a
variable. Let’s give this variable a name and a value like so:
x=0
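Continuing from x = 0, here is a minimal sketch of a complete 'while' loop; the condition x < 3 is just an example:

```python
x = 0

# The body repeats for as long as x < 3 stays true.
while x < 3:
    print("x is", x)
    x = x + 1  # without this line the loop would never end

print("Done, x is now", x)  # Done, x is now 3
```

The line that changes x is essential: if the condition can never become false, the loop runs forever.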
For Loop
In Python, the for…in statement is a looping statement that allows
users to iterate over a sequence of objects. That is, it is used to go
through every item that makes up a sequence. Take note that a
sequence refers to an ordered set of items. Let's consider a simple
example. Save the file under the name "for.py":
for x in range(1, 7):
    print(x)
else:
    print('The for loop is complete')
Output:
$ python for.py
1
2
3
4
5
6
The for loop is complete
How the for statement Works:
In the code sample used above, we attempt to print out a sequence
of numbers. This sequence of numbers is generated with the help of
a built-in “range” function. What we do at this point is to enter two
numbers into the program, and the “range” function returns a
sequence of numbers beginning from the initial number up to the
second one. For instance, range (1,7) produces the sequence (1, 2,
3, 4, 5, 6). In a default state, range assumes a step count of 1. If we
add a third number into the range, then it automatically takes the
place of the default step count. Take, for instance, range (1,7,2)
produces the sequence [1,3,5]. Take note that the range reaches up
to the second number, but does not include the second number
itself. So, the second number serves as a boundary the range never
reaches or exceeds. Keep in mind that the range() function
generates one number at a time. So, if you need the full set of
numbers at once, call list() on the range(). For instance:
list(range(7)) will result in the sequence [0, 1, 2, 3, 4, 5, 6].
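The range() behaviours described above can be checked directly at the interpreter:

```python
# range(start, stop) stops just before the second number.
print(list(range(1, 7)))     # [1, 2, 3, 4, 5, 6]

# A third argument replaces the default step count of 1.
print(list(range(1, 7, 2)))  # [1, 3, 5]

# With a single argument, counting starts from 0.
print(list(range(7)))        # [0, 1, 2, 3, 4, 5, 6]
```
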
Moving on, the for loop steps in and begins iteration over the range
— for x in range(1,7) is the same as for x in [1, 2, 3, 4, 5, 6]. This
case is also similar to assigning each object or number in the
sequence to x, one per time, and then running the clock of code for
every value of x. At this point, we go straight to printing the values
within the block of code. Recall that the else of the code remains
optional. So, when it is introduced, it is only ever executed after the
for loop has been entirely executed, or until a break statement is
used. Also, recall that for in loops work on all sequences. At this
point, there is a sequence of numbers produced from executing the
range function. However, it is possible to use still any other
sequence containing any type of object.
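To make the behavior of range() concrete, here is a short sketch you can run in any Python interpreter:

```python
# range() generates numbers lazily, one at a time.
r = range(1, 7)
print(list(r))  # materialize the full sequence: [1, 2, 3, 4, 5, 6]

# A third argument sets the step count.
print(list(range(1, 7, 2)))  # [1, 3, 5]

# The stop value itself is never included.
print(6 in range(1, 7))  # True
print(7 in range(1, 7))  # False
```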
Break
The break statement in Python is applied as a breakout strategy
from a loop. That is, it is used to stop the running of a loop
statement, even when the looping condition remains True or the
sequence of objects has not undergone complete iteration. A point
worth noting is that when you break out of a while or for loop, the
loop's else block, if present, remains unexecuted.
Let’s consider the same code sample used for the if statement. Save
the file by the name “break.py”:
while True:
    m = input('Enter something : ')
    if m == 'quit':
        break
    print('Length of the string is', len(m))
print('Completed')
When the code is executed, the result is as follows:
$ python break.py
Enter something: Python is easy to learn
Length of the string is 23
Enter something: When my work is over
Length of the string is 20
Enter something: You could make your work fun:
Length of the string is 29
Enter something: Hello, World!
Length of the string is 13
Enter something: quit
Completed
Continue
In Python, the continue statement is used to inform the program to
skip the remainder of the statements in the current loop block and
continue to the following loop iteration. Let’s consider a sample code
of the continue statement in use. Save the file as “continue.py”:
while True:
    j = input('Enter something : ')
    if j == 'quit':
        break
    if len(j) < 5:
        print('Entry is too small')
        continue
    print('Entry is of sufficient length')
    # Process other kinds of things here...
When the code sample above is executed, the result is as follows:
$ python continue.py
Enter something: x
Entry is too small
Enter something: 515
Entry is too small
Enter something: vwxyz
Entry is of sufficient length
Enter something: quit
Scientific Distributions
As you can see in the previous section, building your working
environment can be somewhat time-consuming. After installing
Python, you need to choose the packages you need for your project
and install them one at a time. Installing many different packages
and tools can lead to failed installations and errors. This can often
result in a massive loss of time for an aspiring data scientist who
doesn't fully understand the subtleties behind certain errors. Finding
solutions to them isn't always straightforward. This is why you have
the option of directly downloading and installing a scientific
distribution.
Automatically building and setting up your environment can save you
from spending time and frustration on installations and allow you to
jump straight in. A scientific distribution usually contains all the
libraries you need, an Integrated Development Environment (IDE),
and various tools. Let’s discuss the most popular distributions and
their application.
Anaconda
Anaconda is the most popular scientific Python distribution. It
bundles Python itself with the conda package and environment
manager, plus hundreds of the most widely used data science
packages, so most of the libraries discussed below come
preinstalled.
Virtual Environments
Virtual environments are often necessary because you are usually
locked to the version of Python you installed. It doesn’t matter
whether you installed everything manually or chose a distribution:
you can’t have as many installations on the same machine as you
might want. The only exception is the WinPython distribution,
available only for Windows machines, which allows you to prepare
as many installations as you want. However, with the "virtualenv"
tool you can create as many isolated environments as you need
without worrying about any kind of limitation. Here are a few solid
reasons why you should choose a virtual environment:
Testing grounds: It allows you to create a special
environment where you can experiment with different
libraries, modules, Python versions, and so on. This way, you
can test anything you can think of without causing any
irreversible damage.
Different versions: There are cases when you need multiple
installations of Python on your computer. There are packages
and tools, for instance, that only work with a certain version.
For instance, if you are running Windows, there are a few
useful packages that will only behave correctly if you are
running Python 3.4, which isn’t the most recent update.
Through a virtual environment, you can run different versions
of Python for separate goals.
Replicability: Use a virtual environment to make sure you can
run your project on any other computer or version of Python,
aside from the one you were originally using. You might be
required to run your prototype on a certain operating system
or Python installation, instead of the one you are using on
your own computer. With the help of a virtual environment,
you can easily replicate your project and see if it runs under
different circumstances.
Once you make all the above decisions, you can finally create a new
environment. Type the following command:
virtualenv myenv
This instruction will create a new directory called “myenv” inside the
location, or directory, where you currently are. Once the virtual
environment is created, you need to activate it. On Windows, type:
myenv\Scripts\activate
On Linux or macOS, type:
source myenv/bin/activate
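If virtualenv is not available, Python 3 ships an equivalent standard-library module, venv. A minimal sketch on Linux or macOS (Windows uses myenv\Scripts instead of myenv/bin) might look like this:

```shell
# Create an isolated environment named "myenv" using the stdlib venv module.
# --without-pip skips bootstrapping pip, which keeps this example fast.
python3 -m venv --without-pip myenv

# The environment's own interpreter reports its own prefix,
# showing it is separate from the system installation.
myenv/bin/python -c 'import sys; print(sys.prefix)'
```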
Necessary Packages
We discussed earlier that the advantages of using Python for data
science are its system compatibility and highly developed system of
packages. An aspiring data scientist will require a diverse set of tools
for their projects. The analytical packages we are going to talk about
have been highly polished and thoroughly tested over the years, and
therefore are used by the majority of data scientists, analysts, and
engineers.
Here are the most important packages you will need to install for
most of your work:
NumPy: This analytical library provides the user with support
for multi-dimensional arrays, including the mathematical
algorithms needed to operate on them. Arrays are used for
storing data, as well as for fast matrix operations that are
much needed to work out many data science problems.
Python wasn't meant for numerical computing. Therefore
every data scientist needs a package like NumPy to extend
the programming language to include the use of many high-
level mathematical functions. Install this tool by typing the
following command: pip install numpy.
SciPy: You can't read about NumPy without hearing about
SciPy. Why? Because the two complement each other. SciPy
is needed to enable the use of algorithms for image
processing, linear algebra, matrices, and more. Install this
tool by typing the following command: pip install scipy.
pandas: This library is needed mostly for handling diverse
data tables. Install pandas to be able to load data from any
source and manipulate it as needed. Install this tool by typing
the following command: pip install pandas.
Scikit-learn: A much-needed tool for data science and
machine learning, Scikit is probably the most important
package in your toolkit. It is required for data preprocessing,
error metrics, supervised and unsupervised learning, and
much more. Install this tool by typing the following command:
pip install scikit-learn.
Matplotlib: This package contains everything you need to
build plots from an array. You also have the ability to visualize
them interactively. You don’t happen to know what a plot is? It
is a graph used in statistics and data analysis to display the
relation between variables. This makes Matplotlib an
indispensable library for Python. Install this tool by typing the
following command: pip install matplotlib.
Jupyter: No data scientist is complete without Jupyter. This
package is essentially an IDE (though much more) used in
data science and machine learning everywhere. Unlike IDEs
such as Atom, or R Studio, Jupyter can be used with any
programming language. It is both powerful and versatile
because it provides the user with the ability to perform data
visualization in the same environment, and allows
customizable commands. Not only that, it also promotes
collaboration due to its streamlined method of sharing
documents. Install this tool by typing the following command:
pip install jupyter.
Beautiful Soup: Extract information from HTML and XML files
that you have access to online. Install this tool by typing the
following command: pip install beautifulsoup4.
For now, these seven packages should be enough to get you started
and give you an idea of how to extend Python's abilities. You don't
have to overwhelm yourself just yet by installing all of them,
however, feel free to explore and experiment on your own. We will
mention and discuss more packages later in the book as needed to
solve our data science problems. But for now, we need to focus
more on Jupyter, because it will be used throughout the book. So
let’s go through the installation, special commands, and learn how
this tool can help you as an aspiring data scientist.
Using Jupyter
Throughout this book, we will use Jupyter to illustrate various
operations we perform and their results. If you didn’t install it yet,
let’s start by typing the following command:
pip install jupyter
The installation itself is straightforward. Simply follow the steps and
instructions you receive during the setup process. Just make sure to
download the correct installer first. Once the setup finishes, we can
run the program by typing the next line:
jupyter notebook
This will open an instance of Jupyter inside your browser. Next, click
on “New” and select the version of Python you are running. As
mentioned earlier, we are going to focus on Python 3. Now you will
see an empty window where you can type your commands.
You might notice that Jupyter uses code cell blocks instead of
looking like a regular text editor. That’s because the program will
execute code cell by cell. This allows you to test and experiment with
parts of your code instead of your entire program. With that being
said, let’s give it a test run and type the following line inside the cell:
In: print("I'm running a test!")
Now you can click on the play button that is located under the Cell
tab. This will run your code and give you output, and then a new
input cell will appear. You can also create more cells by hitting the
plus button in the menu. To make it clearer, a typical block looks
something like this:
In: < This is where you type your code >
Out: < This is the output you will receive >
The idea is to type your code inside the "In" section and then run it.
When you do, an "Out" section appears below the cell, displaying the
result of your code. This way, you can check each cell as you go
and see whether the code gives you the result you expect.
Chapter 14: Python Libraries to Help with
Data Science
Python is one of the best coding languages to work with when you
want to do data science. But the standard library that comes
installed with the Python language is not able to handle all of the
work that this field demands. This doesn’t mean that you are stuck,
though. There are many extensions and other libraries that work
with Python and can do some wonderful things when it comes to
data science. When you are ready to start analyzing the data you
have collected and drawing valuable insights out of it, here are
some of the best libraries that work with Python.
NumPy and SciPy
The first of the Python libraries for data science that we are going to
take a look at is NumPy, or Numeric and Scientific Computation,
together with the SciPy library. NumPy is useful because it lays
down the basic premises that we need for scientific computing in
Python. It gives us access to precompiled, fast functions for
numerical and mathematical routines as needed.
In addition to the benefits listed above, NumPy optimizes Python
programming by adding powerful data structures, making it possible
for us to efficiently compute multi-dimensional matrices and arrays.
Scientific Python, known as SciPy, is linked together with NumPy,
and it is often said that you can’t have one without the other. SciPy
lends a competitive edge to NumPy by enhancing it with useful
functions for minimization, regression, and more.
When you want to work with these two libraries, you need to install
the NumPy library first and get it all set up and ready to work with
Python. From there, you can install the SciPy library and get to work
using Python on any of your goals or projects that involve data
science.
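As a small sketch of what NumPy adds (assuming NumPy has been installed with pip install numpy; the numbers are illustrative only):

```python
import numpy as np

# A multi-dimensional array with fast, element-wise arithmetic.
a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Element-wise operations apply to the whole array at once,
# with no explicit Python loop.
doubled = a * 2
print(doubled)

# Built-in mathematical routines operate over whole arrays.
print(a.sum())   # sum of all elements
print(a.mean())  # average of all elements
```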
Pandas
The second Python library that we can use to help out with data
science is known as Pandas, or the Python Data Analysis Library.
The name alone tells us what this library is for: analyzing data.
Pandas is an open-source tool that provides us with data structures
that are easy to use and high in performance, and it comes with all
of the tools that you need to complete a data analysis in Python
code. You can use this particular library to add in data structures
and tools to complete that data analysis, no matter what kind you
would like to do. Industries that like to work with this Python library
for data science include engineering, social science, statistics, and
finance.
The best part about using this library is that it is adaptable, which
helps us to get more work done. It also works with any kind of data
that you were able to collect for it, including uncategorized, messy,
unstructured, and incomplete data. Once you have the data, this
library steps in and provides us with all of the tools that we need to
slice, reshape, merge, and otherwise manipulate the data sets we
have.
Pandas comes with a variety of features that make it perfect for data
science. Some of the best features that come with the Pandas
library include:
1. You can use the Pandas library to help reshape the
structures of your data.
2. You can use the Pandas library to label series, as well
as tabular data, to help us see an automatic alignment
of the data.
3. You can use the Pandas library to help with
heterogeneous indexing of the data, and it is also useful
when it comes to systematic labeling of the data as
well.
4. You can use this library because it can hold onto the
capabilities of identifying and then fixing any of the data
that is missing.
5. This library provides us with the ability to load and then
save data from more than one format.
6. You can easily take some of the data structures that
come out of Python and NumPy and convert them into
Pandas objects.
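A minimal sketch of features 4 and 5 in action, identifying and fixing missing data (assuming pandas and NumPy are installed; the table is made-up illustrative data):

```python
import numpy as np
import pandas as pd

# A small table with a missing value, as messy real-world data often has.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Kyoto"],
    "population_m": [0.7, np.nan, 1.5],
})

# Identify the missing data...
print(df["population_m"].isna().sum())  # count of missing values

# ...and fix it, here by filling the gap with the column mean.
df["population_m"] = df["population_m"].fillna(df["population_m"].mean())
print(df)
```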
Matplotlib
When you work on your data science, you want to make sure that
after gathering and then analyzing all of the data that is available you
also find a good way to present that information to others so they
can gain all of the insights quickly. Working with visualizations of
some sort, depending on the kind of data you are working with, can
make it easier to see what information is gathered and how different
parts are going to be combined together.
This is where the Matplotlib is going to come in handy. This is a 2D
plotting library from Python, and it is going to be capable of helping
us to produce publication-quality figures in a variety of formats. You
can also see that it offers a variety of interactive environments
across a lot of different platforms as well. This library can be used in
Python scripts, the Python and IPython shells, the Jupyter notebook,
several graphical user interface toolkits, and many web application
servers.
The way that this library is going to be able to help us with data
science is that it is able to generate a lot of the visualizations that we
need to handle all of our data, and the results that we get out of the
data. This library is able to help with generating scatterplots, error
charts, bar charts, power spectra, histograms, and plots to name a
few. If you need to have some kind of chart or graph to go along with
your data analysis, make sure to check out what the matplotlib
option can do for you.
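A small hedged sketch of one such chart (assuming Matplotlib is installed via pip install matplotlib; the Agg backend is selected so no display window is needed, and the data is made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render to a file instead of opening a window
import matplotlib.pyplot as plt

# A simple bar chart of illustrative category counts.
categories = ["A", "B", "C"]
counts = [5, 3, 8]

fig, ax = plt.subplots()
ax.bar(categories, counts)
ax.set_xlabel("Category")
ax.set_ylabel("Count")
ax.set_title("Counts per category")

# Save the figure as a PNG file.
fig.savefig("bar_chart.png")
```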
Scikit-Learn
Scikit-Learn is a module that works well in Python and provides a lot
of the state-of-the-art algorithms found in machine learning. The
algorithms that work best with the Scikit-Learn library address
medium-scale supervised and unsupervised machine learning
problems, so you have a lot of applications to put all of this to work.
Out of the other libraries that we have talked about in this guidebook,
the Scikit-Learn library is one of the best options from Python when it
comes to machine learning. This package is going to focus on
helping us to bring some more machine learning to non-specialists
using a general-purpose high-level language. With this language,
you will find that the primary emphasis is going to be on things like
how easy it is to use, the performance, the documentation, and the
consistency that shows up in the API.
Another benefit that comes with this library is that it has a minimal
amount of dependencies and it is easy to distribute. You will find that
this library shows up in many settings that are commercial or
academic. Scikit-Learn is going to expose us to a consistent and
concise kind of interface that can work with some of the most
common algorithms that are part of machine learning, which makes it
easier to add in some machine learning to the data science that you
are working with.
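A minimal sketch of the consistent interface the text describes (assuming scikit-learn is installed via pip install scikit-learn; the data is a tiny toy set for illustration only):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: 1-D points, labeled by which cluster they belong to.
X_train = [[0], [1], [2], [10], [11], [12]]
y_train = [0, 0, 0, 1, 1, 1]

# Every scikit-learn estimator follows the same pattern:
# construct, fit, then predict.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict labels for two new points, one near each cluster.
print(model.predict([[1.5], [11.5]]))
```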
Theano
Theano is another great library to work with during data science, and
it is often seen as one of the highly-rated libraries to get this work
done. In this library, you will get the benefit of defining, optimizing,
and then evaluating many different types of mathematical
expressions that come with multi-dimensional arrays in an efficient
manner. This library is able to use lots of GPUs and perform
symbolic differentiation in a more efficient manner.
Theano is a great library to learn how to use, but it does come with a
learning curve that is pretty steep, especially for most of the people
who have learned how to work with Python because declaring the
variables and building up some of the functions that you want to
work with will be quite a bit different from the premises that you learn
in Python.
However, this doesn’t mean that the process is impossible. It just
means that you need to take a bit longer to learn how to make this
happen. With some good tutorials and examples, it is possible for
someone who is brand new to Theano to get this coding all done.
Many great libraries that come with Python, including Pandas and
NumPy, will be able to make this a bit easier as well.
TensorFlow
TensorFlow, one of the best Python libraries for data science, is a
library that was released by Google Brain. It was written out mostly
in the language of C++, but it is going to include some bindings in
Python, so the performance is not something that you are going to
need to worry about. One of the best features that come to this
library is going to be some of the flexible architecture that is found in
the mix, which is going to allow the programmer to deploy it with one
or more GPUs or CPUs in a desktop, mobile, or server device, while
using the same API the whole time.
Not many, if any, of the other libraries that we are using in this
chapter will be able to make this kind of claim. This library is also
unique in that it came out of the Google Brain project and has since
been adopted by a huge community of programmers. However, you
do need to spend a bit more time learning the API compared to
some of the other libraries. Once you do, you will find that it is
possible to implement the design of your network in just a few
minutes, without having to fight through the API like you do with
other options.
Keras
Keras is an open-source Python library that helps you build up your
own neural networks through a high-level interface. It is pretty
minimalistic, which makes it easier to work with, and the coding on
this library is simple and straightforward, while still providing the
high-level extensibility that you need. It can use TensorFlow,
Theano, or CNTK as the backend to make this work better.
Remember that the API that comes with Keras is designed for
humans to use, rather than machines, which makes it easier to use
and puts the experience of the user front and center.
Keras is going to follow what are known as the best practices when it
comes to reducing the cognitive load. This Python library is going to
offer a consistent and simple APIs to help minimize how many
actions the user has to do for many of the common parts of the code,
and it also helps to provide feedback that is actionable and clear if
an error does show up.
In this library, we find that the model is going to be understood as a
sequence, or it can be a graph of standalone, fully-configurable
modules that you are able to put together with very few restrictions at
the time. Neural layers, optimizers, activation functions, initialization
schemes, cost functions, and regularization schemes are going to be
examples of the standalone modules that are combined to create a
new model. You will also find that Keras makes creating a new
module simple, and the existing modules provide us with lots of
examples to work with.
Caffe
The final Python library that we will take a look at in order to do some
work with data science is going to be Caffe. This is a good machine
learning library to work with when you want to focus your attention
on computer vision. Programmers like to use this to create some
deep neural networks that are able to recognize objects that are
found in images and it has been explored to help recognize a visual
style as well.
Caffe offers us seamless integration with GPU training and is highly
recommended any time that you would like to complete your
training with images. Although this library
is going to be preferred for things like research and academics, it is
going to have a lot of scope to help with models of training for
production as well. The expressive architecture that comes with it is
going to encourage application and innovation as well.
In this kind of library, you are going to find that the models will be
optimized and then defined through configuration without hard
coding in the process. You can even switch between the CPU and
the GPU by setting a single flag to train on a GPU machine, and then
go through and deploy to commodity clusters, or even to mobile
devices.
These are just a few of the different libraries that you are able to use
when it comes to working on Python, and they will ensure that you
are going to see the best results any time that you want to explore a
bit with data science. While the traditional form of the Python library,
the one that comes with the original download, is not going to be
able to handle some of the different parts that come with data
science, you can easily download and add on these other Python
libraries and see exactly what steps they can help with when it
comes to gathering, cleaning, analyzing, and using the data that you
have with data science.
Chapter 15: Python Functions
Python functions are a good way of organizing the structure of our
code. The functions can be used for grouping sections of code that
are related. The work of functions in any programming language is to
improve the modularity of code and make it possible to reuse code.
Python comes with many in-built functions. A good example of such
a function is the “print()” function which we use for displaying the
contents on the screen. Despite this, it is possible for us to create
our own functions in Python. Such functions are referred to as the
“user-defined functions”.
To define a function, we use the “def” keyword, which is followed by
the name of the function and then parentheses (()).
The parameters, or input arguments, are placed inside the
parentheses.
The function has a body, or code block, and this must begin with a
colon (:) and has to be indented. It is good for you to note that, by
default, arguments have a positional behavior. This means that they
should be passed in the order in which you defined them.
Example:
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)
We have defined a function named functionExample. The
parameters of a function are like the variables for the function. The
parameters are usually added inside the parenthesis, but our above
function has no parameters. When you run the above code, nothing
will happen since we simply defined the function and specified what
it should do. The function can be called as shown below:
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)
functionExample()
It will print this:
The function code to run
33
Function Parameters
You can define parameters for a function. Example:
#!/usr/bin/python3
def additionFunction(n1,n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print('The sum is', result)
additionFunction(10,5)
The code returns the following result:
The first number is 10
The second number is 5
The sum is 15
Note that if you call the function with a different number of
arguments than it expects, Python raises an error telling you, for
example, that the function expects two arguments but you have
passed 3 to it.
In most programming languages, parameters to a function can be
passed either by reference or by value. Python supports parameter
passing only by reference. This means that if what the parameter
refers to is changed inside the function, the same change will also
be reflected in the calling code. Example:
#!/usr/bin/python3
def referenceFunction(ls1):
    print("List values before change:", ls1)
    ls1[0] = 800
    print("List values after change:", ls1)
    return
# Calling the function
ls1 = [940,1209,6734]
referenceFunction( ls1 )
print("Values outside function:", ls1)
The code gives this result:
List values before change: [940, 1209, 6734]
List values after change: [800, 1209, 6734]
Values outside function: [800, 1209, 6734]
What we have done in this example is maintain the reference of the
object being passed, so the change made inside the function is
visible in the calling code as well.
In the next example below, we pass by reference, but then the
reference is overwritten inside the called function:
#!/usr/bin/python3
def referenceFunction( ls1 ):
    ls1 = [11,21,31,41]
    print("Values inside the function:", ls1)
    return
ls1 = [51,91,81]
referenceFunction( ls1 )
print("Values outside function:", ls1)
The code gives this result:
Values inside the function: [11, 21, 31, 41]
Values outside function: [51, 91, 81]
Here the assignment inside the function binds the name ls1 to a
brand-new list, so the list in the calling code is left unchanged.
Function parameters can also be given default values. Consider a
function defined as windowFunction(width, height, font='TNR'),
where the parameter font has been given a default value, that is,
'TNR'. If we pass only two parameters to the function, that is, the
values for the width and height parameters, the function still returns
the values for all three parameters. This means that for a parameter
with a default, we don’t need to specify its value or even mention it
when calling the function.
However, it’s still possible for you to specify the value for the
parameter during the function call. You can pass a different value
from the one specified as the default, and you will get the new one
as the value of the parameter. Example:
#!/usr/bin/python3
def windowFunction(width,height,font='TNR'):
    # printing everything
    print(width,height,font)
windowFunction(245,278,'GEO')
The program outputs this:
245 278 GEO
Above, the font parameter was given the default value “TNR”. When
calling the function in the last line of the code, we specified a
different value for this parameter, which is “GEO”. The code
returned the value as “GEO”: the default value was overridden.
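Arguments can also be passed by keyword, which frees you from the positional order described earlier. A short sketch (windowFunction here is the same illustrative function, returning its values instead of printing them so they are easy to reuse):

```python
def windowFunction(width, height, font='TNR'):
    # Return the values instead of printing them.
    return (width, height, font)

# Positional call: arguments are matched by order.
print(windowFunction(245, 278))  # (245, 278, 'TNR')

# Keyword call: arguments are matched by name, in any order.
print(windowFunction(height=278, width=245, font='GEO'))  # (245, 278, 'GEO')
```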
Data Types
Knowing the basic data types and how they work is a must. Python
has several data types, and in this section, we will go through a brief
description of each one and then see them in practice. Don't forget
to also practice on your own, especially if you know nothing or very
little about Python.
With that in mind, let's explore strings, numbers, dictionaries, lists,
and more!
Numbers
In Python, just like in math in general, you have several categories of
numbers to work with, and when you work them into code, you have
to specify which one you're referring to. For instance, there are
integers, floats, longs, and others. However, the most commonly
used ones are integers and floats.
Integers, written int for short, are whole numbers that can either be
positive or negative. So make sure that when you declare a number
as an integer, you don't type a float instead. Floats are decimal or
fractional numbers.
Now let's discuss the mathematical operators. Just like in elementary
school, you will often work using basic mathematical operators such
as adding, subtracting, multiplication, and so on. Keep in mind that
these are different from the comparison operators, such as greater
than or less than or equal to. Now let's see some examples in code:
x = 99
y = 26
print (x + y)
This basic operation simply prints the sum of x and y. You can use
this syntax for all the other mathematical operators, no matter how
complex your calculation is. Now let’s type a command using a
comparison operator instead:
x = 99
y = 26
print (x > 100)
As you can see, the syntax is the same. However, we aren't
performing a calculation. Instead, we are verifying whether the value
of x is greater than 100. The result you will get is "false" because 99
is not greater than 100.
Next, you will learn what strings are and how you can work with
them.
Strings
Strings have everything to do with text, whether it's a letter, number,
or punctuation mark. However, take note that numbers written as
strings are not the same as the numbers data type. Anything can be
defined as a string, but to do so you need to place quotation marks
before and after your declaration. Let's take a look at the syntax:
n = "20"
x = 10
Notice that our n variable is a string data type and not a number,
while x is defined as an integer because it lacks the quotation marks.
There are many operations you can do on strings. For instance, you
can verify how long a string is, or you can concatenate several
strings. Let's see how many characters there are in the word "hello"
by using the following function:
len("Hello")
The “len” function is used to determine the number of characters,
which in this case is five. Here’s an example of string concatenation.
You’ll notice that it looks similar to a mathematical operation, but with
text:
'42 ' + 'is ' + 'the ' + 'answer'
The result will be "42 is the answer". Pay attention to the syntax,
because you will notice we left a space after each string, minus the
last one. Spaces are taken into consideration when writing strings. If
we didn’t add them, all of our strings would be concatenated into one
word.
Another popular operation is string iteration. Here’s an example:
bookTitle = "Lord of the Rings"
for x in bookTitle: print(x)
The result is an iteration over every single character found in the
string, printing each one. Python contains many more string
operations. However, these are the ones you will use most often.
Now let’s progress to lists.
Lists
This is a data type that you will often be using. Lists are needed to
store data, and they can be manipulated as needed. Furthermore,
you can store objects of different types in them. Here's what a
Python list looks like:
n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
The square brackets define the list, and every object separated by a
comma is a list element. Here's an example of a list containing
different data types:
myBook = [“title”, “somePages”, 1, 2.1, 5, 22, 42]
This is a list that holds string objects as well as integers and floats.
You can also perform a number of operations on lists, and most of
them follow the same syntax as for the strings. Try them out!
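To make "try them out" concrete, here is a small, hedged sketch of a few common list operations; the values are illustrative, not from the book's listings:

```python
# A few common list operations; the syntax mirrors what we saw for strings.
n = [1, 2, 3, 4, 5]

n.append(6)          # add an element to the end
n[0] = 10            # lists are mutable, so elements can be replaced
print(len(n))        # len() works on lists just like on strings
print(n + [7, 8])    # + concatenates lists, as it does strings
print(n[1:3])        # slicing returns a sub-list
```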
Dictionaries
This data type is nearly identical to a list. However, you cannot
access the elements the same way. What you need is to know the
key, which is linked to a dictionary object. Take a look at the following
example:
inventory = {'weapon': 'sword', 'soldier': 'archer'}
inventory['weapon']
The first line contains the dictionary's definition, and as you can see,
the objects and their keys have to be stored between curly braces.
You can identify the keys as "weapon" and "soldier" because, after
them, you need to place a colon, followed by the attribute. Keep in
mind that while in this example, our keys are, in fact strings, they can
be other data types as well.
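As a small illustrative sketch (the dictionary name and contents are made up for this example), here is how lookups and updates work:

```python
# Dictionary access and updates: elements are reached by key, not position.
inventory = {'weapon': 'sword', 'soldier': 'archer'}

print(inventory['weapon'])    # look up the value stored under a key
inventory['mount'] = 'horse'  # assigning to a new key adds an entry
print(len(inventory))
```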
Tuples
This data type is similar to a list, except its elements cannot be
changed once defined. Here’s an example of a tuple:
n = (1, 43, 'someText', 99, [1, 2, 3])
A tuple is defined between parentheses, and in this case, we have
three different data types, namely a few integers, a string, and a list.
You can perform a number of operations on a tuple, and most of
them are the same as for the lists and strings. They are similar data
types, except that once you declare the tuple, you cannot change it
later.
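A brief sketch of that immutability in action (the tuple contents mirror the example above; the try/except is an addition for illustration):

```python
# Tuples support the same read-only operations as lists,
# but assigning to an element raises a TypeError.
n = (1, 43, 'someText', 99, [1, 2, 3])

print(len(n))
print(n[2])

try:
    n[0] = 7              # not allowed: tuples cannot be changed
except TypeError:
    print("tuples are immutable")
```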
Conditional Statements
Now that you know the basic data types, it’s time to take a crash
course on more complex operations that involve conditional
statements. A conditional statement is used to give an application a limited ability to think for itself and make a decision based on its assessment of the situation. In other words, it analyzes the condition of a variable and tells the program how to react based on the outcome of that analysis.
Python statements are simple to understand because they are
logical, and the syntax reflects human thinking. For instance, the
syntax written in English looks like this: "If I don't feel well, I won't go anywhere; else, I will go to work." In this example, we instruct the program to check whether you don't feel well. If that condition evaluates to false, it means you feel well, and therefore the program will progress to the next line, which is an "else" statement. Both "if" and "if else"
conditionals are frequently used when programming in general.
Here’s an example of the syntax:
x = 100
if (x < 100):
    print("x is small")
This is the most basic form of the statement. It checks whether the condition is true; if it is, then something will happen, and if it's not, then nothing will happen. Here's an example using the else statement as
well:
x = 100
if (x < 100):
    print("x is small")
else:
    print("x is large")
print("Print this no matter what")
With the added “else” keyword, we instruct the application to perform
a different task if a false value is returned. Furthermore, we have a
separate declaration that lies outside of the conditional statement.
This will be executed no matter the outcome.
Another type of conditional involves the use of "elif", which allows the application to evaluate a number of conditions before it makes a decision. Here's an example:
if (condition1):
    add a statement here
elif (condition2):
    add another statement for this condition
elif (condition3):
    add another statement for this condition
else:
    if none of the conditions apply, do this
Take note that this time we did not use code. You already know
enough about Python syntax and conditionals to turn all of this into
code. What we have here is the pseudo-code, which is very handy,
whether you are writing simple Python exercises or working with
machine learning algorithms. Pseudocode allows you to place your
thoughts on "paper" by following the Python programming structure.
This makes it a lot easier for you to organize your ideas and your
application by writing the code after you've outlined it. With that
being said, here's the actual code:
x = 10
if (x > 10):
    print("x is larger than ten")
elif (x < 4):
    print("x is smaller")
else:
    print("x is pretty small")
Now you have everything you need to know about conditionals. Use
them in combination with what you learned about data types in order
to practice. Keep in mind that you always need to practice these
basic Python concepts in order to understand later how machine
learning algorithms work.
Loops
Code sometimes needs to be executed repeatedly until a specific
condition is met. This is what loops are for. There are two types, the
for loop and the while loop. Let’s begin with the first example:
for x in range(1, 10):
    print(x)
This code will be executed several times, printing the value of x each time, from 1 up to 9; range stops just before ten.
The while loop, on the other hand, is used to repeat the execution of
a code block only if the condition we set is still true. Therefore, when
the condition is no longer met, the loop will break, and the
application will continue with the next lines of code. Here's a while
loop in action:
x = 1
while x < 10:
    print(x)
    x += 1
The x variable is declared as an integer, and then we instruct the program that as long as x is less than ten, its value should be printed. Take note that without the final statement you would create an infinite loop, and that is not something you want. The final statement adds one to the variable on every pass, so each execution prints the new value. When x stops being less than ten, the condition is no longer met, the loop breaks, and the application continues executing any code that follows.
Keep in mind that infinite loops can easily happen due to mistakes and oversights. Luckily, Python has a solution, namely the "break" statement, which exits the loop as soon as it is executed. Here's an
example:
while True:
    answer = input("Type command: ")
    if answer == "Yes":
        break
Now the loop can be broken by typing the command "Yes".
Functions
For a beginner in machine learning, this is the final Python component you need to understand before learning the cool stuff. Functions
allow you to make your programs a great deal more efficient,
optimized, and easier to work with. They can significantly reduce the
amount of code you have to type, and therefore make the application
less demanding when it comes to system resources. Here's an
example of the most basic function to get an idea about the syntax:
def myFunction():
    print("Hello, I am now a function!")
A function is declared using the "def" keyword, followed by its name. Whenever we want to reuse this block of code, we simply call
the function instead of writing the whole code again. For instance,
you simply type:
myFunction()
The parentheses after the function represent the section where you
can store a number of parameters. They can alter the definition of
the function like this:
def myName(firstname):
    print(firstname + " Smith")

myName("Andrew")
myName("Peter")
myName("Sam")
Here we have a first name parameter, and whenever we call the
function to print its parameter, it does so together with the addition of
the word "Smith". Take note that this is a really basic example, just so you get a feel for the syntax. More complex functions are written the same way, however.
Here’s another example where we have a default parameter, which
will be called only if there is nothing else to be executed in its place.
def myHobby(hobby="leatherworking"):
    print("My hobby is " + hobby)

myHobby("archery")
myHobby("gaming")
myHobby()
myHobby("fishing")
Running these calls produces the following output:
My hobby is archery
My hobby is gaming
My hobby is leatherworking
My hobby is fishing
You can see here how the default parameter is used when we lack a
specification.
In addition, you can also have functions that return something. For
now, we only wrote functions that perform an action, but they don't
return any values or results. These functions are far more useful
because the result can then be placed into a variable that will later
be used in another operation. Here's how the syntax looks in this
case:
def square(x):
    return x * x

print(square(5))
Now that you've gone through a brief Python crash course and you
understand the basics, it's time to learn how to use the right tools
and how to set up your machine learning environment. Don't forget
that Python is only one component of machine learning. However,
it's an important one because it's the foundation, and without it,
everything falls apart.
Chapter 17: Data Structures and the A*
Algorithm
In this chapter, you will learn how to create abstract data structures
using the same Python data types you already know. Abstract data
structures allow your programs to process data in intuitive ways and
rely on the Don't Repeat Yourself (DRY) principle. That is, using less
code and not typing out the same operations repeatedly for each
case. As you study the examples given, you will begin to notice a
pattern emerging: the use of classes that complement each other
with one acting as a node and another as a container of nodes. In
computer science, a data structure that uses nodes is generally
referred to as a tree. There are many different types of trees, each
with specialized use cases. You may have already heard of binary
trees if you are interested in programming or computer science at all.
One possible type of tree is called an n-ary tree, or n-dimensional
tree. Unlike the binary tree, the n-ary tree contains nodes that have
an arbitrary number of children. A child is simply another instance of
a node that is linked to another node, sometimes called a parent.
The parent must have some mechanism for linking up to child nodes.
The easiest way to do this is with a list of objects.
Example Coding #1: A Mock File-System
A natural application of the n-ary tree is a traditional Windows or UNIX file system. Nodes can be either folders (directories) or individual files. To keep things simple, the following program
assumes a single directory as the tree's root.
# ch1a.py
The FileSystem acts as the tree, and the Node class does most of
the work, which is common with tree data structures. Notice also that
FileSystem keeps track of an individual ID for each node. The IDs can be used as a way to count the number of nodes in the file system
or to provide lookup functionality.
When it comes to trees, the most onerous task is usually
programming a solution for traversal. The usual way a tree is
structured is with a single node as root, and from that single node,
the rest of the tree can be accessed. Here the function
look_up_parent uses a loop to traverse the mock directory structure,
but it can easily be adapted to a recursive solution as well.
General usage of the program is as follows: instantiate the FileSystem class, declare Node objects using the directory syntax (in this case with escaped backslashes so Python won't mistake them for escape characters), and then call the add method on them.
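Since the ch1a.py listing is not reproduced here, the following is a minimal sketch of the idea. The class and method names (Node, FileSystem, add, look_up_parents) follow the description above but are assumptions, not the book's actual code:

```python
# Minimal sketch of an n-ary tree used as a mock file system.
# Names here are illustrative, not the book's actual ch1a.py listing.

class Node:
    def __init__(self, name, node_id, parent=None):
        self.name = name
        self.id = node_id
        self.parent = parent
        self.children = []   # an n-ary node may have any number of children

class FileSystem:
    def __init__(self):
        self.next_id = 0
        self.root = self._new_node("root")

    def _new_node(self, name, parent=None):
        node = Node(name, self.next_id, parent)
        self.next_id += 1    # running IDs also count the nodes created
        return node

    def add(self, path):
        """Add a 'root\\docs\\file.txt'-style path, creating folders as needed."""
        current = self.root
        for part in path.split("\\")[1:]:       # skip the 'root' component
            match = next((c for c in current.children if c.name == part), None)
            if match is None:
                match = self._new_node(part, parent=current)
                current.children.append(match)
            current = match
        return current

    def look_up_parents(self, node):
        """Walk up from a node to the root using a loop (could be recursive)."""
        names = []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

fs = FileSystem()
leaf = fs.add("root\\docs\\notes.txt")
print(fs.look_up_parents(leaf))
```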
Example Coding # 2: Binary Search Tree (BST)
The binary search tree gets its name from the fact that a node can
contain at most two children. While this may sound like a restriction,
it is actually a good one because the tree becomes intuitive to
traverse. An n-ary tree, in contrast, can be messy.
# ch1b.py
As before, the Node class does most of the heavy lifting. This
program uses a BST primarily to sort a list of numbers but can be
generalized to sorting any data type. There are also a number of
auxiliary methods for finding out the size of the tree and which nodes
are childless (leaves).
This implementation of a tree better illustrates the role recursion plays when traversing a tree: each node calls a method (for example, insert) and creates a chain of calls until a base case is reached.
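As with ch1a.py, the ch1b.py listing itself is not shown, so here is a minimal binary search tree sketch under the same assumptions; the Node, insert, and in_order names are illustrative:

```python
# Minimal binary search tree sketch (illustrative, not the book's ch1b.py).

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        # Recursion: each node passes the value down until a free slot
        # (the base case) is found.
        if value < self.value:
            if self.left is None:
                self.left = Node(value)
            else:
                self.left.insert(value)
        else:
            if self.right is None:
                self.right = Node(value)
            else:
                self.right.insert(value)

    def in_order(self):
        """In-order traversal yields the stored values in sorted order."""
        result = []
        if self.left:
            result.extend(self.left.in_order())
        result.append(self.value)
        if self.right:
            result.extend(self.right.in_order())
        return result

root = Node(8)
for n in [3, 10, 1, 6, 14]:
    root.insert(n)
print(root.in_order())
```

The in-order traversal is what lets a BST sort a list of numbers, as the text describes.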
Example Coding # 3: A* Algorithm
The A* search algorithm can be thought of as Dijkstra's algorithm with brains. Whereas Dijkstra searches almost exhaustively until the path is found, A* uses what is called a heuristic, which is a fancy way of saying "educated guess." A* is fast because it is able to point an arrow at the target (using the heuristic) and find steps along that path.
First, here's a brief explanation of the algorithm. To simplify things,
we will be using a square grid with orthogonal movement only (no
diagonals). The object of A* is to find the shortest path between point
A and point B. That is, we know the position of point B. This will be
the end node and A the start. In order to get from A to B, the
algorithm must calculate distances of nodes between A and B such
that each node gets closer to B or is discarded. An easy way to
program this is by using a heap or priority queue and using some
measure of distance to sort order.
After the first node is added to the heap, each neighbor node will be
evaluated for distance, and the closest one to B is added to the
heap. The process repeats until the node is equal to B.
#ch1c.py
In this case, the heuristic is called Manhattan distance, which is just the sum of the absolute coordinate differences between the current node and the target. The
heapq library is being used to create a priority queue with f as the
priority. Note that the backtrace function is simply traversing a tree of
nodes that each has a single parent.
You can think of the g variable as the cost of moving from the starting point to somewhere along the path. Since we are using a grid with no variation in movement cost, g can be constant per step. The h variable is
the estimated distance between the current node and the target.
Adding these two together gives you the f variable, which is what
controls the order of nodes on the path.
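The ch1c.py listing is not reproduced here, so the following is a hedged sketch of A* on a small grid with orthogonal movement only. The grid layout and helper names (manhattan, a_star) are assumptions for illustration; it uses heapq with f as the priority, as described above:

```python
import heapq

# Sketch of A* on a square grid with orthogonal moves only.
# 0 = walkable cell, 1 = wall. Grid and names are illustrative.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

def manhattan(a, b):
    # The heuristic h: sum of absolute coordinate differences to the target.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal):
    open_heap = [(manhattan(start, goal), 0, start)]  # (f, g, node)
    parents = {start: None}
    best_g = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            # Backtrace: follow single-parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        r, c = node
        for nr, nc in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
                ng = g + 1                       # constant move cost g
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    parents[(nr, nc)] = node
                    # f = g + h controls the order nodes come off the heap
                    heapq.heappush(open_heap,
                                   (ng + manhattan((nr, nc), goal), ng, (nr, nc)))
    return None

path = a_star((0, 0), (3, 3))
print(path)
```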
Statistics
Modern science is based on statements of probability and statistical significance. For example, studies indicate that cigarette smokers are 20 times more likely to develop lung cancer than those who don't smoke. Other research suggests that a catastrophic meteorite impact on Earth is possible within the next 200,000 years. And first-born male children have been found to score 2.82 points higher on IQ tests than second-born males. But why do scientists talk in such guarded expressions? Why don't they simply say that cigarette smoking causes lung cancer? And why can't they tell people whether a colony on the Moon needs to be established to escape an extraterrestrial disaster?
The rationale behind such statements is that they accurately reflect the data. Absolute conclusions are not common in scientific data. Some smokers reduce their risk of lung cancer by quitting, some smokers never contract the disease, and some smokers are killed prematurely not by lung cancer but by cardiovascular disease. Since all data exhibit variability, the function of statistics is to quantify that variability, allowing scientists to make more accurate statements about their data.
A common misconception is that statistics offer proof that something is correct or incorrect. Statistics have no such feature. Instead, they provide a measure of the probability of observing a specific result. Through statistical techniques, scientists can put numbers to that probability, taking a step from the statement that someone is more likely to develop lung cancer if they smoke cigarettes to a report that says the probability of developing lung cancer is nearly 20 times greater in cigarette smokers than in nonsmokers. The quantification of probability that statistics offers is a powerful tool, and scientists use it thoroughly, yet it is frequently misunderstood.
Statistics in data analysis
A large number of statistical procedures have been developed for data analysis. They fall into two groups: descriptive and inferential.
Descriptive statistics:
With the use of measures such as the mean, median, and standard deviation, scientists can quickly summarize the significant attributes of a dataset through descriptive statistics. These measures give a general sense of the group being studied and allow the research to be placed within a broader context. For example, Cancer Prevention Study I (CPS-1) was a prospective study on mortality initiated in 1959. Among other variables, the investigators reported the ages and demographics of the participants so that the study group could be compared with the broader United States population at the time. The volunteers ranged in age from 30 to 108, with a median age of 52 years. The subjects were 57 percent female, 2 percent black, and 97 percent white. By comparison, in 1960 the US population was 51 percent female, about 11 percent black, and 89 percent white. These descriptive statistics easily identified a recognized shortcoming of CPS-1: with 97 percent of participants white, the study made no sufficient effort to consider illness profiles in US minority groups.
Inferential statistics:
Scientists use inferential statistics when they want to form a considered opinion about data: making inferences about larger populations from smaller samples of data, discovering connections between variables in datasets, and modeling patterns in data. From the perspective of statistics, the term "population" may differ from its ordinary meaning of a collection of people. A statistical population is the larger group about which a dataset is used to make inferences; it may be a society, the locations of an oil field, meteor impacts, corn plants, or some other set of measurements.
In scientific studies, the process of generalizing results from small sample sizes to larger populations is quite essential. For example, although about 1 million and 1.2 million individuals were enrolled in Cancer Prevention Studies I and II respectively, they represent only a tiny portion of the 1960 and 1980 United States populations, which totaled about 179 and 226 million. Correlation, point estimation and testing, and regression are some of the standard inferential techniques. For example, Tor Bjerkedal and Peter Kristensen analyzed the IQ test scores of 250,000 male Norwegian military personnel in 2007. According to their analysis, first-born male children scored 2.82 +/- 0.07 points higher on IQ tests than second-born males, a difference that was statistically significant at the 95 percent confidence level.
A vital concept in the analysis of data is the phrase "statistically significant," and people often misunderstand it. Because of the everyday use of the word significant, most people assume that a result called significant is momentous or essential. However, the case is different. Statistical significance is instead an estimate of the probability that an observed difference or association is due to chance rather than any actual connection. In other words, significance tests describe the probability that a difference or an apparent link would occur even when no valid difference or link exists. Because the word carries a different implication in statistics than in regular verbal communication, and because it can be measured, the measure of significance is most times expressed in terms of confidence.
Data Types
To do Exploratory Data Analysis (EDA), you need to have a clear grasp of measurement scales, which are also called data types, because specific statistical measurements only work with specific data types. Identifying the data types you are handling is also required to select the right visualization method. Data types are the manner in which you categorize various kinds of variables. Now, let's take an in-depth look at the main types of variables and their examples; we may sometimes refer to them as measurement scales.
Categorical data
Categorical data represents characteristics. As a result, it stands for things such as someone's language, gender, and so on. Numerical values can also be associated with categorical data, like 0 for female and 1 for male. Be aware that those numbers have no mathematical meaning.
Nominal data
Nominal values represent discrete units and are used to label variables without any quantitative value. They are nothing but "labels." It is important to note that nominal data has no order, so changing the order of its values changes nothing about their meaning. For example, when a question asks for your gender and you need to choose between female and male, the order of the options carries no value.
Ordinal data
Ordinal values represent discrete and ordered units. Ordinal data is therefore almost the same as nominal data, except that its ordering matters. For example, a question may ask about your educational background with the ordered options of elementary, high school, undergraduate, and graduate. Notice, though, that the difference between undergraduate and high school is not necessarily the same as the difference between high school and elementary. Here is the major limitation of ordinal data: the differences between its values are hard to quantify. Due to this limitation, ordinal scales are used to measure non-numerical features such as customer satisfaction, happiness, and so on.
Numerical Data
Discrete data
We refer to data as discrete when its values are separate and distinct; in other words, when the data can only take on specific values. This type of data can be counted, but it cannot be measured. Its information can be sorted into categories. A perfect instance is the number of heads in 100 coin flips. To know whether you are dealing with discrete data, ask the following two questions: can you count it, or can you divide it into smaller and smaller parts?
Continuous data
Continuous data represents measurements; as such, its values can be measured but not counted. For example, you can describe someone's height using intervals on the real number line.
Interval data
Interval values represent ordered units with equal differences between them. Interval data, consequently, is a variable that contains ordered numeric values where we know the exact differences between those values. For example, a feature recording the temperature of a given place might hold the values -10, -5, 0, +5, +10, and +15. Interval values have a setback: they have no "true zero." In the temperature example, this means there is no such thing as "no temperature." Adding and subtracting are possible with interval data, but it gives no room for multiplication, division, or the calculation of ratios. Ultimately, because there is no true zero, plenty of descriptive and inferential statistics are hard to apply.
Ratio data
Ratio values are also ordered units with equal differences between them. They are the same as interval values, except that they do have an absolute zero. Examples are weight, length, height, and so on.
Statistical Methods
Nominal data
The point of dealing with nominal data is to gather information with the aid of:
Frequencies:
The frequency is the rate at which an event occurs within a dataset or over a period.
Proportion:
When you divide the frequency by the total number of events, you can easily calculate the proportion. For example, how often an event occurs divided by how often the event could occur.
Percentage:
The proportion expressed out of one hundred.
For visualization, a bar chart or a pie chart is all that you need for nominal data. To transform nominal data into a numeric feature, you can make use of one-hot encoding in data science.
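As a hedged sketch of the one-hot encoding idea mentioned above (the color values are made up), nominal labels can be turned into numeric vectors like this:

```python
# One-hot encoding a nominal variable by hand, without any
# particular data-science library. Values are illustrative.
colors = ["red", "green", "blue", "green"]

categories = sorted(set(colors))     # the distinct labels, in a fixed order
one_hot = [[1 if value == cat else 0 for cat in categories]
           for value in colors]      # one 0/1 column per category

for value, row in zip(colors, one_hot):
    print(value, row)
```

Each label becomes a vector with a single 1, so no artificial order is imposed on the categories.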
Ordinal data
You can apply the same techniques used for nominal data to ordinal data, but some additional tools become available. Consequently, you can summarize ordinal data with proportions, percentages, and frequencies, and visualize it with bar charts and pie charts. Also, you can review your data with the median, interquartile range, mode, and percentiles.
Continuous data
When you are dealing with continuous data, you can use most techniques for describing it. You can summarize your data with the mean, median, range, percentiles, standard deviation, and interquartile range.
Visualization techniques:
A histogram or a box plot comes to mind when you attempt to visualize continuous data, checking the variability, central tendency, modality, and kurtosis of a distribution. Be aware that a histogram may not reveal any outliers you have; that is a reason to also use box plots.
Descriptive Statistics
Descriptive statistical analysis is an essential aspect of machine learning: since machine learning is all about making predictions, you need to understand your data, and drawing conclusions from data through statistics is a necessary initial step. Your dataset needs to go through descriptive statistical analysis. People who skip this part often reach wrong conclusions and lose a considerable amount of beneficial understanding of their data. Be careful when running your descriptive statistics, take your time, and ensure your data meets all the prerequisites for further analysis.
Normal Distribution
The most critical concept in statistics is the normal distribution, since almost all statistical tests require normally distributed data. It is essentially the pattern that large samples of data form when scientists plot them. It is sometimes referred to as the "Gaussian curve" or the "bell curve."
A normal distribution is required for calculating probabilities and for inferential statistics. The implication is that if your data is not normally distributed, you must be careful which statistical tests you apply, since they could lead to wrong conclusions.
A normal distribution is given if your data is symmetrical, unimodal, centered, and bell-shaped. Each side is an exact mirror of the other in a perfectly normal distribution.
Central tendency
The mean, the median, and the mode are what we need to tackle in statistics. Together, these three are referred to as the "central tendency," and they are three distinct kinds of "average." The mean is the average value, and it is considered the most consistent measure of central tendency for formulating a hypothesis about a population from a particular sample. Central tendency describes the tendency of your data values to cluster around the mean, the mode, or the median. The mean is computed by summing all values and dividing by the number of values.
The mode is the value or category that occurs most frequently in the data. A dataset has no mode when no number repeats and no category dominates, and a dataset can also have more than one mode. The mode is the only measure of central tendency available for categorical variables, since you cannot compute, say, the average of the variable "gender." For categorical variables, you can only report counts and percentages.
Also known as the "50th percentile," the median is the midpoint or "middle" value in your data. The median is much less affected by skewed data and outliers than the mean. For example, suppose a housing-price dataset ranges from $100,000 to $300,000 but also contains a few houses worth more than $3 million. The expensive homes will profoundly impact the mean, which is the sum of all values divided by the number of values. As the "middle" value of all data points, the median will not be profoundly affected by these outliers. Consequently, the median is a much better-suited statistic for describing your data.
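A small sketch makes the point concrete (the prices are made up for illustration):

```python
# Mean vs. median on skewed data: a few very expensive houses pull
# the mean upward, while the median barely moves.
prices = [100_000, 150_000, 200_000, 250_000, 300_000]
with_outlier = prices + [3_000_000]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    # even-length data: average the two middle values
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

print(mean(prices), median(prices))              # both around 200,000
print(mean(with_outlier), median(with_outlier))  # mean jumps, median barely moves
```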
Chapter 21: Distributed Systems & Big
Data
Distributed System
A distributed system is a collection of autonomous computers interconnected by either a local or a worldwide network. Distributed systems enable different machines to carry out different processes. Examples of distributed systems include banking systems, airline reservation systems, and so on.
A distributed system has numerous objectives. Some of them are given below:
Scalability - to expand and manage the servers without degrading any services.
Heterogeneity - to handle a considerable variety of node types.
Transparency - to hide the internal workings so that the user cannot see the complexity.
Availability - to make resources available so that users can access and share them effectively.
Openness - to offer services according to standard rules.
There are numerous advantages to a distributed system. Some of them are given below:
Complexity is hidden in a distributed system.
Distributed systems guarantee scalability.
Distributed systems provide consistency.
A distributed system is more efficient than other systems.
The drawbacks of a distributed system are given below:
Cost - it is more expensive because developing a distributed system is difficult.
Security - it is more vulnerable to hacking because resources are exposed through the network.
Complexity - it is more difficult to understand and implement.
Network dependence - failures in the underlying network can cause problems.
How do I get hands-on with distributed systems?
Learn DS concepts by:
1. Building a simple chat application:
Step 1: Start small and implement a simple chat application.
If successful, modify it to support multi-user chat sessions.
You should see a few issues here with message ordering.
Step 2: After reading the DS theory on FIFO, causal, and other ordering schemes, implement each of them individually in your system.
2. Building a storage simulator:
Step 1: Write an Android application (no fancy UI, merely a few buttons) that can insert into and query the underlying Content Provider. This application should be able to communicate with other devices that run your application.
Step 2: After studying the theory of the Chord protocol and DHTs, simulate these protocols in your distributed setup.
For example, assume you run your application in three emulators. These three instances of your application should form a Chord ring and serve insert/query requests in a distributed fashion, as dictated by the Chord protocol.
If an emulator goes down, you should be able to reassign its keys, based on your hashing calculation, to the instances still running.
WHAT ARE THE APPLICATIONS OF DISTRIBUTED SYSTEMS?
A distributed system is a group of computers working together that appears as a single computer to the end user.
Whenever server traffic grows, one has to upgrade the server's hardware and software configuration to handle it, which is known as vertical scaling. Vertical scaling works well, but it cannot be continued past a certain point; even the best hardware and software cannot provide adequate support for enormous traffic.
The following are various applications of distributed systems:
Global Positioning System
World Wide Web
Air traffic control systems
Automated banking systems
In the World Wide Web, data and applications are distributed across a number of heterogeneous computer systems, yet to the end user or browser it appears to be a single system from which the user gets the data.
Multiple computers work simultaneously and share resources in the World Wide Web.
All of these systems are fault-tolerant: if any one system fails, the application will not fail, because the failed computer's task can be handed over to another computer in the system, and all of this happens without the end user or browser knowing.
The elements of the World Wide Web are
Multiple computers
Common state
Interconnection of the multiple computers.
There are three sorts of distributed systems:
Corporate systems
These use separate servers for databases, business intelligence,
transaction processing, and web services. They usually sit at one
site, but may have multiple servers at several locations if continuous
service is important.
Very large websites: Google, Facebook, Quora, perhaps Wikipedia
These resemble the corporate systems, but are so huge that they
have a character of their own. They are forced to be distributed
because of their sheer scale.
Ones serving distributed organizations that cannot rely on network
connectivity or that need local IT resources
The military needs some unit-level command-and-control capability.
The ideal is that every unit (soldier, vehicle, and so on) can act as a
node, so that there is no central location whose destruction would
bring everything down.
Mining operations often have significant industrial capacity in the
remotest places and are best served by local IT for stock control,
payroll and personnel systems, and specialized accounting and
planning systems.
Construction companies often have large projects without significant
communications, so they end up looking much like the mining
operations above. In the worst case, they may depend on a driver
hopping into his truck with a memory stick and connecting to the
internet in some nearby town.
Data Visualization
What is Data Visualization?
Data Visualization is Interactive
Have you ever booked your flights online and noticed that you can
now not only view seat availability but also pick your seat? Perhaps
you have seen that when you look up information online about
another country, you may find a site where all you have to do to get
political, economic, geographic, and other information is drag your
mouse over the region of the country you are interested in.
Possibly you have put together a business presentation that
condenses several layers of complicated marketing and budget
information into a simple display, which lets you review every part of
your report just by clicking on one area of a map, chart, or diagram.
You may even have made forecasts by adjusting some of the data
and watching the chart change before your eyes.
Warehouses are tracking stock. Businesses are tracking sales.
Individuals are building visual displays of the information that
addresses their needs. The traveler, the student, the office worker,
the marketing executive, the warehouse manager, and the CEO are
now all able to interact with the information they are looking for
through data visualization tools.
Data Visualization is Imaginative
If you can visualize it in your mind, you can visualize it on a computer
screen. The avid skier might be keen on looking up the average
snowfall at Soldier Mountain, ID. Doctors and students may want to
compare the average cancer death rates of men and women in
Montana or Hawaii. The examples are endless.
Data visualization tools can help the entrepreneur present products
on their website imaginatively and informatively. Data visualization
has been picked up by state and national government agencies to
provide useful information to the public. Airlines take advantage of
data visualization to be more accommodating. Businesses use data
visualization for tracking and reporting. Children use data
visualization tools on the home PC to complete research
assignments or to satisfy their curiosity about far-off corners of the
world.
Wherever you go, data visualization will be there. Whatever you
need, data visualization can present answers in a helpful way.
Data Visualization is Comprehensive
Every one of us has looked up information online and found less
than helpful presentation formats that have a way of either showing
necessary details in a complicated manner or displaying complex
information in an even more complex way. Every one of us at some
point has wished that a site had a more user-friendly way of
presenting its information.
Information is the language of the 21st century, which means
everybody is sending it and everybody is searching for it. Data
visualization can make both the senders and the searchers happy by
providing a simple medium for what is often complex information.
Data Visualization Basics
Data visualization is the process of displaying data in graphical
charts, bars, and figures.
It is used as a means of delivering visual reporting to users on the
performance, operations, or general metrics of an application,
system, piece of hardware, or virtually any IT asset. Data
visualization is typically achieved by extracting data from the
underlying IT system, usually in the form of numbers, statistics, and
overall activity. The data is processed and displayed on the system's
dashboard by data visualization software.
This is done to help IT managers gain quick, visual, and
straightforward insight into the performance of the underlying
system. Most IT performance-monitoring applications use data
visualization techniques to provide an accurate understanding of the
performance of the monitored system.
Software Visualization
Software visualization is the practice of creating visual tools to map
components or otherwise display aspects of source code. This can
be done with many different programming languages in different
ways, with different criteria and tools.
The main idea behind software visualization is that by creating visual
interfaces, tool makers can help developers and others understand
code or reverse-engineer applications. Much of the power of
software visualization has to do with understanding relationships
between pieces of code, where specific visual tools, such as
windows, present this information directly. Other features may
include various kinds of charts or layouts that developers can use to
compare existing code against a given standard.
Big Data Visualization
Big data visualization refers to the use of more contemporary
visualization techniques to show the relationships within data. These
techniques include applications that can display real-time changes
and richer graphics, thereby going beyond pie, bar, and other basic
charts. Such illustrations move away from rows, columns, and
attributes toward a more creative visual portrayal of the data.
Ordinarily, when businesses need to present relationships within
data, they use graphs, bars, and charts to do it, perhaps
supplemented with an assortment of colors, labels, and symbols.
The main problem with this approach, however, is that it does not do
a great job of presenting very large data sets or data involving huge
numbers. Big data visualization uses more interactive, graphical
representations, including personalization and animation, to display
figures and establish connections among pieces of information.
Without a doubt, you’ve probably heard about tiny computers like the
Raspberry Pi or Arduino board. They are tiny, inexpensive devices
that can be used in a variety of projects. Some people create cool
little weather stations or drones that can scan the area, while others
build killer robots because why not. Once the hardware problems are
solved, they all need to take care of the software component.
Python is the ideal solution, and it is used by hobbyists and
professionals alike. These tiny computers don't have much power, so
they need the most powerful programming language that uses the
least amount of resources. After all, resources also consume power,
and tiny robots can only pack so much juice. Everything you have
learned so far can be used in robotics because Python is easily
combined with any hardware components without compatibility
issues. Furthermore, there are many Python extensions and libraries
specifically designed for the field of robotics.
In addition, Google uses some Python magic in their AI-based self-
driving car. If Python is good for Google and for creating killer robots,
what more can you want?
Machine Learning
You’ve probably heard about machine learning because it is the new
popular kid on the block that every tech company relies on for
something. Machine learning is all about teaching computer
programs to learn from experience based on data you already have.
Thanks to this concept, computers can learn how to predict various
actions and results.
Some of the most popular machine learning examples can be found
in:
1. Google Maps: Machine learning is used here to
determine the speed of the traffic and to predict for you
the most optimal route to your destination based on
several other factors as well.
2. Gmail: SPAM used to be a problem, but thanks to
Google’s machine learning algorithms, SPAM can now
be easily detected and contained.
3. Spotify or Netflix: Noticed how any of these streaming
platforms have a habit of knowing what new things to
recommend to you? That's all because of machine
learning. Some algorithms can predict what you will like
based on what you have watched or listened to so far.
Machine learning involves programming, as well as a great deal of
mathematics. Python's simplicity makes it attractive for both
programmers and mathematicians. Furthermore, unlike other
programming languages, Python has a number of add-ons and
libraries created explicitly for machine learning and data science,
such as TensorFlow, NumPy, pandas, and scikit-learn.
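To make the "learning from experience" idea concrete, here is a minimal sketch using scikit-learn. The feature values and labels are invented purely for illustration, loosely echoing the traffic example above: the model sees a few (hour, fast-or-congested) pairs and then predicts for hours it has never seen.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy "experience": hour of day vs. whether a route was fast (1) or slow (0).
# These values are made up for illustration only.
X = [[0], [1], [2], [6], [7], [8]]   # feature: hour of day
y = [1, 1, 1, 0, 0, 0]               # label: 1 = fast route, 0 = congested

model = DecisionTreeClassifier()
model.fit(X, y)                       # "learn from experience"

predictions = model.predict([[1.5], [7.5]])  # predict for unseen hours
```

The model finds the split between the early and late hours on its own; that data-driven rule, rather than a hand-written one, is the essence of machine learning.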
Cybersecurity
Mathematical Explanation
Before we get into the coding, let us talk about the mathematics
behind this algorithm.
In the figure above, you see a lot of different points, which all have
an x-value and a y-value. The x-value is called the feature, whereas
the y-value is our label. The label is the result of our feature. Our
linear regression model is represented by the blue line that goes
straight through our data. It is placed so that it is as close as possible
to all points at the same time. So we “trained” the line to fit the
existing points or the existing data.
The idea is now to take a new x-value without knowing the
corresponding y-value. We then look at the line and find the resulting
y-value there, which the model predicts for us. However, since this
line is quite generalized, we will get a relatively inaccurate result.
However, it is worth mentioning that linear models only really
develop their effectiveness when we are dealing with numerous
features (i.e., higher dimensions).
If we are applying this model to data of schools and we try to find a
relation between missing hours, learning time, and the resulting
grade, we will probably get a less accurate result than by including
30 parameters. Logically, however, we then no longer have a straight
line or a flat surface but a hyperplane, which is the equivalent of a
straight line in higher dimensions.
Preparing Data
Our data is now fully loaded and selected. However, in order to use it
as training and testing data for our model, we have to reformat them.
The sklearn models do not accept Pandas data frames, but only
NumPy arrays. That's why we turn our features into an x-array and
our label into a y-array.
X = np.array(data.drop(columns=[prediction]))
Y = np.array(data[prediction])
The method np.array converts the selected columns into an array.
The drop function returns the data frame without the specified
column. Our X array now contains all of our columns, except for the
final grade. The final grade is in the Y array.
In order to train and test our model, we have to split our available
data. The first part is used to get the hyperplane to fit our data as
well as possible. The second part then checks the accuracy of the
prediction, with previously unknown data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)
With the function train_test_split, we divide our X and Y arrays into
four arrays. The order must be exactly as shown here. The test_size
parameter specifies what percentage of records to use for testing. In
this case, it is 10%. This is also a good and recommended value. We
do this to test how accurate it is with data that our model has never
seen before.
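With the four arrays in place, the model itself can be fitted and scored. The book's student data set is not reproduced here, so the following sketch stands in synthetic data with a known linear relation; the feature values and coefficients are assumptions made purely so the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student data: three features, linear label.
rng = np.random.default_rng(0)
X = rng.random((100, 3))                      # 100 rows, three features
Y = X @ np.array([2.0, -1.0, 0.5]) + 4.0      # exact linear relation

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)

model = LinearRegression()
model.fit(X_train, Y_train)              # fit the hyperplane to the training data
accuracy = model.score(X_test, Y_test)   # R-squared on previously unseen data
```

Because the synthetic label is exactly linear in the features, the score comes out near 1.0; on real, noisy data like the student grades, it will be noticeably lower.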
Visualizing Correlations
Since we are dealing with high dimensions here, we can’t draw a
graph of our model. This is only possible in two or three dimensions.
However, what we can visualize are relationships between individual
features.
plt.scatter(data['study time'], data['G3'])
plt.title("Correlation")
plt.xlabel("Study Time")
plt.ylabel("Final Grade")
plt.show()
Here we draw a scatter plot with the function scatter, which shows
the relationship between the learning time and the final grade.
In this case, we see that the relationship is not very strong. The data
is widely scattered, and no clear pattern emerges.
plt.scatter(data['G2'], data['G3'])
plt.title("Correlation")
plt.xlabel("Second Grade")
plt.ylabel("Final Grade")
plt.show()
However, if we look at the correlation between the second grade and
the final grade, we see a much stronger correlation.
Here we can clearly see that the students with good second grades
are very likely to end up with a good final grade as well. You can play
around with the different columns of this data set if you want to.
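The visual impression from the scatter plots can also be backed up with a number: pandas can compute the Pearson correlation coefficient between two columns directly. A minimal sketch follows; the tiny data frame here uses invented values standing in for the real student data, with the same column names ('study time', 'G2', 'G3') as above.

```python
import pandas as pd

# Hypothetical stand-in for the student data frame used above.
data = pd.DataFrame({
    "study time": [1, 2, 2, 3, 4],
    "G2":         [8, 10, 11, 14, 15],
    "G3":         [9, 10, 12, 14, 16],
})

# Pearson correlation: values near +1 or -1 indicate a strong linear relation.
r_g2 = data["G2"].corr(data["G3"])
r_study = data["study time"].corr(data["G3"])
```

On the real data set, comparing these coefficients column by column is a quick way to decide which features are worth feeding into the regression model.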
Conclusion
In conclusion, Python offers some of the strongest computational
capabilities for big data analysis. If this is your first time
programming with data, Python will be a much easier language to
learn than most others, and it is far more user-friendly.
And so, we've come to the end of this book, which was meant to give
you a taste of data analysis techniques and visualization beyond the
basics using Python. Python is a wonderful tool to use for data
purposes, and I hope this guide stands you in good stead as you go
about using it for your purposes.
I have tried to go more in-depth in this book, give you more
information on the fundamentals of data science, along with lots of
useful, practical examples for you to try out.
Please read this guide as often as you need to and don’t move on
from a chapter until you fully understand it. And do try out the
examples included – you will learn far more if you actually do it
rather than just reading the theory.
This was just an overview to recap what you learned in the first
book, covering the data types in pandas and how they are used. We
also looked at cleaning and manipulating the data to handle missing
values and perform some string operations.