Lectures Data Processing
Week 1
Chapter 1 – Hello World
Why Python?
- Simple and friendly
- Popular
- Drives data science and machine learning libraries, databases, web development, and
even other languages.
This language has symbols (the text that makes up the code), a very precise syntax (the rules
that define which text is valid), and semantics (the rules for how that text is translated into
actions, and in which order).
Chapter 4 – Expressions
The function print() is needed to show, display, or print the result of an expression.
You write the word print, followed by an opening parenthesis, followed by what you want to
display, followed by a closing parenthesis.
For example:
- print(5 + 7)
- print(5 + 79 - 2)
You can display multiple things with one print() function by putting everything that you want to
display between the parentheses with commas in between. For example:
- print("I", "own", "two", "apples", "and", "one", "banana")
Data types
Strings
- A string is a text, consisting of zero or more characters.
- In Python, a string is enclosed by either double quotes, or single quotes. In principle, it
does not matter which of the two you use, i.e., "orange" is equivalent to 'orange'.
- However, if your text contains a single quote and you want to keep Python from mistaking
it for the end of the string, you have to enclose the text in double quotes, i.e.,
"I can't stand it" works as a full string, while 'I can't stand it' does not (the quote in
can't ends the string early, so t stand it is interpreted as code instead of as part of the
string).
- Vice versa for double quotes in a string, of course. Note that as discussed before, natural
language is conventionally enclosed in double quotes.
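A minimal sketch of the quoting rules above. The example sentences are chosen here purely for illustration:

```python
# Single and double quotes produce the same string.
print("orange")
print('orange')

# A text containing a single quote goes between double quotes...
print("I can't stand it")

# ...and a text containing double quotes goes between single quotes.
print('She said "hello" to me')
```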
Integers
- Integers are whole numbers, which can be positive or negative (or zero).
- There are different ways of writing integers that result in the same value.
- For example 1 is the same as +1
This is different for strings. The string "1" is not the same as the string "+1" (the latter being the
character "+" and the character "1" ). Luckily, Python will not try to do calculations within
strings; that’d make it impossible for us to do things like print("1 + 1") for example.
Floats
Floats, or "floating-point numbers", are numbers with explicit decimals. For instance,
3.14159265 is a float, and so is 1.0000000. Note that a period is used as the decimal separator.
If you want to write a whole number as a float, one way to do so when typing the number is to
add .0 to it, i.e., 13 is an integer, while 13.0 is a float.
Expressions
An expression is a combination of one or more values (such as strings, integers, or floats) using
operators, which results in a new value. In other words, you can think of expressions as
calculations.
+ addition
- subtraction
* multiplication
/ division
// integer division
** power
% modulo
Integer division (also called "floor division") is division that rounds down to a whole number.
If there are floats in the calculation, the result will be a float; if there are only integers,
the result will be an integer.
The modulo operator (%) gives the remainder of a division. What is a remainder? Given 15 and 4,
4 fits in 15 three times (for a total of 12). Whatever is left is the remainder (so 15 % 4 is 3).
Likewise, for 7 % 3, 3 fits in 7 two times, so the remainder is 1.
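The two operators can be tried out with the numbers used above. This is a small sketch. The last line, with a negative number, is an extra case added here to show what "rounding down" means:

```python
print(15 // 4)    # 3: 4 fits in 15 three times
print(15 % 4)     # 3: 15 - 12 leaves a remainder of 3
print(7 % 3)      # 1
print(15.0 // 4)  # 3.0: a float in the calculation gives a float result
print(15 / 4)     # 3.75: regular division of integers gives a float
print(-7 // 2)    # -4: rounding down goes toward the lower number
```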
A select few of the operators given above can also be used for strings. In particular, you can use
the addition operator (+) to concatenate two strings, and you can use the multiplication operator
(*) with a number and a string to create a string that contains a repetition of the original string.
Check it out:
- print("hello" + "world")
- print(3 * "hello")
- print("arrowheads, " * 4 * 4)
You can’t add an integer to a string, multiply two strings, or subtract a string from a string (using
-, at least). Such use of the operators is undefined and will give error messages. The other
operators listed for numbers will not do calculations on strings either (% does have a separate
meaning for strings, namely formatting).
Type Casting
A function has a name, and may have parameters between parentheses after the name.
The type casting functions take the parameter value between the parentheses as input and give
back a value that is (almost) the same as the input value, but of a different data type. The three
main type casting functions are the following:
- int() will return the input value as an integer (dropping the decimals if necessary, so 2.7
becomes 2 and -2.7 becomes -2);
- float() will return the input value as a float (adding .0 if necessary); and
- str() will return the input value as a string.
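A short sketch of the three type casting functions. The input values are arbitrary examples:

```python
print(int(2.7))      # 2
print(int(-2.7))     # -2 (the decimals are dropped, so negatives move toward zero)
print(int("15"))     # 15
print(float(13))     # 13.0
print(float("3.5"))  # 3.5
print(str(13) + " apples")  # 13 apples
```

The last line shows a typical use of str(): turning a number into a string so that it can be concatenated with another string.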
In this chapter, you learned about:
- Using the print() function to display results;
- Data types: string, integer, and float;
- Calculations;
- Basic string expressions; and
- Type casting between strings, integers, and floats, using str(), int(), and float().
Chapter 5 – Variables
Variables and values
A variable is a labeled place in the computer memory that you can use to store a value in. You can
choose the label yourself, and it is usually called the "variable name".
To create a variable (i.e., choose the variable name), you must "assign" it a value. The
assignment operator is the equals sign (=). On its left side you write the variable name, and on
its right side the value that you want to store in the variable. This is best illustrated with an
example:
x = 5
print(x)
In the code block above, two things happen. First, we create a variable with the name x and give
it a value; in this case 5. This is called an "assignment". We then display the contents of the
variable x, using print(). Note that print() does not display the letter x, but actually displays the
value that was assigned to x.
If you picture a variable as a labeled box, the term "variable" refers to the variable name, i.e.,
the letter x on the box. The term "value" refers to the value that is stored in the variable,
i.e., the contents of the box.
To the right of the assignment operator you can place anything that results in a value. Therefore,
it does not need to be a single number. It can be, for instance, a calculation, a string, or a
call to a function that results in a value (such as the int() function).
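As a sketch, each of the three cases could look like this. The variable names are made up for this example:

```python
hours_per_week = 7 * 24        # a calculation
greeting = "hello" + "world"   # a string expression
whole_part = int(3.99)         # a call to a function that results in a value

print(hours_per_week)  # 168
print(greeting)        # helloworld
print(whole_part)      # 3
```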
Variable names
You are free to choose the names of your variables as you like, provided that you follow a few
simple rules, namely:
- A variable name must consist of only letters, digits, and/or underscores (_)
- A variable name must start with a letter or an underscore
- A variable name should not be a reserved word
Reserved Words
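These are words that already have a fixed meaning in the Python language, such as if, for, while,
def, and return, so they cannot be used as variable names.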
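The exact set of reserved words depends on the Python version. A minimal sketch that asks Python itself for the list, using the standard keyword module:

```python
import keyword

# The exact list depends on the Python version you are running.
print(keyword.kwlist)

# iskeyword() tells you whether one particular word is reserved.
print(keyword.iskeyword("for"))            # True
print(keyword.iskeyword("secs_per_week"))  # False
```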
Conventions
Programmers follow many conventions when choosing variable names. The major ones are:
- Never choose variable names that are also the names of functions (whether they are
functions provided by Python or functions you wrote yourself). Doing so makes the
corresponding function inaccessible to the rest of the code, which may lead to rather
eccentric errors. If you really need such a name, the usual solution is to add an underscore
after it (e.g., print_).
- Try to choose variable names that are in some way meaningful to the code. For instance,
a variable that stores the number of seconds in a week might have the name
secs_per_week, but not the name i_hate_my_job. It would be even worse to name a
variable that contains the number of seconds in a week secs_per_month.
- An exception to choosing meaningful variable names can be made when choosing names
for "throw-away" variables; i.e., variables that you only use in a very small section of the
code and that are no longer needed afterwards, and that have no good meaning by
themselves. One usually chooses a single-letter name for such variables. For instance, if a
variable is needed to quickly count to 100, after which it is not needed anymore,
programmers often choose the letter i or j for such a variable.
- To avoid confusion between capitals and lower case letters, only use lower case letters in
variable names. This follows the Python style guidelines (PEP 8).
- If a variable name is chosen that consists of multiple words, put one underscore between
each of the words.
- Never choose variable names that start with an underscore. Such variable names are
considered reserved for the authors of the Python interpreter.
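The conventions above can be sketched in a few lines. All names here are illustrative only:

```python
# Meaningful name: lower case, words separated by underscores.
secs_per_week = 7 * 24 * 60 * 60
print(secs_per_week)  # 604800

# Throw-away name for a short-lived counter.
i = 0
i = i + 1
print(i)  # 1

# A trailing underscore avoids hiding the built-in function max().
max_ = 100
print(max(3, max_))  # 100
```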
Constants
Many programming languages offer the ability to create "constants", that is, variables whose
value can no longer be changed after it has first been assigned. It is a convention in most
such languages that the name of a constant is written in all capitals.
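Python itself has no built-in way to make a variable unchangeable, so an all-capitals name is only a signal to the reader that the value should be left alone. A minimal sketch, with names made up for this example:

```python
SECS_PER_MINUTE = 60     # all capitals: treat this value as fixed
MINUTES_PER_HOUR = 60

secs_per_hour = SECS_PER_MINUTE * MINUTES_PER_HOUR
print(secs_per_hour)  # 3600
```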
Soft Typing
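The type of a variable is the type of the value that it currently holds, and it changes when you
assign the variable a value of a different type. You can use the function type() to see what the
type of a variable is.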
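A short sketch of type() in action. Note how the type reported for x follows the value most recently assigned to it:

```python
x = 5
print(type(x))  # <class 'int'>

x = 5.0
print(type(x))  # <class 'float'>

x = "five"
print(type(x))  # <class 'str'>
```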
Shorthand Operators
Using the operators you have learned about above, you can change the variables in your code as
many times as you want. You can assign new values to existing variables. Very often, you want to
make changes to existing variables. For instance, it is common in code that you want to add 1 to a
number (you will find out why in a later chapter). Since this occurs fairly often, Python offers
some shorthand notation to deal with changes to variables.
If you want to add something to a variable, you can write += as the assignment operator, with
the thing that you want to add to the variable on its right-hand side.
Similar to the += operator, you can use -= to subtract something from a variable, *= to multiply a
variable by something, /= to divide a variable by something, **= to raise a variable to a power,
and %= to turn a variable into itself modulo the right-hand side. Most of these are uncommon,
except for the +=, which is used a lot, and the -=, which is used occasionally.
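The shorthand operators can be followed step by step in this sketch. The variable name number and the values are arbitrary:

```python
number = 10
number += 1     # same as number = number + 1
print(number)   # 11
number -= 3
print(number)   # 8
number *= 2
print(number)   # 16
number /= 4
print(number)   # 4.0 (division gives a float)
number **= 2
print(number)   # 16.0
number %= 5
print(number)   # 1.0
```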
Comments
Since the code that you have to write has now increased to more than five lines or so, it has
become sufficiently complex to warrant discussing the use of comments. Comments are parts of
the code that Python ignores. Comments can be used to explain parts of the code; in this way,
they are not only useful to other people who might need to use or change your code, but also to
yourself, as you may need to change your own code some time after you wrote it and you might
not remember exactly what you did.
There are two main ways to include comments in Python code. The first is to use
a hash mark (#), which turns everything to the right of the hash mark on the line into
commentary (of course, this is only the case if the hash mark is not part of a string). The second
is to use triple double-quotes or triple single-quotes to indicate the start and end of some
commentary, which may be spread over multiple lines.
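Both ways of writing comments are shown in this small sketch. The variable total is made up for the example:

```python
# This whole line is a comment.
total = 5 + 7  # a comment can also follow code on the same line

print("# inside a string this is not a comment")

"""
Triple quotes can span
several lines.
"""
print(total)  # 12
```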
Each function has a name, and its name may consist of letters, digits, and underscores, and
cannot start with a digit.
Most functions are called with parameters, which are placed between the parentheses that follow
the function name.
calculate_shareholder_profit(initial_stock_1_value, new_stock_1_value)
# and
print("something")
print("something", "something else")
A function may or may not "return" (output) something. If a function returns a value, that value
can be used in your code. For instance, the function int() returns an integer representation of the
parameter it gets. You can place this return value in a variable, using an assignment:
- x = int(2.0)
Creating a function
So, when creating our own functions, we need to define the name of the function, its
parameters, and the value it returns. To create a function, you use the following syntax: