LEARN DATA SCIENCE ONLINE
Start Learning For Free - [Link]
Data Science Cheat Sheet
Python Basics
BASICS, PRINTING AND GETTING HELP
x = 3 - Assign 3 to the variable x
print(x) - Print the value of x
type(x) - Return the type of the variable x (in this case, int for integer)
help(str) - Show documentation for the str data type
help(print) - Show documentation for the print() function
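As a small illustration of the type() entry above, the sketch below prints the type of a few literal values. The isinstance() check is not part of this cheat sheet; it is shown only as a commonly used alternative that also accepts subclasses.

```python
x = 3

# type() returns the class of a value.
print(type(x))        # <class 'int'>
print(type(2.5))      # <class 'float'>
print(type("hello"))  # <class 'str'>

# The returned class can be compared directly, but isinstance() is
# usually preferred because it also accepts subclasses.
print(type(x) is int)      # True
print(isinstance(x, int))  # True
```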
READING FILES
f = open("my_file.txt","r")
file_as_string = f.read()
- Open the file my_file.txt and assign its contents to the string file_as_string
import csv
f = open("my_dataset.csv","r")
csvreader = csv.reader(f)
csv_as_list = list(csvreader)
- Open the CSV file my_dataset.csv and assign its data to the list of lists csv_as_list

STRINGS
s = "hello" - Assign the string "hello" to the variable s
s = """She said,
"there's a good idea."
"""
- Assign a multi-line string to the variable s. Also used to create strings that contain both " and ' characters
len(s) - Return the number of characters in s
s.startswith("hel") - Test whether s starts with the substring "hel"
s.endswith("lo") - Test whether s ends with the substring "lo"
"{} plus {} is {}".format(3,1,4) - Return the string with the values 3, 1, and 4 inserted
s.replace("e","z") - Return a new string based on s with all occurrences of "e" replaced with "z"
s.split(" ") - Split the string s into a list of strings, separating on the character " ", and return that list

NUMERIC TYPES AND MATHEMATICAL OPERATIONS
i = int("5") - Convert the string "5" to the integer 5 and assign the result to i
f = float("2.5") - Convert the string "2.5" to the float value 2.5 and assign the result to f
5 + 5 - Addition
5 - 5 - Subtraction
10 / 2 - Division (always returns a float, here 5.0)
5 * 2 - Multiplication
3 ** 2 - Raise 3 to the power of 2 (or 3²)
27 ** (1/3) - The 3rd root of 27 (or ∛27)
x += 1 - Assign the value of x + 1 to x
x -= 1 - Assign the value of x - 1 to x

LISTS
l = [100,21,88,3] - Assign a list containing the integers 100, 21, 88, and 3 to the variable l
l = list() - Create an empty list and assign the result to l
l[0] - Return the first value in the list l
l[-1] - Return the last value in the list l
l[1:3] - Return a slice (list) containing the second and third values of l
len(l) - Return the number of elements in l
sum(l) - Return the sum of the values of l
min(l) - Return the minimum value from l
max(l) - Return the maximum value from l
l.append(16) - Append the value 16 to the end of l
l.sort() - Sort the items in l in ascending order
" ".join(["A","B","C","D"]) - Convert the list ["A", "B", "C", "D"] into the string "A B C D"

DICTIONARIES
d = {"CA":"Canada","GB":"Great Britain","IN":"India"} - Create a dictionary with keys of "CA", "GB", and "IN" and corresponding values of "Canada", "Great Britain", and "India"
d["GB"] - Return the value from the dictionary d that has the key "GB"
d.get("AU","Sorry") - Return the value from the dictionary d that has the key "AU", or the string "Sorry" if the key "AU" is not found in d
d.keys() - Return a view of the keys in d (use list(d.keys()) for a list)
d.values() - Return a view of the values in d (use list(d.values()) for a list)
d.items() - Return a view of the (key, value) pairs in d (use list(d.items()) for a list)

MODULES AND FUNCTIONS
The body of a function is defined through indentation.
import random - Import the module random
from math import sqrt - Import the function sqrt from the module math
def calculate(addition_one,addition_two,exponent=1,factor=1):
    result = (addition_one + addition_two) ** exponent * factor
    return result
- Define a new function calculate with two required and two optional named arguments, which calculates and returns a result
calculate(3,5,factor=10) - Run the calculate function with the values 3 and 5 and the named argument factor=10

BOOLEAN COMPARISONS
x == 5 - Test whether x is equal to 5
x != 5 - Test whether x is not equal to 5
x > 5 - Test whether x is greater than 5
x < 5 - Test whether x is less than 5
x >= 5 - Test whether x is greater than or equal to 5
x <= 5 - Test whether x is less than or equal to 5
x == 5 or name == "alfred" - Test whether x is equal to 5 or name is equal to "alfred"
x == 5 and name == "alfred" - Test whether x is equal to 5 and name is equal to "alfred"
5 in l - Check whether the value 5 exists in the list l
"GB" in d - Check whether the value "GB" exists in the keys of d

IF STATEMENTS AND LOOPS
The body of if statements and loops is defined through indentation.
if x > 5:
    print("{} is greater than five".format(x))
elif x < 0:
    print("{} is negative".format(x))
else:
    print("{} is between zero and five".format(x))
- Test the value of the variable x and run the code body based on the value
for value in l:
    print(value)
- Iterate over each value in l, running the code in the body of the loop with each iteration
while x < 10:
    x += 1
- Run the code in the body of the loop until the value of x is no longer less than 10
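The READING FILES entries open files without ever closing them. A minimal sketch of the same two reads using with blocks, which close the file automatically, is shown below. It assumes a small made-up CSV written to a temporary directory, so it runs without my_dataset.csv being present.

```python
import csv
import os
import tempfile

# Hypothetical sample data, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "my_dataset.csv")
with open(csv_path, "w", newline="") as f:
    f.write("country,code\nCanada,CA\nIndia,IN\n")

# Read the whole file as one string; the with block closes f afterwards.
with open(csv_path, "r") as f:
    file_as_string = f.read()

# Read the same file as a list of lists, one inner list per row.
with open(csv_path, "r", newline="") as f:
    csv_as_list = list(csv.reader(f))

print(csv_as_list)  # [['country', 'code'], ['Canada', 'CA'], ['India', 'IN']]
```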
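To see how the required and optional arguments of calculate combine, here is a sketch with a few calls. The function body follows the cheat sheet's definition; the results follow from Python evaluating ** before *.

```python
def calculate(addition_one, addition_two, exponent=1, factor=1):
    # Add the two required arguments, raise the sum to exponent,
    # then multiply by factor.
    result = (addition_one + addition_two) ** exponent * factor
    return result

print(calculate(3, 5))              # (3 + 5) ** 1 * 1  = 8
print(calculate(3, 5, factor=10))   # (3 + 5) ** 1 * 10 = 80
print(calculate(3, 5, exponent=2))  # (3 + 5) ** 2 * 1  = 64
print(calculate(3, 5, 2, 10))       # optional args passed by position: 640
```

Named arguments can be given in any order and let you skip exponent while still setting factor.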
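The dictionary lookup and if/elif/else entries can be combined in a short script. The describe function is a hypothetical wrapper introduced only for this sketch, so that each branch can be called and checked.

```python
d = {"CA": "Canada", "GB": "Great Britain", "IN": "India"}

# d.get() returns the default instead of raising KeyError for a missing key.
for code in ["GB", "AU"]:
    print(code, d.get(code, "Sorry"))
# GB Great Britain
# AU Sorry

def describe(x):
    # Hypothetical helper: same branching as the if/elif/else entry.
    if x > 5:
        return "{} is greater than five".format(x)
    elif x < 0:
        return "{} is negative".format(x)
    else:
        return "{} is between zero and five".format(x)

for value in [7, -2, 3]:
    print(describe(value))
```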
Data Science Cheat Sheet
Python - Intermediate
KEY
This cheat sheet assumes you are familiar with the content of our Python Basics Cheat Sheet.
s - A Python string variable
i - A Python integer variable
f - A Python float variable
l - A Python list variable
d - A Python dictionary variable
LISTS
l.pop(3) - Returns the fourth item from l and deletes it from the list
l.remove(x) - Removes the first item in l that is equal to x
l.reverse() - Reverses the order of the items in l
l[1::2] - Returns every second item from l, starting from the item at index 1 (the second item)
l[-5:] - Returns the last 5 items from l

STRINGS
s.lower() - Returns a lowercase version of s
s.title() - Returns s with the first letter of every word capitalized
"23".zfill(4) - Returns "0023" by left-filling the string with 0's to make its length 4
s.splitlines() - Returns a list by splitting the string on any newline characters
Python strings share some common operations with lists:
s[:5] - Returns the first 5 characters of s
"fri" + "end" - Returns "friend"
"end" in s - Returns True if the substring "end" is found in s

RANGE
Range objects are useful for creating sequences of integers for looping.
range(5) - Returns a sequence from 0 to 4
range(2000,2018) - Returns a sequence from 2000 to 2017
range(0,11,2) - Returns a sequence from 0 to 10, with each item incrementing by 2
range(0,-10,-1) - Returns a sequence from 0 to -9
list(range(5)) - Returns a list from 0 to 4

DICTIONARIES
max(d, key=d.get) - Return the key that corresponds to the largest value in d
min(d, key=d.get) - Return the key that corresponds to the smallest value in d

SETS
my_set = set(l) - Return a set object containing the unique values from l
len(my_set) - Returns the number of objects in my_set (or, the number of unique values from l)
a in my_set - Returns True if the value a exists in my_set

REGULAR EXPRESSIONS
import re - Import the Regular Expressions module
re.search("abc",s) - Returns a match object if the regex "abc" is found in s, otherwise None
re.sub("abc","xyz",s) - Returns a string where all instances matching regex "abc" are replaced by "xyz"

LIST COMPREHENSION
A one-line expression of a for loop
[i ** 2 for i in range(10)] - Returns a list of the squares of values from 0 to 9
[s.lower() for s in l_strings] - Returns the list l_strings, with each item having had the .lower() method applied
[i for i in l_floats if i < 0.5] - Returns the items from l_floats that are less than 0.5

FUNCTIONS FOR LOOPING
for i, value in enumerate(l):
    print("The value of item {} is {}".format(i,value))
- Iterate over the list l, printing the index location of each item and its value
for one, two in zip(l_one,l_two):
    print("one: {}, two: {}".format(one,two))
- Iterate over two lists, l_one and l_two, and print each value
while x < 10:
    x += 1
- Run the code in the body of the loop until the value of x is no longer less than 10

DATETIME
import datetime as dt - Import the datetime module
now = dt.datetime.now() - Assign a datetime object representing the current time to now
wks4 = dt.timedelta(weeks=4) - Assign a timedelta object representing a timespan of 4 weeks to wks4
now - wks4 - Return a datetime object representing the time 4 weeks prior to now
newyear_2020 = dt.datetime(year=2020, month=12, day=31) - Assign a datetime object representing December 31, 2020 to newyear_2020
newyear_2020.strftime("%A, %b %d, %Y") - Returns "Thursday, Dec 31, 2020"
dt.datetime.strptime('Dec 31, 2020',"%b %d, %Y") - Return a datetime object representing December 31, 2020

RANDOM
import random - Import the random module
random.random() - Returns a random float in the range 0.0 (inclusive) to 1.0 (exclusive)
random.randint(0,10) - Returns a random integer between 0 and 10 (both inclusive)
random.choice(l) - Returns a random item from the list l

COUNTER
from collections import Counter - Import the Counter class
c = Counter(l) - Assign to c a Counter (dict-like) object with the counts of each unique item from l
c.most_common(3) - Return the 3 most common items from l

TRY/EXCEPT
Catch and deal with errors
l_ints = [1, 2, 3, "", 5] - Assign a list of integers with one missing value to l_ints
l_floats = []
for i in l_ints:
    try:
        l_floats.append(float(i))
    except ValueError:
        l_floats.append(i)
- Convert each value of l_ints to a float, catching and handling the "ValueError: could not convert string to float" raised where values are missing
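A sketch tying together the TRY/EXCEPT, LIST COMPREHENSION and COUNTER entries. The list passed to Counter is made-up sample data, and the numbers comprehension is an illustrative extra step, not part of the cheat sheet.

```python
from collections import Counter

l_ints = [1, 2, 3, "", 5]
l_floats = []
for i in l_ints:
    try:
        l_floats.append(float(i))
    except ValueError:
        # float("") raises ValueError, so the missing value is kept as-is.
        l_floats.append(i)
print(l_floats)  # [1.0, 2.0, 3.0, '', 5.0]

# A list comprehension with a condition keeps only the converted numbers.
numbers = [x for x in l_floats if isinstance(x, float)]
print(numbers)  # [1.0, 2.0, 3.0, 5.0]

# Counter tallies each unique item; most_common(2) returns the top two.
c = Counter(["a", "b", "a", "c", "a", "b"])
print(c.most_common(2))  # [('a', 3), ('b', 2)]
```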
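A sketch of the DATETIME entries round-tripping one date. It assumes the default C/English locale, since %A and %b produce locale-dependent day and month names.

```python
import datetime as dt

newyear_2020 = dt.datetime(year=2020, month=12, day=31)
print(newyear_2020.strftime("%A, %b %d, %Y"))  # Thursday, Dec 31, 2020

# strptime is the inverse of strftime: it parses text with the same directives.
parsed = dt.datetime.strptime("Dec 31, 2020", "%b %d, %Y")
print(parsed == newyear_2020)  # True

# Subtracting a timedelta moves the date back; 4 weeks is exactly 28 days.
wks4 = dt.timedelta(weeks=4)
print(newyear_2020 - wks4)  # 2020-12-03 00:00:00
```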
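enumerate(), zip() and the two re functions in one runnable sketch, using hypothetical sample lists and a sample string. Note that zip() stops as soon as the shorter input runs out.

```python
import re

l_one = ["a", "b", "c"]
l_two = [1, 2, 3]

# enumerate() yields (index, value) pairs; zip() pairs items position by position.
lines = []
for i, (one, two) in enumerate(zip(l_one, l_two)):
    lines.append("{}: one: {}, two: {}".format(i, one, two))
print("\n".join(lines))

# zip() truncates to the shortest input.
print(list(zip([1, 2, 3], ["x"])))  # [(1, 'x')]

s = "abc123abc"
match = re.search("abc", s)
print(match.start(), match.end())  # 0 3
print(re.sub("abc", "xyz", s))     # xyz123xyz
print(re.search("zzz", s))         # None
```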
Data Science Cheat Sheet
NumPy
KEY
We'll use shorthand in this cheat sheet:
arr - A NumPy array object

IMPORTS
Import these to start:
import numpy as np
IMPORTING/EXPORTING
np.loadtxt('file.txt') - From a text file
np.genfromtxt('file.csv',delimiter=',') - From a CSV file
np.savetxt('file.txt',arr,delimiter=' ') - Writes to a text file
np.savetxt('file.csv',arr,delimiter=',') - Writes to a CSV file

CREATING ARRAYS
np.array([1,2,3]) - One dimensional array
np.array([(1,2,3),(4,5,6)]) - Two dimensional array
np.zeros(3) - 1D array of length 3 with all values 0
np.ones((3,4)) - 3x4 array with all values 1
np.eye(5) - 5x5 array of 0 with 1 on the diagonal (identity matrix)
np.linspace(0,100,6) - Array of 6 evenly spaced values from 0 to 100
np.arange(0,10,3) - Array of values from 0 to less than 10 with step 3 (e.g. [0,3,6,9])
np.full((2,3),8) - 2x3 array with all values 8
np.random.rand(4,5) - 4x5 array of random floats between 0 and 1
np.random.rand(6,7)*100 - 6x7 array of random floats between 0 and 100
np.random.randint(5,size=(2,3)) - 2x3 array with random ints between 0 and 4

INSPECTING PROPERTIES
arr.size - Returns the number of elements in arr
arr.shape - Returns the dimensions of arr (rows, columns)
arr.dtype - Returns the type of elements in arr
arr.astype(dtype) - Convert arr elements to type dtype
arr.tolist() - Convert arr to a Python list
np.info(np.eye) - View documentation for np.eye

COPYING/SORTING/RESHAPING
np.copy(arr) - Copies arr to new memory
arr.view(dtype) - Creates a view of arr elements with type dtype (the same memory is reinterpreted, not copied)
arr.sort() - Sorts arr in place
arr.sort(axis=0) - Sorts arr along a specific axis
two_d_arr.flatten() - Flattens 2D array two_d_arr to 1D
arr.T - Transposes arr (rows become columns and vice versa)
arr.reshape(3,4) - Reshapes arr to 3 rows, 4 columns without changing data
arr.resize((5,6)) - Changes arr shape to 5x6 and fills new values with 0

ADDING/REMOVING ELEMENTS
np.append(arr,values) - Appends values to the end of arr
np.insert(arr,2,values) - Inserts values into arr before index 2
np.delete(arr,3,axis=0) - Deletes the row at index 3 of arr
np.delete(arr,4,axis=1) - Deletes the column at index 4 of arr

COMBINING/SPLITTING
np.concatenate((arr1,arr2),axis=0) - Adds arr2 as rows to the end of arr1
np.concatenate((arr1,arr2),axis=1) - Adds arr2 as columns to the end of arr1
np.split(arr,3) - Splits arr into 3 equal sub-arrays
np.hsplit(arr,5) - Splits arr horizontally (column-wise) into 5 equal sub-arrays

INDEXING/SLICING/SUBSETTING
arr[5] - Returns the element at index 5
arr[2,5] - Returns the 2D array element at index [2][5]
arr[1]=4 - Assigns the value 4 to the array element at index 1
arr[1,3]=10 - Assigns the value 10 to the array element at index [1][3]
arr[0:3] - Returns the elements at indices 0,1,2 (on a 2D array: returns rows 0,1,2)
arr[0:3,4] - Returns the elements on rows 0,1,2 at column 4
arr[:2] - Returns the elements at indices 0,1 (on a 2D array: returns rows 0,1)
arr[:,1] - Returns the elements at index 1 on all rows
arr<5 - Returns an array with boolean values
(arr1<3) & (arr2>5) - Returns an array with boolean values
~arr - Inverts a boolean array
arr[arr<5] - Returns array elements smaller than 5

SCALAR MATH
np.add(arr,1) - Add 1 to each array element
np.subtract(arr,2) - Subtract 2 from each array element
np.multiply(arr,3) - Multiply each array element by 3
np.divide(arr,4) - Divide each array element by 4 (division by zero gives inf, or nan for 0/0, with a RuntimeWarning)
np.power(arr,5) - Raise each array element to the 5th power

VECTOR MATH
np.add(arr1,arr2) - Elementwise add arr2 to arr1
np.subtract(arr1,arr2) - Elementwise subtract arr2 from arr1
np.multiply(arr1,arr2) - Elementwise multiply arr1 by arr2
np.divide(arr1,arr2) - Elementwise divide arr1 by arr2
np.power(arr1,arr2) - Elementwise raise arr1 to the power of arr2
np.array_equal(arr1,arr2) - Returns True if the arrays have the same elements and shape
np.sqrt(arr) - Square root of each element in the array
np.sin(arr) - Sine of each element in the array
np.log(arr) - Natural log of each element in the array
np.abs(arr) - Absolute value of each element in the array
np.ceil(arr) - Rounds up to the nearest int
np.floor(arr) - Rounds down to the nearest int
np.round(arr) - Rounds to the nearest int (halves round to the nearest even value)

STATISTICS
np.mean(arr,axis=0) - Returns the mean along a specific axis
arr.sum() - Returns the sum of arr
arr.min() - Returns the minimum value of arr
arr.max(axis=0) - Returns the maximum value along a specific axis
np.var(arr) - Returns the variance of the array
np.std(arr,axis=1) - Returns the standard deviation along a specific axis
np.corrcoef(arr) - Returns the correlation coefficient matrix of the array (each row treated as a variable)
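A sketch of array creation, indexing and boolean masking on a small 3x4 array built with np.arange and reshape. The values 0 to 11 are arbitrary sample data.

```python
import numpy as np

# Values 0..11 laid out in 3 rows and 4 columns.
arr = np.arange(12).reshape(3, 4)
print(arr.shape)    # (3, 4)
print(arr[0:3, 1])  # column 1 of rows 0,1,2: [1 5 9]

# A comparison produces a boolean array; indexing with it keeps the True positions.
mask = arr < 5
print(arr[mask])    # [0 1 2 3 4]

# Combine conditions with & (not `and`), wrapping each one in parentheses.
print(arr[(arr > 2) & (arr < 6)])  # [3 4 5]
```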
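The axis argument in the STATISTICS entries picks which dimension is collapsed: axis=0 works down the columns, axis=1 across the rows. The sketch below shows this on a made-up 2x3 array, plus the division-by-zero behavior of np.divide (np.errstate is used here only to silence the warnings).

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(np.mean(arr, axis=0))  # mean of each column: [2.5 3.5 4.5]
print(arr.sum(axis=1))       # sum of each row: 6 and 15
print(arr.max(axis=0))       # largest value in each column: [4 5 6]
print(arr.sum())             # no axis: sum of every element, 21

# Dividing by zero yields inf (or nan for 0/0) instead of raising an error.
with np.errstate(divide="ignore", invalid="ignore"):
    by_zero = np.divide(np.array([1.0, 0.0, -1.0]), 0)
print(by_zero)               # inf, nan, -inf
```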
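The COMBINING/SPLITTING entries in a sketch. The second argument of np.hsplit (and np.split) is either a number of equal sections or a list of split indices; the arrays here are arbitrary.

```python
import numpy as np

arr1 = np.zeros((2, 3))
arr2 = np.ones((2, 3))

rows = np.concatenate((arr1, arr2), axis=0)  # stacked vertically
cols = np.concatenate((arr1, arr2), axis=1)  # placed side by side
print(rows.shape, cols.shape)  # (4, 3) (2, 6)

grid = np.arange(10).reshape(2, 5)

# An integer asks for that many equal pieces (it must divide the axis evenly)...
parts = np.hsplit(grid, 5)
print(len(parts), parts[0].shape)  # 5 (2, 1)

# ...while a list gives the column indices to split at.
left, right = np.hsplit(grid, [3])
print(left.shape, right.shape)  # (2, 3) (2, 2)
```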
Data Science Cheat Sheet
Python Regular Expressions
SPECIAL CHARACTERS
^ | Matches the expression to its right at the start of a string. In multi-line mode, it also matches immediately after each \n in the string.
$ | Matches the expression to its left at the end of a string. In multi-line mode, it also matches immediately before each \n in the string.
. | Matches any character except line terminators like \n.
\ | Escapes special characters or denotes character classes.
A|B | Matches expression A or B. If A is matched first, B is left untried.
+ | Greedily matches the expression to its left 1 or more times.
* | Greedily matches the expression to its left 0 or more times.
? | Greedily matches the expression to its left 0 or 1 times. But if ? is added to quantifiers (+, *, and ? itself), it will perform matches in a non-greedy manner.
{m} | Matches the expression to its left exactly m times.
{m,n} | Matches the expression to its left m to n times, and not less.
{m,n}? | Matches the expression to its left m to n times, but as few times as possible (non-greedy). See ? above.
\A | Matches the expression to its right at the absolute start of a string, whether in single or multi-line mode.
\Z | Matches the expression to its left at the absolute end of a string, whether in single or multi-line mode.

CHARACTER CLASSES (A.K.A. SPECIAL SEQUENCES)
\w | Matches alphanumeric characters, which means a-z, A-Z, and 0-9. It also matches the underscore, _. For str patterns, Python also counts Unicode letters and digits unless the a (ASCII) flag is set.
\d | Matches digits, which means 0-9.
\D | Matches any non-digits.
\s | Matches whitespace characters, which include the \t, \n, \r, and space characters.
\S | Matches non-whitespace characters.
\b | Matches the boundary (or empty string) at the start and end of a word, that is, between \w and \W.
\B | Matches where \b does not, that is, at a position that is not a word boundary (for example, between two \w characters).

SETS
[ ] | Contains a set of characters to match.
[amk] | Matches either a, m, or k. It does not match the sequence amk as a whole.
[a-z] | Matches any lowercase letter from a to z.
[a\-z] | Matches a, -, or z. It matches - because \ escapes it.
[a-] | Matches a or -, because - is not being used to indicate a series of characters.
[-a] | As above, matches a or -.
[a-z0-9] | Matches characters from a to z and also from 0 to 9.
[(+*)] | Special characters become literal inside a set, so this matches (, +, *, and ).
[^ab5] | Adding ^ excludes any character in the set. Here, it matches characters that are not a, b, or 5.

GROUPS
( ) | Matches the expression inside the parentheses and groups it.
(?) | Inside parentheses like this, ? acts as an extension notation. Its meaning depends on the character immediately to its right.
(?P<name>AB) | Matches the expression AB, and it can be accessed with the group name.
(?aiLmsux) | Here, a, i, L, m, s, u, and x are flags:
a - Matches ASCII only
i - Ignore case
L - Locale dependent
m - Multi-line
s - Dot matches all characters, including \n
u - Matches Unicode
x - Verbose
(?:A) | Matches the expression as represented by A, but unlike (?P<name>AB), it cannot be retrieved afterwards.
(?#...) | A comment. Contents are for us to read, not for matching.
A(?=B) | Lookahead assertion. This matches the expression A only if it is followed by B.
A(?!B) | Negative lookahead assertion. This matches the expression A only if it is not followed by B.
(?<=B)A | Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. This only works with fixed-length expressions.
(?<!B)A | Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. This only works with fixed-length expressions.
(?P=name) | Matches the same text matched by an earlier group named “name”.
(...)\1 | The number 1 corresponds to the first group to be matched. If we want to match the same text again, simply use its number instead of writing out the whole expression again. We can use from 1 up to 99 such groups and their corresponding numbers.

POPULAR PYTHON RE MODULE FUNCTIONS
re.findall(A, B) | Matches all instances of an expression A in a string B and returns them in a list.
re.search(A, B) | Matches the first instance of an expression A in a string B, and returns it as a re match object (or None if there is no match).
re.split(A, B) | Split a string B into a list using the delimiter A.
re.sub(A, B, C) | Replace A with B in the string C.
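A sketch exercising several GROUPS and quantifier entries with Python's re module: a named group, greedy versus non-greedy matching, a backreference and a lookahead. The patterns and sample strings are illustrative only.

```python
import re

# Named groups can be read back by name from the match object.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "Date: 2020-12-31")
print(m.group("year"), m.group("month"), m.group("day"))  # 2020 12 31

# Greedy .* runs as far as it can; .*? stops at the first possible closing >.
html = "<b>bold</b>"
print(re.findall(r"<.*>", html))   # ['<b>bold</b>']
print(re.findall(r"<.*?>", html))  # ['<b>', '</b>']

# \1 repeats the text captured by group 1, so this finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(doubled)  # ['is', 'test']

# Lookahead: match digits only when followed by "px", without consuming "px".
print(re.findall(r"\d+(?=px)", "10px 20em 30px"))  # ['10', '30']
```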
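A sketch of how the m (multi-line) flag changes ^ and $ while \A stays anchored to the absolute start, using inline flags at the start of each pattern. The sample text is made up.

```python
import re

text = "first line\nsecond line\nthird"

# Without multi-line mode, ^ and $ only match at the start and end of the string.
print(re.findall(r"^\w+", text))   # ['first']
print(re.findall(r"\w+$", text))   # ['third']

# With (?m), ^ also matches after each \n and $ also matches before each \n.
print(re.findall(r"(?m)^\w+", text))  # ['first', 'second', 'third']
print(re.findall(r"(?m)\w+$", text))  # ['line', 'line', 'third']

# \A ignores multi-line mode and still anchors to the absolute start.
print(re.findall(r"(?m)\A\w+", text))  # ['first']

# (?i) ignores case; re.split breaks the string wherever the pattern matches.
print(re.findall(r"(?i)line", "Line LINE line"))  # ['Line', 'LINE', 'line']
print(re.split(r"\s*,\s*", "a , b,c ,d"))         # ['a', 'b', 'c', 'd']
```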