DA Using Python - Original - 17th
Programming Language
https://colab.research.google.com/drive/1D-7TyR_I30rlBdBoRO8xvJSIVhbvT4PJ#scrollTo=6pfTuQFHRWaE
Installation:
• There are many interpreters freely available to run Python scripts, such as IDLE (Integrated Development and Learning Environment), which is installed when you install the Python software from http://python.org/downloads/
• Steps to be followed and remembered:
• Step 1: Select Version of Python to Install.
• Step 2: Download Python Executable Installer.
• Step 3: Run Executable Installer.
• Step 4: Verify Python Was Installed On Windows.
• Step 5: Verify Pip Was Installed.
• Step 6: Add Python Path to Environment Variables (Optional)
HISTORY OF PYTHON PROGRAMMING LANGUAGE
• Python was created by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands and first released in 1991; it is now developed by the Python Software Foundation.
• Python was designed as a successor to the ABC programming language and was capable of exception handling and interfacing with the Amoeba operating system.
• Van Rossum was known as the “Benevolent Dictator For Life” (BDFL) of Python until he stepped down from that role in 2018.
• The Python logo depicts two snakes, one blue and one yellow.
• Python's implementation began in December 1989.
• Its emphasis on code readability and its concise syntax allow programmers to express ideas in fewer lines of code.
• Python will let you assign the same value to multiple variables in one statement. It will also let
you assign values to multiple variables at once.
• Unlike Java and C++, Python does not use braces to delimit code. Indentation is mandatory with
Python.
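A minimal sketch of the two assignment forms and of indentation marking a block (the variable names and values are arbitrary examples):

```python
# Assign the same value to several variables in one statement
a = b = c = 0

# Assign different values to several variables at once
x, y, z = 1, 2.5, "three"

# Indentation, not braces, marks the body of each branch
if x < y:
    smaller = x
else:
    smaller = y

print(a, b, c, x, y, z, smaller)   # 0 0 0 1 2.5 three 1
```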
• IDE Software for Python
THRUST AREAS OF PYTHON
1. Data Science
2. Automation
3. Application Development
4. AI & Machine Learning
5. Audio/Video Applications
6. Console Applications
7. Desktop GUI
Since 2003, Python has consistently ranked among the top ten most popular programming languages as measured by the TIOBE Programming Community Index.
The Zen of Python, a collection of guiding principles for Python's design, can be displayed by running import this in the Python interpreter.
FEATURES OF PYTHON
1. Python is object-oriented
• Its class structure supports concepts such as polymorphism, operator overloading and multiple inheritance.
2. Indentation
• Indentation is one of Python's most distinctive features: it defines code blocks instead of braces.
3. It’s free (open source)
• Downloading and installing Python is free and easy.
4. It’s Powerful
• Dynamic typing
• Built-in types and tools
• Library utilities
• Third-party utilities (e.g. Numeric, NumPy, SciPy)
• Automatic memory management
5. It’s Portable
• Python runs on virtually every major platform used today.
• As long as you have a compatible Python interpreter installed, Python programs will run in exactly the same manner, irrespective of platform.
6. It’s easy to use and learn
• No separate compile step
• Python programs are compiled automatically to an intermediate form called bytecode, which the interpreter then reads.
• This gives Python the development speed of an interpreter while reducing the performance loss of purely interpreted languages.
• Structure and syntax are quite intuitive and easy to grasp.
7. Interpreted Language
• Python code is processed at runtime by the Python interpreter.
8. Interactive Programming Language
• Users can interact with the python interpreter directly for writing the programs
9. Straightforward syntax
• Python's syntax is simple and straightforward, which also makes it popular.
Data vs. Information
Data and information are both critical elements in business decision-making. By understanding how these
components work together, you can move your business toward a more data- and insights-driven culture.
Organizations that prioritize collecting data, interpreting it, and putting the resulting information to use can realize significant benefits, including a comprehensive understanding of their target audience, which can help them make decisions about future offerings, branding, and communication preferences.
For example, a company might gather data about the performance of their ads or content. They could organize
and interpret that data to produce a wealth of insights, like what types of graphics, phrases, and even products
are most appealing to their customer base.
Data | Information
Data is an individual unit that does not carry any specific meaning | Information is a group of data that collectively carries a logical meaning
A collection of facts, measured in bits/bytes | Facts in context, measured in meaningful units
Raw and unorganized | Structured and organized
Data on its own is meaningless | When analysed and interpreted, it is meaningful
Data is independent | Information is dependent on data
Data isn’t sufficient for decision-making | Decisions are based on information
Ex: a single customer’s bill amount is data | Ex: multiple bills collected and interpreted over a range of time
Business Analytics: the process of iterative exploration and investigation of past business data to identify trends, patterns and root causes, and to make data-driven business decisions using insights from historical data.
Examples:
Starbucks uses Data analytics to make strategic decisions.
Delta Air Lines uses Data analysis to improve customer experiences.
Characteristics of Data:
For example, in Color = “Green”, Color is a variable (an identifier) that holds the value “Green”. Keywords cannot be used as variable names; the list of Python keywords can be printed as follows:
import keyword
print(keyword.kwlist)
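Building on the keyword list, here is a small sketch that checks whether a name can be used as a variable. The helper is_valid_variable_name is a hypothetical name; it combines str.isidentifier() (naming rules) with keyword.iskeyword() (reserved words):

```python
import keyword

def is_valid_variable_name(name):
    # A name must follow the identifier rules AND must not be a keyword
    return name.isidentifier() and not keyword.iskeyword(name)

Color = "Green"                              # Color is a valid identifier
print(keyword.iskeyword("for"))              # True: "for" is reserved
print(is_valid_variable_name("Color"))       # True
print(is_valid_variable_name("for"))         # False: keyword
print(is_valid_variable_name("2nd_color"))   # False: starts with a digit
```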
Difference between a Compiler and Interpreter
Syntax
Syntax refers to the rules and conventions that define the structure of a language.
Syntax in computer programming means the rules that control the structure of the symbols, punctuation, and words
of a programming language.
Compilers translate programs written in languages like C++ into binary machine code that computers can execute (Java source code is compiled to bytecode, which the Java Virtual Machine runs). If the syntax is incorrect, the code will not compile.
Interpreters execute programs written in languages such as JavaScript or Python at runtime. Incorrect syntax will cause the code to fail.
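Python checks syntax when it compiles source code to bytecode, before any of it runs. A minimal sketch using the built-in compile() function (the helper name has_valid_syntax is hypothetical) shows a syntax error being caught without executing anything:

```python
def has_valid_syntax(source):
    """Return True if source parses as Python code, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")   # compile only, do not run
        return True
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False

print(has_valid_syntax("x = 5 + 3"))   # True
print(has_valid_syntax("x = 5 +"))     # prints the error message, then False
```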
Object code refers to low-level code that the machine can understand. It is generated from source code by a compiler or other translator and is in executable machine-code format: a sequence of machine instructions that the Central Processing Unit (CPU) understands and executes. E.g.: Common Object File Format (COFF), COM files and “.exe” files.
Algorithm and pseudocode are two related terms in computer programming.
The basic difference between them is that an algorithm is a step-by-step procedure developed to solve a problem, while pseudocode is a way of writing that algorithm down.
An algorithm follows a systematic and logical approach and consists of sequences, iterations, selections, etc.
The choice of algorithm depends on the nature of the given problem: the problem is first analyzed, and then the best algorithm is used to solve it.
Pseudocode, on the other hand, is a technique for developing and describing an algorithm. Programmers write pseudocode in simple, informal language, and it does not have any specific syntax to follow. Pseudocode is a text-based design tool.
Basically, pseudocode represents an algorithm for solving a problem using natural language and mathematical notation. It uses short phrases to represent what specific lines of code would do. Since there is no strict syntax to follow and pseudocode cannot be executed, it is relatively difficult to debug.
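To contrast the two, here is one simple algorithm (finding the largest number in a list, chosen only as an illustration) written first as pseudocode in comments and then as runnable Python; find_largest is a hypothetical name:

```python
# Pseudocode:
#   SET largest TO the first item of the list
#   FOR each item in the list
#       IF item > largest THEN SET largest TO item
#   RETURN largest

def find_largest(numbers):
    largest = numbers[0]
    for item in numbers:
        if item > largest:
            largest = item
    return largest

print(find_largest([12, 45, 7, 30]))   # 45
```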
Algorithm | Pseudocode
It is defined as a sequence of well-defined steps that provide a solution to, or a way to solve, the problem at hand. | It can be understood as one of the methods that help in representing an algorithm.
It is a systematic and logical approach, where the procedure is defined step-wise. | It is a simpler version of coding in a programming language.
Algorithms can be represented using natural language, flowcharts and so on. | It is written in plain English and uses short phrases to describe what a specific line of code would do.
The solution is translated to machine code, which is then executed by the system to give the relevant output. | It does not follow the specific syntax of a programming language, which means it cannot be executed on a computer.
Many simple operations are combined to form a more complicated operation, which the computer performs with ease. | There are many formats that can be used to write pseudocode.
It can be understood as the pseudocode for a program. | Pseudocode is not actually a programming language.
Plain text is used. | Control structures such as 'while', 'if-then-else', 'repeat-until' and so on can be used.
In Python, there is no need to declare a variable explicitly by specifying whether it is an integer, a float or any other type.
When a variable is needed, choose a name that follows the naming rules mentioned and use it in the program.
Statements:
A statement is an instruction that the Python interpreter can execute. Python program consists of a sequence of
statements.
We normally use two basic statements: the assignment statement and the print statement.
A print statement displays output, such as a message or the value of a variable, on the screen (monitor).
>>> print("The value of x is", x)
Other kinds of statements, such as if statements, while statements and for statements, are generally called control flow statements.
An expression is a combination of values, variables and operators that is evaluated to produce a new value. An expression can also appear on its own as a statement. In the assignment statement below, x + 20 is an expression whose value is stored back in x:
x = x + 20
Operators:
Operators are symbols, such as +, -, =, >, and <, that perform mathematical or logical operations on data values and produce a result according to certain rules.
1. Arithmetic Operators
2. Assignment Operators
3. Comparison Operators
4. Logical Operators
5. Bitwise Operators
Arithmetic Operators
Example:
x = 6
y = 2
print(x / y)    # 3.0
print(x % y)    # 0
print(x ** y)   # 36
print(x // y)   # 3
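The arithmetic example covers only the first category listed above. Here is a short sketch of comparison, logical and bitwise operators on the same values:

```python
x = 6
y = 2

# Comparison operators return True or False
print(x > y, x == y, x != y)   # True False True

# Logical operators combine Boolean values
print(x > 0 and y > 0)         # True
print(x < 0 or y < 0)          # False
print(not x > y)               # False

# Bitwise operators work on binary digits (6 = 0b110, 2 = 0b010)
print(x & y, x | y, x ^ y)     # 2 6 4
print(x << 1, x >> 1)          # 12 3
```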
Assignment Operators
Operators are special symbols that carry out arithmetic, logical and bitwise computations on values and variables. The value that an operator operates on is known as an operand. Assignment operators store a value in a variable; compound forms such as += combine an arithmetic operation with the assignment.
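A sketch of the assignment operators, tracing one variable through each compound form (the starting value 10 is arbitrary):

```python
x = 10      # simple assignment
x += 5      # x = x + 5   -> 15
x -= 3      # x = x - 3   -> 12
x *= 2      # x = x * 2   -> 24
x /= 4      # x = x / 4   -> 6.0 (true division gives a float)
x **= 2     # x = x ** 2  -> 36.0
x //= 5     # x = x // 5  -> 7.0
x %= 4      # x = x % 4   -> 3.0
print(x)    # 3.0
```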
Precedence of Operators:
Operator precedence affects how an expression is evaluated.
For example, in x = 7 + 3 * 2, x is assigned 13, not 20, because the * operator has higher precedence than +, so Python first multiplies 3 * 2 and then adds the result to 7.
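The precedence rule can be checked directly; parentheses override it, and ** (which binds tighter than * and +) groups from right to left:

```python
x = 7 + 3 * 2        # * is evaluated first: 7 + 6
print(x)             # 13

y = (7 + 3) * 2      # parentheses force the addition first
print(y)             # 20

z = 2 ** 3 ** 2      # ** groups right to left: 2 ** 9
print(z)             # 512
```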
Comments:
Single-line comments begin with a hash (#) symbol; everything from the # to the end of the line is treated as a comment.
A multi-line comment is useful when we need to comment on many lines. In Python, triple double quotes (“ “ “) and triple single quotes (‘ ‘ ‘) are used for multi-line commenting. Strictly speaking, these create string literals, which have no effect when they are not assigned or used.
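A short sketch of both comment styles; the triple-quoted block is an unassigned string, so Python evaluates and discards it:

```python
# This whole line is a single-line comment
total = 5 + 3  # a comment can also follow code on the same line

"""
This triple-quoted block spans several lines.
It is not assigned to anything, so it behaves like a multi-line comment.
"""
print(total)  # 8
```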
f-strings
Python 3.6 introduced a new string formatting mechanism known as Literal String Interpolation, more commonly called f-strings.
The idea behind f-strings is to make string interpolation simpler. An f-string is a string literal prefixed with “f”. These strings may contain replacement fields, which are expressions enclosed within curly braces {}.
F-strings provide a concise and convenient way to embed Python expressions inside string literals for formatting.
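A minimal f-string sketch with hypothetical values; the expressions inside {} are evaluated, and a format spec after the colon controls the layout:

```python
name = "Asha"          # hypothetical sample values
marks = 87.456

message = f"{name} scored {marks:.1f} marks"   # .1f: one decimal place
print(message)                                 # Asha scored 87.5 marks
print(f"{name} has {len(name)} letters")       # Asha has 4 letters
```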
Literals in Python are the raw data values assigned to variables or constants in a program.
String format() Method:
The format() method uses its arguments to substitute an appropriate value for each format code in the template.
The format() method formats the specified value(s) and inserts them into the string's placeholders, which are marked by curly braces {}.
Syntax:
string.format(p0, p1, ..., k0=v0, k1=v1, ...)
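A sketch of the three common placeholder forms: positional ({0}, {1}), keyword ({name}) and a placeholder with a format code. The names and numbers are hypothetical:

```python
msg1 = "{0} has {1} apples".format("Ravi", 5)                 # positional
msg2 = "{name} is {age} years old".format(name="Meena", age=21)  # keyword
msg3 = "Price: {:.2f}".format(49.5)                           # format code

print(msg1)   # Ravi has 5 apples
print(msg2)   # Meena is 21 years old
print(msg3)   # Price: 49.50
```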
if...elif...else Statement
if condition1:
    # code block 1
elif condition2:
    # code block 2
else:
    # code block 3
The three forms are the if statement, the if...else statement and the if...elif...else statement.
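A sketch of if...elif...else in use. The grade function and its mark boundaries are hypothetical; conditions are tested top to bottom and only the first true branch runs:

```python
def grade(marks):
    if marks >= 75:
        return "Distinction"
    elif marks >= 60:
        return "First class"
    elif marks >= 35:
        return "Pass"
    else:
        return "Fail"

for m in (82, 64, 40, 20):
    print(m, grade(m))
```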
For Loops
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set,
or a string).
With the for loop we can execute a set of statements, once for each item in a list, tuple, set
etc.
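A sketch of a for loop over a list and over a string (sample data is arbitrary):

```python
fruits = ["apple", "banana", "cherry"]

# The loop body runs once for each item in the list
for fruit in fruits:
    print(fruit.upper())

# Looping over a string visits one character at a time
vowels = 0
for ch in "education":
    if ch in "aeiou":
        vowels += 1
print(vowels)   # 5
```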
The while Loop
With the while loop we can execute a set of statements as long as a condition is true.
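Two short while-loop sketches. The loop variable must change inside the body, otherwise the condition stays true forever:

```python
# Print 1 to 5
i = 1
while i <= 5:
    print(i)
    i += 1                # update, so the loop eventually ends

# Add up the digits of a number
n = 1234
digit_sum = 0
while n > 0:
    digit_sum += n % 10   # take the last digit
    n //= 10              # drop the last digit
print(digit_sum)          # 10
```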
range() function:
The range() function is a built-in function in Python used to generate a sequence of numbers. If the user wants to generate a sequence of numbers between given starting and ending values, they can pass these values as parameters of the range() function. The range() function does not work with floating-point values, because a float cannot be interpreted as an integer (passing one raises a TypeError).
To loop through a set of code a specified number of times, we can use the range() function.
The range() function returns a sequence of numbers, starting from 0 by default and incrementing by 1 by default, and stops before a specified number (the stop value itself is not included).
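A sketch of range() with one, two and three arguments, plus the TypeError raised for a float:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]   stop (5) is excluded
print(list(range(2, 8)))       # [2, 3, 4, 5, 6, 7]
print(list(range(1, 10, 3)))   # [1, 4, 7]         step of 3
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]   negative step counts down

try:
    range(0.5, 3)
except TypeError as err:
    print("TypeError:", err)   # a float cannot be interpreted as an integer
```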
Develop a Python Program to separate the given list elements into a list of odd and even numbers.
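One possible solution, sketched with a hypothetical helper split_odd_even and a made-up sample list; n % 2 == 0 identifies even numbers:

```python
def split_odd_even(numbers):
    """Return two lists (odd numbers, even numbers), preserving order."""
    odd, even = [], []
    for n in numbers:
        if n % 2 == 0:
            even.append(n)
        else:
            odd.append(n)
    return odd, even

odd, even = split_odd_even([10, 21, 4, 45, 66, 93, 11])
print("Odd:", odd)     # Odd: [21, 45, 93, 11]
print("Even:", even)   # Even: [10, 4, 66]
```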
Parameters and Arguments:
Parameters are the names listed in the function definition, while arguments are the values passed when the function is called.
There are three types of Python function arguments that we can use when calling a function:
1. Default Arguments
2. Positional Arguments
3. Keyword Arguments
User defined functions
Functions that we define ourselves to do a specific task are referred to as user-defined functions. Functions that readily come with Python, such as print(), are called built-in functions. If we use functions written by others in the form of a library, they can be termed library functions.
All the other functions that we write on our own fall under user-defined functions. So, our user-defined function could be a library function to someone else.
In a function, arguments can have default values. We assign default values to arguments using the ‘=’ (assignment) operator at the time of function definition.
Parameters that do not have default values, such as age and name, are required (mandatory) during a function call.
With positional arguments, the number and position of the arguments must match the function definition. If we change the number of arguments, we will get an error.
With keyword arguments, all the keyword names should match the parameters in the function definition.
When we call a function using keyword arguments, the order (position) of the arguments can be changed.
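A sketch of default, positional and keyword arguments. The register function, its parameters and the default city are hypothetical; name and age are required, city is optional:

```python
def register(name, age, city="Chennai"):
    return f"{name}, {age}, {city}"

print(register("Arun", 22))             # positional, default used: Arun, 22, Chennai
print(register("Arun", 22, "Pune"))     # default overridden: Arun, 22, Pune
print(register(age=30, name="Divya"))   # keyword, order changed: Divya, 30, Chennai

try:
    register("Arun")                    # missing the required age argument
except TypeError as err:
    print("TypeError:", err)
```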
In Python, expressions are built from identifiers, literals and operators.
Identifiers: Any name that is used to define a class, function, variable, module or object is an identifier.
Literals: Fixed values written directly in the source code, such as 42 or "hello". In Python, there are string literals, bytes literals, integer literals, floating-point literals and imaginary literals.
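One example of each literal kind named above, with the type Python gives it:

```python
s = "hello"      # string literal        -> str
b = b"hello"     # bytes literal         -> bytes
i = 42           # integer literal       -> int
h = 0x2A         # hexadecimal integer   -> int (also 42)
f = 3.14         # floating-point literal -> float
c = 2j           # imaginary literal     -> complex

for value in (s, b, i, h, f, c):
    print(repr(value), type(value).__name__)
```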
Operators: In Python you can implement the following operations using the corresponding tokens.
Data Types
A data type, in programming, is a classification that specifies which type of value a variable has and what type of
mathematical, relational or logical operations can be applied to it without causing an error.
A data type is a classification of data which tells the compiler or interpreter how the programmer intends to use the data.
Most programming languages support various types of data, including integer, real, character or string, and Boolean.
Data types help identify the type of a variable when it is created, the type of value a function returns, and the type of parameter a function expects.
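The built-in type() function reports a value's data type. A sketch with the numeric types and bool:

```python
a = 10          # int
b = 3.5         # float
c = 2 + 3j      # complex
d = True        # bool

print(type(a), type(b), type(c), type(d))
print(a + b)                 # 13.5  int + float gives a float
print(c.real, c.imag)        # 2.0 3.0
print(isinstance(d, int))    # True: bool is a subtype of int
```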
2. Python String Data Type
A string is a sequence of characters. Python supports Unicode characters. Strings are generally enclosed in either single or double quotes.
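A short string sketch (sample text is arbitrary):

```python
single = 'Python'
double = "Python"
print(single == double)      # True: both quote styles create the same string

text = "Data Analytics"
print(len(text))             # 14
print(text[0], text[-1])     # D s
print(text + " using Python")
print("π ≈ 3.14")            # Unicode characters are supported
```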
4. Python Tuple
A tuple is another data type that holds a sequence of data, similar to a list.
However, a tuple is immutable, which means that the data in a tuple is write-protected.
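A sketch of tuple immutability: item assignment raises a TypeError, so "changing" a tuple means building a new one:

```python
point = (3, 4, 5)
print(point[0])          # 3  indexing works as with lists

try:
    point[0] = 10        # not allowed: tuples are immutable
except TypeError as err:
    print("TypeError:", err)

point = (10,) + point[1:]    # build a new tuple instead
print(point)                 # (10, 4, 5)
```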
5. Python Dictionary
A Python dictionary is a collection of data in key-value pair form, similar to a hash table. (Since Python 3.7, dictionaries preserve the insertion order of their keys.)
Dictionaries are written within curly braces in the form key: value. They are very useful for retrieving data efficiently from a large amount of data.
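A dictionary sketch with hypothetical names and marks: look-up by key, adding, updating, and get() with a fallback value:

```python
marks = {"Anu": 82, "Bala": 67, "Chitra": 91}   # key: value pairs

print(marks["Chitra"])                  # 91  look up a value by its key
marks["Dev"] = 74                       # add a new key-value pair
marks["Bala"] = 70                      # update an existing key
print(marks.get("Esha", "not found"))   # get() avoids a KeyError

for name, score in marks.items():
    print(name, score)
```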
6. Python Sets
Sets are used to store multiple items in a single variable. A set is declared using curly braces.
Set is one of 4 built-in data types in Python used to store collections of data. All elements in a set are unique and are not kept in any particular order, because sets are unordered. Set objects are not subscriptable, so indexing and slicing are not applicable.
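A sketch of two set properties described above: duplicates are dropped, and sets cannot be indexed:

```python
colours = {"red", "green", "blue", "green"}   # duplicate "green" is dropped
print(len(colours))          # 3
print("red" in colours)      # True: fast membership test

try:
    colours[0]               # sets have no positions
except TypeError as err:
    print("TypeError:", err)
```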
sum() function:
Returns the sum of the items in the given sequence; it is best suited to a sequence of numbers (integers or floats) as input.
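A sketch of sum(), including its optional second argument (a start value added to the total):

```python
prices = [120, 45, 80]
print(sum(prices))            # 245
print(sum(prices, 100))       # 345  start value 100 is added too
print(sum((0.5, 1.5, 2.0)))   # 4.0  works for a tuple of floats as well
```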
abs() function:
The Python abs() function returns the absolute value of a number. It takes exactly one argument: the number whose absolute value is to be returned.
The argument can be an integer or a floating-point number. If the argument is a complex number, abs() returns its magnitude.
abs(num)
num - a number whose absolute value is to be returned; the number can be an integer, floating-point or complex number.
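abs() with each kind of argument; for a complex number a + bj it returns the magnitude sqrt(a² + b²):

```python
print(abs(-7))        # 7
print(abs(-3.25))     # 3.25
print(abs(3 - 4j))    # 5.0  sqrt(3**2 + 4**2)
```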
len():
len() is a built-in function commonly used when working with data in Python: it takes a sequence or collection as input and returns the length (number of items) of the given object.
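len() on several kinds of sequences and collections:

```python
print(len("Python"))              # 6  characters in a string
print(len([10, 20, 30]))          # 3  items in a list
print(len((1, 2)))                # 2  items in a tuple
print(len({"a": 1, "b": 2}))      # 2  keys in a dictionary
print(len({1, 2, 2, 3}))          # 3  unique items in a set
```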
Practice:
Write a program to find how many seconds make a day and how many seconds are in a week, calculate the seconds in a month of 30 days, and print the output for each.
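One possible solution, building each result from the previous one:

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24

seconds_in_day = SECONDS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY
seconds_in_week = seconds_in_day * 7
seconds_in_month = seconds_in_day * 30     # a month of 30 days

print("Seconds in a day:", seconds_in_day)               # 86400
print("Seconds in a week:", seconds_in_week)             # 604800
print("Seconds in a 30-day month:", seconds_in_month)    # 2592000
```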
READING INPUT
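➢ In Python, the input() function is used to gather data from the user. The syntax for the input function is variable_name = input([prompt]), where prompt is an optional message shown to the user; the value returned is always a string.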
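Because input() returns a string, numeric input must be converted (here with int()). In this sketch the function name read_age and its ask parameter are hypothetical; ask defaults to input, and exists only so the function can also be tried without typing at a keyboard:

```python
def read_age(ask=input):
    # input() always returns a str, so convert before doing arithmetic
    text = ask("Enter your age: ")
    return int(text)

# In an interactive program:  age = read_age()
```

Calling read_age() runs input() and waits for the user to type a value and press Enter.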
PRINT OUTPUT
➢ The print() function allows a program to display text onto the console.
➢ The print() function prints everything as strings: anything that is not already a string is automatically converted to its string representation.
max() function and min() function
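A sketch of max() and min() on a list, on separate arguments, and with key=len:

```python
scores = [56, 89, 72, 91, 45]
print(max(scores), min(scores))    # 91 45
print(max(3, 17, 9))               # 17  several arguments also work

words = ["kiwi", "banana", "fig"]
print(max(words, key=len))         # banana  compare by length
print(min(words))                  # banana  strings compare alphabetically
```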
abs() function:
Takes a numeric value as input and returns its absolute value.
Strings: Methods
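A sketch of some commonly used string methods. Each returns a new string (or list); the original is never modified, because strings are immutable:

```python
s = "  Data Analytics using Python  "
clean = s.strip()                      # remove leading/trailing spaces

print(clean.upper())                   # DATA ANALYTICS USING PYTHON
print(clean.lower())                   # data analytics using python
print(clean.replace("Python", "Py"))   # Data Analytics using Py
print(clean.split())                   # ['Data', 'Analytics', 'using', 'Python']
print(clean.find("using"))             # 15  index where "using" starts
print(clean.startswith("Data"))        # True
```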
String Slices
The "slice" syntax is a handy way to refer to sub-parts of sequences -- typically strings and lists.
The slice s[start:end] contains the elements beginning at start and extending up to, but not including, end.
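A few string slices on "Hello" (index 0 is "H"):

```python
s = "Hello"
print(s[1:4])    # ell    indexes 1, 2, 3 (4 is excluded)
print(s[:2])     # He     start defaults to 0
print(s[2:])     # llo    end defaults to the length
print(s[-3:])    # llo    negative indexes count from the end
print(s[::-1])   # olleH  a step of -1 reverses
```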
Changeable means that we can change, add or remove items after the data structure has been created. Note: set items are unchangeable, but you can remove items and add new items. A tuple is a collection that is ordered and unchangeable.
Matrices are rectangular arrays of numbers and are an example of 2nd-order tensors. If m and n are positive integers, that is m, n ∈ ℕ, then an m×n matrix contains m·n numbers, arranged in m rows and n columns. In Python, we use the NumPy library, which helps us create n-dimensional arrays.
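A sketch of an m×n matrix in NumPy with m = 2 and n = 3, so it holds m·n = 6 numbers:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)   # (2, 3)  rows, columns
print(m.size)    # 6       total number of elements
print(m.ndim)    # 2       a matrix is a 2-D array
print(m.T)       # the transpose, a 3 x 2 matrix
```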
Tensors: The more general entity of a tensor encapsulates the scalar, the vector and the matrix. We use Python libraries like TensorFlow or PyTorch to declare tensors, rather than nesting matrices. Tensors are multi-dimensional arrays with a uniform type (called a dtype). In TensorFlow, tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one.
LISTS:
The *if* construct -- if var in list
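A sketch of membership tests with in and not in:

```python
colours = ["red", "green", "blue"]

if "green" in colours:
    print("green is in the list")
if "pink" not in colours:
    print("pink is not in the list")
```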
List Methods
•list.append(elem) -- adds a single element to the end of the list. Common error: it does not return the new list, it just modifies the original.
•list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
•list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().
•list.index(elem) -- searches for the given element from the start of the list and returns its index (throws ValueError if not present).
•list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present).
•list.sort() -- sorts the list in place (does not return it). (The sorted() function shown later is preferred.)
•list.pop(index) -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of append()).
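The list methods above, traced step by step on one list (the comments show the list after each call):

```python
nums = [3, 1, 4]
nums.append(1)          # [3, 1, 4, 1]
nums.insert(0, 9)       # [9, 3, 1, 4, 1]
nums.extend([5, 9])     # [9, 3, 1, 4, 1, 5, 9]
print(nums.index(4))    # 3  position of the first 4
nums.remove(1)          # only the first 1 goes: [9, 3, 4, 1, 5, 9]
last = nums.pop()       # returns 9: [9, 3, 4, 1, 5]
first = nums.pop(0)     # returns 9: [3, 4, 1, 5]
result = nums.sort()    # sorts in place: [1, 3, 4, 5]
print(nums, result)     # [1, 3, 4, 5] None  (sort() returns None)
```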
The *if elif else* construct
The *for* construct -- for var in list
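A sketch that combines the for construct with if/elif/else on each list item (temperatures and labels are hypothetical):

```python
temperatures = [18, 25, 31, 12]
labels = []

for t in temperatures:
    if t >= 30:
        label = "hot"
    elif t >= 20:
        label = "warm"
    else:
        label = "cool"
    labels.append(label)
    print(t, label)
```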
Write a Python program to check whether an item exists within a tuple. If the item is present, print the index of the item; otherwise, print the message “sorry not found”.
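One possible solution, using in for the existence check and tuple.index() for the position; find_in_tuple and the sample tuple are hypothetical:

```python
def find_in_tuple(items, target):
    """Print and return the index of target, or print a message and return None."""
    if target in items:
        position = items.index(target)   # first occurrence
        print(f"{target} found at index {position}")
        return position
    print("sorry not found")
    return None

data = (12, 7, 25, 7, 40)
find_in_tuple(data, 25)   # 25 found at index 2
find_in_tuple(data, 99)   # sorry not found
```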
Write a Python program to generate a Fibonacci Series:
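One possible solution: each term is the sum of the previous two, starting from 0 and 1.

```python
def fibonacci(n):
    """Return the first n terms of the Fibonacci series."""
    series = []
    a, b = 0, 1
    for _ in range(n):
        series.append(a)
        a, b = b, a + b     # shift the pair forward by one term
    return series

print(fibonacci(10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```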
Write a Python program to sum all the items in a list without using sum():
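One possible solution using an accumulator variable; sum_list and the sample list are hypothetical:

```python
def sum_list(items):
    result = 0
    for item in items:
        result += item     # add each item to the running total
    return result

print(sum_list([8, 2, 3, 0, 7]))   # 20
```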
Sorting: The easiest way to sort is with the sorted(list) function, which takes a list and returns a new list with those elements in sorted order. The original list is not changed. The list.sort() method is an alternative to the sorted() function; it sorts the list in place and returns None.
ListQ = [10, 32, 21, 12, 14, 8, 1]
list.reverse() reverses the order of the elements of the list in place.
With sorted(), specifying key=len (the built-in len() function) sorts strings by length, from shortest to longest.
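A sketch of sorted(), list.sort(), list.reverse() and key=len, using the ListQ values given above:

```python
ListQ = [10, 32, 21, 12, 14, 8, 1]

new_list = sorted(ListQ)          # returns a new sorted list
print(new_list)                   # [1, 8, 10, 12, 14, 21, 32]
print(ListQ)                      # [10, 32, 21, 12, 14, 8, 1]  unchanged

ListQ.sort(reverse=True)          # sorts ListQ itself, largest first
print(ListQ)                      # [32, 21, 14, 12, 10, 8, 1]

ListQ.reverse()                   # reverses the current order in place
print(ListQ)                      # [1, 8, 10, 12, 14, 21, 32]

words = ["banana", "fig", "apple", "kiwi"]
print(sorted(words, key=len))     # ['fig', 'kiwi', 'apple', 'banana']
```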
Tuples are represented by writing the elements inside parentheses () separated by commas.
We can also think of tuples as lists that cannot be changed; that is, tuples are immutable and not modifiable in nature.
Tuples are a kind of group data type, and we access their elements by index, starting from zero. For example, a natural Python representation of a set of 3-D points would be a list of tuples, where each tuple has size 3 and holds one (x, y, z) group.
Method | Description
count() | Returns the number of times a specified value occurs in a tuple
index() | Searches the tuple for a specified value and returns the position where it was found
A tuple can contain elements of a single type, such as integers, or elements of mixed data types.
count() Method:
The count() method returns the number of times a specified value appears in the tuple.
Syntax:
tuple.count(value)
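index() Method:
The index() method returns the position of the first occurrence of the specified value in the tuple (it raises ValueError if the value is not present).
Syntax:
tuple.index(value)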
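A sketch of count() and index() on an integer tuple and on a mixed-type tuple (values are arbitrary):

```python
numbers = (1, 3, 7, 8, 7, 5, 4, 6, 8, 5)
print(numbers.count(5))      # 2  five appears twice
print(numbers.index(8))      # 3  first position of 8

mixed = ("Python", 3.12, True, 7)   # a tuple can mix data types
print(mixed.count("Python"))        # 1
print(mixed.index(7))               # 3
```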
Sets
Sets are unordered groups of elements, which means that the order of elements coming out of a set may not be the same as the order in which they were added.
Sets are quite similar to lists, but the elements of a set must be immutable and cannot be changed once they are added; however, we can add elements to and remove elements from the set.
Another distinct characteristic of a set is that it may include elements of different types. This means you can have a group of numbers, strings, and even tuples, all in the same set.
The pop() method removes and returns an arbitrary element from the set.
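A sketch of adding and removing set elements, the immutability rule for elements, and basic set operations:

```python
s = {1, "two", (3, 4)}     # mixed types, including a tuple
s.add(5)                   # add one element
s.discard("two")           # remove if present (no error if absent)
s.remove(1)                # remove; raises KeyError if absent

try:
    s.add([6])             # a list is mutable, so it cannot be a set element
except TypeError as err:
    print("TypeError:", err)

x = s.pop()                # removes and returns an arbitrary element
print(x, len(s))

a = {1, 2, 3}
b = {3, 4}
print(a | b, a & b, a - b)     # union, intersection, difference
```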
Lambda functions are useful when there is a single expression in the function definition and you want clear syntax with few lines of code.
f = lambda arguments: expression
“f” is a function object that accepts the arguments and returns the result of the expression.
Lambda functions are often used with the filter(), map(), reduce(), sorted() and apply() functions.
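A minimal lambda sketch next to the equivalent def version:

```python
square = lambda x: x * x       # one expression; its value is returned
add = lambda a, b: a + b

def square_def(x):             # the same function written with def
    return x * x

print(square(6))        # 36
print(add(2, 3))        # 5
print(square_def(6))    # 36
```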
The filter() function is a Python built-in function. It selects the elements of a given iterable for which the supplied function returns True when applied to each element.
Syntax: filter(function, iterable)
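filter() with a lambda that keeps even numbers. In Python 3, filter() returns an iterator, so list() is used to see the result:

```python
numbers = [5, 12, 17, 24, 30, 41]
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)   # [12, 24, 30]
```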
The map() function is also a Python built-in function. It transforms the elements of a given iterable into new values by applying a function to each element. For example, map() can be used to calculate the squares (i.e., transform to new values) of the elements in a given iterable.
Syntax: map(function, iterable)
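map() with a lambda that squares each element (again wrapped in list(), since map() returns an iterator):

```python
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda n: n ** 2, numbers))
print(squares)   # [1, 4, 9, 16, 25]
```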
The reduce() function is not a Python built-in function. It is in the functools module and requires an additional import (from functools import reduce). It reduces the elements of a given iterable to a single value by applying a function cumulatively.
Syntax: reduce(function, iterable)
For example, to multiply all the elements, use the lambda function lambda x, y: x * y, as in reduce(lambda x, y: x * y, iterable).
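reduce() applies the two-argument lambda cumulatively, left to right; changing the lambda changes the result from a sum to a product:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]
total = reduce(lambda x, y: x + y, numbers)     # ((((1+2)+3)+4)+5) = 15
product = reduce(lambda x, y: x * y, numbers)   # 1*2*3*4*5 = 120
print(total, product)   # 15 120
```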
Lambda with sorted(): sorted() can be used to sort the elements of an iterable in ascending or descending (reverse) order.
Syntax: sorted(iterable, key=None, reverse=False)
The key argument is optional and takes a function; we can use a lambda function for this.
For example, with key=lambda x: x % 5, each element in the iterable is sorted based on its remainder after dividing by 5. This criterion is given by the lambda function in the key argument.
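Sorting by remainder after dividing by 5 (the sample numbers are arbitrary). Elements with equal keys, such as 12 and 7 (both remainder 2), keep their original relative order because Python's sort is stable, also when reverse=True:

```python
numbers = [12, 7, 20, 9, 16, 3]

by_remainder = sorted(numbers, key=lambda n: n % 5)
print(by_remainder)   # [20, 16, 12, 7, 3, 9]   remainders 0, 1, 2, 2, 3, 4

descending = sorted(numbers, key=lambda n: n % 5, reverse=True)
print(descending)     # [9, 3, 12, 7, 16, 20]
```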
A dictionary is a collection which is ordered (as of Python 3.7), changeable and does not allow duplicate keys.
The zip() function combines elements from several iterables into tuples. The iterator stops when the shortest input iterable is exhausted, so zip() does not raise an error when the iterables have different lengths. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator.
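The three zip() behaviours described above, with hypothetical names and marks:

```python
names = ["Asha", "Ravi", "Meena"]
marks = [88, 72]                 # one item shorter

print(list(zip(names, marks)))   # [('Asha', 88), ('Ravi', 72)]  stops at the shortest
print(list(zip(names)))          # [('Asha',), ('Ravi',), ('Meena',)]  1-tuples
print(list(zip()))               # []
```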
PRACTICE
Given a list of fruits, their price per kilo and the quantities in kilos that you purchased, print the total amount spent on each item.
Use a for loop, e.g. for fruit, price, quantity in zip(fruits, prices, quantities):, and an f-string literal to print the output.
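One possible solution, with hypothetical fruits, prices and quantities:

```python
fruits = ["Apple", "Banana", "Mango"]
prices = [180, 50, 120]          # price per kilo
quantities = [2, 1.5, 3]         # kilos purchased

amounts = []
for fruit, price, quantity in zip(fruits, prices, quantities):
    amount = price * quantity
    amounts.append(amount)
    print(f"{fruit}: {quantity} kg x {price}/kg = {amount:.2f}")
```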
Slicing in Python:
Slicing is a feature that enables accessing parts of sequences like strings, tuples and lists.
You can also use it to modify or delete the items of mutable sequences such as lists.
Slices can also be applied to third-party objects like NumPy arrays, as well as pandas Series and DataFrames.
Slicing is similar to indexing but returns a sequence of items instead of a single item. The indices used for
slicing are also zero-based.
When you use the syntax sequence[start:stop], you get a new sequence. It starts with the item at index start (inclusive) and ends before the item at index stop.
You can use indices to modify the items of a mutable sequence. Strings and tuples are immutable, but lists are mutable.
You can access a single item of a sequence (such as a string, tuple or list) by its index. Indices are zero-based, which means that the first (leftmost) item corresponds to index 0, the second item to index 1, and so on.
There are two variants of the slicing syntax: sequence[start:stop] and
sequence[start:stop:step]
When you omit start, the value 0 is taken by default, so the slice starts at the beginning of the original sequence.
If you omit stop, the resulting sequence stops at the end of the original sequence.
We can also apply negative values of start and stop similarly as with indices.
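List slicing with a step and with negative values, and using slices to modify and delete items of a mutable list:

```python
nums = [0, 10, 20, 30, 40, 50, 60]
print(nums[2:5])      # [20, 30, 40]
print(nums[::2])      # [0, 20, 40, 60]  every second item
print(nums[-3:])      # [40, 50, 60]

nums[1:3] = [11, 22]  # replace a slice (lists are mutable)
del nums[-2:]         # delete the last two items
print(nums)           # [0, 11, 22, 30, 40]
```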
Indexing in Python:
Indices are provided inside square brackets, with the syntax sequence[index]. You can use negative indices to count from the end of a sequence; for example, index -1 refers to the last item.
Syntax: sequence[index]
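Indexing sketch: positive and negative indices, item assignment on a list, and the TypeError for an immutable string:

```python
letters = ["a", "b", "c", "d"]
print(letters[0])     # a  first item
print(letters[-1])    # d  last item
letters[1] = "B"      # lists allow item assignment
print(letters)        # ['a', 'B', 'c', 'd']

word = "data"
try:
    word[0] = "D"     # strings are immutable
except TypeError as err:
    print("TypeError:", err)
```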
NumPy Library:
Contains powerful tools for manipulation of numerical data in Python.
Has several functions that are useful for performing mathematical and logical operations on arrays with
various dimensions.
NumPy Array:
Returns a NumPy array containing the items from the specified input.
# import the NumPy library
import numpy as np
array0 = np.array([])
array0
A two-dimensional NumPy array can be created from list2, a nested list (list of lists).
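A sketch of a two-dimensional array; the contents of list2 here are a hypothetical stand-in for the original example's list:

```python
import numpy as np

list2 = [[1, 2, 3], [4, 5, 6]]   # hypothetical nested list
array2 = np.array(list2)
print(array2.shape)    # (2, 3)  two rows, three columns
print(array2[1, 2])    # 6       row 1, column 2
```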
np.arange() allows us to create a NumPy array that contains numbers from the interval that you specify.
• Start indicates the start of the interval and is included in the interval. If Start is not provided, the interval starts at 0 by default.
• Stop is a required input; it indicates the end of the interval and is excluded from the interval.
• Step is an optional input and indicates the spacing between the numbers in the interval. If Step is not provided, the default spacing is 1.
For example, we can create a NumPy array containing the integers 0 to 9, including both 0 and 9.
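Because Stop is excluded, the integers 0 to 9 come from np.arange(10); a second call shows Start and Step:

```python
import numpy as np

a = np.arange(10)             # stop only: 0, 1, ..., 9
print(a)                      # [0 1 2 3 4 5 6 7 8 9]
print(np.arange(2, 10, 3))    # [2 5 8]  start 2, stop 10 (excluded), step 3
```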
PRACTICE:
For np.zeros(), the shape provided can be either an integer or a tuple of two positive integers.
If the shape is given as an integer, the function returns a one-dimensional array containing that many zeros.
If the shape is given as a tuple, the first number indicates the number of rows and the second number indicates the number of columns, and the function returns a multi-dimensional NumPy array with as many rows and columns as specified by the shape, containing all zeros.
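Both shape forms of np.zeros(); the zeros are floats (0.0) by default:

```python
import numpy as np

z1 = np.zeros(4)          # integer shape: 1-D array of four zeros
z2 = np.zeros((2, 3))     # tuple shape: 2 rows, 3 columns
print(z1)                 # [0. 0. 0. 0.]
print(z2.shape)           # (2, 3)
```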
numpy.random.rand()
The numpy.random.rand() function creates an array of the specified shape and fills it with random values drawn uniformly from the interval [0, 1).
Syntax :
numpy.random.rand(d0, d1, ..., dn)
Parameters :
d0, d1, ..., dn : [int, optional] Dimension of the returned array we require,
If no argument is given a single Python float is returned.
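A sketch of numpy.random.rand() with zero, one and two dimension arguments. The values differ on every run, so only the shapes and the [0, 1) range are predictable:

```python
import numpy as np

print(np.random.rand())       # a single float
r = np.random.rand(3)         # 1-D array of 3 values
m = np.random.rand(2, 4)      # 2 x 4 array
print(r.shape, m.shape)       # (3,) (2, 4)
print(((m >= 0) & (m < 1)).all())   # True: values lie in [0, 1)
```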
NumPy: Random Plot
PANDAS Library:
• Powerful open-source tool used by data analysts and data scientists
• Provides data structures that make working with data easy and intuitive
• Contains functions that enable a user to store data, manipulate data, access data, analyze data
DataFrame is a two-dimensional labeled data structure with columns that can hold any data type.
Series object: an ordered, one-dimensional labeled array of data with an index. A Series can hold any data type, but all the data in a single Series share the same data type (dtype). The pandas Series.to_frame() function converts a Series object to a DataFrame.
#pandas.Series
Syntax:
series = pd.Series(data=None, index=None, dtype=None, name=None)
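A sketch of creating a Series with the data, index and name parameters, then converting it with to_frame(); the names and marks are hypothetical:

```python
import pandas as pd

marks = pd.Series(data=[82, 67, 91],
                  index=["Anu", "Bala", "Chitra"],
                  name="Marks")
print(marks)
print(marks["Bala"])      # 67  access a value by its index label

df = marks.to_frame()     # one-column DataFrame; the column is named "Marks"
print(df.shape)           # (3, 1)
```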
DataFrame object: The pandas DataFrame is a two-dimensional table of data with column and row indexes. The
columns are made up of pandas Series objects. DataFrame is a two-dimensional labeled data structure with columns of
potentially different types.
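A sketch of a DataFrame whose columns have different types; each column, taken on its own, is a Series (the data is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anu", "Bala", "Chitra"],   # text column
    "Age": [21, 23, 22],                 # integer column
    "Score": [82.5, 67.0, 91.5],         # float column
})
print(df)
print(df.dtypes)           # one dtype per column
print(type(df["Age"]))     # <class 'pandas.core.series.Series'>
```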
Pandas Series | Pandas DataFrame
One-dimensional | Two-dimensional
Homogeneous: Series elements must be of the same data type. | Heterogeneous: DataFrame elements can have different data types.
Size-immutable: once created, the size of a Series object cannot be changed. | Size-mutable: elements can be dropped or added in an existing DataFrame.
• One way to deal with empty cells is to remove rows that contain empty cells. This is usually
OK, since data sets can be very big, and removing a few rows will not have a big impact on
the result.
• By default, the dropna() method returns a new DataFrame and does not change the original. If you want to change the original DataFrame, use the inplace = True argument.
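A sketch of dropna() with and without inplace=True, on a small in-memory DataFrame with hypothetical values instead of a CSV file:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Nifty": [100.5, np.nan, 102.0],
                   "Sensex": [300.0, 305.0, np.nan]})

cleaned = df.dropna()          # new DataFrame without rows containing NaN
print(len(df), len(cleaned))   # 3 1  the original is unchanged

df.dropna(inplace=True)        # modifies df itself and returns None
print(len(df))                 # 1
```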
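import pandas as pd
# replace the empty cells in every column with 100
df = pd.read_csv('……….csv')
df.fillna(100, inplace = True)
# replace the empty cells only in the 'Nifty' column
df = pd.read_csv("/content/sample_data/inflationdata.csv")
df['Nifty'] = df['Nifty'].fillna(100)
print(df.to_string())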
Calculate the MEAN, and replace any empty values with it:
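A common way to replace empty cells is to calculate the mean value of the column and use it as the replacement:
df = pd.read_csv('-----.csv')
x = df["Nifty"].mean()
df["Nifty"] = df["Nifty"].fillna(x)
Passing inplace = True (as in df.fillna(100, inplace = True)) makes a method modify the original DataFrame instead of returning a new one. For a single column, assigning the result back, as in df["Nifty"] = df["Nifty"].fillna(x), has the same effect.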
Calculate the MEDIAN, and replace any empty values with it:
A common way to replace empty cells is to calculate the median value of the column and use it as the replacement:
import pandas as pd
df = pd.read_csv('-----.csv')
x = df["Nifty"].median()
df["Nifty"] = df["Nifty"].fillna(x)
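The three replacement strategies side by side, on a small in-memory column with hypothetical values so the results are easy to check (the mean of 10, 20, 60 is 30.0, the median is 20.0; NaN is skipped by mean() and median()):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Nifty": [10.0, np.nan, 20.0, 60.0]})

filled_const = df["Nifty"].fillna(100)                     # a fixed value
filled_mean = df["Nifty"].fillna(df["Nifty"].mean())       # 30.0
df["Nifty"] = df["Nifty"].fillna(df["Nifty"].median())     # 20.0, stored back

print(filled_const.tolist())   # [10.0, 100.0, 20.0, 60.0]
print(filled_mean.tolist())    # [10.0, 30.0, 20.0, 60.0]
print(df["Nifty"].tolist())    # [10.0, 20.0, 20.0, 60.0]
```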
WRONG Data
"Wrong data" does not have to be "empty cells" or "wrong format"; it can just be wrong. For example, someone may have made a typo while entering personal information such as age: instead of 24.2 years, the person may have entered 242.
To replace wrong data for larger data sets you can create some rules, e.g. set some boundaries
for legal values, and replace any values that are outside of the boundaries.
Example:
Loop through all values in the "Duration" column. If the value is higher than 120, set it to 120:
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
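A runnable version of the loop on a small hypothetical "Duration" column. As an alternative, Series.clip(upper=120) applies the same boundary without an explicit loop:

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 450, 30, 130]})

for x in df.index:
    if df.loc[x, "Duration"] > 120:      # value outside the legal boundary
        df.loc[x, "Duration"] = 120

print(df["Duration"].tolist())           # [60, 45, 120, 30, 120]

# Same rule without a loop
clipped = pd.Series([60, 45, 450, 30, 130]).clip(upper=120)
print(clipped.tolist())                  # [60, 45, 120, 30, 120]
```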
Discovering Duplicates:
Duplicate rows are rows that have been registered more than once. To discover duplicates, we can use the duplicated() method, which returns a Boolean value for each row: True if the row repeats an earlier row, otherwise False.
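A sketch of duplicated() on hypothetical data, followed by drop_duplicates(), which keeps only the first occurrence of each repeated row:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anu", "Bala", "Anu"],
                   "Age": [21, 23, 21]})

dups = df.duplicated()
print(dups.tolist())            # [False, False, True]  row 2 repeats row 0

deduped = df.drop_duplicates()
print(len(deduped))             # 2
```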