Pandas Cheat Sheet

The document provides information about data analysis and visualization using Pandas. It covers topics like loading and exporting data, data wrangling operations, merging and aggregating data, and creating basic visualizations.

Uploaded by

Ananda Saikia

Text by Kristian Rother, Thomas Lotze (CC-BY-SA 4.0)
https://www.cusy.io/de/seminare

All of the following code examples refer to this table:

df =
   col1  col2
A     1     4
B     2     5
C     3     6

Getting started

Import pandas:
import pandas as pd

Create a series:
s = pd.Series([1, 2, 3], index=['A', 'B', 'C'],
              name='col1')

Create a data frame:
data = [[1, 4], [2, 5], [3, 6]]
index = ['A', 'B', 'C']
df = pd.DataFrame(data, index=index,
                  columns=['col1', 'col2'])

Load a data frame:
df = pd.read_csv('filename.csv',
                 sep=',',
                 names=['col1', 'col2'],
                 index_col=0,
                 encoding='utf-8',
                 nrows=3)

Data export

Data as NumPy array:
df.values

Save data as CSV file:
df.to_csv('output.csv', sep=",")

Format a data frame as tabular string:
df.to_string()

Convert a data frame to a dictionary:
df.to_dict()

Save a data frame as an Excel table:
df.to_excel('output.xlsx')
(requires package openpyxl)

Visualization

Import matplotlib:
import matplotlib.pyplot as plt

Start a new diagram:
plt.figure()

Scatter plot:
df.plot.scatter(x='col1', y='col2', c='red')

Bar plot:
df.plot.bar(x='col1', y='col2', width=0.7)

Area plot:
df.plot.area(stacked=True, alpha=1.0)

Box-and-whisker plot:
df.plot.box()

Histogram over one column:
df['col1'].plot.hist(bins=3)

Histogram over all columns:
df.plot.hist(bins=3, alpha=0.5)

Set tick marks:
labels = ['A', 'B', 'C', 'D']
positions = [1.0, 2.0, 3.0, 4.0]
plt.xticks(positions, labels)
plt.yticks(positions, labels)

Select area to plot:
plt.axis([0.0, 2.5, 0.0, 10.0])
# [from x, to x, from y, to y]

Label diagram and axes:
plt.title('Correlation')
plt.xlabel('Nunstück')
plt.ylabel('Slotermeyer')

Save most recent diagram:
plt.savefig('plot.png')
plt.savefig('plot.png', dpi=300)
plt.savefig('plot.svg')

Selecting rows and columns

Select single column:
df['col1']
Select multiple columns:
df[['col1', 'col2']]

Show first n rows:
df.head(2)

Show last n rows:
df.tail(2)

Select rows by index values:
df.loc['A']
df.loc[['A', 'B']]

Select rows by position:
df.iloc[1]
df.iloc[1:]

Summary statistics

Count unique values:
df['col1'].value_counts()

Summarize descriptive statistics:
df.describe()

Hierarchical indexing

Create hierarchical index:
df.stack()

Dissolve hierarchical index:
df.unstack()

Merging data frames

Merge multiple data frames horizontally:
df3 = pd.DataFrame([[1, 7], [8, 9]],
                   index=['B', 'D'],
                   columns=['col1', 'col3'])

Only merge complete rows (INNER JOIN):
df.merge(df3)

Left table stays complete (LEFT OUTER JOIN):
df.merge(df3, how='left')

Right table stays complete (RIGHT OUTER JOIN):
df.merge(df3, how='right')

Preserve all values (OUTER JOIN):
df.merge(df3, how='outer')
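
The four join types above are easiest to tell apart by running them on the same pair of tables. The following is a minimal sketch that rebuilds the cheat sheet's df and df3 and compares the results. The indicator=True argument is an addition not used in the cheat sheet; it is a standard option of DataFrame.merge that labels where each row came from.

```python
import pandas as pd

# The cheat sheet's example tables.
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'], columns=['col1', 'col2'])
df3 = pd.DataFrame([[1, 7], [8, 9]],
                   index=['B', 'D'], columns=['col1', 'col3'])

# Without an `on` argument, merge joins on the shared column 'col1'.
inner = df.merge(df3)                # only col1 == 1 occurs in both tables
left = df.merge(df3, how='left')     # every row of df, NaN in col3 where unmatched
right = df.merge(df3, how='right')   # every row of df3, NaN in col2 where unmatched
outer = df.merge(df3, how='outer')   # all keys from both tables

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only' or 'right_only' for each row.
tagged = df.merge(df3, how='outer', indicator=True)

for name, frame in [('inner', inner), ('left', left),
                    ('right', right), ('outer', outer)]:
    print(name, frame.shape)
print(tagged)
```

Merging on a column discards the original row labels ('A', 'B', ...) and returns a fresh integer index. To join on the row labels instead, pass left_index=True and right_index=True.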
Data wrangling

Filter by value:
df[df['col1'] > 1]

Sort by columns:
df.sort_values(['col1', 'col2'], ascending=[False, True])

Identify duplicate rows:
df.duplicated()

Identify unique values in a column:
df['col1'].unique()

Swap rows and columns:
df = df.transpose()

Remove a column:
del df['col2']

Clone a data frame:
clone = df.copy()

Connect multiple data frames vertically:
df2 = df + 10
pd.concat([df, df2])

Merge rows by index:
df.merge(df3, left_index=True, right_index=True)

Fill NaN values:
df.fillna(0.0)

Apply your own function:
def func(x): return 2**x
df.apply(func)

Arithmetic and statistics

Add to all values:
df + 10

Sum over columns:
df.sum()

Cumulative sum over columns:
df.cumsum()

Mean over columns:
df.mean()

Standard deviation over columns:
df.std()

Aggregation

Create group object:
g = df.groupby('col1')

Iterate over groups:
for i, group in g:
    print(i, group)

Aggregate groups:
g.sum()
g.prod()
g.mean()
g.std()
g.describe()

Select columns from groups:
g['col2'].sum()
g[['col2', 'col3']].sum()  # assumes df also has a column 'col3'

Transform values:
import numpy as np
g.transform(np.log)

Apply a list function on each group:
def strsum(group):
    return ''.join([str(x) for x in group.values])
g['col2'].apply(strsum)
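
In the example table every value of col1 is unique, so each group holds a single row and the aggregation results are hard to read. The sketch below uses a small hypothetical table, named sales here for illustration only, in which the key column repeats. It walks through the same groupby calls: aggregate, select columns, transform, and apply.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the cheat sheet): 'col1' repeats,
# so each group contains two rows.
sales = pd.DataFrame({'col1': ['x', 'x', 'y', 'y'],
                      'col2': [1, 2, 3, 4],
                      'col3': [10.0, 20.0, 30.0, 40.0]})
g = sales.groupby('col1')

# Aggregation collapses each group to one row.
totals = g.sum()

# Selecting a column first yields a Series indexed by the group key.
col2_sum = g['col2'].sum()

# transform keeps one output row per input row.
logs = g[['col2', 'col3']].transform(np.log)

def strsum(group):
    """Concatenate all values of a group into one string."""
    return ''.join(str(x) for x in group.values)

# apply calls the function once per group.
joined = g['col2'].apply(strsum)

print(totals)
print(joined)
```

The difference to keep in mind is the shape of the result: sum and apply return one entry per group, while transform returns one entry per original row, which makes it suitable for adding a derived column back to the source table.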
