Scenario-Based Python Interview Questions for Data Analysis Roles (Freshers)
1. Scenario: You receive a CSV file with missing values, inconsistent casing, and duplicate
rows. How would you clean this data using Python?
Answer:
import pandas as pd
df = pd.read_csv('[Link]')
# Remove duplicates
df = df.drop_duplicates()
# Standardize casing (e.g., for a 'Name' column)
df['Name'] = df['Name'].[Link]()
# Handle missing values
df = [Link](method='ffill')
2. Scenario: You have a sales dataset with columns: Date, Product, and Revenue. How would
you find the top 3 products with the highest average monthly revenue?
Answer:
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.to_period('M')
monthly_avg = [Link](['Month', 'Product'])['Revenue'].mean().reset_index()
top_products =
monthly_avg.groupby('Product')['Revenue'].mean().sort_values(ascending=False).head(3)
3. Scenario: How would you prepare customer data with demographic info and activity logs
for a churn prediction model?
Answer:
- Handle missing values
- Convert categorical features to numeric using pd.get_dummies()
- Normalize/scale numerical features
- Merge datasets if activity logs are separate
- Label churn (e.g., Churn = 1 if customer left, else 0)
4. Scenario: You suspect some products have incorrect prices in a dataset. How would you
identify and handle outliers?
Answer:
Q1 = df['Price'].quantile(0.25)
Q3 = df['Price'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['Price'] < Q1 - 1.5*IQR) | (df['Price'] > Q3 + 1.5*IQR)]
df = df[~[Link]([Link])]
5. Scenario: You have two datasets: [Link] and [Link]. How would you
combine them to analyze total spending per user?
Answer:
users = pd.read_csv('[Link]')
transactions = pd.read_csv('[Link]')
merged = [Link](users, transactions, on='user_id')
spending = [Link]('user_id')['amount'].sum()
6. Scenario: You have daily temperature data. How would you visualize trends and seasonal
patterns?
Answer:
import [Link] as plt
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
[Link](figsize=(10,5))
[Link](df['Temperature'])
[Link]('Daily Temperature Trends')
[Link]('Date')
[Link]('Temperature')
[Link]()
7. Scenario: You have a column 'Country' with many categories. How would you prepare this
for machine learning?
Answer:
# Use OneHotEncoder or pd.get_dummies
df = pd.get_dummies(df, columns=['Country'], drop_first=True)
8. Scenario: Your dataset has a column 'Join_Date'. What features can you extract from it?
Answer:
df['Join_Date'] = pd.to_datetime(df['Join_Date'])
df['Year'] = df['Join_Date'].[Link]
df['Month'] = df['Join_Date'].[Link]
df['Weekday'] = df['Join_Date'].dt.day_name()
df['Join_Quarter'] = df['Join_Date'].[Link]