Complete Steps for Descriptive Analysis (with Data Cleaning)
Step 1: Data Cleaning & Preparation
• Check for missing values (e.g., salaries, experience, gender diversity %).
• Handle duplicates (remove or consolidate repeated job records).
• Validate data types (salary numeric, percentages floats, etc.).
• Detect and review outliers (e.g., extreme salaries, unrealistic years of experience).
• Standardize categories (industries, education levels, locations).
• Normalize text case and variant spellings of the same label if needed (e.g., USA vs. United States).
• Create derived variables if needed (e.g., Growth = Projected 2030 openings – 2024 openings).
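The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the dataset's actual pipeline: the column names (salary_usd, openings_2024, projected_openings_2030, and so on), the location_map dictionary, and the four toy rows are all assumptions made up for the example. The 1.5 × IQR rule is one common way to flag outliers; other thresholds may suit the data better.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Analyst", "Nurse", "Welder"],
    "location": ["USA", "USA", "united states", "Canada"],
    "salary_usd": ["70000", "70000", None, "52000"],
    "openings_2024": [100, 100, 250, 80],
    "projected_openings_2030": [130, 130, 300, 60],
})

# 1. Count missing values per column.
missing_counts = df.isna().sum()

# 2. Drop exact duplicate records.
df = df.drop_duplicates().reset_index(drop=True)

# 3. Coerce salary to numeric; unparseable entries become NaN.
df["salary_usd"] = pd.to_numeric(df["salary_usd"], errors="coerce")

# 4. Standardize location labels: trim, lowercase, then map known variants.
#    Labels missing from this hypothetical map become NaN, so extend it as needed.
location_map = {"usa": "United States", "united states": "United States", "canada": "Canada"}
df["location"] = df["location"].str.strip().str.lower().map(location_map)

# 5. Flag salary outliers with the 1.5 * IQR rule (flag for review, do not delete).
q1, q3 = df["salary_usd"].quantile([0.25, 0.75])
iqr = q3 - q1
df["salary_outlier"] = (df["salary_usd"] < q1 - 1.5 * iqr) | (df["salary_usd"] > q3 + 1.5 * iqr)

# 6. Derived variable: projected change in openings between 2024 and 2030.
df["growth"] = df["projected_openings_2030"] - df["openings_2024"]
```

Flagging outliers in a separate column, rather than dropping them, keeps the review step explicit and leaves the original values available for Step 3.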
Step 2: Dataset Overview
• Number of rows (30,000) and columns (13).
• Variables: Job title, industry, job status, AI impact level, salary, required education, experience, job openings (2024 and projected 2030), remote work ratio, automation risk, location, and gender diversity.
• Types of data:
• - Categorical: Job Title, Industry, Job Status, AI Impact Level, Required Education, Location.
• - Numerical: Salary, Experience, Openings, Remote Work %, Automation Risk %, Gender
Diversity %.
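A quick way to confirm the overview is to read the shape and split columns by dtype. The sketch below uses a three-row toy frame with assumed column names; on the real file the same calls would be expected to report 30,000 rows and 13 columns.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "job_title": ["Data Analyst", "Nurse", "Welder"],
    "industry": ["IT", "Healthcare", "Manufacturing"],
    "salary_usd": [70000.0, 65000.0, 52000.0],
    "automation_risk_pct": [35.5, 12.0, 60.2],
})

# Number of rows and columns.
n_rows, n_cols = df.shape

# Split columns into numerical and non-numerical (treated here as categorical).
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(f"{n_rows} rows, {n_cols} columns")
print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)
```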
Step 3: Descriptive Statistics
• For Numerical Variables: Central tendency (mean, median, mode); spread (standard deviation, min, max, range, quartiles); distribution shape via histograms (e.g., salary distribution, automation risk).
• For Categorical Variables: Frequencies, proportions/percentages.
• Examples:
• - Average salary across industries.
• - Median automation risk for different jobs.
• - Distribution of gender diversity percentages.
• - Top 5 industries with most job entries.
• - Percentage of jobs requiring a Master’s degree.
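The statistics listed in this step map onto a handful of pandas calls. The following is a sketch on five made-up rows; the column names and the education labels are assumptions for illustration.

```python
import pandas as pd

# Toy rows for illustration only; column names and labels are assumptions.
df = pd.DataFrame({
    "industry": ["IT", "IT", "Healthcare", "Healthcare", "Retail"],
    "required_education": ["Master's Degree", "Bachelor's Degree",
                           "Master's Degree", "Bachelor's Degree", "High School"],
    "salary_usd": [90000, 70000, 65000, 60000, 40000],
})

salary = df["salary_usd"]

# Numerical summary: count, mean, std, min, quartiles, max.
summary = salary.describe()
salary_median = salary.median()
salary_range = salary.max() - salary.min()
salary_mode = salary.mode()  # a Series, since several values can tie for the mode

# Categorical summary: frequencies and percentages.
industry_counts = df["industry"].value_counts()
top_industries = industry_counts.head(5)
education_share = df["required_education"].value_counts(normalize=True) * 100
masters_pct = education_share.get("Master's Degree", 0.0)
```

Using value_counts(normalize=True) gives proportions directly, so the percentage for any single category can be read off without a separate division.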
Step 4: Cross-Tabulation / Group Analysis
• Mean salary by industry.
• Average automation risk by AI impact level.
• Remote work ratio by job status.
• Education level vs. average years of experience.
• Industry vs. gender diversity %.
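Group comparisons like these are typically one groupby per question. The sketch below assumes hypothetical column names and category labels (for example "Increasing" and "Decreasing" for job status) on a four-row toy frame.

```python
import pandas as pd

# Toy rows for illustration only; column names and category labels are assumptions.
df = pd.DataFrame({
    "industry": ["IT", "IT", "Healthcare", "Healthcare"],
    "ai_impact_level": ["High", "Low", "Low", "High"],
    "job_status": ["Increasing", "Decreasing", "Increasing", "Increasing"],
    "salary_usd": [90000, 70000, 65000, 60000],
    "automation_risk_pct": [60.0, 20.0, 10.0, 50.0],
    "remote_work_pct": [80.0, 40.0, 10.0, 30.0],
})

# Mean of a numerical variable within each category.
mean_salary_by_industry = df.groupby("industry")["salary_usd"].mean()
risk_by_impact = df.groupby("ai_impact_level")["automation_risk_pct"].mean()
remote_by_status = df.groupby("job_status")["remote_work_pct"].mean()

# Cross-tabulation of two categorical variables (counts per combination).
industry_by_impact = pd.crosstab(df["industry"], df["ai_impact_level"])
```

The same pattern covers the remaining pairs in this step, such as education level against mean years of experience.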
Step 5: Trends & Comparisons
• Compare Job Openings 2024 vs. Projected 2030 → growth/decline.
• Correlation between automation risk and salary.
• Relationship between remote work ratio and gender diversity.
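A minimal sketch of the growth comparison and the correlation check, again on toy rows with assumed column names. Pearson's r measures linear association; the rank-based variant shown here (Pearson computed on ranks, which is Spearman's rho) is an optional addition that is less sensitive to extreme salaries.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "industry": ["IT", "Healthcare", "Retail", "Manufacturing"],
    "openings_2024": [100, 200, 150, 80],
    "projected_openings_2030": [150, 220, 120, 60],
    "salary_usd": [90000, 65000, 40000, 52000],
    "automation_risk_pct": [30.0, 15.0, 70.0, 60.0],
})

# Absolute and relative change in openings, plus a growth/decline label.
df["growth"] = df["projected_openings_2030"] - df["openings_2024"]
df["growth_pct"] = df["growth"] / df["openings_2024"] * 100
df["trend"] = df["growth"].apply(
    lambda g: "growth" if g > 0 else "decline" if g < 0 else "flat"
)

# Linear (Pearson) correlation between automation risk and salary.
pearson_r = df["automation_risk_pct"].corr(df["salary_usd"])

# Rank correlation: Pearson on ranks is equivalent to Spearman's rho.
spearman_r = df["automation_risk_pct"].rank().corr(df["salary_usd"].rank())
```

A correlation coefficient describes association only; it does not show that automation risk drives salary or the reverse.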
Step 6: Visualization
• Bar charts: Number of jobs per industry, required education distribution.
• Boxplots: Salary distribution per industry.
• Heatmaps: Correlation matrix (salary, experience, automation risk, remote work).
• Line charts: Job openings 2024 vs. 2030 (growth by industry).
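The four chart types above could be drawn on one figure as sketched below, using matplotlib through pandas. The data and column names are made up for illustration, and the heatmap is built with plain imshow rather than a dedicated plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "industry": ["IT", "IT", "Healthcare", "Healthcare", "Retail", "Retail"],
    "salary_usd": [90000, 70000, 65000, 60000, 40000, 42000],
    "experience_years": [5, 3, 4, 6, 1, 2],
    "automation_risk_pct": [30.0, 40.0, 15.0, 20.0, 70.0, 80.0],
    "remote_work_pct": [80.0, 60.0, 10.0, 20.0, 5.0, 15.0],
    "openings_2024": [100, 50, 200, 100, 150, 90],
    "projected_openings_2030": [150, 70, 220, 130, 120, 60],
})

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Bar chart: number of job records per industry.
df["industry"].value_counts().plot(kind="bar", ax=axes[0, 0], title="Jobs per industry")

# Boxplot: salary distribution per industry.
df.boxplot(column="salary_usd", by="industry", ax=axes[0, 1])
axes[0, 1].set_title("Salary by industry")
fig.suptitle("")  # clear the automatic "grouped by" title that pandas adds

# Heatmap: correlation matrix of selected numerical variables.
num_cols = ["salary_usd", "experience_years", "automation_risk_pct", "remote_work_pct"]
corr = df[num_cols].corr()
im = axes[1, 0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 0].set_xticks(range(len(num_cols)))
axes[1, 0].set_xticklabels(num_cols, rotation=45, ha="right")
axes[1, 0].set_yticks(range(len(num_cols)))
axes[1, 0].set_yticklabels(num_cols)
axes[1, 0].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[1, 0])

# Line chart: total openings per industry, 2024 vs. projected 2030.
openings = df.groupby("industry")[["openings_2024", "projected_openings_2030"]].sum()
openings.columns = ["2024", "2030"]
openings.T.plot(ax=axes[1, 1], marker="o", title="Openings: 2024 vs. projected 2030")

fig.tight_layout()
```

Call fig.savefig with a file name to write the figure to disk, or switch to an interactive backend and use plt.show() to view it.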
Step 7: Summary Insights
• Which industries pay the highest median salary.
• Which industries are most at risk of automation.
• Where (countries/locations) AI jobs are most concentrated.
• How required education affects salary and automation risk.
• Which industries/jobs are projected to see the biggest job growth and the biggest job losses by 2030.
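Several of these questions reduce to aggregating per industry and then ranking. The following is a sketch under the same assumptions as earlier (hypothetical column names, toy rows), not a statement about what the real data shows.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "industry": ["IT", "IT", "Healthcare", "Retail", "Retail"],
    "salary_usd": [90000, 70000, 65000, 40000, 42000],
    "automation_risk_pct": [30.0, 40.0, 15.0, 70.0, 80.0],
    "openings_2024": [100, 50, 200, 150, 90],
    "projected_openings_2030": [150, 70, 220, 120, 60],
})

# One row per industry with the measures the summary questions need.
by_industry = df.groupby("industry").agg(
    median_salary=("salary_usd", "median"),
    mean_automation_risk=("automation_risk_pct", "mean"),
    openings_2024=("openings_2024", "sum"),
    projected_openings_2030=("projected_openings_2030", "sum"),
)
by_industry["growth"] = by_industry["projected_openings_2030"] - by_industry["openings_2024"]

# Rank industries on each question.
highest_paying = by_industry["median_salary"].idxmax()
most_at_risk = by_industry["mean_automation_risk"].idxmax()
biggest_growth = by_industry["growth"].nlargest(1)
biggest_loss = by_industry["growth"].nsmallest(1)
```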