Descriptive Analysis Steps With Cleaning

The document outlines a comprehensive process for conducting descriptive analysis with a focus on data cleaning and preparation, including steps such as checking for missing values, handling duplicates, and validating data types. It details the overview of the dataset, descriptive statistics for numerical and categorical variables, cross-tabulation for group analysis, trends and comparisons, visualization techniques, and summary insights regarding industry salaries and job projections. The analysis aims to provide actionable insights into job market trends and automation risks.

Uploaded by

Jansdale Yusi

Complete Steps for Descriptive Analysis (with Data Cleaning)

Step 1: Data Cleaning & Preparation


• Check for missing values (e.g., salaries, experience, gender diversity %).
• Handle duplicates (remove or consolidate repeated job records).
• Validate data types (salary numeric, percentages floats, etc.).
• Detect and review outliers (e.g., extreme salaries, unrealistic years of experience).
• Standardize categories (industries, education levels, locations).
• Normalize text case and naming variants if needed (e.g., USA vs. United States).
• Create derived variables if needed (e.g., Growth = Projected 2030 openings – 2024 openings).
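
The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the author's actual pipeline: the column names (`salary`, `experience_years`, `openings_2024`, `projected_openings_2030`, and so on), the toy rows, and the outlier thresholds are all assumptions made for the example.

```python
import pandas as pd

# Toy rows invented for illustration; column names are assumptions, not the real schema.
raw = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Analyst", "Nurse", "Welder", "Teacher"],
    "industry": ["Technology", "Technology", "healthcare ", "Manufacturing", "Education"],
    "location": ["USA", "USA", "United States", "Canada", "usa"],
    "salary": ["85000", "85000", "72000", None, "9900000"],
    "experience_years": [3, 3, 5, 60, 10],
    "openings_2024": [1200, 1200, 800, 500, 900],
    "projected_openings_2030": [1500, 1500, 1000, 350, 900],
})

# Check for missing values per column before changing anything.
missing_per_column = raw.isna().sum()


def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Validate data types: salary should be numeric; unparseable values become NaN.
    df["salary"] = pd.to_numeric(df["salary"], errors="coerce")
    # Standardize categories: trim whitespace, unify case, map known variants.
    df["industry"] = df["industry"].str.strip().str.title()
    df["location"] = (
        df["location"].str.strip().str.upper().replace({"UNITED STATES": "USA"})
    )
    # Handle duplicates: drop exact repeats once the labels are standardized.
    df = df.drop_duplicates().reset_index(drop=True)
    # Flag outliers for review instead of deleting them (thresholds are assumptions).
    df["salary_outlier"] = df["salary"] > 1_000_000
    df["experience_outlier"] = df["experience_years"] > 50
    # Derived variable: projected 2030 openings minus 2024 openings.
    df["growth"] = df["projected_openings_2030"] - df["openings_2024"]
    return df


cleaned = clean(raw)
print(missing_per_column)
print(cleaned)
```

Flagging outliers rather than dropping them keeps the "review" part of the step explicit: an extreme salary may be a data-entry error or a genuine value, and that decision is better made by a person than by a threshold.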

Step 2: Dataset Overview


• Number of rows (30,000) and columns (13).
• Variables: Job title, industry, job status, AI impact level, salary, education, experience,
openings, automation risk, location, gender diversity, etc.
• Types of data:
  - Categorical: Job Title, Industry, Job Status, AI Impact Level, Required Education, Location.
  - Numerical: Salary, Experience, Openings, Remote Work %, Automation Risk %, Gender Diversity %.
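
A dataset overview of this kind might be produced as sketched below. The three-row frame is a made-up stand-in for the real 30,000-row dataset, and the column names are assumptions chosen for the example.

```python
import pandas as pd

# Tiny stand-in for the real dataset; names and values are illustrative assumptions.
df = pd.DataFrame({
    "job_title": ["Data Analyst", "Nurse", "Welder"],
    "industry": ["Technology", "Healthcare", "Manufacturing"],
    "required_education": ["Bachelor's", "Bachelor's", "High School"],
    "salary": [85000, 72000, 51000],
    "experience_years": [3, 5, 8],
    "automation_risk_pct": [35.5, 12.0, 64.2],
})

# Number of rows and columns.
n_rows, n_cols = df.shape

# Split the variables into numerical and categorical by dtype.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(f"{n_rows} rows, {n_cols} columns")
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
print(df.dtypes)
```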

Step 3: Descriptive Statistics


• For Numerical Variables: Mean, median, mode; Standard deviation, min, max, range, quartiles;
Histograms (e.g., salary distribution, automation risk).
• For Categorical Variables: Frequencies, proportions/percentages.
• Examples:
  - Average salary across industries.
  - Median automation risk for different jobs.
  - Distribution of gender diversity percentages.
  - Top 5 industries with the most job entries.
  - Percentage of jobs requiring a Master’s degree.
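
The statistics listed in this step could be computed along these lines. The data are invented and the column names (`industry`, `required_education`, `salary`) are assumptions; the calls themselves are standard pandas methods.

```python
import pandas as pd

# Invented rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Health", "Health", "Finance", "Tech"],
    "required_education": ["Bachelor's", "Master's", "Bachelor's",
                           "Master's", "PhD", "Bachelor's"],
    "salary": [90000, 110000, 70000, 80000, 120000, 100000],
})

# Numerical variable: count, mean, std, min, quartiles, and max in one call.
salary_summary = df["salary"].describe()
salary_median = df["salary"].median()
salary_range = df["salary"].max() - df["salary"].min()

# Categorical variable: frequencies, percentages, and mode.
industry_counts = df["industry"].value_counts()
top_industries = industry_counts.head(5)  # top 5 industries by job entries
education_pct = df["required_education"].value_counts(normalize=True) * 100
education_mode = df["required_education"].mode().tolist()
pct_masters = education_pct.get("Master's", 0.0)

# Example from the list: average salary across industries.
avg_salary_by_industry = df.groupby("industry")["salary"].mean()

print(salary_summary)
print(top_industries)
print(avg_salary_by_industry)
print(f"Jobs requiring a Master's degree: {pct_masters:.1f}%")
```

A histogram of any numerical column (for example `df["salary"].plot(kind="hist")`, which needs matplotlib installed) complements these figures by showing the shape of the distribution.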

Step 4: Cross-Tabulation / Group Analysis


• Mean salary by industry.
• Average automation risk by AI impact level.
• Remote work ratio by job status.
• Education level vs. average years of experience.
• Industry vs. gender diversity %.
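
Group comparisons like these map naturally onto `groupby`, `pd.crosstab`, and `pivot_table`. The sketch below uses invented rows; the column names and the category labels (such as "Increasing" for job status) are assumptions, not values taken from the dataset.

```python
import pandas as pd

# Invented rows; column names and category labels are assumptions.
df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Health", "Health"],
    "ai_impact_level": ["High", "Low", "High", "Low"],
    "job_status": ["Increasing", "Decreasing", "Increasing", "Increasing"],
    "salary": [100000, 90000, 70000, 80000],
    "automation_risk_pct": [60.0, 20.0, 50.0, 10.0],
    "remote_work_pct": [80.0, 40.0, 20.0, 30.0],
})

# One numeric variable summarized per group.
mean_salary_by_industry = df.groupby("industry")["salary"].mean()
risk_by_impact = df.groupby("ai_impact_level")["automation_risk_pct"].mean()
remote_by_status = df.groupby("job_status")["remote_work_pct"].mean()

# Cross-tabulation: how many jobs fall in each industry / impact-level cell.
industry_by_impact = pd.crosstab(df["industry"], df["ai_impact_level"])

# Two-way table of a numeric variable: mean salary per industry and impact level.
salary_pivot = df.pivot_table(
    index="industry", columns="ai_impact_level", values="salary", aggfunc="mean"
)

print(mean_salary_by_industry)
print(risk_by_impact)
print(remote_by_status)
print(industry_by_impact)
print(salary_pivot)
```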

Step 5: Trends & Comparisons


• Compare Job Openings 2024 vs. Projected 2030 → growth/decline.
• Correlation between automation risk and salary.
• Relationship between remote work ratio and gender diversity.
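
The comparisons in this step could be sketched as follows, assuming hypothetical column names and made-up values. Pearson correlation (the pandas default) is used here as one reasonable choice; the source does not say which correlation measure is intended.

```python
import pandas as pd

# Invented rows; column names are assumptions.
df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Retail", "Retail"],
    "openings_2024": [1000, 500, 800, 400],
    "projected_openings_2030": [1300, 700, 600, 300],
    "salary": [100000, 120000, 40000, 50000],
    "automation_risk_pct": [20.0, 10.0, 80.0, 70.0],
    "remote_work_pct": [70.0, 80.0, 10.0, 20.0],
    "gender_diversity_pct": [45.0, 50.0, 40.0, 55.0],
})

# Openings in 2024 vs. projected 2030, totalled per industry.
by_industry = df.groupby("industry")[["openings_2024", "projected_openings_2030"]].sum()
by_industry["growth"] = by_industry["projected_openings_2030"] - by_industry["openings_2024"]
by_industry["growth_pct"] = by_industry["growth"] / by_industry["openings_2024"] * 100

# Pearson correlation; a value near -1 or +1 indicates a strong linear relationship.
risk_salary_corr = df["automation_risk_pct"].corr(df["salary"])
remote_diversity_corr = df["remote_work_pct"].corr(df["gender_diversity_pct"])

print(by_industry)
print(f"Automation risk vs. salary: {risk_salary_corr:.2f}")
print(f"Remote work vs. gender diversity: {remote_diversity_corr:.2f}")
```

A positive `growth` marks an industry projected to expand and a negative one a projected decline. Correlation describes association only and does not show that one variable causes the other.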

Step 6: Visualization


• Bar charts: Number of jobs per industry, required education distribution.
• Boxplots: Salary distribution per industry.
• Heatmaps: Correlation matrix (salary, experience, automation risk, remote work).
• Line charts: Job openings 2024 vs. 2030 (growth by industry).
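
One way to draw the four chart types is sketched below with matplotlib. The data are invented, the column names are assumptions, and the figure is rendered to an in-memory buffer with the off-screen `Agg` backend so that the sketch runs without a display; in practice one would save to a file or call `plt.show()`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows; column names are assumptions.
df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Tech", "Retail", "Retail", "Health"],
    "salary": [100000, 120000, 95000, 40000, 50000, 75000],
    "experience_years": [5, 9, 3, 2, 6, 7],
    "automation_risk_pct": [20.0, 10.0, 30.0, 80.0, 70.0, 25.0],
    "remote_work_pct": [70.0, 80.0, 60.0, 10.0, 20.0, 30.0],
    "openings_2024": [1000, 500, 300, 800, 400, 600],
    "projected_openings_2030": [1300, 700, 350, 600, 300, 700],
})

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Bar chart: number of jobs per industry.
job_counts = df["industry"].value_counts()
axes[0, 0].bar(job_counts.index.tolist(), job_counts.to_numpy())
axes[0, 0].set_title("Jobs per industry")

# Boxplot: salary distribution per industry.
industries = sorted(df["industry"].unique())
salaries = [df.loc[df["industry"] == name, "salary"].to_numpy() for name in industries]
axes[0, 1].boxplot(salaries)
axes[0, 1].set_xticks(range(1, len(industries) + 1))
axes[0, 1].set_xticklabels(industries)
axes[0, 1].set_title("Salary by industry")

# Heatmap: correlation matrix of the numerical variables.
numeric_cols = ["salary", "experience_years", "automation_risk_pct", "remote_work_pct"]
corr = df[numeric_cols].corr()
image = axes[1, 0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 0].set_xticks(range(len(numeric_cols)))
axes[1, 0].set_xticklabels(numeric_cols, rotation=45, ha="right")
axes[1, 0].set_yticks(range(len(numeric_cols)))
axes[1, 0].set_yticklabels(numeric_cols)
axes[1, 0].set_title("Correlation matrix")
fig.colorbar(image, ax=axes[1, 0])

# Line chart: openings in 2024 vs. projected 2030, one line per industry.
totals = df.groupby("industry")[["openings_2024", "projected_openings_2030"]].sum()
for industry, row in totals.iterrows():
    axes[1, 1].plot(["2024", "2030"], row.to_numpy(), marker="o", label=industry)
axes[1, 1].set_title("Job openings: 2024 vs. projected 2030")
axes[1, 1].legend()

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # write the PNG into memory instead of a file
plt.close(fig)
```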

Step 7: Summary Insights


• Which industries pay the highest median salary.
• Which industries are most at risk of automation.
• Where (countries/locations) AI jobs are most concentrated.
• How required education affects salary and automation risk.
• Which industries/jobs project the biggest job growth AND the biggest job losses to 2030.
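
Each of these questions reduces to a grouped aggregate followed by a sort or an `idxmax`/`idxmin` lookup. The sketch below uses invented rows and assumed column names, so the printed answers illustrate the method only and say nothing about the real dataset.

```python
import pandas as pd

# Invented rows; column names are assumptions and the results are not real findings.
df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Retail", "Retail", "Health", "Health"],
    "location": ["USA", "India", "USA", "USA", "Canada", "USA"],
    "required_education": ["Master's", "Bachelor's", "High School",
                           "Bachelor's", "Bachelor's", "Master's"],
    "salary": [120000, 100000, 40000, 50000, 70000, 90000],
    "automation_risk_pct": [15.0, 25.0, 80.0, 70.0, 30.0, 20.0],
    "openings_2024": [1000, 500, 800, 400, 600, 300],
    "projected_openings_2030": [1400, 600, 600, 300, 700, 350],
})

# Which industries pay the highest median salary.
median_salary = df.groupby("industry")["salary"].median().sort_values(ascending=False)

# Which industries are most at risk of automation.
mean_risk = df.groupby("industry")["automation_risk_pct"].mean().sort_values(ascending=False)

# Where jobs are most concentrated.
jobs_by_location = df["location"].value_counts()

# How required education relates to salary and automation risk.
education_effect = df.groupby("required_education")[["salary", "automation_risk_pct"]].median()

# Biggest projected job growth and biggest projected losses to 2030.
totals = df.groupby("industry")[["openings_2024", "projected_openings_2030"]].sum()
growth = totals["projected_openings_2030"] - totals["openings_2024"]
biggest_growth = growth.idxmax()
biggest_loss = growth.idxmin()

print(median_salary)
print(mean_risk)
print(jobs_by_location)
print(education_effect)
print(f"Largest growth: {biggest_growth}; largest loss: {biggest_loss}")
```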
