Healthcare Data Exploration Report Word File
Report
Introduction
Healthcare data is crucial for analysing trends, detecting
diseases, and improving patient care. This project explores a
healthcare dataset to understand patterns, detect missing
values, visualize distributions, and identify outliers. By
analysing attributes such as age and blood pressure, along with
the correlations between numerical variables, we gain insights
that can be valuable for healthcare professionals.
Methodology
Data Loading & Exploration:
The dataset is loaded using Pandas for analysis.
Basic dataset information, including column names, data types, and
missing-value counts, is displayed to assess data quality.
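The loading and inspection step described above might look like the following minimal sketch. The report does not name the data file, so a small in-memory CSV with made-up values stands in for it here; only the column names Age and BloodPressure come from the report.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset: a tiny CSV with made-up values.
# With the actual file, pd.read_csv would be given its path instead.
csv_text = """Age,BloodPressure
34,120
51,135
,128
29,
62,142
"""
df = pd.read_csv(io.StringIO(csv_text))

# Column names, data types, and non-null counts
df.info()

# Number of missing values in each column
missing_counts = df.isnull().sum()
print(missing_counts)
```

Comparing the non-null counts from info() with the row count gives a quick view of which columns need cleaning before any plots are drawn.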
Data Visualization:
A histogram is plotted to visualize the distribution of the Age column
and detect skewness.
A boxplot is used for BloodPressure to check for potential outliers
and extreme values.
A heatmap is generated to identify correlations between numerical
variables, which helps in understanding patterns in the dataset.
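The visual checks above can be backed by numbers. The sketch below, which uses made-up values and a hypothetical Gender column that is not confirmed by the report, computes the skewness of Age and the correlation matrix that a heatmap would display.

```python
import pandas as pd

# Made-up illustrative values; not taken from the report's dataset.
df = pd.DataFrame({
    "Age": [22, 25, 27, 30, 31, 35, 40, 58, 73],
    "BloodPressure": [110, 112, 118, 120, 121, 126, 130, 145, 160],
    "Gender": ["F", "M", "F", "M", "F", "M", "F", "M", "F"],  # hypothetical column
})

# Sample skewness: a positive value indicates a longer right tail
age_skew = df["Age"].skew()
print("Age skewness:", age_skew)

# Keep only numerical columns before computing pairwise Pearson correlations
numeric_df = df.select_dtypes(include="number")
corr_matrix = numeric_df.corr()
print(corr_matrix)
```

Selecting numerical columns first keeps text columns out of the correlation step, where they would otherwise cause an error in recent pandas versions.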
Outlier Detection:
The Interquartile Range (IQR) method is used to detect and quantify
outliers in numerical columns.
This helps in identifying extreme values that might affect the analysis
and decision-making.
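One possible implementation of the IQR method is sketched below. The report does not state the fence multiplier, so the conventional value of 1.5 is assumed. The helper name iqr_outliers and the sample values are illustrative only.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional multiplier and is an assumption here.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Made-up illustrative values; not taken from the report's dataset.
df = pd.DataFrame({
    "Age": [25, 31, 38, 42, 47, 53, 60, 66],
    "BloodPressure": [118, 120, 122, 125, 127, 130, 132, 210],
})

# Outlying values in a single column
bp_outliers = iqr_outliers(df["BloodPressure"])
print(bp_outliers)

# Number of outliers in every numerical column
outlier_counts = {
    col: len(iqr_outliers(df[col].dropna()))
    for col in df.select_dtypes(include="number").columns
}
print(outlier_counts)
```

With these sample values the BloodPressure fences are 108 and 144, so only the reading of 210 is flagged, and Age has no outliers.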
Summary Statistics:
The dataset is summarized using statistical measures such as mean,
median, standard deviation, and percentiles.
These statistics provide insights into the central tendency and
variability of key attributes.
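As a small illustration of these measures, the sketch below summarizes a single made-up Age column. The values are hypothetical and the printed numbers are not results from the report's dataset.

```python
import pandas as pd

# Made-up illustrative values; not taken from the report's dataset.
df = pd.DataFrame({"Age": [25, 31, 38, 42, 47, 53, 60, 66]})

# describe() reports count, mean, sample standard deviation (ddof=1),
# minimum, the requested percentiles, and maximum
summary = df["Age"].describe(percentiles=[0.25, 0.5, 0.75])
print(summary)

# The median equals the 50th percentile in the summary
median_age = df["Age"].median()
print("Median age:", median_age)
```

A large gap between the mean and the median is a simple hint that the distribution is skewed or affected by extreme values.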
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset. The file name is a placeholder: the report does not state it.
df = pd.read_csv("healthcare_data.csv")

# Summary statistics for the numerical columns
print("\nSummary Statistics:\n", df.describe())

# Histogram of the Age column with a kernel density estimate overlaid
plt.figure(figsize=(10, 5))
sns.histplot(df['Age'], bins=10, kde=True)
plt.title("Age Distribution")
plt.show()

# Boxplot of the BloodPressure column to reveal potential outliers
plt.figure(figsize=(10, 5))
sns.boxplot(x=df['BloodPressure'])
plt.title("Blood Pressure Boxplot")
plt.show()

# Correlation heatmap restricted to numerical columns, because
# df.corr() raises an error on text columns in recent pandas versions
plt.figure(figsize=(10, 5))
sns.heatmap(df.select_dtypes(include="number").corr(), annot=True,
            cmap='coolwarm', fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
Output/Result
References/Credits
1. Dataset Source:
Healthcare dataset sourced from KIET Group of
Institutions.
3. Image Credits:
Output images generated in Google Colab.
4. Acknowledgments:
Thanks to professors, mentors, or peers who provided
guidance or assistance in completing the project.
Special thanks to Mr. Abhishek Shukla for guidance on
data analysis concepts.