Exploratory Data Analysis
CSDS3202
5.1 STATISTICS
The science of statistics involves ...
Collection of data
Analysis of data
Interpretation of data
Presentation of data
Quartiles…
• The data that falls below Q1 = 3.75 is (2, 2, 3, 3.5) and is 25% of the data. We say that
25% of the data falls below Q1 = 3.75.
• The data that is more than Q1 = 3.75 but less than Q2 = 7 is (4, 4, 4, 6) and is 25% of
the data. We say that 25% of the data falls between Q1 = 3.75 and Q2 = 7.
• The data that is more than Q2 = 7 but less than Q3 = 10.75 is (8, 8, 10, 10) and is 25% of the
data. We say that 25% of the data falls between Q2 = 7 and Q3 = 10.75.
• The data that falls above Q3 = 10.75 is (11.5, 12, 12, 12) and is 25% of the data. We say that
25% of the data falls above Q3 = 10.75.
Percentiles
Percentiles divide an ordered set (smallest to largest) of data into
hundredths.
Consider the ordered set of the 100 numbers 1, 2, 3, 4, 5, ..., 99, 100. Ten
percent of 100 numbers is 10 numbers. The 10 numbers 1, 2, 3, 4, 5, 6, 7, 8, 9,
10 fall below the 10th percentile. This means that the 10th percentile is
between 10 and 11. The 10th percentile (10th %ile) is equal to 10.5. Similarly,
the 90th percentile (90th %ile) is equal to 90.5.
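These values can be verified with NumPy's percentile function; a minimal sketch, noting that the 'midpoint' interpolation method is the one that matches the halfway-between convention used above (in older NumPy versions the keyword is interpolation rather than method):

import numpy as np

data = np.arange(1, 101)  # the ordered set 1, 2, ..., 100

# 'midpoint' takes the halfway point between the two neighboring values
print(np.percentile(data, 10, method='midpoint'))  # 10.5
print(np.percentile(data, 90, method='midpoint'))  # 90.5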
Mean
The mean is the same as the average. To find the mean, add all the values and divide by
the total number of values.
Example: {2, 3, 5, 6}
The mean = (2 + 3 + 5 + 6) / 4 = 16 / 4 = 4.
The letter x with a bar over it, x̄, represents the sample mean.
Mode
The mode is the most frequent value in the set of numbers.
Example: In the data set 52, 60, 65, 67, 70, 71, 74, 76, 78, 78, 78, 80, 86, 89, 95, the most
frequent value is 78. The mode = 78.
Example: In the data set 52, 53, 53, 53, 60, 67, 72, 72, 72, 90, both 53 and 72 occur most
often (3 times each), so there are two modes, 53 and 72. We call this data set
bimodal, meaning it has two modes.
DESCRIPTIVE STATISTICS…
Median
The median is the middle value of a set of numbers that has been ordered from smallest
to largest. The upper case letter M is used for the median.
Example: A sample of statistics exam scores for 14 students are (in order from smallest
to largest) as follows: 53, 59, 63, 63, 72, 72, 76, 78, 81, 83, 84, 84, 90, 93
Notice that 14 is an even number. The median is the average of the 7th and 8th values (the middle two
values): M = (76 + 78) / 2 = 77.
Example: A second sample of statistics exam scores for 15 students are (in order from
smallest to largest) as follows: 52, 60, 65, 67, 70, 71, 74, 76, 78, 78, 78, 80, 86, 89, 95
Notice that 15 is an odd number. The median is the 8th value (the middle value). The 8th value is
76 so the median M = 76.
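The mean, mode, and median examples above can be checked with Python's built-in statistics module; a minimal sketch (multimode requires Python 3.8+):

import statistics

# Mean of {2, 3, 5, 6}
print(statistics.mean([2, 3, 5, 6]))  # 4

# Both modes of the bimodal data set
data = [52, 53, 53, 53, 60, 67, 72, 72, 72, 90]
print(statistics.multimode(data))  # [53, 72]

# Median of the 14 exam scores: even count, so the two middle values are averaged
scores = [53, 59, 63, 63, 72, 72, 76, 78, 81, 83, 84, 84, 90, 93]
print(statistics.median(scores))  # 77.0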
Variance
The variance is the average of the squares of the deviations. A deviation is the difference
between a value and the mean and is written as x - x̄ for sample data (or x - μ for population data).
Example: {2, 3, 5, 6} is a set of data. The sample mean is 4. The deviations are:
2 - 4 = -2
3 - 4 = -1
5 - 4 = 1
6 - 4 = 2
The deviations squared are:
(-2)² = 4
(-1)² = 1
(1)² = 1
(2)² = 4
An average of the deviations squared, dividing by n - 1 = 3 for a sample, gives the sample
variance: s² = (4 + 1 + 1 + 4) / 3 = 10 / 3 ≈ 3.33.
DESCRIPTIVE STATISTICS…
Standard Deviation
The standard deviation is a special average of the deviations. It measures how the data is
spread out from its mean.
The standard deviation is the square root of the variance and has the same units as the
mean. The letter s represents the sample standard deviation and the Greek
letter σ represents the population standard deviation.
Example: In the variance example above, the sample variance was s² = 3.33 (to 2 decimal
places). The sample standard deviation is s = √3.33 ≈ 1.83.
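As a check on the arithmetic above, NumPy computes both quantities directly; a minimal sketch, where ddof=1 requests the sample versions (dividing by n - 1):

import numpy as np

data = np.array([2, 3, 5, 6])

print(np.var(data, ddof=1))  # sample variance: about 3.33
print(np.std(data, ddof=1))  # sample standard deviation: about 1.83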
Standard Normal
The standard normal distribution is a normal probability distribution of standardized
values called z-scores.
The standard normal has a mean of 0 and a standard deviation of 1. Z is commonly used
as the random variable.
Notation: Z ~ N(0, 1)
Z-Scores
The formula for a z-score is z = (x - μ) / σ,
where x is the value being standardized, μ is the mean, and σ is the standard deviation.
A z-score is measured in terms of the standard deviation.
So, if z = 2, then 2 is the standardized score for the value of X that is 2 standard deviations above
(positive z-score) the mean.
If z = -1, then -1 is the standardized score for the value of X that is 1 standard deviation below
(negative z-score) the mean.
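Standardizing a whole data set is a one-liner; this is a minimal sketch that reuses the 15 exam scores from the median example as illustrative data:

import numpy as np

scores = np.array([52, 60, 65, 67, 70, 71, 74, 76, 78, 78, 78, 80, 86, 89, 95])

mu = scores.mean()
sigma = scores.std(ddof=1)  # sample standard deviation

z = (scores - mu) / sigma  # each score expressed in standard deviations from the mean
print(z.round(2))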
5.5.2 TYPES OF DISTRIBUTIONS
Uniform Distribution
Normal Distribution
Binomial Distribution
Bernoulli Distribution
Poisson Distribution
Exponential Distribution
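One way to get a feel for these distributions is to draw samples from each with NumPy's random generator; this is a minimal sketch, and the parameter values are purely illustrative:

import numpy as np

rng = np.random.default_rng(0)

uniform     = rng.uniform(low=0, high=1, size=1000)
normal      = rng.normal(loc=0, scale=1, size=1000)
binomial    = rng.binomial(n=10, p=0.5, size=1000)
bernoulli   = rng.binomial(n=1, p=0.5, size=1000)   # Bernoulli is binomial with n = 1
poisson     = rng.poisson(lam=3, size=1000)
exponential = rng.exponential(scale=1, size=1000)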
If a scatter plot shows a possible linear relationship, then the correlation coefficient indicates how
strong the relationship is between x and y. We use the letter r for the correlation coefficient.
If r = 1 or r = -1, there is "perfect correlation." This means that the points are already in a straight
line. In the real world, perfect correlation is very unlikely to happen.
The closer r is to 1 or -1, the better the correlation between x and y because the data points are
closer to the line of best fit.
There is positive correlation if y increases as x increases, or y decreases as x decreases. If
there is positive correlation, then the line has a positive slope.
There is negative correlation if y decreases as x increases, or y increases as x decreases. If
there is negative correlation, then the line has a negative slope.
There is no correlation if the correlation coefficient is 0 (r = 0). This means there is no linear
relationship between x and y. If there is no correlation, then the slope of the line is 0.
High correlation does not necessarily mean that x causes y or y causes x.
Suppose we have two variables: Income and Education. These variables will
potentially have a high correlation as people with a higher education level tend to
have significantly higher income, and vice versa.
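A minimal sketch of computing r with NumPy, using made-up x and y values that move together:

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 6, 7])

r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(r)  # about 0.92: a strong positive correlation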
In the Factor Analysis technique, variables are grouped by their correlations, i.e.,
all variables in a particular group will have a high correlation among themselves,
but a low correlation with variables of other group(s). Here, each group is known
as a factor. These factors are small in number as compared to the original
dimensions of the data. However, these factors are difficult to observe.
Create a dataframe containing the pixel values of every individual pixel present in each
image, together with their corresponding labels:
import pandas as pd

# Give the complete path of your train.csv file
train = pd.read_csv("../input/fashionmnist/fashion-mnist_train.csv", sep=',')

image = train.drop('label', axis=1).values  # pixel values only (assumed layout: one image per row)
feat_cols = ['pixel' + str(i) for i in range(image.shape[1])]
df = pd.DataFrame(image, columns=feat_cols)
df['label'] = train['label']
Decompose the dataset using Factor Analysis:
from sklearn.decomposition import FactorAnalysis

fa = FactorAnalysis(n_components=3).fit_transform(df[feat_cols].values)
import umap

umap_data = umap.UMAP(n_neighbors=5, min_dist=0.3, n_components=3).fit_transform(df[feat_cols][:6000].values)
Here,
n_neighbors determines the number of neighboring points used to approximate the local
structure of the data.
min_dist controls how tightly the embedding is allowed to pack points together.
Larger values ensure embedded points are more evenly distributed.
Feature selection is the process of choosing a subset of input features that contribute the most to
the output feature for use in model construction.
It is important when we have datasets with high dimensionality (i.e., a large number of
features).
It helps to mitigate the problems of high dimensionality by selecting features that have high
importance to the model, so that the data dimensionality can be reduced
without much loss of the total information.
Benefits of feature selection are:
Reduce training time
Reduce the risk of overfitting
Potentially increase the model's performance
Reduce the model's complexity so that interpretation becomes easier
Filter Methods
ANOVA F-value
Variance Threshold
Mutual Information
Wrapper Methods
Exhaustive feature selection (EFS)
Sequential forward selection (SFS)
Sequential backward selection (SBS)
Embedded Methods
Random forest
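Of the filter methods listed above, Variance Threshold and Mutual Information are available directly in scikit-learn; a minimal sketch on the Iris data, with an illustrative threshold value:

from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Variance Threshold: drop features whose variance falls below the cutoff
selector = VarianceThreshold(threshold=0.2)
X_reduced = selector.fit_transform(X)
print(X.shape, X_reduced.shape)

# Mutual Information: score each feature's dependency on the target
mi_scores = mutual_info_classif(X, y, random_state=0)
print(mi_scores)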
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
The ANOVA F-value method estimates the degree of linearity between an input
feature (i.e., predictor) and the output feature.
A high F-value indicates a high degree of linearity and a low F-value indicates a low
degree of linearity.
The main disadvantage of using the ANOVA F-value is that it only captures linear
relationships between the input and output feature.
In other words, any non-linear relationship cannot be detected by the F-value.
Scikit-learn has two functions to calculate the F-value:
f_classif, which calculates the F-value between input and output features for classification
tasks
f_regression, which calculates the F-value between input and output features for
regression tasks
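A minimal sketch of using f_classif with SelectKBest on the Iris data (k = 2 is illustrative):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the k features with the highest F-values
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)      # F-value for each feature
print(X.shape, X_new.shape)  # (150, 4) -> (150, 2)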
EFS finds the best subset of features by evaluating all feature combinations.
Suppose we have a dataset with three features. EFS will evaluate the
following feature combinations:
feature_1
feature_2
feature_3
feature_1 and feature_2
feature_1 and feature_3
feature_2 and feature_3
feature_1, feature_2, and feature_3
EFS selects the subset that generates the best performance (e.g., accuracy,
precision, or recall) for the model being considered.
Mlxtend provides the ExhaustiveFeatureSelector function to perform EFS.
EXHAUSTIVE FEATURE SELECTION (EFS)…
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
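Before transform can be called, the selector has to be fitted to the data; a minimal sketch, assuming X_data and y_data hold the predictors and target (the feature-count bounds and cv setting here are illustrative):

from mlxtend.feature_selection import ExhaustiveFeatureSelector

# Evaluate every feature combination between min_features and max_features
efs = ExhaustiveFeatureSelector(lr, min_features=1, max_features=4,
                                scoring='accuracy', cv=5)
efs = efs.fit(X_data, y_data)  # X_data and y_data are assumed to exist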
Transform the dataset into a new dataset containing only the subset of features that
generates the best score by using the transform method:
# Transform the dataset
X_data_new = efs.transform(X_data)

# Print the results
print('Number of features before transformation: {}'.format(X_data.shape[1]))
print('Number of features after transformation: {}'.format(X_data_new.shape[1]))
# Show the performance of each subset of features
efs_results = pd.DataFrame.from_dict(efs.get_metric_dict()).T
efs_results.sort_values(by='avg_score', ascending=True, inplace=True)
efs_results
Visualize the performance of each subset of features by creating a horizontal bar chart:
import matplotlib.pyplot as plt

# Create a horizontal bar chart for visualizing
fig, ax = plt.subplots(figsize=(12, 9))
y_pos = np.arange(len(efs_results))
ax.barh(y_pos, efs_results['avg_score'], xerr=efs_results['std_dev'], color='tomato')
ax.set_yticks(y_pos)
ax.set_yticklabels(efs_results['feature_names'])
ax.set_xlabel('Accuracy')
plt.show()
Random forest is one of the most popular learning algorithms used for feature
selection in a data science workflow.
Split the dataset into train and test sets because feature selection is part of the
training process.
Use the gini criterion to define feature importance.
If we add up all the importance scores, the result is 100%. As we can see, petal length and petal
width correspond to 83% of the total importance score. They are clearly the most important features!
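The petal length/width figures suggest the Iris data set; a minimal sketch of the workflow under that assumption:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# criterion='gini' is the default; importances are derived from impurity decrease
rf = RandomForestClassifier(criterion='gini', random_state=0)
rf.fit(X_train, y_train)

# The importance scores sum to 1 (i.e., 100%)
importances = pd.Series(rf.feature_importances_, index=iris.feature_names)
print(importances.sort_values(ascending=False))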