Statistical Analysis Lesson 1 Notes

The document outlines a Diploma in Statistical Analysis course, detailing the differences between statistical analysis and data analysis, and introducing the SAS software for data management and analysis. It describes the statistical analysis journey, including steps such as understanding the question, data collection, cleaning, management, modeling, interpretation, and reporting. Additionally, it provides information on the advantages and disadvantages of using SAS, as well as instructions for importing data into the software.

Uploaded by

jenniferbebeda12
Copyright
© All Rights Reserved

Diploma in Statistical Analysis

Introduction to Statistical Analysis

Contents

Introduction
Lesson outcomes
Introduction to Statistical Analysis
Introduction to SAS
Importing data into SAS
References

Lesson outcomes
By the end of this lesson, you should be able to:

• Know the difference between statistical analysis and data analysis
• Understand the Statistical Analysis journey
• Have downloaded and installed SAS University Edition
• Have downloaded an open source data set from Kaggle
• Have imported the open source data set into SAS University Edition

Introduction
Lesson 1 introduces Statistical Analysis.

We will discuss how this course differs from the data analysis course and who this course is aimed at. We will then
cover the fundamental concepts you need to build a good understanding of Statistical Analysis.

Thereafter, we will introduce SAS, the tool we will use throughout this module. We will end the lesson with a
practical demonstration in SAS.

Introduction to Statistical Analysis


Statistical Analysis vs data analysis
Statistical analysis is the application of mathematical and statistical techniques to a sample, that is, the portion of the
population that makes up our data.

Data analysis is the investigation, cleaning, presentation and, to some extent, modelling of data.

A data analyst is, therefore, someone who specializes in exploring data, whereas a statistical analyst focuses more on
inferring beyond the data. There is an overlap between these fields, especially as technology
evolves, but they are two separate pursuits.
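
To make the distinction concrete, here is a minimal sketch in Python (standard library only; in this course the equivalent work is done in SAS). The sample values are invented for illustration, and the 1.96 critical value is a normal approximation we assume for simplicity. The first part describes the data we have; the second part infers something about the population beyond the data.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population (values invented for illustration).
sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4, 5.2, 5.5]

# Data analysis view: describe the sample we actually have.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Statistical analysis view: infer something about the unseen population.
# Approximate 95% confidence interval for the population mean, using the
# normal critical value 1.96 (a t critical value would be more exact for n = 8).
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Approximate 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The sample mean summarises the observed values only, while the interval is a statement about the population the sample came from.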

Statistical Analysis journey


Let’s have a look at the journey you will go on when you want to statistically analyse your data.

1. Step one involves understanding the question posed to us as the statistician or statistical analyst.

2. Thereafter, we obtain the data.

3. Thirdly, we clean the data.

4. Thereafter, we manage the data by, for instance, creating extra variables.

5. In step 5, we describe the data with descriptive measures like the mean and median, as well as with the help of
plots.

6. Now, we return to step 3 and clean the data again.

7. Step 7 involves modelling the data.

8. Next, we have our results and can interpret them.

9. The last step is to report the results back to the stakeholders.

Let us break each step down and make sure we know what is expected of us in each part of the journey.

1. Understanding the question

Before we dive into the data, we need to understand what the stakeholders want from the data and the question they
are posing to us:

• Ask whether it is possible to answer the question.
• Think about how long it will realistically take you to answer the question and be upfront about any challenges you
foresee.
• Ask specific questions in this step of the process to fully understand the question posed.

2. Data collection

Next, we need to extract data from the various sources that can help us answer the question posed by the
stakeholders.

Use all relevant data sources available to draw insights from; in general, the more relevant data we gather, the better.

In this step, make sure that the format of the data is compatible with the tool you are using and, if not, transform it.
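
As a small illustration of transforming a format, the sketch below converts semicolon-delimited text into comma-delimited text. It is written in Python purely for illustration; the sample text and the helper name to_comma_delimited are hypothetical, and in practice your tool may offer its own import options for other delimiters.

```python
import csv
import io

# Hypothetical semicolon-delimited extract (invented for illustration).
raw = "id;quality;alcohol\n1;5;9.4\n2;6;10.2\n"

def to_comma_delimited(text, source_delimiter=";"):
    """Rewrite delimited text so that it uses commas as the separator."""
    reader = csv.reader(io.StringIO(text), delimiter=source_delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

converted = to_comma_delimited(raw)
print(converted)
```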

3. Data cleaning

Data cleaning focuses on the quality of the data.

Data cleaning involves removing information that is not needed, updating information that is incomplete or
incorrect, and checking for duplicate variables, to name but a few of the steps.

Note: If you are combining data sources, check that unexpected errors did not creep in. Also, check whether the data is
outdated.
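
The cleaning tasks described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the records, the field names and the helper clean are all hypothetical, and real cleaning rules depend on your data.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Cape Town", "notes": "n/a"},
    {"id": 2, "age": None, "city": "Durban", "notes": "n/a"},
    {"id": 1, "age": 34, "city": "Cape Town", "notes": "n/a"},  # exact duplicate
    {"id": 3, "age": 29, "city": "durban ", "notes": "n/a"},
]

def clean(rows, drop_fields=("notes",)):
    """Drop unneeded fields, tidy text values and skip exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        # Remove fields that are not needed for the analysis.
        row = {k: v for k, v in row.items() if k not in drop_fields}
        # Correct inconsistent text values (stray spaces, inconsistent casing).
        if isinstance(row.get("city"), str):
            row["city"] = row["city"].strip().title()
        # Skip rows that exactly repeat a row we have already kept.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

cleaned = clean(records)

# Flag rows that are still incomplete so that they can be updated later.
incomplete = [r["id"] for r in cleaned if any(v is None for v in r.values())]
print(cleaned)
print("Incomplete ids:", incomplete)
```

Incomplete rows are flagged rather than dropped here, because whether to update, impute or remove them is a judgment call that depends on the question being asked.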

4. Data management

In this step, we create new variables, for instance by combining existing ones, to make the data exploration
step more efficient. After we have explored the data, we will return to this step and likely have a better understanding of
how to manage the data.

This step can also include making sure the data complies with the applicable data protection and privacy legislation,
particularly if the data is drawn from older sources.
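
As an example of creating a new variable from existing ones, the sketch below derives a body mass index from weight and height. The data, the field names and the helper add_bmi are hypothetical, and Python is used only for illustration.

```python
# Hypothetical rows with two existing variables (values invented for illustration).
rows = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": 82.5, "height_m": 1.80},
]

def add_bmi(row):
    """Return a copy of the row with a derived variable: BMI = weight / height^2."""
    new_row = dict(row)
    new_row["bmi"] = round(row["weight_kg"] / row["height_m"] ** 2, 1)
    return new_row

managed = [add_bmi(r) for r in rows]
print(managed)
```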

5. Describing data
• Describing the data involves showing or summarising the data in a meaningful way such that, for example,
patterns might emerge from the data.
• Data is described to spot any obvious trends and outliers and evaluate initial distributions.
• This step helps us to further clean and manage the data.
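
The description step can be sketched as follows, using Python's standard library for illustration. The values are invented, and the 1.5 × IQR rule is one common convention for flagging outliers that we assume here; it is not the only option.

```python
import statistics

# Hypothetical measurements (invented for illustration); one value looks suspicious.
values = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 9.7, 5.0]

# Descriptive measures of the centre of the data.
mean = statistics.mean(values)
median = statistics.median(values)

# Quartiles and interquartile range (IQR) describe the spread.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag values that lie more than 1.5 * IQR outside the quartiles.
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(f"mean={mean:.3f}, median={median:.3f}, IQR={iqr:.3f}")
print("Possible outliers:", outliers)
```

A mean that sits well above the median, as it does here, is a hint that a few large values are pulling the average up, which is exactly the kind of finding that sends us back to the cleaning step.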
6. Rewind

We return to the data cleaning step once we have a better understanding of the data, and we repeat the cleaning,
managing and describing steps of this journey until the data is in good enough condition for us to start
the modelling process.

7. Model

Finally, we get to the fun part: modelling the data! Here we apply the appropriate model to the data for predictive
analytical purposes.

With the model, we can forecast the event of interest within a certain accuracy.

If we work with garbage data, the result in this step will also be garbage, which is why we spend so much time cleaning and
managing the data.
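
To give a feel for what modelling and forecasting involve, here is a minimal sketch of a simple linear regression fitted by ordinary least squares. It is one possible model chosen for illustration, not necessarily the appropriate one for your data; the data points and the helper fit_line are hypothetical.

```python
import statistics

# Hypothetical paired observations (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(x, y)

# Use the fitted model to forecast y for a new x value.
forecast = intercept + slope * 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast at x=6: {forecast:.2f}")
```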

8. Interpret

Appropriate interpretation of the results is critical to make sense of the apparent disarray. To accurately assess the results,
you need to understand the subject matter you are analysing as well as the statistical method you are applying.

9. Report

The last step, though never the end of the analytics process, is to communicate your findings to your
audience, the stakeholders that posed the question.

Always be sure to convey the findings in a manner that is understandable to the audience. If the audience has some
technical knowledge of the field, you can explain findings in more detail, but if the audience has no technical knowledge of
the field, make sure to focus more on the key takeaways of the results.

Reporting can also take the form of a written document, for example, in the case of academic papers. Naturally, academic
writing is more technical than a presentation and is typically shaped by the medium in which it is to be published, but we
will not go into too much detail, because this depends on your field of study and many other factors.

Finally, remember to include visualizations, as these help to sketch a clearer picture of the outcome for both technical
and non-technical audiences.

Introduction to SAS
SAS
SAS (Statistical Analysis System), established in 1976, is designed for data management, advanced analytics,
multivariate analysis, business intelligence and predictive analytics, to name but a few of its functions.

SAS is a software suite whose tools cover data mining, statistical analysis, forecasting, text analysis, and optimization and
simulation.

Advantages of SAS

• SAS has products for various scenarios.
• SAS is a closed-source (proprietary) programming language; therefore, data security is said to be easier to ensure.
• In comparison to R, SAS is said to be quicker to learn; users seem to find it easier to pick up.
Users also say that debugging, meaning being able to find and fix an error, is easier.
• SAS is a highly requested skill in many organizations.
• SAS offers good customer support for corporations during the installation and administration process of their
services, which open-source software typically cannot provide because of the nature of the tool.

Disadvantages of SAS

• Purchasing a license to work with SAS is costly (in this module, however, we will use SAS
University Edition, which is free SAS software).
• In comparison to open-source platforms, SAS falls short when it comes to representing data graphically.
• Users have also mentioned that it is difficult to use SAS for text mining (which we will not do in this course;
therefore, this drawback is not of great importance to us).

SAS university edition


We will be working with SAS University Edition, which is free SAS software used for learning statistics. SAS specifically
designed the University Edition for those needing access to statistical software to learn more about quantitative analysis.
The tool holds a vast array of features that you can read more about here:
https://www.sas.com/content/dam/SAS/en_us/doc/factsheet/sas-university-edition-107140.pdf

Download

SAS University Edition can be downloaded through the following link:

https://www.sas.com/en_za/software/university-edition/download-software.html

Import data to SAS


File types SAS can import
According to SAS’s documentation, the following file types can be imported into SAS:

• Microsoft Access database files.

• delimited files, such as files with comma-separated values.

• dBASE 5.0, IV, III+, and III.

• Stata files.

• Microsoft Excel files. To import XLSB and XLSM files, you must use the SAS LIBNAME statement.

• JMP files.

• Paradox DB files.

• SPSS files.

• Lotus 1-2-3 files from Releases 2, 3, 4, or 5.


Importing data into SAS
Data set from Kaggle (red wine data set): https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009

1. Ensure the data set is in the myfolders folder, created as a subfolder in the SAS University Edition folder.

2. View the file as text.

3. Import the data.

4. SAS automatically generates the code to import the data set.

5. Click the running man icon to run the code; the results show the data set's dimensions and column
names.
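
To show what the import results tell us, the sketch below reads a small comma-delimited text and reports its dimensions and column names. It is a Python illustration only and is not the code SAS generates; the three-column excerpt is invented as a stand-in for the downloaded file, and the helper summarise is hypothetical.

```python
import csv
import io

# Stand-in for the downloaded CSV file: a tiny, invented excerpt.
# In practice you would open the real file instead of using this string.
sample_csv = "alcohol,pH,quality\n9.4,3.51,5\n9.8,3.20,5\n10.5,3.35,6\n"

def summarise(file_obj):
    """Return the number of data rows, the number of columns and the column names."""
    reader = csv.reader(file_obj)
    header = next(reader)              # first line holds the column names
    n_rows = sum(1 for _ in reader)    # remaining lines are observations
    return {"rows": n_rows, "columns": len(header), "names": header}

summary = summarise(io.StringIO(sample_csv))
print(summary)
```

Checking the dimensions and column names right after an import is a quick way to confirm that the file was read with the right delimiter and that no rows were lost.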
References
• Dawson, B. & Trapp, R.G., 2004, Basic & Clinical Biostatistics, 4th ed., United States of America, McGraw-Hill.

• Dinsmore, T.W., 2014, SAS versus R part two, ML/AI,
https://thomaswdinsmore.com/2014/12/15/sas-versus-r-part-two/#:~:text=Entry%20costs%20to%20license%20the,of%20the%20first%20year%20fee.

• Gomez, L., 2018, 6 steps for data cleaning and why it matters, Geotab,
https://www.geotab.com/blog/data-cleaning/

• Glen, S., 2020, Difference between data analysis and statistical analysis, Data Science Central,
https://www.datasciencecentral.com/profiles/blogs/difference-between-data-analysis-and-statistical-analysis

• javaTpoint, 2018, Advantages | Disadvantages of SAS Programming Language, javaTpoint,
https://www.javatpoint.com/advantages-and-disadvantages-of-sas

• Kozyrkov, C., 2019, What’s the difference between analytics and statistics?, Towards Data Science,
https://towardsdatascience.com/whats-the-difference-between-analytics-and-statistics-cd35d457e17

• SAS Institute Inc., 2020, Importing data, SAS Institute Inc.,
https://documentation.sas.com/?cdcId=webeditorcdc&cdcVersion=5.2&docsetId=webeditorug&docsetTarget=p11uw39h8jb27on1fc3d0og7ac52.htm&locale=en
