Microsoft Excel Data Analysis For Beginners, Intermediate and Expert
Data Analysis
Course objectives:
• Import data
• Use statistical functions in Excel
• Create histograms
• Gain insights from your data
Shared by Aditi K
UQ Library
Staff and Student I.T. Training
Table of Contents
Importing External Data
Forecasting
T Tests
the information again. Depending on the format of the data you would like to import, different
methods can be used, including opening and saving in Excel, linking to data, importing data and
Open the spreadsheet Data Analysis_Exercises.xlsx (which can be found under the Excel
section on the Library Training Resources page). The External Data Link sheet is selected.
1. Copy the URL of the web page with the data you want to import, e.g. World University Rankings on Wikipedia (which can be found in cell A1 of the External Data Link sheet):
https://en.wikipedia.org/wiki/QS_World_University_Rankings
5. Paste the
ensure that refreshing your Excel file will update the data to the latest version. Excel will then open a new worksheet with the imported data.
usually done via “macros”, which are small programs typically created to do complex or
repetitive tasks. Because hackers have exploited these tools, Microsoft disables macros by
default in Excel. In fact, when you open an Excel file from an untrusted source, you will get a
security warning. If you are working on data from an unknown or untrusted source,
Some hackers have even learned to use social engineering techniques to try to trick users into
turning macros back on. For example, there may be an image in the file that appears blurred, with a
note saying this is for security reasons. The goal is to get you to enable macros so that you can ‘see’
the image when, in reality, enabling the macros allows the virus to run. Of course, if you have good
anti-virus / anti-malware programs installed, they will go a long way towards mitigating that threat.
An external reference (also called a link) is a reference to a cell or range on a worksheet in another
Excel workbook, or a reference to a defined name in another workbook. If your data is coming
from a source beyond your immediate control, you may find that these ‘links’ are broken. If you
don’t have access to the workbooks/worksheets where the underlying data lives, you won’t be able
to use it via the link in the spreadsheet you are currently working on.
2. Click on the Enable Content button on the Security Warning (if necessary)
Note: In Office 365 (Windows version), Microsoft removed the Text Import Wizard as an option
when using the steps below. Instead, you are taken to the Power Query window, which does not have
the “Treat consecutive delimiters as one” option. You can get around this by opening the text file
directly in Excel, which will launch the wizard below.
6. Locate data_analysis.txt
7. Click on Import (Get Data on Mac)
8. Click on the Delimited option
9. Click Next
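The effect of the “Treat consecutive delimiters as one” option can be illustrated with a short Python sketch. This is not how Excel implements the wizard: the function split_delimited and the sample row are hypothetical, and the sketch assumes a single delimiter character and no text qualifiers (quotes) around fields.

```python
import re
from typing import List


def split_delimited(line: str, delimiter: str = " ",
                    treat_consecutive_as_one: bool = False) -> List[str]:
    """Split one line of a delimited text file into fields (hypothetical helper)."""
    if treat_consecutive_as_one:
        # A run of delimiters acts as a single separator, so no empty fields appear.
        return re.split("(?:" + re.escape(delimiter) + ")+", line)
    # Every delimiter starts a new field, so a run of delimiters produces empty fields.
    return line.split(delimiter)


row = "Smith  72   65"  # hypothetical space-delimited row
print(split_delimited(row))                                 # ['Smith', '', '72', '', '', '65']
print(split_delimited(row, treat_consecutive_as_one=True))  # ['Smith', '72', '65']
```

Without the option, the extra spaces become empty columns, which is why space-aligned text files often import with data shifted across columns.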
Descriptive Statistics
Descriptive statistics is the discipline of quantitatively (expressed as numbers) describing the main
features of a collection of data. Excel’s Analysis ToolPak add-in offers a variety of features for
statistical computation and graphing. Its Descriptive Statistics tool provides
statistical averages (mean, mode, median), standard error, standard deviation, sample variance,
kurtosis and confidence levels for sample data.
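To show what several of these figures mean, here is a minimal Python sketch that computes them for a small hypothetical sample. It is an approximation of the Descriptive Statistics output, not Excel's own code: kurtosis uses the sample excess kurtosis formula of Excel's KURT function, and the confidence level is left out because it needs the t distribution.

```python
import math
import statistics
from typing import Dict, List


def descriptive_statistics(values: List[float]) -> Dict[str, float]:
    """Compute several of the values reported by Excel's Descriptive Statistics tool."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like STDEV.S
    # Sample excess kurtosis, same formula as Excel's KURT function
    # (needs at least 4 values and a non-zero standard deviation).
    fourth_moment_sum = sum(((x - mean) / sd) ** 4 for x in values)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth_moment_sum
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),  # standard error of the mean
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "kurtosis": kurtosis,
    }


summary = descriptive_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample
for name, value in summary.items():
    print(name, round(value, 4))
```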
Statistical Functions
NB: For a quick statistical reference, select a range of values and check the status bar. To change which
statistics the status bar shows, right-click it and select the items you want.
The standard deviation (σ) is a measure of how spread out the values are around the average (mean). A
smaller number means the values are bunched closely around the mean, whilst a larger number indicates
values that are more spread out.
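A quick way to see this is to compare two hypothetical lists with the same mean but different spread. Strictly, σ denotes the population standard deviation (Excel's STDEV.P), while sample data normally uses the sample standard deviation (STDEV.S); Python's statistics.pstdev and statistics.stdev follow the same two definitions.

```python
import statistics

bunched = [48, 50, 52]  # hypothetical values close to their mean
spread = [20, 50, 80]   # hypothetical values far from their mean

for name, values in (("bunched", bunched), ("spread", spread)):
    print(name,
          statistics.mean(values),    # both lists have a mean of 50
          statistics.stdev(values),   # sample SD, like STDEV.S: 2.0 and 30.0
          statistics.pstdev(values))  # population SD (sigma), like STDEV.P
```

Both lists average 50, but the standard deviation of the second is fifteen times larger, reflecting how far its values sit from the mean.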
the proportion of data that fits into specific categories or bins. For example, we may want to find
out how many items were of a particular length, e.g. 100 mm. Excel provides a Histogram tool,
which is available via the Analysis ToolPak add-in. Recent versions of Excel also include
a Histogram chart in the Statistics chart options.
Creating histograms
Use worksheet “Importing Data & Histograms”
Using the tool in Data Analysis
Prepare data for a histogram of weights
1. Go to cell F19
2. Type “Bin”
3. Go to cell F20
4. Type 0
5. Go to cell F21
6. Type 50
7. Select F20 and F21
8. Autofill to display a value of 500 in cell F30
Input Range: This is the data that you want to analyse by using the Histogram tool.
Bin Range: This represents the intervals that you want the Histogram tool to use for measuring the input data in
the data analysis.
NB: A table with Bin and Frequency headings will appear along with the histogram chart.
Resize the chart as required.
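The Frequency column can be reproduced with a short Python sketch, assuming the counting rule used by the ToolPak Histogram tool and the FREQUENCY function: a value is counted in the first bin whose value it is less than or equal to (and greater than the previous bin), and anything above the last bin goes into “More”. The helper name and the weights are hypothetical; the bins match the 0 to 500 range set up in F20:F30.

```python
from typing import List, Tuple


def histogram_table(data: List[float], bins: List[float]) -> List[Tuple[str, int]]:
    """Return (Bin, Frequency) rows, with a final "More" row for values above the last bin."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # the extra slot is the "More" bin
    for x in data:
        for i, upper in enumerate(bins):
            if x <= upper:          # previous bin < x <= this bin
                counts[i] += 1
                break
        else:
            counts[-1] += 1         # larger than every bin value
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


bin_range = list(range(0, 501, 50))        # 0, 50, ..., 500 as in F20:F30
weights = [12, 48, 50, 51, 230, 499, 620]  # hypothetical weights
for label, count in histogram_table(weights, bin_range):
    print(label, count)
```

Note that a value exactly equal to a bin value (50 here) is counted in that bin, not the next one.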
Windows:
Double-click the X axis to launch the Format Axis panel on the
right of the screen.
Choose the Axis Options tab and expand the
Axis Options section
Set the Bin Width to 25
Set the Overflow bin to 200
Set the Underflow bin to 50
Mac:
Right mouse click the blue data series columns
Choose Format Data Series…
Change Bins from Auto to Bin Width
Expand the Data Series Options (if necessary)
Set the Bin Width to 25
Set the Overflow bin to 200
Set the Underflow bin to 50
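The Bin Width, Overflow and Underflow settings can be sketched in Python as follows. This is an assumption about the chart's binning, not Excel's documented algorithm: the sketch puts values less than or equal to the underflow value in one bin, values greater than the overflow value in another, and uses right-closed bins of the given width in between. Excel's own labels and edge handling may differ, and chart_bins is a hypothetical helper.

```python
import math
from typing import List, Tuple


def chart_bins(data: List[float], width: float, underflow: float,
               overflow: float) -> List[Tuple[str, int]]:
    """Assumed histogram-chart binning: <=underflow, (lower, lower+width] ..., >overflow."""
    n_regular = math.ceil((overflow - underflow) / width)
    counts = [0] * n_regular
    below = above = 0
    for x in data:
        if x <= underflow:
            below += 1
        elif x > overflow:
            above += 1
        else:
            # index of the right-closed bin (lower, lower + width] that contains x
            counts[math.ceil((x - underflow) / width) - 1] += 1
    labels = ["<=" + str(underflow)]
    labels += ["(%g, %g]" % (underflow + i * width, underflow + (i + 1) * width)
               for i in range(n_regular)]
    labels.append(">" + str(overflow))
    return list(zip(labels, [below] + counts + [above]))


weights = [10, 50, 60, 75, 76, 200, 201, 350]  # hypothetical weights
for label, count in chart_bins(weights, width=25, underflow=50, overflow=200):
    print(label, count)
```

With these settings the chart shows six 25-wide bins between 50 and 200, plus one bin on each side for the outlying values.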
3. Select Scatter
selecting the text box with the formulas and then drag it
Forecasting
Forecasting is estimating future values or events based on available data. Statistical forecasting
concentrates on using the past to predict the future by identifying trends, patterns and business
drivers within the data to develop a forecast.
Forecasting
Use worksheet “Correlation & Linear Regression”
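In Excel, the FORECAST function takes the known data (the x and y values a linear trendline is fitted to)
and a new input value of the independent variable (x), and returns the predicted value of the dependent variable (y).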
1. Click in cell C20
2. Click the Insert Function button
4. For X, select B20
5. For Known_y’s, select C4:C14 (the range name Tuition_Fees will appear)
6. For Known_x’s, select B4:B14 (the range name Year will appear)
7. Note how the indicated answer matches the Intercept value of the regression analysis: B20 is still empty, so X is treated as 0
8. Click OK
9. In cell B20 type 20 to forecast the cost of tuition fees in year 20
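The calculation behind FORECAST (and FORECAST.LINEAR in newer versions) is an ordinary least-squares straight line. The Python sketch below mirrors its argument order, forecast(x, known_ys, known_xs); the year and fee values are hypothetical stand-ins for the Year and Tuition_Fees ranges.

```python
from typing import Sequence


def forecast(x: float, known_ys: Sequence[float], known_xs: Sequence[float]) -> float:
    """Least-squares linear prediction, following the FORECAST formula."""
    if len(known_xs) != len(known_ys) or len(known_xs) < 2:
        raise ValueError("need two equal-length ranges with at least two points")
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    slope = sxy / sxx                    # same quantity as Excel's SLOPE
    intercept = mean_y - slope * mean_x  # same quantity as Excel's INTERCEPT
    return intercept + slope * x


years = [1, 2, 3, 4]  # hypothetical Year values
fees = [3, 5, 7, 9]   # hypothetical Tuition_Fees values
print(forecast(0, fees, years))   # 1.0: at x = 0 the forecast equals the intercept
print(forecast(20, fees, years))  # 41.0
```

Forecasting at x = 0 returns the intercept, which is why step 7 shows the Intercept value while B20 is still empty.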
T Tests
T tests are performed when you have two sets of measurements or results from given populations
and you would like to compare them to see whether they are significantly different.
For example, you may have two lists of measurements from the same set of people. The first set of
measurements may have been taken in the morning and the second set in the afternoon. This type
of t test is known as a related t test or a paired t test because you have tested the same population
twice.
Alternatively, if you had two sets of measurements taken from two different groups of people, with one set taken
in the morning and the other in the afternoon, you would have an unpaired or independent t test.
This is because you have tested two different populations.
If you are sure about the direction of the difference, for example that the morning measurements are
faster than the afternoon ones, then you perform a one-tailed t test.
If you are unsure about the direction of the difference between the values, perform a two-tailed t test.
A result is called "statistically significant" if the result of the t test, often referred to as the
P-value, is below .05.
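In Excel, T.TEST(array1, array2, tails, type) returns the P-value directly (type 1 is paired, type 3 is unpaired with unequal variances). To show what it calculates, here is a self-contained Python sketch of a paired t test and an unpaired Welch t test. It is not Excel's implementation: the two-tailed P-value comes from numerically integrating the t distribution with Simpson's rule, and the measurements are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def t_pdf(t: float, df: float) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed P-value: 1 - 2 * (area under the pdf from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * area * h / 3)


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Paired (related) t test on the differences a[i] - b[i]; returns (t, df, p)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1, two_tailed_p(t, n - 1)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Unpaired (independent) t test without assuming equal variances; returns (t, df, p)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)


morning = [12, 14, 15, 18]   # hypothetical paired measurements
afternoon = [10, 12, 14, 16]
t, df, p = paired_t(morning, afternoon)
print(t, df, p)  # t = 7.0 with 3 degrees of freedom; p is roughly 0.006
```

For a one-tailed test in the expected direction, halve the two-tailed P-value.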
Significance tests
measurements.
These measurements are paired as they are from
the same population but taken at different times.
columns of measurements.
In this case B3:B10 and C3:C10
Note: Descriptive statistics and an ANOVA summary table are displayed on screen.
Interpreting results: In the summary section we can see the mean exam results for each class, but are
these differences statistically significant?
There are two types of hypotheses: null (negative) and alternative (positive). It is best practice to test a null
hypothesis so that no personal opinions creep into the testing statement.
A null hypothesis is a default position and can never be proven. Statistical results can only reject, or fail to
reject, the null hypothesis.
Null hypotheses are always phrased as a negative statement, e.g. “There is no real difference between the
effectiveness of lectures, online delivery and video delivery.”
The test result shows F = 0.93 with a P-value of .4, and the critical F = 3.285. Since the F statistic is smaller
than the critical value, we fail to reject the null hypothesis. Remember from before that the P-value is
statistically significant only if it is below .05; a value of .4 is well above this, so the data give no evidence of a
real difference. So, we fail to reject the null hypothesis that there is no difference between the effectiveness of
lectures, online delivery and video delivery. The small sample size may be a factor: a larger sample would make
the test more sensitive to a real difference, if one exists. As it stands, the differences we saw in this sample are
consistent with random sampling error.
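The F value in the ANOVA summary table compares the variation between group means with the variation within groups. A minimal Python sketch of a single-factor ANOVA is shown below; the exam results are hypothetical, and the critical value is the approximate table value for 2 and 6 degrees of freedom at .05 (in Excel, F.INV.RT(0.05, 2, 6)).

```python
import statistics
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between groups: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each value lies from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical exam results for three delivery methods
lectures, online, video = [1, 2, 3], [2, 3, 4], [3, 4, 5]
f, df_b, df_w = one_way_anova([lectures, online, video])
f_critical = 5.14  # approx. F.INV.RT(0.05, 2, 6) in Excel
print(f, df_b, df_w, "reject H0" if f > f_critical else "fail to reject H0")
```

As in the worked example, an F statistic below the critical value means we fail to reject the null hypothesis.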
2. Click
Interpreting results:
Point - The location of the value within the original list. This can be used to quickly sort the output table into
the same order as the original list.
Original - This is the column containing the original values. This column has the same column name as the
original list since we used labels in the first row.
Rank - This is the rank of the corresponding number in the list.
Percent - This is the number’s percentage rank within the list. This percentage indicates the proportion of
the list that is below this number.
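The four output columns can be sketched in Python as follows. This assumes Rank behaves like RANK.EQ in descending order (tied values share the best rank) and Percent behaves like PERCENTRANK.INC (the count of values below the number divided by n - 1); Excel may format or truncate the percentages and order ties differently. The helper name and values are hypothetical, and the list needs at least two values.

```python
from typing import List, Sequence, Tuple


def rank_and_percentile(values: Sequence[float]) -> List[Tuple[int, float, int, float]]:
    """Return (Point, Original, Rank, Percent) rows, largest value first."""
    n = len(values)
    rows = []
    for point, x in enumerate(values, start=1):               # Point is the 1-based position
        rank = 1 + sum(1 for v in values if v > x)            # ties share the best rank
        percent = sum(1 for v in values if v < x) / (n - 1)   # proportion of the list below x
        rows.append((point, x, rank, percent))
    # Sort by rank; Python's sort is stable, so tied values keep their original order.
    rows.sort(key=lambda row: row[2])
    return rows


for row in rank_and_percentile([10, 20, 20, 30]):  # hypothetical list
    print(row)
```

Sorting the returned rows by Point restores the original order of the list, as described for the Point column.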