
The Stack

The document discusses various statistical and machine learning techniques including logistic regression, ANOVA, linear regression, and lowess regression. Logistic regression and ANOVA are used to investigate relationships between factors and predict outcomes. Linear regression builds prediction models and lowess regression shows patterns in data. Various statistical values like p-values and measures of significance are also discussed.

Uploaded by

imaboneofmysword
In pandas, the stack() function moves the specified level(s) of the column labels into the row index, reshaping the data from wide to long.

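In a model formula, the column before the ~ sign is the one you want to predict (the response); the columns after it are the ones used to predict it (the predictors).

The predict command then generates predictions from the fitted model, model1, according to the formula it was built with.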

• The p-value is a statistical measurement used to test a hypothesis against observed data.
• The p-value is the probability of obtaining outcomes at least as extreme as those observed, assuming that the null hypothesis is true.
• The lower the p-value, the stronger the evidence against the null hypothesis, and so the greater the statistical significance of the observed difference.
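
As a small illustration of the definition above, the sketch below computes an exact one-sided p-value for a coin-flip experiment. The numbers (9 heads in 10 flips) and the helper name binomial_pmf are made up for the example; the null hypothesis is assumed to be that the coin is fair.

```python
from math import comb

# Hypothetical experiment: 9 heads observed in 10 flips.
n_flips = 10
observed_heads = 9
p_null = 0.5  # null hypothesis: the coin is fair


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One-sided p-value: probability of a result at least as extreme as the
# observed one (9 or 10 heads), assuming the null hypothesis is true.
p_value = sum(
    binomial_pmf(k, n_flips, p_null)
    for k in range(observed_heads, n_flips + 1)
)
print(f"p-value = {p_value:.4f}")
```

A p-value this small says that such a lopsided result would be rare if the coin really were fair.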

Logistic regression and ANOVA

Logistic regression: to investigate the relationship between the component factors and age (in days) and the strength of concrete.
ANOVA: to compare the performance of the linear regression models that were built.

The lowess line is a local regression fitted through the dots: each point on the line comes from a regression on the nearby dots only.

It shows us the overall pattern of the dots.
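
To make "local regression" concrete, here is a simplified sketch of the idea behind a lowess line. It is not the full lowess algorithm (it skips the robustness iterations), and the function names, the frac parameter, and the data are assumptions made for illustration.

```python
def tricube(u):
    """Tricube weight: close to 1 for nearby points, 0 beyond the bandwidth."""
    return (1 - u**3) ** 3 if u < 1 else 0.0


def lowess_at(xs, ys, x0, frac=0.5):
    """Fit a weighted straight line around x0 and return its value at x0."""
    k = min(len(xs), max(3, round(frac * len(xs))))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    weights = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Hypothetical noisy dots that roughly follow a straight line.
xs = list(range(10))
ys = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 19.0]
smoothed = [lowess_at(xs, ys, x) for x in xs]
print([round(v, 2) for v in smoothed])
```

Each smoothed value depends only on the dots near that x, which is why the line can follow bends that a single straight-line fit would miss.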

WHY DO WE USE ONE-WAY ANOVA?

Answer:
+ The one-way analysis of variance (ANOVA) is used to determine whether there are
any statistically significant differences between the means of two or more
independent (unrelated) groups (although you tend to only see it used when there is
a minimum of three, rather than two groups).
+ For example, you could use a one-way ANOVA to understand whether exam
performance differed based on test anxiety levels amongst students, dividing
students into three independent groups (e.g., low, medium and high-stressed
students).
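
The sketch below works through the one-way ANOVA F statistic by hand for the three-group example above. The exam scores are invented for illustration. Turning F into a p-value needs the F distribution, which the Python standard library does not provide; a library routine such as scipy.stats.f_oneway is one option if SciPy is available.

```python
from statistics import mean

# Hypothetical exam scores for three independent anxiety groups.
groups = {
    "low": [80, 85, 90],
    "medium": [70, 75, 80],
    "high": [60, 65, 70],
}

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = mean(all_scores)

# Variation of the group means around the grand mean.
ss_between = sum(
    len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
)
# Variation of individual scores around their own group mean.
ss_within = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# F is the ratio of between-group to within-group mean squares.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the scatter inside each group would explain.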

WHY DO WE USE TWO-WAY (OR HIGHER) ANOVA?

Answer:
+ A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables.
+ Use a two-way ANOVA when you want to know how two independent variables, in
combination, affect a dependent variable.

THE DIFFERENCE BETWEEN ONE-WAY AND TWO-WAY ANOVA?

Answer:
The key differences between one-way and two-way ANOVA are summarized below.
+ A one-way ANOVA is primarily designed to test the equality of three or more means. A two-way ANOVA is designed to assess how two independent variables, and their interaction, affect a dependent variable.
+ A one-way ANOVA involves only one factor or independent variable, whereas a two-way ANOVA involves two.
+ In a one-way ANOVA, the single factor analyzed has three or more categorical groups. A two-way ANOVA instead compares multiple groups across two factors.
+ A one-way ANOVA needs to satisfy only two principles of the design of experiments: replication and randomization. A two-way ANOVA meets all three: replication, randomization, and local control.

WHY DO WE USE CHI-SQUARED?

Answer:
The chi-square statistic is commonly used for testing relationships between
categorical variables. The null hypothesis of the chi-square test is that no
relationship exists between the categorical variables in the population; they are
independent. An example research question that could be answered using a
chi-square analysis would be:
Is there a significant relationship between voter intent and political party
membership?
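
As a sketch of how the chi-square test of independence works on a question like this one, the code below computes the statistic by hand for a 2x2 table. The counts are invented, and the p-value shortcut using erfc is valid only for 1 degree of freedom; larger tables need the general chi-square distribution (for example scipy.stats.chi2_contingency, if SciPy is available).

```python
from math import erfc, sqrt

# Hypothetical counts. Rows: intends to vote / does not.
# Columns: member of party A / member of party B.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 1, chi-square is the square of a standard normal variable,
# so the upper-tail probability can be written with erfc.
p_value = erfc(sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.2g}")
```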
The difference between ANOVA and Chi-squared?
Answer:
The chi-square test is a nonparametric test: it works on counts of categorical
variables, and you can run a comparison for each characteristic. ANOVA instead
compares the means of a quantitative variable. With factorial ANOVA, you can
investigate how a quantitative characteristic (the dependent variable) depends on
one or more qualitative characteristics (categorical predictors). If the number of
groups is large, a significant result does not show which group caused the deviation
in the means. You then need a post-hoc comparison of the means, such as the Tukey
HSD test. The Tukey test has the same applicability conditions as the analysis of
variance, i.e., normality of the data distribution and homogeneity of the group
variances. Homogeneity of the group variances is checked with Levene's test.
Can we use chi-squared to replace ANOVA?
Stuck on this one :V
HOW TO READ A BOX-PLOT?

Answer:
A boxplot is a standardized way of displaying the distribution of data based on a
five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and
“maximum”). It can tell you about your outliers and what their values are. It can also
tell you if your data is symmetrical, how tightly your data is grouped, and if and how
your data is skewed.
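
The five-number summary behind a boxplot can be computed directly. The sketch below uses made-up data and assumes the common 1.5 x IQR rule for flagging outliers, which is why “minimum” and “maximum” on the plot are the whisker ends rather than the raw extremes. Quartile conventions differ between tools; this uses the default method of Python's statistics.quantiles.

```python
from statistics import quantiles

# Hypothetical sample with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30]

# Q1, median and Q3 split the sorted data into four equal parts.
q1, med, q3 = quantiles(data, n=4)
iqr = q3 - q1  # height of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# The whiskers end at the most extreme points that are not outliers.
whisker_min, whisker_max = min(inside), max(inside)

print("whisker min:", whisker_min)
print("Q1, median, Q3:", q1, med, q3)
print("whisker max:", whisker_max)
print("outliers:", outliers)
```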

The mean, median, sd, maximum, minimum?

Answer:
+ The sample mean is the average and is computed as the sum of all the observed
outcomes from the sample divided by the total number of events. We use \bar{x} as the
symbol for the sample mean. In math terms, for n observations x_1, ..., x_n:
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
+ An alternative measure is the median. The median is the middle score. If we have
an even number of events we take the average of the two middles. When the data are
skewed or contain extreme values, the median is better for describing the typical
value. It is often used for income and home prices.
+ The maximum and minimum are the largest and smallest observed values.
+ The mean, mode, median, and trimmed mean do a nice job in telling where the
center of the data set is, but often we are interested in more. For example, a
pharmaceutical engineer develops a new drug that regulates sugar in the blood.
Suppose she finds out that the average sugar content after taking the medication is
the optimal level. This does not mean that the drug is effective. There is a possibility
that half of the patients have dangerously low sugar content while the other half
have dangerously high content. Instead of the drug being an effective regulator, it is
a deadly poison. What the engineer needs is a measure of how far the data is
spread apart. This is what the variance and standard deviation do. First we show
the formulas for these measurements. For a sample, the variance s^2 and the standard
deviation s are:
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}
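
The drug example can be made concrete with two invented sets of blood sugar readings that share the same mean but differ greatly in spread. This is a sketch using Python's statistics module; the numbers are hypothetical and chosen only so that both groups average 100.

```python
from statistics import mean, median, stdev

# Hypothetical readings after taking the drug (arbitrary units).
steady = [98, 99, 100, 101, 102]   # everyone close to the optimal level
erratic = [50, 60, 100, 140, 150]  # half far too low, half far too high

for name, levels in [("steady", steady), ("erratic", erratic)]:
    print(
        f"{name}: mean={mean(levels)}, median={median(levels)}, "
        f"sd={stdev(levels):.2f}, min={min(levels)}, max={max(levels)}"
    )
```

Both groups have the same mean and median, so only the standard deviation (and the minimum and maximum) reveals that the second group is dangerous.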
WHY DO WE USE A LINEAR MODEL?
Answer:
A linear regression is a statistical model that analyzes the relationship between a
response variable (often called y) and one or more explanatory variables and their
interactions (often called x). You make this kind of relationship in your head all
the time: for example, when you estimate the age of a child based on her height,
you are assuming that the older she is, the taller she will be.
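
Here is a minimal sketch of the age-from-height example as a simple linear regression, fitted by ordinary least squares using only the Python standard library. The data and the helper names fit_line and predict are assumptions for illustration; in formula notation this model would be written age ~ height, with the fitted model playing the role of model1.

```python
from statistics import mean

# Hypothetical measurements: height in cm and age in years.
heights_cm = [80, 90, 100, 110, 120]
ages_years = [1, 2, 4, 5, 7]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Predicted response for a new x, using the fitted coefficients."""
    intercept, slope = model
    return intercept + slope * x


model1 = fit_line(heights_cm, ages_years)
print("intercept, slope:", model1)
print("predicted age at 105 cm:", predict(model1, 105))
```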
