The document discusses various statistical and machine learning techniques including logistic regression, ANOVA, linear regression, and lowess regression. Logistic regression and ANOVA are used to investigate relationships between factors and predict outcomes. Linear regression builds prediction models and lowess regression shows patterns in data. Various statistical values like p-values and measures of significance are also discussed.
The Stack
The stack() function is used to stack the prescribed level(s) from the columns onto the index, i.e., it moves column labels into the row index.
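Assuming this note refers to pandas' `DataFrame.stack()`, a minimal sketch: stacking a frame whose columns have a single level moves the column labels into a new innermost index level, so the result is a Series indexed by (row label, column label) pairs. The mix names and values below are illustrative only.

```python
import pandas as pd

# Illustrative data only: two hypothetical concrete mixes with two component columns.
df = pd.DataFrame(
    {"cement": [540.0, 332.5], "water": [162.0, 228.0]},
    index=["mix1", "mix2"],
)

# stack() moves the column labels into the innermost index level,
# so every (row, column) cell becomes one entry of a Series.
stacked = df.stack()
print(stacked)
```

Calling `unstack()` on the result moves that index level back into the columns.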
In a model formula, the column before the ~ sign is the one you want the model to predict (the response), and the columns after it are the ones used to predict it (the predictors).
The predict command predicts values for the data according to the formula of the fitted model, here model1.
• The p-value is a statistical measurement used to test a hypothesis against observed data.
• The p-value is the probability of obtaining results at least as extreme as the observed ones, assuming that the null hypothesis is true.
• The lower the p-value, the greater the statistical significance of the observed difference.
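A small worked illustration of the definition above, assuming a fair-coin null hypothesis (the coin example is not from the original notes). The exact two-sided p-value for k heads in n tosses adds up the probabilities of every outcome at least as far from n/2 as the observed one.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k heads in n tosses under H0: the coin is fair."""
    observed_gap = abs(k - n / 2)
    # Count the outcomes at least as extreme as the observed one, each with probability comb(n, i) / 2**n.
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= observed_gap)
    return extreme / 2 ** n

print(two_sided_binomial_p(9, 10))  # 22/1024 = 0.021484375
print(two_sided_binomial_p(6, 10))  # 772/1024 = 0.75390625
```

Nine heads out of ten gives a small p-value (about 0.021, below the usual 0.05 threshold), so that result is significant; six out of ten is entirely consistent with a fair coin.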
Logistic regression and ANOVA.
Logistic regression: to investigate the relationship between the component factors and the age (in days) and the strength of concrete.
ANOVA: to compare the fit of the linear regression models that were built.
The lowess line is the local (locally weighted) regression of the points in the scatter plot.
It shows us the overall pattern of those points.
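A simplified local-regression sketch in stdlib Python, assuming tricube weights and a neighbourhood containing a fraction `frac` of the points around each x. It performs a single weighted straight-line fit per point and omits the robustness iterations of Cleveland's full LOWESS, so it illustrates the idea rather than replacing a library implementation.

```python
import math

def simple_lowess(xs, ys, frac=2 / 3):
    """Return a locally weighted linear fit evaluated at each x in xs."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local neighbourhood
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (guard against 0).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        # Tricube weights: near points count most, points at distance >= h get 0.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted least-squares line at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted

xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(simple_lowess(xs, ys))
```

On points that already lie on a straight line the smoother returns that line; on curved data each local fit follows the nearby trend, which is the pattern the lowess line traces through the scatter plot.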
WHY DO WE USE ONE-WAY ANOVA?
Answer: + The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups (although you tend to only see it used when there is a minimum of three, rather than two groups). + For example, you could use a one-way ANOVA to understand whether exam performance differed based on test anxiety levels amongst students, dividing students into three independent groups (e.g., low, medium and high-stressed students).
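The F statistic behind a one-way ANOVA can be sketched in stdlib Python: it divides the between-group mean square by the within-group mean square. The scores for the low-, medium- and high-stress groups are made up for illustration. Turning F into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom, for example from a statistics library; this sketch leaves that step out.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA, with its two degrees of freedom."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

low, medium, high = [78, 80, 82], [72, 74, 76], [66, 68, 70]  # hypothetical exam scores
print(one_way_anova_f([low, medium, high]))
```

A large F means the group means differ by much more than the scatter inside the groups would explain.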
WHY DO WE USE TWO-WAY (OR MULTI-WAY) ANOVA?
Answer: + A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. + Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.
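For a balanced design (every combination of the two factors has the same number of replicates), the two-way ANOVA splits the total variation into a part for factor A, a part for factor B, their interaction, and error. A stdlib Python sketch under that balanced-design assumption, with hypothetical factors A and B and made-up data:

```python
from statistics import fmean

def two_way_anova(cells):
    """Sums of squares and F statistics for a balanced two-way ANOVA with replication.

    cells[i][j] holds the replicate observations for level i of factor A and
    level j of factor B; every cell must contain the same number r of them.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[fmean(c) for c in row] for row in cells]
    grand = fmean([v for row in cells for c in row for v in c])
    a_means = [fmean(row) for row in cell_means]
    b_means = [fmean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    # Interaction: what is left of each cell mean after removing both main effects.
    ss_ab = r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((v - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for v in cells[i][j])
    ms_e = ss_e / (a * b * (r - 1))
    return {
        "F_A": ss_a / (a - 1) / ms_e,
        "F_B": ss_b / (b - 1) / ms_e,
        "F_AB": ss_ab / ((a - 1) * (b - 1)) / ms_e,
        "SS": (ss_a, ss_b, ss_ab, ss_e),
    }

# Hypothetical data: 2 levels of A x 2 levels of B, 2 replicates per cell.
result = two_way_anova([[[1, 3], [5, 7]], [[3, 5], [9, 11]]])
print(result)
```

The separate `F_AB` term is what lets a two-way ANOVA say whether the two independent variables act "in combination" rather than just side by side.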
WHAT IS THE DIFFERENCE BETWEEN ONE-WAY AND TWO-WAY ANOVA?
Answer:
The key differences between one-way and two-way ANOVA are summarized clearly below.
A one-way ANOVA is primarily designed to test the equality of three or more means. A two-way ANOVA is designed to assess the effects of two independent variables, and their interaction, on a dependent variable.
A one-way ANOVA involves only one factor or independent variable, whereas there are two independent variables in a two-way ANOVA.
In a one-way ANOVA, the one factor or independent variable analyzed has three or more categorical groups. A two-way ANOVA instead compares multiple groups of two factors.
A one-way ANOVA needs to satisfy only two principles of the design of experiments, i.e., replication and randomization. A two-way ANOVA, by contrast, meets all three principles of the design of experiments: replication, randomization, and local control.
WHY DO WE USE CHI-SQUARED?
Answer: The chi-square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population; that is, they are independent. An example research question that could be answered using a chi-square analysis would be: Is there a significant relationship between voter intent and political party membership?
WHAT IS THE DIFFERENCE BETWEEN ANOVA AND CHI-SQUARED?
Answer: The chi-square test is a nonparametric test: it compares the counts (frequencies) of categorical characteristics. Factorial ANOVA, by contrast, investigates how a quantitative characteristic (the dependent variable) depends on one or more qualitative characteristics (categorical predictors). A significant ANOVA result does not show which groups caused the significant difference in means, and this is especially hard to see when the number of groups is large. The means must then be compared pairwise after the fact (post-hoc analysis), for example with Tukey's HSD test. Tukey's test has the same applicability conditions as the analysis of variance, i.e., the data are normally distributed and the group variances are homogeneous. Homogeneity of the group variances is checked with Levene's test.
CAN WE USE CHI-SQUARED TO REPLACE ANOVA?
(I'm stumped on this one.)
HOW TO READ A BOX-PLOT?
Answer:
A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you whether your data are symmetrical, how tightly your data are grouped, and whether and how your data are skewed.
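A minimal sketch of the numbers a box plot is drawn from, assuming Python's `statistics.quantiles` with `method="inclusive"` for the quartiles and the common 1.5 × IQR rule for flagging outliers. Other tools use other quartile conventions, so their Q1 and Q3 can differ slightly.

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum, plus values outside the 1.5 * IQR fences."""
    values = sorted(data)
    # 'inclusive' is one of several quartile conventions; software may differ slightly.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # the height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return (values[0], q1, med, q3, values[-1]), outliers

summary, outliers = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)   # (1, 3.25, 5.5, 7.75, 100)
print(outliers)  # [100]
```

Here the box spans 3.25 to 7.75 with the median line at 5.5. The value 100 lies far beyond the upper fence (14.5), so a box plot would draw it as a separate point and end the upper whisker at 9.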
WHAT ARE THE MEAN, MEDIAN, SD, MAXIMUM AND MINIMUM?
Answer: + The sample mean is the average and is computed as the sum of all the observed outcomes from the sample divided by the total number of observations. We use $\bar{x}$ as the symbol for the sample mean. In math terms, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. + An alternative measure is the median. The median is the middle score; if we have an even number of observations, we take the average of the two middle values. The median is better for describing the typical value when the data are skewed or contain outliers, which is why it is often used for income and home prices. + The mean, mode, median, and trimmed mean do a nice job of telling where the center of the data set is, but often we are interested in more. For example, a pharmaceutical engineer develops a new drug that regulates iron in the blood. Suppose she finds out that the average iron content after taking the medication is at the optimal level. This does not mean that the drug is effective. There is a possibility that half of the patients have dangerously low iron content while the other half have dangerously high content. Instead of the drug being an effective regulator, it is a deadly poison. What the engineer needs is a measure of how far the data are spread apart. This is what the variance and standard deviation do. For a sample, the variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and the standard deviation is $s = \sqrt{s^2}$.
WHY DO WE USE A LINEAR MODEL?
Answer: A linear regression is a statistical model that analyzes the relationship between a response variable (often called y) and one or more explanatory variables and their interactions (often called x). You make this kind of relationship in your head all the time: for example, when you estimate the age of a child based on her height, you are assuming that the older she is, the taller she will be.
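With a single explanatory variable, the least-squares line has a closed form: the slope is $S_{xy}/S_{xx}$ and the intercept is $\bar{y} - \text{slope}\cdot\bar{x}$. A stdlib Python sketch using the child example above, with invented heights (cm) as x and ages (years) as y; a real analysis would normally use a statistics library that also reports standard errors and p-values.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

heights = [80, 90, 100, 110]  # hypothetical heights in cm (x)
ages = [1, 3, 5, 7]           # hypothetical ages in years (y)
b0, b1 = least_squares_line(heights, ages)
print(b0, b1)
print(b0 + b1 * 105)  # predicted age for a 105 cm child
```

The positive slope encodes the assumption in the text: each extra centimetre of height corresponds to a higher predicted age.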