Research Methodology - Multivariate Analysis
STUDY NOTES
RESEARCH METHODOLOGY
Dr. O S SARAVANAN, M.Com., M.Sc., M.B.A., Ph.D., I.C.W.A.(I)
SYNOPSIS
1. Multivariate Analysis
2. Factor Analysis
3. Multiple Regression Analysis
4. Discriminant Analysis
5. Cluster Analysis
6. Conjoint Analysis
7. Statistical Packages

MULTIVARIATE ANALYSIS
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.
8. Analysis of concepts with respect to changing scenarios
9. Identification of critical design drivers and correlations across hierarchical levels.
Multivariate methods are typically used to:

- Obtain a summary or an overview of a table. This analysis is often called Principal Components Analysis or Factor Analysis. In the overview, it is possible to identify the dominant patterns in the data, such as groups, outliers, trends, and so on. The patterns are displayed as two plots.
- Analyze groups in the table, how these groups differ, and to which group individual table rows belong. This type of analysis is called Classification and Discriminant Analysis.
- Find relationships between columns in data tables, for instance relationships between process operation conditions and product quality. The objective is to use one set of variables (columns) to predict another, for the purpose of optimization, and to find out which columns are important in the relationship. The corresponding analysis is called Multiple Regression Analysis or Partial Least Squares (PLS), depending on the size of the data table.
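The "overview" analysis described above can be sketched in code. This is a minimal illustration using scikit-learn's PCA on made-up data (the notes do not name a particular package, and the data here are hypothetical): the scores summarize the rows, and the loadings show which columns drive the dominant patterns.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data table: 50 rows (observations) x 4 columns (variables).
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=50)  # make column 4 track column 1

pca = PCA(n_components=2)
scores = pca.fit_transform(X)          # score plot data: patterns among rows
loadings = pca.components_.T           # loading plot data: role of each column

print(scores.shape)                    # one 2-D point per row of the table
print(pca.explained_variance_ratio_)   # share of variance per component
```

Plotting `scores` and `loadings` gives the two overview plots mentioned in the text, on which groups, outliers, and trends can be spotted visually.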
Exploratory factor analysis (EFA) is used when the researcher has no prior theory about the factor structure: the a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. One uses the factor loadings to intuit the factor structure of the data. Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesizes beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors. The researcher seeks to determine, for instance, whether measures created to represent a latent variable really belong together.
Factor analysis is commonly applied in marketing research as follows:

1. Identify the salient attributes consumers use to evaluate products in this category.
2. Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.
3. Input the data into a statistical program and run the factor analysis procedure. The output is a reduced set of underlying factors.
4. Use these factors to construct perceptual maps and other product positioning devices.

Advantages
- Both objective and subjective attributes can be used, provided the subjective attributes can be converted into scores.
- Factor analysis can be used to identify hidden dimensions or constructs which may not be apparent from direct analysis.
- It is easy and inexpensive to do.

Disadvantages
- Usefulness depends on the ability to collect a sufficient set of product attributes. If important attributes are missed, the value of the procedure is reduced.
- If sets of observed variables are highly similar to each other and distinct from other items, factor analysis will assign a single factor to them. This may make it harder to identify factors that capture more interesting relationships.
- Naming the factors may require background knowledge or theory, because multiple attributes can be highly correlated for no apparent reason.
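Steps 3 and 4 above can be sketched as follows. This is a hypothetical example (synthetic ratings, invented attribute names) using scikit-learn's `FactorAnalysis`; the notes do not prescribe a particular program. Two latent dimensions drive six observed attribute ratings, and the fitted loadings reveal which attributes belong together.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 200
quality = rng.normal(size=n)              # hidden "quality" dimension
value = rng.normal(size=n)                # hidden "value" dimension
# Six observed attribute ratings, each driven by one latent dimension.
ratings = np.column_stack([
    quality + 0.3 * rng.normal(size=n),   # e.g. durability
    quality + 0.3 * rng.normal(size=n),   # e.g. finish
    quality + 0.3 * rng.normal(size=n),   # e.g. reliability
    value + 0.3 * rng.normal(size=n),     # e.g. price fairness
    value + 0.3 * rng.normal(size=n),     # e.g. running cost
    value + 0.3 * rng.normal(size=n),     # e.g. resale value
])

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(ratings)   # per-respondent factor scores (step 4 input)
loadings = fa.components_            # which attributes load on which factor

print(loadings.round(2))             # inspect loadings to name the factors
```

The `scores` array gives each respondent a position on the two factors, which is exactly what is plotted on a perceptual map; naming the factors still requires the analyst to read the loadings, as the disadvantages above note.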
More precisely, multiple regression analysis helps us to predict the value of Y for given values of X1, X2, ..., Xk. For example, the yield of rice per acre depends upon quality of seed, fertility of soil, fertilizer used, temperature, and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique. An additional advantage of this technique is that it also enables us to study the individual influence of these variables on yield.

THE MULTIPLE REGRESSION MODEL
In general, the multiple regression equation of Y on X1, X2, ..., Xk is given by:

Y = b0 + b1 X1 + b2 X2 + ... + bk Xk

USAGE
Multiple regression analysis is used when one is interested in predicting a continuous dependent variable from a number of independent variables. If the dependent variable is dichotomous, then logistic regression should be used.

ASSUMPTIONS
The multiple regression technique does not test whether the data are linear. On the contrary, it proceeds by assuming that the relationship between Y and each of the Xi is linear. Hence, as a rule, it is prudent to always look at the scatter plots of (Y, Xi), i = 1, 2, ..., k. If any plot suggests non-linearity, one may use a suitable transformation to attain linearity. Another important assumption is the non-existence of multicollinearity, i.e. the independent variables are not related among themselves. At a very basic level, this can be tested by computing the correlation coefficient between each pair of independent variables. Other assumptions include those of homoscedasticity and normality.
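The model Y = b0 + b1 X1 + ... + bk Xk can be fitted by ordinary least squares. The sketch below uses invented rainfall/fertilizer data (the rice example in the text is only illustrative), and also performs the basic multicollinearity check the assumptions section suggests: computing the correlation between each pair of independent variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
rainfall = rng.uniform(50, 150, size=n)     # X1 (hypothetical)
fertilizer = rng.uniform(10, 40, size=n)    # X2 (hypothetical)
# True model: Y = 5 + 0.02*X1 + 0.1*X2 + noise.
yield_ = 5.0 + 0.02 * rainfall + 0.1 * fertilizer + rng.normal(0, 0.5, size=n)

# Basic multicollinearity check: pairwise correlation of the X's.
r = np.corrcoef(rainfall, fertilizer)[0, 1]

# Fit: design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones(n), rainfall, fertilizer])
b, *_ = np.linalg.lstsq(X, yield_, rcond=None)
print(f"b0={b[0]:.2f}, b1={b[1]:.3f}, b2={b[2]:.3f}, corr(X1,X2)={r:.2f}")
```

The estimated b1 and b2 recover the individual influence of each variable, which is the "additional advantage" mentioned above; a near-zero corr(X1, X2) supports the no-multicollinearity assumption.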
1. Discriminant analysis is closely related to two other techniques: regression analysis and analysis of variance (ANOVA). The principal difference between discriminant analysis and the other two methods is with regard to the nature of the dependent variable.

2. Discriminant analysis requires the researcher to have measures of the dependent variable and all of the independent variables for a large number of cases. In regression analysis and ANOVA, the dependent variable must be a "continuous variable." A continuous variable indicates the degree to which a subject possesses some characteristic, so that the higher the value of the variable, the greater the level of the characteristic. A good example of a continuous variable is a person's income.

3. In discriminant analysis, the dependent variable must be a "categorical variable." The values of a categorical variable serve only to name groups and do not necessarily indicate the degree to which some characteristic is present. An example of a categorical variable is a measure indicating to which one of several different market segments a customer belongs; another example is a measure indicating whether or not a particular employee is a "high potential" worker. The categories must be mutually exclusive; that is, a subject can belong to one and only one of the groups indicated by the categorical variable. While a categorical variable must have at least two values (as in the "high potential" case), it may have numerous values (as in the case of the market segmentation measure). As the mathematical methods used in discriminant analysis are complex, they are described here only in general terms. We will do this by providing an example of a simple case in which the dependent variable has only two categories.

4. Discriminant analysis is most often used to help a researcher predict the group or category to which a subject belongs. For example, when individuals are interviewed for a job, managers will not know for sure how job candidates will perform on the job if hired.
Suppose, however, that a human resource manager has a list of current employees who have been classified into two groups: "high performers" and "low performers." These individuals have been working for the company for some time, have been evaluated by their supervisors, and are known to fall into one of these two mutually exclusive categories. The manager also has information on the employees' backgrounds: educational attainment, prior work experience, participation in training programs, work attitude measures, personality characteristics, and so forth. This information was known at the time these employees were hired. The manager
wants to be able to predict, with some confidence, which future job candidates are high performers and which are not. A researcher or consultant can use discriminant analysis, along with existing data, to help in this task. 5. There are two basic steps in discriminant analysis. The first involves estimating coefficients, or weighting factors, that can be applied to the known characteristics of job candidates (i.e., the independent variables) to calculate some measure of their tendency or propensity to become high performers. This measure is called a "discriminant function." Second, this information can then be used to develop a decision rule that specifies some cut-off value for predicting which job candidates are likely to become high performers. 6. The equation is quite similar to a regression equation.
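The two steps above, estimating a discriminant function and then applying a decision rule to new candidates, can be sketched with scikit-learn's linear discriminant analysis. The employee data below are entirely made up (two hypothetical hiring-time measures), and the notes do not prescribe a particular package.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
# Existing employees, already classified: columns are years of education
# and a work-attitude score (both hypothetical measures).
low = rng.normal(loc=[12.0, 3.0], scale=1.0, size=(40, 2))    # low performers
high = rng.normal(loc=[15.0, 6.0], scale=1.0, size=(40, 2))   # high performers
X = np.vstack([low, high])
y = np.array([0] * 40 + [1] * 40)   # 0 = low performer, 1 = high performer

lda = LinearDiscriminantAnalysis().fit(X, y)
# Step 1: the discriminant function (weights on the independent variables).
print(lda.coef_, lda.intercept_)
# Step 2: apply the decision rule to a new job candidate.
candidate = np.array([[14.0, 5.5]])
print(lda.predict(candidate))       # predicted group for the candidate
```

As the text notes, the fitted function resembles a regression equation: a weighted sum of the candidate's characteristics, compared against a cut-off to predict group membership.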
The population is divided into N groups, called clusters. The researcher randomly selects n clusters to include in the sample. The number of observations within each cluster Mi is known, and M = M1 + M2 + ... + MN. Each element of the population can be assigned to one, and only one, cluster.

One-stage sampling: all of the elements within selected clusters are included in the sample.
Two-stage sampling: a subset of elements within selected clusters is randomly selected for inclusion in the sample.
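The two variants can be sketched directly. This toy example assumes a hypothetical population of N = 6 clusters of unequal sizes Mi; the element labels are invented.

```python
import random

random.seed(0)
# Hypothetical clusters: cluster i holds M_i elements labelled "c{i}-e{j}".
clusters = {i: [f"c{i}-e{j}" for j in range(size)]
            for i, size in enumerate([8, 5, 12, 7, 9, 6])}
n = 2  # number of clusters to draw at stage 1

chosen = random.sample(sorted(clusters), n)

# One-stage sampling: take every element in each selected cluster.
one_stage = [e for c in chosen for e in clusters[c]]

# Two-stage sampling: randomly subsample elements within selected clusters.
two_stage = [e for c in chosen for e in random.sample(clusters[c], 3)]

print(len(one_stage), len(two_stage))
```

Note that every element belongs to exactly one cluster, matching the requirement in the text, and the two-stage sample is always a subset of the corresponding one-stage sample.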
Cluster Sampling: Advantages and Disadvantages
Assuming the sample size is constant across sampling methods, cluster sampling generally provides less precision than either simple random sampling or stratified sampling. This is the main disadvantage of cluster sampling. Given this disadvantage, it is natural to ask: why use cluster sampling? Sometimes the cost per sample point is lower for cluster sampling than for other sampling methods. Given a fixed budget, the researcher may be able to use a bigger sample with cluster sampling than with the other methods. When the increased sample size is sufficient to offset the loss in precision, cluster sampling may be the best choice.

When to Use Cluster Sampling
Cluster sampling should be used only when it is economically justified - when reduced costs can be used to overcome losses in precision. This is most likely to occur in the following situations.
Constructing a complete list of population elements is difficult, costly, or impossible. For example, it may not be possible to list all of the customers of a chain of hardware stores. However, it would be possible to randomly select a subset of stores (stage 1 of cluster sampling) and then interview a random sample of customers who visit those stores (stage 2 of cluster sampling).
The population is concentrated in "natural" clusters (city blocks, schools, hospitals, etc.). For example, to conduct personal interviews of operating room nurses, it might make sense to randomly select a sample of hospitals (stage 1 of cluster sampling) and then interview all of the operating room nurses at that hospital. Using cluster sampling, the interviewer could conduct many interviews in a single day at a single hospital. Simple random sampling, in contrast, might require the interviewer to spend all day traveling to conduct a single interview at a single hospital.
Even when the above situations exist, it is often unclear which sampling method should be used. Test different options, using hypothetical data if necessary. Choose the most cost-effective approach; that is, choose the sampling method that delivers the greatest precision for the least cost.
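The advice to "test different options, using hypothetical data if necessary" can be followed with a small simulation. The sketch below (all numbers invented) repeatedly draws equal-sized samples by simple random sampling and by one-stage cluster sampling from a population whose values vary mostly between clusters, the situation in which cluster sampling loses the most precision, and compares the spread of the resulting mean estimates.

```python
import random
import statistics

random.seed(42)
# Hypothetical population: 20 clusters of 10 elements each. Values differ
# strongly *between* clusters and only slightly *within* them.
population, clusters = [], []
for c in range(20):
    base = random.gauss(50, 10)                     # cluster-level effect
    members = [base + random.gauss(0, 2) for _ in range(10)]
    clusters.append(members)
    population.extend(members)

srs_means, cluster_means = [], []
for _ in range(500):
    # Simple random sampling: 20 elements drawn from the whole population.
    srs_means.append(statistics.mean(random.sample(population, 20)))
    # One-stage cluster sampling: 2 whole clusters = 20 elements.
    picked = random.sample(clusters, 2)
    cluster_means.append(statistics.mean(picked[0] + picked[1]))

print(statistics.stdev(srs_means), statistics.stdev(cluster_means))
```

At equal sample size, the cluster-sample means spread much more widely, quantifying the loss of precision the text describes; whether that loss is acceptable then depends on the cost savings per interview.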
CONJOINT ANALYSIS

Advantages
- estimates psychological tradeoffs that consumers make when evaluating several attributes together
- measures preferences at the individual level
- uncovers real or hidden drivers which may not be apparent to the respondents themselves
- realistic choice or shopping task
- able to use physical objects
- if appropriately designed, the ability to model interactions between attributes can be used to develop needs-based segmentation

Disadvantages
- designing conjoint studies can be complex
- with too many options, respondents resort to simplification strategies
- difficult to use for product positioning research because there is no procedure for converting perceptions about actual features to perceptions about a reduced set of underlying features
- respondents are unable to articulate attitudes toward new categories, or may feel forced to think about issues they would otherwise not give much thought to
- poorly designed studies may over-value emotional/preference variables and undervalue concrete variables
- does not take into account the number of items per purchase, so it can give a poor reading of market share
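The core computation behind ratings-based conjoint analysis, estimating part-worth utilities, can be sketched as a regression of profile ratings on dummy-coded attribute levels. Everything below is hypothetical: the two attributes (brand, price level), the true utilities, and the respondent's ratings are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Product profiles: brand (A=0 / B=1) and price (low=0 / high=1),
# full factorial design shown to one respondent 4 times.
profiles = np.array([[b, p] for b in (0, 1) for p in (0, 1)] * 4)
true_utils = np.array([1.5, -2.0])   # part-worths: brand B, high price
ratings = 5 + profiles @ true_utils + rng.normal(0, 0.3, size=len(profiles))

# Estimate part-worths by least squares (intercept + dummy columns).
X = np.column_stack([np.ones(len(profiles)), profiles])
partworths, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(partworths.round(2))   # [baseline, brand-B part-worth, high-price part-worth]

# Relative attribute importance: each attribute's share of the utility range.
ranges = np.abs(partworths[1:])
print((ranges / ranges.sum()).round(2))
```

Because the model is fitted per respondent, this captures the individual-level preference measurement listed among the advantages; the tradeoff between brand and price is read directly off the estimated part-worths.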
*******
STATISTICAL PACKAGES

Meaning
Statistical software packages are specialized computer programs for statistical analysis.

Packages
1. Aabel - graphic display and plotting of statistical data sets
2. ADAPA - batch and real-time scoring of statistical models
3. Angoss
4. ASReml - for restricted maximum likelihood analyses
5. BMDP - general statistics package
6. CalEst - general statistics and probability package with didactic tutorials
7. Data Applied - for building statistical models
8. DPS - comprehensive statistics package
9. EViews - for econometric analysis
10. FAME - a system for managing time series statistics and time series databases
11. FinMath - a .NET numerical library containing descriptive statistics, distributions, factor data
32. Primer-E Primer - environmental and ecological specific
33. PV-WAVE - programming language for comprehensive data analysis and visualization
34. Q research software - quantitative data analysis software for market research
35. Quantum - part of the SPSS MR product line, mostly for data validation and tabulation in market research
Analyse-it - add-on to Microsoft Excel for statistical analysis
Sigma Magic - add-on to Microsoft Excel for statistical analysis designed for Lean Six Sigma
SigmaXL - add-on to Microsoft Excel for statistical and graphical analysis
SPC XL - add-on to Microsoft Excel for general statistics
SUDAAN - add-on to SAS and SPSS for statistical surveys
XLfit - add-on to Microsoft Excel for curve fitting and statistical analysis
XLSTAT - add-on to Microsoft Excel for statistics and multivariate data analysis
Stats Helper - add-on to Microsoft Excel for descriptive statistics and Six Sigma

*************