Data Processing and Data Analysis

ASSIGNMENT
ON
Data processing and data analysis

COURSE TITLE: BUSINESS RESEARCH METHOD
SUBMITTED TO
Prof. Mohammed Harisur Rahman Howlader
DEPARTMENT OF MANAGEMENT
UNIVERSITY OF CHITTAGONG
SUBMITTED BY
TILAK KARMAKAR
ID: 16302124
Date of Submission: 30/07/2020

BBA, 8th Semester
Session: 2015-16
Question 1: Why data editing? Discuss the means of coding data in case of
category data, rating data and ranking data?
Editing is the process of checking and adjusting the data for omissions, legibility and consistency,
and readying them for coding and storage. The editing process corrects problems such as
interviewer errors (an answer recorded on the wrong portion of a questionnaire, for example)
before the data are transferred to the computer.
Means of coding data in case of category data, rating data and ranking data:
Coding category, rating and ranking data means transforming the collected information or
observations into a set of meaningful, cohesive categories. It
is a process of summarizing and re-presenting data in order to provide a systematic account of
the recorded or observed phenomenon. Data refer to a wide range of empirical objects such as
historical documents, newspaper articles, TV programming, field notes, interview or focus group
transcripts, pictures, face-to-face conversations, social media messages (e.g., tweets or YouTube
comments), and so on. Codes are concepts that link data with theory. They can either be
predefined by the researcher or emerge inductively from the coding process. By coding data,
researchers classify and attach conceptual labels to the empirical objects under study in order to
organize and interpret them.
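As a minimal sketch of what coding looks like in practice, the Python snippet below assigns numeric
codes to one hypothetical survey response containing a category answer, a rating answer and a
ranking answer. The question names, the code books and the response itself are illustrative
assumptions, not data from any study.

```python
# Hypothetical code books: every name and value here is an illustrative assumption.
GENDER_CODES = {"male": 1, "female": 2, "other": 3}  # category (nominal) data
RATING_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}  # rating (scale) data


def code_category(answer, codebook, missing=9):
    """Return the numeric code for an answer, or a missing-value code."""
    return codebook.get(answer.strip().lower(), missing)


def code_ranking(ranked_items, all_items):
    """Turn an ordered list (most preferred first) into one rank per item."""
    ranks = {item: position for position, item in enumerate(ranked_items, start=1)}
    # Items the respondent did not rank get None rather than an invented rank.
    return [ranks.get(item) for item in all_items]


response = {
    "gender": "Female",
    "satisfaction": "Agree",
    "brand_ranking": ["Brand B", "Brand C", "Brand A"],
}

coded = {
    "gender": code_category(response["gender"], GENDER_CODES),
    "satisfaction": code_category(response["satisfaction"], RATING_CODES),
    "brand_ranks": code_ranking(response["brand_ranking"],
                                ["Brand A", "Brand B", "Brand C"]),
}
print(coded)
```

Category codes are only labels, rating codes preserve the order of the scale points, and ranking
codes record each item's position in the respondent's ordering.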
Question 2: Discuss different means of data entry and data presentation.

There are three main means of presenting data:
• it can be incorporated into the main body of text;
• it can be presented separately as a table; or
• it can be used to construct a graph or chart.
Including numbers in the main body of text: Numbers are most effective in the main body of the
text of an essay, report or dissertation when there are only two values to compare. For
example: 86% of male students said they regularly ate breakfast compared to 62% of female
students.
If we are discussing three or more numbers, including them within the main body of text does
not facilitate comprehension or comparison and it is often more useful to use a table incorporated
within the text.
Using tables: Tables are the format in which most numerical data are initially stored and
analyzed, and they are likely to be the means we use to organize data collected during experiments
and dissertation research. However, when writing up, we have to decide whether a table is the
best way of presenting the data, or whether a graph or chart would be easier to understand.
Tables are an effective way of presenting data:
• When we wish to show how a single category of information varies when measured at
different points (in time or space). For example, a table would be an appropriate way of showing
how the unemployment rate varies between different countries in the EU (different
points in space);
• When the dataset contains relatively few numbers. This is because it is very hard for a reader
to assimilate and interpret many numbers in a table. In particular, avoid the use of complex tables
in talks and presentations, when the audience will have a relatively short time to take in the
information and little or no opportunity to review it at a later stage;
• When the precise value is crucial to our argument and a graph would not convey the same
level of precision. For example, when it is important that the reader knows that the result was
2.48 and not 2.45.
Graphs: Graphs are a good means of describing, exploring or summarizing numerical data
because the use of a visual image can simplify complex information and help to highlight
patterns and trends in the data. They are a particularly effective way of presenting a large amount
of data but can also be used instead of a table to present smaller datasets.
Types of Graph
Bar charts: Bar charts are one of the most commonly used types of graph and are used to
display and compare the number, frequency or other measure (e.g. mean) for different discrete
categories or groups. The graph is constructed such that the heights or lengths of the different
bars are proportional to the size of the category they represent. Since the x-axis (the horizontal
axis) represents the different categories it has no scale. The y-axis (the vertical axis) does have a
scale and this indicates the units of measurement. The bars can be drawn either vertically or
horizontally depending upon the number of categories and length or complexity of the category
labels.
Histograms: Histograms are a special form of bar chart where the data represent continuous
rather than discrete categories. For example, instead of drawing a bar for each individual age
between 0 and 65, the data could be grouped into a series of continuous age ranges such as 16-24,
25-34, 35-44 etc. Unlike a bar chart, in a histogram both the x- and y-axes have a scale.
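The grouping step behind a histogram can be sketched in a few lines of Python. The ages and the
class boundaries below are made-up values chosen only to illustrate the idea.

```python
# Illustrative ages (assumed data, not from a real survey).
ages = [17, 22, 24, 25, 31, 34, 35, 38, 44, 19, 28, 41]

# Class intervals as (lower, upper) bounds, both inclusive, of the kind
# described above.
age_bins = [(16, 24), (25, 34), (35, 44)]


def frequency_table(values, intervals):
    """Count how many values fall into each inclusive interval."""
    counts = {interval: 0 for interval in intervals}
    for value in values:
        for lower, upper in intervals:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break
    return counts


age_counts = frequency_table(ages, age_bins)
for (lower, upper), count in age_counts.items():
    # A text-only histogram: one '#' per observation in the class.
    print(f"{lower}-{upper}: {'#' * count}")
```

Each bar's length is the class frequency, which is what the y-axis of a histogram measures.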
Pie charts: Pie charts are a visual way of displaying how the total data are distributed between
different categories. For example, a pie chart could show the proportional distribution of visitors
between different types of tourist attractions. A similar use would be to show the percentage
of the total votes received by each party in an election. Pie charts should only be used for
displaying nominal data (i.e. data that are classed into different categories).
Line graphs: Line graphs are usually used to show time series data – that is, how one or more
variables vary over a continuous period of time. Typical examples of the types of data that can be
presented using line graphs are monthly rainfall and annual unemployment rates. Line graphs are
particularly useful for identifying patterns and trends in the data such as seasonal effects, large
changes and turning points.
Question 3: Name some computer program used for data analyses.

A business or social science researcher should become familiar with at least one general computer
program for data analysis. Some widely used programs are listed below.
SPSS- SPSS is software that is widely used as a statistical analysis tool in the social
sciences, for example in market research, surveys and competitor analysis. It is one of
the most popular statistical packages and can perform highly complex data manipulation and
analysis with ease. It is designed for both interactive and non-interactive use.
SAS- SAS is a software suite that can mine, alter, manage and retrieve data from a variety of
sources and perform statistical analysis on it.
SYSTAT- SYSTAT is statistical analysis and graphics software. It offers a comprehensive suite of
statistical functions together with 2D and 3D charts and graphs.
Microsoft Excel- Excel's basic data analysis tool allows descriptive statistics, including
frequencies and measures of central tendency, to be computed. Spreadsheet packages like Excel
continue to evolve and become more viable for performing many statistical analyses.
Question 4: What is the specific use of following statistical techniques in data
analysis:
A. ANOVA, B. T-TEST OR Z-TEST, C. CHI SQUARE TEST, D. WILCOXON TEST?
ANOVA: An ANOVA (Analysis of Variance) test is a way to find out whether survey or experiment
results are significant. In other words, it helps us decide whether to reject the null
hypothesis or fail to reject it. We use ANOVA when we want to compare the means of two or more
groups (most often three or more), with a null hypothesis that the means of the different groups
are equal. A statistically significant result means that at least one group mean differs from the
others; it does not by itself show which one.
Examples of when we might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling, medication
and biofeedback. We want to see if one therapy is better than the others.
• A manufacturer has two different processes to make light bulbs. They want to know if one
process is better than the other.
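To make the comparison of group means concrete, here is a minimal sketch of the one-way ANOVA F
statistic in plain Python. The improvement scores for the three therapies are invented for
illustration, and the function name is an assumption rather than part of any statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical improvement scores for the three therapies.
counseling = [4, 5, 6]
medication = [6, 7, 8]
biofeedback = [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f([counseling, medication, biofeedback])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With these made-up scores F is 12 on (2, 6) degrees of freedom, which exceeds the tabulated 5 percent
critical value of about 5.14, so the null hypothesis of equal means would be rejected.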
T-TEST OR Z-TEST: Both tests are used to test a hypothesis about a mean, or about the difference
between two means. Which one applies depends on the population standard deviation (σ) and the
sample size:
1) When the population standard deviation (σ) is known, the z-test is most appropriate.
2) When σ is unknown (the situation in most marketing research studies) and the sample size is
greater than 30, the z-test can also be used.
3) When σ is unknown and the sample size is small, the t-test is most appropriate. Since the
two distributions are similar with larger sample sizes, the two tests often yield the same
conclusion.
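The three rules can be sketched as a small Python function that picks the statistic for a
one-sample test of a mean. The sample values, the hypothesized mean of 48 and the population
standard deviation of 4 are assumptions made up for the example.

```python
import math
import statistics


def one_sample_statistic(sample, hypothesized_mean, population_sd=None):
    """Return (name, value) of the test statistic for a one-sample test of a mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    if population_sd is not None:
        # Rule 1: population standard deviation known, so use the z-test.
        return "z", (mean - hypothesized_mean) / (population_sd / math.sqrt(n))
    sample_sd = statistics.stdev(sample)  # sample estimate, n - 1 in the denominator
    # Rule 2 (large sample) gives z; rule 3 (small sample) gives t.
    name = "z" if n > 30 else "t"
    return name, (mean - hypothesized_mean) / (sample_sd / math.sqrt(n))


scores = [52, 48, 50, 54, 46]  # hypothetical sample
t_name, t_value = one_sample_statistic(scores, hypothesized_mean=48)
z_name, z_value = one_sample_statistic(scores, hypothesized_mean=48, population_sd=4)
print(t_name, round(t_value, 3))
print(z_name, round(z_value, 3))
```

The formula is the same in both cases; what changes is which standard deviation is used and which
distribution the statistic is compared against.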
CHI SQUARE TEST: The Chi Square statistic is commonly used for testing relationships
between categorical variables. The null hypothesis of the Chi-Square test is that no relationship
exists between the categorical variables in the population; they are independent. The Chi-Square
statistic is most commonly used to evaluate Tests of Independence when using a cross tabulation
(also known as a bivariate table). Cross tabulation presents the distributions of two categorical
variables simultaneously, with the intersections of the categories of the variables appearing in the
cells of the table. The Test of Independence assesses whether an association exists between the
two variables by comparing the observed pattern of responses in the cells to the pattern that
would be expected if the variables were truly independent of each other. Calculating the
Chi-Square statistic and comparing it against a critical value from the Chi-Square distribution
allows the researcher to assess whether the observed cell counts are significantly different from
the expected cell counts.
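The comparison of observed and expected cell counts can be sketched as follows. The 2 x 2 cross
tabulation is hypothetical, and the function name is an assumption introduced for the example.

```python
def chi_square_statistic(observed):
    """Return (chi_square, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, dof


# Hypothetical cross tabulation: rows = gender, columns = eats breakfast (yes, no).
crosstab = [[30, 20],
            [20, 30]]

chi2, dof = chi_square_statistic(crosstab)
print(f"chi-square = {chi2:.2f}, df = {dof}")
```

For this made-up table every expected count is 25 and the statistic is 4.0 with 1 degree of freedom,
which is above the 5 percent critical value of 3.84, so independence would be rejected.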
WILCOXON TEST: The Wilcoxon signed-rank test is a non-parametric statistical hypothesis
test used to compare two related samples, matched samples, or repeated measurements on a
single sample to assess whether their population mean ranks differ (i.e. it is a paired difference
test). It can be used as an alternative to the paired Student's t-test (also known as "t-test for
matched pairs" or "t-test for dependent samples") when the differences between the paired
observations cannot be assumed to be normally distributed. In short, it can be used to determine
whether two dependent samples were selected from populations having the same distribution.
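A minimal sketch of how the signed-rank statistic is built from paired data is given below. The
before and after scores are invented, and the sketch stops at the rank sums; it does not look up
a critical value or a p value.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]

    # Rank the absolute differences, giving tied values their average rank.
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(ordered):
        end = pos
        while (end + 1 < len(ordered)
               and abs(diffs[ordered[end + 1]]) == abs(diffs[ordered[pos]])):
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[ordered[k]] = average_rank
        pos = end + 1

    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical scores for the same six respondents measured twice.
before = [10, 12, 9, 14, 11, 13]
after = [12, 11, 12, 18, 11, 18]

w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

The smaller of the two rank sums is the value usually compared against a table of critical values;
a very small value suggests the paired measurements differ systematically.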
Question 5: What do you mean by R² = .43?

R² (the coefficient of multiple determination) in multiple regression indicates the percentage of
variation in the dependent variable that is explained by the combination of all independent
variables. Here, a value of R² = .43 means that 43 percent of the variance in the dependent
variable is explained by the independent variables. If two independent variables are truly
independent (uncorrelated with each other), the R² for a multiple regression model is equal to the
sum of the separate R² values that would result from two separate simple regression models.
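As a small worked illustration, the coefficient of determination can be computed from observed and
predicted values as one minus the ratio of unexplained to total variation. The three data points
are invented; the predictions come from the least-squares line y = 1 + 0.5x fitted to them.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical data: x = 1, 2, 3 with observed y below.
observed_y = [1, 3, 2]
# Predictions from the least-squares line y = 1 + 0.5x for the same x values.
predicted_y = [1.5, 2.0, 2.5]

r2 = r_squared(observed_y, predicted_y)
print(f"R squared = {r2:.2f}")
```

Here the result is 0.25, read as 25 percent of the variance explained, in the same way that .43 is
read as 43 percent.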
Question 6: Explain the following result:

F=2.34, p=.156
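F is the ratio of the variance explained by the model (or between groups) to the unexplained
variance, so it tests the joint effect of all the variables together. A large F value (one that is
bigger than the critical F value found in a table) indicates a significant result. The p value is
the probability of obtaining an F at least this large if the null hypothesis were true, so a small
p value (conventionally below .05) indicates significance.
Here the F value is 2.34 and the p value is .156. Because .156 is larger than .05, we fail to
reject the null hypothesis.
So we can say that the result is not statistically significant.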
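The decision rule for a reported p value can be written out explicitly. The sketch below assumes
the conventional 5 percent significance level, which the question itself does not state.

```python
ALPHA = 0.05  # assumed significance level; not given in the question


def is_significant(p_value, alpha=ALPHA):
    """A result is statistically significant only when p is below alpha."""
    return p_value < alpha


f_value, p_value = 2.34, 0.156
decision = "reject" if is_significant(p_value) else "fail to reject"
print(f"F = {f_value}, p = {p_value}: {decision} the null hypothesis at alpha = {ALPHA}")
```

Since .156 is not below .05, the rule returns "fail to reject": the F value of 2.34 is not large
enough to count as statistically significant at that level.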
Question 7: Compare between factor analysis and cluster analysis.

A comparison chart between factor and cluster analysis is given below-
Factor analysis | Cluster analysis
Dimension reduction technique | Classification analysis
Usual objective of factor analysis is to explain correlation in a set of data and relate variables to each other | Objective of cluster analysis is to address heterogeneity in each set of data
Factor analysis is a form of simplification | Cluster analysis is a form of categorization
Factor analysis has the ability to reduce an unwieldy set of variables to a much smaller set of factors | Cluster analysis is suitable for classifying objects according to certain criteria
Statistics associated with factor analysis: correlation matrix, communality, factor scores, etc. | Statistics associated with cluster analysis: agglomeration schedule, cluster centroid, cluster centers, etc.
Example: new car buyers might be grouped based on the relative emphasis they place on economy, convenience, performance, comfort and luxury | Example: cluster analysis has been used to identify the kinds of strategies automobile purchasers use to obtain external information
Question 8: Explain the following multiple regression analysis:

Y= 102.18 + .387X1 + 115.2X2 + 6.73X3
Coefficient of multiple determination (R²) = .845
Here Y is the dependent variable and X1, X2 and X3 are the independent variables. The constant
102.18 is the predicted value of Y when all three independent variables are zero. Note that all the
signs in the equation are positive. Thus, the regression equation indicates that Y is positively
related to X1, X2 and X3. Each coefficient shows the effect on the dependent variable of a 1 unit
increase in that independent variable, holding the other independent variables constant. The value
associated with X1 is 0.387. Thus a 1 unit increase in X1 is associated with an increase of
0.387 (0.387*1) in Y. The same goes for X2 and X3, with increases of 115.2 and 6.73 respectively.
So if the hypotheses predicted positive effects of X1, X2 and X3, the positive coefficients support
them, and vice versa.
And the value R² = .845 means that 84.5 percent of the variance in the dependent variable is
explained by the independent variables.
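The meaning of the coefficients can be checked numerically with the equation from the question.
The values chosen for X1, X2 and X3 below are hypothetical; only the intercept and coefficients
come from the question.

```python
# Intercept and coefficients taken from the regression equation in the question.
INTERCEPT = 102.18
COEFFICIENTS = {"X1": 0.387, "X2": 115.2, "X3": 6.73}


def predict(values):
    """Predicted Y for a dict of independent-variable values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


baseline = {"X1": 10, "X2": 2, "X3": 5}          # hypothetical values
bumped = dict(baseline, X1=baseline["X1"] + 1)   # raise X1 by one unit, others fixed

change = predict(bumped) - predict(baseline)
print(f"Predicted Y at baseline: {predict(baseline):.2f}")
print(f"Change in Y for a 1 unit increase in X1: {change:.3f}")
```

Raising X1 by one unit while X2 and X3 stay fixed changes the prediction by exactly the X1
coefficient, 0.387, whatever baseline values are chosen.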
Question 9: What is canonical correlation? Give an example.

Canonical correlation analysis is the study of the linear relations between two sets of variables. It
is the multivariate extension of correlation analysis. It involves:
-> Two or more criterion variables (dependent variables)
-> Multiple predictor variables (independent variables)
-> An extension of multiple regression
-> Linear association between two sets of variables
Canonical correlation is appropriate in the same situations where multiple regression would be,
but where there are multiple intercorrelated outcome variables.
As an example, consider variables related to exercise and health. On one hand, you have variables
associated with exercise, observations such as the climbing rate on a stair stepper, how fast you
can run a certain distance, the amount of weight lifted on bench press, the number of push-ups
per minute, etc. On the other hand, you have variables that attempt to measure overall health,
such as blood pressure, cholesterol levels, glucose levels, body mass index, etc. Two types of
variables are measured and the relationships between the exercise variables and the health
variables are of interest.
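In symbols, and as a sketch of the standard definition rather than anything stated in the question,
let X be the vector of exercise variables and Y the vector of health variables. The weight vectors
a and b, the combinations U and V, and the covariance matrices are notation introduced here. The
first canonical correlation is the largest correlation obtainable between a weighted combination
of one set and a weighted combination of the other:

```latex
U = a^{\top} X, \qquad V = b^{\top} Y, \qquad
\rho_1 = \max_{a,\,b} \operatorname{corr}(U, V)
       = \max_{a,\,b}
         \frac{a^{\top} \Sigma_{XY}\, b}
              {\sqrt{a^{\top} \Sigma_{XX}\, a}\;\sqrt{b^{\top} \Sigma_{YY}\, b}}
```

Here \Sigma_{XX} and \Sigma_{YY} are the covariance matrices within each set and \Sigma_{XY} is the
covariance matrix between the two sets.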
End
