MBA MK-01: MARKETING RESEARCH
Unit – 4
Q1. What is the role of ‘Data Editing and Data Analysis’ in preparing a research report?
Ans. When the researcher collects data, it is in raw form and needs to be edited, organized and analyzed. The raw data must be transformed into a comprehensible form. The first step in this process is to edit the data; the edited data is then coded and inferences are drawn. Editing the data is not a complex task, but it requires an experienced, talented and knowledgeable person. The role of data editing and data analysis is as follows:
Role of data editing:
Clarify responses
By editing the data the researcher makes sure that all responses are clear and easy to understand. Bringing clarity is important; otherwise the researcher may draw wrong inferences from the data. Sometimes respondents make spelling and grammatical mistakes that the editor needs to correct, and they may not be able to express their opinion in proper wording. The editor can rephrase a response, but he needs to be very careful in doing so: taking the wrong meaning from the respondent’s point of view introduces bias.
Make omissions
The editor may also need to make some omissions in the responses. By chance or by mistake some responses are left incomplete, and the editor has to identify what the respondent has overlooked.
How well the questionnaires are filled in depends on the target population. An educated respondent will fill in the questionnaire better than a person who is not very educated. It also depends on how interested the respondent is in filling it in; sometimes respondents are very reluctant to do so. If you think your respondents are not very interested, you should conduct an interview rather than handing out a questionnaire. In a questionnaire, respondents may leave blank spaces and you might get “no response”; in an interview, on the other hand, you can better assess what they want to tell you and what they are trying to hide.
Avoid biased editing
The editor has a great responsibility when editing surveyed data or other forms of responses. The editor needs to be very objective and should not try to hide or remove any information, nor add anything to the responses without a sound reason. He should be confident when making any changes or corrections to the data. In short, he should make the fewest and only logical changes, and should not add anything that reflects his own opinion on the issue.
Make judgements
Sometimes respondents leave something incomplete; to complete the sentence or phrase the editor has to make a judgement, and he needs good judgement to do so. He should do it so well that his personal bias does not creep into the responses.
Check handwriting
Handwriting issues also need to be resolved by the editor. Some people write very fast, and their handwriting can make the text difficult to comprehend. In electronically submitted questionnaires this problem never arises.
Logical adjustments
Logical adjustments must be made, otherwise the data will be faulty. There might be a need for some logical corrections; for example, a respondent gives these three answers to the three questions asked of him:
Q1: What is your age?
Ans: 16 years
Q2: What is your academic qualification?
Ans: Bachelors
Q3: What academic qualifications you
want to achieve in the future?
Ans: Bachelors in fine arts
Looking at the answers he has provided, he could not be 16 years of age and already have completed a bachelor’s degree. By looking at the other answers he has provided you can guess his age. If he is 16 years of age, then he could not have completed a bachelor’s degree, and you can guess which class he is likely to be in. If it is possible to contact the respondent, you can ask him about these answers. You can make logical changes in these answers because it is clearly evident that a 16-year-old boy or girl could not have completed a bachelor’s degree; he might have got confused between the two questions and given the wrong response. Such corrections are fairly easy to make, but there can be other responses that are tricky and clearly wrong. The editor must know how to correct the answers and what to do in such situations.
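As an illustration of such logical checks, here is a minimal sketch in Python that flags respondents whose stated age is inconsistent with their stated qualification. The field names, minimum-age rule and sample records are hypothetical, invented only to mirror the example above.

```python
# Hypothetical sketch: flag logically inconsistent responses during editing.
MIN_AGE_FOR = {"Bachelors": 19, "Masters": 21, "PhD": 24}  # assumed minimum plausible ages

responses = [
    {"id": 1, "age": 16, "qualification": "Bachelors"},   # inconsistent, like the example above
    {"id": 2, "age": 23, "qualification": "Bachelors"},   # plausible
]

def is_consistent(r):
    """Return True if the stated age is plausible for the stated qualification."""
    return r["age"] >= MIN_AGE_FOR.get(r["qualification"], 0)

for r in responses:
    if not is_consistent(r):
        # In practice the editor would re-contact the respondent or mark "no answer".
        print(f"Respondent {r['id']}: age {r['age']} inconsistent with {r['qualification']}")
```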
Re-contact the respondent
If some information is barely comprehensible and no logical meaning can be drawn from it, interviewees can be re-contacted to find out what they meant. If the data in the questionnaire is not correct and the editor cannot extract any meaning from it, the editor should re-contact the respondents and get their help.
Electronic editing
In recent years, most researchers prefer to administer electronic questionnaires wherever possible. Electronically submitted questionnaires are easy to edit, because in an electronic questionnaire you can set validation parameters. The computer can check the questionnaire itself and the editor’s job becomes easy. Inconsistencies can be avoided, logical errors can be eliminated entirely, and “no response” answers are few in electronic questionnaires.
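A minimal sketch of the kind of parameters an electronic questionnaire could enforce is shown below. The specific rules (age range, allowed categories, mandatory city field) are assumptions made for illustration, not the behaviour of any particular survey tool.

```python
# Hypothetical validation rules an electronic questionnaire might enforce at entry time.
RULES = {
    "age": lambda v: isinstance(v, int) and 10 <= v <= 100,
    "qualification": lambda v: v in {"School", "Bachelors", "Masters", "PhD"},
    "city": lambda v: isinstance(v, str) and v.strip() != "",  # mandatory field
}

def validate(answer: dict) -> list:
    """Return the list of field names that fail validation (an empty list means accept)."""
    return [field for field, ok in RULES.items() if not ok(answer.get(field))]

print(validate({"age": 16, "qualification": "Bachelors", "city": ""}))  # -> ['city']
```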
The qualities of the data editor
The data editor should have three qualities: he should be intelligent, objective and experienced in editing data. He should know how important the handling of data is to the researcher. He should avoid even the slightest chance of bias, which means he should also be honest in his work. His data editing will play a major role in the final inferences that the researcher draws from the data.
Role of data analysis:
Before beginning to write the dissertation, one has to collect data for the research. The data to be used can either be collected using data-gathering techniques or taken from someone else’s existing data, if it serves the purpose of the research. Collecting the data correctly takes a great deal of work. Before data analysis can begin, the accuracy of the data collected needs to be verified. Following data collection, the data needs to be critically analysed. For any research, data analysis is very important, as it provides an explanation of the various concepts, theories, frameworks and methods used. It eventually helps in arriving at conclusions and proving the hypothesis.
Data analysis is a process used to inspect, clean, transform and remodel data with a view to reaching a certain conclusion for a given situation. Data analysis is typically of two kinds: qualitative or
quantitative. The type of data dictates the method of analysis. In qualitative
research, any non-numerical data like text or individual words are analysed.
Quantitative analysis, on the other hand, focuses on measurement of the data
and can use statistics to help reveal results and conclusions. The results are
numerical. In some cases, both forms of analysis are used hand in hand. For
example, quantitative analysis can help prove qualitative conclusions.
Among the many benefits of data analysis, the more important ones
are:
- Data analysis helps in structuring the findings from different sources of data.
- Data analysis is very helpful in breaking a macro problem into micro parts.
- Data analysis acts like a filter when it comes to acquiring meaningful insights out of a huge data set.
- Data analysis helps in keeping human bias away from the research conclusion with the help of proper statistical treatment.
- When discussing data analysis it is important to mention that a methodology for analysing the data needs to be chosen. If a specific methodology is not selected, data can neither be collected nor analyzed properly.
- The methodology should be presented in the dissertation, as it enables the reader to understand which methods were used during the research and what type of data was collected and analyzed throughout the process.
- The dissertation should also present a critical analysis of the various methods and techniques that were considered but ultimately not used for the data analysis. An effective research methodology leads to better data collection and analysis and leads the researcher to valid and logical conclusions. Without a specific methodology, observations and findings cannot be made, which is why methodology is an essential part of a research project or dissertation.
Q2. Define Data Processing. What are the steps
involved in Data Processing?
Ans. Data
processing is, generally, "the collection and manipulation of items of
data to produce meaningful information." In this sense it can be
considered a subset of information processing, "the change (processing) of
information in any manner detectable by an observer." The term Data
processing (DP) has also been used previously to refer to a department within
an organization responsible for the operation of data processing applications.
General: Operations performed on a given set of data to extract
the required information in an appropriate form such as diagrams, reports, or
tables. See also electronic data processing.
Computing: Manipulation of input data with an application program
to obtain desired output as an audio/video, graphic, numeric, or text data
file.
Data processing is concerned with editing, coding, classifying, tabulating, charting and diagramming research data. The essence of data processing in research is data reduction. Data reduction involves winnowing out the irrelevant from the relevant data, establishing order out of chaos and giving shape to a mass of data. Data processing in research consists of five important steps, which are discussed below:
1. Editing of Data
Editing is the first step in data processing. Editing is the process of examining the data collected in questionnaires/schedules to detect errors and omissions and to see that they are corrected and that the schedules are ready for tabulation. When the whole data collection is over, a final and thorough check-up is made. Mildred B. Parten, in her book, points out that the editor is responsible for seeing that the data are:
- as accurate as possible,
- consistent with other facts secured,
- uniformly entered,
- as complete as possible,
- acceptable for tabulation and arranged to facilitate coding and tabulation.
There are different types of editing. They are:
Field Editing is done by the enumerator. The schedule filled in by the enumerator or the respondent might contain abbreviated or illegible writing and the like. These are rectified by the enumerator. This should be done soon after the enumeration or interview, before memory fades. Field editing should not extend to filling in omissions with guessed data.
Central Editing is done by the researcher after all schedules, questionnaires or forms have been received from the enumerators or respondents. Obvious errors can be corrected. For missing data or information, the editor may substitute data by reviewing the information provided by other, similarly placed respondents. A definitely inappropriate answer is removed and “no answer” is entered when reasonable attempts to get the appropriate answer fail to produce results.
Editors must keep in view the following points while performing
their work:
1. They should be familiar with instructions given to the interviewers
and coders as well as with the editing instructions supplied to them for the
purpose,
2. While crossing out an original entry for one reason or another,
they should just draw a single line on it so that the same may remain legible,
3. They must make entries (if any) on the form in some distinctive
color and that too in a standardized form,
4. They should initial all answers which they change or supply,
5. The editor’s initials and the date of editing should be placed on each completed form or schedule.
2. Coding of Data
Coding is necessary for efficient analysis; through it, the several replies may be reduced to a small number of classes which contain the critical information required for analysis. Coding decisions should usually be taken at the designing stage of the questionnaire. This makes it possible to pre-code the questionnaire choices, which in turn is helpful for computer tabulation, as one can key-punch straight from the original questionnaires. In the case of hand coding, some standard method may be used. One such method is to code in the margin with a coloured pencil; another is to transcribe the data from the questionnaire to a coding sheet. Whatever method is adopted, one should see that coding errors are eliminated altogether or reduced to the minimum level.
Coding is the process/operation by which data/responses are organized into classes/categories and numerals or other symbols are assigned to each item according to the class in which it falls. In other words, coding involves two important operations: (a) deciding the categories to be used and (b) allocating individual answers to them. These categories should be appropriate to the research problem, exhaustive of the data, mutually exclusive and unidimensional. Since coding eliminates much of the information in the raw data, it is important that researchers design category sets carefully in order to utilize the available data more fully.
The study of the responses is the first step in coding. In the case of pre-coded questions, coding begins at the preparation of the interview schedule. Secondly, a coding frame is developed by listing the possible answers to each question and assigning code numbers or symbols to each of them; these are the indicators used for coding. The coding frame is an outline of what is coded and how it is to be coded: a set of explicit rules and conventions used to classify observations on a variable into values, which are then transformed into numbers. Thirdly, after preparing the coding frame, the gradual process of fitting the answers to the questions begins. Lastly, transcription is undertaken, i.e., transferring the information from the schedules to a separate sheet called the transcription sheet. The transcription sheet is a large summary sheet which contains the answers/codes of all the respondents. Transcription may not be necessary when only simple tables are required and the number of respondents is small.
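The following sketch illustrates a coding frame and the transcription step in Python; the question categories and code numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical coding frame: each possible answer is assigned a code number.
coding_frame = {"Strongly agree": 1, "Agree": 2, "Neutral": 3,
                "Disagree": 4, "Strongly disagree": 5}

# Raw responses as they might appear on the schedules.
raw = pd.Series(["Agree", "Neutral", "Strongly agree", "Agree", "Disagree"])

# Transcription: transfer the coded answers to a summary structure.
coded = raw.map(coding_frame)
print(coded.tolist())   # [2, 3, 1, 2, 4]
```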
3. Classification of Data
Classification or categorization is the process of grouping the statistical data under various understandable homogeneous groups for the purpose of convenient interpretation. Uniformity of attributes is the basic criterion for classification, and the grouping of data is made according to similarity. Classification becomes necessary when there is diversity in the data collected, so that the data can be presented and analysed meaningfully; it is, however, meaningless in respect of homogeneous data. A good classification should have the characteristics of clarity, homogeneity, equality of scale, purposefulness and accuracy.
The objectives of classification are as follows:
- The complex, scattered and haphazard data is organized into a concise, logical and intelligible form.
- It becomes possible to make the similarities and dissimilarities of characteristics clear.
- Comparative studies are possible.
- Understanding of the significance is made easier, and thereby a good deal of human energy is saved.
- The underlying unity amongst different items is made clear and expressed.
- Data is so arranged that analysis and generalization become possible.
Classification is of two types, viz., quantitative classification, which is made on the basis of variables or quantity, and qualitative classification, which is made according to attributes. The former groups the variables, i.e., quantifies the variables into cohesive groups, while the latter groups the data on the basis of attributes or qualities. Again, classification may be multiple or dichotomous. The former makes many (more than two) groups on the basis of some quality or attribute, while the latter classifies the data into two groups on the basis of the presence or absence of a certain quality. Grouping the workers of a factory under various income groups (class intervals) comes under multiple classification, while dividing them into skilled and unskilled workers is dichotomous classification. The tabular form of such a classification is known as a statistical series, which may be inclusive or exclusive.
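As a brief illustration, the sketch below performs a quantitative (multiple) classification of hypothetical workers' incomes into class intervals and a dichotomous classification by skill; all figures and class limits are invented.

```python
import pandas as pd

# Hypothetical monthly incomes of factory workers.
income = pd.Series([12000, 18500, 25000, 31000, 9000, 27500, 22000])

# Multiple classification: group into exclusive class intervals (0-10000, 10000-20000, ...).
classes = pd.cut(income, bins=[0, 10000, 20000, 30000, 40000], right=False)
print(classes.value_counts().sort_index())

# Dichotomous classification: two groups based on the presence/absence of a quality.
skilled = pd.Series([True, False, True, True, False, True, False])
print(skilled.map({True: "Skilled", False: "Unskilled"}).value_counts())
```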
4. Tabulation of Data
Tabulation is the process of summarizing raw data and displaying
it in compact form for further analysis. Therefore, preparing tables is a very
important step. Tabulation may be done by hand, mechanically, or electronically. The choice is made largely on the basis of the size and type of study, alternative costs, time pressures, and the availability of computers and computer programmes. If the number of questionnaires is small and their length short, hand tabulation is quite satisfactory.
Tables may be divided into: (i) frequency tables, (ii) response tables, (iii) contingency tables, (iv) univariate tables, (v) bivariate tables, (vi) statistical tables and (vii) time-series tables.
Generally a research table has the following parts: (a) table number, (b) title of the table, (c) caption, (d) stub (row heading), (e) body, (f) head note, and (g) foot note.
As a general rule the following steps are necessary in the
preparation of table:
- Title of table: The table should be first given a brief,
simple and clear title which may express the basis of classification.
- Columns and rows: Each table should be prepared in just
adequate number of columns and rows.
- Captions and stubs: The columns and rows should be given
simple and clear captions and stubs.
- Ruling: Columns and rows should be divided by means of thin
or thick rulings.
- Arrangement of items: Comparable figures should be arranged side by side.
- Deviations: These should be arranged in the column near the
original data so that their presence may easily be noted.
- Special emphasis: This can be done by writing important data
in bold or special letters.
- Unit of measurement: The unit should be noted below the
lines.
- Approximation: This should also be noted below the title.
- Footnotes: These may be given below the table.
- Source : Source of data must be given. For primary data,
write primary data.
It is not necessary to present facts in tabular form if they can be presented more simply in the body of the text. Tabular presentation enables the reader to follow the data more quickly than textual presentation. A table should not merely repeat information covered in the text, and the same information should not, of course, be presented in both tabular and graphical form. Smaller and simpler tables may be presented in the text, while large and complex tables may be placed at the end of the chapter or report.
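A minimal sketch of simple (frequency) and cross (contingency) tabulation using pandas is given below; the survey records are invented for illustration.

```python
import pandas as pd

# Hypothetical coded survey data.
df = pd.DataFrame({
    "gender":  ["M", "F", "F", "M", "F", "M", "F"],
    "opinion": ["Agree", "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"],
})

# Frequency table (univariate).
print(df["opinion"].value_counts())

# Contingency table (bivariate): opinion by gender, with row and column totals.
print(pd.crosstab(df["gender"], df["opinion"], margins=True))
```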
5. Data Diagrams
Diagrams are charts and graphs used to present data. They help in catching the reader’s attention, present the data more effectively, and make creative presentation of the data possible. Data diagrams are classified into:
Charts: A chart is a diagrammatic form of data presentation. Bar charts, rectangles, squares and circles can be used to present data. Bar charts are one-dimensional, while rectangles, squares and circles are two-dimensional.
Graphs: The method of presenting numerical data in visual form is called a graph. A graph shows the relationship between two variables by means of either a curve or a straight line. Graphs may be divided into two categories: (1) graphs of time series and (2) graphs of frequency distribution. In graphs of time series one of the factors is time and the other factor or factors are the study factors. Graphs of frequency distribution show how the study group (executives, for example) is distributed by income, age, and so on.
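The sketch below draws a one-dimensional bar chart and a time-series graph with matplotlib; all figures are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]                       # bar chart data
years = [2018, 2019, 2020, 2021, 2022]
revenue = [1.2, 1.5, 1.4, 1.9, 2.3]              # time-series data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(regions, sales)                          # one-dimensional bar chart
ax1.set_title("Sales by region")
ax2.plot(years, revenue, marker="o")             # graph of a time series
ax2.set_title("Revenue over time")
plt.tight_layout()
plt.show()
```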
Q3. Describe the principles that should be kept in
mind while classifying data into categories?
Ans. The following main principles should be kept in mind while classifying and tabulating statistical data; they help to improve statistical research.
Rules for the classification of data: The following general rules are to be followed while classifying data according to class intervals.
a) Number of class intervals: The number of classes is usually between 5 and 15. It should be neither very large nor very small: if the number of classes is very small, necessary information may be lost, and if the number of classes is very large, further analysis of the data becomes difficult.
b) Size of class interval: The approximate size of the class interval can be estimated from the relation noted after this list.
c) Starting of class: The classes should start with 0 or 5 or multiple of 5.
d) Method of class: Exclusive method of classification should be preferred as it is
more useful.
e) Tally Bars: Class frequencies should be obtained by using tally marks or
tally bars.
f) Clarity: The classification should be in a clear, precise and concrete form, should meet the purpose of the study, and should be flexible.
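The relation referred to in point (b) is not reproduced in the source; a commonly used textbook rule of thumb (assumed here) is:

```latex
% Assumed class-width relation (not reproduced from the source).
\[
  i \;=\; \frac{\text{Range}}{\text{Number of classes}}
  \qquad\text{or, using Sturges' rule,}\qquad
  i \;=\; \frac{L - S}{1 + 3.322\,\log_{10} N}
\]
% where L and S are the largest and smallest observations and N is the number of observations.
```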
Q4. What is Hypothesis? Discuss the characteristics
of a good hypothesis. Explain the term Null and Alternative Hypothesis?
Ans. Ordinarily, when one talks about a hypothesis, one simply means a mere assumption or some supposition to be proved or disproved. But for a researcher a hypothesis is a formal question that he intends to resolve.
Thus a hypothesis may be defined as a proposition, or a set of propositions, set forth as an explanation for the occurrence of some specified group of phenomena, either asserted merely as a provisional conjecture to guide some investigation or accepted as highly probable in the light of established facts. Quite often a research hypothesis is a predictive statement, capable of
being tested by scientific methods, that relates an independent variable to
some dependent variable. For example, consider statements like the following
ones:
“Students who receive counselling will show a greater increase in creativity than students not receiving counselling”, or
“Automobile A is performing as well as automobile B.”
These are hypotheses capable of being objectively verified and tested.
Thus, we may conclude that a hypothesis states what we are looking for and it
is a proposition which can be put to a test to determine its validity.
The Characteristics of a hypothesis can be discussed as follows:
- Hypothesis should be clear and precise. If the
hypothesis is not clear and precise, the inferences drawn on its basis
cannot be taken as reliable.
- Hypothesis should be capable of being tested. Many a time research programmes have bogged down in a swamp of untestable hypotheses. Some prior study may be done by the researcher in order to make the hypothesis testable. A hypothesis “is testable if other deductions can be made from it which, in turn, can be confirmed or disproved by observation.”
- Hypothesis should state relationship between
variables, if it happens to be a relational hypothesis.
- Hypothesis should be limited in scope and must be
specific. A researcher must remember that narrower hypotheses are
generally more testable and he should develop such hypotheses.
- Hypothesis should be stated as far as possible in
most simple terms so that the same is easily understandable by all
concerned. But one must remember that simplicity of hypothesis has nothing
to do with its significance.
- Hypothesis should be consistent with most known
facts i.e., it must be consistent with a substantial body of established
facts. In other words, it should be one which judges accept as being the
most likely.
- Hypothesis should be amenable to testing within a
reasonable time. One should not use even an excellent hypothesis, if the
same cannot be tested in reasonable time for one cannot spend a life-time
collecting data to test it.
- Hypothesis must explain the facts that gave rise
to the need for explanation. This means that by using the hypothesis plus
other known and accepted generalizations, one should be able to deduce the
original problem condition. Thus hypothesis must actually explain what it
claims to explain; it should have empirical reference.
Statistics: An assumption about certain characteristics of a
population. If it specifies values for every parameter of a population, it is
called a simple hypothesis; if not,
a composite hypothesis. If it attempts to nullify the difference between two
sample means (by suggesting that the difference is of no statistical
significance), it is called a null hypothesis.
Null hypothesis (in a statistical test) the hypothesis that there is no
significant difference between specified populations, any observed difference
being due to sampling or experimental error.
A null hypothesis is a type of hypothesis used in statistics that
proposes that no statistical significance exists in a set of given
observations. The null hypothesis attempts to show that no variation exists
between variables or that a single variable is no different than its mean. It
is presumed to be true until statistical evidence nullifies it for an
alternative hypothesis.
The null hypothesis, also known as the conjecture, assumes that any kind of difference or significance you see in a set of data is due to chance. The opposite of the null hypothesis is known as the alternative hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance, while the alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.
In the statistical testing of a hypothesis, the alternative hypothesis is the one to be accepted if the null hypothesis is rejected.
Difference Between Null and Alternative
The null hypothesis is the initial statistical claim that the population mean is equivalent to the claimed value. For example, assume the average time to cook a specific brand of pasta is 12 minutes. Therefore, the null hypothesis would be stated as, “The population mean is equal to 12 minutes.” Conversely, the alternative hypothesis is the hypothesis that is accepted if the null hypothesis is rejected.
For example, assume the hypothesis test is set up so that the
alternative hypothesis states that the population parameter is not equal to the
claimed value. Therefore, the cook time for the population mean is not equal to
12 minutes; rather, it could be less than or greater than the stated value. If the null hypothesis is accepted, or the statistical test indicates that the population mean is 12 minutes, then the alternative hypothesis is rejected; the opposite is also true.
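As an illustration, the pasta example could be tested with a one-sample t-test as sketched below; the sample of cook times is invented purely for illustration.

```python
from scipy import stats

# Hypothetical sample of observed cook times (minutes) for the pasta brand.
cook_times = [11.8, 12.4, 12.9, 11.5, 13.1, 12.7, 12.2, 13.0]

# H0: population mean cook time = 12 minutes; Ha: it is not equal to 12 (two-tailed).
t_stat, p_value = stats.ttest_1samp(cook_times, popmean=12)
alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value <= alpha:
    print("Reject H0: the mean cook time differs from 12 minutes.")
else:
    print("Fail to reject H0: no evidence the mean differs from 12 minutes.")
```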
Q5. Discuss the different steps that are involved in
testing of a hypothesis?
Ans. To test a hypothesis means to tell (on the basis of
the data the researcher has collected) whether or not the hypothesis seems to
be valid. In hypothesis testing the main question is: whether to accept the
null hypothesis or not to accept the null hypothesis? Procedure for hypothesis
testing refers to all those steps that we undertake for making a choice between
the two actions i.e., rejection and acceptance of a null hypothesis. The
various steps involved in hypothesis testing are stated below:
1. Making a formal statement: This step consists in making a formal statement of the null hypothesis (H0) and also of the alternative hypothesis (Ha). This means that the hypotheses should be clearly stated, considering the nature of the research problem. For
instance, Mr. Mohan of the Civil Engineering Department wants to test the load
bearing capacity of an old bridge which must be more than 10 tons, in that case
he can state his hypotheses as under:
Null hypothesis H0: μ = 10 tons
Alternative hypothesis Ha: μ > 10 tons
Take another example. The average score in an aptitude test administered
at the national level is 80. To evaluate a state’s education system, the
average score of 100 of the state’s students selected on random basis was 75.
The state wants to know if there is a significant difference between the local
scores and the national scores. In such a situation the hypotheses may be
stated as under:
Null hypothesis H0: μ = 80
Alternative hypothesis Ha: μ ≠ 80
The formulation of hypotheses is an important step which must be
accomplished with due care in accordance with the object and nature of the
problem under consideration. It also indicates whether we should use a
one-tailed test or a two-tailed test. If Ha is of the type greater than
(or of the type lesser than), we use a one-tailed test, but when Ha is
of the type “whether greater or smaller” then we use a two-tailed test.
2. Selecting a significance level: The hypotheses are tested on a pre-determined level of
significance and as such the same should be specified. Generally, in practice,
either 5% level or 1% level is adopted for the purpose. The factors that affect
the level of significance are: (a) the magnitude of the difference between
sample means; (b) the size of the samples; (c) the variability of measurements within
samples; and (d) whether the hypothesis is directional or non-directional (A
directional hypothesis is one which predicts the direction of the difference
between, say, means). In brief, the level of significance must be adequate in
the context of the purpose and nature of enquiry.
3. Deciding the distribution to use: After
deciding the level of significance, the next step in hypothesis testing is to
determine the appropriate sampling distribution. The choice generally remains between
normal distribution and the t-distribution. The rules for selecting the
correct distribution are similar to those which we have stated earlier in the
context of estimation.
4. Selecting a random sample and computing an
appropriate value: Another step is to select a random sample(s) and
compute an appropriate value from the sample data concerning the test statistic
utilizing the relevant distribution. In other words, draw a sample to furnish
empirical data.
5. Calculation of the probability: One has
then to calculate the probability that the sample result would diverge as
widely as it has from expectations, if the null hypothesis were in fact true.
6. Comparing the probability: Yet another step consists in comparing the probability thus calculated with the specified value of α, the significance level. If the calculated probability is equal to or smaller than the α value in the case of a one-tailed test (and α/2 in the case of a two-tailed test), then reject the null hypothesis (i.e., accept the alternative hypothesis); but if the calculated probability is greater, then accept the null hypothesis. In case we reject H0, we run the risk of committing a Type I error (at most equal to the level of significance), but if we accept H0, we run some risk of committing a Type II error (the size of which cannot be specified as long as H0 is vague rather than specific).
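These steps can be illustrated on the aptitude-score example above (national mean 80, sample mean 75, n = 100). The population standard deviation is not given in the text, so the value used below (σ = 18) is purely an assumption for illustration.

```python
from scipy import stats

# Step 1: H0: mu = 80, Ha: mu != 80 (two-tailed).
mu0, xbar, n = 80, 75, 100
sigma = 18          # assumed population standard deviation (not given in the example)
alpha = 0.05        # Step 2: significance level

# Steps 3-4: with sigma known and n large, use the normal (z) distribution.
z = (xbar - mu0) / (sigma / n ** 0.5)

# Step 5: probability of a result at least this extreme under H0 (two-tailed p-value).
p_value = 2 * stats.norm.sf(abs(z))

# Step 6: compare with alpha.
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Accept H0")
```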
Q6. What do you understand by Non-Parametric Test?
Discuss major advantages and limitations of Non-Parametric test?
Ans. A non-parametric test is a hypothesis test that does not require the population’s distribution to be characterized by certain parameters. For example, many hypothesis tests rely on the assumption that the population follows a normal distribution with parameters μ and σ. Non-parametric tests do not make this assumption, so they are useful when your data are strongly non-normal and resistant to transformation.
However, nonparametric tests are not completely free of
assumptions about your data. For instance, nonparametric tests require the data
to be an independent random sample. For example, salary data are heavily skewed
to the right, with many people earning modest salaries and fewer people earning
larger salaries. You can use nonparametric tests on this data to answer
questions such as the following:
Is the median salary at your company equal to a certain value? Use
the 1-sample sign test.
Is the median salary at a bank's urban branch greater than the
median salary of the bank's rural branch? Use the Mann-Whitney test or the
Kruskal-Wallis test.
Are median salaries different in rural, urban, and suburban bank
branches? Use Mood's median test.
How does education level affect salaries at the rural and urban branches? Use the Friedman test.
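A short sketch of two of these tests using scipy is given below; the salary figures are invented and deliberately right-skewed for illustration.

```python
from scipy import stats

# Hypothetical, right-skewed salary data (thousands per year).
urban = [28, 31, 35, 40, 45, 52, 60, 75, 120, 300]
rural = [22, 25, 27, 30, 32, 36, 41, 48, 55, 90]

# 1-sample sign test: is the median urban salary equal to 40?
# Count values above/below 40 and apply a binomial test (ties are dropped).
above = sum(x > 40 for x in urban)
below = sum(x < 40 for x in urban)
sign_p = stats.binomtest(above, above + below, p=0.5).pvalue
print(f"Sign test p-value: {sign_p:.3f}")

# Mann-Whitney test: is the urban median greater than the rural median?
u_stat, mw_p = stats.mannwhitneyu(urban, rural, alternative="greater")
print(f"Mann-Whitney p-value: {mw_p:.3f}")
```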
Limitations of nonparametric tests
Nonparametric tests have the following limitations:
- Nonparametric tests are usually less powerful than corresponding tests designed for use on data that come from a specific distribution. Thus, you are less likely to reject the null hypothesis when it is false.
- Nonparametric tests often require you to modify the hypotheses. For example, most nonparametric tests about the population centre are tests about the median instead of the mean. The test does not answer the same question as the corresponding parametric procedure.
Advantages of nonparametric test
(1) Nonparametric tests make less stringent demands of the data. For standard parametric procedures to be valid, certain underlying conditions or assumptions must be
met, particularly for smaller sample sizes. The one-sample t test, for example,
requires that the observations be drawn from a normally distributed population.
For two independent samples, the t test has the additional requirement that the
population standard deviations be equal. If these assumptions/conditions are
violated, the resulting P-values and confidence intervals may not be
trustworthy. However, normality is not required for the Wilcoxon signed rank or
rank sum tests to produce valid inferences about whether the median of a
symmetric population is 0 or whether two samples are drawn from the same
population.
(2) Nonparametric
procedures can sometimes be used to get a quick answer with little calculation.
(3) Two
of the simplest nonparametric procedures are the sign test and median test. The
sign test can be used with paired data to test the hypothesis that differences
are equally likely to be positive or negative (or, equivalently, that the median difference is 0).
(4) Nonparametric
methods provide an air of objectivity when there is no reliable (universally
recognized) underlying scale for the original data and there is some concern
that the results of standard parametric techniques would be criticized for
their dependence on an artificial metric.
(5) A
historical appeal of rank tests is that it was easy to construct tables of
exact critical values, provided there were no ties in the data. The same
critical value could be used for all data sets with the same number of
observations because every data set is reduced to the ranks 1, ..., n. However, this advantage has been eliminated by the ready availability of personal computers.
(6) Sometimes
the data do not constitute a random sample from a larger population. The data
in hand are all there are. Standard parametric techniques based on sampling
from larger populations are no longer appropriate. Because there are no larger
populations, there are no population parameters to estimate.
Q7. What do you mean by parametric test? State the conditions necessary for the use of the following tests:
a. Z test
b. T test
c. Chi-Square test
Ans. A parametric statistical test assumes that sample data come from a population that follows a probability distribution based on a fixed set of parameters. Most well-known elementary statistical methods are parametric.
A parametric model, as it relies on a fixed parameter set, assumes more about a given population than non-parametric methods do. When the assumptions are correct, parametric methods produce more accurate and precise estimates than non-parametric methods, i.e. they have more statistical power. Because more is assumed, parametric methods have a greater chance of failing when the assumptions are not correct, and for this reason they are not robust. On the other hand, parametric formulae are often simpler to write down and faster to compute, so their simplicity can make up for their lack of robustness, especially if care is taken to examine diagnostic statistics.
Reasons to Use Parametric
Tests
Reason 1: Parametric tests can perform well with skewed and non-normal distributions
This may be a surprise, but parametric tests can perform well with continuous data that are non-normal if you satisfy the sample size guidelines below.
Sample size guidelines for non-normal data:
- 1-sample t test: the sample should be greater than 20.
- 2-sample t test: each group should be greater than 15.
- One-Way ANOVA: if you have 2-9 groups, each group should be greater than 15; if you have 10-12 groups, each group should be greater than 20.
Reason 2: Parametric tests can perform well when the spread of each group
is different
While nonparametric tests don’t assume that your data follow a
normal distribution, they do have other assumptions that can be hard to meet.
For nonparametric tests that compare groups, a common assumption is that the
data for all groups must have the same spread (dispersion). If your groups have
a different spread, the nonparametric tests might not provide valid results.
On the other hand, if you use the 2-sample t test or One-Way ANOVA, you can simply go to the Options subdialog and uncheck Assume equal variances. Voilà, you’re good to go even when the groups have different spreads!
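Outside Minitab, the usual equivalent of unchecking “Assume equal variances” is Welch’s t-test; a minimal sketch with invented data follows.

```python
from scipy import stats

# Hypothetical groups with clearly different spreads.
group_a = [10.1, 10.4, 9.8, 10.2, 10.0, 10.3]
group_b = [12.0, 8.5, 14.2, 7.9, 13.1, 9.6]

# equal_var=False requests Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```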
Reason 3: Statistical power
Parametric tests usually have more statistical power than
nonparametric tests. Thus, you are more likely to detect a significant effect
when one truly exists.
Conditions for following Tests :
For both Z-tests and T-tests, the conditions are the same.
However, you may recall that for Z-tests, the population standard deviation has
to be known, and for T-tests, the population standard deviation is unknown.
T-test conditions
The data were collected in a random way, each observation must be
independent of the others, and the sampling distribution must be normal or
approximately normal.
Z-test conditions
The data were collected in a random way, each observation must be
independent of the others, the sampling distribution must be normal or
approximately normal, and the population standard deviation must be known. When
performing a hypothesis test for a population mean, there are three conditions.
One has to deal with how the data were collected. Were they
collected in some random way? A simple random sample is the gold standard.
Second, is each observation independent of the others? You're
going to verify that mathematically.
And third, is the sampling distribution approximately normal? Again, you're going to verify that in a number of ways.
1. First, are the data collected in some random way? The purpose
is to make sure there's not any bias in the sample. Ideally, you want a simple random sample from the population, or to be able to treat your data as being a simple random sample. Cluster samples are typically okay, as are stratified random samples. The randomness is what matters most.
2. Second, the independence condition. You want to make sure that
each observation doesn't affect any other observation. There are a couple ways
to do that:
One, which isn't very common, is sampling with replacement. This
means when you take a person out, or an item out of the population, that you
put them back and can sample them again. That's not typically how you do
sampling. Normally, when you're sampling somebody, you don't put them back, and
you can't sample them again. For instance, if you're taking a political poll
you wouldn't want someone's opinion counted twice. So you need a population
that is large.
The other, more common, way is sampling without replacement, where you have to check that the sample is less than 10% of the population: if you multiply your sample size by 10, the population has to be at least that big in order to say that the observations are pretty much independent of each other.
3. Finally, is the sampling distribution approximately normal? The distribution of sample means (the sampling distribution) will be nearly normal in two cases:
3.1 One is if the
sample size is 30 or above. The central limit theorem says that the sampling distribution of sample means
will be approximately normal when the sample
size is large. For most distributions that's 30 or larger for a sample size.
3.2 The other way is, if the parent distribution (the distribution of values from which we got our data) is normal, then the sampling distribution of sample means will also be normal, regardless of the sample size. There are two ways to verify that:
3.2 (a) If we're lucky, it might be stated within the context of the problem. If you're actually doing this in real life, though, it would be hard to verify that for sure.
3.2 (b) If it isn't stated, then you actually have to look at your data. Graph the data in a histogram or a dot plot and look for approximate symmetry, a mound shape, and a lack of outliers.
Chi-Square Goodness of Fit Test
This lesson explains how to conduct a chi-square goodness of fit
test. The test is applied when you have one categorical variable from a single
population. It is used to determine whether sample data are consistent with a
hypothesized distribution.
For example, suppose a company printed baseball cards. It claimed
that 30% of its cards were rookies; 60%, veterans; and 10%, All-Stars. We could
gather a random sample of baseball cards and use a chi-square goodness of fit
test to see whether our sample distribution differed significantly from the
distribution claimed by the company. The sample problem at the end of the
lesson considers this example.
When to Use the Chi-Square Goodness of Fit Test
The chi-square goodness of fit test is appropriate when the
following conditions are met:
- The sampling method is simple random sampling.
- The variable under study is categorical.
- The expected value of the number of sample observations in each level of the variable is at least 5.
This approach consists of four steps: (1) state the hypotheses,
(2) formulate an analysis plan, (3) analyze sample data, and (4) interpret
results.
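As an illustration, the baseball-card example could be tested as sketched below; the observed counts are invented for illustration.

```python
from scipy import stats

# Claimed distribution: 30% rookies, 60% veterans, 10% All-Stars.
# Hypothetical observed counts in a random sample of 100 cards.
observed = [50, 45, 5]                        # rookies, veterans, All-Stars (invented)
expected = [0.30 * 100, 0.60 * 100, 0.10 * 100]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value would indicate the sample is inconsistent with the claimed distribution.
```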