Uploaded by babilaf629

STATS SUMMER

REVIEW QUESTIONS
1.
Define the following terms:
a.
Population
A population is the pool of individuals from which a statistical sample is drawn for a study.
b.
Sample
A sample is an analytic subset of a larger population.
The use of samples allows researchers to conduct their studies with more manageable data
and in a timely manner. Randomly drawn samples do not have much bias if they are large enough, but
achieving such a sample may be expensive and time-consuming.
2.
What are the two major types of statistics? Describe
them in detail.
The two major areas of statistics are known as descriptive statistics, which describes the
properties of sample and population data, and inferential statistics, which uses those properties to test
hypotheses and draw conclusions.
3.
Identify some methods used in descriptive statistics.
1. Measures of Frequency:
* Count, Percent, Frequency
* Shows how often something occurs
* Use this when you want to show how often a response is given
2. Measures of Central Tendency
* Mean, Median, and Mode
* Locates the distribution by various points
* Use this when you want to show an average or the most commonly indicated response
3. Measures of Dispersion or Variation
* Range, Variance, Standard Deviation
* Identifies the spread of scores by stating intervals
* Range = difference between the highest and lowest values
* Variance or Standard Deviation = based on the differences between each observed score and the mean
* Use this when you want to show how "spread out" the data are. It is helpful to know when your data
are so spread out that it affects the mean
4. Measures of Position
* Percentile Ranks, Quartile Ranks
* Describes how scores fall in relation to one another. Relies on standardized scores
* Use this when you need to compare scores to a normalized score (e.g., a national norm).
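The four families of measures listed above can be computed with Python's standard library. This is a minimal sketch; the `scores` list is made-up illustration data, not taken from the notes.

```python
import statistics
from collections import Counter

# Hypothetical exam scores, invented for illustration.
scores = [55, 60, 60, 70, 75, 80, 90]

# 1. Measures of frequency: how often each value occurs.
frequency = Counter(scores)

# 2. Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# 3. Measures of dispersion.
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# 4. Measures of position: the three quartile cut points.
quartiles = statistics.quantiles(scores, n=4)

print(frequency[60], mean, median, mode, data_range)
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions.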
4.
Explain two ways in which descriptive statistics and
inferential statistics are interrelated.
5.
Define the following terms:
a. Observational study
b. Designed experiment
Observational study
In an observational study, the researchers only observe the subjects and do not interfere or try to
influence the outcomes. In other words, the researchers do not control the treatments or assign
subjects to experimental groups.
Designed experiment
Design of experiments (DOE) is defined as a branch of applied statistics that deals with planning,
conducting, analyzing, and interpreting controlled tests to evaluate the factors that control the value of
a parameter or group of parameters.
6.
Explain why a census is often not the best way to obtain
information about a population.
In a census, information is obtained by the researcher from the entire population of
interest. Conducting a census of the entire population is time-consuming, costly, impractical, or
sometimes impossible.
7.
Identify two methods other than a census for obtaining
information.
Questionnaires and surveys. Observations. Documents and records. Focus groups.
8.
In sampling, why is obtaining a representative sample
important?
Representative samples are important because they ensure that as many relevant types of
people as possible are included in your sample, and that the right mix of people are interviewed
or surveyed. This helps ensure your results aren't tainted by bias.
9.
Regarding probability sampling:
a. What is it?
Probability sampling refers to the selection of a sample from a population, when this selection is
based on the principle of randomization, that is, random selection or chance. Probability
sampling is more complex, more time-consuming and usually more costly than non-probability
sampling.
b. Does probability sampling always yield a representative sample?
Explain your answer.
c. Identify some advantages of probability sampling.
o The absence of systematic error and sampling bias
o Higher level of reliability of research findings
o Increased accuracy of sampling error estimation
o The possibility to make inferences about the population; effective for choosing
samples from a broad population base
o Cost-effectiveness
o Simple and straightforward in application
10.
Regarding simple random sampling:
a. What is simple random sampling?
Simple random sampling is a type of probability sampling in which the researcher randomly
selects a subset of participants from a population. Each member of the population has an
equal chance of being selected. Data is then collected from as large a percentage as possible
of this random subset.
b. What is a simple random sample?
A simple random sample is a subset of a statistical population in which each member of the
population has an equal probability of being chosen.
c. Identify two forms of simple random sampling and explain
the difference between the two.
Probability sampling involves random selection, allowing you to make strong statistical
inferences about the whole group.
Non-probability sampling involves non-random selection based on convenience or other
criteria, allowing you to easily collect data.
A simple random sample is similar to a random sample. The difference between the two is
that with a simple random sample, each object in the population has an equal chance of
being chosen. With random sampling, each object does not necessarily have an equal
chance of being chosen.
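Simple random sampling is commonly carried out either without replacement (a member can be selected at most once) or with replacement (a member can be selected more than once). A minimal sketch of both, assuming a hypothetical population of 100 numbered members and a fixed seed chosen only to make the draw repeatable:

```python
import random

random.seed(42)  # arbitrary seed, used only so reruns give the same draw

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))

# Without replacement: every member appears at most once in the sample.
without_replacement = random.sample(population, k=10)

# With replacement: the same member may be drawn more than once.
with_replacement = random.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```

In both cases every member of the population has the same chance of being picked on each draw, which is what makes the sampling "simple random".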
11.
Identify two methods for obtaining a simple random sample.
a. Stratified
b. Cluster
Set 2
1. Explain the meaning of data.
Data are individual pieces of factual information recorded and used
for the purpose of analysis. It is the raw information from which
statistics are created. Statistics are the results of data analysis - its
interpretation and presentation.
2. Explain the meaning of variables. On what basis do we classify
variables?
A variable is a characteristic that can be measured and that can assume
different values. Height, age, income, province or country of birth,
grades obtained at school and type of housing are all examples of
variables.
Classification
* Qualitative
* Quantitative
* Discrete
* Continuous
3. Provide a reason why the classification of data is important.
Using data classification helps organizations maintain the
confidentiality, ease of access and integrity of their data. It also helps to
lower the danger of unstructured sensitive information becoming
vulnerable to hackers, and it saves companies from steep data storage
costs.
4. What is the frequency distribution of qualitative data and why is
it
useful?
It is a listing of the distinct values and their frequencies. It is
useful because it provides a table of the values of the observations and
how often they occur.
5. Explain the difference between
a. frequency and relative frequency.
Frequency is the number of times a result occurs, while “relative
frequency” is the number of times the result occurs divided by the
number of times the experiment is repeated.
b. percentage and relative frequency.
A relative frequency is a proportion: the frequency of a value divided
by the total number of observations. A percentage is the relative
frequency multiplied by 100.
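A short worked example may help separate the three quantities: frequency is a count, relative frequency is that count divided by the total number of observations, and percentage is the relative frequency times 100. The `responses` list is hypothetical survey data.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]
n = len(responses)

# Frequency: number of times each distinct value occurs.
frequency = Counter(responses)

# Relative frequency: frequency divided by the total number of observations.
relative_frequency = {value: count / n for value, count in frequency.items()}

# Percentage: relative frequency expressed out of 100.
percentage = {value: 100 * rf for value, rf in relative_frequency.items()}

for value in frequency:
    print(value, frequency[value], relative_frequency[value], percentage[value])
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any frequency table.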
6. Identify an important reason for grouping data. Do the
concepts of class limits, marks, cut points, and midpoints make
sense for qualitative data? Explain your answer.
7. State three of the most important guidelines in choosing the
classes for grouping a quantitative data set.
* Classes should be exhaustive: there should be a class for every
data value in the set.
* Classes should be mutually exclusive: there should be no
overlapping data values between classes.
* Classes should be continuous: there should be no gaps between
classes.
8. With regard to grouping quantitative data into classes in which
each class represents a range of possible values, we discussed
two methods for depicting the classes. Identify the two methods
and explain the relative advantages and disadvantages of each
method.
The two methods for depicting the classes are:
* Limit grouping
* Cut-point grouping
Limit grouping works well for discrete data with many distinct values because it
reduces the number of classes, which makes the data easier to understand.
It is difficult to use with continuous data, however. Cut-point grouping, on the
other hand, is designed specifically for continuous data.
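As an illustration of cut-point grouping, the sketch below assigns continuous values to classes of the form "lower cut point up to, but not including, upper cut point". The data, the first cut point (45.0), and the class width (5.0) are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical continuous measurements (e.g., weights in kg).
weights = [51.2, 48.7, 55.0, 60.3, 59.9, 45.0, 52.5, 57.8]

lower = 45.0  # assumed first lower cut point
width = 5.0   # assumed class width

def class_of(x):
    """Return the (lower, upper) cut points of the class containing x.

    Each class includes its lower cut point and excludes its upper one,
    so the classes are exhaustive, non-overlapping, and have no gaps.
    """
    index = int((x - lower) // width)
    start = lower + index * width
    return (start, start + width)

counts = Counter(class_of(x) for x in weights)

for (start, end), count in sorted(counts.items()):
    print(f"{start} to under {end}: {count}")
```

A value that falls exactly on a cut point, such as 55.0, goes into the class that starts there, which is what keeps the classes mutually exclusive.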
9. For quantitative data, we examined three types of grouping:
single-value grouping, limit grouping, and cut point grouping. For
each type of data given, decide which of these three types is
usually best. Explain your answers.
a. Continuous data displayed to one or more decimal places
CUT-POINT GROUPING
b. Discrete data in which there are relatively few distinct
observations.
SINGLE-VALUE GROUPING
10.
We used slightly different methods for determining the
“middle” of a class with limit grouping and cut point grouping.
Identify the methods and the corresponding terminologies.
11.
Explain the difference between a frequency histogram and
a relative-frequency histogram.
The only difference between a frequency histogram and a relative
frequency histogram is that the vertical axis uses relative or
proportional frequency instead of simple frequency.
12.
Explain the advantages and disadvantages of frequency
histograms versus frequency distributions.
The frequency histogram shows the distribution graphically, which
is clearer than the numbers given by the frequency
distribution. The frequency distribution gives the exact
frequencies, while in the frequency histogram you can only
estimate them.
13.
Suppose that you have a data set that contains a large
number of observations. Which graphical display is generally
preferable: a histogram or a stem-and-leaf diagram? Explain
your answer.
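HISTOGRAM. With a large number of observations, a stem-and-leaf
diagram becomes unwieldy because every observation is written out
as a leaf, whereas a histogram summarizes the data by class.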
14.
Suppose that you have constructed a stem-and-leaf
diagram and discover that it is only moderately useful because
there are too few stems. How can you remedy the problem?
15.
Clearly differentiate between skewness and symmetry.
Skewness is a measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the
same to the left and right of the center point. Kurtosis is a
measure of whether the data are heavy-tailed or light-tailed
relative to a normal distribution.
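Skewness can be quantified numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment); this particular formula is one common choice and is not specified in the notes. Both data sets are invented.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # mirror image around its center
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail toward large values

print(skewness(symmetric))
print(skewness(right_skewed))
```

A symmetric data set gives a value of zero, a positive value indicates a longer right tail, and a negative value indicates a longer left tail.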
16.
Explain the significance of modality.
In statistics, modality refers to the number of peaks (modes) in
a distribution. A unimodal distribution has one peak, a bimodal
distribution has two, and a multimodal distribution has three or
more. More than one peak often suggests that the data combine distinct groups.
SET 3
1. What is forecasting?
Forecasting refers to the practice of predicting what will happen
in the future by taking into consideration events in the past and
present.
2. How do we execute qualitative forecasting? Which method is
preferred?
3. Clearly differentiate between Time Series and Causal data.
In time series data, the independent variable is time, so every quantity in
the series is recorded against the time at which it was measured. In a causal
model, one variable is treated as the cause and the other as the effect.
4. How do we apply forecasting for time series?
5. Clarify the concept of smoothing.
The aim of smoothing is to give a general idea of relatively slow
changes of value, with little attention paid to the close matching of
individual data values.
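One simple way to smooth a series is a moving average, sketched below. The notes do not name a specific smoothing technique, so the choice of a moving average, the 3-period window, and the `sales` figures are all assumptions made for illustration.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical sales figures for seven consecutive periods.
sales = [10, 14, 9, 15, 11, 16, 12]

smoothed = moving_average(sales, window=3)
print(smoothed)
```

The smoothed series is shorter than the original by `window - 1` values and fluctuates less, which makes the slower underlying movement easier to see.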
6. Regarding a scatterplot,
a. identify one of its uses.
It helps to get a general idea of the data. It also helps to identify the
level in the frequency of data.
c. what property should it have to obtain a regression line for
the data?
7. Regarding the criterion used to decide on the line that best fits a
set of data points,
a. what is that criterion called?
b. specifically, what is the criterion?
8. Regarding the line that best fits a set of data points,
a. what is that line called?
Regression line
b. what is the equation of that line called?
Regression Equation
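A regression line and its equation can be computed directly. The sketch below assumes the usual least-squares criterion (choose the line that minimizes the sum of squared vertical errors), which the notes themselves do not state; the data points are invented and lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(f"y = {intercept} + {slope}x")
```

The function requires at least two distinct x-values; otherwise `sxx` is zero and no unique line exists.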
9. What is the significance of the correlation and regression
constants?
The most commonly used techniques for investigating the relationship
between two quantitative variables are correlation and linear
regression. Correlation quantifies the strength of the linear relationship
between a pair of variables, whereas regression expresses the
relationship in the form of an equation.
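To make "strength of the linear relationship" concrete, the sketch below computes the Pearson correlation coefficient r by hand. The function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))
```

Values of r close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one. The sign of r always matches the sign of the regression slope.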