File - Dalai Nguyen's ePortfolio

Dalai Nguyen
Chapter 1
 Section 1.2

Summary: This section provides an overview of the process involved in
conducting a statistical study. The process consists of “prepare, analyze, and
conclude” and relies on statistical and critical thinking. Preparing involves
considering the context, the source of the data, and the sampling method.
Analyzing involves constructing suitable graphs, exploring the data, and
executing computations. Concluding involves deciding whether the results have
statistical significance and practical significance.

Definition:
- Data: collections of observations, such as measurements, genders, or survey
  responses.
- Statistics: the science of planning studies and experiments, obtaining data,
  and then organizing, summarizing, presenting, analyzing, and interpreting
  those data and drawing conclusions based on them.
- Population: the complete collection of all measurements or data being
  considered.
- Census: the collection of data from every member of the population.
- Sample: a subcollection of members selected from a population.
- Voluntary response sample (or self-selected sample): a sample in which the
  respondents themselves decide whether to be included.

Journal question:
What is the difference between statistical and practical significance?
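- Statistical significance means a result is very unlikely to occur by chance;
  with a large sample, even a small difference can be statistically
  significant. Practical significance is a common-sense judgment about whether
  the difference is large enough to matter.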

Example:
In a test of the Atkins weight loss program, 40 subjects using that program
had a mean weight loss of 2.1 kg (or 4.6 pounds) after one year (based on
data from “Comparison of the Atkins, Ornish, Weight Watchers, and Zone
Diets for Weight Loss and Heart Disease Risk Reduction,” by Dansinger
et al., Journal of the American Medical Association, Vol. 293, No. 1). Using
formal methods of statistical analysis, we can conclude that the mean
weight loss of 2.1 kg is statistically significant. That is, based on statistical
criteria, the diet appears to be effective. However, using common sense, it
does not seem very worthwhile to pursue a weight loss program resulting in
such relatively insignificant results. Someone starting a weight loss
program would probably want to lose considerably more than 2.1 kg.
Although the mean weight loss of 2.1 kg is statistically significant, it does
not have practical significance. The statistical analysis suggests that the
weight loss program is effective, but practical considerations suggest that
the program is basically ineffective.
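
The contrast between the two kinds of significance can be sketched with a
one-sample t statistic. This is only an illustration: the sample size and mean
come from the example, but the standard deviation of 4.8 kg and the 5 kg
"worthwhile" threshold are assumptions made here, not values given in these
notes.

```python
import math

n = 40                # sample size, from the example
mean_loss_kg = 2.1    # sample mean weight loss, from the example
sd_kg = 4.8           # ASSUMED sample standard deviation (hypothetical)

# One-sample t statistic for the null hypothesis "true mean weight loss is 0"
standard_error = sd_kg / math.sqrt(n)
t_stat = mean_loss_kg / standard_error

# Two-tailed critical value for alpha = 0.05 with n - 1 = 39 degrees of freedom
T_CRITICAL = 2.023
statistically_significant = abs(t_stat) > T_CRITICAL

# Practical significance is a judgment call; this threshold is hypothetical
MIN_WORTHWHILE_LOSS_KG = 5.0
practically_significant = mean_loss_kg >= MIN_WORTHWHILE_LOSS_KG

print(f"t = {t_stat:.2f}")
print("statistically significant:", statistically_significant)
print("practically significant:  ", practically_significant)
```

Under these assumptions the t statistic exceeds the critical value, so the
result is statistically significant, yet the mean loss falls short of the
chosen threshold, so it is not practically significant.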
 Section 1.3

Summary: This section explains the meaning of the terms statistic and
parameter, as defined below. The two terms distinguish between data for a
sample and data for an entire population. The section also explains the
difference between quantitative data and categorical data.

Definition:
- Parameter: a numerical measurement describing some characteristic of a
  population.
- Statistic: a numerical measurement describing some characteristic of a
  sample.
- Quantitative data: data consisting of numbers that represent counts or
  measurements.
- Categorical data: data consisting of names or labels.
- Discrete data: data values are quantitative and the number of values is
  finite or countable.
- Continuous data: data values are quantitative, with infinitely many possible
  values that are not countable.
- Nominal level of measurement: data consist of names, labels, or categories
  without ordering.
- Ordinal level of measurement: data with a clear order, but differences either
  cannot be determined or are meaningless.
- Interval level of measurement: data with a clear order, and differences can
  be found and are meaningful. Data do not have a natural zero.
- Ratio level of measurement: data with a clear order, and differences can be
  found and are meaningful. Data have a natural zero.

Journal question:
What is the difference between a sample statistic and a population
parameter?
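. A population parameter is a numerical measurement describing a
characteristic of an entire population, while a sample statistic is a
numerical measurement describing a characteristic of a sample drawn from
that population.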

Example:
In a Harris Poll, 2320 adults in the United States were surveyed about
body piercing, and 5% of the respondents said that they had a body
piercing, but not on the face. Based on the latest available data at the
time of this writing, there are 241,472,385 adults in the United States.
The results from the survey are a sample drawn from the population of
all adults.
- Parameter: the population size of 241,472,385 is a parameter, because it is
  based on the entire population of all adults in the United States.
- Statistic: the sample size of 2320 surveyed adults is a statistic, because it
  is based on a sample, not the entire population of all adults in the United
  States. The value of 5% is another statistic, because it is also based on the
  sample, not on the entire population.
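
The numbers in this example can be checked with a few lines of Python. This is
a minimal sketch; the variable names are chosen here for illustration, and the
5% figure is treated as exact even though the poll result was presumably
rounded.

```python
# Values taken from the Harris Poll example
sample_size = 2320              # statistic: describes the sample
sample_proportion = 0.05        # statistic: share of respondents with a piercing
population_size = 241_472_385   # parameter: describes the whole population

# Approximate number of respondents behind the 5% statistic
respondents_with_piercing = round(sample_size * sample_proportion)

# Fraction of the population that was actually surveyed
sampling_fraction = sample_size / population_size

print(respondents_with_piercing)
print(f"{sampling_fraction:.6%} of the population was surveyed")
```

The sampling fraction is tiny, which is why the 5% describes only the sample
and remains a statistic rather than a parameter.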
 Section 1.4

Summary:
This section introduces the basics of data collection and describes
some common ways in which observational studies and experiments
are conducted. Of particular importance is the method of using a
simple random sample.

Definition:
- Observational study: observing and measuring specific characteristics
  without attempting to modify the subjects being studied.
- Experiment: applying some treatment and then observing its effects on the
  subjects.
- Experimental units: the subjects in an experiment.
- Simple random sample: n subjects are selected in such a way that every
  possible sample of the same size n has the same chance of being chosen.
- Systematic sampling: select a starting point, then select every kth element
  in the population.
- Convenience sampling: simply use results that are easy to get.
- Stratified sampling: divide the population into subgroups (strata) whose
  members share a characteristic, then take a random sample from each subgroup.
- Cluster sampling: divide the population into groups (clusters), randomly
  select some of those clusters, and then use all members of the selected
  clusters.
- Cross-sectional study: data are measured at one point in time.
- Retrospective study: go back in time to collect data over some past period.
- Prospective (longitudinal) study: go forward in time and observe groups
  sharing common factors.
- Confounding: investigators are not able to distinguish among the effects of
  different factors.
- Sampling error: the sample has been selected with a random method, but there
  is a discrepancy between the sample result and the true population result.
- Nonsampling error: the result of human error, including such factors as wrong
  data entries, computing errors, biased wording of questions, false data
  provided by respondents, biased conclusions, and statistical methods
  inappropriate for the circumstances.
- Nonrandom sampling error: the result of using a sampling method that is not
  random, such as a convenience sample or a voluntary response sample.
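
The systematic, stratified, and cluster methods defined above can be sketched
in Python. The population of 100 subjects, the group labels, and the function
names are all hypothetical, invented here for illustration only.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 subjects, each tagged with a group label
population = [{"id": i, "group": "A" if i < 60 else "B"} for i in range(100)]

def systematic_sample(items, k, rng):
    """Pick a random starting point, then take every k-th element."""
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(items, key, per_stratum, rng):
    """Draw a random sample of per_stratum subjects from each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

def cluster_sample(items, cluster_size, n_clusters, rng):
    """Split into clusters, randomly pick clusters, keep ALL their members."""
    clusters = [items[i:i + cluster_size]
                for i in range(0, len(items), cluster_size)]
    picked = rng.sample(clusters, n_clusters)
    return [item for cluster in picked for item in cluster]

systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, "group", 5, rng)
clustered = cluster_sample(population, 10, 2, rng)

print(len(systematic), len(stratified), len(clustered))
```

Note the difference between the last two: stratified sampling takes a few
subjects from every group, while cluster sampling takes every subject from a
few groups.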

Journal question:
What is a simple random sample?
- Simple random sample: a sample of n subjects is selected in such a way that
  every possible sample of the same size n has the same chance of being chosen.
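
A simple random sample can be drawn with Python's standard `random.sample`,
which selects without replacement. The population of 20 subject IDs and the
sample size of 5 are hypothetical values chosen for this sketch.

```python
import math
import random

rng = random.Random(42)          # fixed seed so the sketch is repeatable
population = list(range(1, 21))  # hypothetical subject IDs 1..20
n = 5

# random.sample draws n distinct subjects; every subset of size n is
# equally likely to be the one returned
srs = rng.sample(population, n)

# Number of possible samples of size n, each with the same chance
possible_samples = math.comb(len(population), n)

print(srs)
print(f"each possible sample has probability 1/{possible_samples}")
```

The equal chance applies to whole samples, not just to individual subjects,
which is what separates a simple random sample from methods such as
systematic sampling.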

Example:
. Observational study: the typical survey is a good example of an
observational study. For example, the Pew Research Center surveyed
2252 adults in the United States and found that 59% of them go
online wirelessly. The respondents were asked questions, but they
were not given any treatment, so this is an example of an observational
study.
. Experiment: in the largest public health experiment ever conducted,
200,745 children were given a treatment consisting of the Salk
vaccine, while 201,229 other children were given a placebo. The Salk
vaccine injections constitute a treatment that modified the subjects, so
this is an example of an experiment.
Chapter 2
 Section 2.2

Summary:
When working with large data sets, a frequency distribution (or
frequency table) is often helpful in organizing and summarizing data.
A frequency distribution helps us to understand the nature of the
distribution of a data set.

Definition:
- Frequency distribution: shows how data are partitioned among several
  categories (or classes) by listing the categories along with the number
  (frequency) of data values in each of them.
- Lower class limits: the smallest numbers that can belong to the different
  classes.
- Upper class limits: the largest numbers that can belong to the different
  classes.
- Class boundaries: the numbers used to separate the classes, but without the
  gaps created by class limits.
- Class midpoints: the values in the middle of the classes. Each is computed by
  adding the lower class limit to the upper class limit and dividing the sum
  by 2.
- Class width: the difference between two consecutive lower class limits or two
  consecutive lower class boundaries in a frequency distribution.
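
The class definitions above can be worked through on a small example. The
class limits below (0-9, 10-19, 20-29, 30-39) are hypothetical, chosen only to
show the arithmetic.

```python
# Hypothetical class limits for four classes: 0-9, 10-19, 20-29, 30-39
lower_limits = [0, 10, 20, 30]
upper_limits = [9, 19, 29, 39]

# Class width: difference between two consecutive lower class limits
class_width = lower_limits[1] - lower_limits[0]

# Class midpoint: (lower class limit + upper class limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one class's upper limit and the
# next class's lower limit, so adjacent classes meet with no gap
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print("width:     ", class_width)
print("midpoints: ", midpoints)
print("boundaries:", boundaries)
```

Note that the class width is 10, not 9: it is measured between consecutive
lower limits, not from the lower limit to the upper limit of one class.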

Journal question:
Why do we use frequency distributions?
. We use frequency distributions because they are helpful in organizing and
summarizing data, which helps us understand the nature of the distribution
of a data set.

Example:
Table 2-3 summarizes the race/ethnicity classifications recorded on traffic
tickets issued by Connecticut’s East Haven Police Department during
a recent nine-month period. Here is an interesting and revealing fact
about the data: Table 2-3 shows that 18 of those given tickets were
classified by police as being Hispanic, but in fact, 209 of those given
tickets had Hispanic names!
Race                    Frequency
White                   329
Black                   15
Asian                   0
Hispanic                18
White/Hispanic          4
Blank (no indication)   5
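
The frequencies in this table can be turned into relative frequencies to see
what share of the tickets falls in each category. This is a minimal sketch
using the counts from the table; the variable names are chosen here for
illustration.

```python
# Frequencies copied from the table
ticket_counts = {
    "White": 329,
    "Black": 15,
    "Asian": 0,
    "Hispanic": 18,
    "White/Hispanic": 4,
    "Blank (no indication)": 5,
}

total = sum(ticket_counts.values())

# Relative frequency = class frequency / sum of all frequencies
relative = {race: count / total for race, count in ticket_counts.items()}

for race, share in relative.items():
    print(f"{race:<22} {ticket_counts[race]:>4} {share:>7.1%}")
print(f"{'Total':<22} {total:>4}")
```

Tickets recorded as Hispanic make up only about 5% of the total, which is what
makes the count of 209 Hispanic names so striking.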
 Section 2.3

Summary:
This section discusses histograms. A histogram is basically a graph
of a frequency distribution, and a graph is easier to interpret than a
table of numbers.

Definition:
Histogram: a graph consisting of bars of equal width drawn
adjacent to each other (unless there are gaps in the data). The
horizontal scale represents classes of quantitative data values and the
vertical scale represents frequencies. The heights of the bars
correspond to the frequency values.

Journal question:
Why do we use histograms?
. We use histograms because a histogram is basically a graph of a frequency
distribution, and it is easier to interpret than a table of numbers.

Example:
NASA provides these duration times (in minutes) of all flights of the
space shuttle Challenger: 7224, 8784, 8709, 11476, 10060, 11844,
10089, 11445, 10125, 1. Why does it not make sense to construct a
histogram for this data set? What is notable about this data set?
. The data set has an outlier of 1 minute. When a data set is this small,
the true nature of the distribution cannot be seen with a histogram.
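
Tallying the durations into classes shows why a histogram is of little use
here. The class width of 2000 minutes is an assumption made for this sketch;
the durations are the ones listed in the example.

```python
# Flight durations in minutes, from the example
durations = [7224, 8784, 8709, 11476, 10060, 11844, 10089, 11445, 10125, 1]

CLASS_WIDTH = 2000  # hypothetical class width chosen for this sketch

# Tally each value into the class starting at a multiple of CLASS_WIDTH
counts = {}
for value in durations:
    lower = (value // CLASS_WIDTH) * CLASS_WIDTH
    counts[lower] = counts.get(lower, 0) + 1

# Print a rough text histogram, including the empty classes
for lower in range(0, max(durations) + 1, CLASS_WIDTH):
    freq = counts.get(lower, 0)
    print(f"{lower:>6}-{lower + CLASS_WIDTH - 1:<6} {'#' * freq}")
```

With only ten values, most classes hold zero, one, or two flights, and the
single 1-minute flight sits alone, far from the rest.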
 Section 2.4

Summary:
This section shows that graphs are excellent tools for describing,
exploring, and comparing data. Describing data: in a histogram, for
example, consider the distribution, center, variation, and outliers
(values that are very far away from almost all of the other data values).
Exploring data: look for features of the graph that reveal some useful
and/or interesting characteristic of the data set. Comparing data:
construct similar graphs to compare data sets.

Definition:
There are no definitions in this section.

Journal question:
List 3 graphs that enlighten and 2 graphs that deceive.
. 3 graphs that enlighten:
- Scatter plots
- Time-series graphs
- Histograms
. 2 graphs that deceive:
- Graphs with a nonzero axis
- Pictographs

Example:
Listed below are SAT scores from a sample of students. Why is it that
a graph of these data would not be very effective in helping us
understand the data? 2400 2200 2150 2040 2230 1890 2100 2090
- The data set has only one variable, SAT scores. We need a
  second variable to be able to relate the scores and give
  meaning to the data.