StatChap.1-2.doc

CHAPTER 1: Introduction
TERMS:
We shall begin our discussion with some basic terminology:
STATISTICS: A method of quantifying, organizing and summarizing data.
DATA: Another name for the numbers or scores collected.
POPULATION: The collection of individuals with whom the research is concerned, that is, whomever
the study tries to describe.
SAMPLE: A portion or subset of the population of interest.
PARAMETER: A characteristic in the population, such as the average age of its members.
STATISTIC: A characteristic in the sample; therefore, an estimate of a parameter. The average
age of the members of a sample may be the best guess of the average age of the population from
which the sample was drawn.
INFERENTIAL STATISTICS: A method of quantifying, organizing and summarizing data
based upon a sample, in order to draw conclusions about the population.
DESCRIPTIVE STATISTICS: A method of quantifying, organizing and summarizing data
based upon an entire population.
*Note: In this course, the plural 'statistics' is not used to mean more than one statistic. The term
statistics is reserved for the method of quantification, and each statistic is evaluated one
characteristic at a time.
DISCUSSION:
Statistics is a technique that is used to describe and assess the features of a specified group of people,
the population.
The term population is not necessarily defined by a geographic boundary or nationality, as it usually
is in casual conversation. If the researcher is interested in the AIDS epidemic, then all people with AIDS
may be included in the referent population. If interest is limited to cases within a specific
nationality, say the United States, then the referent population will be US AIDS cases only. If
interest is further limited to young men, as they are the majority of AIDS cases, the population may be
called US AIDS CASES AMONG YOUNG MEN. Note that the title of the population grows longer as the
population grows more specific.
As it is unlikely that every member of the population can participate in the study, only a portion of the
population will actually be assessed. That portion, or sample, serves as the 'best guess' of the status of the
referent population.
Statistical techniques will be used to quantify certain features, or parameters, of the population. When a
sample is used, those same techniques (with minor adjustments to the formulas) will be applied to the
corresponding features, or statistics, of that sample. As a result, there are two types of statistics:
inferential, in the case of the sample, and descriptive, in those cases when the researcher can actually use
the entire population. As you might guess, the former happens much more often than the latter. Unless the
population is very small, inferential statistics will be used.
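A minimal Python sketch of the parameter/statistic distinction follows, using an invented population of ten ages (the data and variable names are hypothetical). The sample mean serves as the 'best guess' of the population mean. As one example of the 'minor adjustments to the formulas', the standard library's `statistics.pvariance` divides by N (population), while `statistics.variance` divides by n - 1 (sample); choosing the variance as the example is an assumption, since it is not defined in this chapter.

```python
import random
import statistics

# Hypothetical population: the ages of every member of a small club.
population_ages = [19, 22, 25, 31, 34, 38, 41, 45, 52, 60]

# Parameter: a characteristic of the entire population.
population_mean = statistics.mean(population_ages)

# Draw a random sample of 4 members without replacement (seeded for repeatability).
rng = random.Random(42)
sample_ages = rng.sample(population_ages, 4)

# Statistic: the same characteristic computed on the sample,
# serving as the 'best guess' of the parameter.
sample_mean = statistics.mean(sample_ages)

# One formula adjustment: population variance divides by N,
# sample variance divides by n - 1.
population_variance = statistics.pvariance(population_ages)
sample_variance = statistics.variance(sample_ages)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
print("population variance (divide by N):", population_variance)
print("sample variance (divide by n - 1):", sample_variance)
```

Rerunning with a different seed draws a different sample, and the statistic shifts while the parameter stays fixed.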
LEVEL OF DATA:
Traditionally, statistical techniques were chosen to fit the level of data that was collected. The data
level, or scale, refers to the degree of precision of the measure used to score the subjects. The higher the
level of the data, the greater the precision and, presumably, the greater the detail gleaned from the
measures. This was considered very desirable, as it was presumed that higher, more sophisticated analyses,
which require more precise measures, could then be applied to the data.
In actuality, it is common practice for researchers to ignore the level of the data. Further, there is
mathematical support for the view that, over the great number of studies a researcher might conduct in a
career, any minor variations in the interpretation of one's data would balance out. Insofar as many social
scientists, and especially clinicians, will not conduct many studies, this generous indulgence may be rather
risky. As a result, many introductory statistics texts advise the novice to consider the level, or scale, of
the data being analyzed.
An American named Stevens suggested a standard:
Nominal: data that is classified or named only.
e.g. Cats and dogs.
Ordinal: data that is ranked by class.
e.g. Course grades, A > B > C > D > F.
Interval: data ranked in equal classes, without an absolute zero.
e.g. The number of stars a critic gives a movie.
Ratio: data ranked in equal classes, with an absolute zero.
e.g. Height or weight.
Most sophisticated statistical procedures were thought to require at least an interval scale, though ratio
was better still. The less precise nominal and ordinal scales required special techniques.
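The cumulative nature of Stevens's four levels can be sketched in Python: each level keeps every comparison allowed by the levels below it and adds one. The wording of what each level permits (for example, that ratios such as 'twice as heavy' need an absolute zero) is this sketch's own paraphrase of the definitions above, and the names `Scale` and `meaningful_comparisons` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens's levels of measurement, ordered from least to most precise."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def meaningful_comparisons(scale: Scale) -> list[str]:
    """Return the comparisons that make sense at a given level.

    Each level keeps every comparison of the levels below it and adds one.
    """
    comparisons = ["same or different"]          # nominal: classified/named only
    if scale >= Scale.ORDINAL:
        comparisons.append("greater or less")    # ordinal: ranked by class
    if scale >= Scale.INTERVAL:
        comparisons.append("equal differences")  # interval: equal classes
    if scale >= Scale.RATIO:
        comparisons.append("ratios")             # ratio: absolute zero
    return comparisons


for scale in Scale:
    print(scale.name, meaningful_comparisons(scale))
```

Because `IntEnum` members compare as integers, the `>=` checks encode the idea that a higher level includes everything a lower level offers.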
An American named Savage, however, pointed out that, as there were only two levels of
techniques, only two levels of data were required.
Discrete: data with a limited number of classes.
e.g. 1: An animal is either a dog or a cat, never both,
though in some ways a dog may be preferred to a cat and vice versa.
e.g. 2: Nominal and ordinal scales.
Continuous: data with an unlimited number of classes.
e.g. 1: Inches of height and satisfaction with one's
spouse can both be considered in degrees, while an absolute zero can only be specified in the
case of inches. One may no longer be satisfied with one's spouse, but it may not make
sense to specify an absolute zero of satisfaction.
e.g. 2: Interval and ratio scales.
It bears pointing out that the two levels of statistical techniques referred to are parametric and
nonparametric procedures, each of which involves its own set of formulas. These terms should not be equated
with the two types of statistics, descriptive and inferential, which refer to the source of the data.
Nonparametric procedures, which are often given less credence, will be discussed further in the final
chapters of this course.
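Savage's two-level scheme, together with the link this chapter draws between the two levels of data and the two families of techniques, can be sketched as a simple lookup. This follows the chapter's own simplified framing; parametric procedures carry further assumptions beyond the level of the data, and the names `data_level` and `technique_family` are hypothetical.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Savage's grouping: Stevens's four scales collapse into two levels of data.
DISCRETE = {Scale.NOMINAL, Scale.ORDINAL}
CONTINUOUS = {Scale.INTERVAL, Scale.RATIO}


def data_level(scale: Scale) -> str:
    """Map one of Stevens's scales to Savage's discrete/continuous level."""
    return "discrete" if scale in DISCRETE else "continuous"


def technique_family(scale: Scale) -> str:
    """Rule of thumb from this chapter: discrete data call for nonparametric
    procedures, continuous data for parametric ones."""
    return "nonparametric" if data_level(scale) == "discrete" else "parametric"


for scale in Scale:
    print(f"{scale.value:>8}: {data_level(scale):<10} -> {technique_family(scale)}")
```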
NOTATION:
Sometimes letters, particularly Greek letters, are used to denote specific functions in statistics. Knowing
them will help in calculating outcomes from formulas. The only new symbol introduced at this point is
the upper case (capital) sigma [∑], which indicates that a summation is required. It does not override the
directive of parentheses [( )] to specify the order of calculation. Thus, in the notation ∑X, one is to sum
all the numbers in the column called X. If X is squared and no parentheses appear, as in ∑X², then each of
the numbers in the column called X is squared first, and it is those squares that are added.
Therefore, ∑X² reads 'the sum of the squares'.
If parentheses enclose the upper case sigma and the X, as in (∑X)², the numbers in the column called X are
summed first, and it is that sum that is squared. Therefore, (∑X)² is called 'the square of the sum'.
The 'sum of the squares', then, is not equal to the 'square of the sum', as they are distinct concepts. This
distinction becomes crucial in succeeding chapters, so be sure to do the problems in the text and the
workbook to clear up any confusion you may have about them.
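A worked example makes the difference concrete. The short Python sketch below uses a hypothetical column X of four scores and computes ∑X, ∑X² and (∑X)² exactly as the notation directs: square-then-add versus add-then-square.

```python
# Hypothetical column of scores called X.
X = [1, 2, 3, 4]

sum_x = sum(X)                           # ∑X:   1 + 2 + 3 + 4 = 10
sum_of_squares = sum(x ** 2 for x in X)  # ∑X²:  1 + 4 + 9 + 16 = 30
square_of_sum = sum(X) ** 2              # (∑X)²: 10² = 100

print("∑X    =", sum_x)
print("∑X²   =", sum_of_squares)
print("(∑X)² =", square_of_sum)
```

The sum of the squares (30) and the square of the sum (100) differ even for these four small numbers.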
METHODOLOGY:
It was noted in the BASIC SKILLS section that the method of research determines the statistical technique
required.
The basic types of investigation in the social sciences are:
NATURAL OBSERVATION (NO): Subjects are observed in their natural setting, without
interference. Ideally, the observer will not be detected, though this is not always possible.
ADVANTAGE: NO provides frequencies without the contrived trappings of a laboratory
to tamper with the true flow of events.
DISADVANTAGE: NO can cause subjects to behave self-consciously if the observer is
detected, and observers usually are. Further, observers who hope to stay long enough for
self-consciousness to wear off may linger long enough to become engrossed in the
phenomenon to the point of losing their objectivity.
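As a minimal illustration of the frequencies that natural observation yields, the Python sketch below tallies a hypothetical observer's log of children's playground behaviors. The behavior categories and the log itself are invented for the example; `collections.Counter` does the counting.

```python
from collections import Counter

# Hypothetical field notes: one entry per behavior the observer recorded.
observations = [
    "shares toy", "grabs toy", "shares toy", "cries",
    "shares toy", "grabs toy", "plays alone", "shares toy",
]

# Frequencies: how often each behavior occurred during the session.
frequencies = Counter(observations)

for behavior, count in frequencies.most_common():
    print(f"{behavior:<12}{count}")
```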
CORRELATION (Corr): Studies in which subjects may be asked questions for further
clarification, usually in the form of a survey. Rating scales can add a degree of precision, or magnitude,
as well (e.g., Likert's Strongly Agree, Moderately Agree, etc.).
ADVANTAGE: Corr provides frequency and magnitude of response, allowing
comparison of fluctuations across factors, or variables.
DISADVANTAGE: Corr can only assess linear relations. In the case of answer scales,
such as Likert's subjective rating scale, subjects are told to restrict their answers to
those available, which may or may not fit reality. Further, the quality of the data
received can be determined, in part, by the form of the survey. Consider the pros and
cons of each:
INTERVIEW: Most expensive and least honest, as it is the least anonymous; however, it yields the most
returns and detail.
QUESTIONNAIRE: Least expensive and most honest, as it is the most anonymous; however, it yields the least
detail, as there is no room for subjects to comment on each item. Even worse, many subjects will simply not
respond. Losing 70% of the subjects is common.
PHONE SURVEY: Rather popular, as it seems to fall in the middle on every aspect (expense,
honesty, detail and returns). Still, if money or subjects are scarce, the researcher may not have a
choice.
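The limitation that correlation assesses only linear relations shows up in a short Python sketch of the Pearson coefficient r. The chapter does not name a specific coefficient, so using Pearson's r here is an assumption, as are the invented data. A perfectly predictable but U-shaped relation still yields r = 0.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient: a measure of *linear* association."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]  # y rises steadily with x
u_shaped = [v ** 2 for v in x]   # y depends entirely on x, but not linearly

print("linear relation:   r =", pearson_r(x, linear))    # 1.0
print("U-shaped relation: r =", pearson_r(x, u_shaped))  # 0.0
```

An r near zero therefore means "no linear relation", not "no relation at all".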
THE EXPERIMENT: Unlike the first two methods discussed, which are only quasi-experimental, the
true experiment is the only investigation technique appropriate when considering a causal
relation. This is because only an experiment includes a manipulation under controlled
circumstances. Specifically, the presumed causal agent (independent variable) is manipulated
and the presumed effect (the dependent variable) is observed. Variables should be defined
operationally; that is, in a way which can be measured. Thus, the dependent variable is also
referred to as the dependent measure.
ADVANTAGE: The experiment can support a causal inference, in addition to
frequencies and magnitudes.
DISADVANTAGE: The experiment is the most contrived, and subjects may behave in a way
quite different from how they would outside of the testing situation (in the real world).
Even though the results of an experiment are generally given more credence than those of quasi-experimental
procedures, it may be necessary to conduct the latter for preliminary or exploratory purposes. Grant requests require
some initial investigation to justify supporting a potential experiment. Further, conditions and data quality
may make true experimentation impossible.
DOCUMENTATION: Research of all levels benefits from clear documentation. Documentation
may mean official certificates which verify subject records. In the most technical sense, the
CASE STUDY is a form of record keeping, or documentation. It is not a specific research method;
all manner of information about a specific case may be included: the individual's response to a
survey; the individual's participation in an experiment; their IQ, health record, school grades or
even their credit rating.
DATA COLLECTION: The method of data collection also affects the data quality. This
includes concerns about from whom the data was collected. If data was collected from all the
major subsets of the population, the study can be described as CROSS-SECTIONAL. If subjects
are followed for a long time, the study can be described as LONGITUDINAL. These are not
types of research, per se. A given study can be both cross-sectional and longitudinal.
MEDICAL RESEARCH: Medical research is especially sensitive to ethical restraints, so
many studies are abridged compared to those in the social sciences. Natural observation is not directly
useful to the development of medical interventions. Correlational studies are distinguished as:
CROSS-SECTIONAL: Usually a preliminary, one-time survey.
RETROSPECTIVE: A longitudinal study of archival cases, which is convenient, but
based upon the memory of survivors or old documents, which may not be accurate.
PROSPECTIVE: A longitudinal study of individual cases (a cohort)
followed forward in time; more accurate, but very expensive.
Medical experiments are called CLINICAL TRIALS. These are abridged in that, the moment a treatment's
effectiveness is suspected, ethical restraint requires that all patients receive that treatment, even those
in the control group.