Review of the Logic of Statistical Reasoning

Reviews of the Logic of Statistical Reasoning
Song Yang, Ph.D.
Assistant professor
Department of Sociology and Criminal Justice
211 Old Main
University of Arkansas
Fayetteville AR 72701
Comments on the organization of the book
With its many real-life examples and historical background on the statistical
techniques in question, the Logic of Statistical Reasoning (LSR) does fill a gap among statistics
texts for social science students. From my teaching of social statistics to our undergraduate
sociology majors, I have found that my students are more concerned with the details of computation
than with applying the techniques to solve real problems. This difference is by no means a nuance.
For example, when given a crosstabulation and asked to compute its chi-square, my students can
generally produce an accurate result. However, when I present them with a raw dataset on,
say, sex and number of gym visits and ask them to assess the significance of the relationship
between the two variables, they get lost, not knowing what to do or where to start.
Ironically, knowing when and how to apply the statistical methods one has learned is exactly the
emphasis of social statistics. In this, the authors are right that this text, by filling the gap between
the methods and their applications, should be pedagogically stronger than many other statistics texts.
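To make the difficulty concrete, the step my students miss is going from raw records to the crosstabulation before any chi-square formula can be applied. A minimal sketch in Python follows; the ten records, the "low"/"high" grouping of gym visits, and the function name chi_square are all hypothetical, invented only for illustration, and the dataset is far too small for a real test.

```python
from collections import Counter

# Hypothetical raw records: (sex, gym visits grouped as "low" or "high").
raw = [
    ("female", "low"), ("female", "low"), ("female", "high"),
    ("female", "high"), ("female", "high"),
    ("male", "low"), ("male", "low"), ("male", "low"),
    ("male", "low"), ("male", "high"),
]

def chi_square(records):
    """Cross-tabulate (row, column) pairs and return (chi2, degrees of freedom)."""
    observed = Counter(records)                  # cell counts of the crosstab
    row_totals = Counter(r for r, _ in records)  # marginal totals by row
    col_totals = Counter(c for _, c in records)  # marginal totals by column
    n = len(records)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = chi_square(raw)
print(f"chi-square = {chi2:.3f}, df = {df}")  # chi-square = 1.667, df = 1
```

The resulting statistic would then be compared with the critical value for the given degrees of freedom (3.841 at the .05 level for df = 1).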
My suggestions mainly concern the organization of the chapters. This text, as it stands, is a bit
too long. Much can be reorganized and some material can be dropped. For example, chapter 3
describes various graphing options for one variable. Because chapter 4 also discusses descriptive
statistics, I recommend that the authors merge chapters 3 and 4. The new chapter should start with
descriptive statistics, followed by graphic methods such as pie charts, bar
charts, and the various scatterplots and diagrams. Another reason for merging chapter 3 is
that graphic methods are substantively less significant than topics such as descriptive
statistics, inferential statistics, or correlation/regression. Devoting a whole chapter to graphic
methods seems overdone.
Both chapter 11 on Analysis of Variance (ANOVA) and chapter 12 on chi-square are
about bivariate statistics. I suggest that the two chapters be combined into a new chapter
addressing both. There are two reasons for the merge: (1) both techniques are
bivariate statistics: ANOVA deals with the significance test (F-test) and the strength of the
relationship (eta-square) between an interval/ratio dependent variable and a nominal or ordinal
independent variable with more than 2 attributes, while chi-square tackles the significance test
between two nominal/ordinal variables; and (2) the chi-square material falls a little short of a whole chapter.
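As an illustration of the ANOVA half of such a combined chapter, the F-ratio and eta-square can both be obtained from the same sums of squares. The sketch below is a minimal Python version under that standard decomposition; the three groups of scores and the function name one_way_anova are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of interval/ratio scores."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_within = ss_total - ss_between
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / ss_total  # share of variation explained by group
    return f_ratio, eta_squared

# Hypothetical scores for three categories of a nominal independent variable.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, eta_squared = one_way_anova(groups)
print(f"F = {f_ratio:.2f}, eta-square = {eta_squared:.4f}")
```

For these made-up scores the between-group sum of squares is 26 and the total is 32, so F is 13 and eta-square is 0.8125.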
Surveying many social statistics textbooks, I feel a bit disappointed by the lack of
attention to the computational details of coefficient estimation in multivariate regression with more
than four variables. On the one hand, commercial software such as SAS routinely publishes its
computation procedures and algorithms for OLS multivariable regression, which
require a significant level of mathematical and computing knowledge to comprehend. On the other
hand, the lack of step-by-step instructions in many social statistics textbooks leaves many
interested readers in the dark on the important topic of coefficient estimation in multivariate
statistics. The common phrase we hear when dealing with multivariate regression is to let
the computer software take care of it. Can the authors add a section dealing with the
coefficient estimates of OLS multivariate regression in generic terms?
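By "generic terms" I have in mind something like the normal equations, (X'X)b = X'y, which hold for any number of independent variables. The sketch below solves them in plain Python by Gauss-Jordan elimination. It is only an illustration of the idea, not a description of how SAS or any other package computes its estimates; the function name ols_coefficients and the data are hypothetical, with y constructed to equal exactly 1 + 2*x1 + 3*x2.

```python
def ols_coefficients(X, y):
    """Solve the normal equations (X'X) b = X'y.

    X is a list of rows of independent variables (without the intercept);
    the returned list is [intercept, b1, b2, ...].
    Assumes X'X is non-singular (no perfectly collinear variables).
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data: y = 1 + 2*x1 + 3*x2 with no error term.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
coefficients = ols_coefficients(X, y)
print([round(b, 6) for b in coefficients])
```

Because the data contain no error term, the recovered coefficients should match the intercept 1 and slopes 2 and 3 up to rounding.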
Comments on chapter 4
Chapter 4 covers the basics of univariate descriptive statistics such as mean, median,
mode, variance, and standard deviation. However, I should caution the authors that, depending on
the data presented to the students, the computation formulas for those parameters (mean, median,
mode, variance, and standard deviation) differ between raw data and data in a frequency
distribution table. Often we assume students can make the reasonable extension and derive
those formulas on their own from the formulas given in the textbook. My teaching experience
tells me this assumption is often wrong, and we have to be explicit with students about which
formulas to use with which type of data. The authors should also be aware that the computation
of parameters in the population is different from the computation of those in samples. Below are
the formulas I use in my class. The authors may consider incorporating them in their text.
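Raw data computations

Population parameters
Mean: $\mu = \frac{\sum X_i}{N}$
Variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
St.d.: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$

Sample parameters
Mean: $\bar{X} = \frac{\sum X_i}{N}$
Variance: $S^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$
St.d.: $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$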
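The population and sample versions of the raw-data formulas differ only in the denominator, which a short Python sketch can make visible. The scores are hypothetical; the standard library's statistics module is used only as a cross-check (pvariance divides by N, variance by N - 1).

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw data
n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

pop_variance = sum_sq_dev / n           # population formula: divide by N
sample_variance = sum_sq_dev / (n - 1)  # sample formula: divide by N - 1
pop_sd = pop_variance ** 0.5
sample_sd = sample_variance ** 0.5

# Cross-check against the standard library.
assert abs(pop_variance - statistics.pvariance(scores)) < 1e-9
assert abs(sample_variance - statistics.variance(scores)) < 1e-9

print(mean, pop_variance, pop_sd)  # 5.0 4.0 2.0
print(round(sample_variance, 4), round(sample_sd, 4))
```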
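Frequency distribution computations

Sample parameters ($N_i$ is the frequency of value $X_i$)
Mean: $\bar{X} = \frac{\sum N_i X_i}{N}$
Variance: $S^2 = \frac{\sum N_i (X_i - \bar{X})^2}{N - 1}$
St.d.: $S = \sqrt{\frac{\sum N_i (X_i - \bar{X})^2}{N - 1}}$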
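To show students that the frequency-distribution formulas are the raw-data formulas in another form, one can compute both and compare. In this minimal Python sketch the frequency table and the function name grouped_sample_stats are hypothetical; the table maps each value X_i to its frequency N_i.

```python
def grouped_sample_stats(freq_table):
    """Return (mean, variance, st.d.) from a {value: frequency} table."""
    n = sum(freq_table.values())  # N is the sum of the frequencies N_i
    mean = sum(f * x for x, f in freq_table.items()) / n
    variance = sum(f * (x - mean) ** 2 for x, f in freq_table.items()) / (n - 1)
    return mean, variance, variance ** 0.5

# Hypothetical frequency distribution: value -> frequency.
freq_table = {1: 2, 2: 3, 3: 4, 4: 1}
mean, variance, sd = grouped_sample_stats(freq_table)

# Expand the table back into raw scores and apply the raw-data formulas.
raw = [x for x, f in freq_table.items() for _ in range(f)]
raw_mean = sum(raw) / len(raw)
raw_variance = sum((x - raw_mean) ** 2 for x in raw) / (len(raw) - 1)

print(round(mean, 4), round(variance, 4), round(sd, 4))
print(abs(variance - raw_variance) < 1e-9)  # True
```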
Comments on chapter 9
On first reading, chapter 9 strikes me as too short to justify a
stand-alone chapter. Why not merge chapters 8 and 9, since they both fall under the heading of
inferential statistics? This issue goes back to the organization of the text.
As for details, the equations on page 5 are baffling. See equation 9.7,

$$\hat{\sigma}^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}.$$

Because $S^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$, $\hat{\sigma}^2$ should be $S^2$. How, then, does equation 9.8 follow? My suggestion is the following.

Equation 9.7:
$$\hat{\sigma}^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

Thus equation 9.8:
$$N \times \hat{\sigma}^2 = \sum (X_i - \bar{X})^2$$

Because
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$

thus equation 9.9:
$$S^2 \times (N - 1) = \sum (X_i - \bar{X})^2$$

Thus equation 9.10:
$$N \times \hat{\sigma}^2 = S^2 \times (N - 1)$$

Thus equation 9.11:
$$\hat{\sigma}^2 = \frac{S^2 \times (N - 1)}{N}$$

Note that equation 9.11 here is different from equation 9.8 in the text.
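A quick numerical check of the suggested chain of equations may help the authors confirm the relation between the N-denominator estimate and S squared. This Python sketch uses hypothetical scores, and the variable names are mine.

```python
scores = [3, 5, 8, 10, 14]  # hypothetical sample
n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

sigma_hat_sq = sum_sq_dev / n  # suggested equation 9.7: divide by N
s_sq = sum_sq_dev / (n - 1)    # S squared: divide by N - 1

# Suggested equation 9.10: N * sigma_hat^2 equals S^2 * (N - 1).
lhs = n * sigma_hat_sq
rhs = s_sq * (n - 1)
print(abs(lhs - rhs) < 1e-9)  # True

# Suggested equation 9.11: sigma_hat^2 = S^2 * (N - 1) / N.
print(abs(sigma_hat_sq - s_sq * (n - 1) / n) < 1e-9)  # True
```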