XC-BK2 - Eclectic Anthropology Server

Chapter 2
Using a Database: Comparative Research
with a Standard Sample http:// links are live
The first steps in testing hypotheses with a database are to search the codebook for
variables relevant to the hypothesis, study how these variables are coded, and decide
which variables to retrieve from the database. In the case of the Standard Sample
codebook (http://eclectic.ss.uci.edu/~drwhite/courses/SCCCodes.htm), as with other
social science datasets, there will be additional information somewhere in the codebook
as to the specific criteria and methods used when the original investigators coded the
variables. You might not look at this background information right away, but once you
take a serious interest in a particular set of variables, you should do so. In the SCCS
codes, for example, codes come in clusters that are numbered and grouped according to
the original investigators who contributed the codes. Thus the first SCCS study by
Murdock and Morrow (1970) contributed 22 variables, and these were numbered 1-22 in
the cumulative codebook. Within the codebook itself there is a header statement before
this set of variables, which for this set of codes reads as follows:
George P. Murdock and Diana O. Morrow. 1970. ETHNOLOGY 9:302-330.
Datafile: STDS01.DAT Vars. 1-22 subsistence
The first line of the header identifies the article in which the codes first appeared: it
was published in the journal ETHNOLOGY, volume 9, pp. 302-330, under the authorship
of George P. Murdock and Diana O. Morrow (1970). The second line names the original
data file and the range of variable numbers that the study contributed. This gives you
all the information needed to find the article in the library if you use coded variables
that were published by these authors (in which case you should also cite them in the
research paper resulting from your analysis).
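A header statement of this kind can also be read mechanically. The following is a minimal Python sketch, not part of the codebook or of any SCCS tool; the function name parse_header is ours, and we assume that a header follows exactly the layout quoted above.

```python
import re

# Header text as quoted above from the SCCS codebook.
HEADER = (
    "George P. Murdock and Diana O. Morrow. 1970. ETHNOLOGY 9:302-330.\n"
    "Datafile: STDS01.DAT Vars. 1- 22 subsistence"
)


def parse_header(text):
    """Pull the data file name, variable range, and topic out of a header.

    Assumes the layout 'Datafile: <name> Vars. <first>- <last> <topic>'.
    Headers that deviate from this layout return None.
    """
    match = re.search(
        r"Datafile:\s*(\S+)\s+Vars\.\s*(\d+)\s*-\s*(\d+)\s+(.+)", text
    )
    if match is None:
        return None
    datafile, first, last, topic = match.groups()
    return {
        "datafile": datafile,
        "first_var": int(first),
        "last_var": int(last),
        "topic": topic.strip(),
    }


info = parse_header(HEADER)
print(info)
```

Other clusters in the codebook may be laid out differently, so treat the pattern as a starting point to adapt rather than a description of every header.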
In the first section of this chapter we show how to open and search the codebook for
relevant information and variables, how to copy and store critical information from the
codebook that you will be using for your research, and how to think about what types of
variables may be relevant for the topics you are thinking of studying. It is important to
note the general context of the study, such as SUBSISTENCE ECONOMY, the
name of each variable, such as INTERCOMMUNITY TRADE AS FOOD SOURCE, and the
specific categories used under the heading of that variable to classify cases (societies as
described by one or more ethnographers at a particular time period) into particular named
coding categories (e.g., >50% of food). You will see that the coding categories are
compressed and telegraphic: “>50% of food” means that if this code applies to a given
society, that society depends on intercommunity trade for over half its economic
subsistence. If you are unsure about the meaning of the codes, consult the original article in
which they were published, but in any case take a good look at the title of the article. Do not
assume, as did one hapless soul, that a code “absent” under a variable named Political
and Legal System in an article on Modernization means “no political or legal system,”
when an examination of the article itself and of the other variables coded would clearly
indicate that what was coded was the presence or absence of recent changes in the
political and legal system. Because of that kind of confusion, the labeling of the variables
in the codebook for that study was changed to indicate more clearly that it was change
that was coded, and the “absent” category is now labeled “no changes.” The coding labels,
however, were intended as a shorthand to appear on a table made by a computer program
with access to the coded data, so the shorthand label is not always sufficient to understand
what was coded. Do your background research adequately so as to understand what your
variables and codes consist of, if not initially, then once you have decided which
variables to investigate.
Section two of this chapter shows the next steps you will take to work with the computer
database using SPSS to extract the variables you are interested in so as to be able to use
them to test hypotheses. The SCCS database is organized in rows that represent cases
(societies) and columns that represent variables. In earlier editions of the database there
were separate files for each set of variables from a single article or study, such as
Datafile: STDS01.DAT for Murdock and Morrow’s (1970) variables 1-22 on
subsistence.1 Now these reside in a single SPSS file, “SCCSDatabase.sav,” which is
distributed on CD-ROM by the World Cultures journal.2
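To make the row-and-column layout concrete, here is a small Python sketch. It does not read the SPSS file; it builds a toy table by hand. Only the first row (case 1, Nama Hottentot, v1 = 4) uses values quoted later in this chapter; the second row and the helper name extract are hypothetical.

```python
# Toy stand-in for the database: one dict per case (society),
# one key per variable. Row 1 uses values quoted in this chapter;
# row 2 is an invented placeholder, not real SCCS data.
rows = [
    {"case": 1, "socname": "Nama Hottentot", "v1": 4},
    {"case": 2, "socname": "Hypothetical Society", "v1": 1},
]


def extract(table, variables):
    """Keep only the named variables (columns) for every case (row)."""
    return [{name: row[name] for name in variables} for row in table]


# Retrieve just the columns needed for a given test.
subset = extract(rows, ["socname", "v1"])
for row in subset:
    print(row)
```

Extracting a few columns in this way mirrors what we will do in SPSS: every society stays in the table, but only the variables relevant to the hypothesis are kept.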
If you are a subscriber to the journal and have received the CD-ROM, or if you have
purchased this book with a CD-ROM, first copy the files “SCCSDatabase.sav” and
“Codes.doc” from the CD that accompanies this manual to the hard disk of your computer.
It might make sense to put them in a special directory named, e.g., CROSSCULT. In
addition, be sure to have a notebook and a pen at hand.
If you are a student in a computer lab at a school where this database is used in class,
your instructor will have placed all the material you need on the hard drives at that lab.
Most of the material used in this book is available at the home site of the UCI course in
which this material was first put to use instructionally, and is accessible at
http://eclectic.ss.uci.edu/~drwhite/courses. If you are concerned about the variable of
time because the SCCS cases are coded only for a single time period, you will find on
that site links to an archaeological site database comparable to the SCCS.
Note 1: STDS and SCCS both stand for STandarD Cross-Cultural Sample (Murdock and White 1969), using
variations in the acronym. The sample is described in http://eclectic.ss.uci.edu/~drwhite/pub/SCCS1969.pdf,
where you will also find links to the societies in the sample. William Divale later converted the raw STDS
.dat files into the labeled SPSS files that we use today, and these were similarly named, e.g.,
STDS01.SAV, where ‘.sav’ is the extension used for SPSS data files. More recently, Andrey Korotayev
combined nearly all these files (except for the newest ones, for which the variables had not been
sequentially numbered) into a single file, “SCCSDatabase.sav.”
Note 2: http://eclectic.ss.uci.edu/~drwhite/worldcul/world.htm is the electronic site for the journal, which is
also published on paper. Codebooks, datasets and articles are distributed on the accompanying CD-ROM.
Now we can begin.
Section 1: Finding Variables
Cross-Cultural Research: Starting up
Exercise 1
Now let us, for example, use the Standard Cross-Cultural Sample database to test the
hypothesis that the transition to agriculture tends to lead to the transition to fixed
settlement patterns. (This hypothesis predicts that reliance on agriculture should
correlate positively with fixity of settlement, i.e., the higher a given culture’s reliance on
agriculture, the more fixed [as opposed to nomadic or migratory] its settlement is likely
to be.)
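What “correlate positively” means for ordered codes can be sketched in a few lines of Python. The sketch below uses Spearman’s rank correlation, which is our choice for illustration and not necessarily the measure you will end up using. The codes for the six imaginary societies are invented and are not SCCS data.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented codes for six imaginary societies (higher = more reliance on
# agriculture, higher = more fixed settlement). Not real SCCS data.
agriculture = [1, 3, 4, 5, 6, 6]
fixity = [1, 2, 2, 4, 5, 5]

rho = spearman(agriculture, fixity)
print(round(rho, 3))
```

A value of rho near +1 would fit the hypothesis, a value near 0 would indicate no monotonic relationship, and a value near -1 would run against it.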
To test this hypothesis we should first find the respective variables in the database. To
find them, open the file “CODES.DOC,” then press “CTRL + F.”
You will see the following window:
The independent variable in our case is the reliance on agriculture. However, we do not
advise you simply to type the name of the variable in the “Find what” line, because the
same variable could be named in a number of different ways. Instead, we advise
you to type a keyword, i.e., a word that is bound to be present in the variable name
whichever way it is worded. In our case this word seems to be just “agriculture.”
So, let us type this word in the “Find what” line (it may also be named “Search for”).
Then press the “Find” (or “Find Next”) button.
As a result you will get the following window:
As we see, the first variable which we have found is described in the following way:
1 = None
2 = Non-food Crops
3 = < 10%
4 = < 50%, and less than any other single source, incl. trade
5 = < 50%, and more than any other single source, incl. trade
6 = Primarily agricultural
Note: Figures to the right denote the number of Standard Cross-Cultural Sample cultures
possessing the respective characteristic. E.g., the number 77 at the beginning of the last
line indicates that 77 Standard Sample cultures are primarily agricultural.
Hence, the impression is that what we have found is quite appropriate for our task. Now,
write down the number of this variable (which is 3).
IMPORTANT NOTE: If you do serious cross-cultural research we would strongly advise
you to continue your search. Many cross-cultural variables (including the variable under
consideration) were coded more than once; that is why more than one version of them
has been published and is available in electronic form. The first variable you
find is not always the best available one for your purposes. Hence, our suggestion is
to continue the search until you reach the end of the variable list, writing down the numbers
of the relevant variables, and then to compare them and choose the most appropriate one.
Frequently it makes sense to perform several tests using all the variables you have found.
However, as this is our first exercise, we will simplify our task and perform just one
test, using the first variable on the reliance on agriculture that we found.
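The advice to search through to the end of the variable list, writing down every relevant number, can be sketched in Python. This is only an illustration under loud assumptions: the miniature codebook is invented, the entry layout (“number. NAME”) is assumed rather than taken from Codes.doc, and every entry except variable 1 is hypothetical.

```python
import re

# Invented miniature codebook. The layout "<number>. <NAME>" and every
# entry except number 1 are assumptions made for illustration only.
CODEBOOK = """\
1. INTERCOMMUNITY TRADE AS FOOD SOURCE
3. AGRICULTURE (HYPOTHETICAL NAME)
151. AGRICULTURE, SECOND CODING (HYPOTHETICAL ENTRY)
999. SOMETHING ELSE (HYPOTHETICAL ENTRY)
"""


def find_variables(codebook_text, keyword):
    """Return (number, name) for every entry whose name contains keyword.

    The match ignores upper and lower case, like a default CTRL + F search.
    """
    hits = []
    for line in codebook_text.splitlines():
        entry = re.match(r"\s*(\d+)\.\s+(.*)", line)
        if entry and keyword.lower() in entry.group(2).lower():
            hits.append((int(entry.group(1)), entry.group(2)))
    return hits


matches = find_variables(CODEBOOK, "agriculture")
for number, name in matches:
    print(number, name)
```

Unlike a single press of “Find Next,” the function collects every hit in one pass, which is the list of candidate variables you would otherwise write down by hand.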
Now we also have to find a variable on settlement fixity. To do this, press “CTRL +
F” and type a new keyword in the “Search for” (or “Find what”) line. What keyword
would you suggest? Perhaps the most evident keyword here is just “fixity.” So,
type this word and press the “Find” (or “Find Next”) button. You will see the following:
Again, the first variable which we have found turns out to be quite appropriate
(incidentally, let us remind you that this will not always be the case!). So, let us write
down its number and after that we can go to the database.
However, before doing this, let us find variables for testing another hypothesis.
According to this hypothesis, the growth of population density leads to an increase in
political complexity (hence population density should correlate positively with political
complexity). We have to find variables in the Standard Cross-Cultural Sample with which
to test the hypothesis.
Let us start with population density. Follow the algorithm specified above using
“density” as a keyword. The result should look as follows:
In fact, the first variable we have found is quite appropriate for our task. So, after writing
down its number we can start looking for the other variable, political complexity.
Soon you will understand that in this case our task is not so simple. Indeed if you follow
the above described algorithm using “complexity” as a keyword, your first finding will
look as follows:
As you see, this is not quite the “complexity” we need. You can go on looking for
“complexity” in the codebook, and you will find it there three more times. But you
will see that in all three cases the keyword turns out to be just as useless.
Thus, in this case we need a less straightforward solution. Let us recall what levels
of political complexity we know. Apparently the most frequently used scheme of
political complexity levels is the one designed by Service (1962):
band – tribe – chiefdom – state. Of these four words, the one least likely to be
used outside the discussion of political complexity levels is “chiefdom.” Let us use it as
the keyword.
The result will look as follows:
Let us study now the variable which we have found. Yes, its name might not suggest to
us that it is what we need. But if we study how this variable is coded we will see
immediately that it is JUST what we need. Hence, let us write down this variable number
and move further on.
Section 2: Working with the database
Our natural next step is to start working with the database itself. To do this, open
the file SCCSDatabase.sav, which can be found on the CD that accompanies this manual
and which you should already have copied to the hard disk of your computer. Find the
file in the directory, then double click on the SPSS icon next to the file’s name. (We
assume that SPSS has already been installed on your computer.)
You will see the following window:
Note that by default SPSS opens files without value labels. For example the first line of
the database looks as follows:
Nama Hottentot
“1” is just the case number. “Socname” is the name of the respective culture, which in our
case is “Nama Hottentot.” “1860” is the “focal year,” i.e., the year when the data were
collected. However, if you are dealing with an SPSS database for the first time, the rest
might look entirely puzzling. For example, what could “v1 = 4” mean? In fact, it is not
difficult at all to find out what v1 is. Just move your mouse pointer to
“v1” and you will see the following:
This way you can easily find out that the database column marked “v1” contains
information on intercommunity trade as a food source. (In the same way, of course, you
can find the same information for all the other variables.) But what could the
“4” just to the right of “Nama Hottentot” mean? Finding out what this number
means (as well as all the other numbers/codes in the database) is very easy. As shown in
the following window, just choose in the menu line:
Immediately after this the database will look entirely different:
(the SCCSDatabase.sav file does not have the foc_year column shown in this window)
Now you can easily see what information the top cell in the “v1” column contains: that
the Nama Hottentot in 1860 got less than 10% of their food through intercommunity
trade. The information contained in all the other cells of the database can now, of course,
be understood just as easily.
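What the switch to value labels does can be sketched in Python as a simple lookup from code to label. This is an illustration only, not how SPSS works internally. The label text for code 4 paraphrases the reading given above; the labels for the other codes of v1 are not quoted in this chapter, so they are deliberately left out. The names V1_LABELS and label are ours.

```python
# Value labels for v1. Only the label for code 4 is taken from this
# chapter (paraphrased); the codebook defines the remaining codes.
V1_LABELS = {4: "< 10% of food"}


def label(code, labels):
    """Return the label for a code, or the bare code as text if unlabeled."""
    return labels.get(code, str(code))


# First case of the database, with the values quoted in this chapter.
row = {"case": 1, "socname": "Nama Hottentot", "v1": 4}
print(row["socname"], "-", label(row["v1"], V1_LABELS))
```

A code without an entry in the table is shown unchanged, which is how an unlabeled number would still appear in the data window.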
Now we can start analyzing the database. Chapter 3 begins with scatterplots and maps,
and Chapter 4 continues with how to do cross-tabulations. You will then be ready for
Chapter 5, which is a mini-course in statistics that you may use as a reference in choosing
and interpreting correlations, in evaluating tests of significance, and in evaluating your
hypotheses generally.
Chapter 6 provides additional help in reading and interpreting cross-tabulations from the
larger perspective of the anthropological sciences, and Chapter 7 provides the advanced
methods for developing and evaluating your measures through one-factor constructed
variables and in testing hypotheses using third factors as controls.
In this Chapter you have begun your journey into the anthropological sciences through a
cross-cultural research project. With the following chapters you will be able to finish
your journey. Since this book is electronic, hence easily revised, we will appreciate your
feedback on the good rapids and the bad. Like river-rafting, perhaps the best approach is
to deal with each turn and problem as they are encountered, and return to these chapters
as well as the on-line links for the course as needed.