IV Overview of the Course Purpose/ Description
This is a class in advanced (multivariate) statistics that also teaches scientific paper writing.
V Learning Outcomes: Explain how each of the following learning outcomes will be achieved.
Student learning outcomes:
Outcome: Identify and pursue sophisticated questions for academic inquiry
How achieved: Students choose a data set relevant to their interests and use it to test
hypotheses in the course of their term project.
Outcome: Find, evaluate, analyze, and synthesize information effectively and ethically from
diverse sources (see
How achieved: Students need to do a literature review on the problem or hypothesis of their
term project.
Outcome: Manage multiple perspectives as appropriate
How achieved: Students have to present all sides of the issue upon which they are basing
their term project.
Outcome: Recognize the purposes and needs of discipline-specific audiences and adopt the
academic voice necessary for the chosen discipline
How achieved: I teach the students to write in scientific style with a slant toward
scientific style in biological anthropology. I also explain to them why scientific style is
important within this context.
Outcome: Use multiple drafts, revision, and editing in conducting inquiry and preparing
written work
How achieved: I edit a draft of the students’ term paper and give it back to them for revision.
Outcome: Follow the conventions of citation, documentation, and formal presentation
appropriate to that discipline
How achieved: I teach the students how to use CSE/CBE citation style. They give a short
presentation of their project in class at the end of the semester.
Outcome: Develop competence in information technology and digital literacy (link)
How achieved: The students use Web of Knowledge and other online databases (JSTOR is popular)
to find articles for their term project/paper. They also need to find a copy of the data
they will work with online.
VI. Writing Course Requirements
Enrollment is capped at 25 students.
If not, list maximum course enrollment.
Explain how outcomes will be adequately met
for this number of students. Justify the request
for variance.
Capped at 15.
Briefly explain how students are provided with
tools and strategies for effective writing and editing
in the major.
I explain scientific writing in class and
also provide abundant feedback on their
assignments and term paper.
Which written assignment(s) includes revision in response to instructor’s feedback?
The term paper based on the term project.
VII. Writing Assignments: Please describe course assignments. Students should be required to
individually compose at least 20 pages of writing for assessment. At least 50% of the course grade
should be based on students’ performance on writing assignments. Quality of content and writing
are integral parts of the grade on any writing assignment.
Formal Graded Assignments
The entire course grade (100%) is based
on written assignments.
• Seven written assignments on statistical analyses, each about 5 pages in length.
• Five written assignments relating to the term paper, varying from 2 pages to perhaps
25 pages in length.
Informal Ungraded Assignments
VIII. Syllabus: Paste syllabus below or attach and send digital copy with form.  For assistance
on syllabus preparation see: http://teaching.berkeley.edu/bgd/syllabus.html
The syllabus must include the following:
1. Writing outcomes
2. Information literacy expectations
3. Detailed requirements for all writing assignments or append writing assignment instructions
Paste syllabus here.
ANTHROPOLOGY 408: Spring 2013
Advanced Anthropological Statistics
TR 8:10-9:30 SS 258
Dr. Randy Skelton
Office: 226 Social Sciences Building
Office Hours: MWF 8:00-8:50, TR 10:00-11:00
Phone: 243-4245
Email: randall.skelton@umontana.edu
The goal of this class is to learn several advanced (multivariate) methods
of data analysis and to learn the skill of writing a scientific paper. The focus will
be on use of statistical software to perform analyses, with interpretation and
write-up of the results obtained. Students who pass this class will:
Learn to use several types of statistical analysis including multiple
regression, principal components analysis, cluster analysis, discriminant
analysis, and more.
Explore how these analyses can be applied to novel situations by carrying
out a project that involves the use of data analysis.
Gain facility with a statistical software package such as SPSS.
Build the ability to interpret the results of multivariate statistical analyses
and express them in a professional manner.
Become familiar with standard scientific paper style and format.
Gain experience with finding sources through library databases.
Come to appreciate the vast array of data that is available on the web.
Moodle Supplement
There will be a Moodle supplement for this class, where I will post various
types of useful materials and information, including required materials. The
people at IT Central in SS120 and Moodle Tech Support (243-4999, umonlinehelp@umontana.edu) can help you with access and technical issues. As your
instructor I can only be responsible for content placed on Moodle – not for its
administration or technical issues.
Required Materials
Text: Landau, Sabine and Everitt, Brian S., 2004. A Handbook of
Statistical Analyses using SPSS. Chapman & Hall/CRC Press. Hereafter I will
refer to this text as “the Handbook”. The Handbook provides a walk-through of
many of the methods we will be covering. The Handbook will be most useful to
you when you are doing your assignments, and need not be read before coming
to class.
Online Resources: For each week I have some required browsing listed.
Some of this is for help with your assignment. There are many statistical texts
online, some of which I have links to in the “Helpful Materials” section of the class
Moodle supplement. The most useful of these materials are made available on
the WWW by Statsoft Inc, Karl L. Wuensch (a professor in psychology at East
Carolina University), and William K. Trochim (a professor in policy analysis and
management at Cornell University).
Statistical Package: We will use SPSS. SPSS is available in the Fred W.
Reed Social Sciences Research Lab (SSRL) and other campus computing labs.
You may also buy SPSS.
Other Software: I assume that you have, and
know how to use, Microsoft Office products, especially Word and Excel. You
may also download the free office package OpenOffice and use it instead,
though I don’t guarantee that it operates exactly the same as MS Office.
Computer access: You will need access to a computer with SPSS
installed. SPSS is installed on computers for student use in the SSRL. We will
have an orientation to the use of these labs by the SSRL staff early in the
semester. Also, the computers in the UC 225 general student lab are supposed
to have SPSS installed. You will need to show your GrizCard for access to the
general student labs.
Data Storage: You will need some mechanism for storing the data sets you
use and the output from the statistical software. The best option for this is a USB
flash drive (also known as a memory stick, pen drive, flash drive, etc.).
How will this class work?
First 2/3 of the semester. We will explore several methods of advanced
statistical analysis. The focus will be on using SPSS to perform the
analysis, interpreting the output, and writing up the procedure in standard
scientific paper format. We will meet at every normally scheduled class
meeting time. Each week there is an assignment due and you will be
expected to do the analysis requested, write up your results, and submit
them to me by uploading them through Moodle. Moodle doesn’t have a
way for me to send materials back to you, so I will do that using email to
your official University email address (what I see in Moodle). You will need
to either check your University email regularly or forward it to where you
normally check your email. Most weeks there will be a lecture on Tuesday,
and we will work with data on Thursday.
Last 1/3 of the semester. You will each do a project in which you analyze
a data set of interest to you in order to draw some conclusions about some
topic of interest to anthropology. Grad students should use the data set
they are working with in their thesis or dissertation research, if possible.
We will continue to meet for class, and I will use this time to explore and
demonstrate additional statistical and analytical methods. I will not allow
you to fall behind or put off the steps of the project until the end, and there
is an assignment related to your project due every week.
For undergraduate students, your grade will be based on attendance,
preparation, and participation (25%); weekly exercises you complete (30%); and
your project (45%). For graduate students, your grade will be based on
attendance, preparation, and participation (20%); weekly exercises you complete
(30%); your project (40%); and a short presentation of your project (10%).
There are no examinations. Your score in the course will be calculated to yield
your grade using this scale: A = 100-90, B = 89-80, C = 79-60, D = 59-50, F =
<50. I may modify these basic grades with a + or - in special cases if I believe it
is appropriate.
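The grade arithmetic described above can be sketched in a few lines of Python. This is only an illustration and not part of the course software: the function names are hypothetical, the weights and cut-offs are copied from the paragraph above, and the handling of fractional scores (a score such as 89.5 stays in the lower band) is an assumption.

```python
# Weights copied from the syllabus paragraph above; all names are illustrative only.
UNDERGRAD_WEIGHTS = {"participation": 0.25, "exercises": 0.30, "project": 0.45}
GRAD_WEIGHTS = {"participation": 0.20, "exercises": 0.30,
                "project": 0.40, "presentation": 0.10}


def course_score(component_scores, weights):
    """Return the weighted course score (0-100) from component scores (0-100)."""
    return sum(weight * component_scores[name] for name, weight in weights.items())


def letter_grade(score):
    """Map a course score to a base letter grade (no +/- modifiers).

    Assumption: a fractional score below a cut-off stays in the lower band.
    """
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 60:
        return "C"
    if score >= 50:
        return "D"
    return "F"


if __name__ == "__main__":
    example = {"participation": 100, "exercises": 80, "project": 90}  # made-up scores
    score = course_score(example, UNDERGRAD_WEIGHTS)
    print(round(score, 1), letter_grade(score))
```

Both weight tables sum to 100%, so a student with full marks in every component receives a score of 100.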
Basic Grading Philosophy for This Class
This class is not required for any students. Therefore, I assume that all
students who have enrolled in the class have done so because they want to learn
how to do data analysis. Given this, I will have little tolerance for any behavior
which suggests that a student is trying to avoid learning the material. On the
other hand, I encourage and try to reward behavior which suggests that a student
is attempting to enhance how quickly or thoroughly they learn the material, how
to minimize the effort involved in doing an analysis correctly, and similar
wholesome strategies. I will assess your understanding of the material using
assignments, and each student’s final write-up and presentation of their project. I
will not give tests, because genuine understanding of this material is difficult to
assess via a test, and because I do not want to encourage students to merely
memorize material for a test.
Attendance Policy
Attendance is required at every class meeting except in the case of
documented excusable absences (see the document online at
http://www2.umt.edu/catalog/acpolpro.htm for University policy on excused
absences). Attendance will constitute 20% of your grade.
Policy on Collaboration and Use of Outside Resources:
Students are encouraged to work and study together during the first 2/3 of
the semester, including working together on completing the exercises.
Additionally, there are many resources available on the internet and elsewhere,
including model answers to most of the exercises in the textbook (see pp v-vi). I
encourage you to use these to the extent that they enhance your
understanding of the analyses being learned. My only requirement is that in
your write-ups you must acknowledge your collaboration with other students
and/or your use of these and other resources. There is never a penalty for
working with other students or using additional resources so long as you
acknowledge them.
However, the privilege of collaboration and use of external resources does
not extend to your required individual written solution to each exercise. Each
student must write up the exercises independently using their own words. You
should use these write-ups to show me that you understand the analysis being
performed, how to make SPSS perform the analysis, and how to interpret the
output generated by SPSS. In general, the way to do this is to provide a detailed
explanation of why you took the steps you chose and how you drew any
interpretations you made.
Regretfully, I must punish infractions of this policy. If I find that two or
more students have turned in write-ups that are copies, or which I judge to be
“too similar”, I will split the credit for that assignment evenly between the students
involved. If I detect an answer that is too similar to the model answer on the
textbook website or to those on other websites that I know of, I will at most award
that student half credit.
During the last 1/3 of the semester each student will be working on their
own individual data analysis project. You are welcome and encouraged to
discuss your project with anybody who will sit still for it. However, you must write
it up individually in your own words. Furthermore, you must acknowledge any
help you got from fellow students, or anyone else, in the acknowledgment section
of your final report. This principle also extends to published and online
resources, which must be cited in your report and referenced in the bibliography
of your report. Direct copying of published or online materials, or use of them
without citation is considered plagiarism, a form of academic misconduct, and I
am required by University policy to give you zero credit for any assignment for
which I detect it.
Weekly Assignments
You will have an assignment to do (almost) every week. The assignment
will be posted on Moodle. Each assignment is explicit in what I want you to do
and what I want you to submit. Most of the assignments will also include practice
in writing parts of a scientific research paper.
Each student will complete a project that involves analysis of a data set of
their choice, applied to an anthropological problem they are interested in.
Certain milestones in the completion of the project (selection of a data set,
analysis results, rough draft, and final draft) will be submitted, with one or another
of these due each week. The format of the paper should be scientific research
paper format, which you will learn over the course of the semester. Here are
some things that I will expect to see in your research paper.
Five part scientific format, including the following sections: introduction,
materials & methods, results, discussion, and conclusions.
The introduction should include at least a brief literature review of other
studies that have been done in the area you are working on. A minimum
of 10 sources should be discussed and cited in the text of this section.
These sources should be referenced in the bibliography.
Your paper should include a bibliography. The citation or bibliography
format should be according to one of the major journals in anthropology,
such as American Anthropologist, American Journal of Physical
Anthropology, etc. Alternatively, you can use CSE/CBE style. Online
materials are acceptable if referenced properly, and there is a large
amount of advice online about how to reference online or other electronic
Submission Procedures
Weekly assignments and project fragments should be submitted via
Moodle. This saves me time, saves you printing costs, saves trees, and
(possibly most importantly) helps me avoid losing students’ work.
Assignments are due via Moodle before Tuesday at midnight during the
week after they are listed in the syllabus. There will be a penalty of 20% of that
assignment’s score for each day (or fraction of a day) that an assignment is late.
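As a rough illustration of how the late penalty accumulates, here is a minimal Python sketch. It assumes that the 20% is taken from the score the assignment earned, that any fraction of a day counts as a full day, and that the score cannot drop below zero; the function name is hypothetical.

```python
import math


def late_adjusted_score(score, days_late):
    """Return the score after a 20% penalty per day (or fraction of a day) late.

    Assumptions: the penalty is a share of the earned score, part-days are
    rounded up to whole days, and the result is floored at zero.
    """
    if days_late <= 0:
        return float(score)
    days_charged = math.ceil(days_late)
    penalty = 0.20 * days_charged * score
    return max(score - penalty, 0.0)


if __name__ == "__main__":
    for days in (0, 0.5, 2, 6):  # made-up lateness values
        print(days, late_adjusted_score(50, days))
```

Under these assumptions an assignment loses all of its credit once it is five days late.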
You can expect me to grade your assignments promptly and give you
feedback via a grading form or via comments on your assignment returned to you
via your official University email address.
Other Statistical Software
As the person who has to grade your assignments, I have to standardize
on one statistical software package. For many reasons, I have chosen SPSS for
our standard statistical software for ANTY 408. However, there are several other
commercial, shareware, and freeware statistical software packages available. In
particular, I am impressed with PAST. PAST will do almost everything SPSS
does, though the output isn’t as pretty or as easy to capture. PAST offers
additional useful types of analysis that SPSS doesn’t, such as cladistics,
neighbor joining clustering, mixture analysis, and correspondence analysis. It
has the best, fastest, and most flexible cluster analysis that I have ever seen. In
my own research I use PAST 10 times more often than SPSS.
Advanced Anthropological Statistics: Provisional Schedule
This schedule is tentative and the topics might change as we go. Topics and readings
will always be current on Moodle, so I have put the readings and other materials there.
Thus, this schedule is a bare-bones list of topics, readings from the Handbook, and
assignments. Required browsing is listed on Moodle. Assignments are due
before 12:00 midnight on the Tuesday of the week after the assignment is listed
unless otherwise noted on Moodle.
Getting Started
Chapter 1
Understand how the class
will work
Find some place to use SPSS
Intro to SPSS and the Labs
LECTURES (Notes on Moodle)
Lecture 1: Intro to the Class
Lecture 2: Types of Data and Sampling
Basic Statistics
Chapter 2
Assig 1: Descriptive &
Inferential Stats
Chapter 3
Descriptive & Inferential Stats
LECTURES (Notes on Moodle)
Lecture 3: Descriptive Statistics
Lecture 4: Inferential Statistics
Lecture 5: Frequencies, Data Transformation, and Capturing Output
Multiple Regression
Chapter 4
LECTURES (Notes on Moodle)
Lecture 6: Regression & Correlation
Lecture 7: Working with Non-Linear Data in Regression
Assig 2: Multiple Regression
Principal Components Analysis
Chapter 11
LECTURES (Notes on Moodle)
Lecture 8: Principal Components
Lecture 9: The Search for Significant Relationships
Assig 3: PCA and FA
Discriminant Analysis
Chapter 12
Assig 4: Discriminant
LECTURES (Notes on Moodle)
Lecture 10: Discriminant Analysis
Lecture 11: Decisions, Decisions
Cluster Analysis
Assig 5: Cluster Analysis
LECTURES (Notes on Moodle)
Lecture 12: Cluster Analysis
Lecture 13: Constructing & Reading Dendrograms
Clustering PC’s and DF’s
LECTURES (Notes on Moodle)
Lecture 14: Clustering PC’s and DF’s
Lecture 15: Causes of Similarity Between Things
Lecture 16: Logistic Regression
LECTURES (Notes on Moodle)
Assig 6: Clustering PC & DF
Chapter 5 & 6 Assig 7: ANOVA
Lecture 17: ANOVA
Lecture 18: Character Coding and Missing Data
Beginning Your Project
LECTURES (Notes on Moodle)
Lecture 19: Finding a Data Set
Lecture 20: Scientific Paper Format
Project week 1
Assig 8: Find a Data Set
Assig 9: Make a Research
LECTURES (Notes on Moodle)
Lecture 21: Advanced Nominal Data Methods: Multidimensional scaling, classification
trees, and correspondence analysis
Lecture 22: How to Treat Males and Females in an Analysis
Project week 2
Assig 10: Data Analysis
LECTURES (Notes on Moodle)
Lecture 23: Multivariate Statistics in Forensic Anthro
Lecture 24: The effect of Admixture on DF for Ancestry
Project week 3
LECTURES (Notes on Moodle)
Lecture 25: Models of Genetic Similarity
Lecture 26: Intro to Some Other Statistical Software
Project week 4
LECTURES (Notes on Moodle)
Lecture 27: Cladistics
Lecture 28: Data Mining
Assig 11: Preliminary Draft
Assig 12: Revise Paper, Turn
In final draft
Due by midnight Sunday May 12
Project completion: Project presentations by undergrad students.
Finals Week Thursday, May 16, 8:00 to 10:00. Meet for project presentations by grad
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 1: Descriptive and Inferential Statistics
For your first formal assignment I want you to focus on the following practical tasks:
starting and running SPSS;
downloading the main dataset we will use for these assignments and
loading it into SPSS;
generating descriptive statistics (means and standard deviations) for the
plotting data using a histogram;
data set subsectioning; and
testing the hypothesis that two means are equal.
This sounds like a lot to accomplish, but it won’t actually take you very long to run the
analyses, because you are using SPSS. Remembering back to your introductory
statistics class, where you did calculations with a hand calculator, even calculating the
mean for a large data set might take several minutes. For the 1853 individuals in the
data set we will be working with it would take you a very long time. If you could enter
one individual’s measurement into the calculator every 5 seconds, it would take you a
little over two and a half hours to calculate the mean for one variable. Of course, in entering and adding
1853 data values the probability that your fingers pressed the wrong button for at least
one of them is very high. With SPSS you can obtain means and standard deviations for
all 20 variables for the 1853 individuals in just a few seconds, and you can be sure that
the result is accurate because the data was entered accurately.
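The hand-calculator estimate above can be checked with a short Python sketch. The function name is hypothetical; the 1853 individuals and 5 seconds per value come from the paragraph above.

```python
def entry_time(n_values, seconds_per_value):
    """Return (hours, minutes, seconds) needed to key in n_values by hand."""
    total_seconds = n_values * seconds_per_value
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


if __name__ == "__main__":
    print(entry_time(1853, 5))  # (2, 34, 25), i.e. 9265 seconds in total
```

That figure covers a single variable; repeating it for all 20 variables would multiply the time accordingly.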
Starting and running SPSS. Get SPSS running. If you haven’t a clue how to
do this, the first part of the document at http://www.csub.edu/ssrictrd/SPSS/SPSS11-1/11-1.htm has good instructions on opening and closing
Downloading the data set. I have a data set on Moodle in the
“Assignments” section, named anth402data.xls. Download this data to the
computer you are using and store it on your memory stick. The data file is in
Microsoft Excel format, which is a common format that you will want to import
data from.
Also download the file anth402data_codebook.htm, which is the “codebook” for
this data set. A codebook is a document that explains what the data variables in
the data set are. This file is an HTML file, which should be viewable using a web
browser or any modern word processor.
Load the data into SPSS. Note that SPSS will directly load Excel spreadsheets.
If you don’t know how to do this, the document (Wuensch, 2005, “Importing Data
To SPSS From Excel”) at
http://core.ecu.edu/psyc/wuenschk/SPSS/Excel2SPSS.doc will show you how.
Once you have the data loaded, save it as a native SPSS file. SPSS data files
have the extension ‘.sav’. To do this, click “File” on the taskbar, choose “Save
as”, then if necessary choose the SPSS file type in the “File Type” window.
Make sure that the file will be saved on your memory stick, then click OK.
Prepare a results file. Open Microsoft Word (or an equivalent word processor).
Type in your name and “Assignment 1” at the top. Save this file with the name
“firstname_lastname_1.doc” to your memory stick, where firstname_lastname is
your first and last names joined by an underscore. For example, my results file
would be named “Randy_Skelton_1.doc”. You will use this file to store and save
the results from your analyses described below.
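If it helps to see the naming rule spelled out, the following tiny Python sketch builds a results file name from its parts. The function name is hypothetical; the pattern itself is the one described above.

```python
def results_filename(first_name, last_name, assignment_number):
    """Build a results file name of the form firstname_lastname_N.doc."""
    return f"{first_name}_{last_name}_{assignment_number}.doc"


if __name__ == "__main__":
    print(results_filename("Randy", "Skelton", 1))  # Randy_Skelton_1.doc
```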
Descriptive statistics. The term “descriptive statistics” refers to those statistics
that describe a data set’s central tendency (mean or median), its dispersion
(range, standard deviation, or variance), and, perhaps, its degree of departure
from a true normal curve (skewness and kurtosis).
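To make these terms concrete, here is a minimal Python sketch that computes the same kinds of descriptive statistics using only the standard library. It is not a substitute for the SPSS procedures used in this assignment. The function name and the data values are made up for illustration, and the sample-adjusted skewness formula used here is an assumption about which variant of skewness to report.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers.

    Requires at least three values with some variation, because the
    skewness formula divides by (n - 1)(n - 2) and by the standard deviation.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / std_dev) ** 3 for x in values)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "std_dev": std_dev,
        "variance": statistics.variance(values),
        "skewness": skewness,
    }


if __name__ == "__main__":
    made_up_lengths = [178, 181, 183, 184, 186, 188, 195]  # not real measurements
    for name, value in describe(made_up_lengths).items():
        print(f"{name}: {value}")
```

The mean and median describe central tendency, the range, standard deviation, and variance describe dispersion, and a skewness well away from zero suggests a departure from a symmetric, normal-looking curve.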
The document at http://www.ats.ucla.edu/stat/spss/output/descriptives.htm is an
“annotated output document” for the descriptive procedures. An annotated
output document explains each item of the output in detail in case you don’t
understand what the tables and figures in the output mean.
Here are the analyses you should do.
Use the SPSS descriptives procedure [Analyze->Descriptive Statistics->Descriptives] to find the means, standard deviations, minimum values,
and maximum values for the variables GOL, NOL, BNL, BBH, and XCB.
Your output should look something like the table below (don’t expect your
numbers to match exactly). Examine the output produced to see whether
it gave you the results you were expecting.
Copy and paste the table that looks like the one below into your results file.
In your results document, change the table label from “Descriptive
Statistics” to “Descriptive Statistics: All Populations”. To do this, right click
on the table and choose “Edit Picture” from the menu. Now, click on the
“Descriptive Statistics” label, and you should be able to access the text
box to change the label. Save your results file. There is no need to save
your file using a different name.
[Table: Descriptive Statistics (values not reproduced); surviving labels: Std. Deviation, Valid N (listwise)]
Use the SPSS select cases procedure [Data->Select Cases->If condition
is satisfied] to set a filter for only the Norse and the Zalavar populations.
To choose only the Norse and the Zalavar data, type
ANY(POPNAME, "Norse", "Zalavar") into the “Select cases if” window.
This command says to choose any individual case that has a value for
POPNAME of Norse or Zalavar. Note that Norse and Zalavar must be
inside quotes because they are not numbers, and that SPSS is case
sensitive for text items.
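The logic of this selection rule can be mimicked outside SPSS. The Python sketch below is only an analogy, using made-up cases and a hypothetical function name: it keeps a case when its POPNAME matches any of the listed names, and, like the SPSS filter, it treats text as case sensitive.

```python
def select_cases(cases, variable, *allowed_values):
    """Keep the cases whose value for `variable` equals any allowed value."""
    return [case for case in cases if case[variable] in allowed_values]


if __name__ == "__main__":
    # Made-up cases for illustration; "Other" stands in for any other population.
    cases = [
        {"POPNAME": "Norse", "GOL": 189},
        {"POPNAME": "Zalavar", "GOL": 185},
        {"POPNAME": "Other", "GOL": 180},
        {"POPNAME": "norse", "GOL": 187},  # lower-case spelling is not matched
    ]
    for case in select_cases(cases, "POPNAME", "Norse", "Zalavar"):
        print(case)
```

The fourth case is dropped because "norse" and "Norse" are different text values, which is the same reason the names must be typed exactly as they appear in the data.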
Now, generate the descriptive statistics for the variables GOL, NOL, BNL,
BBH, and XCB; using only the Norse and Zalavar samples. Your output
should look something like the table above. Examine this table and note
that the sample size (N) is much less for this analysis, because you are
only using the Norse and Zalavar – not all the individuals. Copy and paste
the table that looks like the one above into your results file. Change the
table label from “Descriptive Statistics” to “Descriptive Statistics: Norse
and Zalavar”, and save your results file.
Use the SPSS explore procedure [Analyze->Descriptive Statistics->Explore] to list the descriptive statistics for GOL, separately for the Norse
and Zalavar samples. Unless you have changed something, SPSS will
still be using filtering that chooses only the individuals who are Norse or
Zalavar. In the “Dependent List” window enter GOL. In the “Factor List”
window enter POP (here we have to use a number and POP is the
number equivalent of POPNAME, where 1 = Norse, 2 = Zalavar, etc.).
Click on the “Plots” button and choose a histogram type of plot. Click
“Continue” and click “OK” to run.
You should get a table that looks like the one below. Examine this table
and convince yourself that the means (and other possibly interesting
information) are given separately for the Norse and the Zalavar. You
should get two histograms that look like the one below, one for POP
= 1 and one for POP = 2. Note that these histograms only vaguely
resemble a normal curve. Copy and paste the table that looks like the one
below into your results file. Change the table label from “Descriptives” to
“Descriptives: Norse and Zalavar Separately”. Copy and paste the two
histograms that look like the one below into your results file, and save your
results file.
[Table: Descriptives for GOL by POP (values not reproduced). For each population the surviving row labels are 95% Confidence Interval for Mean (Lower Bound, Upper Bound), 5% Trimmed Mean, Std. Deviation, and Interquartile Range.]
[Figure: Histogram for POP = 1. Mean = 184.17, Std. Dev. = 6.525, N = 111]
Inferential Statistics. Inferential statistics refers to hypothesis testing. As you no
doubt remember from introductory statistics, this often involves constructing a
confidence interval based on a mean and some function of the standard
deviation. SPSS often does this slightly differently, but the principle is the same.
For this assignment, we will use an independent samples t-test to test the
hypothesis that the Norse and Zalavar have the same means for GOL.
Use the SPSS independent samples t-test procedure [Analyze->Compare
Means->Independent Samples T Test]. Enter GOL into the “Test Variables”
window. Enter POP into the “Grouping Variable” window. Click on the “Define
Groups” button and tell SPSS that Group 1 is 1 (Norse) and Group 2 is 2
(Zalavar). Click “Continue”, then click “OK”. Your output should include a table
that looks similar to the table below. Copy and paste this table into your results
file, and save your results file.
[Table: Independent Samples Test (values not reproduced). Sections: Levene's Test for Equality of Variances; t-test for Equality of Means, including Sig. (2-tailed), Std. Error, and a 95% Confidence Interval. Rows: Equal variances assumed; Equal variances not assumed.]
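For readers who want to see what lies behind the two rows of the independent samples output, here is a minimal Python sketch of the two t statistics. It is an illustration under stated assumptions and not a reproduction of SPSS: the function names and data are made up, the pooled version corresponds to the textbook equal-variance t-test, the second version uses the Welch approximation for degrees of freedom, and no p-values are computed because the Python standard library has no t distribution.

```python
import math
import statistics


def pooled_t(sample_a, sample_b):
    """t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, n_a + n_b - 2


def welch_t(sample_a, sample_b):
    """t statistic and approximate degrees of freedom, not assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    term_a = statistics.variance(sample_a) / n_a
    term_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(term_a + term_b)
    df = (term_a + term_b) ** 2 / (term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    group_1 = [181.0, 186.5, 190.0, 184.0, 188.5, 179.5]  # made-up values
    group_2 = [183.0, 187.0, 192.5, 189.0, 185.5, 191.0]  # made-up values
    print("equal variances assumed:    ", pooled_t(group_1, group_2))
    print("equal variances not assumed:", welch_t(group_1, group_2))
```

When the two groups have the same size and the same variance, the two versions give identical results; they diverge as the variances or sample sizes become unequal.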
SPSS almost always presents the probability (p-value) for a test of a null hypothesis in a
column labelled “Sig”. Note that SPSS rounds probabilities to three decimal
places. This means that “Sig” values are often presented as “.000”. This does
not mean that the probability is actually zero – it might be .000248 or some
similar value, which is rounded to .000 using three decimal places. The correct
way to refer to a probability of “.000” is “less than .001” or “< .001”.
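The reporting rule above is easy to express as a small helper. The following Python sketch is illustrative only: the function name is hypothetical, and dropping the leading zero simply imitates the way the values appear in the output.

```python
def report_sig(p):
    """Format a probability the way it should be reported in a write-up.

    A value that a three-decimal display would show as .000 is reported
    as "< .001"; anything else is shown to three decimal places.
    """
    shown = f"{p:.3f}"  # what a three-decimal display would show
    if shown == "0.000":
        return "< .001"
    # Drop the leading zero to imitate the ".031" style of the output tables.
    return shown[1:] if shown.startswith("0") else shown


if __name__ == "__main__":
    for p in (0.000248, 0.0312, 0.5):
        print(p, "->", report_sig(p))
```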
To finish this assignment answer the following four questions at the
bottom of your results file. The document at
http://statistics-help-for-students.com/How_do_I_interpret_data_in_SPSS_for_an_independent_samples_T_test.htm#.UNnd_Vvcjac
is a good annotated output document that explains
how to interpret the SPSS t-test results. This document should be opened in
Mozilla Firefox or the tables won’t display correctly.
1. What is the null hypothesis about the means that is being tested in this
analysis?
2. The table you copied to your results file has a section about the Levene’s
test for equality of variances. Given the results of this test, should you use
the t-test results for “Equal variances assumed” or for “Equal variances not
assumed”? Explain your reason for this choice.
3. What is the probability that the null hypothesis about the means (that you
gave as the answer to question 1) is true?
4. Do you accept or reject the null hypothesis about the means? What do
you then conclude about the means?
Submitting your assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge your
collaborators and sources. Submit your results file (named
firstname_lastname_1.doc) to me through the Assignment Submission link I’ve
put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 2: Multiple Regression
For this assignment I want you to focus on the following practical tasks:
learn how to do a multiple regression analysis using SPSS;
learn how to interpret the output SPSS provides (including examining and
testing hypotheses about regression, correlation, and prediction); and
begin the process of learning to write scientific reports.
Scientific Report Format. Scientific reports usually follow a conventional format.
Most of the anthropology faculty, and certainly the editors of any journal you
might want to submit an article to, will want you to adhere to this format.
For most students, scientific report format seems odd and constraining at first.
This is because almost everything we learned about writing we learned in an
English class. In English classes, you learn how to write stories. Even the most
in-depth analysis is framed in the form of a story. A story has the characteristic
that it begins with a statement of a problem. It then moves fluidly through a
series of processes that lead to the conclusion of the problem. Finally, the
problem is concluded in some manner. A moment’s reflection will convince you
that this style applies to novels as well as to most non-scientific nonfiction you
have read. This is a good style in many ways. It is designed to capture the
reader’s attention and to keep the reader’s attention to the end. The assumption
is that the reader is going to read the work from start to finish.
Scientific reports, however, start with a different assumption. They assume that
the reader will use the report as a reference. That is, the reader is probably
not going to read the work from start to finish, but will instead add it to a collection
of similar works and refer to it for specific items of information as necessary. In
other words, a scientific report is used more like an encyclopedia than like a
novel. I know that there are some people who read the encyclopedia cover to
cover (I am one), but, let’s face it, those people are weird (proudly so in my
case). Given this, although scientific reports have some element of the basic
story pattern to them, they sacrifice the fluid movement through the steps of a
solution in order to compartmentalize information. When information is
compartmentalized it is easy to find. This style makes scientific reports difficult to
read, as many of you have no doubt observed. However, this difficulty
disappears as you gain experience reading scientific articles (which is why your
professors often assign a lot of them). As you gain experience working with
scientific reports, you will find that the convenience of being able to easily locate
information far outweighs the inconvenience of not having a coherent “plot line”
to the story.
Scientific format consists of five parts:
introduction: this is where the problem is described and given a context;
materials and methods: this is where the materials used are described,
along with the analyses performed upon them;
results: this is where the results of the analysis are presented, free from
discussion: this is where the results are interpreted; and
conclusions: this is where some conclusion to the problem is reached and
Most scientific reports will also include an abstract at the beginning, and a
bibliography at the end.
You can see how the scientific format compartmentalizes information, as a
service to the reader who is looking for particular types of information. For
example, if the reader (usually another researcher) wants to find out which SPSS
procedure the author(s) of the article used, he or she can turn directly to the
materials and methods section to find this information, without having to read
through the entire body of the article looking for it.
Introductions. For this assignment we will focus on introduction sections. Recall
that this is the section in which the problem is introduced and given a context.
There are three main things that are normally done in an introduction section:
a statement of the problem in context;
a literature review; and
hypotheses to be tested, or other focus of the research.
The statement of the problem in context and literature review may be combined
and integrated. By convention the hypotheses to be tested (or other focus of the
research) are at the very end of the introduction section, which makes them easy
to locate.
The statement of the problem in context should be in “funnel” format. Funnel
format means that you start with the broadest possible context, then proceed to
narrow it down until the “pointy end” of the funnel leads directly to the exact
problem being examined in the report. Here is an example, using very terse
statements that you would expand to sentences or paragraphs in an actual
report. The topic is estimating stature from femur length in Asian immigrants to
the U.S.
Statement | Purpose
Crime exists | Provides the broadest possible context by giving a broad statement about the nature of the world that is of interest to a broad group of people.
Some crimes result in unidentified skeletal remains | Narrows the focus to a certain group of crimes, which are still of interest to a broad group of people.
Stature is an important part of the description of a missing person | Connects stature to identity and to the process of identifying skeletal remains in criminal cases. Narrows the focus to stature estimation.
There is a physical (possibly genetic) relationship between femur length and stature | Provides a reason why your method might work. Narrows the focus to stature estimation from femur length.
There does not exist a formula for estimating stature from femur length for Asian Americans | Presents a practical problem to be solved. Narrows the focus to a specific population.
Therefore, I propose to develop a formula for estimating stature from femur length for Asian Americans | States your exact project. Narrows the focus to your exact project.
Notice how each statement narrows the focus of interest. After being led down
this funnel, the reader is clear about what the problem is and why it is interesting
and important. Notice how the statements made can also guide you as to what
literature you would need to review in order to carry out the project.
Assignment part 1: Writing an introduction. We will focus in this assignment
on figuring out what to put in the introduction section. I am not interested in
having you review literature for this assignment, but I do want you to think about
“funnel” format as described above.
Begin by creating a results document to store your answers and results to be
submitted. Open a new Microsoft Word document and put your name and
assignment 2 at the top. Save this file to your memory stick as
firstname_lastname_2.doc (where firstname is your first name, and lastname is
your last name).
Now, assume that you have a business that makes high quality custom hats for
people. Unfortunately, you are just starting the business and can’t afford the
instrument that most hat makers would use to measure the length of the head
from front to back, GOL in our data set, although you do own the less expensive
instrument for measuring skull breadth at several points. Therefore you need to
find a method for estimating GOL from measures of the width of the head: XCB,
XFB, STB, ZYB, and AUB in our data set (the B in all these abbreviations stands
for Breadth).
In your results document, create a table similar to the one above. The first
statement should be “everybody wears a hat sometimes”, and the last statement
should be “therefore, I will develop a formula for estimating head length from five
measures of head breadth”. Fill in the missing statements in between. For each
statement, including the ones I just gave you, identify the purpose of the
statement for achieving the “funnel” effect.
Assignment part 2: The Analysis. Use the SPSS regression procedure [Analyze
– Regression – Linear] to perform a multiple regression analysis to develop a
formula for estimating GOL (the dependent variable) from XCB, XFB, STB, ZYB,
and AUB (the independent variables). Use all the individuals in the data set (i.e.
don’t use any form of case selection). Choose the stepwise regression method,
in the “Method” box that is immediately under the list of independent variables.
Now we are going to “trick” SPSS into showing the results for multiple regression
analyses using 1 variable, 2 variables, 3, 4, and all 5 variables. To do this, while
the “Linear Regression” window is showing, click on the “Options” button and
make sure that the “Use probability of F” radio button is selected, then in the box
for “Entry” type in 0.99 and in the box for “Removal” type in 1.0. Click continue.
SPSS uses these values to decide whether adding another variable to the
regression formula would make it significantly more accurate, and whether
removing a variable from the regression formula (in the backward or remove
methods) would make it significantly less accurate. By default, “significantly”
more or less accurate are set to 0.05 for entry and 0.10 for removal, meaning
that if SPSS is 95% confident that adding the variable will make the formula more
accurate then the variable will be added; and if SPSS is 90% confident that
removing a variable will not make the formula less accurate, then it will be
removed. Setting the “Entry” value to 0.99 tells SPSS that it only needs to be 1%
confident that adding a variable will increase the accuracy of the formula. The
number in the “Removal” box must be larger than the number in the “Entry” box,
so we will set it to the highest possible value, which is 1.0.
Finally, click “OK” to run the analysis.
The document at http://core.ecu.edu/psyc/wuenschk/MV/multReg/intromr.docx
does a good job of explaining all the concepts involved with multiple regression
and how to implement them in SPSS.
Save the output produced by SPSS, by exporting it to Word format. Then add it
to your results document.
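If it helps to see what a multiple regression computes outside of SPSS, here is a minimal Python sketch using NumPy. It is an illustration under stated assumptions, not a reproduction of the SPSS procedure: the data are synthetic and invented for the example, the names breadths and gol are hypothetical, and all five predictors are fitted at once rather than entered stepwise.

```python
import numpy as np

# Synthetic stand-in data (NOT the class data set): 200 invented "skulls"
# with five breadth measurements and one length measurement.
rng = np.random.default_rng(0)
n = 200
breadths = rng.normal(loc=[139, 117, 114, 130, 121], scale=5.0, size=(n, 5))
true_coefs = np.array([0.30, 0.20, 0.10, 0.40, 0.10])  # made-up values
gol = 40.0 + breadths @ true_coefs + rng.normal(scale=4.0, size=n)

# Add a column of ones so the first fitted coefficient is the constant.
X = np.column_stack([np.ones(n), breadths])

# Ordinary least squares: coefficients that minimize the squared error.
coefs, *_ = np.linalg.lstsq(X, gol, rcond=None)
predicted = X @ coefs

# Multiple correlation R: correlation between predicted and observed values.
R = np.corrcoef(predicted, gol)[0, 1]

# Standard error of the estimate, using n - k - 1 degrees of freedom.
residuals = gol - predicted
k = breadths.shape[1]
see = np.sqrt(np.sum(residuals**2) / (n - k - 1))

print("constant and coefficients:", np.round(coefs, 3))
print("R =", round(R, 3), " standard error of estimate =", round(see, 3))
```

The fitted constant and coefficients play the same role as the unstandardized coefficients in the SPSS Coefficients table, although the numbers here come from invented data.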
Assignment part 3: Interpretation. Answer the following questions at the bottom
of your results document. The document at
http://www.ats.ucla.edu/stat/spss/output/reg_spss.htm is an annotated output
document for the SPSS regression procedure, and you should refer to it while
answering these questions.
What is the multiple regression formula for estimating GOL from XCB,
XFB, STB, ZYB, and AUB? You want the formula that will allow you to
take measurements of XCB, XFB, STB, ZYB, and AUB, plug and chug,
and generate an estimate of GOL. [Hint, use the unstandardized
coefficients from the coefficients table.]
What is the multiple correlation between the independent variables XCB,
XFB, STB, ZYB, AUB and the dependent variable GOL? [Hints: Look at
the Model Summary table. SPSS symbolizes multiple correlation as R.]
What is the amount of variability in GOL explained by the combined
effects of XCB, XFB, STB, ZYB, and AUB? [Hint: this is asking for a
coefficient of determination. You remember how these are calculated
from a correlation, don’t you?]
Say that you obtained the following measurements from an individual:
XCB = 139mm, XFB = 117mm, STB = 114mm, ZYB = 140mm, AUB =
121mm. What is your estimate for this individual’s GOL? You will need to
use a calculator. Show your work.
What is a 95% confidence interval (i.e. based on two standard errors in
either direction) for your estimate of GOL? [Hint: Look at the Model
Summary table for some critical information.]
Are all the coefficients in your multiple regression formula (including the
constant) significantly different from zero? Which are not significantly
different from zero and how do you know? [Hint: You only need to look at
model 5, and if you asked for confidence intervals as the instructions
specified the work has been done for you in the Coefficients table.]
What is the single most important of the breadth measurements for
estimating GOL? Why does this variable seem to be the most important?
As the hat maker, you are most interested in finding the simplest
regression formula for estimating GOL. What is the regression formula for
estimating GOL from the single most important breadth measurement?
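The "plug and chug" step and the rough two-standard-error interval in the questions above can be sketched in a few lines of Python. Every number in this sketch is hypothetical and chosen only so the arithmetic is easy to follow; the real constant, coefficients, and standard error come from your own SPSS output, and the function names estimate and rough_interval are invented for the example.

```python
# Hypothetical numbers for illustration only; take the real ones from the
# Coefficients and Model Summary tables in your own SPSS output.
constant = 50.0
coefficients = {"XCB": 0.25, "XFB": 0.15, "STB": 0.10, "ZYB": 0.40, "AUB": 0.05}
std_error_of_estimate = 6.0


def estimate(measurements):
    """Constant plus each coefficient times its measurement."""
    return constant + sum(coefficients[name] * value
                          for name, value in measurements.items())


def rough_interval(point_estimate, std_error, multiplier=2.0):
    """Point estimate plus or minus `multiplier` standard errors."""
    half_width = multiplier * std_error
    return point_estimate - half_width, point_estimate + half_width


# An invented individual (not the one in the assignment question).
skull = {"XCB": 140.0, "XFB": 120.0, "STB": 110.0, "ZYB": 130.0, "AUB": 120.0}
gol_hat = estimate(skull)
low, high = rough_interval(gol_hat, std_error_of_estimate)
print(f"estimated GOL = {gol_hat:.1f} mm, rough interval = {low:.1f} to {high:.1f} mm")
```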
Submitting your assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge your
collaborators and sources. Send your results document (named
firstname_lastname_2.doc) to me using the Assignment Submission link I’ve put
on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 3: Principal Components & Factor Analysis
For this assignment I want you to focus on the following practical tasks:
learn how to do principal components analysis using SPSS;
learn how to do factor analysis using SPSS;
learn how to interpret the output SPSS provides; and
continue the process of learning to write scientific reports by examining
the materials and methods section.
Assignment part 1: The Analysis. We will do both a principal components
analysis and a factor analysis.
Begin by creating a results document for storing your answers and results to be
submitted. Open a new Microsoft Word document and put your name and
assignment 3 at the top. Save this document to your memory stick as
firstname_lastname_3.doc (where firstname is your first name and lastname is
your last name).
Principal Components Analysis (also called “principle” components
analysis). The document at
http://core.ecu.edu/psyc/wuenschk/MV/FA/PCA-SPSS.docx explains the
concepts of principal components as implemented by SPSS, and how to
perform a principal components analysis. The author, Karl L. Wuensch,
has a different point of view from mine on how factor analysis and
principal component analysis differ, but that should not be a problem.
Use the SPSS factor procedure [Analyze – Dimension Reduction – Factor] to perform a principal components analysis of the class data set.
Use all the variables except ID, POP, POPNAME, and SEX (it won’t give
you POPNAME as a choice).
Click the “Extraction” button and make sure that the “Method” is “Principal
components”, “Extract” is set to “Based on Eigenvalue”, and that
“Unrotated Factor Solution” is selected in the “Display” area. Click
“continue” to get back to the Factor window. Click the “Rotation” button,
and make sure that “None” is selected for “Method”. Click “continue” to
get back to the factor window, then click “OK” to run the analysis. Your
output window will call this a “Factor” analysis, even though it’s only a
Principal Components analysis at this point.
Save the output produced by SPSS, by exporting it to Word format. Then
add it to your results document.
Factor Analysis. The document at
http://core.ecu.edu/psyc/wuenschk/MV/FA/FA-SPSS.docx continues the
discussion in the document I gave you above for principal components
analysis (with a somewhat skewed point of view in my opinion).
Use the SPSS factor procedure [Analyze – Dimension Reduction – Factor] to perform a factor analysis of the class data set. Use all the
variables except ID, POP, POPNAME, and SEX. Click the “Extraction”
button, set the “Method” to “Principal axis factoring”. Click “continue” to
get back to the Factor window. Click the “Rotation” button, and set the
“Method” to “Varimax”. Click “continue” to get back to the factor window,
then click “OK”.
Save the output produced by SPSS, by exporting it to Word format. Then
add it to your results document.
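For a sense of what the principal components step described above computes, here is a minimal Python sketch using NumPy. It assumes synthetic data invented for the example, works from the correlation matrix, and keeps components with an eigenvalue above an assumed cutoff of 1. The signs of the loadings are arbitrary in this kind of calculation, so they may be flipped relative to what a statistics package reports.

```python
import numpy as np

# Synthetic stand-in data (NOT the class data set): 300 cases, 6 variables
# that share one common "size" influence plus independent noise.
rng = np.random.default_rng(1)
size = rng.normal(size=(300, 1))
data = 0.8 * size + 0.6 * rng.normal(size=(300, 6))

# Work from the correlation matrix so every variable is on the same scale.
corr = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; put the largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep components whose eigenvalue exceeds the assumed cutoff of 1.
keep = eigenvalues > 1.0

# Loadings: eigenvector entries scaled by the square root of the eigenvalue.
loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])

# Proportion of the total variance accounted for by the retained components.
explained = eigenvalues[keep].sum() / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 3))
print("components kept:", int(keep.sum()), " variance explained:", round(explained, 3))
```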
Assignment part 2: Interpretation. Answer the following questions at the bottom
of your results document.
Interpreting the results of PCA. The document at
http://www.ats.ucla.edu/stat/spss/output/principal_components.htm is an
annotated output document for the SPSS principal components
procedure, and you should refer to it while answering these questions.
What proportion of the variation in WCB can be explained by the
principal components?
How much of the total variation in the entire data set can be
accounted for by the first 4 principal components?
Why were only 4 principal components (out of 20) selected as
“significant”? What is the significance of an eigenvalue of 1?
Interpreting and naming the principal components. I will do the first
three to show you how, then ask you to name the fourth one.
Interpretation of principal components or factors involves examining
the coefficients for each principal component or factor as presented
in an appropriate SPSS output table. The size of the coefficient is
called its “loading”. If a principal component or factor coefficient for
a certain variable is large, we say that this variable “loads highly” or
“has a high loading” on this principal component or factor. A
variable has a “small loading” on a principal component or factor if
the coefficient for that variable is small for that PC or factor. We
also describe loadings as positive or negative, depending on
whether the coefficient is positive or negative.
Principal component 1 (PC1) has the classic signature of a size
component. All of the variables have positive loadings and all but
three of them are high loadings (let’s say high = over 0.50 for this
PC2, PC3, and PC4 have a mixture of positive and negative
loadings, and so must represent shapes rather than sizes, where
these shapes are defined by contrasts in dimensions. To interpret
PC2, PC3, and PC4, we need to know more about the variables.
All of the variables that end in an L (GOL, NOL, etc.) are length
variables, meaning that they are measured in the anterior (front) to
posterior (back) dimension. All of the variables that end in B (XCB,
STB, etc.) are breadth variables, meaning that they are measured in
the side to side (right to left) dimension. All of the variables that
end in H are height variables, meaning that they are measured in
the superior (up) to inferior (down) direction.
Further, the first 10 variables, GOL through WCB, are measurements
of the entire skull. The next 9 variables, ASB through MAB are
measurements of the face. The 20th variable, MDH is the height of
the mastoid process, which is located behind the ear. Ignoring
MDH, we have a set of skull lengths, breadths, and heights; and a
set of face lengths, breadths, and heights.
Looking at the loadings (coefficients) for component 2, we see that
there are both positive and negative loadings. The largest single
positive loading is .679 for BPL, a face length; and the single
largest negative loading is -.615 for STB, a skull breadth.
Therefore, we know that this component contrasts lengths and
breadths, at first glance contrasting face length with skull breadths.
We must modify this slightly, however, when we examine which
other variables have high or moderate loadings. We note that
GOL, NOL, and BNL, all skull lengths, also have moderately large
positive loadings compared to the other variables. So now we
know that the length being contrasted here is both face length and
skull length. When we examine which other variables have high
negative loadings, we find XCB and XFB, which are both skull
breadths like STB. So now we know that this principal component
contrasts skull breadth with face and skull length. Therefore, I will
call it “skull breadth vs face and skull length”.
Examining PC3, we see that the single highest positive loading is
for NLB, a face breadth; and that the loading for STB, a skull
breadth, is nearly as high and positive. We also see that the single
highest negative loading is for OBH, a face height; and that NPH
and NLH, also face heights, also have high negative loadings.
Therefore, this component primarily contrasts face and skull
breadth with face height, and I will call it “face and skull breadth vs
face height”.
Apply this type of reasoning to PC4 to give it a name that describes
the shape contrast it represents.
Interpreting the results of FA. The document at
http://www.ats.ucla.edu/stat/spss/output/factor1.htm is an annotated
output document for SPSS factor analysis.
Are there the same number of “significant” factors as there were
significant principal components?
Examine the Factor Matrix, which presents the loadings
(coefficients) of the variables before rotation. Have the loadings
changed significantly from your PCA? Explain why you would or
would not give these factors the same names as their
corresponding principal components.
Examine the Rotated Factor Matrix, which presents the loadings
(coefficients) of the variables after rotation. Have the loadings
changed significantly from those in the Factor Matrix? Explain
whether it seems to you that rotation made the pattern of high and
low loadings more interpretable or less interpretable than it was
before rotation.
Rotation of the factors, and varimax rotation in particular, seeks to
make some loadings large while making others small. Therefore,
we need to switch our strategy for interpreting the factors slightly.
Instead of focusing on the contrasts (large positive vs large
negative), focus only on which variables exhibit high positive
loadings and then figure out what they represent.
Do the new loadings change your interpretation and naming of the
factors? I will interpret factors 1 and 2 as examples, and leave the
remaining two factors for you to interpret and name.
Unrotated factor 1, like principal component 1 is clearly a size
variable (all variables have fairly high positive loadings). However,
after rotation, only GOL, NOL, BNL, and BPL have high loadings.
Therefore, this factor now represents skull and face length instead
of general size, and I will call it the “skull and face length” factor, or
perhaps more simply, the “length” factor.
Unrotated factor 2 seems to contrast skull breadth (XCB, XFB,
STB) with face length (BPL). Rotated factor 2, however, seems to
represent skull breadth, with maybe some smaller contribution of
face breadth. I will call this the “skull breadth” factor.
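To make the idea of rotation more concrete, here is a minimal Python sketch of an orthogonal varimax rotation using NumPy. It follows a common textbook formulation based on the singular value decomposition and is not claimed to be what SPSS does internally; in particular it skips the Kaiser normalization step that statistics packages often apply, so its numbers would not match SPSS output exactly. The unrotated loadings are invented for the example.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated_loadings, rotation_matrix) for an orthogonal varimax rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Direction in which the varimax criterion increases.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to that direction
        if s.sum() < previous * (1.0 + tol):
            break  # no meaningful improvement, so stop
        previous = s.sum()
    return loadings @ rotation, rotation


# Invented unrotated loadings: a general first factor and a second factor
# that contrasts the first three variables with the last three.
unrotated = np.array([
    [0.80,  0.30],
    [0.75,  0.40],
    [0.70,  0.35],
    [0.65, -0.45],
    [0.60, -0.50],
    [0.70, -0.40],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is the same before and after rotation; only the way that variance is divided among the factors changes.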
Scientific Report Format: The Materials and Methods Section. The materials and
methods section is the second of the five parts of a scientific document. As you
might infer from the name, this is the section in which the writer describes the
materials used in the analysis and the methods used to analyze them. It is
normally divided into two subsections – you guessed it – a materials subsection
and a methods subsection.
The Materials Subsection. In this subsection you should present all the
information known about the materials (specimens, subjects, items, etc.)
analyzed. Include citations to published descriptions if possible. I will list
the important items of information to include and provide examples based
on the Boas anthropometric database.
What types of materials are these? The Boas database is
described by Jantz et al. (1992). It presents measurements and
other data about more than 15,000 primarily Native American
individuals of both sexes and all ages, with some persons of other
populations incidentally included.
Who collected the information and when? The data were collected
by people hired and trained by Franz Boas in the late 1800s.
What measurements or other data are included? The data include
last name, first name, age group, age, whether age is an estimate
or exact, birthplace, sex, age, tribe, band, purity, blood quantum,
mother’s tribe, father’s tribe, occupation, standing height, shoulder
height, finger height, finger reach, sitting height, shoulder width,
head length, head breadth, face height, face breadth, nose height,
nose breadth, ear height, hand length, weight, whether these
measures are estimated or exact, the observer name, the place of
observation, and the date of observation.
Any additional observations about the condition or nature of the
materials. The people in the Boas database were alive at the time
the measurements were taken.
The methods subsection. In this subsection, you should thoroughly
describe all the methods used in your analysis. Here I will provide an
example based on a study I did in the late 1990s using the Boas dataset
(Skelton, 1997).
A discriminant function for sex was constructed for the individuals
age 20 and older, using the SPSS-X Discriminant procedure on the U.M.
campus mainframe. This function was then used to classify all 13530
individuals in the data set [who did not have missing data]. Mean sexing
accuracy by age was calculated using the SPSS-X Means procedure.
The results were downloaded and imported into Word Perfect 5.1, which
was used to format them for import into Quattro Pro 4.0. Finally, the
formatted results were imported into Quattro Pro, which was used to plot
sexing accuracy vs age for the males and the females. Overall sexing
accuracy was obtained by averaging the accuracies for the males and the
females, and was also plotted.
Three methods for correcting for the effect of size were applied to the
data. First a principal components analysis was performed using the
SPSS-X Factor procedure. Second, a form of size scaling was attempted,
wherein the values for each of the variables were summed to yield an overall
size variable. Each value was then divided by the overall size variable.
This method is widely known to be an inefficient way to adjust for size.
Third, the data were divided into three age groups: 1 to 12, 13 to 19, and
20 or older. Each variable was regressed on age, using the SPSS-X
Regression procedure, separately for each of the age groups. The
residuals for each variable, after the effect of age was accounted for by
this procedure, were retained and a discriminant analysis was performed
using them. Finally, the sexing accuracies by age of the size-scaling and
the residuals procedures were averaged, downloaded, and plotted as
described above.
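The size-scaling and residuals corrections described in this example can be sketched in Python with NumPy. This is an illustration only, using synthetic data invented for the purpose and a single age group; it does not reproduce the original SPSS-X analysis.

```python
import numpy as np

# Synthetic stand-in data (NOT the anthropometric database): 50 invented
# individuals aged 1 to 12, with 4 measurements that all grow with age.
rng = np.random.default_rng(2)
age = rng.uniform(1, 12, size=50)
measurements = 50.0 + 4.0 * age[:, None] + rng.normal(scale=3.0, size=(50, 4))

# Size scaling: sum each individual's measurements into one overall size
# value, then divide every measurement by that value.
overall_size = measurements.sum(axis=1, keepdims=True)
size_scaled = measurements / overall_size

# Residuals: regress each measurement on age (straight-line fit) and keep
# what is left over once the age effect has been removed.
design = np.column_stack([np.ones_like(age), age])
coefs, *_ = np.linalg.lstsq(design, measurements, rcond=None)
residuals = measurements - design @ coefs

print("first three rows of size_scaled sum to:", size_scaled.sum(axis=1)[:3])
print("mean residual per variable:", np.round(residuals.mean(axis=0), 6))
```

In the study described above, the residuals were then used as the input to a discriminant analysis in place of the raw measurements.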
Assignment part 3: Writing a Materials & Methods Section. Now that you have
performed these analyses, you know exactly what you did.
Write a materials and methods section, as you would for a scientific report, using
the information and examples I gave you above. The codebook for our class
dataset and inspection of the data itself should give you all you need to know to
write a materials subsection. What you did to produce your results will form the
basis of your methods subsection.
Add your materials and methods section to the bottom of your results document.
Submitting your assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge your
collaborators and sources. Send your results document (named
firstname_lastname_3.doc) to me using the Assignment Submission link I’ve put
on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 4: Discriminant Functions Analysis
For this assignment I want you to focus on the following practical tasks:
learn how to do discriminant functions analysis using SPSS;
learn how to interpret the output SPSS provides; and
continue the process of learning to write scientific reports by examining
the results section.
Assignment part 1: The Analysis. We will do three discriminant functions
analyses, two for sex, and one for population.
Begin by creating a results document to hold your results. Open a new
document in Word. Put your name and assignment 4 at the top and save it to
your memory stick with the file name firstname_lastname_4.doc (where firstname
is your first name and lastname is your last name).
Discriminant Functions Analysis (DFA) for sex: All populations. Use the
SPSS Discriminant procedure [Analyze – Classify – Discriminant] to
perform a discriminant functions analysis to produce a formula and
sectioning point to distinguish females from males using the
measurements in the class data set. The document at
explains the concepts of discriminant functions analysis.
Use all individuals in the data set (i.e. don’t do a data – select cases).
Enter SEX as the “Grouping Variable” and define the range as minimum =
1 and maximum = 2.
To keep the number of variables from getting out of hand, choose only the
following 5 measured variables, and enter them into the “Independents”
area: GOL, BBH, XCB, ZYB, and MDH. Thus we will have a
representative skull length, skull height, skull breadth, face breadth, and
mastoid height.
Click on the “Statistics” button, and choose “Unstandardized” for “Function
Coefficients”, then click “Continue”. Click on the “Classify” button and
make sure that “All groups equal” is chosen for “Prior Probabilities”; and
choose “Summary table” for “Display”; then click “Continue”. Click “OK” to
run the analysis.
Save the output produced by SPSS, by exporting it to Word format. Then
add it to your results document.
DFA for sex: Zalavar Only. Use the SPSS select cases procedure [Data –
Select Cases – If condition is Satisfied – If...] to choose only the Zalavar
individuals by entering ANY(POP,2) into the appropriate text box. Return
to the Discriminant procedure, and run the analysis again with the same
conditions and variables as in part A above (except that you are using only
the Zalavar instead of all individuals). Click “OK” to run the analysis.
Save ONLY the “Canonical Discriminant Function Coefficients” table and
the “Classification Results” table, by copying them from the SPSS output
window and pasting them into the bottom of your results document. Edit
the labels of these tables to “Canonical Discriminant Function Coefficients:
Zalavar Only” and “Classification Results: Zalavar Only”.
DFA for Population. We will do a DFA for population, but if we use all
populations, we will get a large, difficult to interpret, messy, output. So,
let’s simplify by using only four populations.
Use the SPSS select cases procedure [Data – Select Cases – If condition
is Satisfied – If...] to choose only the Zalavar, Teita, Zulu, and Australian
populations by entering ANY(POP,2,4,6,7) into the appropriate text box.
Return to the Discriminant procedure, and change the “Grouping Variable”
to POP. Define the range for POP to be minimum = 2, maximum = 7.
Enter all 20 measured variables into the “Independents” list. Let’s say that
we are interested in the simplest possible discriminant function, so click
the “Use Stepwise Method” radio button. The stepwise method works
similarly to how it worked for regression analysis, and will produce a
formula that balances the number of variables with accuracy.
Click on the “Classify” button and choose a “Combined-Groups” type of
plot, then click “Continue”. Leave all the other options the same as in part
A above. Click “OK” to run the analysis.
Save the output produced by SPSS, by exporting it to Word format. Then
add it to your results document.
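For the two-group case (such as the analyses for sex above), the logic of a discriminant function and its sectioning point can be sketched in Python with NumPy. This is a minimal sketch of Fisher's linear discriminant on synthetic data invented for the example, with hypothetical group labels A and B. SPSS reports canonical coefficients on a different scale and with a constant, so the weights here would not match its output, but the idea of scoring an individual and comparing the score to a sectioning point is the same.

```python
import numpy as np

# Synthetic stand-in data (NOT the class data set): two groups with three
# measurements each, using invented means and a shared spread.
rng = np.random.default_rng(3)
group_a = rng.normal(loc=[175.0, 130.0, 128.0], scale=5.0, size=(100, 3))
group_b = rng.normal(loc=[185.0, 135.0, 134.0], scale=5.0, size=(100, 3))

mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)

# Pooled within-group covariance matrix.
n_a, n_b = len(group_a), len(group_b)
pooled = ((n_a - 1) * np.cov(group_a, rowvar=False)
          + (n_b - 1) * np.cov(group_b, rowvar=False)) / (n_a + n_b - 2)

# Fisher's discriminant weights: solve pooled @ weights = mean_b - mean_a.
weights = np.linalg.solve(pooled, mean_b - mean_a)

# Sectioning point: halfway between the scores of the two group centroids
# (appropriate when both groups are given equal prior probability).
score_a, score_b = mean_a @ weights, mean_b @ weights
sectioning_point = (score_a + score_b) / 2.0


def classify(measurements):
    """Return 'B' if the score falls above the sectioning point, else 'A'."""
    return "B" if measurements @ weights > sectioning_point else "A"


predicted_a = [classify(row) for row in group_a]
predicted_b = [classify(row) for row in group_b]
accuracy = (predicted_a.count("A") + predicted_b.count("B")) / (n_a + n_b)
print("weights:", np.round(weights, 4), " accuracy:", round(accuracy, 3))
```

Note that the accuracy printed here is measured on the same cases used to build the function, so it is likely to be optimistic compared with accuracy on new individuals.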
Assignment part 2: Scientific Report Format – the Results Section. In the
results section of a scientific document you should present the results – just the
results – nothing but the results. In other words, just the results – nothing about
methods, and no discussion or interpretation of the results.
There is an exception for conference papers, in which the results and discussion
sections are often merged to achieve a pleasing flow to the presentation.
A results section should include an explanation of all the tables and figures
included. This can normally be done using a caption attached to the tables and
figures themselves. Here is an example from a paper I published several years
ago (Skelton, 1996).
| 103
|Cherokee-OK | 108
| 159
| 111
[... cut ...]
| 4182
| 467
| 1993
| 596
| 1943
| Cherokee-OK, Cheyenne, Haida, Hoopa
| Coahuilla, Haida, Mississauga, Navajo
| Chickasaw, Choctaw, Piegan, Sioux
| Chippewa, Comanche, Malecite, Ute
| Cherokee-OK, Cherokee, Chilcotin, Sioux
| Cherokee, Concow, Munsee, Tsimshian
| Coahuilla, Crow, Okanagan, Tsimshian
[... cut ...]
|Adult Males     | 75.60% | 76.78% | 75.93% | 74.47% | 76.25% |
|Test Set        | 85.96% | 87.46% | 86.49% | 86.36% | 82.76% |
|Adult Females   | 76.26% | 76.98% | 76.32% | 64.16% | 69.87% |
|Subadult Males  | 52.91% | 55.39% | 60.72% | 57.08% | 53.29% |
|Subadult Females| 59.20% | 61.40% | 66.18% | 53.17% | 54.93% |
|                | 69.98% | 71.60% | 73.13% | 67.05% | 67.42% |
Assignment: Earlier you added the results of the discriminant functions
analysis for sex using all populations to your results file. Go through that
part of your output, and add a caption before each table or figure that
describes what information is presented in the table or figure. In the
captions, number the tables sequentially starting with the first one in the
output (e.g. Table 1, Table 2 ...). Figures (any graph or plot) should also be
numbered sequentially (Figure 1, Figure 2 ...), but as a separate list from
the tables. You should start at the top of your document, with the table that
says “Analysis Case Processing Summary”. Stop when you get to the end
of your results for sex using all populations.
Assignment part 3: Interpretation. Answer the following questions at the bottom
of your results document.
Interpreting the results of DFA. The document at
http://statistics.ats.ucla.edu/stat/spss/output/SPSS_discrim.htm is an
annotated output document for the SPSS discriminant procedure. You
should refer to this document while answering the questions below.
The first 5 questions refer to your DFA results for sex using all
populations.
Is the discriminant function significant? Explain how you
determined this.
Examining the “Standardized Canonical Discriminant Function
Coefficients” table, which single variable has the highest loading on
the discriminant function? Examining the “Structure Matrix” table,
which variable is most correlated with discriminant function score?
Is this the same variable in both cases? Sometimes it is not.
Different authorities favor one or the other of these ways of
determining which variable(s) is most important in distinguishing the
groups being examined by the discriminant function (which is why
SPSS gives you both).
Let’s say that you are a forensic anthropologist interested in
distinguishing females from males based on skull measurements.
What is the discriminant function formula for doing this? What is
your sectioning point, and how will you use it to decide whether a
skull is from a male or a female?
What is the best accuracy you can hope for using your discriminant
function formula?
Say that you have a skull with the following measurements: GOL =
184, NOL = 184, BNL = 101, BBH = 127, XCB = 135, XFB = 116,
STB = 112, ZYB = 125, AUB = 117, WCB = 68, ASB = 109, BPL =
102, NPH = 69, NLH = 52, OBH = 38, OBB = 40, JUB = 112, NLB =
27, MAB = 61, MDH = 27. Use the discriminant function you
developed in question 3 to determine the sex of this individual.
Show your work.
The next 2 questions refer to your analysis for distinguishing sex
among the Zalavar only.
Are the discriminant function coefficients the same for the Zalavar
only as they were for all individuals/populations? What does this
tell us about the relationship between discriminant functions, and
the nature of the sample used in finding them?
Is the accuracy of sex determination for the Zalavar higher or lower
than for all individuals/populations? What do you think explains this?
The rest of the questions refer to your analysis for distinguishing the
four populations.
How many functions did this analysis produce? What is the
relationship between the number of groups and the number of functions?
How many of the original 20 variables were used in the discriminant
function for population?
Examine the table labeled “Functions at Group Centroids”, which
shows the discriminant function score for each group based on their
centroid (set of means for each variable) on each function. Note
the following facts.
Function 2 separates the Zalavar (POP 2) and the
Australians (POP 7), which have negative values; from the
Teita (POP 4) and Zulu (POP 6), which have positive values.
On function 1, the Zalavar have a large positive value and
the Australians have a large negative value.
On function 3, the Teita have a large negative value and the
Zulu have a large positive value.
How would you use these three facts to form a strategy for using
the three discriminant functions to sort individuals into the four populations?
Examine the Combined Groups Plot. You should be able to clearly
see that function 2 separates Teita and Zulu from Zalavar and
Australian, and that function 1 separates Zalavar from Australian.
However, function 3 is not represented on this plot. If we had an
unknown individual with a score of 2.0 on function 2 and 0.0 on
function 1, what group centroid would this individual be closest to?
Could we accurately distinguish between whether this individual is
Zulu or this individual is Teita using this plot?
In the previous assignment you learned how to interpret and name
Principal Components by examining the coefficients. The same
procedure can be done with discriminant functions to get an idea of
what variables are important in distinguishing groups. Examine the
second discriminant function only (don’t worry about the others) as
presented in the “Standardized Canonical Discriminant Function
Coefficients” table. This function distinguishes the two African
populations (Teita and Zulu) from the European (Zalavar) and
Australian populations. There are two variables that have high
loadings but different signs. The high loadings suggest that these
variables contribute substantially to distinguishing African from non-African populations in this analysis. Give this function a name in
terms of the contrast between these two variables.
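The centroid questions above come down to measuring how far an individual's scores lie from each group centroid. Here is a minimal sketch in Python, assuming invented centroid values that only mimic the signs described in the three facts listed earlier. They are not taken from any real "Functions at Group Centroids" table.

```python
import math

# Hypothetical (function 1, function 2) centroids; only the signs follow the text.
CENTROIDS = {
    "Zalavar": (2.5, -1.5),
    "Australian": (-2.5, -1.5),
    "Teita": (0.1, 1.8),
    "Zulu": (-0.1, 1.9),
}


def distances_to_centroids(scores, centroids):
    """Euclidean distance from one individual's scores to every group centroid."""
    return {group: math.dist(scores, centroid) for group, centroid in centroids.items()}


def nearest_centroid(scores, centroids):
    """Name of the group whose centroid is closest to the given scores."""
    distances = distances_to_centroids(scores, centroids)
    return min(distances, key=distances.get)


# Unknown individual: 0.0 on function 1 and 2.0 on function 2.
unknown = (0.0, 2.0)
ranked = sorted(distances_to_centroids(unknown, CENTROIDS).items(), key=lambda item: item[1])
for group, distance in ranked:
    print(f"{group}: {distance:.2f}")
```

With these placeholder values the two African centroids are almost equally close to the unknown individual, which is the kind of situation in which a third function becomes necessary.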
Submitting your assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge your
collaborators and sources. Send your results document (named
firstname_lastname_4.doc) to me using the Assignment Submission link I’ve put
on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 5: Cluster Analysis
For this assignment I want you to focus on the following
practical tasks:
hierarchical cluster analysis;
formulating models of what your output should look like given
different scenarios;
interpreting dendrograms; and
continuing the process of learning to write scientific reports by
focusing on discussion sections.
Assignment Part 1: Clustering Individuals. One of the most important
uses of cluster analysis is to probe a data set to see whether
subgroupings exist within it. Using this approach you do not assume that
any subgroupings exist within the data. For example, we know that there
are two sexes and 20 groups represented, but we will assume (pretend?)
that we do not know this and see whether any clusters emerge that seem
to represent sex or population groups. This is called exploratory cluster
analysis. We will do this first.
Begin by creating a new Word document with your name and Assignment
5 at the top. Save this file to your memory stick as
The document at
om/textbook/cluster-analysis/?button=1 does a good job of explaining the
concepts of various forms of cluster analysis. We will use only
hierarchical clustering, which this document refers to as “joining
Perform a cluster analysis on the class data (anth402data.sav) using the
SPSS Hierarchical Cluster procedure [Analyze – Classify – Hierarchical
Cluster]. We will perform a basic analysis with no frills this time, so that
you can see what sorts of things you can ask for in the output.
Use all variables except ID, SEX, POP, and POPNAME.
In the “Label by” text area, enter either POP (if you want to
see the numbers) or POPNAME (if you want to see the names).
Click the “Statistics” button and make sure that
agglomeration schedule is checked, and proximity matrix is
not checked; then click “Continue”.
Click the “Plots” button and make sure that “Dendrogram” is
checked. For Icicle, choose none; then click “Continue”.
Click the “Methods” button and make sure that the “Method”
is “Between-groups linkage” (also called UPGMA); make
sure that the “Measure” is “Squared Euclidean Distance”;
and set “Transform Values” to “Z scores” and “By variable”;
then click “Continue”.
Now, uncheck the box for “Statistics”.
Finally, click “OK” to run the analysis.
Note that this produces an output window with a single huge dendrogram
(containing 1852 branches) which is extremely difficult to interpret because it is
so large. In fact, only part of it shows in the SPSS output window.
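To see what the settings above do, here is a minimal sketch in Python (it is not SPSS and is not part of the assignment). It standardizes each variable to Z scores, measures squared Euclidean distance, and joins clusters by between-groups (average) linkage, printing a simple agglomeration schedule. The five cases and their two measurements are invented, and using the sample standard deviation for the Z scores is an assumption.

```python
from itertools import combinations
from statistics import mean, stdev


def z_scores_by_variable(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def squared_euclidean(a, b):
    """Sum of squared differences between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def upgma(rows, labels):
    """Between-groups (average) linkage; returns the agglomeration schedule.

    Each step is (labels in cluster 1, labels in cluster 2, joining distance).
    """
    data = z_scores_by_variable(rows)
    clusters = [(i,) for i in range(len(data))]
    schedule = []

    def between_groups(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        return mean(squared_euclidean(data[i], data[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: between_groups(*pair))
        schedule.append(([labels[i] for i in c1], [labels[i] for i in c2], between_groups(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return schedule


# Invented measurements for five hypothetical cases.
rows = [
    [180.0, 130.0],
    [182.0, 131.0],
    [170.0, 120.0],
    [171.0, 122.0],
    [190.0, 140.0],
]
labels = ["A", "B", "C", "D", "E"]
for left, right, distance in upgma(rows, labels):
    print(left, "+", right, f"at {distance:.3f}")
```

Each printed line corresponds to one join in a dendrogram: cases that join early at small distances are the most similar, and the last join is the root of the tree.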
Unfortunately, I was unable to find a good annotated output document
for SPSS cluster analysis, but the document I referred you to previously
explains the three main features of cluster analysis output: the
dendrogram, the agglomeration schedule, and the icicle plot.
The Dendrogram. The most important part of the output, in my opinion, is
the dendrogram. A dendrogram is a type of evolutionary tree that, in this
case, presents the order of relationships between the clusters. In fact,
dendrograms are the most general type of evolutionary tree, and other
types, such as cladograms and phylograms, are categories of dendrogram.
The most important item in interpretation of dendrograms is the branching
relationships – not necessarily how close together they appear when
printed to paper or the screen. The required browsing article by Gregory
(2008) does a good job of explaining how to interpret evolutionary trees.
Capturing a Dendrogram. One commonly encountered problem with
dendrograms is that they often fail to export cleanly to a word processor.
Our normal procedure of exporting the SPSS output window to a Word
document is almost guaranteed to completely foul up the format of a
dendrogram. Cutting and pasting works somewhat better, but still leaves
something to be desired. The best way I have found to do this is to right
click on the dendrogram in the SPSS output window, then choose “SPSS
Rtf Document Object” and “Edit” from the menu that appears. The
dendrogram will open as a document in an editor window. Now, highlight
the dendrogram by placing your cursor at the top of this document,
clicking the left mouse button and HOLDING IT DOWN. Now, while
holding the left mouse button down, scroll down to the bottom of the
document. This will require some fancy gyrations of the mouse to get the
editor window to scroll down. I find that putting the cursor below the editor
window and moving the mouse up and down rapidly seems to work best.
Once you are at the bottom of the document and the dendrogram is
highlighted, release the left mouse button. Now press ctrl-c to copy the
dendrogram, go to your Word document, and paste using ctrl-v. The
dendrogram should now be in Word and look pretty good.
After looking at, and puzzling over, the dendrogram in your output window,
close the output window. You can try to capture this dendrogram while
keeping your fingers crossed, praying, shaking a rattle, or whatever you
personally do as a luck attraction ritual, but even if you succeed it’s very
difficult to interpret a dendrogram this complex. Simplification is in order
and we will accomplish that by clustering the group means.
Analysis Part 2: Clustering the Means. The dendrogram produced using
all 1853 individuals was too large to use in any practical way. The solution
is to use the means for each population. There are 20 populations in the
data set, so using their means reduces the number of dendrogram
branches enormously. Let’s run some cluster analyses that we can
actually interpret, using the means data.
Getting the data set. I have put a new data set, named
“anth402data_means.sav” on Blackboard in the assignments area.
You should download it to your memory stick, then load it into
SPSS. Unfortunately, getting a file of means data from SPSS is not
a trivial process and I have placed a document on Blackboard titled
“Getting Means from SPSS” that documents the procedure for
doing this, if you are interested in how it is done.
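The idea behind a means file can be sketched in a few lines of Python. This is an illustration only and does not reproduce the SPSS procedure: the records and their values are invented, and the variable names are borrowed from the class data purely as labels.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, group_keys, variables):
    """Average each variable within every combination of the grouping keys."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in group_keys)].append(record)
    return {
        key: {var: mean(row[var] for row in rows) for var in variables}
        for key, rows in groups.items()
    }


# Invented individuals; SEX codes 1 and 2 stand for the two sexes.
records = [
    {"POPNAME": "Zulu", "SEX": 1, "GOL": 186.0, "XCB": 134.0},
    {"POPNAME": "Zulu", "SEX": 1, "GOL": 184.0, "XCB": 136.0},
    {"POPNAME": "Zulu", "SEX": 2, "GOL": 178.0, "XCB": 130.0},
    {"POPNAME": "Teita", "SEX": 2, "GOL": 176.0, "XCB": 128.0},
]

by_pop_and_sex = group_means(records, ["POPNAME", "SEX"], ["GOL", "XCB"])
by_pop = group_means(records, ["POPNAME"], ["GOL", "XCB"])  # sexes combined
print(by_pop_and_sex[("Zulu", 1)], by_pop[("Zulu",)])
```

Grouping by population and sex gives one row per sex within each population, while grouping by population alone gives the "both sexes combined" rows.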
Evaluating population clusters. Let’s see whether the data contains
clusters that reflect population relationships. Before running the
analysis think about what an evolutionary tree of the relationships
between these groups ought to look like. An expected outcome of
this sort is one form of a model.
Formulate a model for how these populations should cluster if the
similarities between them are due to ancestral genetics. Formulate
a model for how these populations should cluster if the similarities
between them are due to adaptation to similar environments, or
perhaps non-genetic effects. For example, a comparison between
the U.S. Negros and the two African populations, the Zulu and the
Teita, might be informative in distinguishing these two models.
Reduce these models to a couple of sentences. Call them
“Ancestral Genetics Model” and “Not Ancestral Genetics Model”.
Now, let’s do the analysis.
Select the means for the sexes combined (both). You
should know how to use the SPSS select cases procedure to
do this. If not, review the procedure in previous
assignments. [Hint: ANY(SEX,3).]
Go to the hierarchical cluster procedure [Analyze – Classify
– Hierarchical Cluster].
Use all variables except SEX, POP, POPNAME, and the
filter_$ variable (which will usually be the last variable).
Label cases by NAME.
Uncheck Display Statistics so we can avoid getting an
agglomeration schedule.
Click the “Plots” button, check the checkbox for
“Dendrogram” and set icicle plot to “None”, then click
“Continue”. This prevents SPSS from sending an icicle plot
to the output.
Click the “Methods” button and make sure that the “Method”
is “Between-groups linkage”; make sure that the “Measure”
is “Squared Euclidean Distance”; and set “Transform Values”
to “Z scores” and “By variable”; then click “Continue”.
Click “OK” to run the analysis.
Copy the dendrogram from the SPSS output window using the
procedure described previously and paste it into your results
file. Give it an informative label, such as “Dendrogram For
Population, Sexes Combined”.
Sex vs Population. Let’s do a cluster analysis for sex and
population together. I believe that you will find this analysis to be
the most informative. Before running the analysis formulate a
model of what the dendrogram will look like if population is the most
important cause of clustering in the data. Also formulate a model of
what the dendrogram will look like if sex is the most important
cause of clustering in the data. Reduce these models to a couple
of sentences. Call them “Population First Model” and “Sex
First Model”.
Now, let’s do the analysis.
Use the SPSS select cases procedure to tell SPSS to use
the males and the females, but not the entries for both sexes
combined. [Hint: ANY(SEX,1,2).]
Run a cluster analysis using the same settings as above.
Copy the dendrogram from the SPSS output window and paste
it into your results file. Give it an informative label, such as
“Dendrogram for Sex and Population”.
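The two Select Cases hints used in this part, ANY(SEX,3) and ANY(SEX,1,2), simply keep the rows whose SEX code equals one of the listed values. A minimal Python sketch of that logic follows; the rows and names are invented, and code 3 is assumed to mark the sexes-combined means as the hints above imply.

```python
def any_of(value, *candidates):
    """Mimic SPSS ANY(test, value, ...): true when value equals any candidate."""
    return value in candidates


def select_cases(rows, variable, *candidates):
    """Keep only the rows whose variable matches one of the candidate values."""
    return [row for row in rows if any_of(row[variable], *candidates)]


# Invented rows from a means file: one row per population and sex code.
means_rows = [
    {"NAME": "Pop A 1", "SEX": 1},
    {"NAME": "Pop A 2", "SEX": 2},
    {"NAME": "Pop A both", "SEX": 3},
    {"NAME": "Pop B 1", "SEX": 1},
    {"NAME": "Pop B both", "SEX": 3},
]

combined_only = select_cases(means_rows, "SEX", 3)       # like ANY(SEX,3)
separate_sexes = select_cases(means_rows, "SEX", 1, 2)   # like ANY(SEX,1,2)
print(len(combined_only), len(separate_sexes))
```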
Analysis Part 3: Interpretation. Here are some questions that you
should answer at the bottom of your results file.
The document at
http://txcdk.unt.edu/iralab/sites/default/files/Hierarchical_Handout.pdf is an
annotated output document for hierarchical clustering. You should refer to
it to help in answering these questions.
The first questions refer to Analysis Part 2B: Evaluating population clusters.
For this analysis you developed two models, “Ancestral genetics
model” and “Not Ancestral Genetics Model”. Reduce these models
to a couple of sentences each and write them down here.
Which of the two models above does your dendrogram labeled
something like "Dendrogram For Population, Sexes Combined"
seem to support? Explain how you came to this conclusion.
The rest of the questions refer to Analysis Part 2C: Sex vs Population.
For this analysis you developed two models, “Population First
Model” and “Sex First Model”. Reduce them to a couple of
sentences each and write them down here.
Which of your two models (sex first or population first) seems to
more accurately explain the pattern of clusters in the dendrogram
labeled something like "Dendrogram for Sex and Population”?
Explain how you came to this conclusion.
When I ran this analysis the Bushman males clustered with a group
that consisted otherwise entirely of females, and the Buriat females
clustered with a group that consisted otherwise entirely of males.
You may or may not get this result. As an anthropologist, you
should know something about these populations (at least the
Bushman population), which have been extensively studied by
ethnographers and physical anthropologists. What characteristics
of the Bushman and Buriat populations might account for this
misclassification? (Hint 1, it’s something that standardizing the
variables as Z scores should prevent, but doesn’t seem to in this
case. Hint 2, it’s what the first principal component almost always
Assignment Part 4. Scientific Paper Format: Discussion Sections. The
fourth part of a scientific paper is the discussion section. In the discussion
section you should present your interpretations of your results. Do not
repeat the results themselves. You should also discuss any limitations or
problems with your analysis or data that may have given you inaccurate
results and, therefore, an incorrect interpretation. You should also discuss
what further research should be done to confirm or augment your analysis
(the basis of future papers).
Below is the discussion section of a paper I presented some years back at
the Northwest Anthropological Research Conference.
These results demonstrate that trait list bias can have an effect on an
evolutionary analysis, and that Strait et al. did not use a method that adequately corrects
for the effects of trait list bias. When trait list bias is corrected for, a result similar to the
phylogeny shown in Figure 1 is obtained, and when trait list bias is not corrected for, a
result similar to the phylogeny shown in Figure 2 is obtained.
Therefore, these two phylogenies should be regarded as competing hypotheses,
and one's choice of which one to consider more accurate depends on two questions: 1)
whether one believes that trait list bias should be accounted for, and 2) whether one
believes that the Skelton and McHenry (1992) method of grouping traits by function is
the most appropriate way to handle trait list bias. Though I obviously favor answering
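both questions in the affirmative, this is a problem that the field as a whole needs to address through continued research and debate. Interestingly, at the annual meeting of the American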
Association of Physical Anthropologists, earlier this month, Strait (1998) presented the
results of his investigation of the basicranial flexion functional complex. As I understand
from what people who listened to his presentation tell me (Sperazza, personal
communication), he believes that we constructed this functional complex incorrectly in
our analysis. It is quite likely that we did get it at least partly wrong, and we welcome
this sort of re-evaluation of the intercorrelation of traits.
(Skelton, 1998).
Note that this discussion section has all the elements I listed above. First,
it gives an interpretation of the results (These results demonstrate that trait
list bias can have an effect on an evolutionary analysis...; Therefore, these two
phylogenies should be regarded as competing hypotheses, ...). It mentions
possible limitations or problems with the analysis or data (... one's choice of
which one to consider more accurate depends on two questions: ...; ...he
believes that we constructed this functional complex incorrectly in our analysis).
Directions for further research are also mentioned (... this is a problem that
the field as a whole needs to address through continued research and debate.
;... we welcome this sort of re-evaluation of the intercorrelation of traits).
Write a discussion section at the bottom of your results document,
as you would for a scientific research paper. Looking over the four
questions I asked in Assignment Part 3: Interpretation, you should
have no problem coming up with interpretations (but do not simply
copy your answers to the questions, format your interpretations as
sentences in paragraphs). There are also some flaws in the data and
perhaps in the analysis that may be important sources of error in
your results and/or interpretations (hint: what is unusual about the
data for U.S. Negros). A few moments of reflection should give you
some ideas for additional research or analyses that could be done to
follow up on what you did in this assignment.
Submitting your Assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge
your collaborators and sources. Send your results document (named
firstname_lastname_5.doc) to me using the Assignment Submission link
I’ve put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 6: Clustering PC and DF Scores
For this assignment I want you to focus on the following practical tasks:
writing intermediate results to your data file for use in subsequent analyses;
clarifying the uses of PCA and DFA;
further explore the use of cluster analysis to reveal relationships;
continuing the process of learning to write scientific reports by
focusing on conclusion sections.
Saving PC scores and DF scores. Very often, a researcher will want to
perform a principal components or discriminant functions analysis and
save the results for use in other analyses. I’ll show you how to do this,
though I’ll provide the actual data we will use on Blackboard.
In contrast to the process of capturing means, which we explored in
assignment 5, asking for PC or DF scores to be saved is simply a matter
of checking the right checkbox. The scores are then saved in your data
file as new columns of data to the right of your original data.
Saving PC scores.
Start the SPSS Factor procedure [Analyze – Dimension
Reduction – Factor] as explained in Assignment 3. Choose
whatever variables, extraction options, and rotation options
you want.
Click the “Scores” button, check “Save as variables”, set the
“Method” to “Regression”, then click “Continue”;
Click “OK” to run the analysis.
Look at your data file. Note that there are new variables in columns
to the right of your original data. These columns contain each
individual’s scores on each of the PCs or factors that SPSS found to
be significant. In order to save the insignificant PC/Factor scores
you will need to trick SPSS by clicking on the “Extraction” button
and setting “Eigenvalues greater than” to 0.
The new variables will have names like FAC1_1, FAC2_1, etc. The
“FAC” signifies that these are scores produced by the Factor
procedure. The first number is the PC or factor number – i.e. PC1,
PC2, etc. The second number is the run number. If you run the
analysis again, say with a different extraction or rotation method the
names will be FAC1_2, FAC2_2, etc.
Saving DF scores.
Start the SPSS discriminant procedure [Analyze – Classify –
Discriminant]. Set whatever Grouping Variable,
independents, method, and other options you want.
Click on the “Save” button and check “Discriminant scores”,
then click “Continue”;
Click “OK” to run the analysis.
You can also choose to save the predicted group membership (i.e.
how the procedure classified each individual) and/or the probability
of membership in the predicted group if you want to.
The scores for the DF’s (canonical variates) are saved in columns
to the right of your data. My experience is that SPSS saves scores
for all DF’s whether they are significant or not. The new variables
have names like Dis1_1, Dis2_1, etc. The Dis part is for the
Discriminant procedure and the numbers signify exactly what they
did in the explanation above of the names of saved scores from the
Factor procedure.
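The naming scheme for saved scores can be made concrete with a small Python sketch that splits a name such as FAC2_1 or Dis16_1 into its parts. The function name and the returned dictionary layout are invented for illustration; only the FAC/Dis prefixes and the "number, underscore, run" pattern come from the description above.

```python
import re

# Prefix, then the PC/DF number, an underscore, and the run number.
SCORE_NAME = re.compile(r"^(FAC|Dis)(\d+)_(\d+)$")
PROCEDURES = {"FAC": "Factor", "Dis": "Discriminant"}


def parse_score_name(name):
    """Split a saved-score variable name into procedure, number, and run."""
    match = SCORE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a saved-score name: {name!r}")
    prefix, number, run = match.groups()
    return {"procedure": PROCEDURES[prefix], "number": int(number), "run": int(run)}


for name in ["FAC1_1", "FAC2_2", "Dis16_1"]:
    print(name, parse_score_name(name))
```

Names that do not follow this pattern, such as a variable you have renamed yourself, raise a ValueError rather than being guessed at.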
SPSS will often create an extra empty column to the right of the DF
scores that it saved. You can simply delete this column if it’s
Now we need means. Again I have generated the means for you to save
time. Find and download the file “anth402_pcdfmeans.sav” from
Blackboard and save it to your memory stick. This data file contains
the means for the four PC scores (FAC1_1 ... FAC4_1), the means for the
16 significant discriminant functions (Dis1_1 ... Dis16_1) for population,
and the means for a discriminant function for sex (Discrim_SEX).
Assignment Part 1: Clustering PC Scores. Now we can do something
interesting. We will start by doing a cluster analysis using mean PC
scores to see if these scores cluster the means by sex and/or population.
Begin by creating a results file called firstname_lastname_6.doc.
Use the Select Cases procedure to select only the male and female means
(not both combined). [Hint: ANY(SEX,1,2).]
Run a cluster analysis on the four PC score variables, FAC1_1 through
FAC4_1. Here’s the recipe.
Go to the hierarchical cluster procedure [Analyze – Classify –
Hierarchical Cluster].
Use the variables FAC1_1, FAC2_1, FAC3_1, and FAC4_1.
Label cases by NAME.
Uncheck Display Statistics so we can avoid getting an
agglomeration schedule.
Click the “Plots” button, check the checkbox for “Dendrogram” and
set icicle plot to “None”, then click “Continue”. This prevents SPSS
from sending an icicle plot to the output.
Click the “Methods” button and make sure that the “Method” is
“Between-groups linkage”; make sure that the “Measure” is
“Squared Euclidean Distance”; and set “Transform Values” to
“None”; then click “Continue”.
Click “OK” to run the analysis.
Copy the dendrogram from the SPSS output window and paste it into
your results file (see assignment 5 for the procedure). Give it an
informative label, such as “Dendrogram Using PC Scores, For
Population and Sex”.
Assignment Part 2: Removing the Effect of Size. Since you looked back
at assignment 3 and your write-up for assignment 3, you remember that
we interpreted PC1 as “Size”. There are many situations in which a
researcher wants to remove the effect of size so they can focus on shape
variables. Let’s do this, by excluding PC1 scores from our analysis.
Repeat the procedure in Assignment Part 1, but this time eliminate the
effect of size by using only the means for FAC2_1, FAC3_1, and FAC4_1
– not for FAC1_1.
Copy the dendrogram from the SPSS output window and paste it into
your results file. Give it an informative label, such as “Dendrogram
Using PC Scores, For Population and Sex, Size Removed”.
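Why dropping PC1 can change a dendrogram is easiest to see in the distances themselves. The Python sketch below compares squared Euclidean distances computed with all four PC score variables and with FAC1_1 left out. The three groups and their mean scores are invented to show the mechanism and do not predict what your own dendrograms will look like.

```python
ALL_PCS = ["FAC1_1", "FAC2_1", "FAC3_1", "FAC4_1"]
SHAPE_PCS = ALL_PCS[1:]  # size (PC1) removed

# Invented mean PC scores for three hypothetical groups.
means = {
    "Group X males": {"FAC1_1": 0.9, "FAC2_1": 0.5, "FAC3_1": 0.1, "FAC4_1": 0.0},
    "Group X females": {"FAC1_1": -0.7, "FAC2_1": 0.4, "FAC3_1": 0.2, "FAC4_1": 0.1},
    "Group Y males": {"FAC1_1": 0.8, "FAC2_1": -0.6, "FAC3_1": -0.3, "FAC4_1": 0.2},
}


def squared_euclidean(a, b, variables):
    """Squared Euclidean distance using only the listed variables."""
    return sum((a[v] - b[v]) ** 2 for v in variables)


def nearest_neighbor(name, means, variables):
    """Return the other group closest to the named group."""
    others = [other for other in means if other != name]
    return min(others, key=lambda other: squared_euclidean(means[name], means[other], variables))


print(nearest_neighbor("Group X males", means, ALL_PCS))
print(nearest_neighbor("Group X males", means, SHAPE_PCS))
```

In this made-up example the large difference on FAC1_1 dominates the distance when all four variables are used, so the nearest neighbor changes once that variable is excluded.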
Assignment Part 3: Clustering DF Scores. Now, let’s cluster using DF scores.
Clustering by mean DF’s for population. Here’s the recipe.
Go to the hierarchical cluster procedure [Analyze – Classify
– Hierarchical Cluster].
Use the variables Dis1_1, Dis2_1, through Dis16_1, which
correspond to those discriminant functions/canonical
variates that my output for the discriminant analysis
indicated were significant in distinguishing populations
according to the “Wilks’ Lambda” table.
Label cases by NAME.
Uncheck Display Statistics so we can avoid getting an
agglomeration schedule.
Click the “Plots” button, check the checkbox for
“Dendrogram” and set icicle plot to “None”, then click
“Continue”. This prevents SPSS from sending an icicle plot
to the output.
Click the “Methods” button and make sure that the “Method”
is “Between-groups linkage”; make sure that the “Measure”
is “Squared Euclidean Distance”; and set “Transform Values”
to “None”; then click “Continue”.
Click “OK” to run the analysis.
Copy the dendrogram from the SPSS output window and paste
it into your results file. Give it an informative label, such as
“Dendrogram Using DF Scores For Population”.
Clustering by mean DF’s for sex. Use the same procedure as in
part A, but use only the means for Discrim_Sex.
Copy the dendrogram from the SPSS output window and paste
it into your results file. Give it an informative label, such as
“Dendrogram Using DF Score For Sex”.
Assignment Part 4: Interpretations. Answer the questions below at the
bottom of your results file. Think of some models of how the data
should cluster if it primarily represents sex or if it primarily represents population.
You do not need to write this down – just think of some criteria that you
will use to judge whether the dendrogram mostly reflects sex differences
or mostly reflects population differences. Here’s my way of doing it. I
assume that if the dendrogram primarily represents sex, then the first
branch above the root will divide the individuals into (mostly) males and
(mostly) females. I call this a “sex-first” branching order. I have to say
“mostly” here because no statistical method perfectly sorts groups under
all conditions. In contrast, if the dendrogram primarily represents
population differences, then the first branch above the root will divide the
individuals (mostly) into two clusters that contain different populations.
Within these population clusters both males and females should be
represented, perhaps separated into different branches higher up the tree.
I call this a “population-first” branching order.
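The two branching orders described above can be checked mechanically once you have read off which cases fall on each side of the first branch above the root. The Python sketch below is one possible way to summarize that split. The function name, the summary layout, and the example branches are all invented for illustration.

```python
from collections import Counter


def summarize_first_split(branch_a, branch_b):
    """Summarize the two branches above the root.

    Each branch is a list of (population, sex) labels read off the dendrogram.
    """
    summary = {}
    for name, branch in (("a", branch_a), ("b", branch_b)):
        sex, count = Counter(sex for _, sex in branch).most_common(1)[0]
        summary[name] = {"majority_sex": sex, "share": count / len(branch)}
    shared = {pop for pop, _ in branch_a} & {pop for pop, _ in branch_b}
    summary["shared_populations"] = sorted(shared)
    return summary


# Invented example resembling a sex-first branching order.
sex_first = summarize_first_split(
    [("P1", "M"), ("P2", "M"), ("P3", "M"), ("P4", "F")],
    [("P1", "F"), ("P2", "F"), ("P3", "F"), ("P4", "M")],
)
# Invented example resembling a population-first branching order.
population_first = summarize_first_split(
    [("P1", "M"), ("P1", "F"), ("P2", "M"), ("P2", "F")],
    [("P3", "M"), ("P3", "F"), ("P4", "M"), ("P4", "F")],
)
print(sex_first)
print(population_first)
```

A sex-first order shows up as branches with different majority sexes, high shares, and many populations appearing on both sides. A population-first order shows shares near one half and few or no shared populations.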
Take a look at the dendrogram that you labeled something like
“Dendrogram Using PC Scores, For Population and Sex”, which
was produced in analysis part 1. Does it seem as if PC scores
facilitate clustering by sex or do they seem to cluster by population?
Explain how you arrived at this interpretation.
Take a look at the dendrogram that you labeled something like
“Dendrogram Using PC Scores, For Population and Sex, Size
Removed”, which was produced in analysis part 2. Explain and
discuss any differences you see from the dendrogram you labeled
something like “Dendrogram Using PC Scores, For Population and
Sex”, which was produced in analysis part 1. Discuss what this
implies about removing the effect of size from the analysis.
Take a look at the dendrogram you labeled something like
“Dendrogram Using DF Scores for Population”, which was produced
in analysis part 3A. Does it seem as if these DF scores facilitate
clustering by sex or do the means seem to cluster by population?
Explain how you arrived at this interpretation.
Examine how the dendrogram you labeled something like
“Dendrogram Using DF Scores for Population” produced in analysis
part 3A differs from the dendrograms produced using PC scores.
What is the important difference between PC analysis and DF
analysis for population that causes this difference?
Take a look at the dendrogram you labeled something like
“Dendrogram Using DF Scores for Sex”, which was produced in
analysis part 3B. Does it seem as if DF scores facilitate clustering
by sex or do the means seem to cluster by population? Explain
how you arrived at this interpretation. Does this clustering pattern
differ from that obtained using DF scores for population? If so,
what do you think explains this difference?
Discuss how you would fill in the blank in the following sentence.
The main difference between individuals in this data set is their
_____, based on my interpretation and naming of PC1 (back in
assignment 3, check your grading form to make sure you got it right).
Discuss how you would fill in the blank in the following sentence.
Based on my interpretation and naming of DF2 (back in assignment
4, check your grading form to make sure you got it right) the most
important difference between the populations in this data set is _____.
Assignment Part 5. Scientific Paper Format: Hypotheses and
Conclusions. The fifth part of a scientific paper is the conclusion section.
In the conclusion section you revisit the hypotheses or problems you set
up in your introduction section and evaluate whether they are refuted or
supported. Often, the hypotheses or problems are restated in the
conclusion, followed by an assessment of what the results and
interpretations say about them.
If the document is a more theoretical treatise rather than a research paper,
the conclusion is used to present the final “bottom line” statement of what
the author(s) has been arguing.
Here is an example conclusion section from an archaeological research paper.
The purpose of this report is twofold. First, it is intended to bring Binford's (1978)
important work on meat drying back to zooarchaeologists' attention, because its potential
for the interpretation of the archaeofaunal record remains largely untapped. The drying
of meat is widespread in ethnographic accounts of both hunter-gatherers and
pastoralists, and must have a considerable time depth. The development of an index
that allows the identification of dry meat storage, therefore, has the potential for
application in a wide range of archaeological contexts.
Second, this research represents a critical reassessment of the Drying Utility Index
(Binford 1978), intended to simplify it and make its calculation more transparent. This
process led to the creation of the Meat Drying Index, which provides a comparatively
simple method for calculation of a carcass portion's "dryability." The usefulness of the
MDI is reinforced by the fact that it is correlated slightly better with Binford's (1978)
ethnoarchaeologically observed Nunamiut meat drying data than is his own DUI, which
was developed with specific reference to that data set. Furthermore, both the MDI and
DUI exhibit significant and positive correlation with the relatively "independent" sample of
caribou bones from dry-meat caches at site LcLg-22 in arctic Canada. In sum, the MDI
can be seen as the better index for the interpretation of meat drying, both because it is
calculated in a more straightforward manner, and because it appears to predict drying-related element distributions as well as, or better than, the DUI. Because the formula for
the MDI is relatively uncomplicated, it should be practical to calculate for mammalian
species other than caribou; all that is needed is the meat, brain, marrow, and bone
weights for each body portion of a given species. Appropriate raw data have already
been collected for many species and published in other utility index studies. However, as
with other utility indices, it can be predicted that the MDI calculated here for caribou
should be applicable to related taxa, and in particular to other artiodactyls, without
further modification (Friesen et al. 2001).
The MDI, as outlined here, can be used in conjunction with other utility indices and bone
density data to interpret several categories of bone assemblage. Element distributions
from dry-meat caches, and from camp sites at which large quantities of dry meat were
consumed, are predicted to be positively correlated with the MDI, while element
frequencies from kill or butchery sites at which dry meat was prepared for storage or
consumption elsewhere are expected to be negatively correlated with the MDI.
Importantly, however, "real-life" drying activities will not necessarily result in
assemblages that are as readily interpreted as those discussed in this paper. As
Binford's (1978) Nunamiut Ethnoarchaeology so robustly indicates, decision-making
processes relating to the butchery, storage, transport, and consumption of meat are
complex, and the effects of marrow or grease processing, the feeding of dogs, sharing,
cultural preferences, and a variety of taphonomic agents will all serve to obscure "pure"
dry-meat assemblages. For most sites on which dry meat was consumed, it is
reasonable to assume that dry meat will comprise only a portion of the total bone
sample, which may also include body portions from freshly killed or frozen carcasses.
This will add a layer of complexity to the interpretation of those assemblages, and it will
not always be possible to infer past meat-drying activities. Future research should be
directed at resolving this problem through further ethnoarchaeological work, and through
reinterpretation of faunal assemblages that may result from the consumption of dry
meat. Despite this caveat, in many instances element distributions may be the only
practical way to identify past drying activities, and by extension food storage. Therefore,
zooarchaeologists should continue to refine methods for recognizing meat drying in the
archaeological record. This study represents one step toward that goal. Friesen
Note that this conclusion restates the problems that the research
addresses, though without framing them in terms of hypotheses. This is
done in the first two paragraphs. The third paragraph states the outcomes
or findings of the research as they relate to the problems presented in the
first two paragraphs.
Write a conclusion section at the bottom of your results document,
as you would for a scientific research paper. Include a statement of
the problems addressed in this assignment and the outcomes or findings
relating to these problems. I will be most happy (and therefore generous
with points) if you can frame the problems in terms of hypotheses to be
tested. In doing this, you may want to re-read the information I gave you
on hypotheses as required browsing for the third week of class. You will
not be able to do any significance tests of your hypotheses, because
cluster analysis does not provide significance tests. Therefore, you should
simply state whether your results and interpretations (as discussed in the
7 questions in the section above) refute or support the hypotheses. (Hint:
one null hypothesis might be: sex is not represented in this data in such a
way that it can be revealed by clustering PC or DF scores – i.e. males and
females are equal with respect to their mean PC and DF scores by
Submitting your assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge
your collaborators and sources. Send your results document (named
firstname_lastname_6.doc) to me using the Assignment Submission link
I’ve put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 7: Analysis of Variance (ANOVA)
For this assignment I want you to focus on the following practical tasks:
learn how to perform an ANOVA analysis using SPSS;
learn how to perform a MANOVA analysis using SPSS;
learn about post-hoc tests;
compare discriminant analysis and ANOVA for investigating
groups; and
Interesting features of ANOVA. One way of looking at ANOVA is as an
extension of t-tests, where you are comparing the means of several
groups for significant differences. A t-test compares the means for two
groups, and ANOVA compares means for several groups simultaneously.
There are several flavors of ANOVA. Here are some:
One-way: Only one variable is used, so this is essentially a
univariate method. The analysis detects differences in means
between groups. The groups in an ANOVA are usually called
“treatments” or “levels”. The two most common types of one-way
ANOVA are between-subjects and within-subjects.
Between-subjects is analogous to an independent samples t-test.
The groups consist of different individuals.
Within-subjects is analogous to a paired samples t-test. The
groups are the exact same individuals measured at different
times. This is also called “repeated measures” ANOVA.
Two-way: Two different variables are examined simultaneously, so
this is a bivariate method. Between-subjects and within-subjects
approaches can both be used.
Three-way: Three different variables are examined simultaneously,
so this is a multivariate method. Again, both between-subjects and
within-subjects analyses may be used.
Multivariate ANOVA (MANOVA). This form of ANOVA examines two or
more measured (dependent) variables simultaneously, and it can also
use more than one grouping category (for example, population and sex).
In this assignment, we will work with one-way ANOVA and MANOVA.
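For anyone who wants to see the arithmetic behind a between-subjects one-way ANOVA, here is a minimal sketch in Python. It is not part of the SPSS recipe, and it is only an illustration: the function name one_way_anova_f and the three small groups of numbers are made up for this example. It computes the F statistic as the ratio of the between-groups mean square to the within-groups mean square, and it does not compute a significance value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a between-subjects one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    This is a hypothetical helper written for illustration only.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individuals around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up measurements for three hypothetical groups
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

f_stat, df_b, df_w = one_way_anova_f(samples)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

When only two groups are supplied, the F value from this calculation equals the square of the pooled-variance independent samples t statistic, which is one way to see ANOVA as an extension of the t-test.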
If an ANOVA analysis is significant, then you know that there are
significant differences between at least some of the group means, though
perhaps not between all of them. For example, if we performed an
ANOVA on the Norse, Zalavar, and Teita populations, and obtained a
significant result, we know that at least two of the means are significantly
different (maybe Norse is significantly different from Teita), but some
means may not be significantly different (maybe Norse is not significantly
different from Zalavar). In order to help you determine which means are
actually significantly different, SPSS provides two tools – post-hoc tests
and a list of homogeneous groups.
A post-hoc test is a set of direct pairwise comparisons of the means – i.e.
every mean is compared with every other mean – using some form of
significance test, such as a t-test. Given that running a post-hoc test
along with an ANOVA analysis is standard practice, I often wonder if there
is a use for ANOVA at all, since these pairwise comparisons accomplish
the same task more simply. Some authorities suggest that ANOVA solves
a technical problem that arises when too many pairwise significance tests
are performed: the chance of at least one false positive grows with the
number of tests. Whatever we may decide, ANOVA is a standard and
widely used method.
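To make the multiple-comparisons problem concrete, here is a small Python sketch (again, an illustration only, not something you need for the assignment). It lists every pairwise comparison among the four populations used in this assignment and estimates the chance of at least one false positive if each comparison were tested separately at the 0.05 level. The function names are hypothetical, and the error-rate formula assumes the tests are independent, which pairwise comparisons of the same groups are not, so treat the number as a rough guide.

```python
from itertools import combinations


def pairwise_comparisons(group_names):
    """Return every unordered pair of groups, as a post-hoc test would compare them."""
    return list(combinations(group_names, 2))


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent (a simplification for illustration).
    """
    return 1 - (1 - alpha) ** n_tests


populations = ["Zalavar", "Teita", "Zulu", "Australian"]
pairs = pairwise_comparisons(populations)
rate = familywise_error_rate(len(pairs))

for first, second in pairs:
    print(first, "vs", second)
print(f"{len(pairs)} comparisons, approximate familywise error rate = {rate:.3f}")
```

Post-hoc procedures such as Tukey's test adjust for this inflation, which is why a dedicated post-hoc option is used rather than a series of uncorrected t-tests.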
A list of homogeneous groups is created by grouping together those
groups that are not significantly different from each other. Some
correction has to be made for multiple comparisons since not all members
of a homogeneous group are necessarily significantly different from the
same members of other homogeneous groups. For example, if Norse
plus Zalavar form a homogeneous group, and Teita plus Zulu form
another homogeneous group, it does not necessarily imply that both
Norse and Zalavar are significantly different from both Teita and Zulu. It
might be the case that Norse is significantly different from Teita but not
Zulu and Zalavar is significantly different from Zulu but not Teita.
Occasionally, a group will be assigned to two homogeneous sets.
Nonetheless, homogeneous groups are a useful concept (in my opinion).
Another interesting feature of ANOVA is its underlying model. In the
ANOVA model, all the individuals in all of the groups are conceived of as
identical, and they differ only in how they have been treated. The classic
example of an ANOVA analysis illustrates this. In the classical example,
there is a sample of headache sufferers, who are identical in the fact that
they suffer from headaches even though they may be a mixture of sexes,
races, etc. The point is that the samples should be as identical as
possible with regard to size, gender mix, etc. These headache sufferers
are given one of a set of headache relievers (say aspirin, Tylenol, and
Advil), and then ANOVA is used to determine whether the treatments (type of
headache reliever taken) have differing effects in relieving the headaches.
This model – identical individuals, different treatments – is very different
from the model underlying most analyses, which assumes that the people
in different groups are inherently different in some way.
This model leads to a reversal of our normal concept of what variables are
dependent and independent. In most analyses of group differences (say
discriminant analysis) we treated the population group (POP) as the
dependent variable, and the measured variables were our independent
variables. Therefore, we examined the effect of these measured variables
on population group. For example, we could ask whether there are any
differences in GOL that allow us to determine an individual’s POP. In an
ANOVA analysis, the measured variables are the dependents, and the
group is the independent. So, we are essentially looking at the effect of
group membership on the measured variables. For example, in ANOVA,
we are asking whether there are any differences in POP that are influencing GOL.
If you compare this to the classic example of ANOVA analysis of
headache treatments, you see that POP is analogous to what medicine is
given to treat the headache, and GOL is analogous to the degree of pain
relief experienced.
Given this, we can look at population differences in two ways. In
assignment 4, we used discriminant functions analysis to distinguish the
Zalavar, Teita, Zulu, and Australian populations, under the assumption
that they must differ in some inherent way (perhaps different genetics).
We can revisit this analysis using the ANOVA model, assuming that the
individuals in these populations are inherently the same (they are all
human beings), but differ in how they were treated (i.e., the Zalavar grew
up in Europe, the Teita and Zulu grew up in different regions of Africa, and
the Australians grew up in Australia).
Assignment Part 1: One-way ANOVA for Population Differences. In this
analysis we will do an ANOVA of the Zalavar, Teita, Zulu, and Australian
populations. The document at http://www.statsoft.com/textbook/anovamanova/?button=1 explains the concepts and procedures of ANOVA and
MANOVA. Here’s the recipe.
First, create a results document. Open a blank document in Word,
put your name and “Assignment 7” at the top, and save it to your
memory stick as firstname_lastname_7.doc.
Load the class data (anth402data.sav) into SPSS.
Use the SPSS select cases procedure to choose only the Zalavar,
Teita, Zulu, and Australian populations. [Hint: ANY(POP,2,4,6,7).]
Go to the ANOVA procedure [Analyze – Compare Means –
One-Way ANOVA].
Add GOL to the "Dependent List".
Use POP as the "Factor".
Click on the "Post Hoc" button and check "Tukey", then click
Click on "OK" to run.
Save the output by exporting it to a Word document and adding it to your
results document.
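If you are curious what the select-cases step in the recipe above is doing, here is a rough Python sketch of the same idea. It is only an illustration and is not needed for the assignment. The POP codes 2, 4, 6, and 7 come from the hint; everything else (the row layout, the function names, and all of the GOL values) is made up and does not come from the class data.

```python
from collections import defaultdict
from statistics import mean

# Codes taken from the hint ANY(POP,2,4,6,7)
SELECTED_POPS = {2, 4, 6, 7}

# Hypothetical rows standing in for the class data; all GOL values are made up.
rows = [
    {"POP": 1, "GOL": 181.0},
    {"POP": 2, "GOL": 184.0},
    {"POP": 2, "GOL": 186.0},
    {"POP": 3, "GOL": 179.0},
    {"POP": 4, "GOL": 178.0},
    {"POP": 4, "GOL": 182.0},
    {"POP": 6, "GOL": 183.0},
    {"POP": 6, "GOL": 187.0},
    {"POP": 7, "GOL": 188.0},
    {"POP": 7, "GOL": 192.0},
]


def select_cases(data, codes):
    """Keep only the rows whose POP code is in `codes`."""
    return [row for row in data if row["POP"] in codes]


def group_values(data, variable):
    """Collect the values of `variable` into one list per POP code."""
    grouped = defaultdict(list)
    for row in data:
        grouped[row["POP"]].append(row[variable])
    return dict(grouped)


selected = select_cases(rows, SELECTED_POPS)
gol_by_pop = group_values(selected, "GOL")
group_means = {pop: mean(values) for pop, values in gol_by_pop.items()}

# One line per retained population: code, number of cases, mean GOL
for pop in sorted(group_means):
    print(pop, len(gol_by_pop[pop]), group_means[pop])
```

The per-population lists produced here are the "groups" whose means the one-way ANOVA compares, with POP as the factor and GOL as the dependent variable.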
Assignment Part 2: MANOVA. Starting with version 15, SPSS no longer
supports MANOVA directly. However, we will do yet another workaround
and simulate a MANOVA using a discriminant functions analysis.
The link between MANOVA and discriminant functions is very close. In
fact some authorities state that the calculations are identical. The best
way to think of their relationship is as looking through two ends of a
telescope. From the discriminant functions end you are looking at a nominal
category from the point of view of ratio measurements. From the
MANOVA end you are looking at the ratio measurements from the point of
view of a nominal category. As it turns out, we can use discriminant to get
the results that MANOVA would give, and to simplify ANOVA analyses of
many variables.
Here’s the process.
The class data should already be loaded into SPSS and the data
file should be filtered using “Select cases” to use only the Zalavar,
Teita, Zulu, and Australian populations. If not, do this now.
Go to the discriminant procedure [Analyze – Classify –
Use POP as the "Grouping Variable", and define its range as
minimum 2 and maximum 7.
Add all the measured variables (GOL through MDH) to the
“Independents” box.
Click on the “Statistics” button, and uncheck everything except
“Univariate ANOVAs”, which should be checked. Click “Continue”.
Click on "OK" to run the analysis.
Export your output window to a Word document and append it to your
results document.
Assignment Part 3: Interpretations. Answer the questions below at the
bottom of your results file. The document at
htm> is an annotated output document for one-way ANOVA if you scroll
down far enough. The document at
2.chass.ncsu.edu/garson/pa765/discrim3.htm is an annotated output
document for the SPSS discriminant procedure.
The following questions apply to the results of your one-way
ANOVA analysis in assignment part 1.
Do the results of your analysis indicate that there are
significant differences between the groups in their means for
GOL? Explain how you came to this conclusion. What is
the probability that the means for GOL are equal in all four
populations?
Which groups are significantly different from which other
groups, and which groups are not significantly different?
Explain how you came to this conclusion.
The following questions apply to the results of your Discriminant
Analysis workaround for MANOVA analysis in assignment part 2.
Find a table titled “Tests of Equality of Group Means”. Notice that
for each measured variable there is an F statistic, suggesting that
this has something to do with ANOVA. In actuality, these are the
results of a set of 20 one-way ANOVA tests, one for each of the
measurements. Because the discriminant procedure will do this
analysis for each supplied variable, many people find it easier to
use the discriminant procedure than to run the ANOVA procedure
over and over for each measurement.
Based on the information in the “Tests of Equality of Group
Means” table, are there any variables that do NOT exhibit
significant differences between the four populations used in
the analysis? If so, what are they?
Examine the entry for GOL in the “Tests of Equality of Group
Means” table. Are the F value and significance the same as
you obtained using one-way ANOVA?
MANOVA uses the Wilks’ Lambda statistic to assess whether the
measured variables, taken all at once, differ between the treatment
categories. In this case it would assess whether the 20
measurements of the skull differ between the four populations used
in the analysis. Note that the “Tests of Equality of Group Means” table
has Wilks’ Lambda values for each separate measurement. The
Wilks’ Lambda we want for all measurements, taken together, is
given in a table titled “Wilks’ Lambda”. The first row of this table,
the one for test of functions 1 through 3, is the one we are looking
for. It happens to be exactly the same whether you are looking for
the significance of the three discriminant functions taken together or
looking for the significance of the 20 measurements taken together
as in a MANOVA analysis.
Do the 20 measurements, taken together, differ significantly
between the four populations used in this analysis? At what
level of confidence can you say that they differ significantly?
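As background for the per-measurement values in the “Tests of Equality of Group Means” table discussed above, here is a short Python sketch of how a Wilks' Lambda for a single variable relates to the one-way ANOVA F statistic. For one variable, Lambda is the within-groups sum of squares divided by the total sum of squares, and F can be recovered from it using the degrees of freedom. The function names and the numbers are made up for illustration. The multivariate Lambda for all measurements together requires matrix calculations and is not shown here.

```python
from statistics import mean


def univariate_wilks_lambda(groups):
    """Wilks' Lambda for one variable: SS_within / SS_total."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_within / ss_total


def f_from_lambda(wilks, n_groups, n_cases):
    """Recover the one-way ANOVA F statistic from a univariate Wilks' Lambda."""
    df_between = n_groups - 1
    df_within = n_cases - n_groups
    return ((1 - wilks) / wilks) * (df_within / df_between)


# Made-up measurements for three hypothetical groups
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

wilks = univariate_wilks_lambda(samples)
f_value = f_from_lambda(wilks, n_groups=3, n_cases=9)
print(f"Lambda = {wilks:.4f}, F = {f_value:.2f}")
```

A Lambda near 0 means most of the variation lies between the groups, while a Lambda near 1 means the group means barely differ. Because F is a direct function of Lambda and the degrees of freedom, the two statistics carry the same information for a single variable.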
Submitting your assignment. Add to your results document an
“Acknowledgements and Bibliography” section in which you acknowledge
your collaborators and sources. Send your assignment (named
firstname_lastname_7.doc) to me using the assignment submission link
I’ve put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 8: Find a Data Set for Your Project
For this assignment I want you to find a data set to use for your project.
See the syllabus for details about the class project.
There are several places you could find a data set.
The data you are using for your thesis or dissertation (or a
subset of it). I prefer this for grad students.
Data that you find on the internet in downloadable form.
Probably, most of you will use data of this type.
Data that you find in a publication and enter by scanning or
by hand.
The data set you choose should be relatively large, with at least
100 individual cases and 15 variables. I may waive this
requirement if the data is sufficiently interesting.
I have put links to many online data sources, plus a few data sets that I
have, on Blackboard in the “Data Sets” area. This is only a small sample
of what is out there. You can often find interesting data by doing a Google
search for some relevant terms, then digging around through the links
Google finds.
Assignment. Find a data set to use for your project and send me (1) a
copy of it, and (2) a description of it, via Blackboard’s digital dropbox.
Submitting your assignment. Send me two files using the Assignment
Submission link I’ve put on Moodle. The first file should be your data set
(which can have any name) and the second file should be a separate
description of your data set that includes the name of your data set plus
anything you want me to know about it.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 9: Writing a Research Plan
for Your Project
For this assignment I want you to prepare a research plan (also called a
proposal) for your class project. A research proposal is a relatively short
document that is used for communication and as a guide in planning.
As a communication tool, your project research plan is used to
communicate to other people the nature of your research project,
and to show them that you have a plan for carrying the project out.
Therefore, you should aim to demonstrate command of the
background for the project; knowledge of the data; how you will
analyze the data; and how you will interpret your results. The
consumers of your research plan may be the teacher of your class
(me), the faculty who are advising you in your undergraduate or
graduate research project, or perhaps even an agency who might
give you funds to carry out the project.
As a guide in planning, the project research plan forces you to think
about the various aspects of the project. Hopefully, this will allow
you to move through your project in an effective way, rather than
bumbling through it.
Assignment. Prepare a research plan for your project.
It is important to do the required browsing for this assignment.
Some of you may have done a research plan or proposal as part of
ANTY 601, ANTY 413, or another class. You are welcome to
polish it up and use it for this assignment, so long as it refers to the
data you will be using for this class and it contains all the parts
listed below.
The research plan should include the following parts:
Introduction. Your introduction should explain the nature,
intent, and importance of your project. It should be in ‘funnel
format’. It should review at least a moderate amount of
literature, including a citation of where the data was obtained
if appropriate, and a selection of other works that have used
that data or similar data for a similar purpose. It should
include a null hypothesis that is being tested.
Materials and methods. This section should include what
normally goes into a materials and methods section, plus a
bit more. Describe your data thoroughly, and describe the
analytical procedures you will use with the data. Finally,
explain why this data and these methods are appropriate for
investigating your problem and for testing your hypothesis.
Discussion. In this section you should anticipate various
possible results that you might obtain, both positive and
negative. Explain what these possible results may tell you,
such as whether a particular result will allow you to reject
your null hypothesis and what you will conclude if this is the
case.
A bibliography of literature cited.
Submitting your assignment. Send your research plan (named
firstname_lastname_9.doc) to me using the Assignment Submission link
I’ve put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 10: Data Analysis Results
For this assignment I want you to send me the results of the data analyses
you performed on your data for your class project.
Assignment: Perform the analyses you described in your research plan on
your data. Store the results (at least the relevant tables, etc.) in a results
document named firstname_lastname_10.doc, and send this document to
me.
Submitting your assignment. Send your analytical results (named
firstname_lastname_10.doc) to me using the Assignment Submission link
I’ve put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 11: Preliminary Draft of Paper
For this assignment I want you to prepare a preliminary draft of your
paper. This draft should be as polished as you can make it, and I will edit
it and return it to you for revision.
Assignment. Write up your results as a scientific paper and submit it to
me through the Blackboard dropbox.
Paper Format. Your paper should be in scientific paper format. We have
been working on this all semester, and I expect you to be comfortable with
it by now. There are some formatting issues listed in the class syllabus as
well. I have found the web site at
http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtoc.html to be useful to me
in preparing scientific papers, and it may be useful to you as well. Also
note that I have put several resources in the required browsing
section for this assignment on Moodle.
Submitting your assignment. Send your preliminary paper draft (named
firstname_lastname_11.doc) to me using the Assignment Submission link
I’ve put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 12: Revise Your Paper and
Submit Final Draft
I will have edited and commented on your preliminary paper draft, which
you submitted in Assignment 11; and I will have returned it to you by email
to your official UM email address as listed in Moodle.
Assignment. Revise your paper, taking into account the suggestions I
made on your preliminary draft.
Submitting your assignment. Final drafts of your paper are due on the day
scheduled in the syllabus – probably the Tuesday of the week before
finals. See the syllabus for details. Please name your assignment using
our normal scheme: firstname_lastname_12.doc or docx. Submit it to me
using the Assignment Submission link I’ve put on Moodle.