Upper-division Writing Requirement Review Form (2/11)

I. General Education Review – Upper-division Writing Requirement
Dept/Program / Subject: ANTY
Course # (i.e. ANTH 455) or sequence: 408
Course(s) Title: Advanced Anthropological Statistics
Description of the requirement if it is not a single course: Change from "Upper Division Requirement for the Major" to "Approved Writing Class"

II. Endorsement/Approvals
Complete the form and obtain signatures before submitting to Faculty Senate Office. Please type / print name, signature, and date.
Instructor: Randall R. Skelton (Phone / Email: 243-4245, randall.skelton@umontana.edu)
Program Chair: Gilbert Quintero
Dean: Christopher Comer

III. Type of request (New / One-time Only / Change / Remove): Change (X)
Reason for new course, change or deletion: Our department offers many "Upper Division Requirement for the Major" classes but few "Approved Writing" classes. This class could be either, so I want to change it to "Approved Writing" in the interest of anthropology students being able to fulfill their Gen Ed writing requirements.

IV. Overview of the Course Purpose/Description
This is a class in advanced (multivariate) statistics that also teaches scientific paper writing.

V. Learning Outcomes: Explain how each of the following learning outcomes will be achieved.
• Identify and pursue sophisticated questions for academic inquiry: Students choose a data set relevant to their interests and use it to test hypotheses in the course of their term project.
• Find, evaluate, analyze, and synthesize information effectively and ethically from diverse sources (see http://www.lib.umt.edu/informationliteracy/): Students need to do a literature review on the problem or hypothesis of their term project.
• Manage multiple perspectives as appropriate: Students have to present all sides of the issue upon which they are basing their term project.
• Recognize the purposes and needs of discipline-specific audiences and adopt the academic voice necessary for the chosen discipline: I teach the students to write in scientific style, with a slant toward scientific style in biological anthropology. I also explain to them why scientific style is important within this context.
• Use multiple drafts, revision, and editing in conducting inquiry and preparing written work: I edit a draft of the students' term paper and give it back to them for revision.
• Follow the conventions of citation, documentation, and formal presentation appropriate to that discipline: I teach the students how to use CBE/CSE citation style. They give a short presentation of their project in class at the end of the semester.
• Develop competence in information technology and digital literacy (link): The students use Web of Knowledge and other online databases (JSTOR is popular) to find articles for their term project/paper. They also need to find a copy of the data they will work with online.

VI. Writing Course Requirements
Enrollment is capped at 25 students. If not, list maximum course enrollment, explain how outcomes will be adequately met for this number of students, and justify the request for variance: Capped at 15.
Briefly explain how students are provided with tools and strategies for effective writing and editing in the major: I explain scientific writing in class and also provide abundant feedback on their assignments and term paper.
Which written assignment(s) includes revision in response to instructor's feedback? The term paper based on the term project.

VII. Writing Assignments: Please describe course assignments.
Students should be required to individually compose at least 20 pages of writing for assessment. At least 50% of the course grade should be based on students' performance on writing assignments. Quality of content and writing are integral parts of the grade on any writing assignment.

Formal Graded Assignments: The entire course grade (100%) is based on written assignments: 7 written assignments on statistical analyses, each about 5 pages in length, and five written assignments relating to the term paper, varying from 2 pages to perhaps 25 pages in length.
Informal Ungraded Assignments: None

VIII. Syllabus: Paste syllabus below or attach and send digital copy with form. For assistance on syllabus preparation see: http://teaching.berkeley.edu/bgd/syllabus.html
The syllabus must include the following:
1. Writing outcomes
2. Information literacy expectations
3. Detailed requirements for all writing assignments or append writing assignment instructions
Paste syllabus here.

ANTHROPOLOGY 408: Advanced Anthropological Statistics
Spring 2013, TR 8:10-9:30, SS 258
Dr. Randy Skelton
Office: 226 Social Sciences Building
Office Hours: MWF 8:00-8:50, TR 10:00-11:00
Phone: 243-4245
Email: randall.skelton@umontana.edu

GOALS AND OBJECTIVES
The goal of this class is to learn several advanced (multivariate) methods of data analysis and to learn the skill of writing a scientific paper. The focus will be on use of statistical software to perform analyses, with interpretation and write-up of the results obtained. Students who pass this class will:
• Learn to use several types of statistical analysis including multiple regression, principal components analysis, cluster analysis, discriminant analysis, and more.
• Explore how these analyses can be applied to novel situations by carrying out a project that involves the use of data analysis.
• Gain facility with a statistical software package such as SPSS.
• Build the ability to interpret the results of multivariate statistical analyses and express them in a professional manner.
• Become familiar with standard scientific paper style and format.
• Gain experience with finding sources through library databases.
• Come to appreciate the vast array of data that is available on the web.

ADMINISTRIVIA
Moodle Supplement
There will be a Moodle supplement for this class, where I will post various types of useful materials and information, including required materials. The people at IT Central in SS120 and Moodle Tech Support (243-4999, umonlinehelp@umontana.edu) can help you with access and technical issues. As your instructor I can only be responsible for content placed on Moodle – not for its administration or technical issues.

Required Materials
Text: Landau, Sabine and Everitt, Brian S., 2004. A Handbook of Statistical Analyses using SPSS. Chapman & Hall/CRC Press. Hereafter I will refer to this text as "the Handbook". The Handbook provides a walk-through of many of the methods we will be covering. The Handbook will be most useful to you when you are doing your assignments, and need not be read before coming to class.
Online Resources: For each week I have some required browsing listed. Some of this is for help with your assignment. There are many statistical texts online, some of which I have links to in the "Helpful Materials" section of the class Moodle supplement. The most useful of these materials are made available on the WWW by StatSoft Inc., Karl L. Wuensch (a professor in psychology at East Carolina University), and William K.
Trochim (a professor in policy analysis and management at Cornell University). Statistical Package: We will use SPSS. SPSS is available in the Fred W. Reed Social Sciences Research Lab (SSRL) and other campus computing labs. You may also buy SPSS. Other Software: I assume that you have, and know how to use, Microsoft Office products, especially Word and Excel. You may also download the free office package OpenOffice and use it instead, though I don’t guarantee that it operates exactly the same as MS Office. Computer access: You will need access to a computer with SPSS installed. SPSS is installed on computers for student use in the SSRL. We will have an orientation to the use of these labs by the SSRL staff early in the semester. Also, the computers in the UC 225 general student lab are supposed to have SPSS installed. You will need to show your GrizCard for access to the general student labs. Data Storage: You will need some mechanism for storing the data sets you use and the output from the statistical software. The best option for this is a USB flash drive (also known as a memory stick, pen drive, flash drive, etc.). How will this class work? 1. First 2/3 of the semester. We will explore several methods of advanced statistical analysis. The focus will be on using SPSS to perform the analysis, interpreting the output, and writing up the procedure in standard scientific paper format. We will meet at every normally scheduled class meeting time. Each week there is an assignment due and you will be expected to do the analysis requested, write up your results, and submit them to me by uploading them through Moodle. Moodle doesn’t have a way for me to send materials back to you, so I will do that using email to your official University email address (what I see in Moodle). You will need to either check your University email regularly or forward it to where you normally check your email. Most weeks there will be a lecture on Tuesday, and we will work with data on Thursday. 2. Last 1/3 of the semester. You will each do a project in which you analyze a data set of interest to you in order to draw some conclusions about some topic of interest to anthropology. Grad students should use the data set they are working with in their thesis or dissertation research, if possible. We will continue to meet for class, and I will use this time to explore and demonstrate additional statistical and analytical methods. I will not allow you to fall behind or put off the steps of the project until the end, and there is an assignment related to your project due every week. Grading For undergraduate students, your grade will be based on attendance, preparation, and participation (25%); weekly exercises you complete (30%); and your project (45%). For graduate students, your grade will be based on attendance, preparation, and participation (20%); weekly exercises you complete (30%); and your project (40%). and a short presentation of your project (10%). There are no examinations. Your score in the course will be calculated to yield your grade using this scale: A = 100-90, B = 89-80, C = 79-60, D = 59-50, F = <50. I may modify these basic grades with a + or - in special cases if I believe it is appropriate. Basic Grading Philosophy for This Class This class is not required for any students. Therefore, I assume that all students who have enrolled in the class have done so because they want to learn how to do data analysis. 
Given this, I will have little tolerance for any behavior which suggests that a student is trying to avoid learning the material. On the other hand, I encourage and try to reward behavior which suggests that a student is attempting to enhance how quickly or thoroughly they learn the material, how to minimize the effort involved in doing an analysis correctly, and similar wholesome strategies. I will assess your understanding of the material using assignments, and each student’s final write-up and presentation of their project. I will not give tests, because genuine understanding of this material is difficult to assess via a test, and because I do not want to encourage students to merely memorize material for a test. Attendance Policy Attendance is required at every class meeting except in the case of documented excusable absences (see the document online at http://www2.umt.edu/catalog/acpolpro.htm for University policy on excused absences). Attendance will constitute 20% of your grade. Policy on Collaboration and Use of Outside Resources: Students are encouraged to work and study together during the first 2/3 of the semester, including working together on completing the exercises. Additionally, there are many resources available on the internet and elsewhere, including model answers to most of the exercises in the textbook (see pp v-vi). I encourage you to use these to the extent that they enhance your understanding of the analyses being learned. My only requirement is that in your write-ups you must acknowledge your collaboration with other students and/or your use of these and other resources. There is never a penalty for working with other students or using additional resources so long as you acknowledge them. However, the privilege of collaboration and use of external resources does not extend to your required individual written solution to each exercise. Each student must write up the exercises independently using their own words. You should use these write-ups to show me that you understand the analysis being performed, how to make SPSS perform the analysis, and how to interpret the output generated by SPSS. In general, the way to do this is to provide a detailed explanation of why you took the steps you chose and how you drew any interpretations you made. Regretfully, I must punish infractions of this policy. If I find that two or more students have turned in write-ups that are copies, or which I judge to be “too similar”, I will split the credit for that assignment evenly between the students involved. If I detect an answer that is too similar to the model answer on the textbook website or to those on other websites that I know of, I will at most award that student half credit. During the last 1/3 of the semester each student will be working on their own individual data analysis project. You are welcome and encouraged to discuss your project with anybody who will sit still for it. However, you must write it up individually in your own words. Furthermore, you must acknowledge any help you got from fellow students, or anyone else, in the acknowledgment section of your final report. This principle also extends to published and online resources, which must be cited in your report and referenced in the bibliography of your report. Direct copying of published or online materials, or use of them without citation is considered plagiarism, a form of academic misconduct, and I am required by University policy give you zero credit for any assignment for which I detect it. 
Weekly Assignments
You will have an assignment to do (almost) every week. The assignment will be posted on Moodle. Each assignment is explicit in what I want you to do and what I want you to submit. Most of the assignments will also include practice in writing parts of a scientific research paper.

Project
Each student will complete a project that involves analysis of a data set of their choice, applied to an anthropological problem they are interested in. Certain milestones in the completion of the project (selection of a data set, analysis results, rough draft, and final draft) will be submitted, with one or another of these due each week. The format of the paper should be scientific research paper format, which you will learn over the course of the semester. Here are some things that I will expect to see in your research paper.
1. Five part scientific format, including the following sections: introduction, materials & methods, results, discussion, and conclusions.
2. The introduction should include at least a brief literature review of other studies that have been done in the area you are working on. A minimum of 10 sources should be discussed and cited in the text of this section. These sources should be referenced in the bibliography.
3. Your paper should include a bibliography. The citation or bibliography format should be according to one of the major journals in anthropology, such as American Anthropologist, American Journal of Physical Anthropology, etc. Alternatively, you can use CSE/CBE style. Online materials are acceptable if referenced properly, and there is a large amount of advice online about how to reference online or other electronic documents.

Submission Procedures
Weekly assignments and project fragments should be submitted via Moodle. This saves me time, saves you printing costs, saves trees, and (possibly most importantly) helps me avoid losing students' work. Assignments are due via Moodle before Tuesday at midnight during the week after they are listed in the syllabus. There will be a penalty of 20% of that assignment's score for each day (or fraction of a day) that an assignment is late. You can expect me to grade your assignments promptly and give you feedback via a grading form or via comments on your assignment returned to you via your official University email address.

Other Statistical Software
As the person who has to grade your assignments, I have to standardize on one statistical software package. For many reasons, I have chosen SPSS as our standard statistical software for this class. However, there are several other commercial, shareware, and freeware statistical software packages available. In particular, I am impressed with PAST. PAST will do almost everything SPSS does, though the output isn't as pretty or as easy to capture. PAST offers additional useful types of analysis that SPSS doesn't, such as cladistics, neighbor joining clustering, mixture analysis, and correspondence analysis. It has the best, fastest, and most flexible cluster analysis that I have ever seen. In my own research I use PAST 10 times more often than SPSS.

Advanced Anthropological Statistics: Provisional Schedule
This schedule is tentative and the topics might change as we go. Topics and readings will always be current on Moodle, so I have put the readings and other materials there. Thus, this schedule is a bare-bones list of topics, readings from the Handbook, and assignments. Required browsing is listed on Moodle.
Assignments are due before 12:00 midnight on the Tuesday of the week after the assignment is listed unless otherwise noted on Moodle. Lecture notes are on Moodle.

Week of 1/29: Getting Started; Intro to SPSS and the Labs
  Handbook: Chapter 1
  Assignment: Understand how the class will work; find some place to use SPSS
  Lectures: Lecture 1: Intro to the Class; Lecture 2: Types of Data and Sampling

Week of 2/5: Basic Statistics (Descriptive & Inferential Stats)
  Handbook: Chapters 2 & 3
  Assignment: Assig 1: Descriptive & Inferential Stats
  Lectures: Lecture 3: Descriptive Statistics; Lecture 4: Inferential Statistics; Lecture 5: Frequencies, Data Transformation, and Capturing Output

Week of 2/12: Multiple Regression
  Handbook: Chapter 4
  Assignment: Assig 2: Multiple Regression
  Lectures: Lecture 6: Regression & Correlation; Lecture 7: Working with Non-Linear Data in Regression

Week of 2/19: Principal Component Analysis
  Handbook: Chapter 11
  Assignment: Assig 3: PCA and FA
  Lectures: Lecture 8: Principal Components; Lecture 9: The Search for Significant Relationships

Week of 2/26: Discriminant Analysis
  Handbook: Chapter 12
  Assignment: Assig 4: Discriminant Analysis
  Lectures: Lecture 10: Discriminant Analysis; Lecture 11: Decisions, Decisions

Week of 3/5: Cluster Analysis
  Assignment: Assig 5: Cluster Analysis
  Lectures: Lecture 12: Cluster Analysis; Lecture 13: Constructing & Reading Dendrograms

Week of 3/12: Clustering PC's and DF's
  Assignment: Assig 6: Clustering PC & DF
  Lectures: Lecture 14: Clustering PC's and DF's; Lecture 15: Causes of Similarity Between Things; Lecture 16: Logistic Regression

Week of 3/19: ANOVA
  Handbook: Chapters 5 & 6
  Assignment: Assig 7: ANOVA
  Lectures: Lecture 17: ANOVA; Lecture 18: Character Coding and Missing Data

Week of 3/26: Beginning Your Project
  Lectures: Lecture 19: Finding a Data Set; Lecture 20: Scientific Paper Format

Week of 4/2: **SPRING BREAK**

Week of 4/9: Project week 1
  Assignment: Assig 8: Find a Data Set; Assig 9: Make a Research Plan
  Lectures: Lecture 21: Advanced Nominal Data Methods: Multidimensional scaling, classification trees, and correspondence analysis; Lecture 22: How to Treat Males and Females in an Analysis

Week of 4/15: Project week 2
  Assignment: Assig 10: Data Analysis Results
  Lectures: Lecture 23: Multivariate Statistics in Forensic Anthro; Lecture 24: The Effect of Admixture on DF for Ancestry

Week of 4/23: Project week 3
  Lectures: Lecture 25: Models of Genetic Similarity; Lecture 26: Intro to Some Other Statistical Software

Week of 4/30: Project week 4
  Assignment: Assig 11: Preliminary Draft
  Lectures: Lecture 27: Cladistics; Lecture 28: Data Mining

Week of 5/7: Project completion; project presentations by undergrad students
  Assignment: Assig 12: Revise Paper, Turn In Final Draft (due by midnight Sunday, May 12)

Finals Week: Thursday, May 16, 8:00 to 10:00. Meet for project presentations by grad students.

Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 1: Descriptive and Inferential Statistics

For your first formal assignment I want you to focus on the following practical tasks:
• starting and running SPSS;
• downloading the main dataset we will use for these assignments and loading it into SPSS;
• generating descriptive statistics (means and standard deviations) for the data;
• plotting data using a histogram;
• data set subsectioning; and
• testing the hypothesis that two means are equal.
This sounds like a lot to accomplish, but it won't actually take you very long to run the analyses, because you are using SPSS.
Remembering back to your introductory statistics class, where you did calculations with a hand calculator, even calculating the mean for a large data set might take several minutes. For the 1853 individuals in the data set we will be working with it would take you a very long time. If you could enter one individual's measurement into the calculator every 5 seconds, it would take you just under 3 hours to calculate the mean for one variable. Of course, in entering and adding 1853 data values the probability that your fingers pressed the wrong button for at least one of them is very high. With SPSS you can obtain means and standard deviations for all 20 variables for the 1853 individuals in just a few seconds, and you can be sure that the result is accurate because the data was entered accurately.

I. Starting and running SPSS. Get SPSS running. If you haven't a clue how to do this, the first part of the document at http://www.csub.edu/ssrictrd/SPSS/SPSS11-1/11-1.htm has good instructions on opening and closing SPSS.

II. Downloading the data set. I have a data set on Blackboard in the "Assignments" section, named anth402data.xls. Download this data to the computer you are using and store it on your memory stick. The data file is in Microsoft Excel format, which is a common format that you will want to import data from. Also download the file anth402data_codebook.htm, which is the "codebook" for this data set. A codebook is a document that explains what the data variables in the data set are. This file is an HTML file, which should be viewable using a web browser or any modern word processor.

III. Load the data into SPSS. Note that SPSS will directly load Excel spreadsheets. If you don't know how to do this, the document (Wuensch, 2005, "Importing Data To SPSS From Excel") at http://core.ecu.edu/psyc/wuenschk/SPSS/Excel2SPSS.doc will show you how. Once you have the data loaded, save it as a native SPSS file. SPSS data files have the extension '.sav'. To do this, click "File" on the taskbar, choose "Save as", then if necessary choose the SPSS file type in the "File Type" window. Make sure that the file will be saved on your memory stick, then click OK.

IV. Prepare a results file. Open Microsoft Word (or an equivalent word processor). Type in your name and "Assignment 1" at the top. Save this file with the name "firstname_lastname_1.doc" to your memory stick, where firstname_lastname is your first and last names joined by an underscore. For example, my results file would be named "Randy_Skelton_1.doc". You will use this file to store and save the results from your analyses described below.
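(Optional, for comparison only: students who already know some Python may find it useful to see a rough equivalent of the download-and-import steps above using the pandas library. Python is not required for this class, and this sketch does not replace the SPSS steps; it assumes the file name given above and that the xlrd package is installed for reading .xls files.)

# Optional illustration only; the assignment itself must be done in SPSS.
import pandas as pd

data = pd.read_excel("anth402data.xls")   # the same file you downloaded above
print(data.shape)                         # number of individuals and number of variables
print(data.columns.tolist())              # variable names, as documented in the codebook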
V. Descriptive statistics. The term "descriptive statistics" refers to those statistics that describe a data set's central tendency (mean or median), its dispersion (range, standard deviation, or variance), and, perhaps, its degree of departure from a true normal curve (skewness and kurtosis). The document at http://www.ats.ucla.edu/stat/spss/output/descriptives.htm is an "annotated output document" for the descriptive procedures. An annotated output document explains each item of the output in detail in case you don't understand what the tables and figures in the output mean. Here are the analyses you should do.

A. Use the SPSS descriptives procedure [Analyze -> Descriptive Statistics -> Descriptives] to find the means, standard deviations, minimum values, and maximum values for the variables GOL, NOL, BNL, BBH, and XCB. Your output should look something like the table below (don't expect your numbers to match exactly). Examine the output produced to see whether it gave you the results you were expecting. Copy and paste the table that looks like the one below into your results file. In your results document, change the table label from "Descriptive Statistics" to "Descriptive Statistics: All Populations". To do this, right click on the table and choose "Edit Picture" from the menu. Now, click on the "Descriptive Statistics" label, and you should be able to access the text box to change the label. Save your results file. There is no need to save your file using a different name.

Descriptive Statistics
                     N      Minimum   Maximum   Mean     Std. Deviation
GOL                  3047   151       212       179.39   8.713
NOL                  3047   151       209       177.06   8.093
BNL                  3047   80        125       99.24    5.896
BBH                  3047   103       159       131.81   7.273
XCB                  3047   116       167       136.99   7.371
Valid N (listwise)   3047

B. Use the SPSS select cases procedure [Data -> Select Cases -> If condition is satisfied] to set a filter for only the Norse and the Zalavar populations. To choose only the Norse and the Zalavar data, type ANY(POPNAME,"Norse","Zalavar") into the "Select cases if" window. This command says to choose any individual case that has a value for POPNAME of Norse or Zalavar. Note that Norse and Zalavar must be inside quotes because they are not numbers, and that SPSS is case sensitive for text items. Now, generate the descriptive statistics for the variables GOL, NOL, BNL, BBH, and XCB, using only the Norse and Zalavar samples. Your output should look something like the table above. Examine this table and note that the sample size (N) is much less for this analysis, because you are only using the Norse and Zalavar – not all the individuals. Copy and paste the table that looks like the one above into your results file. Change the table label from "Descriptive Statistics" to "Descriptive Statistics: Norse and Zalavar", and save your results file.

C. Use the SPSS explore procedure [Analyze -> Descriptive Statistics -> Explore] to list the descriptive statistics for GOL, separately for the Norse and Zalavar samples. Unless you have changed something, SPSS will still be using filtering that chooses only the individuals who are Norse or Zalavar. In the "Dependent List" window enter GOL. In the "Factor List" window enter POP (here we have to use a number and POP is the number equivalent of POPNAME, where 1 = Norse, 2 = Zalavar, etc.). Click on the "Plots" button and choose a histogram type of plot. Click "Continue" and click "OK" to run. You should get a table that looks like the one below. Examine this table and convince yourself that the means (and other possibly interesting information) are given separately for the Norse and the Zalavar. You should get two histograms that look like the one below, one for POP = 1 and one for POP = 2. Note that these histograms only vaguely resemble a normal curve. Copy and paste the table that looks like the one below into your results file. Change the table label from "Descriptives" to "Descriptives: Norse and Zalavar Separately". Copy and paste the two histograms that look like the one below into your results file, and save your results file.

Descriptives (GOL)
                                  POP = 1 (Norse)    POP = 2 (Zalavar)
Mean                              184.17             181.28
Std. Error of the Mean            .619               .701
95% CI for Mean, Lower Bound      182.94             179.89
95% CI for Mean, Upper Bound      185.40             182.67
5% Trimmed Mean                   184.11             181.37
Median                            184.00             181.00
Variance                          42.580             51.154
Std. Deviation                    6.525              7.152
Minimum                           170                165
Maximum                           201                196
Range                             31                 31
Interquartile Range               10                 11
Skewness (Std. Error)             .170 (.229)        -.085 (.237)
Kurtosis (Std. Error)             -.629 (.455)       -.569 (.469)

[Figure: Histogram of GOL for POP = 1 (Frequency vs GOL); Mean = 184.17, Std. Dev. = 6.525, N = 111]
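(Optional, for comparison only: the descriptive steps in Part V, and the t-test described in Part VI below, can be sketched in Python with pandas and scipy. This is not part of the assignment; it assumes the same data file and the POPNAME and GOL variable names used above, and its numbers will only match SPSS approximately.)

# Optional illustration only; the assignment itself must be done in SPSS.
import pandas as pd
from scipy import stats

data = pd.read_excel("anth402data.xls")
variables = ["GOL", "NOL", "BNL", "BBH", "XCB"]

# Part V.A: descriptive statistics for all populations
print(data[variables].agg(["count", "min", "max", "mean", "std"]))

# Part V.B: keep only the Norse and Zalavar cases (like Data -> Select Cases)
subset = data[data["POPNAME"].isin(["Norse", "Zalavar"])]
print(subset[variables].agg(["count", "min", "max", "mean", "std"]))

# Part V.C: GOL summarized separately for each sample (like the Explore procedure)
print(subset.groupby("POPNAME")["GOL"].describe())

# Looking ahead to Part VI: Levene's test, then the independent samples t-test for GOL
norse = subset.loc[subset["POPNAME"] == "Norse", "GOL"].dropna()
zalavar = subset.loc[subset["POPNAME"] == "Zalavar", "GOL"].dropna()
print(stats.levene(norse, zalavar))
print(stats.ttest_ind(norse, zalavar, equal_var=True))  # use equal_var=False if Levene's test rejects equal variances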
VI. Inferential Statistics. Inferential statistics refers to hypothesis testing. As you no doubt remember from introductory statistics, this often involves constructing a confidence interval based on a mean and some function of the standard deviation. SPSS often does this slightly differently, but the principle is the same. For this assignment, we will use an independent samples t-test to test the hypothesis that the Norse and Zalavar have the same means for GOL. Use the SPSS independent samples t-test procedure [Analyze -> Compare Means -> Independent Samples T Test]. Enter GOL into the "Test Variables" window. Enter POP into the "Grouping Variable" window. Click on the "Define Groups" button and tell SPSS that Group 1 is 1 (Norse) and Group 2 is 2 (Zalavar). Click "Continue", then click "OK". Your output should include a table that looks similar to the table below. Copy and paste this table into your results file, and save your results file.

Independent Samples Test (GOL)
Levene's Test for Equality of Variances: F = .499, Sig. = .481
t-test for Equality of Means:
  Equal variances assumed:     t = 3.131, df = 195,     Sig. (2-tailed) = .002, Mean Difference = 3.090, Std. Error Difference = .987, 95% Confidence Interval of the Difference = 1.144 to 5.036
  Equal variances not assumed: t = 3.110, df = 184.800, Sig. (2-tailed) = .002, Mean Difference = 3.090, Std. Error Difference = .994, 95% Confidence Interval of the Difference = 1.130 to 5.050

SPSS almost always presents probabilities that a null hypothesis is true in a column labelled "Sig". Note that SPSS rounds probabilities to three decimal places. This means that "Sig" values are often presented as ".000". This does not mean that the probability is actually zero – it might be .000248 or some similar value, which is rounded to .000 using three decimal places. The correct way to refer to a probability of ".000" is "less than .001" or "< .001".

To finish this assignment, answer the following four questions at the bottom of your results file. The document at http://statistics-help-forstudents.com/How_do_I_interpret_data_in_SPSS_for_an_independent_samples_T_test.htm#.UNnd_Vvcjac is a good annotated output document that explains how to interpret the SPSS t-test results. This document should be opened in Mozilla Firefox or the tables won't display correctly.
1. What is the null hypothesis about the means that is being tested in this analysis?
2. The table you copied to your results file has a section about the Levene's test for equality of variances. Given the results of this test, should you use the t-test results for "Equal variances assumed" or for "Equal variances not assumed"? Explain your reason for this choice.
3. What is the probability that the null hypothesis about the means (that you gave as the answer to question 1) is true?
4. Do you accept or reject the null hypothesis about the means? What do you then conclude about the means?

VII. Submitting your assignment. Add to your results document an "Acknowledgements and Bibliography" section in which you acknowledge your collaborators and sources. Submit your results file (named firstname_lastname_1.doc) to me through the Assignment Submission link I've put on Moodle.

Bibliography
Information Technology Services at The University of Texas at Austin, 2001. SPSS for Windows: Descriptive and Inferential Statistics. <http://www.utexas.edu/its/rc/tutorials/stat/spss/spss2/> (Version current as of May 30, 2006).
Nelson, Edward, et al., 2000. Chapter One: Getting Started With SPSS for Windows, in SPSS for Windows 11.0: A Basic Tutorial. <http://www.csub.edu/ssrictrd/spss/SPSS11-1/11-1.htm> (Version current as of May 30, 2006). Statistics-help-for-tudents.com, 2008. How do I interpret data in SPSS for an independent samples T-test?. <http://statistics-help-forstudents.com/How_do_I_interpret_data_in_SPSS_for_an_independent_samples_T_tes t.htm> (Version current as of December 25, 2012) UCLA Academic Technology Services, n.d., SPSS Class Notes: Exploring Data. <http://www.ats.ucla.edu/stat/spss/notes2/exploring.htm> (Version current as of May 30, 2006). UCLA Academic Technology Services, n.d., Annotated SPSS Output Descriptive statistics. <http://www.ats.ucla.edu/stat/spss/output/descriptives.htm> (Version current as of May 30, 2006). Wuensch, Karl L., 2005. Importing Data To SPSS From Excel. <http://core.ecu.edu/psyc/wuenschk/SPSS/Excel2SPSS.doc> (version current as of May 28, 2006). Advanced Anthropological Statistics (ANTY 408) ASSIGNMENT 2: Multiple Regression For this assignment I want you to focus on the following practical tasks: • learn how to do a multiple regression analysis using SPSS; • learn how to interpret the output SPSS provides (including examining and testing hypothesis about regression, correlation, and prediction); and • beginning the process of learning to write scientific reports. I. Scientific Report Format. Scientific reports usually follow a conventional format. Most of the anthropology faculty, and certainly the editors of any journal you might want to submit an article to, will want you to adhere to this format. For most students, scientific report format seems odd and constraining at first. This is because almost everything we learned about writing we learned in an English class. In English classes, you learn how to write stories. Even the most in-depth analysis is framed in the form of a story. A story has the characteristic that it begins with a statement of a problem. It then moves fluidly through a series of processes that lead to the conclusion of the problem. Finally, the problem is concluded in some manner. A moment’s reflection will convince you that this style applies to novels as well as to most non-scientific nonfiction you have read. This is a good style in many ways. It is designed to capture the reader’s attention and to keep the reader’s attention to the end. The assumption is that the reader is going to read the work from start to finish. Scientific reports however, start with a different assumption. They assume that the reader will use the report as a reference. That is, that the reader is probably not going to read the work from start to finish, but instead to add it to a collection of similar works and refer to it for specific items of information as necessary. In other words, a scientific report is used more like an encyclopedia than like a novel. I know that there are some people who read the encyclopedia cover to cover (I am one), but, let’s face it, those people are weird (proudly so in my case). Given this, although scientific reports have some element of the basic story pattern to them, they sacrifice the fluid movement through the steps of a solution in order to compartmentalize information. When information is compartmentalized it is easy to find. This style makes scientific reports difficult to read, as many of you have no doubt observed. 
However, this difficulty disappears as you gain experience reading scientific articles (which is why your professors often assign a lot of them). As you gain experience working with scientific reports, you will find that the convenience of being able to easily locate information far outweighs the inconvenience of not having a coherent "plot line" to the story.

Scientific format consists of five parts:
• introduction: this is where the problem is described and given a context;
• materials and methods: this is where the materials used are described, along with the analyses performed upon them;
• results: this is where the results of the analysis are presented, free from interpretation;
• discussion: this is where the results are interpreted; and
• conclusions: this is where some conclusion to the problem is reached and stated.
Most scientific reports will also include an abstract at the beginning, and a bibliography at the end. You can see how the scientific format compartmentalizes information, as a service to the reader who is looking for particular types of information. For example, if the reader (usually another researcher) wants to find out which SPSS procedure the author(s) of the article used, he or she can turn directly to the materials and methods section to find this information, without having to read through the entire body of the article looking for it.

II. Introductions. For this assignment we will focus on introduction sections. Recall that this is the section in which the problem is introduced and given a context. There are three main things that are normally done in an introduction section:
• a statement of the problem in context;
• a literature review; and
• hypotheses to be tested, or other focus of the research.
The statement of the problem in context and literature review may be combined and integrated. By convention the hypotheses to be tested (or other focus of the research) are at the very end of the introduction section, which makes them easy to locate. The statement of the problem in context should be in "funnel" format. Funnel format means that you start with the broadest possible context, then proceed to narrow it down until the "pointy end" of the funnel leads directly to the exact problem being examined in the report. Here is an example, using very terse statements that you would expand to sentences or paragraphs in an actual report. The topic is estimating stature from femur length in Asian immigrants to the U.S.

Statement: Crime exists.
Purpose: Provide the broadest possible context by giving a broad statement about the nature of the world that is of interest to a broad group of people.

Statement: Some crimes result in unidentified skeletal remains.
Purpose: Narrows the focus to a certain group of crimes, which are still of interest to a broad group of people.

Statement: Stature is an important part of the description of a missing person.
Purpose: Connects stature to identity and to the process of identifying skeletal remains in criminal cases. Narrows the focus to stature estimation.

Statement: There is a physical (possibly genetic) relationship between femur length and stature.
Purpose: Provides a reason why your method might work. Narrows the focus to stature estimation from femur length.

Statement: There does not exist a formula for estimating stature from femur length for Asian Americans.
Purpose: Presents a practical problem to be solved. Narrows the focus to a specific population.

Statement: Therefore, I propose to develop a formula for estimating stature from femur length for Asian Americans.
Purpose: States your exact project.
Narrows the focus to your exact project. Notice how each statement narrows the focus of interest. After being led down this funnel, the reader is clear about what the problem is and why it is interesting and important. Notice how the statements made can also guide you as to what literature you would need to review in order to carry out the project. III. Assignment part 1: Writing an introduction. We will focus in this assignment in figuring out what to put in the introduction section. I am not interested in having you review literature for this assignment, but I do want you to think about “funnel” format as described above. Begin by creating a results document to store your answers and results to be submitted. Open a new Microsoft word document and put your name and assignment 2 at the top. Save this file to your memory stick as firstname_lastname_2.doc (where firstname is your first name, and lastname is your last name). Now, assume that you have a business that makes high quality custom hats for people. Unfortunately, you are just starting the business and can’t afford the instrument that most hat makers would use to measure the length of the head from front to back, GOL in our data set, although you do own the less expensive instrument for measuring skull breadth at several points. Therefore you need to find a method for estimating GOL from measures of the width of the head: XCB, XFB, STB, ZYB, and AUB in our data set (the B in all these abbreviations stands for Breadth). In your results document, create a table similar to the one above. The first statement should be “everybody wears a hat sometimes”, and the last statement should be “therefore, I will develop a formula for estimating head length from five measures of head breadth”. Fill in the missing statements in between. For each statement, including the ones I just gave you, identify the purpose of the statement for achieving the “funnel” effect. IV. Assignment part 2: The Analysis. Use the SPSS regression procedure [Analyze – Regression – Linear] to perform a multiple regression analysis to develop a formula for estimating GOL (the dependent variable) from XCB, XFB, STB, ZYB, and AUB (the independent variables). Use all the individuals in the data set (i.e. don’t use any form of case selection). Choose the stepwise regression method, in the “Method” box that is immediately under the list of independent variables. Now we are going to “trick” SPSS into showing the results for multiple regression analyses using 1 variable, 2 variables, 3, 4, and all 5 variables. To do this, while the “Linear Regression” window is showing, click on the “Options” button and make sure that the “Use probability of F” radio button is selected, then in the box for “Entry” type in 0.99 and in the box for “Removal” type in 1.0. Click continue. SPSS uses these value to decide whether adding another variable to the regression formula would make it significantly more accurate, and whether removing a variable from the regression formula (in the backward or remove methods) would make it significantly less accurate. By default, “significantly” more or less accurate are set to 0.05 for entry and 0.10 for removal, meaning that if SPSS is 95% confident that adding the variable will make the formula more accurate then the variable will be added; and if SPSS is 90% confident that removing a variable will not make the formula less accurate, then it will be removed. 
Setting the “Entry” value to 0.99 tell SPSS that it only needs to be 1% confident that adding a variable will increase the accuracy of the formula. The number in the “Removal” box must be larger than the number in the “Entry” box, so we will set it to the highest possible value, which is 1.0. Finally, click “OK” to run the analysis. The document at http://core.ecu.edu/psyc/wuenschk/MV/multReg/intromr.docx does a good job of explaining all the concepts involved with multiple regression and how to implement them in SPSS. Save the output produced by SPSS, by exporting it to Word format. Then add it to your results document. V. Assignment part 3: Interpretation. Answer the following questions at the bottom of your results document. The document at http://www.ats.ucla.edu/stat/spss/output/reg_spss.htm is an annotated output document for the SPSS regression procedure, and you should refer to it while answering these questions. 1. What is the multiple regression formula for estimating GOL from XCB, XFB, STB, ZYB, and AUB. You want the formula that will allow you to take measurements of XCB, XFB, STB, ZYB, and AUB, plug and chug, and generate an estimate of GOL. [Hint, use the unstandardized coefficients from the coefficients table.] 2. What is the multiple correlation between the independent variables XCB, XFB, STB, ZYB, AUB and the dependent variable GOL? [Hints: Look at the Model Summary table. SPSS symbolizes multiple correlation as R.] 3. What is the amount of variability in GOL explained by the combined effects of XCB, XFB, STB, ZYB, and AUB? [Hint: this is asking for a coefficient of determination. You remember how these are calculated from a correlation, don’t you?] 4. Say that you obtained the following measurements from an individual: XCB = 139mm, XFB = 117mm, STB = 114mm, ZYB = 140mm, AUB = 121mm. What is your estimate for this individual’s GOL? You will need to use a calculator. Show your work. 5. What is a 95% confidence interval (i.e. based on two standard errors in either direction) for your estimate) of GOL? [Hint: Look at the Model Summary table for some critical information.] 6. Are all the coefficients in your multiple regression formula (including the constant) significantly different from zero? Which are not significantly different from zero and how do you know? [Hint: You only need to look at model 5, and if you asked for confidence intervals as the instructions specified the work has been done for you in the Coefficients table.] 7. What is the single most important of the breadth measurements for estimating GOL? Why does this variable seem to be the most important one? 8. As the hat maker, you are most interested in finding the simplest regression formula for estimating GOL. What is the regression formula for estimating GOL from the single most important breadth measurement? VI. Submitting your assignment. Add to your results document an “Acknowledgements and Bibliography” section in which you acknowledge your collaborators and sources. Send your results document (named firstname_lastname_2.doc) to me using the Assignment Submission link I’ve put on Moodle. Bibliography UCLA Academic Technology Services, n.d., Annotated SPSS Output: Regression Analysis. <http://www.ats.ucla.edu/STAT/SPSS/output/reg_spss.htm> (Version current as of May 31, 2006). Wuensch, Karl, 2012. A Brief Introduction to Multiple Correlation/Regression Analysis. <http://core.ecu.edu/psyc/wuenschk/MV/multReg/intromr.docx>. (Version current as of December 25, 2012). 
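(Optional, for comparison only, before moving on to Assignment 3: an ordinary, all-variables-entered version of the regression in Part IV can be sketched in Python with statsmodels. This is not a substitute for the SPSS procedure and does not reproduce the stepwise output; it assumes the same data file and variable names used above.)

# Optional illustration only; the assignment itself must be done in SPSS.
import pandas as pd
import statsmodels.api as sm

data = pd.read_excel("anth402data.xls")
breadths = ["XCB", "XFB", "STB", "ZYB", "AUB"]        # the five head-breadth predictors
X = sm.add_constant(data[breadths])                   # adds the intercept term (SPSS "Constant")
model = sm.OLS(data["GOL"], X, missing="drop").fit()  # all five predictors entered at once

print(model.params)           # unstandardized coefficients: the prediction formula for GOL
print(model.rsquared ** 0.5)  # multiple correlation R
print(model.rsquared)         # coefficient of determination (R squared)
print(model.conf_int())       # 95% confidence intervals for each coefficient

# Predicting GOL for one new individual (the measurements in question 4);
# the column order must match X: constant first, then the five breadths.
new_case = pd.DataFrame([[1, 139, 117, 114, 140, 121]], columns=["const"] + breadths)
print(model.predict(new_case))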
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 3: Principal Components & Factor Analysis

For this assignment I want you to focus on the following practical tasks:
• learn how to do principal components analysis using SPSS;
• learn how to do factor analysis using SPSS;
• learn how to interpret the output SPSS provides; and
• continue the process of learning to write scientific reports by examining the materials and methods section.

I. Assignment part 1: The Analysis. We will do both a principal components analysis and a factor analysis. Begin by creating a results document for storing your answers and results to be submitted. Open a new Microsoft Word document and put your name and "Assignment 3" at the top. Save this document to your memory stick as firstname_lastname_3.doc (where firstname is your first name and lastname is your last name).

A. Principal Components Analysis (sometimes misspelled "principle" components analysis). The document at http://core.ecu.edu/psyc/wuenschk/MV/FA/PCA-SPSS.docx explains the concepts of principal components as implemented by SPSS, and how to perform a principal components analysis. The author, Karl L. Wuensch, has a different point of view from mine on how factor analysis and principal component analysis differ, but that should not be a problem. Use the SPSS factor procedure [Analyze – Dimension Reduction – Factor] to perform a principal components analysis of the class data set. Use all the variables except ID, POP, POPNAME, and SEX (it won't give you POPNAME as a choice). Click the "Extraction" button and make sure that the "Method" is "Principal components", "Extract" is set to "Based on Eigenvalue", and that "Unrotated Factor Solution" is selected in the "Display" area. Click "continue" to get back to the Factor window. Click the "Rotation" button, and make sure that "None" is selected for "Method". Click "continue" to get back to the factor window, then click "OK" to run the analysis. Your output window will call this a "Factor" analysis, even though it's only a Principal Components analysis at this point. Save the output produced by SPSS, by exporting it to Word format. Then add it to your results document.

B. Factor Analysis. The document at http://core.ecu.edu/psyc/wuenschk/MV/FA/FA-SPSS.docx continues the discussion in the document I gave you above for principal components analysis (with a somewhat skewed point of view in my opinion). Use the SPSS factor procedure [Analyze – Dimension Reduction – Factor] to perform a factor analysis of the class data set. Use all the variables except ID, POP, POPNAME, and SEX. Click the "Extraction" button, set the "Method" to "Principal axis factoring". Click "continue" to get back to the Factor window. Click the "Rotation" button, and set the "Method" to "Varimax". Click "continue" to get back to the factor window, then click "OK". Save the output produced by SPSS, by exporting it to Word format. Then add it to your results document.
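(Optional, for comparison only: here is a rough Python sketch of the two analyses above using scikit-learn. It is not part of the assignment, and do not expect the numbers to match SPSS: scikit-learn's FactorAnalysis uses maximum likelihood estimation rather than principal axis factoring, and the varimax rotation option requires a recent scikit-learn version. The file and variable names are the ones used above.)

# Optional illustration only; the assignment itself must be done in SPSS.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, FactorAnalysis

data = pd.read_excel("anth402data.xls")
measurements = data.drop(columns=["ID", "POP", "POPNAME", "SEX"]).dropna()

# Standardize so that the analysis works from the correlation matrix, as SPSS does
z = StandardScaler().fit_transform(measurements)

# Principal components analysis
pca = PCA().fit(z)
print(pca.explained_variance_)        # eigenvalues; SPSS's default keeps components with eigenvalue >= 1
print(pca.explained_variance_ratio_)  # proportion of total variance explained by each component
print(pca.components_)                # coefficients (loadings), one row per component

# Factor analysis with varimax rotation (four factors, matching the components kept above)
fa = FactorAnalysis(n_components=4, rotation="varimax").fit(z)
print(fa.components_)                 # rotated factor loadings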
II. Assignment part 2: Interpretation. Answer the following questions at the bottom of your results document.

A. Interpreting the results of PCA. The document at http://www.ats.ucla.edu/stat/spss/output/principal_components.htm is an annotated output document for the SPSS principal components procedure, and you should refer to it while answering these questions.
1. What proportion of the variation in WCB can be explained by the principal components?
2. How much of the total variation in the entire data set can be accounted for by the first 4 principal components?
3. Why were only 4 principal components (out of 20) selected as "significant"? What is the significance of an eigenvalue of 1?
4. Interpreting and naming the principal components. I will do the first three to show you how, then ask you to name the fourth one.

Interpretation of principal components or factors involves examining the coefficients for each principal component or factor as presented in an appropriate SPSS output table. The size of the coefficient is called its "loading". If a principal component or factor coefficient for a certain variable is large, we say that this variable "loads highly" or "has a high loading" on this principal component or factor. A variable has a "small loading" on a principal component or factor if the coefficient for that variable is small for that PC or factor. We also describe loadings as positive or negative, depending on whether the coefficient is positive or negative.

Principal component 1 (PC1) has the classic signature of a size component. All of the variables have positive loadings and all but three of them are high loadings (let's say high = over 0.50 for this case). PC2, PC3, and PC4 have a mixture of positive and negative loadings, and so must represent shapes rather than sizes, where these shapes are defined by contrasts in dimensions. To interpret PC2, PC3, and PC4, we need to know more about the variables. All of the variables that end in an L (GOL, NOL, etc.) are length variables, meaning that they are measured in the anterior (front) to posterior (back) dimension. All of the variables that end in B (XCB, STB, etc.) are breadth variables, meaning that they are measured in the side to side (right to left) dimension. All of the variables that end in H are height variables, meaning that they are measured in the superior (up) to inferior (down) direction. Further, the first 10 variables, GOL through WCB, are measurements of the entire skull. The next 9 variables, ASB through MAB, are measurements of the face. The 20th variable, MDH, is the height of the mastoid process, which is located behind the ear. Ignoring MDH, we have a set of skull lengths, breadths, and heights; and a set of face lengths, breadths, and heights.

Looking at the loadings (coefficients) for component 2, we see that there are both positive and negative loadings. The largest single positive loading is .679 for BPL, a face length; and the single largest negative loading is -.615 for STB, a skull breadth. Therefore, we know that this component contrasts lengths and breadths, at first glance contrasting face length with skull breadths. We must modify this slightly, however, when we examine which other variables have high or moderate loadings. We note that GOL, NOL, and BNL, all skull lengths, also have moderately large positive loadings compared to the other variables. So now we know that the length being contrasted here is both face length and skull length. When we examine which other variables have high negative loadings, we find XCB and XFB, which are both skull breadths like STB. So now we know that this principal component contrasts skull breadth with face and skull length. Therefore, I will call it "skull breadth vs face and skull length". Examining PC3, we see that the single highest positive loading is for NLB, a face breadth; and that the loading for STB, a skull breadth, is nearly as high and positive. We also see that the single highest negative loading is for OBH, a face height; and that NPH and NLH, also face heights, also have high negative loadings.
Therefore, this component primarily contrasts face and skull breadth with face height, and I will call it “face and skull breadth vs face height”. Apply this type of reasoning to PC4 to give it a name that describes the shape contrast it represents. B. Interpreting the results of FA. The document at http://www.ats.ucla.edu/stat/spss/output/factor1.htm is an annotated output document for SPSS factor analysis. 1. Are there the same number of “significant” factors as there were significant principal components? 2. Examine the Factor Matrix, which presents the loadings (coefficients) of the variables before rotation. Have the loadings changed significantly from your PCA? Explain why you would or would not give these factors the same names as their corresponding principal components. 3. Examine the Rotated Factor Matrix, which presents the loadings (coefficients) of the variables after rotation. Have the loadings changed significantly from those in the Factor Matrix? Explain whether it seem to you as if rotation made the pattern of high and low loadings more interpretable or less interpretable than they were before rotation? Rotation of the factors, and varimax rotation in particular, seeks to make some factors larger while making others small. Therefore, we need to switch our strategy for interpreting the factors slightly. Instead of focusing on the contrasts (large positive vs large negative), focus only on which variable exhibit high positive loadings and then figure out what they represent. 4. Do the new loadings change your interpretation and naming of the factors? I will interpret factors 1 and as examples, and leave the remaining two factors for you to interpret and name. Factor 1 Unrotated factor 1, like principal component 1 is clearly a size variable (all variables have fairly high positive loadings). However, after rotation, only GOL, NOL, BNL, and BPL have high loadings. Therefore, this factor now represents skull and face length instead of general size, and I will call it the “skull and face length” factor, or perhaps more simply, the “length” factor. Unrotated factor 2 seems to contrast skull breadth (XCB, XFB, STB) with face length (BPL). Rotated factor 2, however, seems to represent skull breadth, with maybe some smaller contribution of face breadth. I will call this the “skull breadth” factor. III. Scientific Report Format: The Materials and Methods Section. The materials and methods section is the second of the five parts of a scientific document. As you might infer from the name, this is the section in which the writer describes the materials used in the analysis and the methods used to analyze them. It is normally divided into two subsections – you guessed it – a materials subsection and a methods subsection. A. B. The Materials Subsection. In this subsection you should present all the information known about the materials (specimens, subjects, items, etc.) analyzed. Include citations to published descriptions if possible. I will list the important items of information to include and provide examples based on the Boaz anthropometric database. 1. What types of materials are these? The Boaz database is described by Jantz et al. (1992). It presents measurements and other data about more than 15,000 primarily Native American individuals of both sexes and all ages, with some persons of other populations incidentally included. 2. Who collected the information and when? The data were collected by people hired and trained by Franz Boaz, in the late 1800's. 3. 
What measurements or other data are included? The data include last name, first name, age group, age, whether age is an estimate or exact, birthplace, sex, age, tribe, band, purity, blood quantum, mother’s tribe, father’s tribe, occupation, standing height, shoulder height, finger height, finger reach, sitting height, shoulder width, head length, head breadth, face height, face breadth, nose height, nose breadth, ear height, hand length, weight, whether these measures are estimated or exact, the observer name, the place of observation, and the date of observation. 4. Any additional observations about the condition or nature of the materials. The people in the Boaz database were alive at the time the measurements were taken. The methods subsection. In this subsection, you should thoroughly describe all the methods used in your analysis. Here I will provide an example based on a study I did in the late 1990's using the Boaz dataset (Skelton, 1997). A discriminant function for sex was constructed for the individuals age 20 and older, using the SPSS-X Discriminant procedure on the U.M. campus mainframe. This function was then used to classify all 13530 individuals in the data set [who did not have missing data]. Mean sexing accuracy by age was calculated using the SPSS-X Means procedure. The results were downloaded and imported into Word Perfect 5.1, which was used to format them for import into Quattro Pro 4.0. Finally, the formatted results were imported into Quattro Pro, which was used to plot sexing accuracy vs age for the males and the females. Overall sexing accuracy was obtained by averaging the accuracies for the males and the females, and was also plotted. Three methods for correcting for the effect of size were applied to the data. First a principal components analysis was performed using the SPSS-X Factor procedure. Second, a form of size scaling was attempted, wherein the values for each of the variable are summed to yield an overall size variable. Each value was then divided by the overall size variable. This method is widely known to be an inefficient way to adjust for size. Third, the data were divided into three age groups: 1 to 12, 13 to 19, and 20 or older. Each variable was regressed on age, using the SPSS-X Regression procedure, separately for each of the age groups. The residuals for each variable, after the effect of age was accounted for by this procedure, were retained and a discriminant analysis was performed using them. Finally, the sexing accuracies by age of the size-scaling and the residuals procedures were average, downloaded, and plotted as described above. IV. Assignment part 3: Writing a Materials & Methods Section. Now that you have performed these analyses, you know exactly what you did. Write a materials and methods section, as you would for a scientific report, using the information and examples I gave you above. The codebook for our class dataset and inspection of the data itself should give you all you need to know to write a materials subsection. What you did to produce your results will form the basis of your methods subsection. Add your materials and methods section to the bottom of your results document. V. Submitting your assignment. Add to your results document an “Acknowledgements and Bibliography” section in which you acknowledge your collaborators and sources. Send your results document (named firstname_lastname_3.doc) to me using the Assignment Submission link I’ve put on Moodle. Bibliography Jantz, Richard L. et al., 1992. 
Jantz, Richard L. et al., 1992. Variation among North Amerindians: Analysis of Boas's Anthropometric Data. Human Biology 64(3):435-61.
Skelton, Randall R., 1997. How Children Score on Discriminant Functions Designed for Adults. Intermountain J. of Sci. 3(1):47-53.
UCLA Academic Technology Services, n.d. Annotated SPSS Output: Principal Components Analysis. <http://www.ats.ucla.edu/STAT/SPSS/output/principal_components.htm> (Version current as of May 31, 2006).
UCLA Academic Technology Services, n.d. Annotated SPSS Output: Factor Analysis. <http://www.ats.ucla.edu/STAT/SPSS/output/factor1.htm> (Version current as of May 31, 2006).
Wuensch, Karl L., 2005. Principal Components Analysis - SPSS. <http://core.ecu.edu/psyc/wuenschk/MV/FA/PCA-SPSS.doc> (Version current as of May 31, 2005).
Wuensch, Karl L., 2005. Factor Analysis - SPSS. <http://core.ecu.edu/psyc/wuenschk/MV/FA/FA-SPSS.doc> (Version current as of May 31, 2005).

Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 4: Discriminant Functions Analysis

For this assignment I want you to focus on the following practical tasks:
• learn how to do discriminant functions analysis using SPSS;
• learn how to interpret the output SPSS provides; and
• continue the process of learning to write scientific reports by examining the results section.
I. Assignment part 1: The Analysis. We will do three discriminant functions analyses, two for sex and one for population. Begin by creating a results document to hold your results. Open a new document in Word. Put your name and “Assignment 4” at the top and save it to your memory stick with the file name firstname_lastname_4.doc (where firstname is your first name and lastname is your last name).
A. Discriminant Functions Analysis (DFA) for sex: All populations. Use the SPSS Discriminant procedure [Analyze – Classify – Discriminant] to perform a discriminant functions analysis to produce a formula and sectioning point to distinguish females from males using the measurements in the class data set. The document at http://www.statsoft.com/textbook/discriminant-function-analysis/?button=1 explains the concepts of discriminant functions analysis. Use all individuals in the data set (i.e., don't do a Data – Select Cases). Enter SEX as the “Grouping Variable” and define the range as minimum = 1 and maximum = 2. To keep the number of variables from getting out of hand, choose only the following 5 measured variables, and enter them into the “Independents” area: GOL, BBH, XCB, ZYB, and MDH. Thus we will have a representative skull length, skull height, skull breadth, face breadth, and mastoid height. Click on the “Statistics” button, and choose “Unstandardized” for “Function Coefficients”, then click “Continue”. Click on the “Classify” button and make sure that “All groups equal” is chosen for “Prior Probabilities”; and choose “Summary table” for “Display”; then click “Continue”. Click “OK” to run the analysis. Save the output produced by SPSS by exporting it to Word format. Then add it to your results document.
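As a cross-check on the SPSS output, the same kind of two-group discriminant analysis can be sketched in Python with scikit-learn. The file name, column names, and coding of SEX below are placeholders for whatever your data actually contain; scikit-learn scales its coefficients somewhat differently than SPSS, so expect the same classification behavior rather than identical numbers.

    import pandas as pd
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    df = pd.read_csv("anth_class_data.csv")             # hypothetical export of the class data
    predictors = ["GOL", "BBH", "XCB", "ZYB", "MDH"]    # the same five measurements as above
    X, y = df[predictors], df["SEX"]                    # SEX assumed coded 1 = female, 2 = male

    lda = LinearDiscriminantAnalysis()
    lda.fit(X, y)

    # Function coefficients and constant (the role played by the
    # "Canonical Discriminant Function Coefficients" table in SPSS)
    print("Coefficients:", dict(zip(predictors, lda.coef_[0].round(4))))
    print("Constant:", round(float(lda.intercept_[0]), 4))

    # Resubstitution accuracy (the role played by the classification summary table)
    print("Percent correctly classified:", round(100 * lda.score(X, y), 1))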
B. DFA for sex: Zalavar Only. Use the SPSS select cases procedure [Data – Select Cases – If condition is Satisfied – If...] to choose only the Zalavar individuals by entering ANY(POP,2) into the appropriate text box. Return to the Discriminant procedure, and run the analysis again with the same conditions and variables as in part A above (except that you are using only the Zalavar instead of all individuals). Click “OK” to run the analysis. Save ONLY the “Canonical Discriminant Function Coefficients” table and the “Classification Results” table, by copying them from the SPSS output window and pasting them into the bottom of your results document. Edit the labels of these tables to “Canonical Discriminant Function Coefficients: Zalavar Only” and “Classification Results: Zalavar Only”.
C. DFA for Population. We will do a DFA for population, but if we use all populations, we will get a large, messy, difficult-to-interpret output. So, let's simplify by using only four populations. Use the SPSS select cases procedure [Data – Select Cases – If condition is Satisfied – If...] to choose only the Zalavar, Teita, Zulu, and Australian populations, by entering ANY(POP,2,4,6,7) into the appropriate text box. Return to the Discriminant procedure, and change the “Grouping Variable” to POP. Define the range for POP to be minimum = 2, maximum = 7. Enter all 20 measured variables into the “Independents” list. Let's say that we are interested in the simplest possible discriminant function, so click the “Use Stepwise Method” radio button. The stepwise method works similarly to how it worked for regression analysis, and will produce a formula that balances the number of variables with accuracy. Click on the “Classify” button and choose a “Combined-Groups” type of plot, then click “Continue”. Leave all the other options the same as in part A above. Click “OK” to run the analysis. Save the output produced by SPSS by exporting it to Word format. Then add it to your results document.
II. Assignment part 2: Scientific Report Format – the Results Section. In the results section of a scientific document you should present the results – just the results – nothing about methods, and no discussion or interpretation of the results. There is an exception for conference papers, in which the results and discussion sections are often merged to achieve a pleasing flow to the presentation. A results section should include an explanation of all the tables and figures included. This can normally be done using a caption attached to the tables and figures themselves. Here is an example from a paper I published several years ago (Skelton, 1996).

TABLE 1: GROUPS AND SAMPLE SIZES
|------------+--------+--------+--------+--------+--------+---------|
|GROUP NAME  | ADULT  |  TEST  | ADULT  |SUBADULT|SUBADULT|  TIMES  |
|            | MALES  | SAMPLE |FEMALES |FEMALES | MALES  | USED IN |
|            |   N    |   N    |   N    |   N    |   N    |SUBSAMPLE|
|------------+--------+--------+--------+--------+--------+---------|
|Apache      |  103   |   11   |   35   |   76   |   71   |    3    |
|Arapaho     |   53   |    6   |    8   |    5   |   15   |    4    |
|Cherokee-OK |  108   |   12   |   38   |   14   |   59   |    4    |
|Cherokee    |  159   |   18   |  111   |   64   |   89   |    5    |
|Cheyenne    |   28   |    3   |    3   |    6   |   11   |    4    |
|Chickasaw   |   95   |   11   |   35   |   28   |   41   |    3    |
[... cut ...]
|------------+--------+--------+--------+--------+--------+---------|
|TOTALS:     |  4182  |  467   |  1993  |  596   |  1943  |         |
|------------+--------+--------+--------+--------+--------+---------|

TABLE 2: SUBSAMPLE COMPOSITIONS
| SUBSAMPLE | GROUPS INCLUDED                              |
|-----------+----------------------------------------------|
|     1     | Cherokee-OK, Cheyenne, Haida, Hoopa          |
|     2     | Coahuilla, Haida, Mississauga, Navajo        |
|     3     | Chickasaw, Choctaw, Piegan, Sioux            |
|     4     | Chippewa, Comanche, Malecite, Ute            |
|     5     | Cherokee-OK, Cherokee, Chilcotin, Sioux      |
|     6     | Cherokee, Concow, Munsee, Tsimshian          |
|     7     | Coahuilla, Crow, Okanagan, Tsimshian         |
[... cut ...]
|-----------+----------------------------------------------|

TABLE 3: AVERAGE ACCURACIES
|----------------+------------+--------+--------+--------+--------+--------|
|DATA SET        |CONVENTIONAL| OTHER  |  ALL   |  ALL   |  ALL   |OVERALL |
|                |  ANALYSIS  | MALES  | MALES  | ADULTS | PEOPLE |ACCURACY|
|                |            |  FILE  |  FILE  |  FILE  |  FILE  |        |
|----------------+------------+--------+--------+--------+--------+--------|
|Adult Males     |   78.49%   | 75.60% | 76.78% | 75.93% | 74.47% | 76.25% |
|Test Set        |   67.53%   | 85.96% | 87.46% | 86.49% | 86.36% | 82.76% |
|Adult Females   |   55.62%   | 76.26% | 76.98% | 76.32% | 64.16% | 69.87% |
|Subadult Males  |   40.32%   | 52.91% | 55.39% | 60.72% | 57.08% | 53.29% |
|Subadult Females|   34.70%   | 59.20% | 61.40% | 66.18% | 53.17% | 54.93% |
|----------------+------------+--------+--------+--------+--------+--------|
|OVERALL         |   55.33%   | 69.98% | 71.60% | 73.13% | 67.05% | 67.42% |
|----------------+------------+--------+--------+--------+--------+--------|

Assignment: Earlier you added the results of the discriminant functions analysis for sex using all populations to your results file. Go through that part of your output, and add a caption before each table or figure that describes what information is presented in the table or figure. In the captions, number the tables sequentially starting with the first one in the output (e.g. Table 1, Table 2 ...). Figures (any graph or plot) should also be numbered sequentially (Figure 1, Figure 2 ...), but as a separate list from the tables. You should start at the top of your document, with the table that says “Analysis Case Processing Summary”. Stop when you get to the end of your results for sex using all populations.
III. Assignment part 3: Interpretation. Answer the following questions at the bottom of your results document.
A. Interpreting the results of DFA. The document at http://statistics.ats.ucla.edu/stat/spss/output/SPSS_discrim.htm is an annotated output document for the SPSS discriminant procedure. You should refer to this document while answering the questions below. The first 5 questions refer to your DFA results for sex using all individuals/populations.
1. Is the discriminant function significant? Explain how you determined this.
2. Examining the “Standardized Canonical Discriminant Function Coefficients” table, which single variable has the highest loading on the discriminant function? Examining the “Structure Matrix” table, which variable is most correlated with the discriminant function score? Is this the same variable in both cases? Sometimes it is not. Different authorities favor one or the other of these ways of determining which variable(s) is most important in distinguishing the groups being examined by the discriminant function (which is why SPSS gives you both).
3. Let's say that you are a forensic anthropologist interested in distinguishing females from males based on skull measurements. What is the discriminant function formula for doing this? What is your sectioning point, and how will you use it to decide whether a skull is from a male or a female?
4. What is the best accuracy you can hope for using your discriminant function formula?
5. Say that you have a skull with the following measurements: GOL = 184, NOL = 184, BNL = 101, BBH = 127, XCB = 135, XFB = 116, STB = 112, ZYB = 125, AUB = 117, WCB = 68, ASB = 109, BPL = 102, NPH = 69, NLH = 52, OBH = 38, OBB = 40, JUB = 112, NLB = 27, MAB = 61, MDH = 27. Use the discriminant function you developed in question 3 to determine the sex of this individual. Show your work.
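Question 5 is just arithmetic once you have your own coefficients: multiply each measurement by its unstandardized coefficient, add the constant, and compare the resulting score to the sectioning point. The short Python sketch below is only a template; the coefficients, constant, and sectioning point shown are invented placeholders that you must replace with the values from your own output.

    # Hypothetical unstandardized coefficients -- replace with your own SPSS output.
    coefficients = {"GOL": 0.05, "BBH": 0.03, "XCB": -0.02, "ZYB": 0.09, "MDH": 0.11}
    constant = -25.0
    sectioning_point = 0.0          # typically the midpoint between the two group centroids

    skull = {"GOL": 184, "BBH": 127, "XCB": 135, "ZYB": 125, "MDH": 27}

    score = constant + sum(coefficients[v] * skull[v] for v in coefficients)
    print("Discriminant score:", round(score, 3))
    print("Classified as:", "male" if score > sectioning_point else "female")
    # Which side of the sectioning point corresponds to which sex depends on the signs
    # in your own output, so check the "Functions at Group Centroids" table.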
The next 2 questions refer to your analysis for distinguishing sex among the Zalavar only.
6. Are the discriminant function coefficients the same for the Zalavar only as they were for all individuals/populations? What does this tell us about the relationship between discriminant functions and the nature of the sample used in finding them?
7. Is the accuracy of sex determination for the Zalavar higher or lower than it was for all individuals/populations? What do you think explains this?
The rest of the questions refer to your analysis for distinguishing the four populations.
8. How many functions did this analysis produce? What is the relationship between the number of groups and the number of functions?
9. How many of the original 20 variables were used in the discriminant function for population?
10. Examine the table labeled “Functions at Group Centroids”, which shows the discriminant function score for each group based on their centroid (set of means for each variable) on each function. Note the following facts.
a. Function 2 separates the Zalavar (POP 2) and the Australians (POP 7), which have negative values, from the Teita (POP 4) and Zulu (POP 6), which have positive values.
b. On function 1, the Zalavar have a large positive value and the Australians have a large negative value.
c. On function 3, the Teita have a large negative value and the Zulu have a large positive value.
How would you use these three facts to form a strategy for using the three discriminant functions to sort individuals into the four populations?
11. Examine the Combined Groups Plot. You should be able to clearly see that function 2 separates Teita and Zulu from Zalavar and Australian, and that function 1 separates Zalavar from Australian. However, function 3 is not represented on this plot. If we had an unknown individual with a score of 2.0 on function 2 and 0.0 on function 1, what group centroid would this individual be closest to? Could we accurately determine whether this individual is Zulu or Teita using this plot?
12. In the previous assignment you learned how to interpret and name Principal Components by examining the coefficients. The same procedure can be done with discriminant functions to get an idea of what variables are important in distinguishing groups. Examine the second discriminant function only (don't worry about the others) as presented in the “Standardized Canonical Discriminant Function Coefficients” table. This function distinguishes the two African populations (Teita and Zulu) from the European (Zalavar) and Australian populations. There are two variables that have high loadings but different signs. The high loadings suggest that these variables contribute substantially to distinguishing African from non-African populations in this analysis. Give this function a name in terms of the contrast between these two variables.
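If you would like to see a scatter of any pair of functions, including function 3, which the SPSS combined-groups plot omits, you can redraw the plot yourself from saved discriminant scores. The sketch below assumes a file of saved scores with columns named Dis1_1, Dis2_1, Dis3_1, and POPNAME; those names follow the SPSS convention introduced in assignment 6 and are placeholders here, not something you are required to produce.

    import matplotlib.pyplot as plt
    import pandas as pd

    scores = pd.read_csv("df_scores.csv")     # hypothetical file of saved discriminant scores
    xcol, ycol = "Dis2_1", "Dis3_1"           # e.g. functions 2 and 3, the pair SPSS does not plot
    fig, ax = plt.subplots()
    for name, grp in scores.groupby("POPNAME"):
        ax.scatter(grp[xcol], grp[ycol], label=name, s=10)
    ax.set_xlabel("Function 2")
    ax.set_ylabel("Function 3")
    ax.legend()
    plt.show()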
V. Submitting your assignment. Add to your results document an “Acknowledgements and Bibliography” section in which you acknowledge your collaborators and sources. Send your results document (named firstname_lastname_4.doc) to me using the Assignment Submission link I've put on Moodle.
Bibliography
Skelton, Randall R., 1996. A Suggested Method for Using Means Data in Discriminant Functions Using Anthropometric Data. Journal of World Anthropology 1(4). <http://wings.buffalo.edu/research/anthrogis/JWA/V1N4/skelton-art.txt> (Version current as of January 4, 1997).
Statsoft Inc., 2012. Discover Which Variables Discriminate Between Groups, Discriminant Function Analysis. <http://www.statsoft.com/textbook/discriminant-function-analysis/?button=1> (Version current as of December 25, 2012).
University of California Los Angeles, Institute for Digital Research and Education, 2012. Annotated SPSS Output Discriminant Analysis. <http://statistics.ats.ucla.edu/stat/spss/output/SPSS_discrim.htm> (Version current as of December 25, 2012).

Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 5: Cluster Analysis

For this assignment I want you to focus on the following practical tasks:
• hierarchical cluster analysis;
• formulating models of what your output should look like given different scenarios;
• interpreting dendrograms; and
• continuing the process of learning to write scientific reports by focusing on discussion sections.
I. Assignment Part 1: Clustering Individuals. One of the most important uses of cluster analysis is to probe a data set to see whether subgroupings exist within it. Using this approach you do not assume that any subgroupings exist within the data. For example, we know that there are two sexes and 20 groups represented, but we will assume (pretend?) that we do not know this and see whether any clusters emerge that seem to represent sex or population groups. This is called exploratory cluster analysis. We will do this first. Begin by creating a new Word document with your name and “Assignment 5” at the top. Save this file to your memory stick as firstname_lastname_5.doc. The document at http://www.statsoft.com/textbook/cluster-analysis/?button=1 does a good job of explaining the concepts of various forms of cluster analysis. We will use only hierarchical clustering, which this document refers to as “joining clustering”.
Perform a cluster analysis on the class data (anth402data.sav) using the SPSS Hierarchical Cluster procedure [Analyze – Classify – Hierarchical Cluster]. We will perform a basic analysis with no frills this time, so that you can see what sorts of things you can ask for in the output.
• Use all variables except ID, SEX, POP, and POPNAME.
• In the “Label by” text area, enter either POP (if you want to see the numbers) or POPNAME (if you want to see the names).
• Click the “Statistics” button and make sure that agglomeration schedule is checked, and proximity matrix is not checked; then click “Continue”.
• Click the “Plots” button and make sure that “Dendrogram” is checked. For Icicle, choose none; then click “Continue”.
• Click the “Methods” button and make sure that the “Method” is “Between-groups linkage” (also called UPGMA); make sure that the “Measure” is “Squared Euclidean Distance”; and set “Transform Values” to “Z scores” and “By variable”; then click “Continue”.
• Now, uncheck the box for “Statistics”.
• Finally, click “OK” to run the analysis.
Note that this produces an output window with a single huge dendrogram (containing 1852 branches) which is extremely difficult to interpret because it is so large. In fact, only part of it shows in the SPSS output window.
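As an aside, other tools make this size problem easier to live with by letting you truncate the tree. The Python sketch below uses SciPy with settings that roughly mirror the ones above (variables standardized as z scores, squared Euclidean distance, between-groups/average linkage) and draws only the last 30 merges. The file name and the assumption about which columns to drop are placeholders, and this is an illustration rather than part of the assignment.

    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram
    from scipy.stats import zscore

    df = pd.read_csv("anth_class_data.csv")                      # hypothetical export
    measurements = df.drop(columns=["ID", "SEX", "POP", "POPNAME"])
    Z = linkage(zscore(measurements.to_numpy()),                 # z-scored variables
                method="average",                                # between-groups (UPGMA) linkage
                metric="sqeuclidean")                            # squared Euclidean distance

    # Show only the last 30 merges so the tree is actually readable
    dendrogram(Z, truncate_mode="lastp", p=30,
               labels=df["POPNAME"].to_numpy())
    plt.tight_layout()
    plt.show()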
Unfortunately, I was unable to find a good annotated output document for SPSS cluster analysis, but the document I referred you to previously explains the three main features of cluster analysis output: the dendrogram, the agglomeration schedule, and the icicle plot.
The Dendrogram. The most important part of the output, in my opinion, is the dendrogram. A dendrogram is a type of evolutionary tree that, in this case, presents the order of relationships between the clusters. In fact, dendrograms are the most general type of evolutionary tree, and other types, such as cladograms and phylograms, are categories of dendrogram. The most important thing in interpreting a dendrogram is the branching relationships – not necessarily how close together the branches appear when printed to paper or the screen. The required browsing article by Gregory (2008) does a good job of explaining how to interpret evolutionary trees.
Capturing a Dendrogram. One commonly encountered problem with dendrograms is that they often fail to export cleanly to a word processor. Our normal procedure of exporting the SPSS output window to a Word document is almost guaranteed to completely foul up the format of a dendrogram. Cutting and pasting works somewhat better, but still leaves something to be desired. The best way I have found to do this is to right click on the dendrogram in the SPSS output window, then choose “SPSS Rtf Document Object” and “Edit” from the menu that appears. The dendrogram will open as a document in an editor window. Now, highlight the dendrogram by placing your cursor at the top of this document, clicking the left mouse button and HOLDING IT DOWN. Now, while holding the left mouse button down, scroll down to the bottom of the document. This will require some fancy gyrations of the mouse to get the editor window to scroll down. I find that putting the cursor below the editor window and moving the mouse up and down rapidly seems to work best. Once you are at the bottom of the document and the dendrogram is highlighted, release the left mouse button. Now press ctrl-c to copy the dendrogram, go to your Word document, and paste using ctrl-v. The dendrogram should now be in Word and look pretty good.
After looking at, and puzzling over, the dendrogram in your output window, close the output window. You can try to capture this dendrogram while keeping your fingers crossed, praying, shaking a rattle, or whatever you personally do as a luck attraction ritual, but even if you succeed it's very difficult to interpret a dendrogram this complex. Simplification is in order and we will accomplish that by clustering the group means.
II. Analysis Part 2: Clustering the Means. The dendrogram produced using all 1853 individuals was too large to use in any practical way. The solution is to use the means for each population. There are 20 populations in the data set, so using their means reduces the number of dendrogram branches enormously. Let's run some cluster analyses that we can actually interpret, using the means data.
A. Getting the data set. I have put a new data set, named “anth402data_means.sav”, on Blackboard in the assignments area. You should download it to your memory stick, then load it into SPSS. Unfortunately, getting a file of means data from SPSS is not a trivial process, and I have placed a document on Blackboard titled “Getting Means from SPSS” that documents the procedure for doing this, if you are interested in how it is done.
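For the curious, building a file of group means outside SPSS is much simpler; a rough pandas sketch is shown below. The file and column names are placeholders, and you do not need to do this yourself, since the means file is provided for you.

    import pandas as pd

    df = pd.read_csv("anth_class_data.csv")          # hypothetical export of the class data
    measurement_cols = df.columns.drop(["ID", "SEX", "POP", "POPNAME"])

    # One row of means per population/sex combination, and one per population overall
    by_pop_sex = df.groupby(["POPNAME", "SEX"])[measurement_cols].mean()
    by_pop = df.groupby("POPNAME")[measurement_cols].mean()
    by_pop_sex.to_csv("anth_means_by_pop_and_sex.csv")
    by_pop.to_csv("anth_means_by_pop.csv")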
B. Evaluating population clusters. Let's see whether the data contains clusters that reflect population relationships. Before running the analysis, think about what an evolutionary tree of the relationships between these groups ought to look like. An expected outcome of this sort is one form of a model. Formulate a model for how these populations should cluster if the similarities between them are due to ancestral genetics. Formulate a model for how these populations should cluster if the similarities between them are due to adaptation to similar environments, or perhaps non-genetic effects. For example, a comparison between the U.S. Negros and the two African populations, the Zulu and the Teita, might be informative in distinguishing these two models. Reduce these models to a couple of sentences. Call them “Ancestral Genetics Model” and “Not Ancestral Genetics Model”. Now, let's do the analysis.
i. Select the means for the sexes combined (both). You should know how to use the SPSS select cases procedure to do this. If not, review the procedure in previous assignments. [Hint: ANY(SEX,3).]
ii. Go to the hierarchical cluster procedure [Analyze – Classify – Hierarchical Cluster].
iii. Use all variables except SEX, POP, POPNAME, and the filter_$ variable (which will usually be the last variable).
iv. Label cases by NAME.
v. Uncheck Display Statistics so we can avoid getting an agglomeration schedule.
vi. Click the “Plots” button, check the checkbox for “Dendrogram” and set icicle plot to “None”, then click “Continue”. This prevents SPSS from sending an icicle plot to the output.
vii. Click the “Methods” button and make sure that the “Method” is “Between-groups linkage”; make sure that the “Measure” is “Squared Euclidean Distance”; and set “Transform Values” to “Z scores” and “By variable”; then click “Continue”.
viii. Click “OK” to run the analysis.
Copy the dendrogram from the SPSS output window using the procedure described previously and paste it into your results file. Give it an informative label, such as “Dendrogram For Population, Sexes Combined”.
C. Sex vs Population. Let's do a cluster analysis for sex and population together. I believe that you will find this analysis to be the most informative. Before running the analysis, formulate a model of what the dendrogram will look like if population is the most important cause of clustering in the data. Also formulate a model of what the dendrogram will look like if sex is the most important cause of clustering in the data. Reduce these models to a couple of sentences. Call them “Population First Model” and “Sex First Model”. Now, let's do the analysis.
ix. Use the SPSS select cases procedure to tell SPSS to use the males and the females, but not the entries for both sexes combined. [Hint: ANY(SEX,1,2).]
x. Run a cluster analysis using the same settings as above.
Copy the dendrogram from the SPSS output window and paste it into your results file. Give it an informative label, such as “Dendrogram for Sex and Population”.
IV. Analysis Part 3: Interpretation. Here are some questions that you should answer at the bottom of your results file. The document at http://txcdk.unt.edu/iralab/sites/default/files/Hierarchical_Handout.pdf is an annotated output document for hierarchical clustering. You should refer to it to help in answering these questions.
The first two questions refer to Analysis Part 2B: Evaluating population clusters.
1. For this analysis you developed two models, “Ancestral Genetics Model” and “Not Ancestral Genetics Model”. Reduce these models to a couple of sentences each and write them down here.
2. Which of these two models does the dendrogram you labeled something like “Dendrogram For Population, Sexes Combined” seem to support? Explain how you came to this conclusion.
The rest of the questions refer to Analysis Part 2C: Sex vs Population.
3. For this analysis you developed two models, “Population First Model” and “Sex First Model”.
Reduce them to a couple of sentences each and write them down here.
4. Which of your two models (sex first or population first) seems to more accurately explain the pattern of clusters in the dendrogram labeled something like “Dendrogram for Sex and Population”? Explain how you came to this conclusion.
5. When I ran this analysis the Bushman males clustered with a group that consisted otherwise entirely of females, and the Buriat females clustered with a group that consisted otherwise entirely of males. You may or may not get this result. As an anthropologist, you should know something about these populations (at least the Bushman population), which have been extensively studied by ethnographers and physical anthropologists. What characteristics of the Bushman and Buriat populations might account for this misclassification? (Hint 1: it's something that standardizing the variables as Z scores should prevent, but doesn't seem to in this case. Hint 2: it's what the first principal component almost always reflects.)
V. Assignment Part 4. Scientific Paper Format: Discussion Sections. The fourth part of a scientific paper is the discussion section. In the discussion section you should present your interpretations of your results. Do not repeat the results themselves. You should also discuss any limitations or problems with your analysis or data that may have given you inaccurate results and, therefore, an incorrect interpretation. You should also discuss what further research should be done to confirm or augment your analysis (the basis of future papers). Below is the discussion section of a paper I presented some years back at the Northwest Anthropological Research Conference.
These results demonstrate that trait list bias can have an effect on an evolutionary analysis, and that Strait et al. did not use a method that adequately corrects for the effects of trait list bias. When trait list bias is corrected for, a result similar to the phylogeny shown in Figure 1 is obtained, and when trait list bias is not corrected for, a result similar to the phylogeny shown in Figure 2 is obtained. Therefore, these two phylogenies should be regarded as competing hypotheses, and one's choice of which one to consider more accurate depends on two questions: 1) whether one believes that trait list bias should be accounted for, and 2) whether one believes that the Skelton and McHenry (1992) method of grouping traits by function is the most appropriate way to handle trait list bias. Though I obviously favor answering both questions in the affirmative, this is a problem that the field as a whole needs to address through continued research and debate. Interestingly, at the annual meeting of the American Association of Physical Anthropologists, earlier this month, Strait (1998) presented the results of his investigation of the basicranial flexion functional complex. As I understand from what people who listened to his presentation tell me (Sperazza, personal communication), he believes that we constructed this functional complex incorrectly in our analysis. It is quite likely that we did get it at least partly wrong, and we welcome this sort of re-evaluation of the intercorrelation of traits. (Skelton, 1998).
Note that this discussion section has all the elements I listed above. First, it gives an interpretation of the results (These results demonstrate that trait list bias can have an effect on an evolutionary analysis...; Therefore, these two phylogenies should be regarded as competing hypotheses, ...). It mentions possible limitations or problems with the analysis or data (...
one's choice of which one to consider more accurate depends on two questions: ...; ...he believes that we constructed this functional complex incorrectly in our analysis). Directions for further research are also mentioned (...this is a problem that the field as a whole needs to address through continued research and debate; ...we welcome this sort of re-evaluation of the intercorrelation of traits).
Write a discussion section at the bottom of your results document, as you would for a scientific research paper. Looking over the four questions I asked in Assignment Part 3: Interpretation, you should have no problem coming up with interpretations (but do not simply copy your answers to the questions; format your interpretations as sentences in paragraphs). There are also some flaws in the data and perhaps in the analysis that may be important sources of error in your results and/or interpretations (hint: what is unusual about the data for U.S. Negros). A few moments of reflection should give you some ideas for additional research or analyses that could be done to follow up on what you did in this assignment.
VII. Submitting your Assignment. Add to your results document an “Acknowledgements and Bibliography” section in which you acknowledge your collaborators and sources. Send your results document (named firstname_lastname_5.doc) to me using the Assignment Submission link I've put on Moodle.
Bibliography
Gregory, T. Ryan, 2008. Understanding Evolutionary Trees. Evo Edu Outreach 1:121–137.
Skelton, Randall R., 1998. A Comparison of Two Cladistic Models for Early Hominid Phylogeny. Presented at the 1998 Northwest Anthropological Conference, Missoula, MT, 4/17/98.
Statsoft Inc., 2012. How To Group Objects Into Similar Categories, Cluster Analysis. <http://www.statsoft.com/textbook/cluster-analysis/?button=1> (Version current as of December 24, 2012).
University of North Texas, n.d. Cluster Analysis. <http://txcdk.unt.edu/iralab/sites/default/files/Hierarchical_Handout.pdf> (Version current as of December 24, 2012).

Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 6: Clustering PC and DF Scores

For this assignment I want you to focus on the following practical tasks:
• writing intermediate results to your data file for use in subsequent analyses;
• clarifying the uses of PCA and DFA;
• further exploring the use of cluster analysis to reveal relationships; and
• continuing the process of learning to write scientific reports by focusing on conclusion sections.
I. Saving PC scores and DF scores. Very often, a researcher will want to perform a principal components or discriminant functions analysis and save the results for use in other analyses. I'll show you how to do this, though I'll provide the actual data we will use on Blackboard. In contrast to the process of capturing means, which we explored in assignment 5, asking for PC or DF scores to be saved is simply a matter of checking the right checkbox. The scores are then saved in your data file as new columns of data to the right of your original data.
A. Saving PC scores.
• Start the SPSS Factor procedure [Analyze – Dimension Reduction – Factor] as explained in Assignment 3. Choose whatever variables, extraction options, and rotation options you want.
• Click the “Scores” button, check “Save as variables”, set the “Method” to “Regression”, then click “Continue”.
• Click “OK” to run the analysis.
Look at your data file. Note that there are new variables in columns to the right of your original data.
These columns contain each individual's scores on each of the PCs or factors that SPSS found to be significant. In order to save the insignificant PC/Factor scores you will need to trick SPSS by clicking on the “Extraction” button and setting “Eigenvalues greater than” to 0. The new variables will have names like FAC1_1, FAC2_1, etc. The “FAC” signifies that these are scores produced by the Factor procedure. The first number is the PC or factor number – i.e. PC1, PC2, etc. The second number is the run number. If you run the analysis again, say with a different extraction or rotation method, the names will be FAC1_2, FAC2_2, etc.
B. Saving DF scores.
• Start the SPSS discriminant procedure [Analyze – Classify – Discriminant]. Set whatever Grouping Variable, independents, method, and other options you want.
• Click on the “Save” button and check “Discriminant scores”, then click “Continue”.
• Click “OK” to run the analysis.
You can also choose to save the predicted group membership (i.e. how the procedure classified each individual) and/or the probability of membership in the predicted group if you want to. The scores for the DF's (canonical variates) are saved in columns to the right of your data. My experience is that SPSS saves scores for all DF's whether they are significant or not. The new variables have names like Dis1_1, Dis2_1, etc. The Dis part is for the Discriminant procedure, and the numbers signify exactly what they did in the explanation above of the names of saved scores from the Factor procedure. SPSS will often create an extra empty column to the right of the DF scores that it saved. You can simply delete this column if it's annoying.
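Outside SPSS the same idea, compute the scores once and keep them as new columns, looks roughly like the Python sketch below. It assumes a hypothetical CSV export of the class data with the column names shown (all placeholders); scikit-learn's scores will not match SPSS's numerically, but they play the same role.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    df = pd.read_csv("anth_class_data.csv")                  # hypothetical export
    X = df.drop(columns=["ID", "SEX", "POP", "POPNAME"])
    Xz = StandardScaler().fit_transform(X)

    # PC scores, appended as new columns (FAC1_1 ... FAC4_1, mimicking SPSS's names)
    pc_scores = PCA(n_components=4).fit_transform(Xz)
    for i in range(pc_scores.shape[1]):
        df[f"FAC{i + 1}_1"] = pc_scores[:, i]

    # DF (canonical variate) scores for population, appended as Dis1_1, Dis2_1, ...
    lda = LinearDiscriminantAnalysis().fit(Xz, df["POP"])
    df_scores = lda.transform(Xz)
    for i in range(df_scores.shape[1]):
        df[f"Dis{i + 1}_1"] = df_scores[:, i]

    df.to_csv("anth_class_data_with_scores.csv", index=False)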
II. Now we need means. Again I have generated the means for you to save time. Find and download the file “anth402_pcdfmeans.sav” from Blackboard and save it to your memory stick. This data file contains the means for the four PC scores (FAC1_1 ... FAC4_1), the means for the 16 significant discriminant functions (Dis1_1 ... Dis16_1) for population, and the means for a discriminant function for sex (Discrim_SEX).
III. Assignment Part 1: Clustering PC Scores. Now we can do something interesting. We will start by doing a cluster analysis using mean PC scores to see if these scores cluster the means by sex and/or population. Begin by creating a results file called firstname_lastname_6.doc. Use the SPSS select cases procedure to use only male and female means (not both combined). [Hint: ANY(SEX,1,2).] Run a cluster analysis on the four PC score variables, FAC1_1 through FAC4_1. Here's the recipe.
• Go to the hierarchical cluster procedure [Analyze – Classify – Hierarchical Cluster].
• Use the variables FAC1_1, FAC2_1, FAC3_1, and FAC4_1.
• Label cases by NAME.
• Uncheck Display Statistics so we can avoid getting an agglomeration schedule.
• Click the “Plots” button, check the checkbox for “Dendrogram” and set icicle plot to “None”, then click “Continue”. This prevents SPSS from sending an icicle plot to the output.
• Click the “Methods” button and make sure that the “Method” is “Between-groups linkage”; make sure that the “Measure” is “Squared Euclidean Distance”; and set “Transform Values” to “None”; then click “Continue”.
• Click “OK” to run the analysis.
Copy the dendrogram from the SPSS output window and paste it into your results file (see assignment 5 for the procedure). Give it an informative label, such as “Dendrogram Using PC Scores, For Population and Sex”.
IV. Assignment Part 2: Removing the Effect of Size. If you look back at assignment 3 and your write-up for it, you will remember that we interpreted PC1 as “Size”. There are many situations in which a researcher wants to remove the effect of size so they can focus on shape variables. Let's do this by excluding PC1 scores from our analysis. Repeat the procedure in Assignment Part 1, but this time eliminate the effect of size by using only the means for FAC2_1, FAC3_1, and FAC4_1 – not for FAC1_1. Copy the dendrogram from the SPSS output window and paste it into your results file. Give it an informative label, such as “Dendrogram Using PC Scores, For Population and Sex, Size Removed”.
V. Assignment Part 3: Clustering DF Scores. Now, let's cluster using DF scores.
A. Clustering by mean DF's for population. Here's the recipe.
• Go to the hierarchical cluster procedure [Analyze – Classify – Hierarchical Cluster].
• Use the variables Dis1_1 through Dis16_1, which correspond to those discriminant functions/canonical variates that my output for the discriminant analysis indicated were significant in distinguishing populations according to the “Wilks' Lambda” table.
• Label cases by NAME.
• Uncheck Display Statistics so we can avoid getting an agglomeration schedule.
• Click the “Plots” button, check the checkbox for “Dendrogram” and set icicle plot to “None”, then click “Continue”. This prevents SPSS from sending an icicle plot to the output.
• Click the “Methods” button and make sure that the “Method” is “Between-groups linkage”; make sure that the “Measure” is “Squared Euclidean Distance”; and set “Transform Values” to “None”; then click “Continue”.
• Click “OK” to run the analysis.
Copy the dendrogram from the SPSS output window and paste it into your results file. Give it an informative label, such as “Dendrogram Using DF Scores For Population”.
B. Clustering by mean DF's for sex. Use the same procedure as in part A, but use only the means for Discrim_Sex. Copy the dendrogram from the SPSS output window and paste it into your results file. Give it an informative label, such as “Dendrogram Using DF Score For Sex”.
VI. Assignment Part 4: Interpretations. Answer the questions below at the bottom of your results file. Think of some models of how the data should cluster if it primarily represents sex or if it primarily represents population. You do not need to write this down – just think of some criteria that you will use to judge whether the dendrogram mostly reflects sex differences or mostly reflects population differences. Here's my way of doing it. I assume that if the dendrogram primarily represents sex, then the first branch above the root will divide the individuals into (mostly) males and (mostly) females. I call this a “sex-first” branching order. I have to say “mostly” here because no statistical method perfectly sorts groups under all conditions. In contrast, if the dendrogram primarily represents population differences, then the first branch above the root will divide the individuals (mostly) into two clusters that contain different populations. Within these population clusters both males and females should be represented, perhaps separated into different branches higher up the tree. I call this a “population-first” branching order.
1. Take a look at the dendrogram that you labeled something like “Dendrogram Using PC Scores, For Population and Sex”, which was produced in analysis part 1. Does it seem as if PC scores facilitate clustering by sex or do they seem to cluster by population?
Explain how you arrived at this interpretation.
2. Take a look at the dendrogram that you labeled something like “Dendrogram Using PC Scores, For Population and Sex, Size Removed”, which was produced in analysis part 2. Explain and discuss any differences you see from the dendrogram you labeled something like “Dendrogram Using PC Scores, For Population and Sex”, which was produced in analysis part 1. Discuss what this implies about removing the effect of size from the analysis.
3. Take a look at the dendrogram you labeled something like “Dendrogram Using DF Scores for Population”, which was produced in analysis part 3A. Does it seem as if these DF scores facilitate clustering by sex or do the means seem to cluster by population? Explain how you arrived at this interpretation.
4. Examine how the dendrogram you labeled something like “Dendrogram Using DF Scores for Population” produced in analysis part 3A differs from the dendrograms produced using PC scores. What is the important difference between PC analysis and DF analysis for population that causes this difference?
5. Take a look at the dendrogram you labeled something like “Dendrogram Using DF Scores for Sex”, which was produced in analysis part 3B. Does it seem as if DF scores facilitate clustering by sex or do the means seem to cluster by population? Explain how you arrived at this interpretation. Does this clustering pattern differ from that obtained using DF scores for population? If so, what do you think explains this difference?
6. Discuss how you would fill in the blank in the following sentence. The main difference between individuals in this data set is their _____, based on my interpretation and naming of PC1 (back in assignment 3, check your grading form to make sure you got it right).
7. Discuss how you would fill in the blank in the following sentence. Based on my interpretation and naming of DF2 (back in assignment 4, check your grading form to make sure you got it right), the most important difference between the populations in this data set is ______.
VII. Assignment Part 5. Scientific Paper Format: Hypotheses and Conclusions. The fifth part of a scientific paper is the conclusion section. In the conclusion section you revisit the hypotheses or problems you set up in your introduction section and evaluate whether they are refuted or supported. Often, the hypotheses or problems are restated in the conclusion, followed by an assessment of what the results and interpretations say about them. If the document is a more theoretical treatise rather than a research paper, the conclusion is used to present the final “bottom line” statement of what the author(s) has been arguing. Here is an example conclusion section from an archaeological research article.
Conclusion
The purpose of this report is twofold. First, it is intended to bring Binford's (1978) important work on meat drying back to zooarchaeologists' attention, because its potential for the interpretation of the archaeofaunal record remains largely untapped. The drying of meat is widespread in ethnographic accounts of both hunter-gatherers and pastoralists, and must have a considerable time depth. The development of an index that allows the identification of dry meat storage, therefore, has the potential for application in a wide range of archaeological contexts. Second, this research represents a critical reassessment of the Drying Utility Index (Binford 1978), intended to simplify it and make its calculation more transparent.
This process led to the creation of the Meat Drying Index, which provides a comparatively simple method for calculation of a carcass portion's "dryability." The usefulness of the MDI is reinforced by the fact that it is correlated slightly better with Binford's (1978) ethnoarchaeologically observed Nunamiut meat drying data than is his own DUI, which was developed with specific reference to that data set. Furthermore, both the MDI and DUI exhibit significant and positive correlation with the relatively "independent" sample of caribou bones from dry-meat caches at site LcLg-22 in arctic Canada. In sum, the MDI can be seen as the better index for the interpretation of meat drying, both because it is calculated in a more straightforward manner, and because it appears to predict drying-related element distributions as well as, or better than, the DUI.
Because the formula for the MDI is relatively uncomplicated, it should be practical to calculate for mammalian species other than caribou; all that is needed is the meat, brain, marrow, and bone weights for each body portion of a given species. Appropriate raw data have already been collected for many species and published in other utility index studies. However, as with other utility indices, it can be predicted that the MDI calculated here for caribou should be applicable to related taxa, and in particular to other artiodactyls, without further modification (Friesen et al. 2001).
The MDI, as outlined here, can be used in conjunction with other utility indices and bone density data to interpret several categories of bone assemblage. Element distributions from dry-meat caches, and from camp sites at which large quantities of dry meat were consumed, are predicted to be positively correlated with the MDI, while element frequencies from kill or butchery sites at which dry meat was prepared for storage or consumption elsewhere are expected to be negatively correlated with the MDI. Importantly, however, "real-life" drying activities will not necessarily result in assemblages that are as readily interpreted as those discussed in this paper. As Binford's (1978) Nunamiut Ethnoarchaeology so robustly indicates, decision-making processes relating to the butchery, storage, transport, and consumption of meat are complex, and the effects of marrow or grease processing, the feeding of dogs, sharing, cultural preferences, and a variety of taphonomic agents will all serve to obscure "pure" dry-meat assemblages. For most sites on which dry meat was consumed, it is reasonable to assume that dry meat will comprise only a portion of the total bone sample, which may also include body portions from freshly killed or frozen carcasses. This will add a layer of complexity to the interpretation of those assemblages, and it will not always be possible to infer past meat-drying activities. Future research should be directed at resolving this problem through further ethnoarchaeological work, and through reinterpretation of faunal assemblages that may result from the consumption of dry meat. Despite this caveat, in many instances element distributions may be the only practical way to identify past drying activities, and by extension food storage. Therefore, zooarchaeologists should continue to refine methods for recognizing meat drying in the archaeological record. This study represents one step toward that goal. Friesen (2001).
Note that this conclusion restates the problems that the research addresses, though without framing them in terms of hypotheses.
This is done in the first two paragraphs. The third paragraph states the outcomes or findings of the research as they relate to the problems presented in the first two paragraphs.
Write a conclusion section at the bottom of your results document, as you would for a scientific research paper. Include a statement of the problems addressed in this assignment and the outcomes or findings relating to these problems. I will be most happy (and therefore generous with points) if you can frame the problems in terms of hypotheses to be tested. In doing this, you may want to re-read the information I gave you on hypotheses as required browsing for the third week of class. You will not be able to do any significance tests of your hypotheses, because cluster analysis does not provide significance tests. Therefore, you should simply state whether your results and interpretations (as discussed in the 7 questions in the section above) refute or support the hypotheses. (Hint: one null hypothesis might be: sex is not represented in this data in such a way that it can be revealed by clustering PC or DF scores – i.e. males and females are equal with respect to their mean PC and DF scores by population.)
VIII. Submitting your assignment. Add to your results document an “Acknowledgements and Bibliography” section in which you acknowledge your collaborators and sources. Send your results document (named firstname_lastname_6.doc) to me using the Assignment Submission link I've put on Moodle.
Bibliography
Friesen, T. Max, 2001. A Zooarchaeological Signature for Meat Storage: Re-Thinking the Drying Utility Index. American Antiquity, Vol. 66, No. 2, pp. 315-331.

Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 7: Analysis of Variance (ANOVA)

For this assignment I want you to focus on the following practical tasks:
• learn how to perform an ANOVA analysis using SPSS;
• learn how to perform a MANOVA analysis using SPSS;
• learn about post-hoc tests; and
• compare discriminant analysis and ANOVA for investigating groups.
I. Interesting features of ANOVA. One way of looking at ANOVA is as an extension of t-tests, where you are comparing the means of several groups for significant differences. A t-test compares the means for two groups, and ANOVA compares means for several groups simultaneously. There are several flavors of ANOVA. Here are some:
• One-way: Only one variable is used, so this is essentially a univariate method. The analysis detects differences in means between groups. The groups in an ANOVA are usually called “treatments” or “levels”. The two most common types of one-way ANOVA are between-subjects and within-subjects.
• Between-subjects is analogous to an independent-samples t-test. The groups consist of different individuals.
• Within-subjects is analogous to a paired-samples t-test. The groups are the exact same individuals measured at different times. This is also called “repeated measures” ANOVA.
• Two-way: Two different variables are examined simultaneously, so this is a bivariate method. Between-subjects and within-subjects approaches can both be used.
• Three-way: Three different variables are examined simultaneously, so this is a multivariate method. Again, both between-subjects and within-subjects analyses may be used.
• Multivariate ANOVA (MANOVA): This form of ANOVA examines two or more measured (dependent) variables simultaneously (for example, several skull measurements at once).
In this assignment, we will work with one-way ANOVA and MANOVA.
If an ANOVA analysis is significant, then you know that there are significant differences between at least some of the group means, though perhaps not between all of them. For example, if we performed an ANOVA on the Norse, Zalavar, and Teita populations, and obtained a significant result, we know that at least two of the means are significantly different (maybe Norse is significantly different from Teita), but some means may not be significantly different (maybe Norse is not significantly different from Zalavar). In order to help you determine which means are actually significantly different, SPSS provides two tools – post-hoc tests and a list of homogeneous groups.
A post-hoc test is a set of direct pairwise comparisons of the means – i.e. every mean is compared with every other mean – using some form of significance test, such as a t-test. Given that running a post-hoc test along with an ANOVA analysis is standard practice, I often wonder if there is a use for ANOVA at all, since these pairwise comparisons accomplish the same task more simply. Some authorities suggest that ANOVA solves a technical problem that arises when too many pairwise significance tests are performed (with 20 groups there are 190 pairwise comparisons, so at the 0.05 level roughly nine or ten of them would be expected to appear significant by chance alone even if no real differences existed). Whatever we may decide, ANOVA is a standard and widely used method.
A list of homogeneous groups is created by grouping together those groups that are not significantly different from each other. Some correction has to be made for multiple comparisons since not all members of a homogeneous group are necessarily significantly different from the same members of other homogeneous groups. For example, if Norse plus Zalavar form a homogeneous group, and Teita plus Zulu form another homogeneous group, it does not necessarily imply that both Norse and Zalavar are significantly different from both Teita and Zulu. It might be the case that Norse is significantly different from Teita but not Zulu, and Zalavar is significantly different from Zulu but not Teita. Occasionally, a group will be assigned to two homogeneous sets. Nonetheless, homogeneous groups are a useful concept (in my opinion).
Another interesting feature of ANOVA is its underlying model. In the ANOVA model, all the individuals in all of the groups are conceived of as identical, and they differ only in how they have been treated. The classic example of an ANOVA analysis illustrates this. In the classical example, there is a sample of headache sufferers, who are identical in the fact that they suffer from headaches even though they may be a mixture of sexes, races, etc. The point is that the samples should be as identical as possible with regard to size, gender mix, etc. These headache sufferers are given one of a set of headache relievers (say aspirin, Tylenol, and Advil), and then ANOVA is used to determine whether the treatments (type of headache reliever taken) have differing effects in relieving the headaches. This model – identical individuals, different treatments – is very different from the model underlying most analyses, which assumes that the people in different groups are inherently different in some way.
This model leads to a reversal of our normal concept of what variables are dependent and independent. In most analyses for group differences (say discriminant analysis) we treated the population group (POP) as the dependent variable, and the measured variables were our independent variables. Therefore, we examined the effect of these measured variables on population group.
For example, we could ask whether there are any differences in GOL that allow us to determine an individual's POP. In an ANOVA analysis, the measured variables are the dependents, and the group is the independent. So, we are essentially looking at the effect of group membership on the measured variables. For example, in ANOVA, we are asking whether there are any differences in POP that influence GOL. If we compare this to the classic headache-treatment example of ANOVA, we see that POP is analogous to what medicine is given to treat the headache, and GOL is analogous to the degree of pain relief experienced.
Given this, we can look at population differences in two ways. In assignment 4, we used discriminant functions analysis to distinguish the Zalavar, Teita, Zulu, and Australian populations, under the assumption that they must differ in some inherent way (perhaps different genetics). We can revisit this analysis using the ANOVA model, assuming that the individuals in these populations are inherently the same (they are all human beings), but differ in how they were treated (i.e., the Zalavar grew up in Europe, the Teita and Zulu grew up in different regions of Africa, and the Australians grew up in Australia).
II. Assignment Part 1: One-way ANOVA for Population Differences. In this analysis we will do an ANOVA of the Zalavar, Teita, Zulu, and Australian populations. The document at http://www.statsoft.com/textbook/anova-manova/?button=1 explains the concepts and procedures of ANOVA and MANOVA. Here's the recipe.
• First, create a results document. Open a blank document in Word, put your name and “Assignment 7” at the top, and save it to your memory stick as firstname_lastname_7.doc.
• Load the class data (anth402data.sav) into SPSS.
• Use the SPSS select cases procedure to choose only the Zalavar, Teita, Zulu, and Australian populations. [Hint: ANY(POP,2,4,6,7).]
• Go to the ANOVA procedure [Analyze – Compare Means – One Way Anova].
• Add GOL to the “Dependent List”.
• Use POP as the “Factor”.
• Click on the “Post Hoc” button and check “Tukey”, then click “Continue”.
• Click on “OK” to run.
Save the output by exporting it to a Word document and adding it to your results document.
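The same one-way ANOVA and Tukey post-hoc test can be run in a few lines of Python, which makes a handy sanity check on the SPSS output. This is only a sketch: it assumes the class data have been exported to a hypothetical CSV with GOL and POP columns, and it leans on scipy and statsmodels to do the work.

    import pandas as pd
    from scipy.stats import f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.read_csv("anth_class_data.csv")                              # hypothetical export
    four = df[df["POP"].isin([2, 4, 6, 7])].dropna(subset=["GOL"])       # Zalavar, Teita, Zulu, Australian

    groups = [grp["GOL"] for _, grp in four.groupby("POP")]
    F, p = f_oneway(*groups)
    print(f"One-way ANOVA on GOL: F = {F:.3f}, p = {p:.4g}")

    # Tukey HSD pairwise comparisons (the post-hoc test requested above)
    print(pairwise_tukeyhsd(four["GOL"], four["POP"]))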
III. Assignment Part 2: MANOVA. Starting with version 15, SPSS no longer supports MANOVA directly. However, we will do yet another workaround and simulate a MANOVA using a discriminant functions analysis. The link between MANOVA and discriminant functions is very close. In fact, some authorities state that the calculations are identical. The best way to think of their relationship is as looking through two ends of a telescope. From the discriminant functions end you are looking at a nominal category from the point of view of ratio measurements. From the MANOVA end you are looking at the ratio measurements from the point of view of a nominal category. As it turns out, we can use the Discriminant procedure to get the results that MANOVA would give, and to simplify ANOVA analyses of many variables. Here's the process.
• The class data should already be loaded into SPSS and the data file should be filtered using “Select cases” to use only the Zalavar, Teita, Zulu, and Australian populations. If not, do this now.
• Go to the discriminant procedure [Analyze – Classify – Discriminant].
• Use POP as the “Grouping Variable”, and define its range as minimum 2 and maximum 7.
• Add all the measured variables (GOL through MDH) to the “Independents” box.
• Click on the “Statistics” button, and uncheck everything except “Univariate ANOVAs”, which should be checked. Click “Continue”.
• Click on “OK” to run the analysis.
Export your output window to a Word document and append it to your results document.
IV. Assignment Part 3: Interpretations. Answer the questions below at the bottom of your results file. The document at <http://www.une.edu.au/WebStat/unit_materials/c7_anova/scene2_spss_output.htm> is an annotated output document for one-way ANOVA if you scroll down far enough. The document at http://statistics.ats.ucla.edu/stat/spss/output/SPSS_discrim.htm is an annotated output document for the SPSS discriminant procedure.
A. The following questions apply to the results of your one-way ANOVA analysis in assignment part 1.
1. Do the results of your analysis indicate that there are significant differences between the groups in their means for GOL? Explain how you came to this conclusion. What is the probability that the means for GOL are equal in all four groups?
2. Which groups are significantly different from which other groups, and which groups are not significantly different? Explain how you came to this conclusion.
B. The following questions apply to the results of your Discriminant Analysis workaround for MANOVA analysis in assignment part 2. Find a table titled “Tests of Equality of Group Means”. Notice that for each measured variable there is an F statistic, suggesting that this has something to do with ANOVA. In actuality, these are the results of a set of 20 one-way ANOVA tests, one for each of the measurements. Because the discriminant procedure will do this analysis for each supplied variable, many people find it easier to use the discriminant procedure than to run the ANOVA procedure over and over for each measurement.
3. Based on the information in the “Tests of Equality of Group Means” table, are there any variables that do NOT exhibit significant differences between the four populations used in the analysis? If so, what are they?
4. Examine the entry for GOL in the “Tests of Equality of Group Means” table. Are the F value and significance the same as you obtained using one-way ANOVA?
MANOVA uses the Wilks' Lambda statistic to assess whether the measured variables, taken all at once, differ between the treatment categories. In this case it would assess whether the 20 measurements of the skull differ between the four populations used in the analysis. Note that the “Tests of Equality of Group Means” table has Wilks' Lambda values for each separate measurement. The Wilks' Lambda we want for all measurements, taken together, is given in a table titled “Wilks' Lambda”. The first row of this table, the one for the test of functions 1 through 3, is the one we are looking for. It happens to be exactly the same whether you are looking for the significance of the three discriminant functions taken together or looking for the significance of the 20 measurements taken together as in a MANOVA analysis.
5. Do the 20 measurements, taken together, differ significantly between the four populations used in this analysis? At what level of confidence can you say that they differ significantly?
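If you would like to see a "real" MANOVA rather than the discriminant workaround, statsmodels provides one directly, and the Wilks' lambda it reports for the group effect is the figure discussed above. A sketch under the same assumptions as before (a hypothetical CSV export with the column names shown; only a handful of the measurements are listed for brevity, but in practice you would list all 20):

    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    df = pd.read_csv("anth_class_data.csv")                 # hypothetical export
    four = df[df["POP"].isin([2, 4, 6, 7])]

    # A few dependent variables shown for brevity; list all 20 measurements in practice
    m = MANOVA.from_formula("GOL + BBH + XCB + ZYB + MDH ~ C(POP)", data=four)
    print(m.mv_test())                                      # includes Wilks' lambda for C(POP)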
IV. Assignment Part 3: Interpretations.
Answer the questions below at the bottom of your results file. The document at <http://www.une.edu.au/WebStat/unit_materials/c7_anova/scene2_spss_output.htm> is an annotated output document for one-way ANOVA if you scroll down far enough. The documents at <http://statistics.ats.ucla.edu/stat/spss/output/SPSS_discrim.htm> and <http://www2.chass.ncsu.edu/garson/pa765/discrim3.htm> are annotated output documents for the SPSS discriminant procedure.
A. The following questions apply to the results of your one-way ANOVA analysis in assignment part 1.
1. Do the results of your analysis indicate that there are significant differences between the groups in their means for GOL? Explain how you came to this conclusion. What is the probability that the means for GOL are equal in all four groups?
2. Which groups are significantly different from which other groups, and which groups are not significantly different? Explain how you came to this conclusion.
B. The following questions apply to the results of your discriminant analysis workaround for MANOVA in assignment part 2. Find a table titled "Tests of Equality of Group Means". Notice that for each measured variable there is an F statistic, suggesting that this has something to do with ANOVA. In actuality, these are the results of a set of 20 one-way ANOVA tests, one for each of the measurements. Because the discriminant procedure will do this analysis for each supplied variable, many people find it easier to use the discriminant procedure than to run the ANOVA procedure over and over for each measurement.
3. Based on the information in the "Tests of Equality of Group Means" table, are there any variables that do NOT exhibit significant differences between the four populations used in the analysis? If so, what are they?
4. Examine the entry for GOL in the "Tests of Equality of Group Means" table. Are the F value and significance the same as you obtained using one-way ANOVA?
MANOVA uses the Wilks' Lambda statistic to assess whether the measured variables, taken all at once, differ between the treatment categories. In this case it would assess whether the 20 measurements of the skull differ between the four populations used in the analysis. Note that the "Tests of Equality of Group Means" table has a Wilks' Lambda value for each separate measurement. The Wilks' Lambda we want, for all measurements taken together, is given in a table titled "Wilks' Lambda". The first row of this table, the one for the test of functions 1 through 3, is the one we are looking for. It happens to be exactly the same whether you are looking for the significance of the three discriminant functions taken together or for the significance of the 20 measurements taken together, as in a MANOVA analysis.
5. Do the 20 measurements, taken together, differ significantly between the four populations used in this analysis? At what level of confidence can you say that they differ significantly?
V. Submitting your assignment.
Add to your results document an "Acknowledgements and Bibliography" section in which you acknowledge your collaborators and sources. Send your assignment (named firstname_lastname_7.doc) to me using the Assignment Submission link I've put on Moodle.
Bibliography
Statsoft Inc., 2012. Introduction to ANOVA / MANOVA. <http://www.statsoft.com/textbook/anova-manova/?button=1> (Version current as of December 25, 2012).
University of California Los Angeles, Institute for Digital Research and Education, 2012. Annotated SPSS Output: Discriminant Analysis. <http://statistics.ats.ucla.edu/stat/spss/output/SPSS_discrim.htm> (Version current as of December 25, 2012).
University of New England, Armidale, 2000. Scenario and Data Set #2, SPSS Output 7.1, Compare Means - One-way ANOVA. <http://www.une.edu.au/WebStat/unit_materials/c7_anova/scene2_spss_output.htm> (Version current as of July 7, 2006).
Wuensch, Karl L., 2001. One-Way Independent Samples ANOVA with SPSS. <http://core.ecu.edu/psyc/wuenschk/SPSS/ANOVA1-SPSS.doc> (Version current as of July 6, 2006).
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 8: Find a Data Set for Your Project
I. For this assignment I want you to find a data set to use for your project. See the syllabus for details about the class project.
A. There are several places you could find a data set.
1. The data you are using for your thesis or dissertation (or a subset of it). I prefer this for grad students.
2. Data that you find on the internet in downloadable form. Probably, most of you will use data of this type.
3. Data that you find in a publication and enter by scanning or by hand.
B. The data set you choose should be relatively large, with at least 100 individual cases and 15 variables (a quick way to check this in SPSS is sketched at the end of this assignment). I may waive this requirement if the data is sufficiently interesting.
II. I have put links to many online data sources, plus a few data sets that I have, on Moodle in the "Data Sets" area. This is only a small sample of what is out there. You can often find interesting data by doing a Google search for some relevant terms, then digging around through the links Google finds.
III. Assignment. Find a data set to use for your project and send me (1) a copy of it and (2) a description of it, as described in Part IV below.
IV. Submitting your assignment. Send me two files using the Assignment Submission link I've put on Moodle. The first file should be your data set (which can have any name) and the second file should be a separate description of your data set that includes the name of your data set plus anything you want me to know about it.
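Once you have your data open in SPSS, a quick sanity check along the following lines will tell you whether it meets the size guideline in I.B. The file name here is made up for illustration, so substitute your own.

* Hypothetical file name: substitute the path to your own data file.
GET FILE='my_project_data.sav'.
* List every variable so you can confirm there are at least 15.
DISPLAY DICTIONARY.
* Report the number of cases in the working file (you want at least 100).
SHOW N.

If your data arrive as a spreadsheet or text file rather than an SPSS file, open them through File – Open – Data first and save them as a .sav file before running the check.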
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 9: Writing a Research Plan for Your Project
I. For this assignment I want you to prepare a research plan (also called a proposal) for your class project. A research proposal is a relatively short document that is used for communication and as a guide in planning.
A. As a communication tool, your project research plan is used to communicate to other people the nature of your research project, and to show them that you have a plan for carrying the project out. Therefore, you should aim to demonstrate command of the background for the project; knowledge of the data; how you will analyze the data; and how you will interpret your results. The consumers of your research plan may be the teacher of your class (me), the faculty who are advising you in your undergraduate or graduate research project, or perhaps even an agency that might give you funds to carry out the project.
B. As a guide in planning, the project research plan forces you to think about the various aspects of the project. Hopefully, this will allow you to move through your project in an effective way, rather than bumbling through it.
II. Assignment. Prepare a research plan for your project.
A. It is important to do the required browsing for this assignment.
B. Some of you may have done a research plan or proposal as part of ANTY 601, ANTY 413, or another class. You are welcome to polish it up and use it for this assignment, so long as it refers to the data you will be using for this class and it contains all the parts listed below.
C. The research plan should include the following parts:
1. Introduction. Your introduction should explain the nature, intent, and importance of your project. It should be in 'funnel format'. It should review at least a moderate amount of literature, including a citation of where the data was obtained if appropriate, and a selection of other works that have used that data or similar data for a similar purpose. It should state the null hypothesis being tested.
2. Materials and methods. This section should include what normally goes into a materials and methods section, plus a bit more. Describe your data thoroughly, and describe the analytical procedures you will use with the data. Finally, explain why this data and these methods are appropriate for investigating your problem and for testing your hypothesis.
3. Discussion. In this section you should anticipate various possible results that you might obtain, both positive and negative. Explain what these possible results may tell you, such as whether a particular result will allow you to reject your null hypothesis and what you will conclude if this is the case.
4. A bibliography of literature cited.
III. Submitting your assignment. Send your research plan (named firstname_lastname_9.doc) to me using the Assignment Submission link I've put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 10: Data Analysis Results
I. For this assignment I want you to send me the results of the data analyses you performed on your data for your class project.
II. Assignment. Perform the analyses you described in your research plan on your data. Store the results (at least the relevant tables, etc.) in a results document named firstname_lastname_10.doc, and send this document to me.
III. Submitting your assignment. Send your analytical results (named firstname_lastname_10.doc) to me using the Assignment Submission link I've put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 11: Preliminary Draft of Paper
I. For this assignment I want you to prepare a preliminary draft of your paper. This draft should be as polished as you can make it, and I will edit it and return it to you for revision.
II. Assignment. Write up your results as a scientific paper and submit it to me as described in Part IV below.
III. Paper Format. Your paper should be in scientific paper format. We have been working on this all semester, and I expect you to be comfortable with it by now. There are some formatting issues listed in the class syllabus as well. I have found the web site at http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtoc.html useful in preparing scientific papers, and it may be useful to you as well. Also note that I have put several resources in the required browsing section for this assignment on Moodle.
IV. Submitting your assignment. Send your preliminary paper draft (named firstname_lastname_11.doc) to me using the Assignment Submission link I've put on Moodle.
Advanced Anthropological Statistics (ANTY 408)
ASSIGNMENT 12: Revise Your Paper and Submit Final Draft
I. I will have edited and commented on your preliminary paper draft, which you submitted in Assignment 11, and I will have returned it to you by email to your official UM email address as listed in Moodle.
II. Assignment. Revise your paper, taking into account the suggestions I made on your preliminary draft.
III. Submitting your assignment. Final drafts of your paper are due on the day scheduled in the syllabus – probably the Tuesday of the week before finals. See the syllabus for details. Please name your assignment using our normal scheme: firstname_lastname_12.doc or .docx. Submit it to me using the Assignment Submission link I've put on Moodle.