Data Handling & Analysis
BD7054 2012-2013
Andrew Jackson
Zoology, School of Natural Sciences
a.jackson@tcd.ie

Statistics in science
• Analysis of data is central to science
  – Metaphorically
  – Literally (Introduction -> methods -> results -> discussion)
• Underpins one’s own research
  – Your own research project
• Essential in understanding others’ research
  – To question what they did
  – To incorporate their ideas in your own research

The scientific method
• Ask a question about the world around you
  – Why are vultures the only obligate scavengers among the extant terrestrial vertebrates?
• Decide what measurable outcome you will use to test a specific hypothesis
  – Physiology of vultures favours this mode of life
  – Compare metabolic costs of flight across vertebrate taxa
• Design an experiment or field study to test this idea
• Use statistics to determine whether your predictions hold
• Frame your findings within the broader background of the precedent science
  – Introduction and discussion

Course Outline
• 8th Oct – 12th Oct
  – Introduction to R and statistics
• 21st Jan – 25th Jan
  – General Linear Models
• 29th April – 3rd May
  – Generalised Linear Models
  – Multivariate methods

Assessment
• On the Friday ending each week, you will be asked to either
  – submit an assessment, or
  – complete an online exam assessing your proficiency in data analysis using R

Learning outcomes
• NB slightly different from course handbooks
• Summarise and communicate quantitative results graphically and textually to scientific standards.
• Apply appropriate statistical analyses of commonly encountered data types.
• Discuss the context of the analyses within a hypothesis-driven framework of scientific logic.
• Use the R statistical computing language for data analysis.
Course structure
• Series of lectures/tutorials and computer practicals
• Lectures will be as interactive as possible
• Computer practicals
  – Use R to analyse data
  – Follow video podcasts for instruction
  – Demonstrators present to help

Week 1
• Lectures / Tutorials
  – Monday to Thursday 10-12
  – GGSR-A
• Computer sessions
  – Monday to Thursday 14-16
  – Botany Hut computer rooms

Summary of statistics covered
• Linear regression
• General linear models
  – As a way to ask increasingly complex questions of our data using a common framework (ANCOVA / multiple regression)
• Generalised linear models
  – Extending these concepts to deal with non-normal data types (binary / survival / count data)

Statistical software - R
• R is command-line software
  – Scary the first few times
  – Incredibly powerful and adaptable
  – Free
  – Open development
• Time-tabled computer sessions
  – Complete video podcasts and examples in your own time
• When Googling for R-related topics, add “cran” to your search terms

Delivery of course content
• Attendance at lectures/tutorials is mandatory
• Moodle website associated with course
  – Lectures will be posted
  – Web-based discussions
  – Links to video podcasts
• Statistics: An Introduction Using R. Michael J. Crawley. Wiley. ISBN 0-470-02298-1

Basic Experimental Design
For more details see Experimental Design for the Life Sciences by Ruxton and Colegrave

Relationship between hormone levels in male chimpanzees and #females
• Measure hormone levels of male chimps and then count how many females they are foraging with.
• Higher hormone levels are expected when there are more females to mate with.
• However, hormone levels are also influenced by age, diet, time of day, etc.
Male hormones and #females
• Hormone level difference could be due to age, diet, time of day OR #females

Relationship between hormone levels in male chimpanzees and #females
• All chimps are the same age, on the same diet, and measured at the same time of day, so hormone level difference ~ #females

Class Exercise
Come up with a scientific question and plot your predictions

Computer Session, 8th October
• Work through 3 podcasts on my website
  – http://www.tcd.ie/Zoology/research/research/theoretical/Rpodcasts.php
1. Opening R for the first time
2. Working with script files
3. Importing data into R