01: AN INTRODUCTION TO STATISTICS

TYPICAL FIGURES IN LECTURES & SCIENTIFIC PAPERS
- Bar chart
- Box & whisker plot
- Scatter plot
- Contingency table

Why do biologists have to bother with statistics?
- Helps deal with variability; investigates the distribution of samples
- Calculate reasonable estimates of the situation in the whole population (e.g. how tall women are on average) → a.k.a. descriptive statistics
- Descriptive statistics: summarize what you know about your samples
- Answer questions by conducting hypothesis testing (e.g. whether one group of women is taller than another)
- To discount the possibility of chance results → conduct a statistical test (e.g. two-sample t test)

Why is statistical logic so strange?
- There is a need to construct a null hypothesis
- Test whether the null hypothesis is likely to be true (statistical tests: 4 main stages)
1. Formulating a null hypothesis
   - Opposite of the scientific hypothesis
   - No differences/relationships
   - Preliminary assumption
2. Calculating a test statistic
   - Measures the size of any effect
   - Usually a difference between groups / relationship between measurements, relative to variability
   - USUALLY: the larger the effect, the larger the test statistic (↑ effect, ↑ test statistic)
3. Calculating the significance probability
   - The chance that a certain set of results could be obtained if the null hypothesis were true
   - GENERALLY: the larger the test statistic & sample size, the smaller the significance probability (↑ test statistic & sample size, ↓ significance probability)
4. Deciding whether to reject the null hypothesis
   - REJECT → significance probability ≤ 1 in 20 (5% or 0.05)
   - NO EVIDENCE TO REJECT OR SUPPORT the null hypothesis → significance probability > 5%
   - 5% cut-off: compromise to reduce the chances of errors
   - TYPE 1 ERROR: detection of an apparently significant difference / association, when in reality there is NONE between the populations
   - TYPE 2 ERROR: failure to detect a significant difference / association, when in reality it is PRESENT in the populations → lowering the cut-off point increases the chances of this error
*** STATISTICAL TESTS DON'T PROVE ANYTHING CONCLUSIVELY (there is still a chance that a real effect may or may not exist)

AWKWARD QUESTIONS

Why do biologists have to repeat everything?
- Generalizing from single observations in biology is unreliable (variability)
- Replicated observations of a sample: overcome variability
  → Survey (sampling at random from the existing population)
  → Create own samples by performing an experiment (experimental subjects: samples of the infinite population of subjects … you could have created w/ infinite resources)

Why are there so many different statistical tests?
- There are many different ways to quantify things, giving different types of data
- Data can vary in different ways
- There are very different questions you might want to ask about the collected data

TYPES OF DATA

MEASUREMENTS
- Character state which can be meaningfully represented by a number
- The most common way to quantify things is to take measurements
- Interval data, which can vary:
  → Continuously (e.g. weight)
  → Discretely (e.g. # of hairs)
- Normal distribution: the usual symmetrical, bell-shaped distribution pattern of measurements influenced by a large number of factors
- Parametric test: statistical test which assumes that the data are normally distributed
- Non-parametric test: statistical test that doesn't assume that the data are normally distributed, BUT uses the ranks of the observations instead

RANKS
- Put measurements into an order w/o the actual values having meaning
- Ranked / ordinal data
- E.g. 1st, 12th; none, light, medium, heavy; 1 = poor to 5 = excellent
- MUST be analyzed w/ non-parametric tests

CATEGORICAL DATA
- Some features of organisms are unquantifiable
- Classify into different categories
- Quantify this sort of data by counting the frequency in each category
- Usually analyzed with χ² (chi-squared) tests or logistic regression

TYPES OF QUESTIONS
- Statistical tests are designed to answer 2 main types of questions:
  → Are there differences between sets of measurements?
  → Are there relationships between them?

TESTING FOR DIFFERENCES BETWEEN SETS OF MEASUREMENTS
- To find out if experimentally treated organisms/cells are different from controls
- To compare 2 sets of measurements taken on a single group of organisms (e.g. medical condition of patients before & after treatment)
- To see if several types of organisms (e.g. 5 different bacterial strains), or organisms subjected to different treatments (e.g. wheat grown at different levels of nitrate & phosphate), are different from each other

TESTING FOR RELATIONSHIPS BETWEEN MEASUREMENTS
- Take 2 or more measurements on a single group of organisms/cells & investigate how the measurements are related
- E.g. how heart rate varies w/ blood pressure, how weight varies w/ age, how the concentrations of different cations in neurons vary w/ each other
- Help study how organisms operate
- Help predict things about them

TESTING FOR DIFFERENCES AND RELATIONSHIPS BETWEEN CATEGORICAL DATA
- Determine if there are different frequencies of organisms in different categories (e.g. do rats turn more frequently to the right in a maze?)
- Determine if categorical traits are associated (e.g. eye & hair color)
- Study how quantitative measurements might affect categorical traits (e.g. are tall people more likely to have brown eyes?)
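Worked sketch (not from the lecture): the maze example above can be turned into a χ² test on frequencies. The counts below are hypothetical, and the function name `chi_squared_two_categories` is made up for illustration. It only handles two categories (1 degree of freedom), because in that special case the significance probability can be computed with the standard library alone.

```python
import math


def chi_squared_two_categories(observed, expected):
    """Chi-squared goodness-of-fit test for exactly two categories (1 degree of freedom).

    Returns the test statistic and its significance probability.
    """
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch only handles two categories")
    # Test statistic: sum of (observed - expected)^2 / expected over the categories.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, chi-squared is the square of a standard normal
    # variable, so its upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 rats turned right, 10 turned left.
observed = [30, 10]
# Null hypothesis: no preference, so half of the rats are expected on each side.
expected = [sum(observed) / 2, sum(observed) / 2]

chi2, p_value = chi_squared_two_categories(observed, expected)
reject_null = p_value <= 0.05  # the 5% cut-off used in these notes

print(f"chi-squared = {chi2:.2f}, P = {p_value:.4f}, reject null: {reject_null}")
```

With these made-up counts the test statistic is 10 and the significance probability is about 0.0016, so the null hypothesis of no turning preference would be rejected. For more than two categories, or for association between two traits, a statistics package would normally be used instead.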
USING THE DECISION CHART
- Complication: final box w/ 2 alternative tests
  → Parametric test: bold type
  → Non-parametric test: normal type
- ALWAYS advised to use the parametric test if valid
- Parametric tests: more powerful in detecting significant effects
- Non-parametric tests: for ranked data, irregularly distributed data that can't be transformed to the normal distribution, or measurements w/ only a few discrete values
- INVESTIGATE the DISTRIBUTION of data before deciding which tests to carry out (to see if parametric tests are valid, or if you can transform the data so that they become valid)

CARRYING OUT TESTS
- Test descriptions will do 5 things…
1. Tell you the kinds of questions the test will help you answer & give examples to show the range of situations in which it is suitable. This helps ensure you are choosing the right test
2. Tell you when it is valid to use the test
3. Describe the rationale & mathematical basis for the test (how it works)
4. Show how to perform the test using a calculator and/or the computer-based statistical packages SPSS and RStudio
5. Tell you how to present the results of the statistical test

DESIGNING EXPERIMENTS
- Can use information about statistics to design better experiments

COMPLEX STATISTICAL ANALYSIS
- Covers most of the statistical tests needed to analyze straightforward experiments and surveys (which look at 1-2 factors)
- Best to stay as far as possible within these limits when designing experiments
- In some branches of biology, on some occasions, you simply have to carry out and analyze more complex experiments or investigate huge sets of data

PRESENTING & DISCUSSING STATISTICS
- Describes how to present information about what statistical tests you did and why, how to present the results of your tests, and the level at which to discuss the results
- To produce professional write-ups
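Closing sketch (not from the lecture): the 4 stages of a statistical test from earlier in these notes, traced through to a result you could report. It uses an exact permutation test rather than the two-sample t test named in the notes, because a permutation test needs only the standard library. The heights are hypothetical and `permutation_test` is an illustrative name.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference between two group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    # Stage 2: test statistic = size of the observed difference between group means.
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Stage 3: under the null hypothesis the group labels are arbitrary, so every
    # way of splitting the pooled values into two groups is equally likely.
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return observed, extreme / total


# Stage 1: null hypothesis = no difference in mean height between the two groups.
# Hypothetical heights in cm.
group_a = [170.0, 172.0, 174.0, 176.0, 178.0]
group_b = [160.0, 162.0, 164.0, 166.0, 168.0]

difference, p_value = permutation_test(group_a, group_b)

# Stage 4: reject the null hypothesis if the significance probability is <= 0.05.
reject_null = p_value <= 0.05

print(f"difference in means = {difference:.1f} cm, P = {p_value:.4f}, "
      f"reject null: {reject_null}")
```

Only 2 of the 252 possible splits give a difference as large as the observed 10 cm, so P = 2/252 ≈ 0.008 and the null hypothesis is rejected. A write-up would state the test used, the test statistic, the sample sizes, and the significance probability.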