The Integration of … Modeling, Statistics, Computation and Calculus at East Tennessee State University
Jeff Knisley, East Tennessee State University
Project Mosaic Kickoff Event, June 28, 2010

Integrative Projects at ETSU
- Focus on what we have been up to
  - The Symbiosis Project
  - General Education Statistics Course
  - Quantitative Modeling Track of the Math Major
- Later today / this week: focus on how we've done what we've done
  (and, just as valuable, what we've learned from what hasn't worked)

Symbiosis: An Introductory Integrated Mathematics and Biology Curriculum for the 21st Century (HHMI 52005872)
- Team-taught by Biologists (6), Mathematicians (3), and Statisticians (1)
- Biologists teach until they reach a need for an analysis, a model, or a related concept (e.g., optimization)
- A complete intro stats and calculus curriculum, delivered via the needs and contexts provided by the biologists
- More recently: extensive computational activities featuring R, Maple, and NetLogo

Goals of the Symbiosis Project
- Implement a large subset of the recommendations of the BIO2010 report in an introductory lab science sequence
  - Semester 1: Statistics + Precalculus, Limits, Continuity
  - Semester 2: Calculus I course + Statistics
  - Semester 3: Modeling, Bioinformatics, reinforcement of previous ideas, more Statistics
  - (Our focus is on Semesters 1 and 2)

Goals of the Symbiosis Project
- Use biological contexts to motivate mathematical and statistical concepts and tools
  - Analysis of data used to inform and interpret
  - Models and inference used to predict and explain
- Use mathematical concepts and statistical inference to produce biological insights
  - Insights often need to be quantified, if only to predict the scale on which the insight is valid
  - Especially useful are insights that cannot be obtained without resorting to mathematics or statistics

Table of Contents
- Symbiosis I and II
  - List of “modules” with topics selected by biologists
  - Mathematical and statistical highlights included
  - (Not enough time to explore Symbiosis III)
- Logistics: 5 + 1 format, student populations between 7 and 30, and 3 or 4 faculty per course

Symbiosis I
1. The Scientific Method: Numbers, models, binomial, Randomization Test, Intro to Statistical Inference
2. The Cell: Descriptive Statistics and Correlation
3. Size and Scale: Lines, power laws, fractals, Poisson, exponentials, logarithms, and linear regression
4. Mendelian Genetics: Chi-Square, Normal, Goodness of Fit Test, Test of Independence
5. DNA: Conditional Probability, the Markov Property, Sampling distributions
6. Proteins and Evolution: Limits, continuity, approximations, and the t-test

Symbiosis II
7. Population Ecology: Derivatives, Rates of Change, Power, Product, Quotient rules, Differential Equations
8. Species-Species Interactions: Chain rule, Properties of the Derivative, Differential Equations Qualitatively, Equilibria, Parameter Estimation
9. Behavioral Ecology: Optimization, curve-sketching, L'Hôpital's rule
10. Chronobiology: Trigonometric functions and their derivatives, Periodograms
11. Integration and Plant Growth: Antiderivatives, Definite Integrals, and the Fundamental Theorem
12. Energy and Enzymes: Applications of the Integral, differential equations methods, Nonlinear Regression
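
To make one of the listed topics concrete, here is a minimal sketch of the kind of chi-square goodness-of-fit calculation named under Mendelian Genetics. It is not taken from the Symbiosis materials: the counts are Mendel's classic dihybrid pea counts, used here only for illustration, and the function names are hypothetical. The p-value uses a closed-form upper-tail formula that holds only for 3 degrees of freedom, which keeps the sketch free of third-party libraries.

```python
import math

# Illustrative data (an assumption, not course data): Mendel's classic
# dihybrid pea counts in the order round/yellow, wrinkled/yellow,
# round/green, wrinkled/green.
observed = [315, 101, 108, 32]
# Expected Mendelian phenotype ratio for a dihybrid cross.
ratio = [9, 3, 3, 1]


def chi_square_statistic(observed, ratio):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3
    degrees of freedom (closed form; not valid for other df)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


stat = chi_square_statistic(observed, ratio)
p_value = chi_square_sf_df3(stat)  # 4 categories, so df = 4 - 1 = 3
print(f"chi-square = {stat:.3f}, p-value = {p_value:.3f}")
```

A large p-value here means the observed counts give no evidence against the 9:3:3:1 ratio. In a course setting the same test would more likely be run with a built-in routine in R or Minitab.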

Major Outcomes
- Complete and/or comprehensive biological investigations
  - Traditional Bio curriculum: biological questions pursued to a point short of quantitative analysis
  - Symbiosis: data and models used to explore biological questions and predict answers
    - Mendelian genetics via chi-square analysis of data
    - r- and K-strategists based on the logistic model and the importance/stability of equilibria

Aspects of Integration
- Biologists need or can use almost all the math and stats we can provide
- But their goals are radically different
  - Statistical inference as a tool for justifying classification of organisms into different categories
  - Models as a means of separating different phenomena
- And the results are used to address their (often non-quantitative) questions
  - E.g.: Simple epidemiological models used to suggest whether or not mosquitoes can carry the AIDS virus

Aspects of Integration
- Statisticians and Mathematicians can contribute to biology in a variety of ways
- But transparency is paramount
  - Examples of concepts/techniques “transparent” to our biologists: the Randomization test, p-values, normal distribution, Chi-square, Periodograms, logarithms, power laws, Nonlinear Regression, phase-plane analysis
  - Examples of concepts/techniques that are NOT “transparent” to our biologists: the limit concept, the exponential function, Poisson distribution, conditional probability, t-test, degrees of freedom

Aspects of Integration
- Statisticians and Mathematicians can contribute to biology in a variety of ways
- And time/effort must be devoted to important subtleties, within biological contexts
  - Example: Logarithms and exponentials with base e. (Why not just use base 10 for everything?)
  - Example: Number of offspring, an important biological quantity, modeled as Poisson-distributed
  - Example: The approximation $(1+x)^n \approx e^{nx}$ (for small x) occurs in numerous applications and contexts in biology, but it takes a long time before it “sinks in”

Observation
- Issues preventing “downstream” usage of math and stats start as small issues at the most elementary levels
  - Nearly all of module 1 addresses the difference between a scientific hypothesis and a statistical hypothesis
  - Surface area to volume ratio: first we must agree on notation (i.e., A or S or SA or …)
- And grow into major obstacles
  - If insufficient time is spent developing the hypotheses, the result may be students “doing the test” without really knowing what they are testing
  - E.g.: If time is not spent exploring what a biologist means by a population density, ecological models may become impossible to interpret biologically

Further Insights
- Computing and Computational Science have emerged as major components
  - Informatics, genetics, proteomics, …
  - And even in ecology!
- Programming in R
  - The need is for math/stat-informed algorithms
  - Not for elaborate structures or sophisticated programming languages

Further Insights
- Logistics are a challenge
  - Transcripts are important!
  - Course sizes / delivery methods differ significantly
    - Biology lectures can be huge
    - Biology labs are typically smaller than math/stat sections
    - (I had never had to consider how to combine a lab grade with a lecture grade)
  - Communication is very important, especially about the “little issues” that tend to grow

Future Directions for Symbiosis
- More emphasis on computation
  - Algorithms as a method to address biological inquiries
  - Algorithms as statistical tools
    - Inference via bootstrapping, predictions via clustering
  - Informatics
  - Avoiding reliance on “off-the-shelf” approaches
- Symbiosis IV: A Gen Ed “Intro to Computational Science” course for math and bio majors

General Education Statistics
- In 1996, ETSU began requiring every non-calculus student to take an introductory statistics course in their first year
  - To enable students to understand and participate in a data-driven world
  - To prepare students for the stats they would see in their respective majors
- In 2001, the Gen Ed Stats course moved into the “Stat Cave”, a 45-station computer lab
  - To make the course technology-driven and data-intensive
  - Approx. 1200 students per semester (100 in summer) continuously using Minitab, applets, etc.
- math.etsu.edu/stats/

Some Features of the Course
- Teaching multiple sections
  - Extensive training of instructors
  - Highly structured course content
  - Online/off-campus sections may use calculators for some activities
- Two-part Final Exam
  - A comprehensive data analysis project due the week before the in-class Final Exam
  - A standardized multiple-choice final exam common to all sections of the course

Quantitative Modeling Track in the Math Major
- In conjunction with our Statistical Literacy and Quantitative Biology emphases
- Features many different modeling courses (students take 2 to 4 of these)
  - Statistical modeling
  - Mathematical modeling
  - Predictive modeling (data mining, machine learning)
  - Survival models (with computational emphasis)
  - Computational/Discrete Modeling
- Future: Integrate with other sciences, Public Health, Medicine, Pharmacy, etc.

Thank you! Any questions?