Lecture1

Introduction: Why statistics?
Petter Mostad
2005.08.29

What best describes your attitude towards statistics?
Statistics is…
• …a way to summarize and describe information: not very interesting in itself
• …an important tool for research in my field, and something I look forward to learning more about
• …an important tool for research in my field, but I only learn what I must learn about this
• …boring

How much do you already know?
• Definition of mean value, median, standard deviation?
• Bayes' formula?
• t-tests?
• p-values?
• Computing the probability of getting dealt a flush in a game of poker?

Why a course in statistics?

What is research?
• A distinguishing feature of scientific research is that its conclusions are reproducible by other scientists.
• Thus, research must
  – contain information about exactly what has been done
  – somehow convince the reader that if she repeats what has been done, she will reach the same conclusions

A goal of science: To study causality
• Ultimately, much of science is concerned with establishing statements like “If A happens, then B will follow”.
• In other words, one wants to show that B is reproduced every time A happens.

Example: Studying causality through intervention
• Retrospective studies can show covariation between variables, but not causality.
• Intervention can be used to argue that changing a certain variable causes another variable to change.
• To study the effect of an intervention, a control group is needed.

Example: Reproducibility through randomization
Assume an experiment is done with two groups receiving different “treatments”:
• Differences in the results could be caused by differences in the treatments, or by differences between the groups from the start.
• Randomizing the division into groups makes it unlikely that the groups are systematically different from the start.

Example: Blind or double-blind studies
• Differences between the two groups could be caused by people's knowledge that they are in one group or the other.
• Differences could also be caused by the experimenters' (doctors') knowledge of who is in which group.
• Removing the first kind of knowledge gives a blind study; removing the second as well gives a double-blind study.

Quantitative and qualitative research
• Quantitative: Focus on things that can be measured or counted.
• Qualitative: Focus on descriptions and examples.
• Two different scientific traditions. Health economics and administration has elements of both.
• Both have advantages and disadvantages (which?).

Quantitative research
• For quantitative research, we have many good tools to ensure reproducibility of conclusions.
• Statistics is one very important such tool.
• Statistics used in this way can be called inferential statistics.

Example: Reproducibility through statistics
• If you repeat a quantitative investigation (a questionnaire, an observation of a social phenomenon, a measurement), you are unlikely to get exactly the same numbers.
• Statistics can help you estimate how different the results are likely to be.
• This can tell you which conclusions are likely to be reproducible in a potential repetition of the investigation.

Descriptive vs. inferential statistics
• Descriptive statistics: To sum up, present, and visualize data.
• Inferential statistics: A tool to handle uncertain information, and to draw (“infer”) reproducible conclusions on the basis of it.

Descriptive statistics
• Goal: To reduce the amount of data while extracting the “most important information”.
• Can be done with single numbers (“summary statistics”), tables, or graphical figures.
• My next lecture will look at descriptive statistics.

Can descriptive statistics be “objective”?
• A person makes choices about:
  – What to measure
  – How to measure (for example, what questions to ask or what scale to use)
  – How to present the result
• Thus: A presentation or publication should always contain information about exactly how the results have been obtained.

Inferential statistics: Hypothesis test example
• You throw a die ten times and get a one in seven of these ten throws. You conclude that this is not a fair die. Is the conclusion reproducible?
• You need to compute what observations are to be expected if the die is a fair one.

Example: Probability calculations
• The disease X has a 1% prevalence in the population. There is a test for X, and
  – if you are sick, the test is positive in 90% of cases;
  – if you are not sick, the test is positive in 10% of cases.
• You have a positive test: What is the probability that you are sick?

Example: Decisions based on uncertain information
• An oil company wants to produce the maximum amount from an oil field.
• Available information:
  – Measurements (seismics) describing approximately the geometry of the rock layers
  – Information from a couple of test drills
  – Information from geologists
• Where should they place the wells, and how should they produce?

The concept of a MODEL
• What separates inferential statistics from descriptive statistics is the use of a model.
• A model is a (mathematical) description of the connections between the variables you are interested in.
• It is a simplification of reality, and so never “correct” or “wrong”, but it can be more or less useful.

Statistical (or stochastic) models
• In statistical models, the variables are predicted with some variation or uncertainty:
  – The model for the force moving a mass, F = ma, is exact.
  – The model for the number of pips a fair die will show contains probabilities.
• We can use the observed data to choose between possible models.
• The word “stochastic” is often used when we are focusing more on the model than on the data.
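The hypothesis test example with the die and the disease-test example above both come down to short probability calculations. The sketch below is a minimal illustration, assuming the die example means "a one came up in seven of ten throws" and using the one-sided binomial tail probability P(X ≥ 7) as the measure of how surprising that is for a fair die. The function names `binomial_tail` and `posterior_sick` are hypothetical, chosen only for this example. Exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n: int, k: int, p: Fraction) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def posterior_sick(prevalence: Fraction, sensitivity: Fraction,
                   false_positive_rate: Fraction) -> Fraction:
    """P(sick | positive test) by Bayes' formula."""
    # Total probability of a positive test, from sick and from healthy people.
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


# Die test: probability that a fair die shows a one at least 7 times in 10 throws.
tail = binomial_tail(10, 7, Fraction(1, 6))
print(f"P(at least 7 ones | fair die) = {float(tail):.6f}")

# Disease test: 1% prevalence, positive in 90% of sick and 10% of healthy people.
post = posterior_sick(Fraction(1, 100), Fraction(9, 10), Fraction(1, 10))
print(f"P(sick | positive) = {post}, approx. {float(post):.3f}")
```

The tail probability is about 0.0003, so seven or more ones in ten throws would be very unusual for a fair die. The disease calculation gives exactly 1/12 (about 8%): because the disease is rare, most positive tests come from healthy people.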
Example
• Assume a certain proportion of the population carries a specific gene, and you want to know how large this proportion is.
• The model is simply the unknown proportion p.
• You select and measure a number of individuals, and use the information to select the right model, i.e., the right p.

Example
• You want to know the height distribution among 30-year-old Norwegian women.
• You assume, based on experience, that a good model is a normal distribution with some expectation and some variance.
• You use data from a number of women to select a model (i.e., an expectation and a variance), or a range of likely such expectations and variances.

Sampling
• Often, the model can be a simplifying description of the population we want to study.
• We investigate the model by sampling from the population.
• When each individual is selected independently and randomly from the population, we call it (simple) random sampling.
• Simple random sampling makes it easier to compute what we can conclude about the model from the data.

Using the results
• Selecting some models over others means that you increase your understanding of each variable, and of the relationships between variables.
• Once a model has been selected, it can be used to forecast or predict the future.
• Being able to predict the likely results of different decisions can be used to improve decision making.

The goals of this course
• To enable you to understand, use, and criticise research results produced by others, and in particular to understand and view the statistical arguments critically.
• To enable you to produce your own valid research results, using statistical tools.

Overview of statistics topics we will look at
• Descriptive statistics
• Probability theory
• Sampling and estimation
• Regression
• Non-parametrics
• Analysis of variance
• Decision theory
• Some more advanced topics

Much information is and will be available at the course web page.
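The gene example and the sampling slide above can be tied together in a small simulation. This is a minimal sketch under loudly stated assumptions: the population of 100,000 people, the "true" carrier proportion of 12%, the sample size of 500, and the seed are all invented for illustration, and `estimate_proportion` is a hypothetical helper. The estimate of p is the sample proportion, and the interval is the common normal-approximation (Wald) 95% interval, one of several possible choices.

```python
import math
import random


def estimate_proportion(sample: list[bool]) -> tuple[float, float, float]:
    """Estimate p from a yes/no sample; return (p_hat, lower, upper).

    The interval is the normal-approximation (Wald) 95% interval,
    p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1].
    """
    n = len(sample)
    p_hat = sum(sample) / n  # True counts as 1, False as 0
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical population: 100,000 people, of whom a made-up 12% carry the gene.
rng = random.Random(2005)
TRUE_P = 0.12  # assumption for illustration only; unknown in a real study
population = [rng.random() < TRUE_P for _ in range(100_000)]

# Random sampling: 500 individuals drawn without replacement, every subset of
# size 500 being equally likely.
sample = rng.sample(population, 500)
p_hat, low, high = estimate_proportion(sample)
print(f"estimate {p_hat:.3f}, approximate 95% interval [{low:.3f}, {high:.3f}]")
```

`random.sample` draws without replacement. When the population is much larger than the sample, this is practically the same as the independent selection described on the sampling slide. Re-running with a different seed gives a slightly different estimate, which is exactly the variation that inferential statistics tries to quantify.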
