High school statistics revision http://www.robin-beaumont.co.uk/virtualclassroom/stats/precourse.html One – Defining Data Written by: Robin Beaumont e-mail: robin@organplayers.co.uk Date last updated: Tuesday, 11 September 2012 Version: 1 How thi s do cum ent s hould b e us ed: This document has been designed to be suitable for both web based and face-to-face teaching. The text has been made to be as interactive as possible with exercises, Multiple Choice Questions (MCQs) and web based exercises. If you are using this document as part of a web-based course you are urged to use the online discussion board to discuss the issues raised in this document and share your solutions with other students. W ho th is do cu me nt i s ai me d at : This document is aimed at those people who want to learn more about statistics in a practical way. It is the first in a series and therefore sets the scene in many respects. I hope you enjoy working through this document. Robin Beaumont Introduction to Statistics Defining Data Contents 1 Prerequisites ..................................................................................................................................... 2 2 Learning Outcomes .......................................................................................................................... 2 3 Introduction ....................................................................................................................................... 3 4 Data .................................................................................................................................................... 3 5 The Statisticians View of Data ......................................................................................................... 4 6 The Social Scientists View of Data ................................................................................................. 6 6.1 Concepts ...................................................................................................................................... 7 7 Hierarchy of datatypes ..................................................................................................................... 8 8 Ranking Data ..................................................................................................................................... 9 8.1 Magnitude and Ranking ............................................................................................................... 9 8.2 Rating Scales............................................................................................................................... 9 9 Multiple Choice Questions (MCQs) ............................................................................................... 10 10 Summary ...................................................................................................................................... 11 11 References ................................................................................................................................... 11 1 Prerequisites This document assumes that you have the following knowledge, skills and resources: Skills: You can use a simple calculator (either a pocket variety or the one that comes with Microsoft Windows) Knowledge: None specific Required Resources: None. 2 Learning Outcomes This document aims to provide you with the following skills and information. After you have completed it you should come back to these points, ticking off those you feel happy with. Learning outcome Tick box Be able to list the reasons why most people fall at the first hurdle of learning statistics. Be able to identify and describe the concepts Variable, Constant, Dataset and Frequency. Be able the compare and contrast the different data classification systems used by Statisticians and social scientists. Be able to provide examples of where various data types are encountered in psychology or medicine or pharmacy. Be able to discuss problems encountered with ordinal data. Be able to evaluate a dataset with regard to what types of variables it consists of. Be able to rank a dataset. Be able to discuss the suitability, process and effects that ranking has upon a dataset. Robin Beaumont robin@organplayers.co.uk Document1 page 2 Introduction to Statistics Defining Data 3 Introduction Today statistics are seen, and used in all walks of life; salespeople, doctors, window cleaners, politicians and research scientists all make use of them and therefore it would appear that most people need to understand something about 'Statistics'. However because each person uses statistics in a different way and to a different level of complexity, it is impossible to write a completely generic introduction to statistics. This course attempts to present the level and breadth of analysis required by most undergraduates in psychology, pharmacy and medicine. It also serves as an introduction to medical statistics for those medical graduates who did not gain much experience (or were asleep!) during their undergraduate years but wish to become more engaged in quantitative research now, hence there are some topic areas which might be considered to be at a post graduate level. This series of documents provide you with an insight into a small part of this discipline. In fact, by the time you have worked through them you will possibly feel as if you know less about statistics than before you started. Statistics is definitely one of those subjects where the more you know the less you feel you know. What you will have gained by the end will be a critical understanding of the basic concepts of the subject alongside a large number of associated practical skills. Unfortunately, for a variety of reasons, a large number of people fall foul at the first hurdle of learning statistics. Why is this? Firstly I believe that in an attempt by most authors to provide a easy 'user friendly' introduction to statistics the essential fundamental concepts underlying statistics are rushed through at break neck speed, never revisited or ever presented in a variety of ways. The results of this are that we, "the students" are left high and dry after the first session and find the next one complete gibberish!. Secondly, statistics is taught as a clear cut subject with clearly defined rules. This is not the case, and when the more questioning students become aware of these ambiguities and controversies they see them unfortunately as deficiencies in their own understanding of the topic rather than just the opposite. The above paragraph should act as both a warning and guide to everyone who wants to learn about statistics. Always: 'Go slowly and make absolutely certain you understand the previous steps before moving on' We will begin our journey by considering the raw building blocks of statistics 'Data' . 4 Data Data is everywhere. In fact nearly everything one can think of has a set of characteristics ('data') which enables us to recognise and remember it. Data can vary from the particular colour of a dandelions flower to the resting heart rate of a subject undergoing an exercise trial. While it is obvious that the colour and heart rate differ radically, one could not sensibly compare a colour with a heart rate, it is more difficult to define how exactly they do differ but in this document we will present a classification which does achieve this, so how does data differ, and why is it inappropriate to compare colour of dandelions to heart rates. Often a set of Data is collected to form a Dataset. A dataset consists of values for one or more characteristics over a number of objects (i.e. patients) A simple dataset may consist of peoples shoe size. If a particular characteristic can take more than one value it is known as a variable e.g. shoe size, height, IQ etc. A characteristic that can take only one value is known as a constant. However note that a characteristic may be a variable in one situation (e.g. species for a group of farm animals or age of patients) and a constant in another (e.g. species for a group of children, age for a group of children in their first year at primary school in the UK). We frequently say that we have collected a number of variables rather than a dataset. Robin Beaumont robin@organplayers.co.uk Document1 page 3 Introduction to Statistics Defining Data Exercise 1. In the following dataset which could be classified as a variable and which a constant? Age Gender 18 male 19 19 male male 20 male 24 male A persons star sign, shoe size and age can therefore all be classified as variable data for a group of people in which there might exist more than one different value for each variable. Instinctively we feel that each of these variables are different as we did with comparing the Heart rate to the Dandelions flower colour in the previous paragraph. What exactly is the difference between them? To help answer this question we will consider two different ways data has been classified, one from the viewpoint of the statistician and the other from the social scientists perspective. 5 The Statisticians View of Data Nominal qualitative Ordinal Data quantitative Discrete Continuous Statisticians classify data into two broad types qualitative and quantitative, each of these is further subdivided into two further types resulting in four basic types; Nominal, Ordinal, Discrete and Continuous. It is important to give you a warning concerning the use of the term qualitative: Qualitative data and Qualitative research methods are completely different things Eye colour is an example of qualitative data. This is because with qualitative data the 'values' are essentially words used to specify categories. In contrast, with quantitative data, the values are numerical attributes which the data possesses itself. Qualitative data are often given numerical codes, but any arithmetic done with the codes will be meaningless, as will become obvious when one returns to the words behind the codes. For example, type of dwelling ('semi-detached', 'detached', 'terrace', etc.) is qualitative (nominal). The 'values' might be coded '1', '2', '3', etc., but although we can add '1' and '2', adding 'detached' and 'semi-detached' has no meaning, and certainly does not result in 'terrace'! This scale only allows us to state which particular category a data value belongs in (e.g. red blue or green for eye colour) and count (enumerate) how many there may be in each category for a particular data set. Nominal data is therefore also often called categorical or enumerate data. The count for a particular category is often referred to as the frequency. The term 'ordinal' is used when it is possible to order the various categories to create a scale. For example, the responses to a question asking 'How often do you have problems getting to sleep?' might be labelled 'every night' , 'most nights', 'some times', 'rarely' and 'never'. The correspondence between these words, although clearly graded in order, is questionable in terms of the relative distance between each, for example is 'most nights' a similar distance away from 'some times' as 'rarely' is to it in the opposite direction. Much effort is made to make such ordinal data possess a scale that approximates equal distances such as the common set of responses ('strongly disagree', 'disagree', 'neutral', 'agree', 'strongly agree') to attitudinal questions such as, 'I find statistics boring?'. It should be noted that even when efforts are made to make the scale have similar intervals it does not make sense to perform mathematical operations on them such as 'disagree' + neutral = 'agree' etc. Robin Beaumont robin@organplayers.co.uk Document1 page 4 Introduction to Statistics Defining Data In medicine there are several systems for classifying the extent or stage of cancer. The two most common are the Stage I, II, III, IV system (in contrast the American Joint Committee on Cancer (AJCC) uses five stages from 0 to IV) and the TNM (Tumour, Node, metastases) system. These staging systems provide estimates of the stage of disease and chances of survival. For more information see both http://www.cancerstaging.org/cstage/index.html and http://www.oncologychannel.com/coloncancer/staging.shtml The only valid type of mathematical operation that can be carried out on ordinal data is to order it and count how many observations exist at each point in the scale. The process of ordering data is called Ranking which we will discuss latter. Both Nominal and Ordinal data is sometimes just referred to as qualitative data again: Qualitative data and Qualitative research methods are completely different things Exercise 2. Which of the following variables are Nominal and which are Ordinal? 1. List of patron saints (Broadcaster = St. Gabriel; Invalids= St. Stephan; Bee keepers= St. Ambrose etc.) 2. Hair style 3. Learning style (such as 'deep', and 'superficial') 4. Shoe size (1 to 14) 5. Species of Ant 6. The Seven Dwarves (Bashful, Doc, Dopey, Grumpy, Happy, Sleepy, Sneezy) 7. Eye response (No eye opening, Eye opening to pain, Eye opening to speech, Eyes open spontaneously) 8. Sleeve length of glove( Shoulder length, Above elbow, Elbow, Mid-forearm) 9. Ivy League Universities (Brown, Columbia, Cornell, Dartmouth, Harvard, Pennsylvania, Princeton, Yale) 10. Deadly sins (Pride, Greed, Lust, Envy, Gluttony, Anger, Sloth) 11. Star (Zodiac) sign 12. Pencil Hardness 13. Iceberg size( Growler, Bergy bit, Small, medium, Large, Very large) The other major subdivision of data used by statisticians is that dividing Quantitative data. Quantitative data, also called numerical data can be either discrete (for example, the number of children in a family) or continuous ( for example, the height of an experimental subject in cm). What is the difference? Discrete data can only have values that are separated by impossible values, e.g. you cannot have half a child. Another example is Shoe, or for that matter most readymade clothes, sizes. Continuous data can take any value within a range, e.g. While a particular height might be 217 cm it could easily be 217.34 or even 2.17.345635 depending upon the accuracy of the measuring mechanism. While often people are told that discrete data consists of whole numbers (integers) this is not always the case. Take for example, the case of the 'number of questions' answered correctly in a test on spelling' . This will be represented by 'whole' numbers, however you can represent the same data as a 'proportion of correct answers', for it is clear that 21 correct answers out of 30 is a discrete value that may nevertheless be represented as 0.7 when it is the proportion of correct answers that interests us. While statisticians spend a great deal of time differentiating between discrete and continuous data, for our purposes continuous data is often treated as discrete data and it does not really cause too many problems. In contrast to the above classification social scientists classify data in a slightly different way which will be discussed next. Robin Beaumont robin@organplayers.co.uk Document1 page 5 Introduction to Statistics Defining Data 6 The Social Scientists View of Data While social scientists use the Nominal and Ordinal classification they prefer to adopt another type of classification for the terms discrete and continuous. The third level of measurement they describe is one which possesses those characteristics described above for ordinal data but in addition has equal sized intervals e.g.: Fahrenheit, Celsius, bank balance. It is interval measurement data. The most complex type of scale involves all of the above characteristics and in addition possesses an absolute zero point e.g. height, weight, distance, Kelvin's. This, ratio measurement type, is the most complex of the measurement scale types. Do not worry too much if you find it difficult to differentiate between Interval and ratio data as it does not matter too much, in contrast the important thing is to be able to differentiate between Nominal, Ordinal, and (Interval/Ratio) data. Since Stevens 1951, suggested this classification several authors have criticised the degree of prominence it has achieved concerning which types of statistics are considered appropriate for each type (Gaito 1980). The chart below provides the above information in summary form. An easy way to remember the data types is the word 'Noir'. Measurement type (e.g.) Nominal Name Order Equal Intervals Absolute zero Tip: Memorise this chart and you can't go wrong X (star sign) Ordinal (fitness rating scale) X X Interval (Fahrenheit) X X X Ratio (weight) X X X X Exercise 3. Produce a list of about 20 characteristics about yourself and categorise them into the appropriate measurement scale type. Exercise 4. As the Dean of Students for your Medical School, you must prepare each student’s class ranking in clinical work. The ranking, which will be used for internship and other recommendations, comes from a combination of grades for clinical clerkships in five departments. Each grade is given an equal weight in the student’s “class standing.” The five clinical departments express their individuality by using different scales for giving grades. The scales are as follows: Internal Medicine: A,B,C,D,E, (with A = highest and E = lowest) Obstetrics-Gynecology: A+, A, A−, B+, B, B−, C+, C, C−, D, and E Pediatrics: Numerical examination grade from 100 (perfect) to 0 (terrible) Psychiatry: Superior, Satisfactory, Fail Surgery: High honors, Honors, Pass, Fail How would you combine these non-commensurate scaling systems to form a composite score for class standing? Taken from Feinstein, 2002 page. 65 Robin Beaumont robin@organplayers.co.uk Document1 page 6 Introduction to Statistics Defining Data 6.1 Concepts Researchers frequently wish to investigate some abstract concept (e.g. health, patience etc) which means they need to develop one or more measures which they may combine in an attempt to measure it. Two such examples are intelligence and physical exhaustion. A standard method of measuring intelligence is to use the Wechsler Adult Intelligence Scale (WAIS) intelligence scale to obtain an intelligence quotient (IQ). The WAIS consists of 10 measures, which are themselves grouped together to form sub-scales before being combined. From: http://en.wikipedia.org/wiki/Wechsler_Adult_Intelligence_Scale Constructs assessed by the MHP-H Adult Health History (individual ratings of general health and the presence of chronic illness) 1. Severity of typical illness 2. Presence of a chronic illness 3. Impairment due to chronic illness 4. Overall health (over adult life) 5. Recent health (over past six months) Health Habits (sums of 24 individual ratings of positive and negative health habits) 6. Positive habits 7. Negative habits Health Care Utilization (frequency of utilizing four different types of health care) 8. Medical office visits 9. Overnight hospital treatments 10. Emergency-room visits 11. Over-the-counter medication Response to Illness (response to a person’s “typical” illness on a total of 13 items covering the following areas) 12. Professional help 13. Self-help 14. Help from friends 15. Spiritual help Health Beliefs and Attitudes (23 items covering six beliefs/attitudes) 16. Self-efficacy 17. Health vigilance 18. Health values 19. Trust in health care personnel 20. Trust in health care system 21. Hypochondriasis Similarly the concept of physical exhaustion can be measured by considering both the Borg scale, and Vo2Max reflecting both the psychological sensation and the physiological manifestation of physical exhaustion. The process by which a researcher chooses/devises one or more scales (also called constructs) to measure some possibly abstract, concept is known as operationalisation and how well s/he does this often provides the litmus test for the research. There is always much debate concerning the validity of combining or alternatively keeping separate the various sub scales. As you can imagine there are numerous statistical techniques to help you make a decision. Obviously the measurement of ‘health’ is a major area of concern. Many scales have been developed, a typical examples are the Duke health profile (17 items!) and the Nottingham health profile. Specialist measures have also been developed such as the Multidimensional [psychological] Health Profile (Karoly, Ruehlman, & Lanyon, 2005). According to Lanyon, Maxwell & Karoly 2007 “This instrument was designed to alert health care personnel to potential [psychological] problems that should be addressed in more detail, and consists of two sections: Psychosocial Functioning (MHP-P), and Health-related [psychological] Functioning (MHP-H).” Exercise 5. Consider the concept of fatigue? Do you think many papers have been published discussing how you might measure it? Do you think there are different varieties? Within the healthcare setting which conditions do you think might benefit from such a measure? Do you think when measuring it you would need to develop one or more sub-scales, that is do you think it might have different aspects? Please don’t turn the page until you have carried out the above exercise. Robin Beaumont robin@organplayers.co.uk Document1 page 7 Introduction to Statistics Defining Data Pathological fatigue Quoting Kittiwatanapaisan 2003 Pathological fatigue, in contrast to normal fatigue, does not subside with rest and is characterized by a feeling of tiredness before activity, lack of energy to complete tasks, exhaustion after usual activity, or all of the above. Fatigue has been found to correlate with physical and psychological parameters in patients with multiple sclerosis, chronic fatigue syndrome, HIV infection, and AIDS (Breitbart, McDonald, Rosenfeld, Monkman, & Passik, 1998; Ford, Trigwell, & Johnson, 1998; O'Dell, Meighen, & Riggs, 1996; Packer, Foster, & Brouwer, 1997; Vercoulen et al., 1997; Walker, McGown, Jantos, & Anson, 1997). . . . . Fatigue has been measured with various instruments, mainly, visual analog scales and questionnaires with Likert-scale format, including the Checklist of Individual Strength-Fatigue (van der Werf et al., 1998; Vercoulen et al., 1996, 1997), the Fatigue Severity Scale (Packer, Sauriol, & Brouwer, 1994), the Chalder Fatigue Scale (Ford et al., 1998), the Fatigue Assessment Inventory (O'Dell et al., 1996), the Piper Fatigue Scale (Cupler, Otero, Hench, Luciano, & Dalakas, 1996; O'Dell et al., 1996), and the Multidimensional Assessment of Fatigue (Schwartz, Coulthard-Morris, & Zeng, 1996). However, these instruments are not specific for measuring fatigue in the MG [Myasthenia Gravis] patient population. Grohar-Murray, Sears, Hubsky, and Becker (1994) combined and modified two unpublished questionnaires, which were used to measure fatigue in multiple sclerosis patients, for use with MG [Myasthenia Gravis] patients. . . . . . Abstract references: Breitbart, W., McDonald, M.V., Rosenfeld, B., Monkman, N.D., & Passik, S. (1998). Fatigue in ambulatory AIDS patients. Journal of Pain and Symptom Management, 15, 159-167. Chalder, T., Berelowitz, G., Pawlikowska, T., Watts, L., Wessely, S., Wright, D., et al. (1993). Development of fatigue scale. Journal of Psychosomatic Research, 37, 147-153. Cupler, E.J., Otero, C., Hench, K., Luciano, C., & Dalakas, M.C. (1996). Acetylcholine receptor antibodies as a marker of treatable fatigue in HIV-1 infected individuals. Muscle & Nerve, 19, 1186-1188. Dzurec, L.C., Hoover, P.M., & Fields, J. (2002). Acknowledging unexplained fatigue of tired women. Journal of Nursing Scholarship, 31(1), 41-46. Ford, H., Trigwell, P., & Johnson, M. (1998). The nature of fatigue in multiple sclerosis. Journal of Psychosomatic Research, 45, 33-38 Grohar-Murray, M.E., Becker, A., Ricci, M., Polak, M., & Danehy, S. (1994). Fatigue characteristics, severity, and impact on the functional status in myasthenia gravis. Unpublished manuscript. O'Dell, M., Meighen, M., & Riggs, R.V. (1996). Correlates of fatigue in HIV infection prior to AIDS: A pilot study. Disability and Rehabilitation, 18, 249-254 Packer, T.L., Foster, D.M., & Brouwer, B. (1997). Fatigue and activity patterns of people with chronic fatigue syndrome. The Occupational Therapy Journal of Research, 17, 186-199. Packer, T.L., Sauriol, A., & Brouwer, B. (1994). Fatigue secondary to chronic illness: Postpolio syndrome, chronic fatigue syndrome, and multiple sclerosis. Archives of Physical Medicine & Rehabilitation, 75, 1122-1126. Schwartz, C.E., Coulthard-Morris, L., & Zeng, Q. (1996). Psychological correlates of fatigue in multiple sclerosis. Archives of Physical Medicine & Rehabilitation, 77, 165-170. van der Werf, S.P., Jongen, P.J.H., a Nijeholt, G.J.L., Barkhof, F., Hommes, O.R., & Bleijenberg, G. (1998). Fatigue in multiple sclerosis: Interrelations between fatigue complaints, cerebral MRI abnormalities and neurological disability. Journal of the Neurological Sciences, 160, 164-170. Vercoulen, J.H.M.M., Hommes, O.R., Swanik, C.MA., Jongen, P.J.H., Fennis, J.F.M., Galama, J.M.D., et al. (1996). The measurement of fatigue in patients with multiple sclerosis: A multidimensional comparison with patients with chronic fatigue syndrome and healthy subjects. Archives of Neurology, 53, 642-649. Vercoulen, J.H.MM., Bazelmans, E., Swanik, C.M.A., Fennis, J.F.M., Galama. JM.D., Jongen, P.J.H., et al. (1997). Physical activity in chronic fatigue syndrome: Assessment and its role in fatigue. Journal of Psychiatry, Research, 31, 661-673. Walker, K., McGown, A., Jantos, M., & Anson, J. (1997). Fatigue, depression, and quality of life in HIV-positive men. Journal of Psychological Nursing, 35(9), 32-40. A though literature review of various measures of fatigue as used in chronic exercise research can be found in Puetz, O'Connor, Dishman, 2006. 7 Hierarchy of datatypes It is importance to realise that the various datatypes (or measurement scales if you prefer the term) represent a hierarchy of complexity. The most complex data it that which has Interval/Ratio characteristics and because of this hierarchy of complexity if you convert Interval/Ratio data to Ordinal data you will loose some of the information within it. This is related to the process of Ranking which is discussed in the next section. Robin Beaumont robin@organplayers.co.uk Document1 page 8 Introduction to Statistics Defining Data 8 Ranking Data The process of ordering data and assigning a numerical value is called Ranking. Let's take an example by considering the following numbers: 5, 3, 8, 1, 10 Ranking them from smallest to largest and assigning a value to each ‘the rank’ would produce the following result: Original data Original 5 5 3 3 2 8 4 1 1 10 5 3 8 1 10 Ranking Rankings 3 2 4 1 5 What do we do if we have the situation of tied scores (ties) i.e. two, or more, with the same value? Example: Consider the following numbers 5, 3, 8, 3, 1, 3, 10 Score (ordered) Rank Placing them in order of magnitude: 10, 8, 5, 3, 3, 3, 1, We note that there are 10 1 three 3s. These are equivalent to the ranked scores or the 4th, 5th and 6th score. 8 2 We therefore allocate the average of these ranks (i.e. 4 + 5 + 6 / 3 = 5) to each of 5 3 them. 3 5 3 5 3 5 1 7 8.1 Magnitude and Ranking Now considering the following example instead of one set of data consider the two given below. Notice that increasing the magnitude of the lowest and highest scores has not affect on their rankings. Therefore by ranking our data we have lost the importance of magnitude in the original dataset. Originals Original data 1 5 3 8 1 10 Original data 2 5 Ranking 5 5 3 3 3 2 8 8 4 1 -10 1 10 25 5 Rankings 3 2 4 1 5 3 8 -10 25 8.2 Rating Scales Rating scales are just a way of obtaining ordinal data by asking subjects to rate from say 1 to 10 a particular response. For example it might be to ask Please mark on the line below how anxious you feel: them 'how exhausted they Not at all Extremely are' (known as the Borg scale), or how much pain they have etc. Psychologists and statisticians argue if the 0 10 data collected can be considered ordinal or does possess the additional characteristics to be interval or ratio. Rating scales are also sometimes represented graphically as shown above. Robin Beaumont robin@organplayers.co.uk Document1 page 9 Introduction to Statistics Defining Data 9 Multiple Choice Questions (MCQs) 1. I suggest two reasons why I feel people fall foul at the first hurdle of learning statistics. Which of the following are they? (two correct choices) a. 'user friendly' introductions under emphasising basic concepts b. 'user friendly' introductions incorrectly explaining basic concepts c. statistics presented as a poorly defined subjective discipline d. over emphasis on the use of computers e. statistics presented as a clear cut subject with clearly defined rules 2. Which of the following is an example of nominal data? (one correct choice) a. Number of people on a course b. Cancer staging scale c. List of different species of bird visiting a garden over the past week d. Popularity rating of UK top ten television programmes e. Heart rate 3. Which of the following are examples of Interval/Ratio data? (two correct choices) a. Number of people on a course b. Cancer staging scale c. List of different species of bird visiting a garden over the past week d. Popularity rating of UK top ten television programmes e. Heart rate 4. Which of the following are examples of Ordinal data? (two correct choices) a. Number of people on a course b. Cancer staging scale c. List of different species of bird visiting a garden over the past week d. Popularity rating of UK top ten television programmes e. Heart rate 5. Which of the following is the correct listing of data from the simplest to the most complex? (one correct choice) a. Nominal -> Ordinal -> Interval -> Transcendental b. Nominal -> Ordinal -> Interval -> Ratio c. Qualitative -> Ordinal -> Interval -> Discrete d. Qualitative -> Ordinal -> Interval -> Ratio e. Nominal -> Ordinal -> Interval -> Quantitative 6. Which of the following is an incorrect statement about Ranking a dataset? (one correct choice) a. You can rank any dataset as long it is not Nominal b. Each value in a dataset should only occur once c. The process of ranking a dataset involves ordering it and then assigning a 'rank' value to each score from 1 to the number of scores in the dataset. d. When ranking a dataset tied scores receive the average of the rank value given to the ties. e. The result of ranking a dataset means that you lose the effect of magnitude if the data were Interval/Ratio Robin Beaumont robin@organplayers.co.uk Document1 page 10 Introduction to Statistics Defining Data 10 Summary In this chapter we have looked at data and how it is classified from two perspectives, that of the statistician and also the Social scientist. We started by discussing the simplest type of data, Nominal data which can only be categorised and counted. Next came data where it was possible to order the various categories to produce a scale, although the relative distance between the points was not specified. A large number of examples of this type of Ordinal data were given from attitudinal question scoring to Cancer staging scores. The more complex data, termed by statisticians 'Quantitative' data possessed the additional characteristic of magnitude, and the social scientists termed such data Interval/Ratio. Such data was what most people would traditionally class as real data possessing a scale with equal intervals. The process of ranking data was described and the effect that the process had upon a original dataset of Interval/Ratio data was discussed. The effects of this process highlighted the importance of bearing in mind the hierarchy of data types (measurement scales) and how possibly valuable information about data may be lost by converting data from a higher level scale to that of a lower level one. The chapter ended with a set of Multiple Choice Questions (MCQs) to help you review what you have learnt. As a final revision exercise you should return to the learning outcomes at the beginning of the before moving on to the next chapter. 11 References Feinstein RA 2002 Principles of Medical Statistics Chapman Hall [Author died at the age of 75 in 2001] Gaito J 1980 Measurement Scales and Statistics: Resurgence of an Old Misconception. Psychological Bulletin. Vol 87(7), 564-567 Gigerenzer G Swinjtink Z Porter T Daston L Beatty J Kruger L 1989 The empire of chance. Cambridge University Press Gonick L Smith W 1993 The cartoon guide to Statistics. Harper Resource Grant S, Aitchison T, Henderson, E Christie J, Zare S, McMurray J, Dargie H 1999 A comparison of the reproducibility and the sensitivity to change of visual analogue scales, borg scales, and likert scales in normal subjects during submaximal exercise. doi:10.1378/chest.116.5.1208 Hawkins A Jolliffe F Glickman L 1992 Teaching statistical concepts. Longman Howell D 1992 Statistical methods for psychologists Duxbury (chapman & hall in UK) Karoly, P., Ruehlman, L. S., & Lanyon, R. I. 2005 The assessment of adult health care orientations: Development and preliminary validation of the Multidimensional Health Profile-Health Functioning (MHP-H) in a national sample. Journal of Clinical Psychology in Medical Settings, 12, 79–91. Kittiwatanapaisan W, Gauthier D K, Williams A M, Shin J O. 2003 Fatigue in Myasthenia Gravis patients. Journal of Neuroscience Nursing. April, 2003 Retrieved from: http://www.entrepreneur.com/tradejournals/article/102271373_4.html on 15/02/2010 17:04 Lanyon R · Barbara M. Maxwell B M, Karoly P, Ruehlman S L. 2007 Concurrent Validity of the Multidimensional Health Profile—Health Functioning Scales (MHP-H) in the Pre-operative Assessment of Applicants for Gastric Bypass Surgery. Journal of Clinical Psychology in Medical Settings, 14:41–49. Mood A M Graybill F A Boes C D 1974 Introduction to the theory of statistics (3rd ed.) McGraw Hill Puetz, T W; O'Connor, P J; Dishman, R K. 2006 Effects of chronic exercise on feelings of energy and fatigue: A quantitative synthesis. Psychological Bulletin. Vol 132(6), 66-876 Stevens S S 1951 Mathematics, measurement, and psycho physics. In S S Stevens (Ed.) Handbook of experimental psychology (pp. 1 - 49) New York Wiley Robin Beaumont robin@organplayers.co.uk Document1 page 11