Making Social Work Count
Lecture 4
An ESRC Curriculum Innovation and Researcher Development Initiative

What is being studied? Approaches to measuring variables

Assessment and judgment
• Social workers have to assess all the time:
  – Is there a problem or need here?
  – What is the risk of things getting worse?
  – Have I made a difference?
• Researchers carry out similar tasks
• This lecture considers the key issue of developing meaningful measurements for use in quantitative research
• Many of the issues are relevant to the more general task of “assessment”

Quantitative and qualitative
• All research involves simplification
  – The question is whether we know what is gained and lost by simplification
• Qualitative studies tend to focus on meaning
  – A common strategy is identifying themes of relevance
• Quantitative studies convert issues to numbers
  – This allows certain types of important description (e.g. how many people have this problem?)
  – And, crucially, comparison (e.g. are things getting better? Does one group have more problems?)

Quantitative and qualitative
Quantitative research
• This session focuses on quantitative research
• It identifies key considerations in thinking about the quality of a quantitative study
  – Reliability
  – Validity
Qualitative research
• Some of these considerations can also be applied to qualitative research
• However, qualitative studies also have their own criteria for assessing good research

Learning outcomes
• Understand what a variable is
• Appreciate different types of variable that can be used in quantitative research
• Understand issues in relation to reliability and validity
• Know what a standardised instrument is
• Have had the opportunity to reflect on implications for practice

Example of children in care
• Returning to the idea that care “fails” children
• Lecture 3 suggested that the general population is not a valid comparison sample for children who have left care
• Now let’s look at outcome measures

Forrester et al. (2009) review
• The literature review focused on studies that looked at child welfare over time for children in care
• Strongest finding: a very poor research base; this is a difficult area to research
• Of 13 studies, almost all suggested:
  – Most of the harm occurs before care
  – Children tend to do better once in care
  – Some harm occurs as children leave care
  – Even in good placements children still tend to have problems

But…
What “outcomes” were being measured?
What outcomes do YOU think should be measured for children in care?

Key points
• Deciding on “outcomes” or variables for a study is NOT some value-neutral, technocratic activity
• Key issues to consider:
  – WHO is deciding what is to be measured? (e.g. experts? Government? Service users?)
  – WHAT is being measured?
  – HOW is it being measured? [focus of this lecture]

Key points
What is measured?
• For instance, in studies reviewed by Forrester:
  – the most common issue “measured” was behaviour (and particularly problem behaviour)
  – education was the second most common
  – others included physical growth, social relations, etc.
How is it measured?
• Studies in the review:
  – obtained information from social work files and made a researcher “judgment”
  – used school tests
  – pooled interview and other data and made a researcher “judgment”
  – used questionnaires to carers
• What are the strengths and weaknesses of each?

Attributes and variables
• An attribute is a characteristic of an individual, e.g. height, intelligence, beauty, serenity
• A variable is the operationalisation of an attribute, e.g. metres, IQ score, marks out of 10?, err…
  – It allows attributes to be compared and described
• The focus of this lecture is on how attributes are operationalised

Variables need to be reliable and valid
Reliability
• Are the results consistent, e.g. can the results be replicated in different conditions and across different groups?
Validity
• Does the instrument measure what it claims to measure?
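
Illustration: checking consistency
One way to make “consistent results” concrete is to give the same questionnaire to the same people twice and correlate the two sets of scores. The sketch below is illustrative only: the scores are invented, and `pearson_r` is a hypothetical helper name, not part of any instrument discussed in this lecture.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Invented total scores for six respondents who completed the same
# questionnaire on two occasions (assumption: no intervention in between).
first_occasion = [12, 18, 7, 22, 15, 9]
second_occasion = [13, 17, 8, 21, 16, 10]

r = pearson_r(first_occasion, second_occasion)
print("correlation between occasions:", round(r, 3))
```

A value close to 1 would suggest respondents keep roughly the same rank order across the two occasions. What counts as “high enough” is a judgment that depends on the instrument and the setting.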

Measures should be both reliable and valid
[Figure: target diagrams, labelled as follows]
  – Reliable, not valid
  – Low reliability: cannot be valid if not reliable…
  – Not reliable AND not valid

Standardised Instruments (SIs)
• Tools that measure a specific quality or characteristic, e.g. psychological distress
• They let us compare results across groups in different settings, e.g. social workers, families, teachers, police
• SIs need to be high in both reliability and validity

Reliability – overview
• The consistency of a measure
• A test is considered reliable if we get the same result repeatedly
• Reliability can be estimated in a number of different ways
  – Test-retest reliability: over time
  – Inter-rater reliability: between different scorers
  – Internal consistency reliability: across items on the same test

Test-Retest Reliability
• Tests the extent to which the test is repeatable and stable over time
• For example, the same social workers are given the same questions 2 to 3 weeks later
• If the results differ substantially, and there has been no intervention, then we should question the reliability of those questions

Inter-rater reliability
• Where two or more people rate/score/judge the test
• The scores of the judges are compared to find the degree of correlation/consistency between their judgements
• If there is a high degree of correlation between the different judgements, the test can be said to be reliable

Internal Consistency Reliability
• For example, where there are two questions within an SI that seem to be asking the same thing
• If the test is internally consistent, the respondent should give the same answer to both questions
• More generally, answers to questions should be linked to one another if they are measuring the same attribute

Validity
• The extent to which a test measures what it claims to measure:
  – Construct validity: the degree to which the test measures the construct it is intended to measure; the overarching type of validity
  – Predictive validity: how effectively performance on a test predicts performance in a real-life situation
  – Content validity: whether items on the test represent the entire range of possible items the test should cover

Construct validity
• The degree to which the test measures what it is intended to measure
• The overarching concept in validity: all other types of validity are ways of assessing this
• As a result, construct validity has many elements:
  – Predictive validity (can it predict things, e.g. do IQ scores predict later test results?)
  – Criterion validity (does it correctly differentiate, e.g. does a screening instrument identify people who are depressed?)
  – Content validity (is the full range of the construct included?)
  – And other types…

Predictive validity
• Can structured risk assessment tools predict which children will be abused?
• Are the predictions more accurate than practitioners’ decisions?

Predictive validity
• Barlow et al. (2013) found that most attempts to predict had low success, i.e. high numbers of false positives or false negatives
• Further research is needed to develop reliable tools that predict abuse or re-abuse
• Though this is also true for practitioners…

Content validity
• Refers to the extent to which a measure represents the full range of elements of a social construct or trait
• For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioural dimension
• Or: how should “ethnicity” be defined? In practice it is not possible to capture the full range of possible ethnicities, but what level of simplification is “valid”?
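
Illustration: false positives and false negatives
The predictive validity slides refer to false positives and false negatives. The sketch below shows one way such errors could be counted for a yes/no risk tool. It is a sketch under stated assumptions: the ten cases are invented (they are not from Barlow et al. or any real study), and the names `ScreeningCounts` and `tally` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Cross-tabulation of tool predictions against what later happened."""
    true_pos: int   # flagged as high risk, harm later recorded
    false_pos: int  # flagged as high risk, no harm recorded
    true_neg: int   # not flagged, no harm recorded
    false_neg: int  # not flagged, harm later recorded

    # Each ratio assumes its denominator is non-zero for the data supplied.
    def sensitivity(self) -> float:
        # Share of harmed children whom the tool flagged.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of unharmed children whom the tool did not flag.
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_predictive_value(self) -> float:
        # Share of flagged children for whom harm was later recorded.
        return self.true_pos / (self.true_pos + self.false_pos)


def tally(flagged, harmed):
    """Count the four outcomes from paired lists of booleans."""
    if len(flagged) != len(harmed):
        raise ValueError("flagged and harmed must be the same length")
    pairs = list(zip(flagged, harmed))
    return ScreeningCounts(
        true_pos=sum(1 for f, h in pairs if f and h),
        false_pos=sum(1 for f, h in pairs if f and not h),
        true_neg=sum(1 for f, h in pairs if not f and not h),
        false_neg=sum(1 for f, h in pairs if not f and h),
    )


# Invented data for ten children: what the tool said, and what happened.
flagged = [True, True, True, False, False, False, False, True, False, False]
harmed = [True, False, False, True, False, False, False, True, False, False]

counts = tally(flagged, harmed)
print(counts)
print("sensitivity:", round(counts.sensitivity(), 2))
print("specificity:", round(counts.specificity(), 2))
print("positive predictive value:", round(counts.positive_predictive_value(), 2))
```

The same tally could be applied to practitioners’ decisions, which would allow a like-for-like comparison between tool and practitioner.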

General Health Questionnaire (GHQ)
• A reliable and valid screening instrument identifying aspects of current mental health (anxiety/depression/social phobia)
• The self-administered questionnaire asks if someone has experienced a particular symptom or behaviour recently
• Each item is rated on a four-point scale
• Used in many countries in different languages

GHQ 12 questions
Questions include: Have you recently…
1. Been able to concentrate on whatever you are doing
2. Lost much sleep over worry
3. Felt that you are playing a useful part in things
4. Felt capable of making decisions about things
5. Felt constantly under strain
6. Felt you couldn’t overcome your difficulties
7. Been able to enjoy your normal day to day activities
8. Been able to face up to your problems
9. Been feeling unhappy and depressed
10. Been losing confidence in yourself
11. Been thinking of yourself as a worthless person
12. Been feeling reasonably happy, all things considered

GHQ 12
• There are different ways of using the data to measure risk of psychiatric problems
• All show a reasonable link with clinical diagnosis
• A common way is to count a symptomatic answer (‘yes’ or ‘no’, depending on the question) on 4 or more questions
• How do social workers do…?

Clinical scores for social workers and general population using GHQ
[Figure: bar chart with a vertical axis from 0 to 50. Bars: NQSW, One year later, General population. Data labels shown: 43, 33, 18. Sources: Carpenter et al., 2010; ONS, 2010]

How to measure children’s emotional and behavioural welfare?
• SDQ: Questionnaire designed for carers, children and teachers
• Reliability is tested by:
  – comparing scores for emotional and behavioural welfare, and comparing them over time
• Validity is tested by:
  – seeing whether scores predict children receiving specialist help, criminal behaviour, exclusion from school and other “real world” outcomes
  – also comparing with clinical assessment and other instruments

Strengths and Difficulties Questionnaire (SDQ)
• A brief behavioural screening questionnaire for parents/carers/teachers of 3-16 year olds
• Asks about psychological attributes, some positive and others negative
  – E.g. emotional, conduct, hyperactivity, peer relationship, prosocial behaviour

SDQ questions
• 25 questions, made up of five scales with five questions in each scale
• E.g. the 5 questions in the Emotional Symptoms Scale:
1. I get a lot of headaches
2. I worry a lot
3. I am often unhappy
4. I am nervous in
5. I have many fears
Responses: Not true/Somewhat true/Certainly true

Why does this matter?
• It is worth considering common social work research methods such as coming to a “researcher judgment”: how reliable? How valid?
• More importantly, what about your practice?
• What is a better way of judging whether a child has emotional or behavioural problems, or an adult is at risk of psychological problems: your judgment or a standardised instrument?
• If you want to evaluate whether you are making a difference, what role might a standardised instrument have?

Learning outcomes
Do you?
• Understand what a variable is
• Appreciate different types of variable that can be used in quantitative research
• Understand issues in relation to:
  – Reliability
  – Validity
• Know what a standardised instrument is
• Have had the opportunity to reflect on implications for practice

References
• Goldberg, D. & Williams, P. (1988) A user’s guide to the General Health Questionnaire. Slough: NFER-Nelson
• Goodman, R. (1997) The Strengths and Difficulties Questionnaire: A Research Note. Journal of Child Psychology and Psychiatry, 38, 581-586
• http://www.sdqinfo.com/d0.html
• Barlow, J., Fisher, J.D. and Jones, D. (2013) Systematic Review of Models for Analysing Significant Harm. Department for Education Report. London. Accessed: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183949/DFE-RR199.pdf
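
Illustration: applying the GHQ-12 threshold
Returning to the GHQ 12 slide, the rule of a symptomatic answer on “4 or more questions” can be sketched as a simple count. The coding here is an assumption for illustration: each of the 12 answers is taken to be already coded 0 to 3 on the four-point scale, with higher meaning more symptomatic, and the two most symptomatic options are counted as a “yes”. The function names are hypothetical, and this is not an official scoring procedure.

```python
GHQ12_ITEM_COUNT = 12
CASE_THRESHOLD = 4  # "4 or more questions", as stated in the lecture


def ghq12_binary_score(responses):
    """Count items answered with one of the two more symptomatic options.

    Assumes `responses` holds 12 integers coded 0-3, where 2 and 3 are
    the more symptomatic answers (so positively worded items are taken
    to be reverse-coded already).
    """
    if len(responses) != GHQ12_ITEM_COUNT:
        raise ValueError("GHQ-12 needs exactly 12 responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2 or 3")
    return sum(1 for r in responses if r >= 2)


def meets_threshold(responses, threshold=CASE_THRESHOLD):
    """True if the binary score reaches the screening threshold."""
    return ghq12_binary_score(responses) >= threshold


# Two invented respondents.
respondent_a = [0, 1, 2, 3, 2, 1, 0, 0, 2, 1, 0, 3]
respondent_b = [0, 1, 1, 0, 2, 1, 0, 0, 2, 1, 0, 3]

for name, answers in [("A", respondent_a), ("B", respondent_b)]:
    print(name, ghq12_binary_score(answers), meets_threshold(answers))
```

Reaching the threshold indicates raised risk on a screening instrument, not a clinical diagnosis.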