Activity Report #4

For this activity, read through this document with a partner or small group, talk about the ideas, and answer the questions anywhere there are red ellipses (...). For this report, it is OK if your answers are nearly identical to those of your partners or group. Just make sure that you turn in the report individually and list the full names of who you worked with!

This activity will reference ideas from class on inference, sampling, and measurement. It will also especially reference ideas from the D’Ignazio & Klein reading!

In order to submit this report, do the following:
● Click File -> Make a Copy (this lets you edit this document)
● Answer all the questions
● Once you are done, click File -> Download -> PDF Document and save the document to your computer
● Go to the assignment on Canvas and upload the PDF to submit!

What is your name?: Danny MArre-Surges
Who did you work on this module report with?: IDK her name lol something with an N I think.

What Gets Counted Counts Discussion Activity

As student (and faculty!) members of the ASU community, we will all learn new ideas, scientific techniques, and practical skills, but importantly, as ASU community members we are relatively unique in our goals compared to people working and learning at other universities! The ASU mission statement is:

ASU is a comprehensive public research university, measured not by whom we exclude, but rather by whom we include and how they succeed; advancing research and discovery of public value; and assuming fundamental responsibility for the economic, social, cultural and overall health of the communities it serves.

Part of living up to this mission statement is learning how the ideas and tools that we learn and practice at ASU have the potential to help or hurt the communities we live in!
Just like any other conceptual or practical tools you might learn how to use in college, statistics and the data collection and analysis that accompany it are not just abstract ideas, but things that can be used to maintain or change our community. In statistical analysis, this idea goes all the way to the first step of the statistical process: data collection. Whenever we use data, that data was created by someone (or some group) for some purpose, and specific choices were made to decide how data would be collected, who data would be collected from, and what variables should be included in data collection and future analysis. These choices are rarely completely obvious, and there is rarely a perfect “right” or “wrong” way to accomplish these steps. However, it is important to consider the choices and alternatives that can be made in creating and using data, and the potential consequences!

D’Ignazio & Klein, in their book chapter “What gets counted counts”, outline a number of ways that the authors see the collection and use of data as supporting, illuminating, or dismantling systems that they view as unjust. The exact issues they are concerned with are important – trying to make a world that is more just for people of all genders and races. Additionally, the examples D’Ignazio & Klein outline also illustrate ideas that can be kept in mind for all data collection and analysis – regardless of whether or not you share the same concerns or beliefs as the authors.

Perhaps the single most important idea from the chapter is the following:

“Data collection and analysis does not happen in a vacuum.”

This means that data collection and analysis have the potential to impact people, and are impacted by the context in which researchers live and how researchers think.
Choices about data collection and analysis will be made partly on the basis of how well researchers think the data will represent what they care about, but these choices are also ethical, political, and social choices, because data is used in our ethical, political, and social world!

In the following part of the activity, work with your partners to identify examples in the D’Ignazio & Klein reading that demonstrate the major points. It’s a good idea to have the reading open! You can get the reading from Canvas or from here.

Harm and Benefits of Using Data

In a world where data and statistics are used to answer questions, form scientific ideas, find potential treatments, and form public policy, there are potentially serious consequences to creating and using data that does or does not adequately represent populations that researchers should study.

In D’Ignazio & Klein, examples of data collection and use are mostly taken from government and corporate data projects. However, the same ideas apply in academic settings too! Some of the examples in the reading seem very different from what a laboratory psychologist might do. However, in all cases, some group of individuals is sampled, each of those individuals is scored or measured to create data, and that data is then analyzed and used to make a decision.

A scientist might measure how well a clinical treatment helps a group of people and use that data to make recommendations for whether or not we should consider the clinical treatment. A TSA agent might visually “measure” whether someone is male or female, combine that with measurements of that person’s body size, and use that data to make a recommendation for whether or not that individual is a security risk at an airport. In both cases, data is collected, some kind of analysis happens, and a recommendation or action occurs!

Being an empirical scientist (or really anyone who uses data) requires deciding how to collect data and what and how to analyze data.
To do this, researchers learn tools for collecting data and choosing variables that can be used for statistical analyses, but researchers also must make decisions based on their ethical understanding.

Research can benefit or harm individuals and groups. What examples in the reading are meant to demonstrate these possibilities? Note that “being well represented in the data” is not a direct benefit or risk on its own! Spend no more than 10 minutes discussing and answering these if working with a group! If you and your partner did not complete the reading, consider temporarily skipping this section and returning to it later!

An example of potential benefits to individuals or groups as a result of having their experiences turned into data (injustices illuminated, individuals helped out, etc.):
● Having groups that have experienced discrimination (systematically or not) represented in the data is good because it can shine light on the current policies that specifically target certain groups of people.

An example of potential risks of harm to individuals or groups as a result of having their experiences turned into data (physical, financial, or emotional harm; lack of resources):
● There's a chance that having their personal information collected as data could be harmful to them and marginalize them, or discredit their experience because they may be an “outlier”.

An example of potential risks of harm to individuals or groups as a result of not having their experiences turned into data; in other words, an example of a risk of harm that comes from being invisible in data:
● This kinda goes back to what I previously mentioned. Not having their experiences quantified runs the risk of not getting the proper attention/representation that is needed to make the change that is needed.

Data, Representation, and Invisibility

Statistical data is used to characterize groups of people or things.
Typically a researcher wants to characterize some population by choosing some sample of people to measure using some set of variables. The researcher has to choose both the sample (who they measure) and the variables (how and what is measured). These choices determine whether or not the data that is collected represents what the data should represent!

These choices can lead some individuals to become “invisible” in data. If those people, or people like them, are not included in a sample, then the researcher will have a biased view of the population as a whole. Similarly, the variables that researchers record (the measurements that they use, the questions that they ask) have the potential to make people “invisible” if the question does not adequately represent the people in the sample.

In D’Ignazio & Klein, what is one principal example in which a common data collection practice makes some people or groups of people invisible? Who are people who might be made invisible by this practice? (Answer in 2-3 sentences)
● The use of binary gender categories makes nonbinary people invisible. But incorporating nonbinary as a category makes it possible to represent nonbinary people, making them feel less invisible.

The following are sample questions used to generate data for the US Federal Census. These questions are very useful for many purposes and can be used to represent a lot of different people in different ways! However, in each case, some people will be made “invisible” in the data. This means that those individuals would not be accurately represented – maybe there is no answer that adequately represents the person, or the choices for answering are things that the person would not know how to answer. In some cases, the person might not be able to answer a question out of fear of possible repercussions. Some of these might affect lots of people, and some fewer.
With a partner, look through these questions, and identify a person or group who might be made “invisible” in the US Census based on that question. (Spend no more than 20 minutes answering these with your group)

What is an example of a person or group who would not be well represented in data collected using this question?
● A group of people not properly identified by this question are non-binary individuals, or individuals who identify outside of these questions. I feel like, however, sex has to do with your genitals, and gender has become something you decide.
Can you think of an extra question, answer, or change to the question that would make that person or group “visible” in the data?
● What was your sex assigned at birth?

What is an example of a person or group who would not be well represented in data collected using this question?
● Anyone who is homeless or living on a reservation.
Can you think of an extra question, answer, or change to the question that would make that person or group “visible” in the data?
● Do you identify as homeless, or do you identify as a part of a reservation?

What is an example of a person or group who would not be well represented in data collected using this question?
● Someone who was adopted and is not sure of their heritage.
Can you think of an extra question, answer, or change to the question that would make that person or group “visible” in the data?
● If you were to add an “unsure” or “not enough information” option, I think that would give more flexibility.

What is an example of a person or group who would not be well represented in data collected using this question?
● I feel like babies who are less than a year old wouldn’t be represented well in this data. Other than that, someone who doesn’t know their birthday, and/or someone who was born on leap day.
Can you think of an extra question, answer, or change to the question that would make that person or group “visible” in the data?
● Were you born on leap day?
● Or add an instruction for doing something like a decimal: “If a baby, designate with a .1 - .11 for the months”

Considering Data Choices

In this section, you will read scenarios about the collection, interpretation, and use of data. In these scenarios, there will be a choice that can be made about how the data is collected and used – the choice will either be about how a variable is measured or about who will be measured when creating data. Your job is to consider the possible benefits of choosing either of two options. (Try not to spend more than 10 minutes each discussing these with your group!)

Scenario 1 – Choices about variables

The US Federal Census is a major data collection effort that is made every ten years in the United States. The stated goal of the US Census Bureau is “... counting every person in the 2020 Census once, only once, and in the right place”.

Data collected in the US Federal Census is used for many different purposes. Its main use is determining how many representatives each state gets to send to the US House of Representatives. However, census data is also used for determining what communities should be targeted for government programs and which communities may need funding for projects.

Most “typically”, data is collected by having a single member of a household respond to questions designed to figure out how many people live in a residence. The idea is that the Census Bureau could then count up all the people in all the residences in a community to figure out how many people live there. However, things can get tricky when people don’t live in a single-family house or apartment. For instance, people staying long-term in hospitals (including newborn babies) are counted as residing in the hospital, not their “normal home”.

When it comes to prisoners, the choice of how to decide the value of the “residence location” variable in census data has potential major consequences!
In the US, there are between one and two million people incarcerated in prisons and jails (one current estimate is 1.9 million). That’s a lot of people to count! When those people are counted, should they “count” as a resident of the prison community (the place that they physically sleep and spend their days, though they may never have seen the community outside the prison walls) or as a resident of their home community (where they likely still have family and community interests, even if they aren’t currently physically present)?

Currently, in the US, prisoners are counted as residing in the community where their prison is located, regardless of the prisoner’s home community. However, some organizations have advocated for changing this, and recording prisoners as residing in their home communities, even while in prison.

Consider the following: Census data is used for many things, including allocating resources to communities and legislative representation. Many large prisons are located in small, rural communities. Many prisoners stay in prisons that are in counties or even states far from their home.

In 2-3 sentences, what are some potential benefits to counting prisoners as residents of the community that the prison is located in?
● This gives the proper resources to the right area. If the number of individuals living in a certain area and using its resources is underreported, the amount of resources the government allocates to that area may decrease, which could be problematic. And if you have a parent in jail but report living in a home with two parents, it could impact scholarships.

In 2-3 sentences, what are some potential benefits to switching to counting prisoners as residents of the community that they lived in before going to prison?
● Switching to the locations where they should actually be counted would, I think, shift the resources.
These areas probably need the resources that should or hopefully would be provided by the individual who is supposed to be living in the home. It shows what communities are kind of being targeted/what areas are producing more felons. Additionally, it may reveal migration patterns and, again, give us more accurate data on which areas are supposed to be more saturated.

Scenario 2 – Choices about Observations

Depression is identified by the National Institute of Mental Health as one of the most common mental disorders in the United States population. Many treatments have been suggested and are currently practiced for treating depression, including the use of pharmacological treatments (i.e., drugs). However, the way that treatments for depression work and what data is collected might surprise you (see this BBC article for more details if you want to read up on this later!):

To test treatments, people with depression may be asked to participate in an experiment. In the experiment, half of participants will be given a new “experimental” treatment and half will be given a “control” treatment, often a standard treatment that is already popular. The efficacy of a new drug is calculated by comparing reductions in symptoms between the “control” group of participants who had the old treatment and the “experimental” group of participants who had the new treatment. If the people in the experimental group reliably improved more than the control group, the new treatment is probably a good approach!

Depression can range from mild to severe, and the extent to which depression may lead to poor well-being for individuals is variable. However, when testing treatments for depression, data is usually not collected from the full range of people with depression. Oftentimes, participants with severe depression – and especially those who are deemed at risk of suicide – are not included in research studies.
In other words, when depression treatments are backed up by scientific data, that scientific data often only represents what happens in people with mild or moderate cases of depression.

In 2-3 sentences, what do you think is a potential benefit or good reason for not including participants with severe depression in studies of new, experimental treatments?
● Severe depression is unique because it is severe. Many diagnoses aren’t for individuals who are severe. So when conducting experimental treatments, if you use the outliers to treat the mean, you're going to have results that don’t actually treat the largest population in need, and you are going to under-treat the other population that is also still in need, just of different interventions.

In 2-3 sentences, what is a potential downside of not including participants with severe depression? What could potentially be gained by including those participants?
● Including these individuals will give you a good range. Everyone experiences depression differently, so having a wide range of people will give you the opportunity to see how effective your experimental treatment is at different levels of depression, and how dosage may play a factor.

Puzzling Statistical Results

Different choices can have different benefits when it comes to creating and collecting data. The section above had you think about the choices “before” the data collection and analysis. Now, you’ll look at the results (at least the statistical results) of some data choices and limitations related to collecting and categorizing data and think back to how these results might have happened.

It is important to remember that the same statistical results can sometimes arise from many data collection choices and many patterns in the real world. As a result, you want to think critically about different possibilities that might have produced some result when you see it published or shared!
Your job is to think of ways that data collection choices might have resulted in the reports – and those possibilities might make evidence a little bit less clear than it first seems.

Scenario 1 – Sampling Musicians

Here's an example that originally made the rounds of the Internet a little while ago (2015) but could be easily replicated today. The following graph was based on a published research study and shared on social and traditional media without much context:

This graph was typically interpreted as supporting the claim that the lifestyles of rap, hip-hop, and metal musicians are more violent and dangerous than those of other types of musicians – these artists live hard and die young. This *might* be true, but there are other explanations for the pattern we see here. Try to think of a possibility where these results would be obtained even if there were no differences in the lifestyle or mortality risks of these different types of musicians.

The graph plots the average age of death for different groups. Given that measurement, think about approaches to sampling – who might be able to be included in the sample for the research study here? In a sentence or two, write out an alternative explanation for the differences in averages seen here that is tied to the possible samples that might have been included in the data.
● I think the way we need to look at this, if everyone has the same lifestyle, is who is entering or performing in these fields. You are going to have older people doing blues or gospel, but often you will have a younger demographic doing rap or metal. Usually, at least.

If you're stumped, feel free to check out this case study looking at this graph! This case study is included as part of Carl Bergstrom and Jevin West’s excellent course on “Calling Bullshit” that was turned into an excellent book on understanding quantitative claims by the same name.
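One sampling-based possibility for Scenario 1 can be illustrated with a small simulation. The sketch below assumes that every musician, whatever the genre, draws a lifespan from the same distribution, and that genres differ only in when their musicians were born. Only musicians who have already died by the study year can contribute an "age at death". All numbers (birth-year ranges, lifespan mean and spread) are hypothetical and are not taken from the study behind the graph.

```python
import random
import statistics

STUDY_YEAR = 2015  # year the graph circulated; used here as the data cutoff


def observed_mean_age_at_death(birth_start, birth_end, n=20_000, seed=0):
    """Mean age at death among simulated musicians who died by STUDY_YEAR."""
    rng = random.Random(seed)
    ages = []
    for _ in range(n):
        birth_year = rng.uniform(birth_start, birth_end)
        # Same lifespan distribution for every genre (hypothetical numbers).
        lifespan = max(18.0, rng.gauss(72, 15))
        if birth_year + lifespan <= STUDY_YEAR:
            ages.append(lifespan)  # only people who have already died are observed
    return statistics.mean(ages)


# Hypothetical birth-year ranges for a long-established genre and a recent one.
older_genre = observed_mean_age_at_death(1900, 1950)
newer_genre = observed_mean_age_at_death(1960, 1995)

print(f"older genre, mean age at death: {older_genre:.1f}")
print(f"newer genre, mean age at death: {newer_genre:.1f}")
```

In this sketch nobody in the newer genre can appear in the data with an age at death above 55, because anyone who lives longer is still alive in 2015. The newer genre's average therefore comes out lower even though the underlying mortality risk is identical by construction.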
Scenario 2 – Categorizing Students

School districts are regularly required to report many different statistics that are intended to characterize the degree to which students at the school are attaining different academic outcomes. These statistics can be used to allocate resources to different schools.

ACT Aspire is one standardized test that is sometimes used to understand if students at schools are on a path to being “college ready”. It is taken at a variety of grade levels. The test has different subsections, but there is also a “composite score” that ranges from 400 (lowest performing) to 460 (highest performing). In different states and districts, these scores might be reported differently in different contexts.

Now, let’s think about two example schools. We’ll call them Washington High School (WHS) and Lincoln High School (LHS). Here are some facts about them:

Both schools are in a district where the ACT Aspire is given to students every year. Additionally, both schools have a “gifted student” program where high-achieving students take separate “honors” classes from the rest of the student body. Among other things, the schools both invite students to join the gifted program based on their 8th grade middle school scores on standardized tests like ACT Aspire. Also, because the mainstream student body and gifted program students take different types of course loads, these schools report their standardized test scores separately for those groups.

WHS reports the following:
● Our mainstream 9th grade students score an average of 417 on the ACT-A, while our honors students on average score 450!

LHS reports the following:
● Our mainstream 9th grade students score an average of 413 on the ACT-A, while our honors students on average score 441.

It looks like WHS is doing a better job than LHS at serving both their gifted and mainstream students. Based on these stats, this is possible!
It might also be the case that students at WHS have more family and community resources to help them succeed. This is possible based on these stats! Another possibility is that WHS teachers help their students cheat on the standardized tests. This is also possible based on these stats!

However, given the scenario described above, there is an additional possibility that can be perplexing. It is possible that the students at these two schools get scores that are just as good as each other.

As a thought experiment, let’s pretend that both schools have only 10 students total. Their ACT Aspire scores in 8th grade and 9th grade are included for all 10 students below (including both mainstream and gifted students).

Washington High School                       Lincoln High School
Student   ACT-A 8 Score   ACT-A 9 Score      Student   ACT-A 8 Score   ACT-A 9 Score
HAQ       400             405                ON        400             405
JOS       405             410                MRT       405             410
KOF       408             410                SCC       408             410
VT        415             420                VGK       415             420
MSL       418             420                ZR        418             420
BAM       422             425                SCK       422             425
RBB       425             430                FTT       425             430
BC        435             445                PLQ       435             445
LS        440             450                VBD       440             450
PSA       440             455                ZAQ       440             455
Means     420.8           427                Means     420.8           427

So, the students at WHS and LHS have the exact same test scores! But both the gifted program and the mainstream students at WHS are doing better on average than their counterparts at LHS.

In 1-2 sentences, what is one way that the differences in those averages could be obtained even when the individual students have the same scores? You can talk in terms of real numbers or in general terms.
● I think one way to do it is by measuring the culture between the two schools/teachers. Test scores are a one-time thing, but learning is an accumulation of knowledge, and if the students aren’t engaged they are less likely to consistently do as well as the other students.

If you’re stumped on this question, feel free to consult either this wiki article or this 3 ½ minute video that discusses parallel issues in clinical science.

That’s all for today!
For this activity, all you need to “report” is this document! Once you’ve filled this out, save it as a .pdf file and upload it to Canvas!
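
For the thought experiment in Scenario 2 – Categorizing Students, one possible explanation can be checked with a short Python sketch. It assumes the two schools use different 8th-grade cutoffs for inviting students into the gifted program. The cutoff values 435 and 422 are hypothetical (the worksheet does not state any cutoffs); they were chosen because they reproduce the reported averages from the identical score table.

```python
from statistics import mean

# (8th-grade score, 9th-grade score) for the 10 students; identical at both schools.
SCORES = [(400, 405), (405, 410), (408, 410), (415, 420), (418, 420),
          (422, 425), (425, 430), (435, 445), (440, 450), (440, 455)]


def group_means(scores, cutoff):
    """Split students by an 8th-grade cutoff; return (mainstream, honors) 9th-grade means."""
    honors = [g9 for g8, g9 in scores if g8 >= cutoff]
    mainstream = [g9 for g8, g9 in scores if g8 < cutoff]
    return mean(mainstream), mean(honors)


# Hypothetical cutoffs: a stricter one at WHS, a more inclusive one at LHS.
whs_main, whs_honors = group_means(SCORES, cutoff=435)
lhs_main, lhs_honors = group_means(SCORES, cutoff=422)

print(f"WHS mainstream {whs_main:.0f}, honors {whs_honors:.0f}")
print(f"LHS mainstream {lhs_main:.0f}, honors {lhs_honors:.0f}")
print(f"All students, either school: {mean(g9 for _, g9 in SCORES):.0f}")
```

Under these assumed cutoffs, WHS places three students in honors and LHS places five. Moving the two middle-scoring students from one group to the other lowers both group averages at LHS, while the overall mean of 427 is the same at both schools.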