Uploaded by Daniel Marre-Surges

Danny's PSY230 Activity 4 What Gets Counted Counts

Activity Report #4
For this activity, read through this document with a partner or small group, talk about the ideas,
and answer the questions anywhere there are red ellipses (...). For this report, it is OK if you
have nearly identical answers as your partners or groups. Just make sure that you turn in the
report individually and list the full names of who you worked with!
This activity will reference ideas from class on inference, sampling, and measurement. It will
also especially reference ideas from the D’Ignazio & Klein reading!
In order to submit this report, do the following:
● Click File -> Make a Copy (this lets you edit this document)
● Answer all the questions
● Once you are done, click File -> Download -> PDF Document and save the document to
your computer
● Go to the assignment on Canvas and upload the PDF to submit!
What is your name?: Danny Marre-Surges
Who did you work on this module report with?: IDK her name lol something with an N I think.
What Gets Counted Counts Discussion Activity
As student (and faculty!) members of the ASU community, we will all learn new ideas,
scientific techniques, and practical skills, but importantly, as ASU community members
we are relatively unique in our goals compared to people working and learning at other
universities! The ASU mission statement is:
ASU is a comprehensive public research university, measured not by whom we exclude,
but rather by whom we include and how they succeed; advancing research and
discovery of public value; and assuming fundamental responsibility for the economic,
social, cultural and overall health of the communities it serves.
Part of living up to this mission statement is learning how the ideas and tools that we
learn and practice at ASU have the potential to help or hurt the communities we live in!
Just like any other conceptual or practical tools you might learn how to use in college,
statistics and the data collection and analysis that it accompanies are not just abstract
ideas, but things that can be used to maintain or change our community.
In statistical analysis, this idea goes all the way to the first steps of the statistical
process: data collection. Whenever we use data, that data was created by someone (or
some group) for some purpose, and specific choices were made to decide how data
would be collected, who data would be collected from, and what variables should be
included in data collection and future analysis. These choices are rarely completely
obvious, and there is rarely a perfect “right” or “wrong” way to accomplish these steps.
However, it is important to consider the choices and alternatives that can be made in
creating and using data, and the potential consequences!
D’Ignazio & Klein, in their book chapter “What gets counted counts”, outline a number of
ways that the authors see the collection and use of data as supporting, illuminating, or
dismantling systems that they view as unjust. The exact issues they are concerned with
are important – trying to make a world that is more just for people of all genders and
races. Additionally, the examples D’Ignazio & Klein outline also illustrate ideas that can
be kept in mind for all data collection and analysis – regardless of whether or not you
share the same concerns or beliefs as the authors.
Perhaps the single most important idea from the chapter is the following: “Data
collection and analysis does not happen in a vacuum.” This means that data
collection and analysis has the potential to impact people, and is impacted by the
context in which researchers live and how researchers think. Choices about data
collection and analysis will be made partly on the basis of how well researchers think
the data will represent what they care about, but these choices are also ethical, political,
and social choices, because data is used in our ethical, political, and social world!
In the following part of the activity, work with your partners to identify examples in the
D’Ignazio & Klein reading that demonstrate the major points. It’s a good idea to have the
reading open! You can get the reading from Canvas or from here.
Harm and Benefits of Using Data
In a world where data and statistics are used to answer questions, form scientific ideas,
find potential treatments, and form public policy, there are potentially serious
consequences to creating and using data that does or does not adequately represent
populations that researchers should study. In D’Ignazio & Klein, examples of data
collection and use are mostly taken from government and corporate data projects.
However, the same ideas apply in academic settings too! Some of the examples in the
reading seem very different from what a laboratory psychologist might do. However, in
all cases, some group of individuals is sampled, each of those individuals is scored or
measured to create data, that data is then analyzed, and used to make a decision. A
scientist might measure how well a clinical treatment helps a group of people and uses
that data to make recommendations for whether or not we should consider the clinical
treatment. A TSA agent might visually “measure” whether someone is male or female
and combine that with measurements of that person’s body size and use that data to
make a recommendation for whether or not that individual is a security risk at an airport.
In both cases, data is collected, some kind of analysis happens, and a recommendation
or action occurs!
Being an empirical scientist (or really anyone who uses data) requires deciding how to
collect data and what and how to analyze data. To do this, researchers learn tools for
collecting data and choosing variables that can be used for statistical analyses but
researchers also must make decisions based on their ethical understanding. Research
can benefit or harm individuals and groups. What examples in the reading are meant to
demonstrate these possibilities? Note that “being well represented in the data” is not a
direct benefit or risk on its own! Spend no more than 10 minutes discussing and
answering these if working with a group! If you and your partner did not complete the reading,
consider temporarily skipping this section and returning to it later!
An example of potential benefits to individuals or groups as a result of having their
experiences turned into data (injustices illuminated, individuals helped out, etc.):
● Having data on groups that have experienced discrimination, systematic or
not, is good because it can shine a light on the current policies that
specifically target certain groups of people.
An example of potential risks of harm to individuals or groups as a result of having their
experiences turned into data (physical, financial, or emotional harm; lack of resources):
● There's a chance that their personal information being allocated for data
could be harmful to them and marginalize them, or discredit their
experience because they may be an “outlier”.
An example of potential risks of harm to individuals or groups as a result of not having
their experiences turned into data; in other words, an example of a risk of harm that
comes from being invisible in data:
● This kinda goes back to what I previously mentioned. Not having their
experiences quantified runs the risk of not getting the proper
attention/representation that is needed to make the change that is needed.
Data, Representation, and Invisibility
Statistical data is used to characterize groups of people or things. Typically a researcher
wants to characterize some population by choosing some sample of people to measure
using some set of variables. The researcher has to choose both the sample (who they
measure) and the variables (how and what is measured). These choices determine
whether or not the data that is collected represents what the data should represent!
These choices can lead some individuals to become “invisible” in data. If those people,
or people like them, are not included in a sample, then the researcher will have a biased
view of the population as a whole. Similarly, the variables that researchers record (the
measurements that they use, the questions that they ask) have the potential to make
people “invisible” if the question does not adequately represent the people in the
sample.
In D’Ignazio & Klein, what is one principal example in which a common data collection
practice makes some people or groups of people invisible? Who are people who might
be made invisible by this practice? (Answer in 2-3 sentences)
● The use of binary gender categories makes nonbinary people invisible.
Incorporating nonbinary as a category allows the data to represent nonbinary
people, making them less invisible.
The following are sample questions used to generate data for the US Federal Census.
These questions are very useful for many purposes and can be used to represent a lot
of different people in different ways! However, in each case, some people will be made
“invisible” in the data. This means that those individuals would not be accurately
represented – maybe there is no answer that adequately represents the person or the
choices for answering are things that the person would not know how to answer. In
some cases, the person might not be able to answer a question out of fear for possible
repercussions. Some of these might affect lots of people, and some fewer. With a
partner, look through these questions, and identify a person or group who might be
made “invisible” in the US Census based on that question. (Spend no more than 20
minutes answering these with your group)
What is an example of a person or group who would not be well represented in data
collected using this question?
● A group of people not properly identified by this question are non-binary
individuals, or individuals who identify outside of these questions. However, I feel
like sex has to do with your genitals, and gender has become something
you decide.
Can you think of an extra question, answer, or change to the question that would make
that person or group “visible” in the data?
● What was your sex assigned at birth?
What is an example of a person or group who would not be well represented in data
collected using this question?
● Anyone who is homeless or living on a reservation.
Can you think of an extra question, answer, or change to the question that would make
that person or group “visible” in the data?
● Do you identify as homeless, or do you identify as a part of a reservation?
What is an example of a person or group who would not be well represented in data
collected using this question?
● Someone who was adopted - not sure of their heritage.
Can you think of an extra question, answer, or change to the question that would make
that person or group “visible” in the data?
● If you were to add an unsure or not enough information option, I think that would
give more flexibility.
What is an example of a person or group who would not be well represented in data
collected using this question?
● I feel like babies who are less than a year old wouldn’t be represented well in this data.
Other than that, someone who doesn’t know their birthday, or someone who
was born on leap day.
Can you think of an extra question, answer, or change to the question that would make
that person or group “visible” in the data?
● Were you born on leap day?
● Or add an instruction for entering something like a decimal: “If a baby, designate
with a .1 - .11 for the months.”
Considering Data Choices
In this section, you will read scenarios about the collection, interpretation, and use of
data. In these scenarios, there will be a choice that can be made about how the data is
collected and used – the choice will either be about how a variable is measured or
about who will be measured when creating data. Your job is to consider the possible
benefits of choosing either of two options. (try not to spend more than 10 minutes each
discussing these with your group!)
Scenario 1 – Choices about variables
The US Federal Census is a major data collection effort that is made every ten years in
the United States. The stated goal by the US Census Bureau is “... counting every
person in the 2020 Census once, only once, and in the right place”. Data collected in
the US Federal Census is used for many different purposes. Its main use is
determining how many representatives each state gets to send to the US
House of Representatives. However, census data is also used for determining what
communities should be targeted for government programs and which communities may
need funding for projects.
Most “typically”, data is collected by having a single member of a household respond to
questions designed to figure out how many people live in a residence. The idea is that
the Census Bureau could then count up all the people in all the residences in a
community to figure out how many people live there. However, things can get tricky
when people don’t live in a single-family house or apartment. For instance, people
staying long-term in hospitals (including newborn babies) are counted as residing in the
hospital, not their “normal home”.
When it comes to prisoners, the choice of how to decide the value of the “residence
location” variable in census data has potential major consequences! In the US, there
are between one and two million people incarcerated in prisons and jails (one current
estimate is 1.9 million). That’s a lot of people to count! When those people are counted,
should they “count” as a resident of the prison community (the place that they physically
sleep and spend their days, though they may never have seen the community outside
the prison walls) or as a resident of their home community (where they likely still have
family and community interests, even if they aren’t currently physically present)?
Currently, in the US, prisoners are counted as residing in the community where their
prison is located, regardless of the prisoner’s home community. However, some
organizations have advocated for changing this, and recording prisoners as residing in
their home communities, even while in prison.
Consider the following: Census data is used for many things, including allocating
resources to communities and legislative representation. Many large prisons are located
in small, rural communities. Many prisoners stay in prisons that are in counties or even
states far from their home.
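To see why the coding of the “residence location” variable matters, here is a minimal Python sketch. The community names and head counts are entirely made up for illustration (they are not census figures). The only point is that the same people produce different community totals depending on which counting rule is used.

```python
from collections import Counter

# Hypothetical records: (home community, current location). Made-up data.
people = (
    [("Big City", "Big City")] * 1000            # residents living at home
    + [("Small Town", "Small Town")] * 200       # residents living at home
    + [("Big City", "Small Town Prison")] * 100  # incarcerated far from home
)

# Hypothetical mapping from a facility to the community it is located in.
facility_community = {"Small Town Prison": "Small Town"}

def count_by_prison_location(records):
    """Count each person where they physically sleep (the current rule)."""
    return Counter(facility_community.get(loc, loc) for _, loc in records)

def count_by_home_community(records):
    """Count each person in their home community (the proposed alternative)."""
    return Counter(home for home, _ in records)

by_prison = count_by_prison_location(people)
by_home = count_by_home_community(people)
print("Counted at prison location:", dict(by_prison))
print("Counted at home community: ", dict(by_home))
```

In this made-up example the total number of people never changes, but 100 people move from one community's total to the other's, and anything tied to those totals (resources, representation) would move with them.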
In 2-3 sentences, what are some potential benefits to counting prisoners as residents of
the community that the prison is located in?
● This gives the proper resources to the right area. If there is an underreported
number of individuals living in a certain area, utilizing the resources, but not
being acknowledged as there, the amount of resources the government may
allocate to that area may decrease and be problematic. And if you have a parent
in jail but report living in a home with two parents, it could impact scholarships.
In 2-3 sentences, what are some potential benefits to switching to counting prisoners as
residents of the community that they lived in before going to prison?
● Switching to the locations that they should actually be I think will switch it. These
areas probably need the resources that should or hopefully would be provided by
the individual who is supposed to be living in the home. It shows what
communities are kind of being targeted/what areas are producing more felons.
Additionally it may reveal migration patterns, and again give us more accurate
data of what areas are supposed to be more saturated.
Scenario 2 – Choices about Observations
Depression is identified by the National Institute of Mental Health as one of the most
common mental disorders in the United States population. Many treatments have been
suggested and are currently practiced for treating depression, including the use of
pharmacological treatments (i.e., drugs). However, the way that treatments for
depression work and what data is collected might surprise you (see this BBC article for
more details if you want to read up on this later!):
To test treatments, people with depression may be asked to participate in an
experiment. In the experiment, half of participants will be given a new “experimental”
treatment and half will be given a “control” treatment, often a standard treatment that is
already popular. The efficacy of a new drug is calculated by comparing reductions in
symptoms between the “control” group of participants who had the old treatment and
the “experimental” group of participants who had the new treatment. If the people in the
experimental group reliably improved more than the control group, the new treatment is
probably a good approach!
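As a rough illustration of the comparison described above, the following Python sketch computes the average reduction in symptoms for each group and the difference between the two. The symptom scores are invented for the example, and a real study would also need to ask whether the difference is statistically reliable, which this sketch does not do.

```python
from statistics import mean

# Hypothetical symptom scores (higher = more severe), as (before, after) pairs.
control = [(20, 16), (18, 15), (22, 19), (19, 15)]
experimental = [(21, 14), (19, 13), (20, 15), (22, 16)]

def mean_reduction(group):
    """Average drop in symptom score from before to after treatment."""
    return mean(before - after for before, after in group)

control_drop = mean_reduction(control)
experimental_drop = mean_reduction(experimental)

# A positive difference means the experimental group improved more on average.
difference = experimental_drop - control_drop
print(control_drop, experimental_drop, difference)
```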
Depression can range from mild to severe, and the extent to which depression may lead
to poor well-being for individuals is variable. However, when testing treatments for
depression, data is usually not collected from the full range of people with depression.
Oftentimes, participants with severe depression – and especially those who are deemed
at risk of suicide – are not included in research studies. In other words, when
depression treatments are backed up by scientific data, that scientific data often only
represents what happens in people with mild or moderate cases of depression.
In 2-3 sentences, what do you think is a potential benefit or good reason for not
including participants with severe depression in studies of new, experimental
treatments?
● Severe depression is unique because it is severe. Many diagnoses aren’t for
individuals who are severe. So when conducting experimental treatments, if you
take the outliers to treat the mean you're going to have results that don’t actually
treat the highest population in need, and you are going to under-treat the other
population that is also still in need, just of different interventions.
In 2-3 sentences, what is a potential downside of not including participants with severe
depression? What could potentially be gained by including those participants?
● Including these individuals will give you a good range. Everyone experiences
depression differently, so having a wide range of people will give you the
opportunity to see how effective your experimental treatment is on different levels
of depression, and how dosage may be able to play a factor.
Puzzling Statistical Results
Different choices can have different benefits when it comes to creating and collecting
data. The section above had you think about the choices “before” the data collection
and analysis. Now, you’ll look at the results (at least the statistical results) of some data
choices and limitations related to collecting and categorizing data and think back to how
these results might have happened. It is important to remember that the same statistical
results can sometimes arise from many data collection choices and many patterns in
the real world. As a result, you want to critically think about different possibilities that
might have produced some result when you see them published or shared! Your job is
to think of ways that data collection choices might have resulted in the reports – and
those possibilities might make evidence a little bit less clear than it first seems.
Scenario 1 – Sampling Musicians
Here's an example that originally made the rounds of the Internet a little while ago
(2015) but could be easily replicated today. The following graph was based off a
published research study and shared on social and traditional media without much
context:
This graph was typically interpreted as supporting the claim that the lifestyles of rap,
hip-hop, and metal musicians are more violent and dangerous than those of other types
of musicians-- these artists live hard and die young.
This *might* be true, but there are other explanations for the pattern we see here. Try to
think of a possibility where these results would be obtained even if there were no
differences in the lifestyle or mortality risks of these different types of musicians. The
graph plots the average age of death for different groups. Given that measurement,
think about approaches to sampling – who might be able to be included in the sample
for the research study here? In a sentence or two, write out an alternative explanation
for the differences in averages seen here that is tied to the possible samples that might
have been included in the data?
● I think the way to look at this, if everyone has the same lifestyle, is to ask
who is entering or performing in these fields. You are going to
have older people doing blues or gospel, but often you will have a younger
demographic doing rap or metal. Usually, at least.
If you're stumped, feel free to check out this case study looking at this graph! This case
study is included as part of Carl Bergstrom and Jevin West’s excellent course on
“Calling Bullshit” that was turned into an excellent book on understanding quantitative
claims by the same name.
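If it helps to see one possible sampling-based explanation in action, here is a minimal Python sketch. Everything in it is an assumption made for illustration: musicians in both genres have exactly the same chances of dying at each age (30 through 90, equally likely), the birth years are invented, and the only difference is that one genre has been around longer. Because a study can only record the age at death of musicians who have already died, the newer genre's sample can only contain people who died young.

```python
from statistics import mean

STUDY_YEAR = 2015
LIFESPANS = range(30, 91)  # assumed: each age at death from 30 to 90 is equally likely

# Assumed (illustrative) birth year of each genre's earliest performers.
first_birth_year = {"older genre": 1900, "newer genre": 1960}

def mean_observed_age_at_death(first_birth, last_birth=1995):
    """Average age at death among musicians who had already died by STUDY_YEAR."""
    observed = [
        lifespan
        for birth in range(first_birth, last_birth + 1)
        for lifespan in LIFESPANS
        if birth + lifespan <= STUDY_YEAR  # still-living musicians are not in the data
    ]
    return mean(observed)

results = {genre: mean_observed_age_at_death(year)
           for genre, year in first_birth_year.items()}
for genre, average in results.items():
    print(f"{genre}: average observed age at death = {average:.1f}")
```

Under these assumptions the newer genre shows a lower average age at death even though, by construction, there is no difference in mortality risk between the two groups.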
Scenario 2 – Categorizing Students
School districts are regularly required to report many different statistics that are
intended to characterize the degree to which students at the school are attaining
different academic outcomes. These statistics can be used to allocate resources to
different schools. ACT Aspire is one standardized test that is sometimes used to
understand if students at schools are on a path to being “college ready”. It is taken at a
variety of grade levels. The test has different subsections, but there is also a “composite
score” that ranges from 400 (lowest performing) to 460 (highest performing). In different
states and districts, these scores might be reported differently in different contexts.
Now, let’s think about two example schools. We’ll call them Washington High School
(WHS) and Lincoln High School (LHS). Here are some facts about them: Both schools are
in a district where the ACT Aspire is given to students every year. Additionally, both
schools have a “gifted student” program where high-achieving students take separate
“honors” classes from the rest of the student body. Among other things, the schools
both invite students to join the gifted program based on their 8th grade middle school
scores on standardized tests like ACT Aspire. Also, because the mainstream student
body and gifted program students take different types of course loads, these schools
report their standardized test scores separately for those groups.
WHS reports the following:
● Our mainstream 9th grade students score an average of 417 on the ACT-A, while
our honors students on average score 450!
LHS reports the following:
● Our mainstream 9th grade students score an average of 413 on the ACT-A, while
our honors students on average score 441.
It looks like WHS is doing a better job than LHS at serving both their gifted and
mainstream students. Based on these stats, this is possible! It might also be the case
that students at WHS have more family and community resources to help them
succeed. This is possible based on these stats! Another possibility is that WHS teachers
help their students cheat on the standardized tests. This is also possible based on these
stats! However, given the scenario described above, there is an additional possibility
that can be perplexing. It is possible that the students at these two schools get scores
that are just as good as each other. As a thought experiment, let’s pretend that both
schools have only 10 students total. Their ACT Aspire scores in 8th grade and the 9th
grade are included for all 10 students below (including both mainstream and gifted
students).
Washington High School                    Lincoln High School
Student  ACT-A 8 Score  ACT-A 9 Score     Student  ACT-A 8 Score  ACT-A 9 Score
HAQ      400            405               ON       400            405
JOS      405            410               MRT      405            410
KOF      408            410               SCC      408            410
VT       415            420               VGK      415            420
MSL      418            420               ZR       418            420
BAM      422            425               SCK      422            425
RBB      425            430               FTT      425            430
BC       435            445               PLQ      435            445
LS       440            450               VBD      440            450
PSA      440            455               ZAQ      440            455
Means    420.8          427               Means    420.8          427
So, the students at WHS and LHS have the exact same test scores! But both the gifted
program and the mainstream students at WHS are doing better on average than their
counterparts at LHS. In 1-2 sentences, what is one way that the differences in those
averages could be obtained even when the individual students have the same scores?
You can talk in terms of real numbers or in general terms.
● I think one way to do it is by measuring the culture between the two
schools/teachers. Test scores are a one time thing, but learning is an
accumulation of knowledge, and if the students aren’t engaged they are less
likely to consistently do as well as the other students.
If you’re stumped on this question, feel free to consult either this wiki article or this 3 ½
minute video that discusses parallel issues in clinical science.
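One way to check the thought experiment is to recompute the group averages yourself. In the Python sketch below, the scores are the ones listed for the 10 students above, but the cutoffs for joining the gifted program (an 8th grade score of at least 430 at WHS and at least 420 at LHS) are assumptions. The activity does not state any cutoffs; these were picked only because they reproduce the averages each school reported.

```python
from statistics import mean

# (8th grade score, 9th grade score) for the 10 students; identical at both schools.
scores = [(400, 405), (405, 410), (408, 410), (415, 420), (418, 420),
          (422, 425), (425, 430), (435, 445), (440, 450), (440, 455)]

def group_means(cutoff):
    """Split students by 8th grade score, then average each group's 9th grade scores."""
    honors = [ninth for eighth, ninth in scores if eighth >= cutoff]
    mainstream = [ninth for eighth, ninth in scores if eighth < cutoff]
    return round(mean(mainstream)), round(mean(honors))

# Assumed cutoffs (not given in the activity).
whs = group_means(cutoff=430)
lhs = group_means(cutoff=420)
print("WHS (mainstream, honors):", whs)
print("LHS (mainstream, honors):", lhs)
```

With these assumed cutoffs, the two students who count as honors at LHS but mainstream at WHS are enough to shift both group averages, even though no individual score differs between the schools.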
That’s all for today!
For this activity, all you need to “report” is this document! Once you’ve filled this out,
save it as a .pdf file and upload it to Canvas!