Learning Reflection 9
Summary
Tyrell Garza
CSE 163
3-16-23
This week's lessons focused on the topic of privacy in data analysis. We learned about differential privacy, which involves
adding random noise to data to protect the privacy of individuals while still allowing meaningful analysis. We also learned
about the Laplace distribution and how it can be used for "jittering" data to achieve differential privacy. Finally, we discussed
randomized response, a mechanism for ensuring differential privacy in the absence of a trusted aggregator, which involves
randomizing individual data before sending it to the aggregator.
Group Fairness
Conceptual Inventory
Statistical Parity vs Equal Opportunity vs Predictive Equality
Group fairness: fairness in algorithms that aims to
avoid discrimination against subgroups based on
their protected characteristics such as race, sex,
ability, religion, political identity, etc. It ensures that
the model does not treat individuals differently based
on their subgroup membership.
Statistical parity, equal opportunity, and predictive equality are definitions of group fairness in algorithms that check for equity in subgroup decisions, false-negative rates, and false-positive rates, respectively. Statistical parity requires equity in the predictions made for each subgroup, equal opportunity requires equity in the false-negative rates for each subgroup, and predictive equality requires equity in the false-positive rates for each subgroup (see the sketch after the example below).
EX.) A college admissions decision model that does not
discriminate against a minority subgroup, such as
squares in a population of circles and squares, and
ensures that both subgroups have equal
opportunities for admission.
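
As a concrete illustration, here is a minimal sketch in Python of checking all three criteria for two subgroups (the data, function name, and subgroup labels are hypothetical, not from the lecture):

import numpy as np

def rates(y_true, y_pred):
    # Returns (positive prediction rate, FNR, FPR) for one subgroup.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos_rate = y_pred.mean()  # compared across groups by statistical parity
    fnr = ((y_pred == 0) & (y_true == 1)).sum() / (y_true == 1).sum()  # equal opportunity
    fpr = ((y_pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()  # predictive equality
    return pos_rate, fnr, fpr

# Hypothetical admission labels (1 = admit) for the two subgroups.
print(rates(y_true=[1, 1, 0, 0, 1], y_pred=[1, 1, 0, 1, 1]))  # circles
print(rates(y_true=[1, 1, 0, 0, 1], y_pred=[1, 0, 0, 0, 1]))  # squares

Each fairness definition asks that one of these three numbers be (approximately) equal across the subgroups.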
Calculating False Positive Rates and False Negative Rates
True Positive vs True Negative vs False Positive vs False Negative (concrete example, described in these terms)
• A true positive is an instance where the model predicts a positive result for a data point that is actually positive.
• A true negative is an instance where the model predicts a negative result for a data point that is actually negative.
• A false positive is an instance where the model predicts a positive result for a data point that is actually negative.
• A false negative is an instance where the model predicts a negative result for a data point that is actually positive.
For example, in a medical diagnosis scenario, a true positive would be when a patient is diagnosed as having a disease and actually has it; a true negative would be when a patient is diagnosed as not having a disease and actually does not have it; a false positive would be when a patient is diagnosed as having a disease but actually does not have it; and a false negative would be when a patient is diagnosed as not having a disease but actually does have it.
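
A small sketch of counting these four outcomes in Python (the diagnosis data here is made up for illustration):

# 1 = has the disease / diagnosed positive, 0 = healthy / diagnosed negative
actual    = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
print(tp, tn, fp, fn)  # 3 3 1 1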
WYSIWYG vs WAE worldview
• The WYSIWYG (What You See Is What You Get) worldview assumes that observed data is a good measure of the construct space, making individual fairness easy to achieve but group fairness difficult.
• The WAE (We're All Equal) worldview assumes structural bias in the process of making proxy measurements, making group fairness the ideal to strive for and individual fairness potentially discriminatory.
Jittering
Jittering is the act of adding random noise to published statistics to ensure differential privacy. The Laplace distribution is used to select the amount of random noise to add: the more jittering, the stronger the differential privacy guarantee, at the cost of less accurate published statistics.
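
A minimal sketch of jittering a single published count with numpy (the count and noise scale are arbitrary choices for illustration):

import numpy as np

true_count = 412  # the exact statistic we would like to publish
noise = np.random.laplace(loc=0.0, scale=2.0)  # larger scale = more privacy
print(true_count + noise)  # the jittered statistic that is actually released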
• False positive rate (FPR) is calculated by dividing the number of false positive predictions by the total number of negative instances in the dataset: FPR = FP / (FP + TN).
• False negative rate (FNR) is calculated by dividing the number of false negative predictions by the total number of positive instances in the dataset: FNR = FN / (FN + TP).
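
As a worked example with made-up counts: if FP = 5, TN = 45, FN = 2, and TP = 48, then FPR = FP / (FP + TN) = 5 / 50 = 0.10 and FNR = FN / (FN + TP) = 2 / 50 = 0.04.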
Pareto Frontier
• The Pareto frontier is the set of optimal trade-offs between two or more conflicting objectives: the solutions for which no objective can be improved without worsening another. It is the curve that connects all the points representing the best possible solutions to a problem.
• The Pareto frontier helps decision-makers choose the best trade-off between multiple objectives.
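
A minimal sketch of computing a Pareto frontier in Python when both objectives are to be maximized (the points are hypothetical, e.g., privacy score vs. utility score):

# Each point is (privacy_score, utility_score); higher is better for both.
points = [(0.9, 0.2), (0.7, 0.5), (0.5, 0.6), (0.4, 0.9), (0.3, 0.3)]

def is_dominated(p, others):
    # p is dominated if some other point is at least as good on both
    # objectives (for distinct points, that means better on at least one).
    return any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in others)

frontier = [p for p in points if not is_dominated(p, points)]
print(frontier)  # every point except (0.3, 0.3), which (0.7, 0.5) dominates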
K-anonymity
k-anonymity is a privacy property that ensures at least some level of privacy for an individual in a dataset by making sure that any combination of "insensitive" attributes appearing in the dataset matches at least k individuals in the dataset. This can be achieved either by removing insensitive attributes or by "fuzzing" the data to make it less precise in identifying individuals.
Differential Privacy
Differential privacy is a privacy guarantee that ensures the results of a study are not too dependent on any one individual's data. It measures how much privacy is lost in an analysis and allows for fine-tuning how similar the results of an analysis are between two parallel universes, one with an individual's data and one without, by controlling the value of epsilon (ε): smaller values of ε force the two universes' results to be more similar, giving stronger privacy.
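
A sketch of this idea for a simple counting query, assuming the standard Laplace mechanism with sensitivity 1 (adding or removing one person changes a count by at most 1); the datasets are hypothetical:

import numpy as np

def private_count(data, epsilon):
    # Noise scale is sensitivity / epsilon = 1 / epsilon, so a smaller
    # epsilon means more noise and stronger privacy.
    return len(data) + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

with_person = list(range(100))    # "universe" that includes one individual
without_person = list(range(99))  # parallel universe without that individual
print(private_count(with_person, epsilon=0.5))
print(private_count(without_person, epsilon=0.5))  # similar noisy answers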
Randomized Response
Randomized response is a mechanism for ensuring
differential privacy in the absence of a trusted aggregator
by jittering individual data before sending it to the
aggregator. It involves flipping a coin to randomly choose
whether to report the truth or a random answer.
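
A sketch of one common coin-flip variant of randomized response for a yes/no question (this exact two-flip scheme is an assumption, not necessarily the version from lecture):

import random

def randomized_response(truth):
    if random.random() < 0.5:      # first flip: heads -> tell the truth
        return truth
    return random.random() < 0.5   # tails -> report a uniformly random answer

# Each respondent randomizes locally before sending anything to the
# aggregator, so any individual report is plausibly deniable.
reports = [randomized_response(truth=True) for _ in range(1000)]
print(sum(reports) / len(reports))  # about 0.75 when everyone's truth is "yes"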
Uncertainties
How to choose appropriate values for parameters like epsilon?
How to implement these methods in practice?
How to evaluate the trade-off between privacy and data utility?