Differential Privacy
Full Title or Meme
Differential Privacy is a system for publicly sharing information about a data collection by describing the patterns of groups within the collection while withholding information that could identify the Subject.
Context
Differential Privacy is a mathematical technique that injects inaccuracies, or "noise," into the data, for example by making some people younger and an equal number older, or by changing races or other attributes. The more noise you inject, the harder deanonymization becomes. By 2019, Apple and Facebook were using this technique to collect aggregate data without identifying particular users.
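As a rough illustration of noise injection, the sketch below applies the Laplace mechanism, one standard way to make a counting query differentially private. It is not necessarily the method used by any of the organizations named here. The function names, the sample ages, and the epsilon values are assumptions made up for the example; epsilon is the usual privacy parameter, where smaller values mean more noise and more privacy.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
ages = [23, 35, 41, 29, 62, 57, 38, 44, 71, 19]  # made-up data
over_40 = lambda age: age > 40

noisy_strict = private_count(ages, over_40, epsilon=0.1, rng=rng)  # heavy noise
noisy_loose = private_count(ages, over_40, epsilon=10.0, rng=rng)  # light noise
print(noisy_strict, noisy_loose)
```

With epsilon at 0.1 the reported count is typically off by about ten, which swamps the true count of five; with epsilon at 10 it is typically off by about a tenth.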
Problems
Too much noise can render the data useless. One analysis showed that a differentially private version of the 2010 census included households that supposedly had 90 people.[1] It seems that biases toward "normal statistics" in the randomization would hide any data that was not "normal".
Solutions
U.S. Census Bureau officials said the agency is revamping its systems to prevent anyone from using published data to target individual respondents through the information they disclosed to the census. The bureau aims to use a mathematical process, called Differential Privacy, to modify census results sufficiently to reliably conceal respondents' identity. The agency will make small additions to and subtractions from each number, prior to almost every table's publication, and significantly cut the number of published statistics. Although data users are concerned these changes will disrupt their use of census data, not addressing the danger could allow information on individuals to be exposed, violating federal privacy law and elevating the risk of identity theft and other kinds of misuse.[2]
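The "small additions to and subtractions from each number" can be pictured with the sketch below. It is an illustration only and not the Bureau's actual algorithm: it adds integer noise from a two-sided geometric distribution to each cell of a made-up table, then clamps negative results to zero. The cell names, the epsilon value, and the function names are all assumptions, and the sketch assumes each person is counted in exactly one cell.

```python
import math
import random

def two_sided_geometric(epsilon, rng):
    """Integer noise with P(k) proportional to exp(-epsilon * |k|)."""
    alpha = math.exp(-epsilon)

    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.log(u) / math.log(alpha))

    # The difference of two such geometric draws is symmetric around zero.
    return geometric() - geometric()

def noisy_table(table, epsilon, rng):
    """Add independent integer noise to every cell, then clamp at zero."""
    return {cell: max(0, count + two_sided_geometric(epsilon, rng))
            for cell, count in table.items()}

rng = random.Random(2020)  # fixed seed only so the sketch is repeatable
block_counts = {"age 0-17": 12, "age 18-64": 40, "age 65+": 3}  # made-up cells
published = noisy_table(block_counts, epsilon=0.5, rng=rng)
print(published)
```

Clamping keeps the published counts plausible, but it also pushes small cells upward on average, which is one way that post-processing can distort the statistics that data users rely on.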
from ACM (2021-01-11)
As privacy violations have become rampant, and calls for better measures to protect sensitive, personally identifiable information have primarily resulted in bureaucratic policies satisfying almost no one, differential privacy is emerging as a potential solution.
In "Differential Privacy: The Pursuit of Protections by Default," a Case Study in ACM Queue, Google’s Damien Desfontaines and Miguel Guevara reflect with Jim Waldo and Terry Coatta on the engineering challenges that lie ahead for differential privacy, as well as what remains to be done to achieve their ultimate goal of providing privacy protection by default.
Differential privacy, an approach based on a mathematically rigorous definition of privacy that allows formalization and proof of the guarantees against re-identification offered by a system, signifies measures of privacy that can be quantified and reasoned about—and then used to apply suitable privacy protections.
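The "mathematically rigorous definition" is usually stated as follows. The notation is the standard textbook form rather than anything taken from the ACM piece: a randomized mechanism M is epsilon-differentially private if, for every pair of data sets D and D' that differ in one person's record and for every set S of possible outputs,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

A smaller epsilon forces the two output distributions closer together, so any single person's presence or absence changes what an observer can learn by a smaller amount. This is the quantity that can be "quantified and reasoned about."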
In September 2019, Google released an open source version of the differential privacy library, making the capability generally available. To date, differential privacy has been adopted by the US Census Bureau, along with a number of technology companies.
Differential Privacy is essentially a technique for adding noise to a signal. Shannon showed how much noise can be overcome by redundancy, which implies that the more noise is added, the less of the signal survives. "In the context of differentially private data analysis there is a trade-off between privacy and utility. In the context of differentially private cryptographic primitive and resulting applications, there is a broader trade-off space between privacy, utility and performance."[3] The same issue of CACM contains new algorithms designed to recover just such noisy data in databases.[4]
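The trade-off between privacy and utility, and the role of redundancy, can be made concrete with a small simulation. The sketch assumes Laplace noise with scale 1/epsilon on a single count and assumes basic sequential composition, under which asking the same question k times spends k times the privacy budget. The function names and parameter values are made up for the example.

```python
import random
import statistics

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean_abs_error(epsilon, repeats, trials, rng):
    """Average |error| when `repeats` noisy answers to one count are averaged."""
    scale = 1.0 / epsilon
    errors = []
    for _ in range(trials):
        avg_noise = statistics.fmean(
            laplace_noise(scale, rng) for _ in range(repeats))
        errors.append(abs(avg_noise))
    return statistics.fmean(errors)

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
for epsilon in (0.1, 1.0):
    for repeats in (1, 25):
        err = mean_abs_error(epsilon, repeats, trials=2000, rng=rng)
        spent = epsilon * repeats  # basic sequential composition
        print(f"epsilon={epsilon} repeats={repeats} "
              f"total_epsilon={spent:.1f} error={err:.2f}")
```

Averaging repeated answers does recover the signal, but only by spending more of the privacy budget, so the redundancy is not free.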
Federated Learning and Analytics
Federated learning and analytics explore how data ownership and provenance can be made first-class concepts in systems for learning and analytics.[5] This allows multiple devices, some in users' hands, to collaborate in creating a common knowledge base. Consider the ability of a smartphone to complete sentences: unless carefully designed, such a feature has the potential to leak data.
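The collaboration between devices can be sketched as a toy version of federated averaging. This is a minimal illustration under stated assumptions, not the design described in the cited article: the one-parameter model, the learning rate, and the client data are made up, and the sketch omits the secure aggregation and added noise that a careful design would need to limit leakage.

```python
def local_update(w, data, lr=0.01, steps=5):
    """Run a few gradient steps for y ~ w * x on one device's private data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """One round: each client trains locally, the server averages the results.

    Only the updated weight leaves each device, never the raw (x, y) pairs.
    The average is weighted by how many examples each client holds.
    """
    total = sum(len(data) for data in clients)
    return sum(local_update(w_global, data) * len(data)
               for data in clients) / total

# Made-up private data sets, one per device, roughly following y = 2x.
clients = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(3.0, 6.2), (4.0, 7.8), (5.0, 10.1)],
    [(1.5, 3.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(w)
```

Even though raw data stays on the device, the shared weights are still derived from it, which is why the potential for leakage remains unless further protections are added.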
References
- ↑ Angel Chen, Differential Privacy. (2020-03) Technology Review p 27.
- ↑ Paul Overberg, Census Overhaul Seeks to Avoid Outing Individual Respondent Data. The Wall Street Journal (2019-11-10)
- ↑ Sameer Wagh +3, DP-Cryptography: Marrying Differential Privacy and Cryptography in Emerging Applications (2021-02) CACM 64 No 2 pp 84ff
- ↑ Abolfazl Asudeh +6, Scalable Signal Reconstruction for a Broad Range of Applications (2021-02) CACM 64 No 2 pp 106ff
- ↑ Kallista Bonawitz +3, Federated Learning and Privacy. CACM 65 No 4 (2022-04) pp 90ff