Research Areas

10 Dec 2020

Privacy and Statistical Disclosure Control

In many settings, such as national censuses and medical studies, it is desirable to release statistical information about a population while protecting the privacy of the individuals in the sample. In a medical study, for instance, the release should let the data consumer learn statistical information about the population (e.g., the prevalence of a certain disease) without enabling privacy violations (e.g., learning whether or not a particular individual has the disease). Reconciling utility requirements with privacy in a data release is, however, typically a non-trivial task. The fields of privacy and statistical disclosure control study the trade-offs between privacy and utility in data releases, applying techniques such as differential privacy and its variants.
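As an illustration of the privacy/utility trade-off, the following is a minimal sketch of the Laplace mechanism, one standard way to achieve epsilon-differential privacy for a counting query such as the disease-prevalence example. The function names (`laplace_noise`, `private_count`) and the toy data are hypothetical and chosen for this example only. The sketch uses ordinary floating-point sampling, which is known to be unsuitable for production releases.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # If A, B ~ Exp(rate = 1/scale) independently, then A - B ~ Laplace(0, scale).
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records satisfying `predicate`.

    Adding or removing one individual changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: True means the participant has the disease.
rng = random.Random(0)
records = [True] * 30 + [False] * 70
noisy = private_count(records, lambda r: r, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy but a noisier, less useful count; a larger `epsilon` does the opposite.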

Quantitative Information Flow (QIF)

Quantitative Information Flow (QIF) is the area of knowledge concerned with measuring and controlling the amount of information that flows from a source (who knows the information) to a target (who does not yet know it). In some cases the flow of information is intended and should be facilitated (e.g., when available training data provides useful information to a machine learning algorithm), whereas in other cases it should be prevented (e.g., when protecting an internet banking user’s account information). Usually, however, the aim is a careful mix of the two: to let information flow to those who need to know it, but to keep it from those who must not have it. The QIF framework allows for the precise quantification of information flow in computational systems, including measuring leaks in computer programs (e.g., election systems), in communication protocols (e.g., Tor), and in the disclosure of statistical information (e.g., government censuses or medical research), among many others.
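To make the idea of quantifying a flow concrete, here is a minimal sketch of one commonly used QIF measure, min-entropy leakage, computed from Bayes vulnerability (the probability that an adversary guesses the secret in one try). The choice of measure, the function names, and the toy parity channel are assumptions made for illustration; the QIF framework covers many other measures.

```python
import math


def prior_vulnerability(prior):
    """Chance of guessing the secret in one try before observing anything."""
    return max(prior)


def posterior_vulnerability(prior, channel):
    """Expected chance of guessing the secret after observing the output.

    channel[x][y] is the probability of output y given secret x.
    """
    n_outputs = len(channel[0])
    return sum(
        max(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(n_outputs)
    )


def min_entropy_leakage(prior, channel):
    """Leakage in bits: log2 of posterior over prior vulnerability."""
    return math.log2(
        posterior_vulnerability(prior, channel) / prior_vulnerability(prior)
    )


# Hypothetical system: a uniform secret in {0, 1, 2, 3} whose parity is revealed.
prior = [0.25, 0.25, 0.25, 0.25]
parity_channel = [
    [1.0, 0.0],  # secret 0 -> output "even"
    [0.0, 1.0],  # secret 1 -> output "odd"
    [1.0, 0.0],  # secret 2 -> output "even"
    [0.0, 1.0],  # secret 3 -> output "odd"
]
leak_bits = min_entropy_leakage(prior, parity_channel)  # 1.0 bit
```

Here the adversary's chance of guessing the secret rises from 1/4 to 1/2, which corresponds to a leak of exactly one bit.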