Title: Some mathematical and statistical problems of disclosure limitation for microdata
In this talk we will describe some key problems that arise when data collectors and statistical agencies try to release collections of personal records (microdata) to researchers and the general public. These are statistical and mathematical problems in the area of Statistical Disclosure Limitation (SDL), a cross-disciplinary field at the intersection of statistics, mathematics and computer science. In particular, before releasing microdata to the public, statistical agencies are obliged by law to protect the confidentiality of the respondents (data providers) while, at the same time, striving to release a product that satisfies the ever-growing demands of potential data users.
Thus, the goal of microdata protection is twofold: to minimize the risk of disclosing respondents' confidential information and to maximize the utility of the released data for the user. These two goals conflict. To decrease the disclosure risk, the data protector typically has to perturb the microdata in some way, which often reduces the utility of the resulting data, because some information is inevitably lost. Conversely, efforts to improve the analytical validity of statistical analyses performed on the perturbed data may increase the disclosure risk, because some of the protection may be undone. Hence, the trade-off between data utility and disclosure risk is the main issue in SDL practice.
To illustrate these problems, we will present some methods of statistical disclosure limitation for microdata, together with metrics of disclosure risk and data utility that can be used to assess the performance of these methods.