Generating Synthetic Data for Statistical Disclosure Control (few places remaining)
University of Southampton/ADRC-E
Dr Jörg Drechsler
16/10/2017 - 17/10/2017
NatCen Social Research, 35 Northampton Square, London EC1V 0AX
Course No. ADRCE-Training037 Drechsler
Course places are limited and registration by 9 October 2017 is strongly recommended.
Short Summary of Course
This short course provides a detailed overview of the synthetic data approach to statistical disclosure control. Starting with a short introduction to data confidentiality in general and synthetic data in particular, the workshop discusses the different approaches to generating synthetic datasets in detail. It assesses possible modelling strategies and ways of evaluating analytical validity, and presents potential measures for quantifying the remaining risk of disclosure. To give participants hands-on experience, the course includes practical sessions in R in which they generate and evaluate synthetic data based on real data examples.
The course covers:
By the end of the course participants will:
Delegates will need to bring their own laptops with the latest version of R installed. It would be helpful to install the most recent version of the synthpop package before the course, either from https://CRAN.R-project.org/package=synthpop or by opening an R session and typing install.packages("synthpop").
Dr Jörg Drechsler
Jörg is a distinguished researcher at the Department for Statistical Methods at the Institute for Employment Research in Nürnberg, Germany. He received his PhD in Social Science from the University of Bamberg in 2009 and his Habilitation in Statistics from the Ludwig-Maximilians-Universität in Munich in 2015. He is also an adjunct assistant professor in the Joint Program in Survey Methodology at the University of Maryland. His main research interests are data confidentiality and nonresponse in surveys. He has received several awards for his research on synthetic data and recently published a book on this topic.
The course aims to summarize the state of the art in synthetic data. The main focus will be on practical implementation rather than on motivating the underlying statistical theory. Participants may be academic researchers or practitioners from statistical agencies working in the area of data confidentiality and data access. Basic knowledge of R is expected. Some background in Bayesian statistics is helpful but not obligatory.
This is a two-day course. On day one, registration will open at 9.30; formal teaching will start at 10.00 and finish at around 17.00. On day two, teaching will start at 9.00 and finish at around 16.00.
Event Outline (Programme)
1. A Brief History of Data Confidentiality
Some background in general linear modelling is expected. Familiarity with the concepts of Bayesian statistics is helpful but not required. The statistical software R will be used to illustrate the implementation of the approach.
Familiarity with the basics of R would be useful. Participants who are not familiar with the software can team up with experienced R users during the practical sessions.
The course is based on the following book:
Some useful papers are:
Participants will receive written course notes.
Intermediate (some prior knowledge)
Thanks to ESRC funding we are able to offer this course at reduced rates as follows:
Website and registration
Survey Research, Analysis of official statistics, Analysis of administrative data, Statistical Disclosure Control, Statistical Theory and Methods of Inference, Microdata Methods, R, Confidentiality and Anonymity, Synthetic Data, Synthetic Datasets
Related publications and presentations