Exploring Synthetic Record Linkage for Administrative Databases
The use of linked administrative data for empirical, and particularly policy-oriented, research has increased steadily in recent decades (Chetty, 2012). At least two factors are driving this trend: 1) a desire to use the most detailed and up-to-date data to inform policy and practice; and 2) a gradual change in the political and data infrastructure landscape to facilitate access to record-level data. While linking and expanding access to these data undoubtedly increases scientific research opportunities and facilitates the study of complex policy-oriented research questions, both factors present methodological and data privacy issues that have not been fully worked out.
From a data privacy perspective, the reuse of administrative records for research purposes poses substantial risks, because these records tend to capture whole populations and often contain sensitive data, and because the consent of the data subjects has not been obtained. These risks increase further when the data are linked. For these reasons, access to such data is restricted to approved researchers working within secure laboratory environments, with linkage carried out by trusted third parties (TTPs: organisations that facilitate linkage between data owned by different organisations that cannot share data with one another).
The overarching aim of this research is to develop a new methodological form, which we call synthetic linkage: a fusion of synthetic data generation and statistical record linkage. The project has three objectives:
1. To develop the methodological framework of synthetic linkage.
2. To develop software which implements the new methodology.
3. To test that software and methodology in a set of experiments involving two parties.
At the top level, the project is a proof of concept for this methodological form. If it proves feasible, it could have multiple applications.