Analysis of Linked Datasets
Date:
02/09/2015 - 03/09/2015
Organised by:
University of Southampton/ADRC-E
Presenter:
Natalie Shlomo
Level:
Intermediate (some prior knowledge)
Contact:
Map:
View in Google Maps (SO17 1BJ)
Venue:
Southampton Statistical Sciences Research Institute, Building 39, University of Southampton, Highfield, Southampton
Description:
Course places are limited and registration by 26 Aug 2015 is strongly recommended.
Course number: ADRCE-training015 Shlomo
Summary of Course
The two-day course introduces the basic concepts of deterministic and probabilistic approaches to data (record) linkage, including pre-processing requirements, blocking, match weights, the types of classification error, and evaluation procedures. We then present methods that compensate for potential linkage errors when fitting some standard statistical models to the linked dataset. These methods assume that linkage errors can be quantified and then used to correct for the measurement error they introduce into the statistical models. We also describe other statistical methods for analysing linked datasets, such as a multiple imputation approach. By the end of the course, students should understand data linkage techniques and how linked datasets subject to linkage errors should be analysed. There will be a practical tutorial and a computing lab.
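As a rough illustration of the match weights and classification thresholds mentioned in the summary, the sketch below scores a candidate record pair in the style of the Fellegi and Sunter framework listed in the preparatory reading. It is not taken from the course material: the field names, the m- and u-probabilities in `FIELD_PROBS`, and the two thresholds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical probabilities for three matching fields (illustration only).
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "postcode": (0.80, 0.02),
}


def match_weight(agreements):
    """Sum the log2 agreement/disagreement weights over all compared fields.

    `agreements` maps each field name to True (agrees) or False (disagrees).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)              # positive evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # negative evidence
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Classify a pair using two (hypothetical) thresholds on the total weight."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


# Example: a pair that agrees on surname only.
pair = {"surname": True, "birth_year": False, "postcode": False}
print(classify(match_weight(pair)))
```

Pairs falling between the two thresholds are sent for clerical review. Moving the thresholds trades false links against missed links, which are the two types of classification error.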
Course Objectives:
- To provide basic concepts and theory of data linkage.
- To provide tools for evaluating and quantifying linkage errors.
- To demonstrate how linkage errors can be incorporated into statistical models when analysing linked datasets.
Course Content:
This course will include the following topics:
- Overview of the theory of data linkage techniques with an emphasis on probabilistic data linkage
- Evaluation of the data linkage procedure and quantifying linkage errors
- Potential bias when analysing linked data subject to linkage errors
- Methods for compensating for linkage errors under some standard statistical models: measures of association in a contingency table and a linear regression model
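To give a flavour of the last two topics, the sketch below shows one simple way a known linkage error rate could be built into a linear regression. It is a toy illustration under strong assumptions, not necessarily the method taught on the course. It assumes an exchangeable linkage error model within a single block: each record is linked correctly with probability `lam` and is otherwise equally likely to be linked to any of the other records. It also assumes `lam` is known. All function and variable names are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def expected_under_linkage(values, lam):
    """Apply the expected linkage matrix to a list of values.

    Weight `lam` goes on the correct record; the remaining 1 - lam is
    spread evenly over the other n - 1 records (exchangeable errors).
    """
    n = len(values)
    total = sum(values)
    q = (1.0 - lam) / (n - 1)
    return [lam * v + q * (total - v) for v in values]


def corrected_slope(x, y_linked, lam):
    """Regress the linked outcome on the expected covariate under linkage error."""
    return ols_slope(expected_under_linkage(x, lam), y_linked)


# Toy data: true relationship y = 2 + 3x, 80% of links assumed correct.
x = [float(i) for i in range(1, 11)]
y = [2.0 + 3.0 * xi for xi in x]
lam = 0.8

# Stand-in for the linked outcome: its expected value over many linkages.
y_linked = expected_under_linkage(y, lam)

print("naive slope:    ", ols_slope(x, y_linked))            # attenuated towards 0
print("corrected slope:", corrected_slope(x, y_linked, lam))  # close to the true slope 3
```

The naive fit is pulled towards zero because mislinked outcomes are unrelated to the covariate. Regressing on the expected covariate removes that attenuation in this idealised setting. With real data the linked outcomes are a single random realisation, so the corrected estimate would be unbiased on average rather than exact.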
Target Audience:
The course is aimed at researchers who need to gain an understanding of data linkage procedures and the analysis of linked datasets subject to linkage errors. The course emphasises putting theory into practice for those who need to carry out data linkage and analysis in their own work. Participants may be academic researchers in the social and health sciences, or may work in government, survey agencies, official statistics, charities or the private sector.
Pre-requisites:
The course does not assume any prior knowledge of data linkage but does require basic knowledge of statistical analysis, such as measures of association for a contingency table and a multiple regression model. No familiarity with the chosen software will be assumed.
Course Materials:
Participants will receive course notes, tutorials and computing lab material.
Preparatory Reading:
- Chambers, R. (2009). Regression Analysis of Probability-Linked Data. Official Statistics Research Series 4, Statistics New Zealand (available at http://www.statisphere.govt.nz/further-resources-and-info/official-statistics-research/series/volume-4-2009.aspx#2)
- Chipperfield, J. O., Bishop, G. R. and Campbell, P. (2011). Maximum Likelihood Estimation for Contingency Tables and Logistic Regression with Incorrectly Linked Data. Survey Methodology, Vol. 37, No. 1, 13-24.
- Clark, D. (2004). Practical Introduction to Record Linkage for Injury Research. Injury Prevention, Vol. 10, No. 3, 186.
- Fellegi, I. P. and Sunter, A. B. (1969). A Theory for Record Linkage. Journal of the American Statistical Association, Vol. 64, 1183-1210.
- Gill, L. (2001). Methods for Automatic Record Matching and Linkage and their Use in National Statistics. The National Statistics Methodology Series, ONS (available at http://www.ons.gov.uk/ons/guide-method/method-quality/specific/gss-methodology-series/index.html)
- Goldstein, H., Harron, K. and Wade, A. (2012). The Analysis of Record-linked Data Using Multiple Imputation with Data Value Priors. Statistics in Medicine, Vol. 31, No. 28, 3481-3493.
- Herzog, T. N., Scheuren, F. J. and Winkler, W. E. (2007). Data Quality and Record Linkage Techniques. New York: Springer. ISBN 978-0-387-69502-0
Presenter:
Natalie Shlomo is a Professor at the University of Manchester. She has extensive knowledge of survey methods including data processing: record linkage, edit and imputation processes, and statistical disclosure control. She also has wide-ranging experience of teaching short courses at international level.
Programme:
Wednesday, September 2nd, 2015
9:30 – 10:00 Registration
10:00 – 10:45 Welcome and Introductions
Introduction to Data Linkage
Sources for Data Linkage
10:45 – 11:15 Tea/Coffee Break
11:15 – 12:00 Applications in the Social Sciences
Ethics and Disclosure Risks
12:00 – 12:45 Pre-processing and String Comparators
Probabilistic Data Linkage - tutorial
12:45 – 1:30 Lunch
1:30 – 2:25 Linkage Methods
Thresholds and Types of Error
2:30 – 3:30 Evaluation of Data Linkage
3:30 – 4:00 Tea/Coffee Break
4:00 – 5:00 Analysis of Linked Data: Assumptions
Thursday, September 3rd, 2015
10:00 – 10:45 Linkage Errors
Regression Models under Linkage Errors
10:45 – 11:15 Tea/Coffee Break
11:15 – 12:00 Multiple Imputation
Regression Models (cont.)
12:00 – 12:45 Contingency Tables under Linkage Errors
12:45 – 1:30 Lunch
1:30 – 2:45 Computing Lab
2:45 – 3:15 Tea/Coffee Break
3:15 – 4:00 Computing Lab
Terms and conditions: 12 Cancellation and Refund of Events and Services
http://store.southampton.ac.uk/help/?HelpID=1
Cost:
The fee per day is:
1. £30 - For UK registered postgraduate students
2. £60 - For staff at UK academic institutions, Research Council UK funded researchers, UK public sector staff and staff at UK registered charity organisations
3. Free Place for ADRC-E/ADRN/ADS staff
4. £220 - For all other participants
All fees include event materials, lunch, morning and afternoon tea. They do not include travel and accommodation costs.
Website and registration:
http://store.southampton.ac.uk/browse/extra_info.asp?compid=1&modid=5&deptid=39&catid=113&prodid=501
Region:
South East
Keywords:
Analysis of administrative data, Data Quality and Data Management (other), Data linkage, Quantitative Approaches (other)