Analysis of Linked Datasets

Date:

01/09/2016 - 02/09/2016

Organised by:

University of Southampton/ADRC-E

Presenter:

Natalie Shlomo

Level:

Intermediate (some prior knowledge)

Contact:

adrce@southampton.ac.uk

Venue:

Southampton Statistical Sciences Research Institute, Building 39, University of Southampton, Highfield, Southampton SO17 1BJ

Description:

Course places are limited and registration by 25 August 2016 is strongly recommended.

Course number: ADRCE-training024 Shlomo

 

Summary of Course

The two-day course will introduce the basic concepts of deterministic and probabilistic approaches to data (record) linkage, including pre-processing requirements, blocking, match weights, the types of error that arise in classification, and evaluation procedures. We then present methods that compensate for potential linkage errors when fitting some standard statistical models to the linked dataset. These methods assume that linkage errors can be quantified, so that they can be treated as a form of measurement error and corrected for in the statistical models. We also describe other statistical methods for analysing linked datasets, such as a multiple imputation approach.
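
As a rough illustration of the probabilistic approach (in the style of Fellegi and Sunter, 1969), the sketch below uses blocking to restrict comparisons to record pairs that share a birth year, then sums per-field log2 likelihood ratios into a match weight and classifies each pair with an upper and a lower threshold. The field names, the m- and u-probabilities, the thresholds and the helper functions are all hypothetical values chosen for illustration; in practice the probabilities would be estimated (for example with the EM algorithm) and the thresholds tuned to acceptable error rates.

```python
import math
from collections import defaultdict

# Hypothetical m- and u-probabilities per comparison field (illustrative values only):
#   m = P(field agrees | the record pair is a true match)
#   u = P(field agrees | the record pair is a non-match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def candidate_pairs(file_a, file_b, block_key):
    """Blocking: yield only pairs whose blocking-key values are equal."""
    blocks = defaultdict(list)
    for rec_b in file_b:
        blocks[block_key(rec_b)].append(rec_b)
    for rec_a in file_a:
        for rec_b in blocks.get(block_key(rec_a), []):
            yield rec_a, rec_b


def match_weight(rec_a, rec_b):
    """Sum of per-field log2 likelihood ratios (agreement or disagreement weights)."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement weight (positive here)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative here)
    return total


def classify(weight, lower, upper):
    """Upper and lower thresholds give links, non-links and a clerical-review zone."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


if __name__ == "__main__":
    file_a = [{"id": "a1", "surname": "Smith", "first_name": "Ann", "birth_year": 1970}]
    file_b = [
        {"id": "b1", "surname": "Smith", "first_name": "Ann", "birth_year": 1970},
        {"id": "b2", "surname": "Smyth", "first_name": "Anne", "birth_year": 1970},
        {"id": "b3", "surname": "Smith", "first_name": "Ann", "birth_year": 1971},
    ]
    # Blocking on birth year means b3 is never compared with a1.
    for a, b in candidate_pairs(file_a, file_b, block_key=lambda r: r["birth_year"]):
        w = match_weight(a, b)
        print(a["id"], b["id"], round(w, 2), classify(w, lower=0.0, upper=10.0))
```

Exact equality of field values is a simplification: pre-processing (standardising names and dates) and approximate string comparators would normally come before the weights are computed.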

 

Course Objectives:

By the end of the course, students should have an understanding of data linkage techniques and how linked datasets should be analysed when subject to linkage errors. There will be a practical tutorial and a computing lab.

 

Course Content:
This course will include the following topics:

  • Overview of the theory of data linkage techniques, with an emphasis on probabilistic data linkage
  • Evaluation of the data linkage procedure and quantification of linkage errors
  • The potential bias when analysing linked data subject to linkage errors
  • Methods for compensating for linkage errors under some standard statistical models: measures of association in a contingency table and a linear regression model
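
A minimal sketch of the kind of correction covered in the last topic, for a simple linear regression. It assumes an exchangeable linkage-error model (each record is correctly linked with a known probability lam and otherwise picks up the outcome of one of the other n - 1 records at random, the setting discussed in the Chambers (2009) reading) and uses a Lahiri-Larsen-type estimator, which regresses the linked outcome on the expected "mixed" covariate. The function names, the assumption that lam is known, and the simulated data are illustrative choices, not the course's own code.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of the ordinary least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def corrected_ols(x, y_linked, lam):
    """Lahiri-Larsen-type fit under an assumed exchangeable linkage-error model.

    E(y*) = A X beta with A = (lam - g) I + g 11' and g = (1 - lam) / (n - 1).
    A leaves the intercept column equal to 1, so regressing y* on z = A x
    (with an intercept) gives the corrected estimate of beta.
    """
    n = len(x)
    g = (1 - lam) / (n - 1)
    total = sum(x)
    z = [(lam - g) * xi + g * total for xi in x]
    return ols_simple(z, y_linked)


def simulate_mislinks(y, lam, rng):
    """Rough simulation: a random (1 - lam) share of records swap outcomes cyclically."""
    n = len(y)
    k = round((1 - lam) * n)
    y_star = list(y)
    if k < 2:
        return y_star
    chosen = rng.sample(range(n), k)
    for i, j in zip(chosen, chosen[1:] + chosen[:1]):
        y_star[i] = y[j]  # record i now carries record j's outcome
    return y_star


if __name__ == "__main__":
    rng = random.Random(2016)
    x = [rng.uniform(0, 10) for _ in range(2000)]
    y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]
    y_star = simulate_mislinks(y, lam=0.8, rng=rng)
    print("fit on correctly linked data:", ols_simple(x, y))
    print("naive fit on linked data:   ", ols_simple(x, y_star))
    print("corrected fit:              ", corrected_ols(x, y_star, lam=0.8))
```

Under this model the expected naive slope is the true slope multiplied by lam - (1 - lam)/(n - 1), so the naive fit is attenuated towards zero; the correction undoes this by rescaling the covariate rather than the outcome.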

 

Target Audience:

The course is aimed at researchers who need to gain an understanding of data linkage procedures and of the analysis of linked datasets subject to linkage errors. It emphasises putting theory into practice for those who need to carry out data linkage and analysis in their own work. Participants may be academic researchers in the social and health sciences, or may work in government, survey agencies or official statistics, for charities, or in the private sector.


Pre-requisites:

The course does not assume any prior knowledge of data linkage, but it does require basic knowledge of statistical analysis, such as measures of association for a contingency table and a multiple regression model. No familiarity with the chosen software will be assumed.

Course Materials:
Participants will receive course notes, tutorials and computing lab material.


Preparatory Reading:

  • Chambers, R. (2009) Regression Analysis of Probability-Linked Data. Official Statistics Research Series 4, Statistics New Zealand (available at http://www.statisphere.govt.nz/further-resources-and-info/official-statistics-research/series/volume-4-2009.aspx#2)
  • Chipperfield, J. O., Bishop, G. R. and Campbell, P. (2011) Maximum Likelihood Estimation for Contingency Tables and Logistic Regression with Incorrectly Linked Data. Survey Methodology, Vol. 37, No. 1, 13-24.
  • Clark, D. (2004) Practical Introduction to Record Linkage for Injury Research. Injury Prevention, Vol. 10, No. 3, 186.
  • Fellegi, I. P. and Sunter, A. B. (1969) A Theory for Record Linkage. Journal of the American Statistical Association, Vol. 64, 1183-1210.
  • Gill, L. (2001) Methods for Automatic Record Matching and Linkage and their use in National Statistics. The National Statistics Methodology Series, ONS (available at http://www.ons.gov.uk/ons/guide-method/method-quality/specific/gss-methodology-series/index.html)
  • Goldstein, H., Harron, K. and Wade, A. (2012) The Analysis of Record-linked Data Using Multiple Imputation with Data Value Priors. Statistics in Medicine, Vol. 31, No. 28, 3481-3493.
  • Herzog, T. N., Scheuren, F. J. and Winkler, W. E. (2007) Data Quality and Record Linkage Techniques. New York: Springer. ISBN 978-0-387-69502-0

 

Presenter:

Natalie Shlomo is a Professor at the University of Manchester. She has extensive knowledge of survey methods and data processing, including record linkage, edit and imputation processes and statistical disclosure control. She also has wide-ranging experience of teaching short courses at international level.

 

Programme:

Available nearer to the course date:

The course will start with registration and coffee at 9:30 am, with formal teaching starting at 10:00 am on the first day. Lectures will run until 5:00 pm. On the second day there will be an opportunity for participants to ask questions about the course and to discuss with the instructor how to link their own datasets and any problems arising in the analysis (you can bring your own data to the course if you wish).

 

Our courses are very popular and are often oversubscribed. If you cannot attend a course you have registered for, please notify us at least 30 days in advance so that your place can be released to another attendee. Details of our cancellation policy are here: http://store.southampton.ac.uk/help/?HelpID=1 . Our full course list is here: http://store.southampton.ac.uk/browse/product.asp?compid=1&modid=5&catid=113.

Cost:

The fee per day is:

1. £30 - For UK registered postgraduate students
2. £60 - For staff at UK academic institutions, Research Council UK funded researchers, UK public sector staff and staff at UK registered charity organisations
3. Free - For ADRC/ADRN/ADS staff
4. £220 - For all other participants

All fees include event materials, lunch, morning and afternoon tea. They do not include travel and accommodation costs.

Region:

South East

Keywords:

Analysis of administrative data, Data Quality and Data Management (other), Data linkage, Quantitative Approaches (other)
