Combining Data from Multiple Administrative and Survey Sources for Statistical Purposes

Date:

08/06/2015 - 10/06/2015

Organised by:

University of Southampton/ADRC-E

Presenter:

Prof Li-Chun Zhang

Level:

Intermediate (some prior knowledge)

Contact:

adrce@southampton.ac.uk

Venue:

Southampton Statistical Sciences Research Institute, Building 39, University of Southampton, Highfield, Southampton SO17 1BJ

Description:

Course places are limited and registration by 1 Jun 2015 is strongly recommended.

Course number: ADRCE-training014 Zhang

Summary of Course: 

Social science research increasingly makes use of data residing in multiple sources, including sample surveys, censuses and administrative registers. A major benefit is a widened scope of analysis that would not be feasible using the data from any one source on its own. However, the combined data may contain many apparent inconsistencies and shortcomings that need to be overcome, and the data linkage and integration process may generate errors of its own. Analysing such imperfect data as-is generally leads to incorrect inference.

This course will provide a general introduction to combining multiple administrative and survey datasets for statistical purposes. A total-error framework is described for integrated statistical data, which provides a systematic overview of the origin and nature of the various potential errors. The techniques and uncertainty of data fusion, or statistical matching, are discussed, by which the joint distribution or dataset of interest is constructed from the marginal observations alone. Case studies from register-based censuses illustrate micro integration, i.e. the generic process for achieving micro-level consistency in terms of both the analysis units and the measurements associated with them. These include methods for statistical linkage of different types of units that do not share common match keys, and methods of imputation under multiple linear constraints at both micro and macro levels.

The course will include a mixture of lectures, group work and computer practice.

 

Course Contents: 

The course covers:

• Life-cycle of integrated statistical data and transformation processes

• Framework of error sources: characterisation and conceptualisation

• Quality indicators and statistical uncertainty measures

• Nature and propagation of linkage and unit errors

• Uncertainty and techniques of categorical data fusion, or statistical matching

• Methods of micro integration: statistical unit and measurement

• Examples of micro integration from register-based census statistics

• Imputation and adjustment methods for micro- and macro-level benchmarking constraints
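
As a rough illustration of the uncertainty in categorical data fusion listed above: when two sources observe Y and Z separately and never jointly, the marginal distributions alone only bound each joint cell probability. The sketch below computes the classical Fréchet bounds, max(0, P(Y=i) + P(Z=j) - 1) <= P(Y=i, Z=j) <= min(P(Y=i), P(Z=j)). It is not taken from the course materials. The function name frechet_bounds and the example marginals are hypothetical, and in practice such bounds are usually computed conditionally on variables common to both sources. The course workshops use R; Python is used here only for illustration.

```python
import math


def frechet_bounds(p_y, p_z):
    """Return (lower, upper) Frechet bounds for every joint cell P(Y=i, Z=j).

    p_y and p_z are marginal probability vectors (hypothetical inputs);
    each must sum to 1.
    """
    if not math.isclose(sum(p_y), 1.0) or not math.isclose(sum(p_z), 1.0):
        raise ValueError("each marginal distribution must sum to 1")
    # Lower bound: the cell cannot be smaller than the overlap forced by the marginals.
    lower = [[max(0.0, py + pz - 1.0) for pz in p_z] for py in p_y]
    # Upper bound: the cell cannot exceed either of its marginals.
    upper = [[min(py, pz) for pz in p_z] for py in p_y]
    return lower, upper


# Illustrative marginals only: Y observed in one source, Z in another.
p_y = [0.7, 0.3]
p_z = [0.6, 0.4]
lower, upper = frechet_bounds(p_y, p_z)

for i, py in enumerate(p_y):
    for j, pz in enumerate(p_z):
        print(f"P(Y={i}, Z={j}) lies in [{lower[i][j]:.2f}, {upper[i][j]:.2f}]")
```

The width of each interval is one simple way to express how much the joint distribution remains undetermined by the marginals; any fused dataset corresponds to one particular choice inside these bounds.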

 

Learning Outcomes: 

By the end of the course participants will have gained:

• Understanding of potential errors and statistical uncertainty involved in data integration

• Ability to apply relevant concepts and methods in practice

• Appreciation of opportunities and challenges of inference based on data integration

Computer Software and Computer Workshops:

This event includes computer workshops. The course will introduce participants to R implementations of the methods discussed.

 

Presenters:

Dr Li-Chun Zhang is Professor of Social Statistics at the Southampton Statistical Sciences Research Institute (S3RI) at the University of Southampton, and senior methodologist at Statistics Norway. He has participated in a number of EU framework projects and Eurostat ESSnet projects. His research interests include data integration, statistical uses of administrative sources, survey sampling, sample coordination, estimation and imputation, treatment of non-sampling errors, small area estimation, statistical data editing, and statistical modelling. He obtained his Dr. Scient. in Statistics from the University of Tromsø, Norway.

 

Target Audience: 

Social and medical researchers with interests in combining data from multiple sources or analysing data from different sources; staff at National Statistical Institutes (or similar organisations) who are involved in the design, management and quality assurance of statistical processes based on data from multiple sources, including censuses, administrative data and sample surveys. Methodological interest, training/knowledge and experience will be helpful.

 

Pre-requisites:

Basic concepts of statistical uncertainty (such as bias, variance and confidence intervals); a basic understanding/appreciation of data cleaning, editing and/or imputation; and basic experience of or skill in statistical modelling.

 

Event Outline: 

(Draft Programme, subject to minor changes)

Day 1:

09:30 Registration and coffee

10:00 Integrated statistical data and total-error framework (with breaks)

13:00 Lunch

14:00 Generic process of micro integration, data quality and statistical uncertainty, and group work / exercise (with breaks, tea at 15:45)

17:00 Close

Day 2: 

09:00 Categorical data fusion: uncertainty and techniques, and practical exercise (with breaks)

13:00 Lunch

14:00 Case studies of statistical units: error and linkage (with breaks, tea at 15:45)

17:00 Close

Day 3:

09:00 Imputation with micro- and macro-level constraints, and practical exercise (with breaks)

13:00 Lunch

14:00 Adjustment and uncertainty of macro-level time series and accounts (with break)

15:30 Close

 

Terms and conditions: 12 Cancellation and Refund of Events and Services

http://store.southampton.ac.uk/help/?HelpID=1

Cost:

The fee per day is:
1. £30 - For UK registered postgraduate students
2. £60 - For staff at UK academic institutions, Research Council UK funded researchers, UK public sector staff and staff at UK registered charity organisations
3. £220 - For all other participants
4. Free - For ADRC-E/ADRN/ADS staff
All fees include course materials, refreshments and lunch; however, they do not include travel and accommodation costs.


Region:

South East

Keywords:

Imputation, Data fusion, Data integration, Total error framework, Statistical methods, Uncertainty assessment & propagation, Linkage error, Statistical matching, Micro integration, Combining administrative & survey data, Coverage, relevance

