Integrating and analysing multiple datasets (bookings closed)
The University of Manchester
Dr Ana Ivon Morales Gomez
12/09/2019 - 13/09/2019
Roscoe Building, University of Manchester (M13 9PL)
Claire Spencer, 0161 275 4579, email@example.com
This workshop will provide participants with conceptual and technical skills to understand the processes of using data from different sources. The workshop will consist of presentations and practical exercises using data from the UK Data Service and open data sources.
Course Leader: Dr Ana Ivon Morales Gomez (UK Data Service, University of Manchester)
This course will introduce participants to the complexities of analysing data from multiple sources. It will cover issues of data quality, cleaning, derivation and linkage.
The increasing availability of data on all aspects of modern life - whether such data be open, archived or proprietary - has started to open up the possibility of drawing on multiple datasets to solve analytical problems.
Getting to know the available data is a fundamental step in data analysis. It not only shows us what the data contain, along with their scope and shape, but also provides insights into their quality, format and other potential issues that affect their usability. This is especially important when working with data from different sources, where inconsistencies between sources are more likely to occur and can cause problems when merging or linking the datasets.
The morning session will focus on data cleaning and manipulation as an essential part of data analysis. In this session, we will learn how to identify the type of cleaning a particular dataset needs in preparation for analysis. We will learn different techniques and practical tools to explore and manipulate the data, with an emphasis on checking the quality of the data, removing unnecessary data, creating new variables and dealing with potential errors and inconsistencies.
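As a rough illustration of the four cleaning tasks listed above, the sketch below works through them on a tiny invented dataset. It is written in Python using only the standard library so that it is self-contained; the course practicals themselves use R. All field names, values and thresholds (for example the 0 to 120 age range) are assumptions made for this example and are not taken from the course materials.

```python
# Hypothetical survey records; every field name and value is invented for illustration.
raw_rows = [
    {"id": "1", "region": " North West", "age": "34", "income": "21000", "notes": "x"},
    {"id": "2", "region": "north west ", "age": "-5", "income": "18500", "notes": ""},
    {"id": "3", "region": "London", "age": "51", "income": "", "notes": "y"},
]


def clean_row(row):
    """Return a cleaned copy of one raw record."""
    cleaned = {"id": int(row["id"])}

    # Dealing with inconsistencies: harmonise whitespace and capitalisation.
    cleaned["region"] = " ".join(row["region"].split()).title()

    # Dealing with errors: treat empty or impossible ages as missing (None).
    age = int(row["age"]) if row["age"].strip() else None
    cleaned["age"] = age if age is not None and 0 <= age <= 120 else None

    # Empty strings become explicit missing values.
    cleaned["income"] = float(row["income"]) if row["income"].strip() else None

    # Creating a new variable: derive a broad age group from age.
    if cleaned["age"] is None:
        cleaned["age_group"] = None
    elif cleaned["age"] < 40:
        cleaned["age_group"] = "under 40"
    else:
        cleaned["age_group"] = "40 and over"

    # Removing unnecessary data: the free-text "notes" field is not carried over.
    return cleaned


cleaned_rows = [clean_row(row) for row in raw_rows]

# Checking quality: count missing values per variable after cleaning.
missing_counts = {
    name: sum(1 for row in cleaned_rows if row[name] is None)
    for name in cleaned_rows[0]
}

for row in cleaned_rows:
    print(row)
print("Missing values per variable:", missing_counts)
```

Recoding the impossible age as missing rather than deleting the whole record is only one possible choice; which option is appropriate depends on the dataset and the planned analysis.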
The afternoon session will begin with a discussion of missing data, with the goal of learning to identify missing data mechanisms and to understand how different methods are applied to address missingness, depending on the underlying mechanism. We will then discuss the challenges of linking relational data and learn different methods to integrate data from different sources.
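The two afternoon topics can be sketched together on invented data: linking two sources on a shared key, then handling a missing value. The example is in Python with the standard library only, whereas the course practicals use R. The datasets, the key name `area_code` and the helper `left_join` are hypothetical. Mean imputation appears only as the simplest possible stand-in for "a method to address missingness"; it is not presented as the approach taught on the course, and it is generally only defensible when values are missing completely at random.

```python
from statistics import mean

# Hypothetical source A: survey respondents, keyed by an area code.
respondents = [
    {"person_id": 1, "area_code": "E01", "income": 21000.0},
    {"person_id": 2, "area_code": "E02", "income": None},
    {"person_id": 3, "area_code": "E09", "income": 30000.0},
]

# Hypothetical source B: area-level open data using the same key.
area_stats = [
    {"area_code": "E01", "deprivation_decile": 3},
    {"area_code": "E02", "deprivation_decile": 7},
]


def left_join(left, right, key):
    """Keep every row of `left`, attaching fields from `right` where the key matches."""
    lookup = {row[key]: row for row in right}
    right_fields = [name for name in right[0] if name != key]
    joined = []
    for row in left:
        match = lookup.get(row[key])
        merged = dict(row)
        for name in right_fields:
            # Unmatched rows get explicit missing values rather than being dropped.
            merged[name] = match[name] if match is not None else None
        merged["matched"] = match is not None
        joined.append(merged)
    return joined


linked = left_join(respondents, area_stats, "area_code")

# Linkage check: which respondents found no partner in the second source?
unmatched_ids = [row["person_id"] for row in linked if not row["matched"]]

# Simplest possible handling of a missing value: replace it with the observed mean
# and keep a flag so the imputed values stay identifiable.
observed_incomes = [row["income"] for row in linked if row["income"] is not None]
income_mean = mean(observed_incomes)
imputed = [
    dict(
        row,
        income=row["income"] if row["income"] is not None else income_mean,
        income_imputed=row["income"] is None,
    )
    for row in linked
]

print("Unmatched respondents:", unmatched_ids)
for row in imputed:
    print(row)
```

Keeping the `matched` and `income_imputed` flags makes it possible to check later whether linkage failures or imputed values are concentrated in particular groups.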
All sessions will include a mixture of presentations and hands-on practical activities. All the practical exercises will be done using RStudio. These practical sessions will give participants the opportunity to apply the main concepts discussed in the lectures to real-world data.
On Day 2, participants will work in teams to produce an analysis that draws on multiple datasets. At the end of the day, each team will present its solution.
On completion of this workshop, participants will have gained new skills to understand the challenges of using real-world data and to apply a range of data analysis tools to process, clean and transform data into a format suitable for analysis. Participants will also have learned how to work with multiple datasets and apply practical methods for handling missing data.
Introduction to R webinar (optional)
The course will be taught using R. An optional introductory webinar will give a brief introduction to R and RStudio for participants with no previous experience of either. A private link to the webinar will be sent to all participants.
Day One: 10:00 am to 5:00 pm
Day Two: 9:00 am to 3:30 pm
Reading materials (not compulsory)
Wickham, H. and Grolemund, G. (2016) "R for Data Science", available online: https://r4ds.had.co.nz/
Intermediate (some prior knowledge)
Quantitative Data Handling and Data Analysis