Data cleaning with OpenRefine


Bio: Aleksandra Nenadic is the Training Lead at the UK's Software Sustainability Institute, based at the University of Manchester, where she is committed to the ongoing improvement of research software practice through training and community engagement. Aleksandra works to improve the provision of, and access to, training in foundational computational and data analysis skills for researchers and scientists, and advocates for openness, reproducibility, collaboration and inclusion in research. She volunteers in several open communities and serves on the Executive Council of The Carpentries, an international community that teaches foundational coding and data science skills to researchers worldwide.

Peter Smyth, The University of Manchester

Most real-world data is messy. Even well-organised data can contain errors, corruptions, inaccuracies and inconsistencies. Before any analysis can take place, the data must be cleaned: errors identified and corrected, and structure and formatting made consistent. Because this process can change the data radically, it must be carried out with the same care and attention to reproducibility as the analysis itself. We will provide an introduction to OpenRefine, an open source tool for cleaning and (re)formatting data. You may also be interested in the short course "Best practices in data organisation with spreadsheets", taking place on Monday afternoon, which offers useful advice for when you start collecting and organising your own data from scratch.