Tools for Efficient Workflows (GitHub, Markdown, Docker) - Online

Date:

28/11/2023 - 15/12/2023

Organised by:

University College London (UCL)

Presenter:

Lukas Lehner

Level:

Advanced (specialised prior knowledge)

Contact:

IOE Short Courses
ioe.shortcourses@ucl.ac.uk


Venue: Online

Description:

This course focuses on developing efficient workflows that prioritize transparency, reproducibility, and collaboration. Tools originally developed for software engineering are increasingly making their way into the economic and social sciences. R Markdown enables the creation of research papers with integrated and reproducible code. Git provides version control for change tracking, collaboration, and code sharing. By combining R Markdown and GitHub, participants can create professional and customizable websites. Docker allows researchers to containerize their projects, ensuring reproducibility and stability amid rapidly changing software environments. VeraCrypt is a software tool for encrypting the sensitive data frequently encountered in research. Used effectively, these tools streamline workflows, enhance research visibility, and promote the citability and sustainability of research outputs. Ultimately, these practices help researchers maintain high-quality outputs that meet the increasing replicability and transparency standards required by many journals for publication. The course is hands-on, giving participants practical experience in setting up and using these software tools for their research projects.

 

The course covers: 

  • Markdown and R Markdown for automatable reports.
  • Git and GitHub for version control and collaboration in teams.
  • GitHub Pages and Jekyll for creating a website.
  • Docker and DockerHub to containerize projects in a reproducible environment.
  • VeraCrypt for encryption and data protection.

 

By the end of this course, participants will have:

  • Confidently mastered R Markdown for writing executable code and creating automatable reports.
  • Developed skills in using Git and GitHub for version control, enabling effective collaboration and documentation of work processes.
  • Learned how to combine R Markdown and GitHub Pages to create their own academic websites.
  • Acquired knowledge on utilizing Docker and DockerHub to containerize their projects.
  • Used VeraCrypt for advanced data protection and encryption when working with sensitive data.

 

Pre-requisites

Basic knowledge of a statistical programming language such as R is useful but not required.

Participants must have admin rights to install software on their computer in order to take part in the course. Participants using an employer-provided laptop should make sure they obtain admin rights before the installation sessions.

 

Event Outline

Installation session 1: 28 November (18:00-20:00)

  • We install R, RStudio, Pandoc, TinyTeX, VS Code, Git, and GitHub Desktop, and make sure that the software is correctly configured.

 

Installation session 2: 5 December (18:00-20:00)

  • We install Ruby, Jekyll, Docker, and VeraCrypt, and make sure that the software is correctly configured.

 

Day 1: Automatable reports and version control (1st December 2023)

 

Morning: automatable reports

  • In the first lecture, we cover foundational concepts of efficient workflows and collaboration, which are implemented throughout the course. 
  • In the first lab exercise, participants install the required software to generate automatable reports (R, RStudio, Pandoc, and TinyTeX). Participants then write executable code and create automatable reports using R Markdown and Pandoc.

Literature:

 

Afternoon: version control basics

  • In the second lecture, we cover the conceptual underpinnings of modern version control tools such as Git.
  • In the second lab exercise, participants install Git and GitHub Desktop and set up their GitHub account. Participants then solve small coding challenges to familiarize themselves with key Git operations, including branching, merging, forking, and resolving merge conflicts.

Literature:

 

Day 2: Version control and academic websites (8th December 2023)

 

Morning: using version control in teams

  • In the third lecture, we cover practical tips and share tacit knowledge for using version control to collaborate effectively in teams.
  • In the third lab exercise, participants collaborate in teams on a small coding challenge. Participants learn to integrate code review and merge requests into their day-to-day coding workflow.

Literature:

  • Bryan. 2018. Excuse Me, Do You Have a Moment to Talk About Version Control? The American Statistician, 72(1), 20-27. 
  • Chacon and Straub. 2014. Pro Git. Apress. https://git-scm.com/book/en/v2 

 

Afternoon: Academic websites using GitHub Pages

  • In the fourth lecture, we discuss how to combine executable reports (day 1) with version control (day 2) to effectively disseminate our findings online via a website.
  • In the fourth lab exercise, participants install Ruby and Jekyll and use GitHub Pages to create their own static website.

Literature:

 

Day 3: Containerisation for reproducible environments and encryption (15th December 2023)

 

Morning: Containerisation with Docker

  • In the fifth lecture, we cover the key idea underlying containerization: bundling software, libraries, and configuration files. The focus is on how to ship fully reproducible virtual environments as part of scientific replication packages.
  • In the fifth lab exercise, participants install Docker and use it with DockerHub to containerize their projects to ensure full software and code reproducibility.

Literature:

 

Afternoon: Encryption and open issues

  • In the sixth lecture, we discuss modern encryption methods, their application for academic research, and tricks for collaboration in teams.
  • In the sixth lab exercise, participants install and use VeraCrypt for advanced data protection and encryption. Participants also have some time to address any open questions that remain from the previous challenges.

Literature:

Cost:

The fee per teaching day is: £30 per day for registered students / £60 per day for staff at academic institutions, Research Councils researchers, public sector staff, staff at registered charity organisations and recognised research institutions / £100 per day for all other participants.

In the event of cancellation by the delegate, a full refund of the course fee is available up to two weeks prior to the course. No refunds are available after this date. If it is no longer possible to run a course due to circumstances beyond its control, NCRM reserves the right to cancel the course at its sole discretion at any time prior to the event. In this event, every effort will be made to reschedule the course. If this is not possible or the new date is inconvenient, a full refund of the course fee will be given. NCRM shall not be liable for any costs, losses or expenses that may be incurred as a result of the cancellation of a course. The University of Southampton’s Online Store T&Cs also continue to apply.

Website and registration:

Region:

International

Keywords:

Data Management, ICT and Software (other), GitHub, Markdown, Docker, Website, Workflow, Collaboration, Version Control, Reproducibility

