Big Data Analytics with R – 2-Day Online Course


04/07/2022 - 05/07/2022

Organised by:

Mind Project Ltd


Presenter: Simon Walkowiak MSc, MBPsS


Level: Intermediate (some prior knowledge)


Contact: Simon Walkowiak
Phone: 02033223786

Venue: Online


1. Course description.

This hands-on, two-day, tutor-led course covers one of the most exciting and current topics within the R community. R has traditionally not been used for Big Data analytics because of various limitations, chiefly its reliance on in-memory processing, but recent R packages provide much-needed connectivity for out-of-memory processing with popular Big Data tools such as Hadoop, Spark, and SQL and NoSQL databases.

During this training course, you will learn how to apply the R language to manage, manipulate and analyse Big Data, such as datasets stored in distributed file systems or large databases, and how to write fast, parallel R code that allows algorithms and data processing to scale. The course also serves as a good introduction to Cloud Computing (Amazon Web Services and Microsoft Azure) and MLOps: you will be presented with best practices of Big Data system design, which draws on the growing ecosystem of tools that support Big Data analytics, including software and engines for large-scale statistical and machine learning (h2o, Spark, keras and tensorflow).

This course is based on the book “Big Data Analytics with R” (with recent edits), authored by the course tutor. You will receive full access to the ebook version before the course. All course activities (tutorials and exercises) will be performed on the Mind Project Big Data virtual machines and computing clusters. These consist of both CPU and GPU-accelerated multi-node Hadoop DFS with Spark engines and Hive databases, and separate databases (both SQL and NoSQL, including scalable and distributed MongoDB, HBase and CassandraDB). All necessary R packages (e.g. Spark, h2o, keras, tensorflow, data.table and tidyverse) are pre-installed for your convenience, allowing seamless connectivity of R with various Big Data tools and processing/analytical engines.

More details available at


2. Course programme.

This is a 2-day instructor-led online training course with a week-long follow-up period. The course will run from 10:00 to approximately 17:00 each day, with a 50-minute lunch break between the morning and afternoon sessions and two 15-minute coffee/tea breaks. Following the course, you will be able to submit your solutions to the homework exercises and receive feedback from the tutor.

This training course is tutor-led: all online tutorials are presented live by our expert instructor, and you can ask questions, discuss the topic and interact with other learners. You can also email the tutor after the course if you have any questions related to the material presented.

The course will be recorded. You will have access to the video recordings of the course webinars and to additional resources, such as datasets, R code, academic papers related to the topics of the workshop, and supplementary exercises, via the Mind Project Learning Platform.

Course dates: Monday-Tuesday, 4th-5th of July 2022, 10:00-17:00 London (UK) time

Deadline for registrations: Friday, 1st of July 2022 @ 17:00 London (UK) time


The programme for this course covers the following concepts and topics:

  • Use third-party R packages that support parallel computing to increase the speed and processing capabilities of R with both CPUs and GPUs,
  • Work on large data sets in the Cloud (Microsoft Azure and Amazon EC2) through R deployed on a server,
  • Implement the MapReduce framework through Hadoop straight from the R console,
  • Manage the Hadoop Distributed File System, HBase and Hive databases through R,
  • Connect to major relational SQL-based database management systems (RDBMSs) and extract, aggregate and manage their data using a variety of R packages,
  • Apply NoSQL queries to access, transform and manipulate large data sets in MongoDB using R packages,
  • Improve data flow, processing speed and the implementation of AI models for large data sets by connecting R to Spark and h2o,
  • Learn how to design Big Data systems to support scalable and distributed machine learning and AI (with Spark, h2o, keras and tensorflow packages),
  • Implement selected Big Data tools in the Big Data Product Cycle with R.


3. Course pre-requisites and further instructions.

  • You will need at least one commonly used web browser installed on your PC (e.g. Chrome, Safari, Firefox or Edge) to access our computing clusters and the Mind Project Learning Platform (with additional resources). There is no need for you to install any specific R packages, as our clusters and virtual machines will be set up for your convenience.
  • We recommend that attendees have practical experience in data processing or quantitative research, gained from either professional work or university education/research. We suggest that the course is preceded by our “Applied Data Science with R” open-to-public training course.
  • Your PC needs to be connected to a stable WiFi/Internet network (either home or office-based) and have the Zoom video-conferencing application installed.


4. Your course instructor. 

Your instructor for this course will be Simon Walkowiak. Simon is a director at Mind Project Limited and a Ph.D. researcher in Artificial Intelligence at the Bartlett Centre for Advanced Spatial Analysis (University College London) and the Alan Turing Institute in London. Simon holds a BSc (First Class Honours) in Psychology with Neuroscience and an MSc (Distinction) in Big Data Science. He conducts and manages research projects on the implementation and computational optimisation of novel AI approaches applicable to large-scale datasets to predict human behaviour and spatial cognition. Simon is the author of “Big Data Analytics with R” (2016), a widely used textbook on high-performance computing with the R language and its compatibility with the ecosystem of Big Data tools, e.g. SQL/NoSQL databases, Spark and Hadoop. Apart from research and data management consultancy, over the past several years Simon has taught more than 150 in-house or open-to-public statistical training courses (in R, Python, SQL and Scala for Spark) in the UK, Europe, Asia and the USA. His major clients include organisations from finance and banking (HSBC, RBS, GE Capital, European Central Bank, Credit Suisse, ING etc.), research and academia (GSMA, CERN, University of Cambridge, UK Data Archive, Agri-Food Biosciences Institute, Newcastle University etc.), health (NHS), insurance (Liberty IT), transport (Steer Group) and government (Home Office, Ministry of Justice, Government Actuary’s Department etc.).


Should you have any questions, please contact Mind Project Ltd at or by phone on 0203 322 3786. Please visit the course website at


Early Bird offer (by 6th of June 2022):

  • Regular fee: £360 (normally £450) per person for the whole course.
  • Discounted fee: £270 (normally £330) per person for the whole course, applicable to undergraduate and postgraduate students, representatives of registered charitable organisations and NHS employees only.

Additional discounts are available for multiple bookings and groups.

Website and registration:


Greater London


Bayesian methods, Data linkage, Data Mining, Neural networks, Machine learning, Quantitative Approaches (other), Quantitative Software, R, Big Data, Spark, SQL and NoSQL databases, Cloud computing

