Big Data: Tools and Statistical Methods


18/05/2022 - 19/05/2022

Organised by:

Royal Statistical Society


Mark Briers


Intermediate (some prior knowledge)



12 Errol Street, London EC1Y 8LX


Level: Intermediate (I)

The emergence of Big Data as a recognised and sought-after technological capability can be traced to three factors: the general recognition that data is ubiquitous and is an asset from which organisations can derive business value; the efficient interconnectivity of sensors, devices, networks, services and consumers, which allows data to be transported with relative ease; and the emergence of middleware processing platforms, such as Hadoop, InfoSphere Streams, Accumulo, Storm, Spark and Elasticsearch, which empower developers to efficiently create distributed, fault-tolerant applications that execute statistical analytics at scale.

To promote the use of advanced statistical methods within a Big Data environment, an essential requirement if correct conclusions are to be reached, statisticians and data scientists must use Big Data tools when supporting or performing data analysis.

The objective of this two-day virtual course is to train statistically minded practitioners in the use of common Big Data tools, with an emphasis on the use of advanced statistical methods for analysis. The course will focus on the application of statistical methods in the processing platforms Hadoop and Spark and will highlight how these can be used to analyse data at scale.

Learning Outcomes

Following this course, attendees will:

  • Gain an understanding of the Big Data platforms Hadoop and Spark
  • Develop hands-on experience of using these platforms to analyse data
  • Gain an understanding of the classes of statistical methods used on these platforms

Topics Covered

  • The Big Data landscape
  • Hadoop
  • MapReduce
  • Python and Hadoop
  • An introduction to functional programming and Spark
  • Statistical operations in Spark
  • Anomaly detection in network data

Target Audience

Statisticians and data scientists wishing to use emerging computing platforms (Hadoop and Spark) to perform statistical inference across large datasets.

Knowledge Assumed

Familiarity with at least one of the programming languages mentioned.

Delegates are expected to bring a laptop with the latest version of PuTTY installed.


£599.76 - £832.32 (including VAT)


Greater London


Quantitative Data Handling and Data Analysis, Big Data, Hadoop, Spark, MapReduce

Related publications and presentations:

Quantitative Data Handling and Data Analysis
