Scala

Date:

16/04/2017 - 18/04/2017

Organised by:

Jumping Rivers Ltd

Presenter:

Prof Darren Wilkinson & Dr Jamie Owen

Level:

Intermediate (some prior knowledge)

Contact:

Esther Gillespie, esther@jumpingrivers.com, 07740285328

Venue:

Senate House, University of London, Malet St, London WC1E 7HU

Description:

LONDON, at Senate House, University of London

This course is aimed at statisticians and data scientists already familiar with a dynamic programming language (such as R, Python or Octave) who would like to learn how to use Scala. Scala is a free, modern, powerful, strongly-typed, functional programming language, well-suited to statistical computing and data science applications. In particular, it is fast and efficient, runs on the Java virtual machine (JVM), and is designed to easily exploit modern multi-core and distributed computing architectures.

The course will begin with an introduction to the Scala language and basic concepts of functional programming (FP), as well as essential Scala tools such as SBT for managing builds and library dependencies. The course will continue with an overview of the Scala collections library, including parallel collections, and we will see how parallel collections enable trivial parallelisation of many statistical computing algorithms on multi-core hardware.

We will next survey the wider Scala library ecosystem, paying particular attention to Breeze, the Scala library for scientific computing and numerical linear algebra. We will see how to exploit non-uniform random number generation and matrix computations in Breeze for statistical applications. Both maximum-likelihood and simulation-based Bayesian statistical inference algorithms will be considered.

Much of the final day will be dedicated to understanding Apache Spark, the distributed Big Data analytics platform for Scala. We will understand how Spark relates to the parallel collections we have already examined, and see how it can be used not only for the processing of very large data sets, but also for the parallel and distributed analysis of large or otherwise computationally-intensive models.

As time permits, we will discuss more advanced FP concepts, such as typeclasses, higher-kinded types, monoids, functors, monads, applicatives, streams and streaming data, and see how these enable the development of flexible, scalable, generic code in strongly-typed functional languages.
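
To give a flavour of the functional style described above, here is a minimal sketch that is not taken from the course materials. It writes a Normal log-likelihood as a map followed by a sum over an immutable collection, using only the Scala standard library. The object name, function names and data are hypothetical, chosen purely for illustration.

```scala
// Illustrative sketch only: all names and data here are assumptions,
// not part of the course materials.
object LogLikSketch {
  // Log-density of a Normal(mu, sigma^2) distribution evaluated at x
  def logDensity(mu: Double, sigma: Double, x: Double): Double = {
    val z = (x - mu) / sigma
    -0.5 * math.log(2.0 * math.Pi) - math.log(sigma) - 0.5 * z * z
  }

  // Log-likelihood of i.i.d. data: map each point to its log-density, then sum
  def logLik(data: Vector[Double], mu: Double, sigma: Double): Double =
    data.map(x => logDensity(mu, sigma, x)).sum

  // Crude maximum-likelihood search: pick the grid value of mu with the
  // largest log-likelihood, holding sigma fixed at 1.0
  def bestOnGrid(data: Vector[Double], grid: Vector[Double]): Double =
    grid.maxBy(mu => logLik(data, mu, 1.0))

  def main(args: Array[String]): Unit = {
    val data = Vector(1.2, 0.8, 1.9, 1.4, 0.7)
    val grid = Vector(0.5, 1.0, 1.2, 1.5, 2.0)
    println(s"best mu on grid: ${bestOnGrid(data, grid)}")
  }
}
```

Because each data point is mapped independently and the results are combined by a sum, this is the shape of computation that parallel collections target. Parallel collections ship with the standard library before Scala 2.13 and as a separate module from 2.13 onwards, so the sketch deliberately stays sequential.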

PREREQUISITES

The course assumes a basic familiarity with essential concepts in statistical computing, as well as some basic programming experience. It is assumed that participants will be familiar with writing their own functions in a language such as R, including essential control structures such as 'for-loops' and 'if-statements'. The course is not suitable for people completely new to programming. However, no prior knowledge of Scala or functional programming is assumed. All participants will be expected to bring their own (multi-core) laptop and to have a recent version of Java pre-installed. Other set-up instructions will be provided in advance to registered participants.

COURSE STRUCTURE

The course will be delivered through a combination of lectures, live demos and hands-on practical sessions. For the practical sessions, participants will be expected to actively engage with the material, run demos, follow examples, and write code to solve simple problems.

PRESENTERS

The course will be delivered by Prof Darren Wilkinson (Newcastle University, U.K.). Prof Wilkinson is co-Director of Newcastle's EPSRC Centre for Doctoral Training in Cloud Computing for Big Data. He is a well-known expert in computational Bayesian statistics and a leading proponent of the use of strongly-typed FP languages (such as Scala) for scalable statistical computing.

Cost:

£1500 + VAT (25% discount for academics and charities)

Website and registration:

Region:

Greater London

Keywords:

Secondary Analysis, Digital Social Research, Mixed Methods, Data Collection (other), Qualitative Data Handling and Data Analysis, Quantitative Data Handling and Data Analysis, Mixed Methods Data Handling and Data Analysis, ICT and Software, Python, R, Data Visualisation, Creating graphs and charts, Interactive data visualisation, Workshops, Training research methods teachers

