Introduction to Machine Learning for Causal Analysis using Observational Data



Organised by:

NCRM, University of Southampton and MiSoC, University of Essex


Professor Paul Clarke, Dr Spyros Samothrakis and Damian Machlanski


Intermediate (some prior knowledge)


Jacqui Thorp
Training and Capacity Building Coordinator, National Centre for Research Methods, University of Southampton


Room TBC - University of Essex, Wivenhoe Park, Colchester, CO4 3SQ


This course is aimed at all quantitative researchers, academic and non-academic, with experience/knowledge of performing causal analysis with data from observational studies and of some of the challenges (e.g. adjusting for confounding bias/selection on observables, non-random selection, endogenous regressors).  It should be suitable for junior researchers or senior researchers who wish to get a hands-on introduction to this topic.

Some prior knowledge of programming and familiarity with high-level programming concepts would be desirable but not essential. Experience with a statistical package should be sufficient to understand and run the exercises. Ideally, if you want to participate in the practical element of the course, have a Python interpreter installed on your computer.

This workshop will:

  1. Introduce the basic principles of causal modelling (potential outcomes, graphs, causal effects) and emphasise the key role of design and assumptions in obtaining robust estimates.
  2. Introduce the basic principles of machine learning and the use of machine learning methods to do causal inference (e.g. methods stemming from domain adaptation and propensity scores).
  3. Show how to implement these techniques for causal analysis and interpret the results in illustrative examples.
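As a rough illustration of the ideas listed above (potential outcomes, confounding and propensity scores), the sketch below simulates observational data and compares a naive difference in means with an inverse-probability-weighted estimate. It is not taken from the course materials: the data-generating process, the true effect of 2.0 and the function names (simulate, naive_difference, ipw_estimate) are all assumptions made for this example, and it uses only the Python standard library.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible


def simulate(n=20000, true_effect=2.0):
    """Simulate (x, t, y) rows with one binary confounder x (assumed setup)."""
    rows = []
    for _ in range(n):
        x = 1 if random.random() < 0.5 else 0       # confounder
        p_treat = 0.8 if x == 1 else 0.2            # x drives treatment uptake
        t = 1 if random.random() < p_treat else 0   # observed treatment
        y0 = 3.0 * x + random.gauss(0, 1)           # potential outcome if untreated
        y1 = y0 + true_effect                       # potential outcome if treated
        y = y1 if t == 1 else y0                    # only one outcome is observed
        rows.append((x, t, y))
    return rows


def naive_difference(rows):
    """Difference in mean outcomes between treated and untreated units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_estimate(rows):
    """Normalised inverse-probability-weighted estimate of the average effect."""
    # Propensity score: share of treated units within each level of x.
    propensity = {}
    for level in (0, 1):
        treated_flags = [t for x, t, _ in rows if x == level]
        propensity[level] = sum(treated_flags) / len(treated_flags)
    # Weight each unit by the inverse probability of the treatment it received.
    w1 = sum(t / propensity[x] for x, t, _ in rows)
    w0 = sum((1 - t) / (1 - propensity[x]) for x, t, _ in rows)
    mean1 = sum(t * y / propensity[x] for x, t, y in rows) / w1
    mean0 = sum((1 - t) * y / (1 - propensity[x]) for x, t, y in rows) / w0
    return mean1 - mean0


rows = simulate()
naive = naive_difference(rows)
ipw = ipw_estimate(rows)
print(f"naive difference in means: {naive:.2f}")
print(f"IPW estimate:              {ipw:.2f}")
```

In this simulated setting the naive comparison overstates the effect because treated units are more likely to have x = 1, whereas the weighted estimate should land close to the assumed true effect of 2.0. The adjustment only works here because the single confounder is observed, which is exactly the kind of untestable assumption the course emphasises.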

The course covers:

  • Fundamentals of causal analysis
  • Basic machine learning techniques
  • Running simple causal analysis using machine learning on real data sets

By the end of the course participants will understand:

  •  The distinction between associations and causal effects and the key role played by study design and untestable assumptions in causal analysis
  •  How the training and testing steps in machine learning work and play a similar role to significance testing in traditional statistics
  •  The basics of Python and how to set up, run and interpret the output from causal learning algorithms
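To give a flavour of the training and testing steps mentioned above, the following minimal sketch fits two predictors on a training set and scores them on held-out test data. It is an illustration only, not course material: the simulated data, the 50/50 split and the helper names (make_data, fit_line, fit_nearest_neighbour, mse) are assumptions, and only the Python standard library is used.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible


def make_data(n):
    """Simulate (x, y) pairs from a noisy straight line (assumed setup)."""
    data = []
    for _ in range(n):
        x = random.uniform(0, 10)
        y = 1.5 * x + random.gauss(0, 2)
        data.append((x, y))
    return data


def fit_line(train):
    """Ordinary least squares fit of y on x; returns a prediction function."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(train):
    """1-nearest-neighbour predictor: copy the outcome of the closest training x."""
    return lambda x: min(train, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


data = make_data(2000)
random.shuffle(data)
train, test = data[:1000], data[1000:]   # training step uses train only

line = fit_line(train)
nn = fit_nearest_neighbour(train)

results = {
    "line": (mse(line, train), mse(line, test)),
    "nearest neighbour": (mse(nn, train), mse(nn, test)),
}
for name, (train_error, test_error) in results.items():
    print(f"{name}: train MSE = {train_error:.2f}, test MSE = {test_error:.2f}")
```

The nearest-neighbour predictor reproduces the training data perfectly but does worse than the straight line on the held-out data, which is why performance is judged on the test set rather than on the data used for fitting.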

This course is suitable for all researchers and analysts interested in the measurement of socio-economic inequality in health and health care, including (but not limited to): Academics, Government and Third-Sector Researchers. It will be assumed that all participants have some experience of analysing observational data (e.g. from surveys) using statistical regression models.

The course will be conducted using Python in the user-friendly Google Colab environment (participants will be given details of how to register and use the platform).



The fee per teaching day is:

  • £35 per day for students registered at University.
  • £75 per day for staff at academic institutions, Research Councils researchers, public sector staff and staff at registered charity organisations and recognised research institutions.
  • £250 per day for all other participants.

In the event of cancellation by the delegate, a full refund of the course fee is available up to two weeks prior to the course. NO refunds are available after this date.

If it is no longer possible to run a course due to circumstances beyond its control, NCRM reserves the right to cancel the course at its sole discretion at any time prior to the event. In this event every effort will be made to reschedule the course. If this is not possible or the new date is inconvenient, a full refund of the course fee will be given. NCRM shall not be liable for any costs, losses or expenses that may be incurred as a result of its cancellation of a course, including but not limited to any travel or accommodation costs.

The University of Southampton’s Online Store T&Cs also continue to apply.

Website and registration:


South East


Regression Methods, Quantitative Approaches (other), Quantitative Software, Data Science, Machine Learning, Causal Analysis, Nonparametric Approaches: classification trees, Regression Methods: regression trees, Quantitative Approaches (other): propensity score matching

