
Masterclasses and Spring School: Large-scale Data Analysis – Traditional Statistical Modelling, Machine Learning and AI-supported Methods

This series of short courses will explore the process of analysing complex large-scale survey data using traditional statistical methods and AI-supported approaches. It will be delivered in a hybrid format – offering participants the chance to join online or in person – between 2 and 5 March 2026.

Participants can choose between four courses on a variety of topics. A discount is available for participants attending all four courses.


Summary

In the era of big data and rapidly advancing AI technologies, the real challenge is no longer whether data exist, but how to extract credible, interpretable, and policy-relevant evidence from massive and complex datasets. This four-course series uses International Large-Scale Assessment (ILSA) data as its central example to guide participants – from beginners to those with prior analytical experience – through the process of analysing complex large-scale survey data in the social sciences.

Confronting the challenges inherent in complex secondary datasets, such as multilevel structures, cross-national sampling, complex weighting schemes and latent constructs, the series moves from foundational concepts to advanced applications. It covers traditional statistical methods (such as multilevel models, structural equation modelling and causal inference models), machine learning techniques, and emerging uses of generative AI in data analysis. Participants will develop an understanding of the differences, strengths, limitations, and complementarities of these approaches within the context of secondary data analysis.


Courses

1) Introduction to Machine Learning and AI Methods to Analyse Large-scale Data: An Example Using International Large-Scale Assessment Data

Monday, 2 March 2026

The series begins by using ILSA (International Large-Scale Assessment) datasets as examples to introduce the internal structures, typical characteristics, and methodological challenges of complex social-science secondary data. It also compares theory-driven and data-driven analytical approaches to help researchers understand how to select appropriate models based on their research questions and data features.

The course will help participants to identify and understand the internal structures of ILSA data, weigh the relative merits of theory-driven and data-driven approaches, and reflect critically on their methodological choices.

Read more and register


2) Can Machines Really Understand Data? Using Machine Learning and AI Methods to Analyse Survey and International Large-Scale Assessment Data

Tuesday, 3 March 2026

This course moves into hands-on modelling, introducing Random Forest and CatBoost to show how these algorithms detect nonlinear patterns and interactions, handle high-dimensional data, improve prediction, and address limitations of traditional regression.

Read more and register


3) How Can Generative AI (LLMs) Help with Analysing Survey and International Large-Scale Assessment Data?

Wednesday, 4 March 2026

Focusing on large language models (LLMs), this session will demonstrate their potential for code generation, model reasoning, result interpretation, and academic writing, while also highlighting their advantages in enhancing research efficiency and interpretability. It will additionally discuss how to use AI tools safely and responsibly within the analytical workflow.

Read more and register


4) Which Analytical Choices Are Better? Comparing Traditional Statistical Methods with Machine Learning and AI-supported Methods

Thursday, 5 March 2026

The final course synthesises the frameworks and tools covered in the series and examines how analytical choices shape empirical results. Through Multiverse Analysis and Specification Curves, it provides a systematic way to identify which findings remain stable and reproducible across alternative modelling decisions.

Read more and register


Cost and registration

  • Session one: £25
  • Session two: £50
  • Session three: £50
  • Session four: £25

Special offer: a discount is available for participants attending all four courses. The discounted price is £120.

To sign up for the courses, click on one of the links listed above and then click the "Register for this course" button. On the payment page, you will be able to select the courses that you wish to attend.


Overview

This four-course series offers a clear framework for understanding traditional statistical modelling, machine learning and generative AI methods in complex data analysis. It also provides participants with a complete pathway, from understanding data structures and selecting methods to building models and testing the robustness of their findings. By the end of the series, researchers will be better equipped to analyse complex secondary data confidently and to produce more reliable, interpretable, and methodologically sound findings in social science research.


Target audience

The four courses are for junior and senior researchers and professionals, from any sector and career stage, who are interested in methods for analysing large-scale survey data and in how AI-based and machine learning methods compare with traditional statistical methods. The courses are also suitable for education professionals interested in quantitative reasoning, data-driven inquiry and/or evidence-based research.


Prerequisites

No prior experience with statistical software or large-scale assessment data is required.


Venue

The courses will be delivered in a hybrid format. Participants can join online or in person at the University of Southampton, Southampton, SO17 1BJ, United Kingdom. The courses will take place in Building 54, Room 4001.


Further information

If you have any questions, please contact Jacqui Thorp: J.M.High@soton.ac.uk