Introduction to Machine Learning and AI Methods to analyse large-scale data: An example using International Large-Scale Assessment Data
Date:
02/03/2026
Organised by:
NCRM, University of Southampton
Presenter:
Dr Christian Bokhove
Level:
Entry (no or almost no prior knowledge)
Contact:
Jacqui Thorp
Training and Capacity Building Coordinator, National Centre for Research Methods, University of Southampton
Email: jmh6@soton.ac.uk
Location:
SO17 1BJ
Venue:
Building 54, Room 4001,
University of Southampton, Highfield Campus, Southampton
Description:
This course is one of a series of four. You may register for any number of sessions individually. If you choose to register for all four, a discount will be applied. Further information about the series can be found at the end of this listing.
Session One – Introduction to Machine Learning and AI Methods to analyse large-scale data: An example using International Large-Scale Assessment Data
When faced with massive and complex secondary data in the social sciences, have you ever wondered: How should these data be analysed? Which model fits which type of research question? And how should the results be interpreted?
International Large-Scale Assessment (ILSA) data are a good example of such complexity. They often combine multilevel structures, cross-national sampling, complex weights, missing data and multiple imputation, latent variable modelling, time dimensions, measurement error, and cultural differences.
Using the flagship ILSA datasets (e.g., TIMSS, PIRLS, PISA) as examples, this course will help you:
- Understand and identify the internal structures of ILSA data (e.g., multilevel hierarchies, cross-national sampling, complex weighting, missing data and multiple imputation), and understand the modelling challenges hidden beneath these features;
- Understand the relative merits of theory-driven vs data-driven approaches. We will move from theory-driven reasoning (traditional statistical models, e.g. multilevel regression, causal models) to data-driven algorithmic exploration (machine learning, e.g. Random Forest, CatBoost), recognising the strengths, limitations, and appropriate use of each;
- Reflect critically on methodological choices, assessing how they shape the credibility of findings and the interpretation of evidence in policy and research.
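To give a flavour of the data features listed above, the following is a minimal Python sketch (not course material) of how complex weights and multiple plausible values are commonly handled together: a weighted mean is computed for each plausible value, and the results are pooled with Rubin's rules. The toy scores, weights and sampling variances are invented for illustration, and the function names are hypothetical. Real ILSA analyses would estimate the sampling variances from the replicate weights supplied with each dataset.

```python
from statistics import mean, variance


def weighted_mean(values, weights):
    """Weighted mean of `values` using sampling weights `weights`."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def pool_rubin(estimates, sampling_variances):
    """Pool M estimates with Rubin's rules.

    Returns (pooled_estimate, total_variance), where
    total_variance = within + (1 + 1/M) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # sample variance across the M estimates
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical toy data: 4 students, each with a sampling weight
# and 3 plausible values for achievement (one list per plausible value).
weights = [1.0, 2.0, 1.0, 2.0]
plausible_values = [
    [500.0, 520.0, 480.0, 510.0],
    [505.0, 515.0, 485.0, 505.0],
    [495.0, 525.0, 475.0, 515.0],
]

# One weighted estimate per plausible value.
estimates = [weighted_mean(pv, weights) for pv in plausible_values]

# Assumed sampling variances; in practice these come from replicate weights.
sampling_variances = [4.0, 4.0, 4.0]

pooled, total_var = pool_rubin(estimates, sampling_variances)
print(f"pooled mean = {pooled:.2f}, standard error = {total_var ** 0.5:.2f}")
```

The point of the sketch is that the analysis is repeated once per plausible value and only then combined, so the final standard error reflects both sampling uncertainty and measurement uncertainty.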
By the end of this session, participants will be able to:
- Understand the key structural characteristics and modelling challenges of ILSA data.
- Differentiate between theory-driven and data-driven analytical paradigms, recognising their conceptual foundations and methodological boundaries.
- Compare the strengths and limitations of traditional statistical models and machine learning approaches.
- Reflect on how methodological choices influence interpretation and evidence use in educational policy and research.
This course is aimed at researchers, graduate students, data analysts, and education professionals. No prior experience with statistical software or large-scale assessment data is required. The course is open to all participants with an interest in applying traditional, machine learning and AI-supported methods to the analysis of international large-scale assessment data.
Delivery
This course is being delivered in a hybrid format on Monday 2nd March 2026, 13:00 to 15:00:
In person in Room 54/4001 (limited capacity, offered on a first-come, first-served basis) or online.
Series details:
Session Two – £50 - https://www.ncrm.ac.uk/training/show.php?article=14611
Session Three – £50 - https://www.ncrm.ac.uk/training/show.php?article=14612
Session Four – £25 - https://www.ncrm.ac.uk/training/show.php?article=14613
Special offer: Register for all four sessions for £120
Cost:
The fee for this session is:
• £25 per person for all participants.
In the event of cancellation by the delegate, a full refund of the course fee is available up to two weeks prior to the course. NO refunds are available after this date.
If it is no longer possible to run a course due to circumstances beyond its control, NCRM reserves the right to cancel the course at its sole discretion at any time prior to the event. In this event every effort will be made to reschedule the course. If this is not possible or the new date is inconvenient a full refund of the course fee will be given. NCRM shall not be liable for any costs, losses or expenses that may be incurred as a result of its cancellation of a course, including but not limited to any travel or accommodation costs.
The University of Southampton’s Online Store T&Cs also continue to apply.
Website and registration:
Region:
South East
Keywords:
Quantitative Data Handling and Data Analysis, Mixed Methods Data Handling and Data Analysis, AI and machine learning, International large-scale assessment data, Modelling paradigms comparison, Artificial intelligence methods, Methodology Comparison, Machine learning, Cross-Sectional Research
