Can Machines Really Understand Data? — Using Machine Learning and AI methods to Analyse Survey and International Large-Scale Assessment Data

Date:

03/03/2026

Organised by:

NCRM, University of Southampton

Presenter:

Dr Somnath Chaudhuri

Level:

Entry (no or almost no prior knowledge)

Contact:

Jacqui Thorp
Training and Capacity Building Coordinator, National Centre for Research Methods, University of Southampton
Email: jmh6@soton.ac.uk

Location:

SO17 1BJ

Venue:

Building 54, Room 4001, University of Southampton, Highfield, Hants

Description:

This course is one of a series of four. You may register for any number of sessions individually. If you choose to register for all four, a discount will be applied. Further information about the series can be found at the end of this listing.

Session Two - Can Machines Really Understand Data? — Using Machine Learning and AI methods to Analyse Survey and International Large-Scale Assessment Data

“Isn’t machine learning just about feeding data into an algorithm and letting it tell you which variables matter most?” That’s perhaps one of the most common perceptions about machine learning, and the one most worth re-examining.

International Large-Scale Assessment (ILSA) data—with its abundance of high-dimensional variables, multilevel structure, and complex nonlinear relationships—offers an ideal testing ground for exploring this claim.

This session introduces two key machine-learning methods: the widely used Random Forest and the fast-rising CatBoost. Through real examples based on ILSA data (e.g., TIMSS and PIRLS), we will explore how machine learning can help researchers move beyond the limits of traditional regression approaches, revealing its strengths in flexible nonlinear modelling and feature recognition within social-science data.

In the first part of the session, participants will develop a conceptual understanding of how these algorithms work and why they perform well in complex social datasets. We then move to a hands-on component, using ILSA data to experience the full process from model building to interpretation—discovering how machine learning can be applied meaningfully in real research contexts. Participants will learn to apply explainable machine learning techniques—such as Partial Dependence Plots (PDPs) and SHapley Additive exPlanations (SHAP)—to understand how models assess variable importance and to revisit the opening question: Does the machine really understand the data?

By the end of this session, participants will be able to:

  • Explain how ensemble learning methods like Random Forest and CatBoost model nonlinear and high-dimensional data.
  • Apply basic machine-learning workflows to ILSA-type data using R or Python.
  • Interpret model outputs using explainable ML tools (PDPs and SHAP).
  • Critically reflect on how algorithms “understand” data—and how researchers should interpret their results responsibly.
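As a rough illustration of the idea behind a Partial Dependence Plot, the sketch below fixes one variable at each value on a grid for every respondent, then averages the model's predictions. It is not taken from the course materials: the `predict` function is a hypothetical stand-in for a fitted model such as a Random Forest, and the variables `ses` and `books` and all data are invented for the example.

```python
import random
from statistics import mean


def predict(row):
    """Hypothetical stand-in for a fitted model (assumption, not a real model).

    The score rises with 'books' up to 3 and is flat afterwards, so the
    relationship is deliberately nonlinear.
    """
    return 500 + 20 * row["ses"] + 30 * min(row["books"], 3)


def partial_dependence(predict_fn, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value for all rows."""
    curve = []
    for value in grid:
        # Copy every row, overwriting only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict_fn(r) for r in modified))
    return curve


# Synthetic respondents (invented data, not ILSA data).
random.seed(0)
rows = [
    {"ses": random.gauss(0, 1), "books": random.randint(0, 5)}
    for _ in range(200)
]

grid = [0, 1, 2, 3, 4, 5]
pd_books = partial_dependence(predict, rows, "books", grid)

for value, avg in zip(grid, pd_books):
    print(f"books={value}: average predicted score {avg:.1f}")
```

In this toy setting the averaged curve climbs in equal steps up to `books=3` and then flattens, which is the kind of nonlinear pattern a PDP is meant to make visible. The curve describes how the model behaves; whether that reflects a real relationship in the data is a separate question for the researcher.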

This course is aimed at researchers, graduate students, data analysts, and education professionals interested in applying traditional, machine learning and AI-supported methods to the analysis of international large-scale assessment data. 

Pre-requisites

This session is divided into two parts:

Morning lecture (2 hours): No prior technical background is required. The session introduces core ideas conceptually, making it accessible to participants from educational and social science backgrounds.

Afternoon workshop (3 hours): Basic familiarity with statistical syntax or programming is recommended to follow the hands-on exercises effectively. Participants with limited experience are still welcome to observe and learn from live demonstrations.

IMPORTANT: Please note that this course includes computer workshops. Before registering, please check that you will be able to access the software noted below. Please bear in mind the minimum system requirements for running the software and any administrative restrictions imposed by your institution or employer, which may block the installation of software.

Software: R and possibly Python.

Format: Demonstration-based workshop with guided hands-on examples using ILSA data.

Delivery

This course is being delivered in a hybrid format on Tuesday 3rd March from 10:00-16:00 (10:00-12:00 lecture, 13:00-16:00 workshop):

In person - Room 54/4001 (limited capacity, offered on first-come first-served basis) or Online.

Series details:
Session One – £25 - https://www.ncrm.ac.uk/training/show.php?article=14610
Session Three – £50 - https://www.ncrm.ac.uk/training/show.php?article=14612
Session Four – £25 - https://www.ncrm.ac.uk/training/show.php?article=14613
Special offer: Register for all four sessions for £120

 

Cost:

The fee for this session is:

• £50 per person for all participants.

In the event of cancellation by the delegate a full refund of the course fee is available up to two weeks prior to the course. NO refunds are available after this date.

If it is no longer possible to run a course due to circumstances beyond its control, NCRM reserves the right to cancel the course at its sole discretion at any time prior to the event. In this event every effort will be made to reschedule the course. If this is not possible or the new date is inconvenient a full refund of the course fee will be given. NCRM shall not be liable for any costs, losses or expenses that may be incurred as a result of its cancellation of a course, including but not limited to any travel or accommodation costs.

The University of Southampton’s Online Store T&Cs also continue to apply.

Website and registration:

Register for this course

Region:

South East

Keywords:

Quantitative Data Handling and Data Analysis, Mixed Methods Data Handling and Data Analysis, AI and machine learning, International large-scale assessment data, Modelling paradigms comparison, Artificial intelligence methods, Machine Learning, Random Forest, CatBoost, Data Mining, Cross-Sectional Research, Secondary Analysis, Digital Social Research


