Latent Variable Models for Social Research

Presenter(s): Chris Playford

In social science research, we commonly use multivariate statistical models to explore the association between an outcome measure of interest and a set of explanatory variables. The outcome measure we model is assumed to be a reasonably good indicator of a concept of interest, but the concept itself is often not directly observable. What if there are several potential outcome measures and we want to use more than one of them in our models? Latent variable models are a set of methods that have been developed to explore the measurement of such concepts.

This online resource introduces the rationale for latent variable models, explains what a latent variable is and indicates which latent variable models you might wish to consider. The resource also includes an overview of estimating latent variable models in Stata and a worked example showing output from different types of latent variable model.

Background and rationale

When analysing data, our research questions tend to be conceptual. For example, what is the influence of social class on educational attainment? Our analysis is then based on our choice of measures or indicators of these concepts. For the research question above, there are numerous ways of measuring educational attainment (for an overview, see Connelly et al., 2016), so researchers have to decide which measure of educational attainment to use in their models.

Most statistical models within the Generalised Linear Modelling framework have a single outcome measure but multiple explanatory variables. Both the outcome and the explanatory variables are measures of underlying concepts. We might carry out a sensitivity analysis to check how robust our findings are to the choice of outcome measure.

What if we have multiple outcome measures that may each individually be imperfect indicators of our concept of interest? What do we do then?

What are latent variables?

Latent variable techniques help us to understand patterns of response and association across multiple observed indicator variables, and to use these patterns to develop a measurement model. Indicator variables are directly observed: these are the variables you will see in your dataset. In the sometimes-confusing terminology of different branches of statistics, indicator variables are also known as manifest variables or items. In contrast, latent variables are not directly observed; they are summary measures constructed from the patterns of response to a chosen set of manifest variables.

Using these techniques, we can then fit a measurement model to estimate a latent variable (or variables) that summarises these patterns of response. The benefit of this approach is that it brings out patterns of response that are often obscured in conventional analysis. Latent variable models can also be considered a form of data reduction technique, summarising complex patterns of response in a new, simpler variable (or variables).
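The idea that a latent variable summarises a pattern of association can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not part of the original resource: it generates four hypothetical indicators from a single continuous latent variable `z` (the loadings 0.8, 0.7, 0.8 and 0.6 are arbitrary made-up values), checks that the indicators are correlated only because they share `z`, and then uses the simple mean of the indicators as a crude stand-in for a fitted factor score. The helper names `correlation` and `simulate` are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n=2000, loadings=(0.8, 0.7, 0.8, 0.6), seed=1):
    """Generate indicators x_j = loading_j * z + noise from one latent z.

    Each noise term has variance 1 - loading_j**2, so every indicator
    has (expected) variance 1 and loading_j is its correlation with z.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    indicators = []
    for lam in loadings:
        noise_sd = math.sqrt(1 - lam ** 2)
        indicators.append([lam * zi + rng.gauss(0, noise_sd) for zi in z])
    return z, indicators


z, xs = simulate()

# The indicators are correlated only because they share z:
# the expected correlation of x_i and x_j is loading_i * loading_j.
r_pair = correlation(xs[0], xs[1])

# Crude data reduction: average the four indicators for each respondent.
summary = [sum(values) / len(values) for values in zip(*xs)]
r_summary = correlation(summary, z)

print(f"corr(x1, x2) = {r_pair:.2f}")
print(f"corr(mean score, z) = {r_summary:.2f}")
```

With these settings, the correlation between the first two indicators should come out close to the product of their loadings (0.8 × 0.7 = 0.56), and the mean score should correlate at roughly 0.9 with `z`. A proper factor analysis would instead estimate the loadings from the data and weight the indicators accordingly.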

Why are there different types of latent variable model?

The choice of model depends on two things:

• how the manifest variables are measured
• how the latent variable(s) are conceptualised

The former is easy to establish by describing the manifest variables you want to include in your model. The latter is up to you. Do you think the latent variable is best represented by a continuous variable or a categorical variable?

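This is summarised in the table below:

                              Manifest variables
Latent variable(s)            Continuous                 Categorical
Continuous                    Factor analysis            Latent trait analysis
Categorical                   Latent profile analysis    Latent class analysis

Source: Classification of latent variable models (Bartholomew et al., 2008, p. 178)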

For example, if the manifest variables are categorical and the researcher wishes to treat the latent variable as a series of groups or categories, then the appropriate method is a latent class analysis. In contrast, if the manifest variables are continuous and the researcher wishes to treat the latent variable as categorical, then latent profile analysis is suitable.
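For the case where both the manifest variables and the latent variable are categorical, the Python sketch below shows what a latent class model estimates: the size of each class and, within each class, the probability of a "yes" on each binary item. It fits the model by the EM algorithm under the usual local independence assumption (items are independent within a class). It is an illustration only: it is not Stata's implementation, the helper names `simulate` and `fit_lca` are hypothetical, and the "true" class sizes and item probabilities are invented for the simulation. A real analysis would also compare models with different numbers of classes and use several random starts.

```python
import random


def simulate(n, shares, probs, seed=1):
    """Draw n respondents: pick a class, then answer each binary item."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        k = rng.choices(range(len(shares)), weights=shares)[0]
        data.append([1 if rng.random() < p else 0 for p in probs[k]])
    return data


def fit_lca(data, n_classes=2, n_iter=100, seed=0):
    """Fit a latent class model to binary (0/1) indicators by EM.

    Assumes local independence: within a class, items are independent
    Bernoulli variables. Returns (shares, probs), where shares[k] is the
    estimated size of class k and probs[k][j] = P(item j = 1 | class k).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    shares = [1.0 / n_classes] * n_classes
    # Random starting values break the symmetry between classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent
        posteriors = []
        for row in data:
            weights = []
            for k in range(n_classes):
                w = shares[k]
                for j, y in enumerate(row):
                    w *= probs[k][j] if y == 1 else 1.0 - probs[k][j]
                weights.append(w)
            total = sum(weights)
            posteriors.append([w / total for w in weights])

        # M-step: re-estimate class shares and item probabilities
        for k in range(n_classes):
            nk = sum(post[k] for post in posteriors)
            shares[k] = nk / len(data)
            for j in range(n_items):
                p = sum(post[k] * row[j] for post, row in zip(posteriors, data)) / nk
                # Keep probabilities away from 0 and 1 for numerical safety
                probs[k][j] = min(max(p, 1e-6), 1 - 1e-6)
    return shares, probs


# Hypothetical "true" model: 60% / 40% classes, five binary items
true_shares = [0.6, 0.4]
true_probs = [[0.9, 0.8, 0.85, 0.9, 0.8],
              [0.1, 0.2, 0.15, 0.1, 0.25]]
data = simulate(1000, true_shares, true_probs)
shares, probs = fit_lca(data)

# Class labels are arbitrary, so put the "high-response" class first
order = sorted(range(len(shares)), key=lambda k: -sum(probs[k]))
shares = [shares[k] for k in order]
probs = [probs[k] for k in order]

for k, (share, item_probs) in enumerate(zip(shares, probs), start=1):
    print(f"class {k}: share {share:.2f}, item probs {[round(p, 2) for p in item_probs]}")
```

Because the simulated classes are well separated, the estimated shares and item probabilities should be close to the values used to generate the data. Note the relabelling step: latent classes have no inherent order, so any comparison with known values must first match the classes up.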

Estimating latent variable models in Stata

Each of the types of latent variable model above can be estimated in Stata. For some types of latent variable model there are model-specific commands (e.g. the factor command for factor analysis). In contrast, latent class models are estimated within the Generalised Structural Equation Modelling (gsem) suite of commands. These commands are very flexible and allow the researcher to specify complex models that can include a range of different types of manifest and latent variables. This flexibility reflects the fact that latent variable models can be understood as belonging to the broader family of Generalised Linear Latent and Mixed Models (GLLAMMs).

A further advantage of estimating latent variable models using Stata’s gsem commands is that a latent variable (measurement) model can be estimated simultaneously with a structural model that predicts membership of latent classes or position on a latent variable scale.
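As a hedged illustration of what estimating the two parts simultaneously means, one standard way to write a latent class model with covariates is given below. It assumes J binary indicators y_ij for respondent i, K latent classes, a vector of covariates x_i, local independence within classes, and a multinomial logit structural model for class membership with class 1 as the reference category (γ_01 = 0 and γ_1 = 0). The notation is introduced here for illustration and is not taken from the Stata documentation.

```latex
P(\mathbf{y}_i \mid \mathbf{x}_i)
  = \sum_{k=1}^{K}
    \underbrace{\frac{\exp(\gamma_{0k} + \boldsymbol{\gamma}_k^{\top}\mathbf{x}_i)}
                     {\sum_{h=1}^{K}\exp(\gamma_{0h} + \boldsymbol{\gamma}_h^{\top}\mathbf{x}_i)}}_{\text{structural model: } P(C_i = k \mid \mathbf{x}_i)}
    \;
    \underbrace{\prod_{j=1}^{J} \pi_{jk}^{\,y_{ij}}\,(1 - \pi_{jk})^{1 - y_{ij}}}_{\text{measurement model}},
  \qquad \pi_{jk} = P(y_{ij} = 1 \mid C_i = k)
```

Maximising the likelihood built from this expression estimates the structural parameters (γ) and the measurement parameters (π) together, rather than first assigning respondents to classes and then regressing the assigned class on x_i, a two-step approach that treats class membership as if it were observed without error.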

To help illustrate this, a demonstration of different types of latent variable model is included in the workbook below. The workbook explores what happens when we estimate different types of latent variable model using data on educational attainment. The workbook takes the form of a Jupyter notebook which includes commentary, Stata code and Stata output.

Download a worked example (output from different types of latent variable model). The ZIP file contains the example as an HTML file, together with images (PNG). To view the images, extract the ZIP file and keep the images in the same folder as the HTML file.

Last updated on 16 May 2024.