Multiple Linear Regression

Presenter(s): Mark Tranmer, Jen Murphy, Mark Elliot, Maria Pampaka



Regression is a statistical modelling technique used for analysing and predicting the relationship between variables. Multiple Linear Regression (MLR) is used to understand the relationship between one dependent variable and two or more independent variables. By modelling the linear relationship between the variables, MLR allows for the prediction of outcomes and provides insights into the significance of the predictors. 

This tutorial will explain the application of MLR, drawing on examples from the social sciences to demonstrate its utility in real-world scenarios. A downloadable worksheet and a handbook are also included.

Foundations and Applications of Multiple Linear Regression

Multiple Linear Regression (MLR) extends simple linear regression, marking a significant advancement in statistical analysis. It introduces the ability to examine the influence of multiple independent variables on a single dependent variable, broadening the scope of research and analysis across various fields. MLR is a critical tool for predictive modelling and data interpretation.



   Download transcript    |   Download slides

Simple Linear Regression

A simple linear regression estimates the relationship between a response variable, and a single explanatory variable, given a set of data that includes observations for both of these variables for a particular sample. For example, we may ask: how can exam performance at age 16, the response variable, be predicted from exam results at age 11, the explanatory variable? So, we are interested in the relationship between age 11 and age 16 scores. We hypothesise that the age 16 score can be predicted from the age 11 score; that there is an association between the two.
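A simple linear regression of this kind can be sketched in a few lines of Python. The scores below are made up for illustration (they are not from the tutorial's dataset); the fitted line predicts the age 16 score from the age 11 score.

```python
import numpy as np

# Hypothetical exam scores for a small sample of pupils (illustrative data).
age11 = np.array([45.0, 52.0, 60.0, 58.0, 70.0, 75.0, 80.0, 66.0])
age16 = np.array([50.0, 55.0, 63.0, 60.0, 72.0, 78.0, 85.0, 68.0])

# Fit age16 = intercept + slope * age11 by least squares.
slope, intercept = np.polyfit(age11, age16, deg=1)

# Predicted age 16 scores from the fitted line.
predicted = intercept + slope * age11
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A positive slope is consistent with the hypothesised association: pupils who score higher at age 11 tend to score higher at age 16.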


Multiple Linear Regression

Multiple linear regression extends simple linear regression to include more than one explanatory variable. In both cases, we still use the term ‘linear’ because we assume that the response variable is directly related to a linear combination of the explanatory variables.

MLR can be applied to a variety of problems, such as predicting an individual’s income given several socio-economic characteristics, or estimating systolic or diastolic blood pressure given a variety of socio-economic and behavioural characteristics (occupation, drinking, smoking, age, etc.).
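The income example can be sketched with simulated data. Everything below is hypothetical: the predictors (education, age, hours worked) and their true coefficients are chosen purely for illustration. The design matrix gains one column per explanatory variable, plus a column of ones for the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors of income (illustrative only).
education = rng.uniform(8, 20, n)   # years of education
age = rng.uniform(20, 65, n)        # age in years
hours = rng.uniform(10, 60, n)      # weekly hours worked

# Simulated response with known coefficients plus random noise.
income = 5.0 + 2.0 * education + 0.3 * age + 0.5 * hours + rng.normal(0, 2.0, n)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), education, age, hours])

# Least-squares fit: one coefficient per column of X.
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
print("estimated coefficients:", np.round(coef, 2))
```

Because the data were simulated with known coefficients, the estimates should land close to the true values (2.0, 0.3, 0.5), which is a useful sanity check when first trying the method.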


Assumptions when using Linear Regression

When we use linear regression to build a model, we assume that:

  • The response variable is continuous and the explanatory variables are either continuous or binary

  • The relationship between outcome and explanatory variables is linear

  • The residuals are homoscedastic

  • The residuals are normally distributed

  • There is no more than limited multicollinearity

  • There are no omitted variables, i.e. no variables excluded from the model have strong relationships with the response variable (after controlling for the variables that are in the model)

  • The errors are independent

  • The observations are independent.
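The multicollinearity assumption can be checked numerically with the variance inflation factor (VIF): regress each explanatory variable on the others and compute 1 / (1 − R²). The sketch below uses made-up data in which one predictor is deliberately built to be nearly collinear with another, so its VIF comes out high.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical predictors: x2 is constructed to be nearly collinear with x1.
x1 = rng.normal(0, 1, n)
x2 = 0.9 * x1 + 0.1 * rng.normal(0, 1, n)
x3 = rng.normal(0, 1, n)   # independent of the other two

def vif(target, others):
    """Variance inflation factor: 1 / (1 - R^2) from regressing one
    predictor on the remaining predictors (with an intercept)."""
    X = np.column_stack([np.ones(len(target))] + others)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

vif_x1 = vif(x1, [x2, x3])   # large: x1 and x2 are nearly collinear
vif_x3 = vif(x3, [x1, x2])   # close to 1: x3 is unrelated to the others
print(f"VIF x1: {vif_x1:.1f}, VIF x3: {vif_x3:.1f}")
```

A common rule of thumb treats VIF values above about 5 or 10 as signalling problematic multicollinearity; here x1 far exceeds that, while x3 does not.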

 

For most of these assumptions, if they are violated then it does not necessarily mean we cannot use a linear regression method, simply that we may need to acknowledge some limitations, adapt the interpretation or transform the data to make it more suitable for modelling.

 

> Download a worksheet (with answers).

> Download handbook: Multiple Linear Regression (2nd Edition). The handbook expands on simple and multiple linear regression, hypothesis testing, using SPSS, assumptions, and more complex models.

 

Steps for MLR Analysis Using SPSS:

SPSS is a statistical software package widely used for analysing data in the social sciences. It enables researchers to perform multiple linear regression (MLR) and other analyses.

The steps below provide a structured approach to conducting MLR in SPSS, from data preparation to assumption testing and interpretation of results.

  1. Prepare Data: ensure your dataset is correctly formatted, with each row representing a case and columns representing variables.
  2. Variable Selection: identify your dependent variable and multiple independent variables. Use theoretical knowledge and exploratory data analysis to select relevant predictors.
  3. Check Assumptions: before running the MLR, assess the data for multicollinearity using correlation matrices or Variance Inflation Factor (VIF), ensure linearity between independent variables and the dependent variable, and check residuals for normality.
  4. Run the Regression: navigate to Analyze > Regression > Linear in SPSS. Assign your dependent variable and independent variables to the respective fields. For multicollinearity diagnostics, include tolerance and VIF in the output.
  5. Interpret Results: examine the coefficients to understand the impact of each independent variable on the dependent variable. Evaluate the R-squared value to assess the model's overall explanatory power. Check the significance levels (p-values) to determine which variables significantly contribute to the model.
  6. Assumption Validation Post-hoc: After running the MLR model, validate the assumptions by analysing the residuals for homoscedasticity and normality.
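The fitting, interpretation and post-hoc steps above (steps 4–6) have direct analogues outside SPSS. The sketch below mirrors them in Python with simulated data: fit the model, read off the coefficients and R-squared, then inspect the residuals. All variable names and true coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150

# Hypothetical data: two predictors and a response (illustrative only).
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1.0, n)

# Step 4 analogue: run the regression (least squares with an intercept).
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 5 analogue: coefficients and R-squared (explanatory power).
resid = y - X @ coef
r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
print("coefficients:", np.round(coef, 2))
print(f"R-squared: {r_squared:.3f}")

# Step 6 analogue: basic residual check (mean is ~0 for OLS with intercept).
print(f"residual mean: {resid.mean():.2e}")
```

In SPSS the same quantities appear in the Coefficients and Model Summary tables; a full post-hoc check would also plot the residuals against the fitted values to look for non-constant variance.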


Terms and definitions

categorical: A variable where the responses are categories. For example, ethnicity.
collinear: When one variable can be used to predict another; when two variables are closely linearly associated.
continuous: A variable which takes a numeric form and can take any value. For example, distance in miles to the nearest shop.
Cook's distance: A measure of whether or not observations are outliers, used to assess whether an observation within a dataset should be removed to improve the fit of the model. A common threshold for further consideration is three times the mean Cook's distance; the creator of the measure regards any point with a Cook's distance of 1 as a concern.
correlated: Two continuous variables are said to be correlated if a change in one variable is associated with a measurable change in the other. The correlation coefficient is a measure of the strength of this association or relationship.
response variable: The outcome we want to predict. The value of this variable is predicted to be dependent on the other terms in the model. Sometimes referred to as the dependent variable or the Y variable.
error term: See residual.
explanatory variable: A variable which we use to predict the outcome variable. Also referred to as an independent variable or X variable.
homoscedastic: One of the key assumptions for a linear regression model. If residuals are homoscedastic, they have constant variance regardless of the values of any explanatory variables.
linear regression: A method where a line of best fit is estimated by minimising the sum of the squared differences between the actual and predicted observations.
multicollinearity: When two or more variables are closely linearly associated, or can be used to predict each other.
multiple linear regression: Linear regression with more than one explanatory variable.
ordinal variable: A variable where the responses are categories which can be put in an order. For example, the highest level of education achieved by a respondent. Remember that the possible responses may not be evenly spaced.
Pearson's coefficient: A measure of correlation.
population: The whole group we are interested in.
positive correlation: A situation where, if one variable increases in value, the other variable also tends to increase in value.
R2: A measure of model fit; the percentage of variance explained by the model.
representative: A sample is representative when it has the same statistical properties as the population as a whole, so that the results of a statistical analysis of the sample can be inferred to hold for the population. To be representative, a sample needs to be of sufficient size and the correct composition to reflect the groups within the underlying population.
residual: The difference between the value predicted by the model and the actual value of the observation. When texts refer to "the residuals", they mean the data generated by calculating the residual for every observation in the dataset.
sample: The subsection of the population which we are studying; a smaller number of units drawn from the population. For example, if we are interested in menu choices in a school canteen, our population of interest is everyone in the school, and a survey of 5 students from each year group would be our sample.
simple linear regression: Linear regression with one explanatory variable.
skew: A measure of the symmetry of a distribution. A symmetrical distribution has a skew of 0. Positive skew means more of the values are at the lower end of the distribution; negative skew means more of the values are at the higher end.
statistically significant: A result is statistically significant when it meets our criteria for the hypothesis test. "Statistically significant" is not the same as "important" or "interesting"; it has a specific technical meaning.


Our thanks to Dr. Gil Dekel for his support in preparing this tutorial.




About the author

Mark Elliot is Professor of Data Science at the University of Manchester. He specialises in confidentiality and privacy, and in social data science.

Mark Tranmer is Professor of Quantitative Social Science at the University of Glasgow. Research interests include Multilevel Modelling and Social Network Analysis, with applications in social science, public health and animal behaviour. 

Maria Pampaka is a Professor at the Social Statistics department and the Manchester Institute of Education, at the University of Manchester. Her expertise and interests lie within evaluation and measurement, systematic reviews and meta-analyses, and advanced quantitative methods, including complex survey design, longitudinal data analysis, and dealing with missing data. She usually applies these in educational research around learners’ (maths) attitudes and dispositions and teaching practices.



