Tackling Selection Bias in Sentencing Data Analysis:
A New Approach Based on Mixture Models, Expert Elicitation Techniques, and Bayesian Statistics


For reasons of methodological convenience, statistical models analysing judicial decisions tend to focus on the duration of custodial sentences. Such sentences are, however, quite rare (8% of the total in England and Wales), which generates a problem of selection bias and raises questions about the external validity of much of the literature on key criminological and legal topics (e.g. discrimination, deterrence, court cultures).
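The selection problem described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not part of the project itself: it assumes a single latent severity scale that depends linearly on one hypothetical case characteristic `x`, with normally distributed noise, and it treats the most severe 8% of cases as custodial. The function names, the true slope of 1.0, and the sample size are all illustrative choices.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, true_slope=1.0, custody_share=0.08, seed=1):
    """Compare the slope estimated on all cases with the custody-only slope.

    All values are hypothetical; only the 8% custody share comes from the text.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # Latent severity: linear in x plus noise (an assumption of this sketch).
    severity = [true_slope * x + rng.gauss(0, 1) for x in xs]

    # Treat the most severe cases as the custodial ones.
    n_custody = round(n * custody_share)
    cutoff = sorted(severity)[n - n_custody]
    custody = [(x, s) for x, s in zip(xs, severity) if s >= cutoff]

    slope_all = ols_slope(xs, severity)
    slope_custody = ols_slope([x for x, _ in custody], [s for _, s in custody])
    return slope_all, slope_custody, len(custody) / n


if __name__ == "__main__":
    slope_all, slope_custody, share = simulate()
    print(f"custody share:      {share:.3f}")
    print(f"slope, all cases:   {slope_all:.3f}")
    print(f"slope, custody only:{slope_custody:.3f}")
```

Under these assumptions, the slope estimated from custodial cases alone is attenuated relative to the slope estimated from all cases, because selecting on the outcome truncates its variation.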

While this problem has been acknowledged for more than four decades, no adequate solutions are presently available to sentencing data researchers. Some have relied on left-censored Tobit models to specify the duration of custodial sentences while simultaneously incorporating non-custodial outcomes as if they were somehow equivalent to negative days in prison. Distributions of custodial sentences, however, do not resemble a left-censored distribution, which violates the parametric assumptions of this approach. Another group of researchers has relied on two-stage Heckman adjustments, implicitly assuming that the sentencing process can be divided into two stages: a decision to imprison, followed by a choice of duration. Once again, this is not a realistic assumption, as demonstrated by the impossibility of finding a valid auxiliary variable that affects the probability of imposing a custodial sentence while being unrelated to the duration of that sentence.

This project will develop an original approach based on finite mixture modelling, Bayesian statistics, aggregated views from judges, and the new sentencing guidelines, capable of simultaneously modelling custodial and non-custodial outcomes. Specifically, different distributions of the relative severity of four major sentence outcomes (fines, community orders, suspended sentences, and custodial sentences) will be specified within the same mixture model. This solution will not only eliminate the problem of selection bias; by making use of the information available on non-custodial outcomes (e.g. duration of suspended sentences, fine amounts), it will also be more efficient than any of the alternative approaches used in the literature.
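As a rough guide to what such a model might look like, the generic form of a four-component finite mixture on a common severity scale is sketched below. The notation is an assumption introduced here for illustration (the severity score s_i, the mixing weights \pi_k, and the component densities f_k with parameters \theta_k do not appear in the project description), and the project's actual specification may differ.

```latex
% Generic finite mixture density for the severity s_i of sentence i.
% k = 1,...,4 indexes fines, community orders, suspended sentences,
% and custodial sentences (notation assumed for illustration).
\[
  f(s_i \mid \boldsymbol{\pi}, \boldsymbol{\theta})
    = \sum_{k=1}^{4} \pi_k \, f_k(s_i \mid \theta_k),
  \qquad
  \pi_k \ge 0, \quad \sum_{k=1}^{4} \pi_k = 1 .
\]
```

In a Bayesian treatment, prior distributions would be placed on the weights and component parameters; the aggregated views of judges and the sentencing guidelines mentioned above are natural candidates for informing those priors.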



Jose Pina-Sánchez (University of Leeds)
Sara Geneletti (London School of Economics)
John Paul Gosling (University of Leeds)