SOCGEN: combining social science and molecular genetic research to examine inequality and the life course

Within the last decade there has been an explosion in the amount of data that includes both social science and molecular genetic information. The UK is a frontrunner in this type of data, with large samples such as the UK Biobank, Understanding Society and many other longitudinal data sources (e.g., ALSPAC, 1958 Birth Cohort, ELSA). Although expensive data infrastructures for large biosocial data are available, they remain underutilized by social scientists. Knowledge among social scientists about how to properly use this data to answer social science research questions, and statistical tools to accommodate social science problems, remain underdeveloped. Yet this new data will allow social scientists to examine fundamentally new research questions and has the potential for substantive breakthroughs. For the first time in history, social scientists can uncover whether there is a genetic and biological component to many of the behaviours that have until now largely been attributed only to social factors. A growing number of studies demonstrate that there is a genetic component to core social science topics such as educational level (Rietveld et al. 2013), fertility (Mills & Tropf 2015; Tropf et al. 2015) and wellbeing (Rietveld et al. 2013). There are, however, many pitfalls to conducting this type of research, including a lack of accessible learning and teaching material aimed at social scientists and a lack of appropriate statistical models or robustness checks of existing models.

A primary objective of this project is to bring together substantive social science researchers in the field of inequality and the life course with experts in statistics, biodemography, and quantitative molecular genetics to develop innovative learning resources, statistical models and packages that address the specific shortcomings in this substantive area of research. Developing accessible teaching resources and tailored statistical models and packages will allow UK social scientists to become trendsetting pioneers in answering new biosocial research questions. It will also allow us to convey how insights from molecular genetic data and research can be integrated into life course (and social science) research.

The key research questions to be answered in this project are:

  • To what extent can genetic data be informative about an individual’s life course behaviour?
  • Which statistical methods can be developed to examine the smaller effects that need to be detected in gene-by-environment (GxE) analyses, where the socio-environment interacts with or moderates genetic effects?
  • Which statistical tools and packages can be developed to deal with central analytical problems faced by life course researchers engaging in sociogenomic analyses? How can we introduce Bayesian models that accommodate longitudinal covariates and measurement error in both covariates and their outcomes? How can we cope with multiple correlated covariates?
  • Can recent models from molecular genetics (GCTA, genome-wide complex trait analysis) be validated and adapted for substantive life course research?
  • How can biological, genetic and medical research benefit from insights from the life course and social science research?


Paper published in Nature Genetics: Twelve DNA areas linked with the age at which we have our first child and family size



Melinda Mills (PI, Department of Sociology & Nuffield College, University of Oxford)

David Steinsaltz (Department of Statistics, University of Oxford)

Nicola Barban (Department of Sociology & Nuffield College, University of Oxford)

Felix Tropf (Department of Sociology & Nuffield College, University of Oxford)